From Latin Digits to Babylonian Cuneiform: Number Helices Across Scripts

One of my favourite visuals in all of AI is the number helix. The image brings about the same wonder I felt when first seeing Lissajous figures on an oscilloscope in my late teens.

The number helix — Pythia 6.9B, Latin digits 0–99.

There’s something magical about language models learning, without being explicitly taught this geometry, to lay numbers out in a helix in their activation space. Models don’t merely encode numbers this way; they compute with the shape, adding two numbers by rotating clock hands as if adding angles. Neither the helix nor that computation is a metaphor. The activations are concretely and measurably well modelled by a helical basis.

The idea has an elegant history. It began with grokking. In 2022, an OpenAI team was training small transformers on modular arithmetic — sums that wrap around, like a clock face — when, according to OpenAI’s Alethea Power, someone left a training run continuing over a holiday. They came back to find that long after the network had memorised its training set, and its loss curve had flatlined, the network had abruptly snapped to near-perfect generalisation; it was now able to add numbers it had not seen in its training data. A year later Neel Nanda and his colleagues cracked open one of these “grokked” toy models to find a trigonometric “clock,” built from sines and cosines of the inputs, doing the modular arithmetic by rotation. Last year, Subhash Kantamneni and Max Tegmark (K&T) found the same mathematical objects inside Large Language Models. (Here’s a video from Welch Labs on all this.)

I wanted to both replicate their work and take it further. Ultimately, a written numeral bundles together four separable features:

its value — the abstract quantity, forty-seven;
its glyph set — the marks used to write it: Latin digits 0–9, Arabic-Indic ٠–٩, Devanagari ०–९, CJK numeral characters, or cuneiform wedges;
its base, where the notation is positional — the factor by which moving one place to the left multiplies a symbol’s place value: 10 for decimal, 2 for binary, 16 for hexadecimal, and 60 for Babylonian;
its composition rule — how the symbols combine into a value: positionally, as in decimal 47; additively, as in Greek μζ; additively and subtractively, as in Roman XLVII; or through a mixture of the two, as in Babylonian, which is positional across powers of 60 but additive within each place.

A conventional decimal numeral such as 47 bundles all four together. To pull them apart, I rendered the same values across different glyph sets, bases and composition rules. I found that the dominant periodic structure remained similar as I swapped the symbols. The helix fit tracked the underlying value, not the glyphs. And when I changed the base — to binary, or base-60 Babylonian — the clocks retuned to match, taking on the new base’s period.

How to write a number

The same value rendered in ten systems, in three families. The glyphs, bases and composition rules change; the value does not.

Positional · base 10

Latin

base 10

47

= 4·10¹ + 7

Place-value: each position is a power of 10.

Arabic-Indic

base 10

٤٧

= 4·10¹ + 7

The Arabic glyphs — same base-10 structure as Latin, different characters.

Devanagari

base 10

४७

= 4·10¹ + 7

Hindi / Sanskrit digits; base-10 positional, a separate Unicode block.

CJK digits

base 10

四七

= 4·10¹ + 7

CJK glyphs written digit-by-digit (positional), to line up with the other base-10 systems.

Positional · other bases

Binary

base 2

101111

= 1·2⁵ + 1·2³ + 1·2² + 1·2¹ + 1

Two glyphs, 0 and 1. The units digit flips every step.

Hexadecimal

base 16

2f

= 2·16¹ + 15

Sixteen glyphs (0–9, a–f). The units digit cycles every 16.

Mixed · base 60, additive within each column

Babylonian cuneiform

base 60

𒌋𒌋𒌋𒌋𒁹𒁹𒁹𒁹𒁹𒁹𒁹

= 47

Each base-60 column is built additively from 𒌋 = 10 and 𒁹 = 1 wedges. A new column opens at 60.

Additive · non-positional (no place value)

Roman

additive · subtractive

XLVII

= 40 + 5 + 1 + 1

Sum the letter values, with subtractive shortcuts (IV, IX, XL).

Greek (Milesian)

additive

μζ

= 40 + 7

Each letter is a fixed value: α=1…θ=9, ι=10…ϟ=90. Adjacent numbers can look unrelated.

Hebrew (gematria)

additive

מז

= 40 + 7

Like Greek: א=1…ט=9, י=10…צ=90. 15 and 16 take special forms (טו, טז).
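The renderings above can be sketched in a few lines of Python. This is an illustration only, not the code behind the post: the function names (`to_base`, `positional`, `roman`, `babylonian`) are hypothetical, the CJK zero glyph and the space between Babylonian places are assumptions, and only a handful of the ten systems are covered.

```python
# Hypothetical helpers for illustration; not the post's actual rendering code.

def to_base(n: int, base: int) -> list[int]:
    """Digits of a non-negative integer n in the given base, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(r)
    return digits[::-1]


# Glyph sets for the positional systems (string index = digit value).
GLYPHS = {
    "latin": "0123456789abcdef",  # also serves binary and hexadecimal
    "arabic_indic": "".join(chr(0x0660 + d) for d in range(10)),
    "devanagari": "".join(chr(0x0966 + d) for d in range(10)),
    "cjk": "〇一二三四五六七八九",  # choice of zero glyph is an assumption
}


def positional(n: int, base: int = 10, script: str = "latin") -> str:
    """Place-value rendering: swap the glyph set or the base independently."""
    glyphs = GLYPHS[script]
    return "".join(glyphs[d] for d in to_base(n, base))


ROMAN = [(100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
         (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def roman(n: int) -> str:
    """Additive-subtractive rendering for small positive n (zero has no form)."""
    out = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


TEN, ONE = "𒌋", "𒁹"  # the two wedges used in the post


def babylonian(n: int) -> str:
    """Positional across powers of 60, additive within each place.

    Places are joined with a space and a zero place is left empty;
    both are assumptions made for this sketch.
    """
    places = []
    for d in to_base(n, 60):
        tens, ones = divmod(d, 10)
        places.append(TEN * tens + ONE * ones)
    return " ".join(places)


for label, text in [("latin", positional(47)),
                    ("binary", positional(47, base=2)),
                    ("hex", positional(47, base=16)),
                    ("roman", roman(47)),
                    ("babylonian", babylonian(47))]:
    print(f"{label:>10}: {text}")
```

Keeping `base` and `script` as separate arguments mirrors the point of the experiment: the glyph set and the base can be varied one at a time while the value stays fixed.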

Numbers as a stack of clocks

The structure is not too complex if you’re familiar with Fourier series.

Take an integer $a$ from 0 to 99 and read off the model’s residual stream, the running vector it carries for that token as it processes the text. K&T showed that, as a function of $a$, this vector is well modelled by a generalised helix: one straight “number-line” axis that grows with the size of $a$, plus a handful of independent circles, each turning at its own period.

$$
\begin{aligned} h(a) &\approx W^{\top} B(a), \\[6pt] B(a) &= \left[\, a,\; \cos\frac{2\pi a}{T},\; \sin\frac{2\pi a}{T} \,\right]_{T \in \{2,\,5,\,10,\,100\}}. \end{aligned}
$$

That is, a single linear term in $a$, then a $(\cos, \sin)$ pair for each period $T$. Each pair is a clock: as $a$ counts up, that clock’s hand sweeps around once every $T$.
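To make the basis concrete, here is a minimal NumPy sketch that builds $B(a)$ and fits $W$ by ordinary least squares. The function names `helix_basis` and `fit_helix` are hypothetical, the activations are synthetic stand-ins rather than real model outputs, and mean-centring both sides before the fit is an assumption of this sketch, not something the formula above specifies.

```python
import numpy as np

PERIODS = (2, 5, 10, 100)


def helix_basis(a, periods=PERIODS):
    """B(a): one linear term, then a (cos, sin) pair per period T."""
    a = np.asarray(a, dtype=float)
    cols = [a]
    for T in periods:
        cols.append(np.cos(2 * np.pi * a / T))
        cols.append(np.sin(2 * np.pi * a / T))
    return np.stack(cols, axis=-1)  # shape (..., 1 + 2 * len(periods))


def fit_helix(H, a, periods=PERIODS):
    """Least-squares W with H ~ B @ W (rows are numbers, columns are hidden dims).

    Both sides are mean-centred first (an assumption of this sketch).
    Returns W and the fraction of variance the fit explains.
    """
    B = helix_basis(a, periods)
    Bc = B - B.mean(axis=0)
    Hc = H - H.mean(axis=0)
    # For integer a the T=2 sine column is zero up to rounding, so it carries
    # no information; lstsq copes with the resulting rank deficiency.
    W, *_ = np.linalg.lstsq(Bc, Hc, rcond=None)
    resid = Hc - Bc @ W
    r2 = 1.0 - (resid ** 2).sum() / (Hc ** 2).sum()
    return W, r2


# Synthetic "activations": a random helix in 64 dimensions plus a little noise.
rng = np.random.default_rng(0)
a = np.arange(100)
W_true = rng.normal(size=(9, 64))
H = helix_basis(a) @ W_true + 0.01 * rng.normal(size=(100, 64))

W_hat, r2 = fit_helix(H, a)
print(np.round(helix_basis(47), 2))  # the nine helix coordinates of 47
print(f"variance explained: {r2:.4f}")
```

On real activations the interesting quantity is `r2`: how much of the variance across the 100 numbers the nine-dimensional helical basis accounts for.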

Run a Fourier transform over the activations (the standard tool for finding repeating cycles in data) and the power piles up at frequencies $1/2$, $1/5$, $1/10$ and $1/100$, that is, periods 2, 5, 10 and 100. A clock for odd-versus-even (T=2), one for fives (T=5) and one for the units digit (T=10) emerge directly from the spectrum. The slower T=100 component captures coarse magnitude and is also motivated by the base-10 structure.
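The spectrum computation can be sketched as follows. The data here are synthetic (three planted clocks plus noise), `top_periods` is a hypothetical helper, and summing power over hidden dimensions is one reasonable choice rather than a confirmed detail of the analysis.

```python
import numpy as np


def top_periods(H, k=3):
    """Return the k periods with the most Fourier power, plus the full spectrum.

    H has shape (n, d): row i is the activation for the integer i.
    Power is summed over the d hidden dimensions (an assumption of this sketch).
    """
    n = H.shape[0]
    Hc = H - H.mean(axis=0)                 # remove the constant offset
    spectrum = np.fft.rfft(Hc, axis=0)      # bin j has frequency j / n
    power = (np.abs(spectrum) ** 2).sum(axis=1)
    power[0] = 0.0                          # ignore the DC bin
    idx = np.argsort(power)[::-1][:k]
    return [n / j for j in idx], power


# Synthetic activations: clocks at T = 2, 5, 10 in random directions, plus noise.
rng = np.random.default_rng(0)
a = np.arange(100)
H = 0.05 * rng.normal(size=(100, 32))
for T in (2, 5, 10):
    u, v = rng.normal(size=(2, 32))
    H += np.outer(np.cos(2 * np.pi * a / T), u)
    H += np.outer(np.sin(2 * np.pi * a / T), v)

periods, power = top_periods(H)
print(sorted(periods))
```

With 100 samples, the periods 2, 5, 10 and 100 all land exactly on Fourier bins, which is why the peaks are sharp. Real activations also contain the linear number-line term, which is not periodic and spreads power across the lowest frequencies.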

Interactive figure: the number 47 as a stack of clocks, with a script selector. Four dials (T = 2, 5, 10, 100) sit beside the helix, seen edge-on.

The fitted helix's coordinates for 47 (9 numbers):

the value: 47 (height on the number line)
T=2 hand (x, y): (-1.00, 0.00), even / odd
T=5 hand (x, y): (-0.81, 0.59), step of 5
T=10 hand (x, y): (-0.31, -0.95), units digit
T=100 hand (x, y): (-0.98, 0.19), coarse size

These 9 values specify 47 in the helical basis — a compact description of the part of the activation the fit captures, not the model's entire representation of 47. The first is the value — its height on the number line. Each pair after it is just where a clock's hand points: how far right (x) and how far up (y), read off the dials above. Switch among the four base-10 scripts and the dials don't budge — same value, same code, whatever the glyphs. Switch the base and the clocks retune: that's what the rest of the post is about.

Reproducing it from scratch

I fed the numbers 0–99 through eight models: Pythia 6.9B, GPT-J 6B and Llama 3.1 8B (as per K&T), plus newer and larger ones: Gemma 4 E4B and Gemma 4 31B, Olmo 3 32B, and Qwen2.5 7B and Qwen2.5 32B. K&T worked with integers that are a single token; many of my renderings span several (Babylonian 47 is dozens of wedge-tokens), so where a numeral is multi-token I mean-pool its residual-stream vectors before fitting the helix.
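The pooling step for multi-token numerals can be sketched like this. It is a minimal illustration on a toy array: `pool_numeral` is a hypothetical helper, and it assumes the per-token residual-stream vectors for one layer are already available as an array along with the numeral's token span.

```python
import numpy as np


def pool_numeral(hidden, start, end):
    """Mean-pool the residual-stream vectors of the tokens in [start, end).

    hidden: array of shape (seq_len, d_model), one row per token at one layer.
    start, end: token span of the numeral (how the span is found is left out).
    """
    hidden = np.asarray(hidden, dtype=float)
    if not 0 <= start < end <= hidden.shape[0]:
        raise ValueError("token span must be non-empty and inside the sequence")
    return hidden[start:end].mean(axis=0)


# Toy example: 4 tokens, 3 hidden dimensions; the numeral covers tokens 1 and 2.
hidden = np.arange(12.0).reshape(4, 3)
print(pool_numeral(hidden, 1, 3))   # average of rows 1 and 2
print(pool_numeral(hidden, 2, 3))   # a single-token numeral is returned unchanged
```

Because a single-token span returns that token's vector as is, the same function covers both the single-token case and the multi-token renderings.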

The measured helix

Interactive figure (model and script selectors): Pythia 6.9B · Latin digits · T=10 · layer 32

Colour is the value; height is magnitude; one full turn is one full cycle of the units digit (T=10). These are real residual-stream coordinates; switch model and script to compare.

Same value, different scripts

I re-rendered every value 0–99 in Arabic-Indic digits (٤٧), Devanagari (४७) and CJK (四七): the same quantity, drawn in completely different symbols and chopped into different tokens by each model’s tokenizer. (All three are still base ten.) If the helix depended on surface form, it should wobble. It doesn’t. The Fourier spectrum keeps peaking in the same places: the units digit (T=10), odd-versus-even (T=2), and fives (T=5). Flip between the “same base” scripts in the chart below and watch the peaks sit still.

What base does it count in?

Interactive figure (a “same base (10)” row, an “other bases” row and a model selector): Fourier spectrum of the activations. The spikes are the model's clocks; the broad rise on the far left is the magnitude / number-line axis.

Latin digits (base 10) · Pythia 6.9B · peak layer 32 · clocks marked at T = 10, 5, 2.

It looks like the model is abstracting away the notation to recover the pure quantity underneath — though maybe not. I’ll come to this in a followup post.

Changing the base

Now switch the same chart to the “other bases” row and watch our clocks follow suit.

Hexadecimal (base 16): a brand-new clock appears at T=16. That’s the hex units digit. In base 16, the last digit turns over every sixteen.
Binary (base 2): the spectrum collapses almost everything onto T=2.
Babylonian cuneiform (base 60) is my favourite. The fit reads out a T=60 clock and a T=10 clock at the same time, with a faint T=5 alongside. This is the sexagesimal column that turns over every sixty, plus the tens-and-ones — and their fives — that each column is built from additively, exactly mirroring how wedges are pressed onto cuneiform tablets!
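The periods listed above are already present in the notation itself, before any model is involved. The sketch below checks this directly on the written forms: it finds the smallest repeat period of the units digit in each base, and of the two wedge counts inside a Babylonian place. `smallest_period` is a hypothetical helper, and nothing here touches model activations.

```python
def smallest_period(seq):
    """Smallest shift p such that seq[i] == seq[i + p] for every valid i."""
    n = len(seq)
    for p in range(1, n):
        if all(seq[i] == seq[i + p] for i in range(n - p)):
            return p
    return n


values = range(240)  # long enough to contain several full cycles of 60

# Units digit in a positional system: repeats with period equal to the base.
for base in (2, 10, 16, 60):
    units = [a % base for a in values]
    print(f"base {base:>2}: units digit repeats every {smallest_period(units)}")

# Inside one Babylonian place the digit d = a % 60 is written additively
# as (d // 10) ten-wedges followed by (d % 10) one-wedges.
ten_wedges = [(a % 60) // 10 for a in values]
one_wedges = [(a % 60) % 10 for a in values]
print("ten-wedge count repeats every", smallest_period(ten_wedges))
print("one-wedge count repeats every", smallest_period(one_wedges))
```

The one-wedge count cycles every 10 and the ten-wedge count every 60, which matches the pair of clocks the fit reads out for cuneiform.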

It’s a striking picture. But I began to suspect that something other than the transformer layers might be doing the work here.

Does the helix need place value?

Every system analysed so far has had place value. Babylonian is a hybrid, but its base-60 columns are still positional; only the wedges within each column combine additively. That place-value structure is what makes a units digit cycle cleanly, which is what the helix’s clocks lock onto.

What happens when place value disappears altogether?

Roman numerals are additive. They work by adding (or subtracting) the values of the letters, and neighbouring numbers can look nothing alike — VIII, IX, X, etc. (Greek and Hebrew are similar.)

Strip away the place value and the helix buckles. The chart below doesn’t assume a helix; instead it projects the activations onto their top two principal components — the two linear directions that capture the most variance — and joins the resulting points in counting order. For the positional systems you see the helix from above: a smooth ring, the integers in order. Switch to Roman, and the ring breaks into a staircase that lurches at the thresholds where the notation jumps (IV, IX, XL, XC). Switch to Greek and the ring scatters into letter-clusters.

The shape, with no helix fitted

Interactive figure (selectors for place-value systems, non-positional systems and model): Pythia 6.9B · Latin digits · PC1+PC2 ≈ 38% var

The top two principal directions of the activations, coloured by value, joined in counting order. Place-value systems trace a smooth ring — the helix seen from above. Switch to Roman and it buckles into a staircase that jumps at the thresholds (IV, IX, XL, XC); Greek scatters into letter-clusters with no smooth order.
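The projection used in this chart is plain PCA, which can be sketched with an SVD. The activations below are synthetic (a single planted T=10 clock plus noise), and `top2_pca` is a hypothetical helper rather than the post's plotting code.

```python
import numpy as np


def top2_pca(H):
    """Project the rows of H onto their top two principal components.

    Returns the (n, 2) coordinates and the fraction of variance they capture.
    """
    Hc = H - H.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Hc, full_matrices=False)
    coords = Hc @ Vt[:2].T
    explained = (S[:2] ** 2).sum() / (S ** 2).sum()
    return coords, explained


# Synthetic activations: one T=10 clock embedded in 32 dimensions, plus noise.
rng = np.random.default_rng(0)
a = np.arange(100)
u, v = rng.normal(size=(2, 32))
H = (np.outer(np.cos(2 * np.pi * a / 10), u)
     + np.outer(np.sin(2 * np.pi * a / 10), v)
     + 0.05 * rng.normal(size=(100, 32)))

coords, explained = top2_pca(H)
print(coords.shape, f"PC1+PC2 explain {explained:.1%} of the variance")
```

Joining the rows of `coords` in counting order gives the kind of trace shown in the chart. No helix is assumed anywhere in the computation, so a ring only appears if the data contain one.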

A clean helix, it seems, needs a place-value system in which to live. Pulling the threads together: the fit reads the value, is unmoved by the glyphs, retunes to the base, and needs place value to stay a coil.

Every one of those, though, is a statement about the shape that can be fit to the activations. None of it yet says anything about the provenance of the helix.

That’s for another post.