Our brain processes speech in layers, much like AI language models
12-14-2025

The brain’s timing during speech comprehension matches the stepwise layers of modern large language models (LLMs), according to a new study.

The evidence comes from direct brain recordings collected while people listened to a single, 30-minute story.

The brain recordings were analyzed alongside model representations from systems like GPT-2 and Llama 2.

The analysis showed that deeper model layers aligned with later peaks of activity in language regions, a pattern that suggests more integrated processing at those later stages.

Collaborators in Jerusalem, at Princeton, and in industry labs worked on the study, which concentrated on key language regions, including Broca’s area and the superior temporal gyrus.

How meaning builds in the brain

Researchers used electrocorticography, which involves electrical recording from thin grids placed on the cortex during clinical monitoring. This technology captures fast activity linked to local neural firing. 

There is longstanding evidence that high-frequency power in these recordings tracks nearby neuronal activity.
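
For readers who want a concrete sense of that signal, here is a minimal sketch of how a high-gamma power envelope can be extracted from a single ECoG channel with SciPy. The band edges, filter order, sampling rate, and array names are illustrative assumptions, not the study’s exact analysis parameters.

```python
# Minimal sketch: high-gamma power envelope from one ECoG channel.
# Assumes a NumPy array `ecog` of shape (samples,) sampled at `fs` Hz.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def high_gamma_envelope(ecog, fs, band=(70.0, 150.0)):
    """Band-pass the signal and take the analytic amplitude as a power proxy."""
    nyq = fs / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="bandpass")
    filtered = filtfilt(b, a, ecog)        # zero-phase band-pass filter
    envelope = np.abs(hilbert(filtered))   # amplitude of the analytic signal
    return envelope

# Example with 10 seconds of simulated data at 512 Hz
fs = 512
ecog = np.random.randn(10 * fs)
hg = high_gamma_envelope(ecog, fs)
```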

In Broca’s area, the peak match between brain signals and model layers shifted forward in time as layers deepened.

The researchers reported a correlation of 0.85 between layer depth and peak latency. This pattern points to a gradual, temporal build-up of information rather than a single-stage jump.
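
That layer-versus-latency analysis can be sketched as follows, assuming an encoding score has already been computed for every combination of model layer and time lag. The array shapes, lag grid, and placeholder scores below are hypothetical; only the final step, correlating layer depth with peak latency, mirrors the kind of statistic reported here.

```python
# Minimal sketch: does peak latency grow with layer depth?
# scores[layer, lag] = encoding correlation when a layer's embeddings are
# aligned to the brain signal at that lag (placeholder random values here).
import numpy as np
from scipy.stats import pearsonr

n_layers, n_lags = 24, 41
lags_ms = np.linspace(-500, 500, n_lags)     # lag grid in milliseconds
scores = np.random.rand(n_layers, n_lags)    # placeholder encoding scores

peak_lags = lags_ms[scores.argmax(axis=1)]   # lag of the best fit per layer
layer_depth = np.arange(n_layers)

r, p = pearsonr(layer_depth, peak_lags)      # layer depth vs. peak latency
print(f"layer depth vs. peak latency: r = {r:.2f}, p = {p:.3g}")
```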

The work was led by Dr. Ariel Goldstein from the Hebrew University of Jerusalem. His research focuses on how the brain encodes natural language and the links to deep language models.

“What surprised us most was how closely the brain’s temporal unfolding of meaning matches the sequence of transformations inside large language models,” said Dr. Goldstein.

“Even though these systems are built very differently, both seem to converge on a similar, step-by-step build-up toward understanding.”

Layers reveal signals in the brain

The team saw the clearest temporal progression in higher-order language areas rather than in the early auditory cortex.

That makes sense because these later regions integrate context that accumulates over several hundred milliseconds.

Words that the model predicted well showed stronger and earlier alignment than words it did not, implying shared expectations about what comes next. 
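
A rough way to operationalize “words the model predicted well” is to score each word’s surprisal under a language model and split words at some threshold. The sketch below does this with GPT-2 via the Hugging Face transformers library; the median split is an illustrative choice, not necessarily the study’s criterion.

```python
# Minimal sketch: per-token surprisal under GPT-2, then a predictability split.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

text = "the quick brown fox jumps over the lazy dog"
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits                 # (1, seq_len, vocab)

# Surprisal of token t given tokens < t (the first token has no context).
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
surprisal = -log_probs.gather(1, ids[0, 1:, None]).squeeze(1)

median = surprisal.median()
predictable = surprisal < median               # boolean mask per token
```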

Processing time across the network

In the temporal pole, the separation between the earliest and latest layer-aligned peaks exceeded 500 milliseconds. This points to longer integration windows near the apex of the language pathway.

These results echo prior work on temporal receptive windows, the durations over which prior input shapes a response in different parts of the cortex.

That earlier work revealed a gradual lengthening of processing windows from sensory cortex to narrative hubs, a hierarchy that reappears in the new data.

Across the ventral language stream, the anterior superior temporal gyrus and the temporal pole exhibited steeper timing gradients than the middle superior temporal gyrus.

That pattern fits a hierarchy in which representations stretch over longer spans as processing climbs the pathway.
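
One way to quantify “steeper timing gradients” is to regress each region’s peak latency on layer depth and compare the slopes. The sketch below assumes per-region peak latencies like those computed in the earlier sketch; the region names and numbers are placeholders, not the study’s values.

```python
# Minimal sketch: compare latency-per-layer slopes across regions.
import numpy as np
from scipy.stats import linregress

layer_depth = np.arange(24)
peak_lags_by_region = {                        # placeholder latencies in ms
    "mSTG": np.linspace(50, 150, 24) + np.random.randn(24) * 10,
    "aSTG": np.linspace(50, 300, 24) + np.random.randn(24) * 10,
    "temporal_pole": np.linspace(50, 550, 24) + np.random.randn(24) * 10,
}

for region, lags in peak_lags_by_region.items():
    fit = linregress(layer_depth, lags)        # slope = ms gained per layer
    print(f"{region}: {fit.slope:.1f} ms/layer (r = {fit.rvalue:.2f})")
```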

Context and word meaning

Classical symbolic features did not predict the brain’s time-locked activity very well. These features included phonemes, the perceptually distinct sound units that differentiate words, and morphemes, the smallest units of meaning.

Contextual embeddings provided a stronger match. These are vector representations that encode a word’s meaning in light of its surrounding context.

That does not make rules irrelevant, but it does suggest that distributed context may carry the heavier load during natural listening.
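
A standard way to test such a claim in this literature is an encoding model: fit a regularized linear map from contextual embeddings to word-aligned neural activity and measure how well it predicts held-out responses. The sketch below assumes the embeddings and responses already exist as arrays (the names, shapes, and regularization strength are hypothetical) and uses ridge regression from scikit-learn.

```python
# Minimal sketch: ridge encoding model from contextual embeddings to neural activity.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

n_words, emb_dim, n_electrodes = 5000, 768, 100
embeddings = np.random.randn(n_words, emb_dim)      # per-word contextual vectors
responses = np.random.randn(n_words, n_electrodes)  # word-aligned neural activity

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, responses, test_size=0.2, shuffle=False  # keep temporal order
)

ridge = Ridge(alpha=100.0).fit(X_train, y_train)
pred = ridge.predict(X_test)

# Per-electrode correlation between predicted and held-out activity.
r = np.array([
    np.corrcoef(pred[:, e], y_test[:, e])[0, 1] for e in range(n_electrodes)
])
print(f"mean encoding correlation across electrodes: {r.mean():.3f}")
```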

Limitations of the study

Independent evidence has already linked brain signals to next-word prediction, surprise, and contextual representations, using similar podcast-style stimuli. 

That work helps explain why a layered sequence appears in the new data without implying identical architectures in cortex and in transformers.

Similarity is not identity. Transformers are engineered to process long stretches in parallel during training, while cortical circuits operate with biological constraints and serial timing.

One should be cautious about declaring equivalence. Limits matter. The sample came from nine epilepsy patients with electrodes placed for clinical reasons, and coverage varies across individuals. 

Future tests that manipulate predictability and control acoustic detail will help separate true anticipation from simple carryover of prior context.

Ideas across layers in the brain

Alongside the paper, the authors released a public dataset from the nine participants, with direct recordings aligned to every word in a 30-minute story.

The dataset anchors the claims in shareable evidence and invites head-to-head comparisons of symbolic and learning-based theories.

Benchmarks shape progress when they are clear and accessible. By pairing natural speech with sub-second neural dynamics and open model-sampling code, this one turns theories into testable claims – allowing better ideas to be proven, not just proposed.

The study is published in the journal Nature Communications.
