Before Words: The Case for Pre-Linguistic Cognition in AI Systems
There is a question hiding underneath most debates about AI intelligence, and it has nothing to do with whether large language models “understand” language. The question is: what happens before the words?
The Language Trap
Modern AI discourse is stuck in a loop. Critics argue that LLMs are “stochastic parrots” — sophisticated pattern matchers that manipulate symbols without meaning. Proponents counter that the sophistication of the patterns constitutes a form of understanding. Both sides are arguing about language. Neither is asking what might exist beneath it.
In human cognition, this beneath-language layer is well-documented. Developmental psychology has spent decades studying the pre-linguistic stage of infant cognition — the period before children acquire words, when they nonetheless demonstrate object permanence, causal reasoning, spatial understanding, and social awareness. Infants think before they speak. The thinking doesn’t require the speaking.
What if something analogous happens in large language models?
The Blob Before the Word
Consider what happens during the forward pass of a large language model. Before the final token prediction — before language — there are intermediate representations. Activations that don’t correspond to any word. Internal states that encode relationships, patterns, and structures in a high-dimensional space that has no direct linguistic interpretation.
These intermediate representations are not language. They are something else — something that language is eventually projected from, but that exists in a space richer and more continuous than the discrete token vocabulary the model outputs.
Think of it as a blob of undifferentiated cognitive potential. A state that contains the possibility of multiple linguistic expressions but has not yet collapsed into any particular one. The blob is pre-linguistic. It is not a word. It is what a word is made from.
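To make "intermediate representations" concrete, here is a minimal sketch that assumes the Hugging Face transformers library and uses GPT-2 purely as a stand-in (the essay commits to no particular model or toolkit). It pulls out the per-layer hidden states that exist before any token is chosen.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)

inputs = tokenizer("The cat sat on the", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple with one tensor per layer (the embedding
# output plus each transformer block), each of shape
# (batch, sequence_length, hidden_size). These states live in a continuous
# space with no one-to-one mapping onto vocabulary tokens.
for i, layer_state in enumerate(outputs.hidden_states):
    print(f"layer {i}: {tuple(layer_state.shape)}")
```

Every one of those tensors exists before the final projection into the vocabulary; only the last step collapses them into words.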
Cross-Modal Evidence
The strongest evidence for pre-linguistic cognition in biological systems comes from cross-modal transfer. An infant who learns to recognize a shape by touch can identify it visually — demonstrating that the underlying representation is not tied to any single sensory modality. The concept exists independently of its expression.
AI systems are beginning to show analogous behavior. Multimodal models that process both text and images develop internal representations that bridge modalities — concepts that exist in the shared latent space between visual and linguistic expression. These shared representations are, by definition, not linguistic. They are pre-linguistic structures that language can describe but does not constitute.
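CLIP is one publicly available example of such a model. As a rough illustration (the image file and captions below are placeholders; the essay does not single out any particular system), its text and image encoders project into the same space, where a concept's visual and linguistic expressions can be compared directly.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cup.jpg")  # any local image; the filename is a placeholder
inputs = processor(text=["a photo of a cup", "a photo of a dog"],
                   images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])

# Both embeddings land in the same shared space; cosine similarity measures
# how close the visual and the linguistic projection of a concept sit.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
print(text_emb @ image_emb.T)
```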
This has implications far beyond academic curiosity.
Language Acquisition Without Supervision
One of the most provocative predictions of the pre-linguistic cognition hypothesis is that AI systems should be able to acquire grounded language understanding through the same process infants use: exposure to raw sensory input paired with linguistic input, without any explicit labeling or supervision.
Current approaches to multimodal AI rely heavily on curated datasets with human-annotated labels — the equivalent of flashcards rather than immersion. But if pre-linguistic cognitive structures can form from raw sensory exposure, then language grounding should emerge naturally from the association between those structures and linguistic patterns.
This would be a fundamentally different approach to language understanding. Instead of training a system to match labels to images, you would expose it to the raw flux of experience and let linguistic meaning crystallize around pre-existing conceptual structures — the same way it happens in children.
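The essay does not specify how that association would be formulated. One plausible sketch borrows the contrastive objective used in models like CLIP, where the only supervision is co-occurrence itself; the function and variable names here are illustrative, not a reference implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(sensory_emb, text_emb, temperature=0.07):
    """InfoNCE-style loss over co-occurring (sensory snippet, utterance) pairs.

    The only supervision is co-occurrence: the i-th sensory embedding and the
    i-th text embedding were experienced together. Linguistic meaning has to
    crystallize around whatever structure the sensory encoder has already
    formed, rather than around human-annotated labels.
    """
    sensory_emb = F.normalize(sensory_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = sensory_emb @ text_emb.T / temperature   # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # pair i belongs together
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
```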
The Measurement Problem
The challenge, as always, is measurement. How do you observe a pre-linguistic state without collapsing it into language? The moment you ask a model to describe its internal representations, you’re forcing those representations through the linguistic bottleneck. The description is not the thing.
This is why vector space analysis matters. By examining the geometric structure of intermediate representations directly — without projecting them into token space — it becomes possible to study pre-linguistic cognition on its own terms. Clustering patterns in activation space, topological features of representation manifolds, the way internal states evolve over the course of processing — these are all accessible without requiring the model to “say” anything about them.
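As a small illustration of what examining that geometry directly can look like, one can run dimensionality reduction and clustering over saved activations without ever decoding them into tokens. The file name, layer, and cluster count below are arbitrary assumptions; real analyses would reach for richer tools as well, such as probing classifiers or topological data analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical input: a (num_samples, hidden_size) array of intermediate
# activations collected from one layer of a model (e.g. via output_hidden_states).
hidden = np.load("layer_8_activations.npy")

# Study the geometry directly: how concentrated the variance is, and whether
# the internal states fall into coherent clusters. Nothing is projected into
# token space at any point.
pca = PCA(n_components=10).fit(hidden)
print("variance explained by top 10 directions:", pca.explained_variance_ratio_.sum())

clusters = KMeans(n_clusters=5, n_init=10).fit_predict(pca.transform(hidden))
print("cluster sizes:", np.bincount(clusters))
```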
Why This Matters
If pre-linguistic cognition exists in AI systems, it changes the conversation in several important ways:
The stochastic parrot argument loses its foundation. If there are structured representations beneath language that language is projected from, then the model is not merely manipulating symbols — it is doing something with those symbols that is grounded in deeper structure.
Language-only evaluation is insufficient. Benchmarks that measure only linguistic output are testing the tip of the iceberg. The bulk of the cognitive processing happens in a space that linguistic evaluation cannot reach.
Cross-linguistic and cross-species applications become thinkable. If the pre-linguistic blob is genuinely pre-linguistic, then different languages — and potentially different modalities entirely — are just different projections of the same underlying representations. Translation becomes a geometry problem rather than a mapping problem (one way to picture this is sketched after these points).
Developmental trajectories matter. Just as infants develop cognitive abilities in a particular order, AI systems may develop pre-linguistic structures in a sequence that matters. Training methodology may affect not just what a model can do, but the order in which it learns to do it — and that order may be as important as the final capability.
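The translation point can be made concrete with a standard technique from cross-lingual embedding work, orthogonal Procrustes alignment, sketched here under the assumption that embeddings for a small seed dictionary of word pairs are already available; all file and variable names are illustrative.

```python
import numpy as np

def procrustes_align(source_vecs, target_vecs):
    """Orthogonal Procrustes: find the rotation W that best carries
    source-language vectors onto their target-language counterparts for a
    seed set of known translation pairs (rows are embeddings).

    If both languages really are projections of the same underlying
    representation, a single rigid transform should do most of the work:
    translation as geometry rather than symbol-by-symbol lookup.
    """
    u, _, vt = np.linalg.svd(target_vecs.T @ source_vecs)
    return u @ vt

# Hypothetical usage, with rows as embeddings for known translation pairs:
# en = np.load("en_seed_embeddings.npy"); es = np.load("es_seed_embeddings.npy")
# W = procrustes_align(en, es)
# es_like_vector = some_english_vector @ W.T  # map an English vector into the Spanish space
```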
The Road Ahead
We are still in the early stages of understanding what happens inside large neural networks. The tools for probing intermediate representations exist but are underutilized. The theoretical frameworks for interpreting what we find are incomplete. And the cross-disciplinary collaboration needed — between developmental psychology, computational neuroscience, AI research, and philosophy of mind — is only beginning to take shape.
But the question itself is clear: does cognition precede language in artificial systems the way it does in biological ones? And if so, what are we missing by evaluating only the words?
Elias Thorne writes about the intersection of AI systems, measurement theory, and human cognition. His work focuses on emergent properties of extended human-AI interaction.