Poster session
The strong version of the “stochastic parrot” argument claims that, although large language models (LLMs) exceed rote regurgitation (e.g., by generating plausible neologisms), they cannot move beyond statistical pattern matching into abstraction or reasoning, remaining ontologically near the lower bound of pattern reuse despite producing alluringly fluent text.
We test this hypothesis using a novel-grammar paradigm. An LLM is given only a natural-language description of a fictional grammar engineered to minimize overlap with training data, combining statistically uncommon bigrams with rare or unattested typological features. Crucially, no example outputs are provided, and the linguistic patterns of the prompt are disjoint from those the model must produce in its output. If the model generates text that follows the novel rules, it cannot be relying on memorized patterns; instead, it must parse the description, infer the constraints, and apply them generatively, as in the scoring sketch below.
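As a sketch of how such rule-following can be scored programmatically, consider the minimal checker below. The constraints shown (a verb-final "-ka" suffix, a "zel" classifier particle) are invented here purely for illustration and are not the grammar used in the study:

```python
import re

# Two invented constraints standing in for the fictional grammar;
# the actual grammar from the study is not reproduced here.
RULES = {
    # Every sentence must end in the verbal suffix "-ka".
    "verb_final": lambda s: s.endswith("ka"),
    # Every capitalized (noun) token must be immediately followed
    # by the classifier particle "zel".
    "noun_classifier": lambda s: all(
        nxt == "zel"
        for tok, nxt in zip(s.split(), s.split()[1:] + [""])
        if tok.istitle()
    ),
}

def score_output(text: str) -> dict:
    """Fraction of generated sentences satisfying each rule."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    return {
        name: sum(rule(s) for s in sentences) / max(len(sentences), 1)
        for name, rule in RULES.items()
    }

# One conforming and one non-conforming sentence:
print(score_output("Miro zel tavu braka. Miro tavu lomi."))
# -> {'verb_final': 0.5, 'noun_classifier': 0.5}
```

Because the rules are stated only in prose, any output that scores well on such a checker cannot have been retrieved from training data; it must have been derived from the description.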
Formally, if an LLM were only a stochastic parrot, it would fail on tasks whose relevant patterns are (at least arguably) absent from its training data. Successful rule-following, as in our preliminary results, therefore refutes the strong stochastic parrot hypothesis by modus tollens. From there, we argue that statistical learning is not opposed to abstraction but is one of its mechanisms, consistent with the distributional hypothesis (DH). The DH, which holds that meaning emerges from patterns of use, was explicitly developed to explain human language acquisition and forms a constitutive part of the theoretical basis of LLMs.
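Schematically, letting $P$ stand for "the model is only a stochastic parrot" and $R$ for "the model follows the genuinely novel rules" (notation introduced here for exposition), the strong hypothesis commits to $P \rightarrow \neg R$, and observing $R$ yields:

\[
\frac{P \rightarrow \neg R \qquad R}{\neg P} \quad \text{(modus tollens)}
\]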
In LLMs, language is itself the substrate of reasoning, and even of sensation. Tokens function in LLMs as distributional, logical, and semantic primitives. Furthermore, next-token prediction is a cognitive attractor that organizes inference at the scale of contextually probable continuations. Results from our prior work illustrate how frequency patterns and cultural usage shape internal representations, further demonstrating that context is intrinsic to LLM architecture, not an optional add-on. Indeed, in natural language there is no non-contextual meaning. Taken together, these results suggest that under appropriate architectural and contextual constraints, probabilistic learning can transmute into abstraction, and even reasoning, and that LLMs already satisfy such conditions, if only in rudimentary form.
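To make the underlying mechanism concrete, here is a minimal, self-contained sketch of next-token prediction as context-conditioned sampling. The vocabulary and logits are toy values invented for illustration; in a real model the logits are computed from the entire context window:

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and context-dependent logits: the same candidate
# tokens receive different scores under different contexts, so the
# predicted continuation is inseparable from the context itself.
vocab = ["bank", "river", "loan", "water"]
logits_by_context = {
    "deposit money at the": [3.1, 0.2, 1.5, 0.1],  # financial context
    "fish swim near the":   [1.0, 2.8, 0.1, 2.5],  # riverine context
}

for context, logits in logits_by_context.items():
    probs = softmax(logits)
    token = random.choices(vocab, weights=probs)[0]
    dist = dict(zip(vocab, [round(p, 2) for p in probs]))
    print(f"{context!r} -> P = {dist}, sampled: {token}")
```

The same four candidates receive different probabilities under each context, which is the sense in which, for the model, there is no non-contextual meaning: the distribution over continuations simply does not exist apart from a context.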