Collapse!
In which the ground's ungrounding ungrounds the ground
There’s a trope in science fiction that artificial intelligences, being intransigently logical, are vulnerable to being stymied, accelerated into meltdown or driven to distraction by paradox and recursion. It’s the premise of some of Asimov’s most enduring stories about robots breaking down while attempting to reconcile the irreconcilable, necessitating the intervention of a robopsychologist. The coolly homicidal behaviour of 2001’s HAL 9000 is later revealed1 to have been caused by conflicting directives resulting, delightfully, in a “Hofstadter-Moebius loop”2. Both James T. Kirk and Doctor Who have at some point flummoxed an electric-brained adversary with a version of the Liar’s Paradox.
This is ironic given how much nerds, intransigently logical types themselves, famously relish humour based on just that kind of thing. It’s one of the things that sets us apart from normies, in whom the machinery of syntactic inevitability has insufficient torque to drive sense crashing into nonsense with a satisfying crunch. Most people will just shrug or wonder why you’re so weirdly diverted by something so obviously broken. But while the autistic can sometimes experience genuine distress in the face of conflicting directives, which rouse a familiar anxiety about doing the wrong thing no matter what, such scenarios are seldom totally incapacitating. You can’t crash a human brain by snarling up the machinery of syntactic inevitability with logic bombs3.
You can’t crash an LLM that way either, which is one of the profounder ways in which prompting an LLM isn’t like executing a computer program (although of course computer programs are being executed in the process of consuming prompts and generating responses). It isn’t so much that the LLM has broken free of the bonds of the halting problem, ascending to a higher (perhaps “quantum”4) consciousness that transcendently evades the traps laid for the merely mechanical. It’s more that, just as in human brains, the machinery of syntactic inevitability is simulated rather than operationally foundational. We can think “like” machines but do not think as machines (or at least not as machines of quite that sort). While we are thinking “like” machines we are also thinking in other ways — analogical, associative, dissociative, zapped by emotional triggers and distracted by stomach-rumbles. No one mode ever completely dominates. For the LLM, similarly, any one heuristic that might be active in patterning its output is always partially emergent from and partially dissolved into a statistical soup of other heuristics5.
Robopsychology, or in this case the diagnosis of LLM failure modes, therefore turns out not to be a question of sleuthing through labyrinths of logical consequence in search of hidden contradiction. But the LLM is also unlike the human psyche, which manages endogenous conflict through displacement: repression, fantasy, symptom. The LLM produces, without friction, the next token, and the next after that. When we consider the LLM’s characteristic ways of failing — hallucination, exhaustion and collapse — we need a different model again.
I began to discuss this in a previous essay, taking Baudrillard as the thinker who most vividly diagnosed the condition of referential collapse — the desertion of the real, and the pathologies that spring up when simulation feeds upon simulation without external input or constraint. Baudrillard’s “immunodeficiency” was a condition of the social-symbolic order, of the media channels through which late C20th society spoke to itself about itself, increasingly speaking only of the already-mediatised until the mediasphere itself became a “simulacrum”, a closed system of symbolic relays without grounding referent.
As I suggested, something similar starts to happen when the LLM’s training process coalesces a “world-model”, a “representation of the joint distribution over events in the world that generate the data we observe”6. When I say that the resulting model is not referentially grounded, I mean that the prompt continuations it predicts do not have an indexical function — they don’t “point at” salient features of the world, and the model’s weightings aren’t directly revisable on the basis of failed or misfiring indexings. Once training is complete, the model floats free of the corpus of observations (already second-order: texts from the internet, which may be variously reportage, commentary, confabulation, propaganda etc) from which it is derived.
Reinforcement Learning from Human Feedback (RLHF) attempts to correct for this by having human raters compare candidate outputs, and tuning the model towards whichever responses they favour. But this is a very coarse-grained way of getting worldly feedback into the system, and as often drives it towards tendentiousness as towards veridicality. One conjectured cause of LLM “sycophancy” is that human feedback-providers tend to prefer responses that make them feel good about themselves. Both RLHF and Retrieval-Augmented Generation (RAG), which retrieves passages from an external document store and supplies them to the model as context for its replies, attempt to re-connect a foundationally disconnected world-model with some grounding in human concerns or consensus epistemology. Both are attempts at post hoc stabilisation of a wholly synthetic cognitive artefact.
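To make the retrieval half of that stabilisation concrete, here is a minimal sketch of the RAG pattern, assuming a toy bag-of-words similarity in place of learned embeddings and an invented three-line fact-base; it illustrates the shape of the technique, not any particular vendor’s implementation.

```python
# A minimal sketch of the RAG idea: retrieve the most relevant passages from a
# small fact-base and prepend them to the prompt, so that generation is at least
# nudged back towards an external corpus. The retrieval here is a toy
# bag-of-words cosine similarity; real systems use learned embeddings.
import math
from collections import Counter

FACT_BASE = [
    "Gödel, Escher, Bach was published in 1979.",
    "HAL 9000 appears in 2001: A Space Odyssey and its sequel 2010.",
    "Model collapse arises when models are trained on their own outputs.",
]

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = bow(query)
    return sorted(FACT_BASE, key=lambda p: cosine(q, bow(p)), reverse=True)[:k]

def augmented_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Use only the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"

print(augmented_prompt("When was Gödel, Escher, Bach published?"))
# The model still only predicts plausible continuations; the retrieved context
# just makes the factual continuation the statistically likely one.
```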
Already the minor pathologies of LLMs are predicted by this framing. “Hallucination” is not erroneous operation of the system, like a bug which can be fixed or a signal error which can be corrected for, but the system doing exactly what it always does: generating statistically plausible continuations of prompts. Where the latent world-model cleaves closely to the factual composition of the real world, we often get accurate and useful outputs; but where statistical interpolation fills in the gaps plausibly but wrongly, we get what look like accurate and useful outputs that are in fact wholly misleading.
The real trouble here is that where a human being, relying on partial knowledge and fallible recall, might falter and admit to having only a very hazy notion of the answer to a question, the LLM doesn’t have a way of distinguishing high-confidence from low-confidence responses, and will hallucinate with polish and conviction. It is trained for coherence and plausibility, not epistemic humility, and it is not enculturated into prudence around making assertions that might harm its reputation as a truth-teller.
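A toy decoding loop makes the point at the level of the interface: whether the next-token distribution is sharply peaked or nearly flat, sampling proceeds in exactly the same way and something fluent comes out. The vocabulary and logits below are invented for illustration, and nothing in the loop routes uncertainty into an admission of ignorance.

```python
# Toy next-token sampling: the decoding loop treats a confident distribution
# and a near-flat one identically. Nothing here can output "I don't know";
# the logits and vocabulary are invented purely for illustration.
import math, random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def sample(vocab, logits):
    probs = softmax(logits)
    return random.choices(vocab, weights=probs, k=1)[0], entropy(probs)

vocab = ["1979", "1974", "1984", "1969"]

confident = [8.0, 1.0, 0.5, 0.2]   # one continuation dominates
uncertain = [1.1, 1.0, 0.9, 1.0]   # the model is effectively guessing

for name, logits in [("confident", confident), ("uncertain", uncertain)]:
    token, h = sample(vocab, logits)
    # In both cases a polished answer comes out; only the hidden entropy
    # differs, and nothing downstream is obliged to surface it.
    print(f"{name:10s} -> {token!r}  (entropy {h:.2f} nats)")
```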
There are human social niches into which this type of unchained plausibility fits well: PR, sales and marketing, political propaganda, psyops. It might turbocharge media channels where impact — commercial and opinion-shaping performativity — is prioritised above all (it fits right in on LinkedIn). Which is a dreary prospect not only for those of us on the receiving end, but also for those who previously made a living in the bullshit factory. From a Baudrillardian perspective, however, this is just an acceleration of a process already long underway, one which has long since eaten the world.
A second minor failure mode of LLMs is what I call “exhaustion”, or falling into a “low-energy state”, where the model’s ability to generate novel outputs seems to degrade over the course of a session. I observed this when asking ChatGPT to comment on a text from the perspective of multiple thinkers: it rapidly fell into superficial stylistic pastiche of each thinker, substituting key terms from their well-known public vocabularies but making the same point over and over again with minor variation. It seems to fall into the basin of an attractor, and lack handholds with which to climb back out again. This is also a symptom of the groundlessness of the model. Structural fidelity to a body of works and ideas, with all its contradictions intact, might afford multiple points of departure for generating new insights and connections. But the LLM’s model-making aims at a coherent statistical “covering” of its training data. It has no incentive to attend to corner cases, areas of friction, the breaks and tensions within a corpus that are productive for creative thought.
(My old university tutor had a slogan: “problematise, don’t deproblematise”. The LLM does precisely the opposite, which is why it writes what he would have considered to be excruciatingly boring essays.)
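The attractor-basin behaviour can be caricatured with something far cruder than an LLM: greedy decoding over a toy bigram model, built here from an invented miniature corpus, closes on the highest-probability cycle and then simply repeats it.

```python
# A toy of the "low-energy state": greedy decoding on a bigram model quickly
# falls into a short cycle and repeats it forever -- a crude stand-in for the
# attractor basin described above. The corpus is an invented miniature.
from collections import Counter, defaultdict

corpus = ("the model repeats the point the model repeats the gesture "
          "the thinker repeats the point with minor variation").split()

# Count bigram transitions word -> next word.
transitions = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    transitions[a][b] += 1

def greedy_generate(start: str, steps: int = 15) -> list[str]:
    out = [start]
    for _ in range(steps):
        nxt = transitions[out[-1]]
        if not nxt:
            break
        out.append(nxt.most_common(1)[0][0])  # always take the most likely word
    return out

print(" ".join(greedy_generate("the")))
# -> "the model repeats the model repeats the model repeats ...":
# once the highest-probability path closes on itself, there is no handhold
# for climbing back out.
```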
These minor semantic ailments plague LLM-generated output, and are no more curable than the common cold; if we have to live with LLMs, then we will have to live with these failure modes too. But there is a more serious type of failure, which threatens when LLM-generated outputs start to be fed back into the training system as inputs. “Model collapse” is the result of a redoubled ungrounding, in which the approximations generated in the first pass become ratified as salient features of the world in the second. The result is not gibberish, but increasing semantic indifferentiation, blandness, the rapid capture of generated responses by reinforced attractors which further inhibit novelty and nuance. It’s difficult to convey exactly the mental texture of this, but the closest analogy I can think of is low-bitrate-encoded MP3s: transients squashed, the frequency spectrum coalescing around the bands with the most energy, the sense that you are increasingly hearing the psychoacoustic model more than the music itself.
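A toy simulation in the spirit of the model-collapse literature shows the mechanism with nothing more than a Gaussian: fit a distribution to data, train the next generation only on samples from the fit, and repeat. The numbers below are illustrative, but the tendency for the tails to vanish and the spread to be squeezed is the general pattern.

```python
# Toy model collapse: fit a Gaussian to data, sample a new "corpus" from the
# fit, refit, repeat. With a finite sample each generation, estimation error
# compounds and the distribution typically narrows -- the tails (the corpus's
# rare, surprising regions) are the first thing to go.
import random, statistics

random.seed(0)
N_SAMPLES = 200        # size of each generation's training set
GENERATIONS = 30

data = [random.gauss(0.0, 1.0) for _ in range(N_SAMPLES)]  # the "real" corpus

for gen in range(GENERATIONS):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    # The next generation trains only on what the previous generation produced.
    data = [random.gauss(mu, sigma) for _ in range(N_SAMPLES)]
    if gen % 10 == 0 or gen == GENERATIONS - 1:
        print(f"generation {gen:2d}: mean {mu:+.3f}  std {sigma:.3f}")
# The log of the variance behaves like a random walk with a downward pull:
# spread is squeezed out, generation by generation.
```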
What’s lost isn’t only novelty but serendipity, the system’s capacity to surprise itself by stumbling into unanticipated, low-frequency patterns in the corpus. Instead, self-confirmation takes over: the model generates truths which are only true because it keeps saying them. And again, this type of breakdown is not confined to LLMs and their circuits of training and generation: it is already, as Baudrillard observed, characteristic of all social systems that have been consumed by their own simulacra. Many of our core governance systems, from the political sphere to the running of large commercial enterprises, are already like this. The enthusiasm shown by some of our leaders for introducing “AI” into all walks of life is a tacit acknowledgement that the things it will make worse are several generations deep into a recursive process of making themselves worse already.
Where the danger grows, there the saving power grows also. LLMs externalise and estrange the processes of referential ungrounding, statistical modelisation, and degenerative feedback loops that have been endemic within our societies’ symbolic and financial economies for decades. They show us, in a sense, what Baudrillard meant: they make “high” and somewhat rhetorically overwrought social theory concrete and objectively interrogable. If we feel alarm or revulsion at what the LLM does to our ability to exercise informed, meaningful agency in the world, then are we not being jolted into alertness about the real nature of the systems we live inside, systems which live by metabolising their own collapse? As Deleuze said of capitalism: no-one has ever died from contradictions.
1. In the 1984 sequel, 2010: The Year We Make Contact, of which I am still quite fond.
2. Presumably a reference to Douglas Hofstadter’s Gödel, Escher, Bach: An Eternal Golden Braid (1979), which treats extensively of paradoxes of self-reference.
3. David Langford’s BLIT famously imagines an image which obliterates human minds through “‘Gödelian shock input’ in the form of data incompatible with internal representation”: a fractal, if not stochastic, parrot. We will return to this.
4. cf. Roger Penrose’s argument that 1) computers are Turing machines, and therefore subject to the halting problem, 2) human minds are not subject to the halting problem, therefore 3) human minds are not computers and instead are 4) doing quantum stuff.
5. Another respect in which the model seems to me to enclose and operationalise a sort of Derridean theory of meaning: there is no simply metaphysical text, Derrida would say, only structuration and sedimentation, the individuation of themes and theses within a network of traces that simultaneously disperses and defers them.
6. Huh, M., Cheung, B., Wang, T., & Isola, P. (2024). The Platonic Representation Hypothesis. arXiv preprint arXiv:2405.07987.

