Generative AI systems have made leaps of progress in recent years. They can pen poetry, create complex computer code, produce photorealistic images, and even generate full-motion video from text prompts. The results are astonishing.
But will these systems ever fully replicate and replace human beings?
To answer that question, we need to examine AI along three distinct axes: functional capability, economic replacement, and spiritual value. When we do, a clear picture emerges: generative AI is powerful but fundamentally limited. I have published videos on each of these axes (here, here, and here), but in this post we’ll look at why AI won’t replicate and replace humans functionally.
From Claude Shannon to Claude.AI
The roots of modern large language models stretch back to 1948, when Claude Shannon, the father of information theory, conducted a simple but far-reaching experiment. He attempted to build a text prediction engine that would generate English by predicting the next word based on the previous one.
Using what’s called a second-order word approximation — a bigram model — Shannon generated the following text:
THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.
It wasn’t meaningful, but there were pockets of local coherence. “FRONTAL ATTACK ON AN ENGLISH WRITER” sounds a lot like a news headline. The system predicted likely next words based on statistical frequency alone, using one of the simplest language models imaginable, yet was capable of producing readable, semi-grammatical text.
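Shannon's procedure is easy to reproduce. Here is a minimal Python sketch of a bigram sampler; the toy corpus and sampling details are my own illustration, not Shannon's original setup:

```python
import random
from collections import defaultdict

def build_bigram_model(text):
    """Record, for each word, every word observed to follow it."""
    words = text.upper().split()
    follows = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        follows[prev].append(nxt)
    return follows

def generate(model, start, length=15, seed=0):
    """Sample each next word in proportion to its observed frequency."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        options = model.get(out[-1])
        if not options:
            break  # dead end: this word never appeared mid-corpus
        out.append(rng.choice(options))
    return " ".join(out)

corpus = "the head and in frontal attack on an english writer that the character of this point"
model = build_bigram_model(corpus)
print(generate(model, "THE"))
```

Feed it a larger corpus and the output starts to resemble Shannon's semi-grammatical strings: locally plausible, globally meaningless.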
But what if we scaled this model up? Instead of conditioning on the previous word, what if we conditioned on the previous 3,000, 30,000, or 300,000 words? We would end up with something closer to Claude or ChatGPT.
These models produce coherent, readable text. For example, consider the following from OpenAI’s GPT-5:
Beneath the spires where scholars tread,
The Cam flows soft by willows spread.
Old stone and ivy guard the past,
Where thoughts like whispered breezes last.
Bicycles hum on cobbled ways,
As minds alight with Newton’s gaze.
Twilight wraps the chapel’s grace,
In dreaming towers, time leaves its trace.
It’s not Shakespeare, but it is poetry.
Modern generative AI systems can do much more than write prose and poetry. They can create apps, organize trips, generate lesson plans, prove number theory conjectures, do what appears to be rational inference, produce images and videos, make music, and have conversations. They summarize research papers, materialize podcasts, and provide mis- and disinformation. Generative AI systems are something like a digital wizard’s spellbook, waiting for the right incantation (prompt) to bring to life what formerly took days or months of human effort.
What endows these powers?
How Large Language Models Actually Work
Two core ideas underpin generative Large Language Models (LLMs): attention and embedding.
Text embedding works by analyzing statistical relationships among words. Words that appear in similar contexts are represented as vectors located near each other in a high-dimensional mathematical space. See, for instance, the image here, relating “animals” to “food,” among other things:
- “Fish,” “bear,” and “wolf” cluster together as animals.
- “Salad,” “stew,” and “fish” cluster together as food.
Interestingly, “fish” occupies both semantic neighborhoods, so its embedding vector sits in the overlap between the two clusters.
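The geometry is easy to make concrete. In the Python sketch below, the three-dimensional vectors are invented for illustration (real embeddings have hundreds or thousands of learned dimensions), but the measurement, cosine similarity, is the standard one:

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy 3-d "embeddings"; dimensions loosely read as (animal-ness, food-ness, other).
vectors = {
    "wolf":  [0.9, 0.1, 0.2],
    "bear":  [0.8, 0.2, 0.1],
    "salad": [0.1, 0.9, 0.2],
    "stew":  [0.2, 0.8, 0.1],
    "fish":  [0.7, 0.7, 0.1],  # sits in both neighborhoods
}

print(cosine(vectors["fish"], vectors["wolf"]))   # high: fish is animal-like
print(cosine(vectors["fish"], vectors["salad"]))  # high: fish is food-like
print(cosine(vectors["wolf"], vectors["salad"]))  # low: wolves are not food-like
```

"Fish" scores high against both clusters while "wolf" and "salad" score low against each other, which is exactly the two-neighborhood picture described above.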
Large language models and diffusion models are deep neural networks that convert words and pixels into numbers and vectors. They convert meaning into a problem of mathematical geometry. They encode relationships among symbols.
But here is the crucial point: they know what is statistically similar to what, but they don’t know what anything actually is. They are trapped behind a wall of syntax and numbers.
The Illusion of Reasoning
LLMs appear to reason. But do they?
In 2022, a paper from UCLA’s StarAI Lab showed that models performing near-perfectly on reasoning tasks would collapse when the problems were chosen in slightly different ways (namely, when the problem distribution changed). If the system actually knew formal logic, choosing problems in different ways wouldn’t matter: one would apply the same set of logical rules to each problem. It wouldn’t matter if you selected the problem from a textbook or got it from a friend. But if the systems were doing what appeared to be logical inference based only on surface-level statistical pattern matching, the difference in how problems were chosen would matter immensely. And it did.
In 2024, Mirzadeh et al. further demonstrated that adding irrelevant information to math problems could reduce performance of state-of-the-art models by up to 65 percent.
Consider this example from their paper:
Oliver picks 44 kiwis on Friday, 58 on Saturday, and double Friday’s number on Sunday. Five of them were a bit smaller than average. How many did he pick?
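Stripped of the distractor, the arithmetic is trivial:

```python
friday = 44
saturday = 58
sunday = 2 * friday  # double Friday's number

total = friday + saturday + sunday  # the size remark changes nothing
derailed = total - 5                # what pattern-matching models produced

print(total)     # 190
print(derailed)  # 185
```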
The irrelevant detail about size led models to subtract five, producing 185 instead of the correct answer: 190. Why would an irrelevant detail derail the mathematical reasoning system?
The answer is that these systems were not reasoning. They were performing statistical pattern matching. If actual patterns similar to the irrelevant information were contained in word problems among their training data, the systems would be conditioned to treat the irrelevant information as relevant. The systems lack understanding, so they were misled by the patterns.
Some might argue that humans can also be misled by statistical patterns. This is true. When we allow ourselves to be misled by spurious statistical regularities, we are no longer performing valid rational inference. We are doing something else entirely.
That is my point.
Model Collapse: When AI Trains on Its Own Outputs
AI systems may not really perform rational inference, but do they produce information of similar quality to humans?
In 2023, Shumailov et al. showed that models trained on their own outputs begin to degenerate. Repeating this process leads to the phenomenon of model collapse. By the eighth or ninth generation, the systems produce incoherent nonsense.
Thus, there is a fundamental asymmetry between the information humans produce and the information generated by LLMs. If I write text and feed it to a model, the model will get better. If the model trains on its own output, it will get worse.
Why does the model collapse? Because finite data cannot fully represent the richness of a distribution. Finite, approximate models never fully represent the living agents they’re meant to model. At best they remain simplified caricatures.
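A caricature of this dynamic in Python (my own toy construction, not the experiment from Shumailov et al.): each generation is a finite resample of the previous one, so rare "tail" tokens can disappear but can never reappear.

```python
import random

def next_generation(tokens, rng):
    """Train-on-own-output, caricatured: the next generation can only
    reproduce tokens that survived into the previous finite sample."""
    return [rng.choice(tokens) for _ in tokens]

rng = random.Random(42)
# A long-tailed "vocabulary": a few common words plus many rare singletons.
data = ["the"] * 40 + ["of"] * 20 + [f"rare{i}" for i in range(40)]

sizes = [len(set(data))]
for generation in range(9):
    data = next_generation(data, rng)
    sizes.append(len(set(data)))

print(sizes)  # the distinct-token count never increases
```

Run it and the vocabulary shrinks generation by generation: the common words crowd out the rare ones, and the distribution collapses toward its mode.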
A principle from information theory called the Data Processing Inequality (DPI) helps us to understand why. It can be summarized as follows:
Clever processing cannot increase the information content of a signal beyond what is already contained in the signal.
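Formally, if Z is computed from Y, which was in turn derived from X (a Markov chain X → Y → Z), then the mutual information satisfies:

```latex
I(X; Z) \le I(X; Y)
```

No transformation of Y, however clever, can recover information about X that Y has already lost.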
No matter how much training data we collect, the data will forever be a limited snapshot of what it means to be human. By the DPI, no amount of processing by a large language model can add information the data does not already contain; we cannot produce a model of humanity that exceeds our finite data. Yet that data will always be lacking.
Imagine collecting every text and artifact from the year 1725. That would provide a snapshot of humanity, but an incomplete one. Doing the same in 2026 would still yield an incomplete picture. No matter how many books, blog posts, or images we gather, we can never fully capture in a finite dataset what it means to be human.
Since AI systems are trained only on finite snapshots, they will forever produce incomplete replicas.
Syntax Is Not Semantics
Finite data and imperfect approximation limit AI systems, but a more fundamental limitation haunts them: syntax alone does not give you semantics. The divide between the aether of formal, symbolic, surface-level processing (syntax) and the bedrock of grounded truth and meaning (semantics) is absolute and unshakeable.
This divide is not new. In 1929, René Magritte painted The Treachery of Images — a picture of a pipe labeled (in translation) “This is not a pipe.” The representation is not the thing itself.
Formal logic systems were designed to mechanize reasoning and remove all elements of human subjectivity. Doing so allowed machines to process logical proofs, but turned logical systems into formal games, where the rules of the game (formal syntax) became divorced from the truth and meaning behind the symbols (semantics).
In the 1930s, Kurt Gödel showed that semantics was bigger than syntax. His famous incompleteness theorem proved that any consistent formal system capable of expressing arithmetic contains true mathematical statements that cannot be proven within that system.
Gödel did this by constructing a clever self-referential statement, similar to “This statement is unprovable.” He encoded it as a numerical formula within the system.
If the statement were provable, the system would prove a falsehood. So it must be unprovable, and therefore true: a true mathematical statement that cannot be proven within the system.
Therefore, the world of truth is strictly bigger than the world of symbols. Pure syntax cannot fully encompass semantics.
AI systems are formal symbol processors, trapped behind the wall of syntax. Philosophers like John Searle, through his Chinese Room argument, and computational linguists like Emily Bender have argued that symbol manipulation alone does not produce understanding. Nor can it.
Large language models rearrange tokens with fluency. But symbolic fluency is not comprehension. Formal games remain games no matter how well they mimic reality.
Of Models and Minds
Generative AI can simulate an impressive array of human capabilities. It can approximate language, spoof reasoning, and imitate creativity.
But it does not, and cannot, fully replicate humans.
The map is not the territory. The symbol is not the thing. And the model is not the mind.
