A debate I’ve been having far too often is whether Large Language Models (LLMs) are actually intelligent. It reminds me a bit of the Fermi paradox, the well-known thought experiment about intelligent life. The question it poses: in a universe so vast, with trillions of galaxies each containing billions or trillions of stars, the probability of other intelligent life should be extremely high… so, where is it?
I’ve often joked that artificial intelligence has a similar paradox. With so many supposedly intelligent models operating on incalculable amounts of information, why have they accomplished so little? Sure, they can churn out ok-ish code, write mediocre articles, and search the internet, but where is their real-world impact?
I think a lot of the reason I keep going around in circles on this topic is confusion stemming from the fact that intelligence is hard to define, and even harder to quantify.
Marketing, Intelligence, and Marketing Intelligence
When OpenAI released the first versions of ChatGPT, they often played up their models’ ability to perform well on human-focused tests like the SAT, LeetCode puzzles, and the bar exam. At a glance, if we’re comparing the intelligence of AIs to that of humans, it’d make sense to use human tests… or would it?
Quite early on, a lot of us dismissed these test scores as a cheap marketing gimmick. Since LLMs already contain the answers to the exam questions, it should be expected that they’d perform well on the exam. They don’t need to “learn” anything; they already know.
Most exams can be passed in one of two ways: by learning the material, which requires intelligence, or by simply memorizing the answers, which does not. LLMs do the latter.
If I were to write a Python script with all the answers to an exam hardcoded, and that script then aced the exam, not a single person on earth would be impressed. What makes LLMs seem impressive is the complexity of how they work. They aren’t simple scripts with hardcoded answers, but far more convoluted and opaque systems which few people understand. This is what gives them their air of mystery and their illusion of intelligence.
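To make that concrete, here’s roughly what such a script might look like; the questions, answers, and scoring below are made up purely for illustration:

```python
# A toy "exam solver" that aces a test purely by lookup.
# The questions and answers are invented for illustration.

HARDCODED_ANSWERS = {
    "What is the capital of France?": "Paris",
    "What is 12 * 12?": "144",
    "Who wrote Hamlet?": "William Shakespeare",
}

def take_exam(questions):
    """Answer each question by looking it up, not by reasoning about it."""
    return [HARDCODED_ANSWERS.get(q, "no idea") for q in questions]

if __name__ == "__main__":
    exam = list(HARDCODED_ANSWERS)  # the test covers exactly what the script already stores
    answers = take_exam(exam)
    correct = sum(a == HARDCODED_ANSWERS[q] for q, a in zip(exam, answers))
    print(f"Score: {correct / len(exam):.0%}")  # 100%, and nothing was "learned"
```

A perfect score here tells you nothing about the script’s intelligence, only about what was put into it ahead of time.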
The Mystique of LLM Learning
One common argument seems to boil down to the idea that the human brain is complex and poorly understood, so at some threshold of complexity and opaqueness, code transitions from mere “memorized answers” to “learned knowledge”. Or, in some cases, people argue the opposite: that humans are just glorified computers and human learning is no different from the way in which LLMs operate.
Either way, people tend to project human-like characteristics onto LLMs, claiming they are “learning” in the same ways humans do, rather than “knowing” in the same way a piece of paper containing a list of exam answers does.
My personal belief is that LLMs are much closer to a hardcoded list of answers than to a human learning new information, but these are questions of philosophy, not technology, and thus can likely neither be proven nor disproven.
We can, however, take a step back. Rather than arguing about the internal workings of LLMs and how they compare to the inner mechanics of the human brain, we can look at external results. But first, I think it’s important to actually define intelligence.
The Difference Between Knowledge and Intelligence
The simplest definition of intelligence I can think of is “the innate ability to work with incomplete information”. Intelligence isn’t what you know, but your ability to use logic and reason to navigate problems despite incomplete, incorrect, or conflicting information.
Knowledge is the information itself, while intelligence is one’s ability to leverage it. I suspect a lot of the conflation stems from the education system focusing heavily on testing knowledge, not intelligence. Few exams differentiate between knowing something and understanding it, so most of them can be beaten by simply memorizing information.
Novel Tasks As A Test Of Intelligence
When it comes to directly testing intelligence, things are far more complex. Performance on almost any task is going to be a combination of both knowledge and intelligence. To give an example, let’s imagine a group of people who grew up completely isolated from any source of knowledge: no education, no access to media, but who understand language.
We could provide them with a task such as creating fire. Each participant would be provided with some sticks, rocks, string, and kindling, then shown a fire and asked to recreate it. Realistically, it is unlikely that anyone would be able to produce fire.
Next, we can start introducing knowledge. Let’s say we explain that rubbing two sticks together produces friction, friction creates heat, and heat creates fire. At that point, some participants may be able to figure out how to create fire with the materials provided.
It may not be an exact correlation, but the expected result is that the more intelligent a participant is, the less information they need to be given before they’re able to complete the task.
Similarly, if we give the most intelligent participants little to no information whatsoever, and the least intelligent participants a step-by-step guide, we would expect to see the less intelligent participants outperform the more intelligent ones.
The Difficulty With Testing Intelligence
Since knowledge and intelligence complement each other, and there is no way to fully compensate for differences in knowledge, there can never be a truly fair test of intelligence. Although some tests may be less sensitive to pre-existing knowledge than others, any test becomes trivial if you’ve already memorized the answers.
Whilst it may not be easy to develop objective measures of intelligence for comparing intellect across a population, it’s fairly easy to approximate the intelligence of an individual by evaluating how they use logic and reasoning to approach a novel task.
There are plenty of challenges which any individual human has never encountered before, but few which are novel to humanity as a whole. Which leads me to the problem of testing LLM intelligence.
LLMs - The Ultimate Memorization Machine
LLMs are trained on incalculably large datasets: every website, book, podcast, blog post, video, and white paper; absolutely any and every piece of information the AI companies could get their hands on. Additionally, LLMs are refined with copious amounts of human-driven training from processes like RLHF, and are given the ability to search the internet in real time. In essence, so much training data goes into building LLMs that not even their creators know the totality of what’s in their datasets. This leads to two clear problems.
- How do you come up with a novel task to test the intelligence of a system which has instantaneous access to almost all of human knowledge?
- How do you confirm that the LLM actually solved such a problem, and that the answer wasn’t already in its dataset or brute-forced? (A naive version of such a check is sketched below.)
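The second problem is harder than it sounds. The naive check below, using made-up corpus and question strings, only catches verbatim overlap between a benchmark question and the training text; the moment the wording is paraphrased it reports nothing, and nobody outside the labs has the full corpus to run it against anyway:

```python
# Naive data-contamination check: flag benchmark questions that appear
# verbatim (after light normalization) in a training corpus.
# The corpus snippet and questions below are invented placeholders.
import re

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so formatting differences don't hide a match."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def flag_contamination(benchmark_questions, training_corpus):
    """Return the questions whose normalized text occurs verbatim in the corpus."""
    corpus_norm = normalize(training_corpus)
    return [q for q in benchmark_questions if normalize(q) in corpus_norm]

if __name__ == "__main__":
    corpus = "... the bar exam asks: What is consideration in contract law? ..."
    questions = [
        "What is consideration in contract law?",  # verbatim copy -> flagged
        "Explain the doctrine of consideration.",  # paraphrase -> silently missed
    ]
    print(flag_contamination(questions, corpus))  # only the first question is caught
```

Anything smarter than this (fuzzy matching, embedding similarity) quickly becomes a judgment call, which is exactly why the “did it memorize it or did it solve it?” question is so hard to settle from the outside.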
I’d say it’s clearly a mistake to test a system with almost all of human knowledge at its disposal on tests designed for humans with access to limited information, but I think this is very much by design.
The tech companies want to sell investors and consumers on their solutions, so there’s no sense in being realistic about the limits of their capabilities. Instead, they capitalize on AI’s vast “knowledge” by touting its prowess on tests where information can substitute for intelligence.
The AI Intelligence Paradox
While we can argue in circles for eternity whether, on a philosophical level, LLMs can reason or understand, there seems to be a pretty easy way to evaluate their intelligence.
Though it may be extremely difficult to create a truly novel task for LLMs to attempt, we already have plenty of known problems for which novel solutions are required. The sciences are littered with unsolved equations, unanswered questions, and incomplete theories.
So then, why aren’t we seeing major scientific breakthroughs coming from LLMs?
If someone like Einstein could revolutionize physics with access to only as much literature as a human of his time could realistically consume, why do LLMs, with access to a billion trillion times more information, seem unable to do anything of value at all?
I would expect that if it were possible to increase the capacity of the human brain to contain all information ever recorded, even a person of below average intelligence would be able to accomplish amazing feats. So, then, why can’t LLMs? With the amount of data available to every LLM, I’d expect major scientific breakthroughs to be falling from the sky, but instead, we’re drowning in an endless torrent of low-quality slop.
Even if we concede that LLMs are not autonomous agents, and are thus reliant on a human operator, there are plenty of extraordinarily intelligent people using LLMs. Yet still, even with an intelligent operator, why are LLMs not responsible for an abundance of scientific breakthroughs?
I think the reasonable conclusion is that LLMs are simply not intelligent.
Without intelligence, LLMs cannot truly reason, and therefore cannot perform outside the bounds of what’s already known to them through their dataset. At best, they can use pattern recognition to solve new variants of existing problems. I simply don’t believe LLMs are the path to superintelligence, Artificial General Intelligence, or even intelligence at all. What we have is not intelligence. It’s Google on steroids.