Against the "just next token prediction" take
I dislike the take that neural networks are "just" function approximators, large language models are "just" next token predictors, etc., and that therefore they don't "really" think or speak or whatever. We know that lots of things that are superficially very different are computationally equivalent to each other. Imo this means we can think of them as just different ways of describing the same thing. When we say LLMs are "just" doing next token prediction and not exercising "real" intelligence, we're assuming something substantial that we don't actually know: that "real" intelligence cannot be accomplished by next token prediction. In fact we don't know this at all. On the contrary, we have very good reason to believe that "real" intelligence can be accomplished by processes that superficially have nothing to do with intelligence, like the dynamics of neurons firing and wiring. If someone told you the brain isn't "really" thinking, that it's just a complex associative learning machine, you would recognize that the argument is invalid: the brain thinks by virtue of being a complex associative learning machine (or, at least, we don't know that's not how it thinks). We don't really know exactly how the neural dynamics cooperate to make up thinking, but we pretty much know that they do (maybe mixed with some other stuff).
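For concreteness, here is what the allegedly "mere" operation looks like in its simplest possible form: a toy character-level bigram model. This is an illustrative sketch only, not how any real LLM works; an LLM replaces the count table with an enormous learned network, but the outer predict-append-repeat loop is the same shape.

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Count, for each character, how often each next character follows it."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(text, text[1:]):
        counts[cur][nxt] += 1
    return counts

def predict_next(counts, cur):
    """Sample the next character from the conditional distribution P(next | cur)."""
    options = counts.get(cur)
    if not options:
        return None
    chars, weights = zip(*options.items())
    return random.choices(chars, weights=weights)[0]

def generate(counts, prompt, n_tokens=40):
    """Generation is just: predict one token, append it, repeat.
    Whatever 'smarts' there are live entirely inside predict_next."""
    out = list(prompt)
    for _ in range(n_tokens):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return "".join(out)

counts = train_bigram("the cat sat on the mat. the dog sat on the log.")
print(generate(counts, "the "))
```

The point of the toy is that "next token prediction" names the interface, not the sophistication of what computes the distribution behind it.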
Maybe the skeptical take is right! Maybe there's some deep reason that "just" next token prediction cannot, in principle, be intelligence. I doubt it, but I'm going to admit it as a possibility because it's hard to understand what exactly intelligence is. But to just assert it is to jump right over what is really interesting to think about, and to miss what is still mysterious. We don't yet know enough to dismiss the (extremely provocative) possibility that really good next token prediction is or includes or produces intelligence.