Game over for pure LLMs. Even Turing Award winner Rich Sutton has gotten off the bus.

Rich Sutton, a recent winner of the Turing Award, is well known for his 2019 unpublished essay The Bitter Lesson, which arguably foresaw the rise of extra-large language models. Its central thesis (which I have always felt was overstated) was that progress in AI has always come from scaling, and never from hand engineering. Advocates of LLM scaling love the essay, and refer to it often.

Their favorite part might be this line, which could be seen as the rallying cry of the LLM revolution.

One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great.

It is a truly great essay (which, as it happens, I reread earlier this week), in the sense of having a lot of smart ideas packed into a mere page and a half. It is, with some justice, highly influential. People really ought to read it.

§

That said, I have always thought it was wrong, or more to the point, overstated and fatally flawed. I was finally planning to go public with my reservations next week, in my address at the Royal Society. Here is a draft of the two slides in question.

The first lays out Sutton’s basic ideas (and yes, on the left you can see the entire manifesto in miniature; it is easier on your eyes, though, to read it here).

And here is my critique, in a nutshell:

But as I say, that was just prelude. My jaw just about fell out of my head a few minutes ago when I read the following tweet, a summary of what Sutton just said on a popular podcast.

You could literally search-and-replace Sutton’s name with mine without changing another word (as anyone who regularly reads this newsletter would know).

When the LLM crowd has lost Sutton – and when Sutton sounds exactly like me – it’s game over.

§

None of which is to say that Sutton and I agree on absolutely everything. We very much agree about the problems; less so about the solutions. We would both heavily emphasize the need for world models and the limitations of pure prediction, but he would put more weight on reinforcement learning, whereas I would put more weight on neurosymbolic approaches and innate constraints.

Maybe he’s right, maybe I’m right. Quite likely we need some of both. With only a small fraction of the current and planned investments in LLMs, we can and should find out.

§

It’s been a long, hard, unpleasant road.

But one by one, almost every major thinker in AI has come around to the critique of LLMs that I began presenting in 2019.

Yann LeCun was first, fully coming around to his own, very similar critique of LLMs by the end of 2022.

The Nobel Laureate and Google DeepMind CEO Sir Demis Hassabis sees it now, too.

And now even Sutton, the patron saint of scaling.

Practically the only people still pretending that scaling LLMs is “all you need” are grifters.
