Opinion In human imagination, AIs have been good for two things: trying to take over, and loving a good game. The earliest post-war AI thinkers took it almost for granted that once computers could beat humans at chess, true artificial intelligence would have arrived. Such thinking was disproved 50 years on when IBM’s Deep Blue beat Kasparov in 1997. Computers could be very, very good at chess while still having the IQ of a pebble.
This has in no way dimmed the dysfunctional love affair between playing games and playing AI. Machine learning boosters have trumpeted victories over Go players, as well as AI getting a taste for video games. On the other hand, top-branded generative AIs cannot beat an Atari 2600 at video chess — perhaps it would be kinder to start them off with the 1024 bytes of eight-bit ZX81 1K Chess. ChatGPT remains gloriously incompetent at tic-tac-toe: fire it up and have a go. That’s a game so simple you can build an unbeatable machine out of a handful of relays and lightbulbs.
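The relays-and-lightbulbs claim holds because perfect tic-tac-toe play needs no learning at all, just exhaustive search of a tiny game tree. As a purely illustrative sketch (not drawn from any product or study mentioned here), here is the standard minimax approach in Python:

```python
# Minimal minimax for tic-tac-toe: the whole game tree is small enough
# to search exhaustively, so perfect play is trivial to implement.
# Board: a list of 9 cells, each 'X', 'O' or ' '. X moves first.

WINS = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in WINS:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move) from `player`'s perspective: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w:
        return (1 if w == player else -1), None
    moves = [i for i, c in enumerate(board) if c == ' ']
    if not moves:
        return 0, None  # board full: draw
    best = (-2, None)
    opponent = 'O' if player == 'X' else 'X'
    for m in moves:
        board[m] = player
        score, _ = minimax(board, opponent)
        board[m] = ' '
        if -score > best[0]:  # the opponent's loss is our gain
            best = (-score, m)
    return best

# With perfect play on both sides, the game is always a draw:
board = [' '] * 9
player = 'X'
while winner(board) is None and ' ' in board:
    _, move = minimax(board, player)
    board[move] = player
    player = 'O' if player == 'X' else 'X'
print(winner(board))  # None: a draw, as game theory predicts
```

The point is the contrast: a few dozen lines of brute force never lose, while a billion-parameter chatbot happily plays into a fork.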
This is hilarious. What it isn’t is trivial. The early link between chess and AI was wrong, but it was an important disproof. At the time, the workings of human intellect were understood almost as little as the way computing would develop. That very smart people got it so wrong shows two things: that we intuitively use gaming as a benchmark of prowess, and that games create a way of talking about AI that guarantees a wide audience. AI benchmarks that get people talking are going to be our best defense against the AI apocalypse we are currently being urged to welcome.
Have a look at the most recent study of how well agentic AI actually works. AI agents are being hyped as the new magic: independent assistants that can be asked to do particular work-based tasks, typically involving gathering, analyzing and acting on data. Do they work? Mostly, they do not. They do the normal AI things of failing to cope with complexity or context, hallucinating, deceiving and just not completing tasks.
We know this, because researchers from Carnegie Mellon University (CMU) created a fake business environment where they could play at being employees, and deploy AI agents where they could be closely monitored and scored. In other words, a simulation of real life challenges. In other words, a game. That humanizes a technical process, and that matters.
The purpose of gaming in our species isn’t to win, at least not at heart. Most people don’t win most of the time. Games are places to learn skills through experimentation. For us humans, these include the crucial skills of co-operation with and evaluating others. Over-confidence, lack of skill and a preference for deception over reality soon get a team player a reputation that follows them into real life. These are not people sane employers hire: if they do, that sanity is questionable.
AI, especially the sort that claims to be able to act on your behalf, should not get a free ride on promises alone, any more than an actual human assistant should get a post purely on what they claim in their resume. AI makers promise the world, while the AIs themselves are masters of (over)confidence. Just as the interview process is — or should be — a way of evaluating promise and confidence against skills and integrity, benchmarks need to evolve that can be used by those who will have to work alongside the AIs. That cannot be limited to people with AI evaluating skills. Those skills are rare, where they exist at all.
Which is where gaming comes in. It’s a very human evaluation technique, and the results are very easy to communicate to others. The final score is important, but not as important as the sentiment of playing, and it’s that emotion which drives stories that people care about and want to tell.
If you do ask ChatGPT for a game of tic-tac-toe, you can ask it beforehand what it thinks of its own strengths, and try to explain to it afterwards where it went wrong. You’ll end up with a story about the technology that you can tell to anyone, and will want to.
This is precisely what we need to defend against AI hype. It’s no good talking to IT peers about how rubbish a technology is; it has to get into the culture so deeply that your aunt, your nephews and your CEO know it too. Finding ways to create a game-like environment that you can plug people and AIs into is a challenge, but the CMU paper has plenty of pointers. It’s not as if the gamification of business has no other applications, after all.

The AI industry, if it had more well-founded confidence than bluster and hope, would be all over this. Previous AI winters happened as much through sentiment as spreadsheets: the perception that AI was on the brink of greatness and more investment would get it there faded as other stories got more compelling. Demonstrating that AI agents are good to work with in ways that people intrinsically appreciate can only be a good thing, right?
That it doesn’t think so is a good story in itself. That it wants to put its technology at the heart of business with flaws so deep it couldn’t get a job as a deputy assistant teamaker is another. Finding a way to tell those stories outside the temples of technology is a very serious business indeed. Game on. ®