Atari wins out
2025-07-18

The Atari 2600 recently trounced ChatGPT in a game of chess. Google's Gemini is now having second thoughts:

"Google’s Gemini refuses to play Chess against the Atari 2600" (The Register)

This is a fun story, and a funny one. It's also a lesson:

I get it. It's easy to talk about what your AI system can do. To back up that talk, though, you have to actually test it. Preferably with a test that has clear definitions and is easy to verify independently.

A competitive arena such as a chess game, a sports match, or the stock market will make it very clear whether your bot is up to the task.

You don't have to place your AI systems in a public match, either – you can run your tests behind closed doors, where only you and your team will know the results. You do this by defining clear, honest metrics and then sticking to those metrics.

If you do this, and if you require that your bot exceeds some threshold of performance before public release, you can breathe a little easier when it is finally operating in the wild. (Do yourself a favor and keep an eye on it, though. Remember: Never Let The Bots Run Unattended.)
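The closed-door approach above boils down to a simple gate: define a metric, fix a threshold, and refuse to ship until the bot clears it. Here's a minimal sketch of that idea; the bot, the test cases, and the threshold are all hypothetical stand-ins for your own system and metrics.

```python
# A minimal sketch of a pre-release performance gate.
# The bot, test cases, and threshold are hypothetical stand-ins.

def evaluate(bot, test_cases):
    """Return the fraction of test cases the bot gets right."""
    passed = sum(1 for question, expected in test_cases
                 if bot(question) == expected)
    return passed / len(test_cases)

def release_gate(bot, test_cases, threshold=0.95):
    """Block release unless the bot clears the agreed-upon threshold."""
    score = evaluate(bot, test_cases)
    return score >= threshold, score

# Toy example: a "bot" that just uppercases its input.
toy_bot = lambda q: q.upper()
cases = [("ok", "OK"), ("hi", "HI"), ("no", "NO"), ("go", "STOP")]
ok, score = release_gate(toy_bot, cases, threshold=0.9)
# The bot scores 0.75, below the 0.9 threshold, so the gate refuses release.
```

The point of keeping the gate this dumb is that there's nothing to argue about afterward: the metric was defined up front, and the bot either cleared it or it didn't.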

McDonald's genAI chatbot exposes applicant data

Don't collect data you can't protect

Anniversary of the CrowdStrike incident

One year on, how are the lessons holding up?