The Paradox of AI in Gaming

With the rise of artificial intelligence (AI), creating a functional video game has become almost effortless. Tools like Cursor and Claude can generate classic games such as ‘Asteroids’ from a simple prompt. The irony is that these same systems cannot play the games they create. The obstacle is not merely the complexity of the games: each video game is governed by its own rules, whereas the real world follows consistent physical laws.

Understanding the Difficulty: Building, Not Playing

Julian Togelius, director of the Game Innovation Lab at New York University, has studied this phenomenon extensively. He argues that programming can be seen as a well-designed game with clear success criteria and feedback mechanisms, whereas video games introduce arbitrary rules and feedback that may be immediate or delayed, both of which challenge AI models significantly. According to his findings, AI models fail unequivocally when asked to play games.

The Structure of Programming

Programming can be likened to a structured game in which each task is stated explicitly and has a clear path to success. Large language models (LLMs), trained on extensive codebases, excel at this type of structured problem-solving. The same structure may explain why many people find programming enjoyable: it is a direct, rule-based challenge.

The Struggles of AI in Gaming

In contrast, video games are complicated ecosystems, each with its own nuances. When AI models attempt to play them, the lack of consistent rules leads to “absolute failure.” Even the apparent successes are qualified: models like Gemini 2.5 Pro have completed titles such as ‘Pokémon Blue’, but only after a great deal of time and with the help of external guides, which made them less efficient than human players.

The Role of Physics in Gaming

Togelius argues that games are incredibly heterogeneous, each with unique rules, spatial logic, and rewards. For instance, the mechanics of a platformer like ‘Super Mario’ differ vastly from those of ‘Tetris’. AI models struggle here because spatial reasoning, which is essential for navigating these games, is often absent from their training data.

Real-World Applications: Driving vs. Gaming

Interestingly, self-driving cars, whose task may seem more complex than playing a video game, perform effectively because they operate under consistent real-world laws. As Togelius notes, driving is a more homogeneous task; the rules governing road safety and vehicle responses remain relatively unchanged around the world. This uniformity contrasts sharply with the varied landscapes and mechanics of video games, where a player’s experience in one game doesn’t easily transfer to another.


Establishing the Definitive Criterion for AI

To evaluate an AI’s capabilities, Togelius suggests using video games as a benchmark. The test is whether an AI can complete any game in the top 100 on Steam in a timeframe comparable to that of a skilled human player, without pre-existing documentation or game-specific guides. As of now, no AI comes close to meeting this standard.
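The criterion can be written as a simple pass/fail check. The sketch below is only an illustration under stated assumptions: the `Run` record, the `passes_benchmark` function, and the `slack` factor (one possible reading of "comparable" time) are hypothetical names introduced here and are not part of Togelius's proposal.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One AI attempt at one game (hypothetical record, for illustration)."""
    game: str
    completed: bool
    hours_taken: float
    used_guides: bool


def passes_benchmark(runs, human_hours, top_games, slack=1.5):
    """Return True only if every game in top_games was completed,
    without guides, within slack * the skilled-human completion time.

    slack is an assumed tolerance; the source only says "comparable".
    """
    by_game = {run.game: run for run in runs}
    for game in top_games:
        run = by_game.get(game)
        # A missing, unfinished, or guide-assisted run fails the test.
        if run is None or not run.completed or run.used_guides:
            return False
        # Too slow relative to a skilled human also fails.
        if run.hours_taken > slack * human_hours[game]:
            return False
    return True


# Made-up numbers, purely to show the call shape.
example_runs = [Run("Game A", True, 10.0, False)]
example_human = {"Game A": 8.0}
print(passes_benchmark(example_runs, example_human, ["Game A"]))
```

Requiring success on every listed game, rather than on average, reflects the "any game" wording: a model that masters one title but fails another would not pass.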

This raises an important question about the future of AI in gaming. As models evolve, understanding how they interact with dynamically structured systems like video games may ultimately redefine our expectations of their capabilities.


