In 2019, François Chollet, a young researcher, envisioned an AI benchmark that was almost unheard of at the time. The idea was revolutionary, considering the limited tools available to evaluate AI capabilities effectively. Chollet was ahead of his time, anticipating an AI revolution that wouldn’t truly begin until the advent of ChatGPT and the ensuing AI boom three years later.
Challenging AI Through Benchmarks
Subsequent years saw the emergence of various synthetic benchmarks, yet ARC-AGI stood out for its unique approach. Unlike typical benchmarks that focused on a model’s memorization abilities, ARC-AGI emphasized abstract reasoning and generalization abilities across different contexts.
The problems presented in ARC-AGI and its successor, ARC-AGI 2, primarily consist of visual puzzles that are relatively easy for humans but were long nearly insurmountable for machines. Over the last couple of years, AI models have shown remarkable advancements in abstract reasoning and generalization, slowly tackling an increasing number of ARC-AGI puzzles. The downside? The expense incurred in these endeavors.
And that brings us to the transformative technology of GPT-5.2.
Assessing the Cost of AI Performance
Last year, the AI model o3-preview accomplished the impressive feat of solving 87% of ARC-AGI 1 puzzles. The achievement was significant enough to prompt an official announcement from the benchmark’s organizers showcasing the breakthrough. However, it came at a staggering cost of $456,000 for 100 tasks, or $4,560 per task.


Source: ARC-AGI Prize
OpenAI’s most recent model, GPT-5.2, has demonstrated exceptional performance across various benchmarks, particularly ARC-AGI 1. Notably, its Pro (X-High) version solved 90.5% of the problems, but what truly sets it apart is cost efficiency: each task now costs only $11.65, roughly 390 times less than o3-preview’s $4,560 per task a year earlier.
Moreover, an even more economical version, GPT-5.2 (X-High), solved 86.2% of problems at an astonishing rate of just $0.96 per task—an unprecedented achievement in AI performance and cost reduction.
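The per-task figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers stated in this article; the function and variable names are illustrative, not part of any official ARC Prize tooling.

```python
# Figures quoted in the article for ARC-AGI 1 (USD).
O3_PREVIEW_TOTAL_COST = 456_000   # total spend reported for o3-preview
O3_PREVIEW_TASKS = 100            # number of tasks that spend covered
GPT52_PRO_COST_PER_TASK = 11.65   # GPT-5.2 Pro (X-High)
GPT52_XHIGH_COST_PER_TASK = 0.96  # GPT-5.2 (X-High)


def cost_per_task(total_cost: float, tasks: int) -> float:
    """Average cost of a single task."""
    return total_cost / tasks


def reduction_factor(old_cost: float, new_cost: float) -> float:
    """How many times cheaper the new cost is than the old one."""
    return old_cost / new_cost


o3_cost = cost_per_task(O3_PREVIEW_TOTAL_COST, O3_PREVIEW_TASKS)
pro_factor = reduction_factor(o3_cost, GPT52_PRO_COST_PER_TASK)
xhigh_factor = reduction_factor(o3_cost, GPT52_XHIGH_COST_PER_TASK)

print(f"o3-preview: ${o3_cost:,.2f} per task")
print(f"GPT-5.2 Pro (X-High): about {pro_factor:.0f}x cheaper")
print(f"GPT-5.2 (X-High): about {xhigh_factor:.0f}x cheaper")
```

On these numbers, the Pro version comes out at a little over 390 times cheaper per task than o3-preview, and the cheaper X-High version at several thousand times cheaper, though it solves a slightly smaller share of the puzzles (86.2% versus 87%).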


ARC-AGI 2 remains a challenge for most models, but GPT-5.2 has taken another leap in quality. Source: ARC-AGI Prize.
The Future of AI Benchmarking
Anticipating advancements, Chollet and his team introduced ARC-AGI 2 in March 2025 to further challenge AI models. This benchmark remains a significant challenge, with even the best of the other models, such as Claude Opus 4.5, solving only 38% of the tasks. In contrast, GPT-5.2 has managed to resolve approximately 55% of these new problems at a cost of $15.72 per task, showcasing another monumental step forward.
The pattern is evident: AI is not just improving, but doing so at falling cost. This trend is encouraging because it counters the notion that scaling is running into diminishing returns: performance gains are arriving together with lower costs per task.
As the AI race reaches a pivotal moment, the pressing question isn’t whether AI can solve challenges, but rather how economically viable it is to do so. The transition to GPT-5.2 reflects a shift toward cost efficiency, which is vital for navigating OpenAI’s precarious economic reality. Moving forward, maintaining affordability and efficiency will be crucial for the company’s sustainability in an evolving competitive landscape.

