Monday, October 20, 2025

ChatGPT-4o lost to chess engine from 1979

ChatGPT‑4o impresses with complex reasoning, but it struggles at chess: in a puzzle benchmark it made illegal moves in nearly 13% of cases and averaged roughly 1800 Elo, still below expert level. Meanwhile, classic chess engines from the late 1970s, such as Chess 4.8, held their own and played draws against masters, a reminder that specialized systems can still outmatch even modern LLMs.


1. Puzzle Test Reveals Fragility

In a 1,000‑puzzle benchmark, GPT‑4o solved only 50% of the puzzles and made illegal moves 12.7% of the time. It reached an amateur-level Elo (~1790), far below grandmaster standards (source: theregister.com).


2. LLM vs LLM: GPT‑4o Loses to Its Predecessor

In a head-to-head comparison with GPT‑3.5‑turbo‑instruct, GPT‑4o scored just 10 wins against 35 losses in 50 games, indicating that GPT‑3.5 outperformed 4o at chess (an estimated Elo deficit of about 191 points).
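The ~191-point figure can be checked with the standard Elo expected-score formula. The sketch below is only a sanity check, and it rests on one assumption the article does not state: that the 5 games not counted as wins or losses were draws. The function name `elo_difference` is made up for this illustration.

```python
import math


def elo_difference(wins: int, draws: int, losses: int) -> float:
    """Estimate the rating gap implied by a match result.

    Inverts the Elo expected-score formula E = 1 / (1 + 10 ** (-D / 400)),
    where D is (own rating - opponent rating). A negative result means the
    player is rated below the opponent.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games  # a draw counts as half a point
    if not 0.0 < score < 1.0:
        raise ValueError("a 0% or 100% score implies an unbounded rating gap")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Assumption: the 5 remaining games out of 50 were draws.
gap = elo_difference(wins=10, draws=5, losses=35)
print(round(gap))  # prints -191
```

With those numbers GPT‑4o scores 12.5 points out of 50, or 25%, which corresponds to a gap of about 191 points in GPT‑3.5's favor.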


3. 1979 Vintage Engines Still Competitive

Chess 4.7 lost to IM David Levy in 1978, but Chess 4.8 drew its 1979 televised match (89 moves, running on a CDC Cyber 176). That engine searches the position on the board directly, with no conversational context to lose track of, and it handles complex positions far better than GPT‑4o.


4. Why Engines Outplay LLMs

  • Board-Aware Algorithms: Chess engines use brute-force search and heuristics; LLMs rely on pattern-based text outputs.
  • Illegal Moves Issue: GPT‑4o still produces illegal moves with alarming frequency and cannot reliably track board state.
  • Domain vs General AI: Chess requires precise rule-based reasoning, which is where domain engines excel; LLMs remain general-purpose.
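To make the "brute-force search" point concrete, here is a minimal sketch of exhaustive game-tree search (negamax) on a toy game rather than chess: a single pile of stones, each player removes 1 to 3, and whoever takes the last stone wins. This is not the algorithm of Chess 4.8 or any particular engine; the game, the constant `MAX_TAKE`, and the function names are all assumptions chosen to keep the example short.

```python
from functools import lru_cache

MAX_TAKE = 3  # toy rule (assumption): a player may remove 1 to 3 stones


def legal_moves(stones: int) -> list[int]:
    """Enumerate every legal move from the exact game state."""
    return [take for take in range(1, MAX_TAKE + 1) if take <= stones]


@lru_cache(maxsize=None)
def negamax(stones: int) -> int:
    """Return +1 if the side to move wins with best play, else -1."""
    if stones == 0:
        # The opponent just took the last stone, so the side to move lost.
        return -1
    # Try every legal move; the opponent's best outcome is our worst.
    return max(-negamax(stones - take) for take in legal_moves(stones))


def best_move(stones: int) -> int:
    """Pick the move that leaves the opponent in the worst position."""
    return max(legal_moves(stones), key=lambda take: -negamax(stones - take))


print(best_move(10))  # prints 2 (leaves 8 stones, a lost position for the opponent)
```

The search never considers an illegal move, because candidates come only from `legal_moves`, which reads the current state. A chess engine cannot search to the end of the game this way, so it stops at a limited depth and scores the resulting positions with a heuristic evaluation.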

5. Broader Implications for AI

The gap highlights the importance of specialized agents. GPT‑4o can explain and analyze chess in text, but it cannot reliably play like Chess 4.8 or modern engines. Full game competence needs domain tuning and rules integration, areas where LLMs still lag.
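One simple form of rules integration is to let a rules engine veto the model's output. The sketch below is hypothetical throughout: `choose_legal_move` and `scripted_proposer` are invented names, the proposer is a stand-in for a real LLM call, and the legal-move list is hard-coded for the starting position instead of coming from a real move generator.

```python
from typing import Callable, Iterable, Optional

# The 20 legal first moves for White, in algebraic notation:
# 16 pawn moves plus 4 knight moves.
OPENING_MOVES = [f"{file}{rank}" for file in "abcdefgh" for rank in "34"]
OPENING_MOVES += ["Na3", "Nc3", "Nf3", "Nh3"]


def choose_legal_move(
    propose: Callable[[list[str]], str],
    legal_moves: Iterable[str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask a move proposer until it returns a legal move or attempts run out.

    `propose` receives the moves rejected so far, so a real model could be
    told which suggestions were illegal.
    """
    legal = set(legal_moves)
    rejected: list[str] = []
    for _ in range(max_attempts):
        candidate = propose(rejected)
        if candidate in legal:
            return candidate
        rejected.append(candidate)
    return None  # caller decides what to do, e.g. fall back to an engine move


def scripted_proposer(replies: list[str]) -> Callable[[list[str]], str]:
    """Stand-in for a model: replays a fixed list of suggestions."""
    remaining = iter(replies)
    return lambda rejected: next(remaining)


# "Ke2" is illegal from the starting position; "e4" is accepted.
move = choose_legal_move(scripted_proposer(["Ke2", "e4"]), OPENING_MOVES)
print(move)  # prints e4
```

A filter like this only prevents illegal moves. It does nothing for playing strength, which is why the article's point about domain tuning still stands.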


Summary

Despite impressive progress, LLMs like ChatGPT‑4o remain behind even 1979-era chess engines in actual gameplay. The gap shows that specialized, rule-based systems still outperform general AI in structured tasks, and chess remains a domain where engines shine brighter than LLMs.
