Google’s AI Overview Wrong 10% of the Time


A major report published yesterday by The New York Times, in collaboration with AI startup Oumi, found that Google’s AI Overviews currently have an error rate of approximately 9% to 10%.

The study used the SimpleQA benchmark—a rigorous test of over 4,000 verifiable factual questions—to measure the accuracy of the summaries that now appear at the top of nearly 50% of all Google searches.
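At its core, a benchmark score like this reduces to a simple ratio: grade each of the model’s answers correct or incorrect, then divide. A minimal sketch of that ratio (the exact-match grader below is a hypothetical stand-in; SimpleQA’s actual grading methodology is more involved):

```python
# Back-of-envelope accuracy scoring for a SimpleQA-style benchmark.
# NOTE: the exact-match grader is a simplified stand-in, not the real grader.

def grade(predicted: str, expected: str) -> bool:
    """Hypothetical grader: normalize whitespace/case and compare exactly."""
    return predicted.strip().lower() == expected.strip().lower()

def accuracy(pairs) -> float:
    """Fraction of (predicted, expected) pairs graded correct."""
    correct = sum(grade(p, e) for p, e in pairs)
    return correct / len(pairs)

# Toy run over four questions:
pairs = [
    ("1986", "1986"),   # correct
    ("1987", "1986"),   # wrong year (the Bob Marley example below)
    ("Paris", "Paris"), # correct
    ("Blue", "Green"),  # wrong
]
print(f"accuracy = {accuracy(pairs):.0%}")  # 2 of 4 correct -> 50%
```

The reported 91% figure is simply this fraction computed over the 4,000-plus benchmark questions.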


1. The Accuracy Gap: Gemini 2.5 vs. Gemini 3

The report highlights that while Google’s accuracy is improving with newer models, the “confidently wrong” nature of AI still poses a massive problem at scale.

Model Version   Accuracy Rate   Error Rate   Status
Gemini 2.5      85%             15%          2025 Standard
Gemini 3        91%             9%           Current (April 2026)

While a 91% accuracy rate sounds impressive, analysts warn that with Google handling over 5 trillion searches per year, even a 9% error rate results in tens of millions of incorrect answers every single day.
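The scale argument is simple arithmetic. A quick sketch using the article’s own figures (5 trillion searches per year, an AI Overview on nearly half of them, a 9% error rate; the even daily split is my simplifying assumption):

```python
# Rough scale estimate built only from figures cited in the article.
searches_per_year = 5_000_000_000_000   # "over 5 trillion searches per year"
overview_share = 0.50                   # Overviews on "nearly 50%" of searches
error_rate = 0.09                       # Gemini 3's reported 9% error rate

searches_per_day = searches_per_year / 365   # assumes an even daily split
overviews_per_day = searches_per_day * overview_share
errors_per_day = overviews_per_day * error_rate

print(f"{errors_per_day:,.0f} incorrect overviews per day")
```

On these inputs the count runs into the hundreds of millions per day, comfortably clearing the “tens of millions” floor the analysts cite.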


2. High-Profile “Hallucinations”

The NYT and Oumi analysis identified several specific instances where AI Overviews provided false information as absolute fact:

  • Bob Marley Museum: When asked when Marley’s home became a museum, Google answered 1987. The correct year is 1986. The AI cited multiple sources, but none supported the 1987 claim.
  • Yo-Yo Ma: For a query about his induction into the “Classical Music Hall of Fame,” Google linked to the organization’s site but claimed the hall of fame did not exist.
  • Dick Drago: The AI correctly stated the baseball player’s age at death but provided a completely fabricated date of death.

3. The “Ungrounded” Citation Problem

Perhaps more troubling than outright falsehoods is the rise of “ungrounded” responses.

  • Definition: These are answers that are factually correct but link to sources that do not contain the information or actually contradict the answer.
  • The Trend: In October 2025, 37% of correct AI answers were ungrounded. By February 2026, that figure rose to 56%. This makes it nearly impossible for users to verify the information without doing a manual search.
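The concept behind a grounding check can be approximated naively: does the cited page actually contain the claimed fact? A toy sketch (the substring test below is a crude stand-in; real grounding evaluation relies on far more sophisticated semantic matching):

```python
# Naive grounding check: an answer counts as "grounded" only if the claimed
# fact actually appears in the text of the cited source. A substring match is
# a crude stand-in for the semantic comparison a real evaluation would use.

def is_grounded(claim: str, source_text: str) -> bool:
    return claim.lower() in source_text.lower()

source = "Bob Marley's former home was converted into a museum in 1986."
print(is_grounded("1986", source))  # the cited source supports this year
print(is_grounded("1987", source))  # confident-sounding but unsupported
```

An answer can pass a fact check yet fail this test, which is exactly the ungrounded case the study describes.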

4. Google’s Rebuttal: “Flawed Benchmarks”

Google spokesperson Ned Adriance has pushed back against the findings, stating that the study has “serious holes” and does not reflect how people actually use Search.

  • Flawed Data: Google claims the SimpleQA benchmark itself contains inaccurate data and that it relies instead on a more strictly validated internal tool called “SimpleQA Verified.”
  • Dynamic Results: Google notes that AI Overviews use different models (some smaller/faster, some larger/smarter) depending on the query, making a single accuracy score misleading.
  • Safety First: The company maintains that its safety systems filter out the most dangerous hallucinations, particularly in “Your Money or Your Life” (YMYL) categories like health and finance.

5. Why It Matters: The “Zero-Click” Risk

Critics argue that the 10% error rate is more dangerous now than in previous years because AI Overviews have led to a “Zero-Click” search culture.

  • Trust Factor: Unlike the old “ten blue links” where users compared sources, AI Overviews present a single, authoritative summary.
  • Organic Drop: Organic click-through rates (CTR) have plummeted by 61% for queries featuring an AI Overview. If the summary is wrong, users are 60% less likely to click a link and discover the truth.

“Google is trading the reliability of the open web for the convenience of a summary,” noted an analyst from Ars Technica. “When 1 in 10 answers is wrong, ‘convenience’ becomes a liability.”
