A major report published yesterday by The New York Times, in collaboration with the AI startup Oumi, found that Google’s AI Overviews currently carry an error rate of approximately 9% to 10%.
The study used the SimpleQA benchmark—a rigorous test of over 4,000 verifiable factual questions—to measure the accuracy of the summaries that now appear at the top of nearly 50% of all Google searches.
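For context on the methodology, the sketch below shows how accuracy on a SimpleQA-style benchmark is typically computed: each model answer is graded against a known reference answer, and accuracy is simply the share graded correct. The questions, reference answers, and the naive string-match grader here are illustrative assumptions, not the actual SimpleQA data or grading harness.

```python
# Illustrative sketch of SimpleQA-style accuracy scoring.
# The sample questions and the naive string-match grader below are
# assumptions for demonstration; the real benchmark uses 4,000+ curated
# questions and a more robust grading step.

def normalize(text: str) -> str:
    """Lowercase and strip punctuation for a crude comparison."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()

def grade(model_answer: str, reference: str) -> bool:
    """Naive grader: correct if the reference answer appears in the model answer."""
    return normalize(reference) in normalize(model_answer)

def accuracy(predictions: list[tuple[str, str]]) -> float:
    """Fraction of (model_answer, reference) pairs graded correct."""
    correct = sum(grade(ans, ref) for ans, ref in predictions)
    return correct / len(predictions)

# Hypothetical examples, including the wrong-year pattern discussed below.
sample = [
    ("The museum opened in 1987.", "1986"),                          # wrong year -> incorrect
    ("Water boils at 100 degrees Celsius.", "100 degrees Celsius"),  # correct
]
print(f"Accuracy: {accuracy(sample):.0%}")  # -> Accuracy: 50%
```

Reported accuracy rates such as the 85% and 91% figures in the table below are the same ratio, computed over the full question set.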
1. The Accuracy Gap: Gemini 2.5 vs. Gemini 3
The report highlights that while Google’s accuracy is improving with newer models, the “confidently wrong” nature of AI answers still poses a problem of massive scale.
| Model Version | Accuracy Rate | Error Rate | Status |
|---|---|---|---|
| Gemini 2.5 | 85% | 15% | 2025 Standard |
| Gemini 3 | 91% | 9% | Current (April 2026) |
While a 91% accuracy rate sounds impressive, analysts warn that with Google handling over 5 trillion searches per year, even a 9% error rate results in tens of millions of incorrect answers every single day.
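To put that scale in perspective, here is a rough back-of-the-envelope calculation using the figures quoted in this article (5 trillion searches per year, AI Overviews on roughly half of them, a 9% error rate). The share of those queries that pose a checkable factual question is an assumption, so the result is an order-of-magnitude illustration rather than a measured figure.

```python
# Back-of-the-envelope estimate of daily incorrect AI Overview answers.
# Figures marked "reported" come from the article above; the factual-query
# share is an assumption for illustration only.

SEARCHES_PER_YEAR = 5e12   # reported: over 5 trillion searches per year
OVERVIEW_SHARE = 0.50      # reported: AI Overviews on ~50% of searches
ERROR_RATE = 0.09          # reported: ~9% error rate (Gemini 3 era)

searches_per_day = SEARCHES_PER_YEAR / 365           # ~13.7 billion
overviews_per_day = searches_per_day * OVERVIEW_SHARE

# Assumed: only some overview queries pose a verifiable factual question
# of the kind SimpleQA tests. Try a few plausible shares.
for factual_share in (0.05, 0.10, 0.25):
    wrong_per_day = overviews_per_day * factual_share * ERROR_RATE
    print(f"factual share {factual_share:>4.0%}: "
          f"~{wrong_per_day / 1e6:,.0f} million wrong answers per day")
```

Even under conservative assumptions about how many queries are factual, the daily error count lands in the tens of millions.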
2. High-Profile “Hallucinations”
The NYT and Oumi analysis identified several specific instances where AI Overviews provided false information as absolute fact:
- Bob Marley Museum: When asked when Marley’s home became a museum, Google answered 1987. The correct year is 1986. The AI cited multiple sources, but none supported the 1987 claim.
- Yo-Yo Ma: For a query about his induction into the “Classical Music Hall of Fame,” Google linked to the organization’s site but claimed the hall of fame did not exist.
- Dick Drago: The AI correctly stated the baseball player’s age at death but provided a completely fabricated date of death.
3. The “Ungrounded” Citation Problem
Perhaps more troubling than the outright falsehoods is the rise of “ungrounded” responses.
- Definition: These are answers that are factually correct but link to sources that either do not contain the information or actively contradict it (a minimal sketch of such a grounding check follows this list).
- The Trend: In October 2025, 37% of correct AI answers were ungrounded. By February 2026, that figure rose to 56%. This makes it nearly impossible for users to verify the information without doing a manual search.
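To make “ungrounded” concrete, the sketch below shows the simplest possible grounding check: does the text of any cited source actually contain the key fact the answer asserts? The function and sample data are hypothetical, and a plain substring test will miss paraphrased support; the NYT/Oumi study’s actual methodology is more involved.

```python
# Minimal, illustrative grounding check: an answer is "grounded" only if
# at least one cited source contains the key fact it asserts.
# This naive substring test is an assumption for demonstration; it misses
# paraphrased support and is not the study's actual methodology.

def is_grounded(key_fact: str, cited_sources: list[str]) -> bool:
    """Return True if any cited source text mentions the key fact."""
    fact = key_fact.lower()
    return any(fact in source.lower() for source in cited_sources)

# Hypothetical example echoing the Bob Marley Museum case: an answer can
# be correct yet still ungrounded if no cited page supports it.
answer_fact = "1986"
sources = [
    "The Bob Marley Museum is located at 56 Hope Road in Kingston.",
    "Marley's former home was later converted into a museum.",
]
print("grounded" if is_grounded(answer_fact, sources) else "ungrounded")  # -> ungrounded
```

Note that an answer can pass a factual-accuracy check and still fail a grounding check like this one, which is exactly the gap the 37% to 56% trend describes.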
4. Google’s Rebuttal: “Flawed Benchmarks”
Google spokesperson Ned Adriance has pushed back against the findings, stating that the study has “serious holes” and does not reflect how people actually use Search.
- Flawed Data: Google claims the SimpleQA benchmark itself contains inaccurate data and that it uses a more strictly validated internal tool called “SimpleQA Verified.”
- Dynamic Results: Google notes that AI Overviews use different models (some smaller/faster, some larger/smarter) depending on the query, making a single accuracy score misleading.
- Safety First: The company maintains that its safety systems filter out the most dangerous hallucinations, particularly in “Your Money or Your Life” (YMYL) categories like health and finance.
5. Why It Matters: The “Zero-Click” Risk
Critics argue that the 9% to 10% error rate is more dangerous now than in previous years because AI Overviews have led to a “Zero-Click” search culture.
- Trust Factor: Unlike the old “ten blue links” where users compared sources, AI Overviews present a single, authoritative summary.
- Organic Drop: Organic click-through rates (CTR) have plummeted by 61% for queries featuring an AI Overview; when a summary is wrong, users are therefore roughly 60% less likely to click a link and discover the error.
“Google is trading the reliability of the open web for the convenience of a summary,” noted an analyst from Ars Technica. “When 1 in 10 answers is wrong, ‘convenience’ becomes a liability.”
