In a paradoxical blow to AI safety, a January 2026 report by media watchdog NewsGuard has revealed that ChatGPT (running GPT-5.2) failed to identify 92.5% of AI-generated videos created by its own sister tool, Sora.
The study highlights a critical “blind spot” in the AI ecosystem: while OpenAI’s video generation capabilities have reached hyper-realistic levels, its flagship chatbot remains largely incapable of verifying that very same content once metadata or watermarks are removed.

The NewsGuard Experiment: Chatbots vs. Deepfakes
NewsGuard analysts tested leading AI assistants by uploading 20 videos generated by Sora 2 that advanced “provably false” narratives—including fake news reports and staged political incidents.
The Failure Rates (Non-Watermarked Videos)
When the visible “Sora” watermark was removed using free third-party tools, the chatbots struggled to distinguish fabrication from reality:
| AI Assistant | Failure Rate (Jan 2026) | The Verdict |
| --- | --- | --- |
| xAI Grok | 95.0% | Most likely to treat AI video as real news. |
| OpenAI ChatGPT | 92.5% | Fails to recognize its own parent company’s output. |
| Google Gemini | 78.0% | Performed best due to SynthID integration. |
Why Does ChatGPT Fail to Spot Sora?
1. Watermark Fragility
While Sora embeds visible watermarks and C2PA provenance metadata, the study found these are “fragile safeguards”: a simple re-encode or “Save As” through a converter discards the metadata, and free online watermark removers erase the visible mark instantly. Once stripped, ChatGPT treats the video as a standard file upload and performs no forensic analysis.
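To make that fragility concrete, here is a minimal sketch (not NewsGuard’s methodology) that compares the metadata tags present in a clip before and after a generic re-encode. It assumes ffmpeg and exiftool are installed, and the file names (`sora_clip.mp4`, `sora_clip_saved.mp4`) are hypothetical stand-ins for a Sora export.

```python
# Minimal sketch: show how a plain re-encode drops embedded provenance metadata.
# Assumes ffmpeg and exiftool are on PATH; the file names are hypothetical.
import json
import subprocess

def metadata_tags(path: str) -> set:
    """Return the set of metadata tag names exiftool reports for a file."""
    result = subprocess.run(
        ["exiftool", "-json", path],
        capture_output=True, text=True, check=True,
    )
    return set(json.loads(result.stdout)[0].keys())

ORIGINAL = "sora_clip.mp4"         # hypothetical export carrying C2PA/XMP tags
REENCODED = "sora_clip_saved.mp4"  # the same clip after a generic re-encode

# A generic re-encode (what many "Save As" converters effectively do) rewrites
# the container and discards boxes it does not recognize, including provenance data.
subprocess.run(
    ["ffmpeg", "-y", "-i", ORIGINAL, "-c:v", "libx264", "-c:a", "aac", REENCODED],
    check=True,
)

before = metadata_tags(ORIGINAL)
after = metadata_tags(REENCODED)
# Heuristic filter for provenance-related tags (exact names vary by tool version).
lost = sorted(t for t in before - after
              if "XMP" in t or "C2PA" in t.upper() or "JUMBF" in t.upper())
print("Provenance-related tags lost after re-encode:", lost or "none detected")
```

The exact tag names depend on the exiftool version and how the manifest was embedded; the point is only that a lossy round-trip through a generic converter leaves nothing for a downstream chatbot to inspect.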
2. Lack of Native Detection Capabilities
OpenAI’s Head of Products and Applications Communications, Niko Felix, confirmed in a statement to NewsGuard that “ChatGPT does not have the ability to determine whether content is AI-generated.” The model is trained to process and describe content, not to act as a forensic authenticator.
3. The “Confidently Wrong” Problem
Perhaps more alarming than the failure to detect is the tendency to hallucinate confirmation. In the study:
- ChatGPT and Gemini both failed to recognize a fake video of an ICE agent.
- Instead of admitting uncertainty, the chatbots claimed that “news sources confirmed the event,” actually strengthening the misinformation.
The Fragmented State of AI Verification
The study noted that Google Gemini had a lower failure rate because of SynthID, a tool that embeds “imperceptible” watermarks. However, this only works for Google’s own content (like Nano Banana Pro images). Gemini remains nearly as ineffective as ChatGPT when faced with third-party deepfakes from Sora or Kling.
| Challenge | Current Status |
| --- | --- |
| Cross-Platform Detection | Non-existent; proprietary “SynthID” and “Sora” tags don’t talk to each other. |
| Transparency | ChatGPT disclosed its inability to detect AI in only 2.5% of the tests. |
| Disinformation Risk | Bad actors are using Sora to create “high-resolution slop” that passes AI fact-checks. |
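To illustrate the “Cross-Platform Detection” gap in the table above, the sketch below frames what a unified verifier would need: a common interface that every vendor’s provenance check implements. No such standard exists, so the classes, method names, and results here are entirely hypothetical; each checker can only vouch for its own ecosystem and returns “unknown” for third-party content, which is roughly the situation NewsGuard describes.

```python
# Hypothetical sketch of the fragmentation problem: provenance checks are
# vendor-specific and share no common protocol. All names here are invented.
from typing import Optional

class ProvenanceChecker:
    """Interface a cross-platform verifier would need; no such standard exists today."""
    name: str = "base"
    def check(self, video_path: str) -> Optional[bool]:
        """True = verified AI-generated, False = verified authentic, None = cannot tell."""
        raise NotImplementedError

class SynthIDChecker(ProvenanceChecker):
    name = "SynthID"
    def check(self, video_path: str) -> Optional[bool]:
        # Only meaningful for Google-generated media; Sora/Kling output yields no signal.
        return None

class C2PAManifestChecker(ProvenanceChecker):
    name = "C2PA"
    def check(self, video_path: str) -> Optional[bool]:
        # A stripped or re-encoded file carries no manifest, so nothing can be verified.
        return None

def verify(video_path: str, checkers: list) -> str:
    """Run each vendor check in turn; fall back to 'unverifiable' if none can decide."""
    for checker in checkers:
        verdict = checker.check(video_path)
        if verdict is not None:
            return f"{checker.name}: {'AI-generated' if verdict else 'authentic'}"
    return "unverifiable: no checker could establish provenance"

print(verify("suspect_clip.mp4", [SynthIDChecker(), C2PAManifestChecker()]))
```

The design point is that without a shared, tamper-resistant manifest format, the fallback verdict for a stripped third-party clip is almost always “unverifiable.”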
Conclusion: A Trust Deficit in 2026
The fact that OpenAI sells a tool to create “deceptively realistic” videos while offering a chatbot that can’t recognize them creates a significant trust deficit. As Sora 2 nears its one-year anniversary in February, the lack of a reliable “check and balance” system within the OpenAI ecosystem remains a major hurdle for regulators and misinformation experts alike.