OpenAI Says GPT-5.5-Cyber Beats Anthropic’s Mythos on a Security Benchmark
OpenAI has made a new AI model. It is called GPT-5.5-Cyber. An AI model is a computer program that learns to do a task. This one is built to find and fix security holes in software. OpenAI says it did better than a rival model from Anthropic, called Mythos 5. Both were scored on a benchmark. A benchmark is a standard test used to compare AI models on the same job.
The news came out on June 23, 2026. It was reported by a tech site called The Decoder. OpenAI is the company that made ChatGPT. Anthropic is the company that made the Claude AI models. Both want to build AI that can guard computers from hackers. This new model pushes that race forward.
What is GPT-5.5-Cyber?
GPT-5.5-Cyber is a special version of OpenAI’s GPT-5.5 model. It is trained for one main job: cybersecurity. Cybersecurity means keeping computers and data safe from attacks. The model can find weak spots in code. It can suggest fixes, called patches. It can also check that those fixes really work.
A weak spot in code is called a vulnerability. Hackers look for these gaps to break in. OpenAI says GPT-5.5-Cyber can do the whole job. It finds the gap. It writes a patch, which is a small code fix. Then it tests the patch to make sure it holds.
How the benchmark scores compare
OpenAI tested the model on three security tests. The main one is called CyberGym. It checks if an AI can reproduce known software flaws inside a safe test space. On CyberGym, GPT-5.5-Cyber scored 85.6 percent. Anthropic’s Mythos 5 scored 83.8 percent. So OpenAI’s model leads by 1.8 percentage points.
The other two tests are ExploitGym and SEC-bench Pro. ExploitGym checks if the AI can turn a weak spot into a real working attack. This shows defenders how risky the gap is. SEC-bench Pro checks if the AI can find brand new flaws over a longer time. OpenAI only shared Mythos 5’s score for CyberGym. So we cannot compare the two on the other two tests.
Key facts
| Item | Detail |
|---|---|
| Model | GPT-5.5-Cyber (by OpenAI) |
| Main rival | Mythos 5 (by Anthropic) |
| Announced | June 23, 2026 |
| CyberGym score | 85.6% (vs Mythos 5 at 83.8%) |
| ExploitGym score | 39.5% |
| SEC-bench Pro score | 69.8% |
| Commits scanned | Over 30 million across 30,000+ codebases |
| Findings marked as fixed | Over 500,000 (70,000 manually confirmed) |
| Who can access | Verified defenders only |
Benchmark scores: GPT-5.5-Cyber vs rivals
Here are the scores across all three tests. Only the numbers OpenAI shared are listed. A dash means no score was given for that model. Higher percentages are better.
| Model | CyberGym | ExploitGym | SEC-bench Pro |
|---|---|---|---|
| GPT-5.5-Cyber | 85.6% | 39.5% | 69.8% |
| Mythos 5 (Anthropic) | 83.8% | – | – |
| GPT-5.5 | 81.8% | 25.95% | 63.1% |
| GPT-5.4 | 79.0% | – | – |
| Claude Opus 4 (Anthropic) | 73.1% | – | – |
What it means: GPT-5.5-Cyber wins on the one test where every model has a score. It also clearly beats OpenAI’s own older models. But its lead over Mythos 5 is small, at 1.8 percentage points. So the two top models are very close.
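To make the gaps concrete, here is a small Python sketch that works out how far each model trails GPT-5.5-Cyber on CyberGym. It uses only the CyberGym scores from the table above. The function name `lead_over` is made up for this example, and the results are rounded to one decimal place.

```python
# CyberGym scores as listed in the table above (percent).
cybergym = {
    "GPT-5.5-Cyber": 85.6,
    "Mythos 5": 83.8,
    "GPT-5.5": 81.8,
    "GPT-5.4": 79.0,
    "Claude Opus 4": 73.1,
}


def lead_over(leader, scores):
    """Return the leader's margin over every other model, in percentage points."""
    top = scores[leader]
    return {
        name: round(top - score, 1)  # round to hide floating-point noise
        for name, score in scores.items()
        if name != leader
    }


margins = lead_over("GPT-5.5-Cyber", cybergym)
for name, gap in margins.items():
    print(f"{name}: {gap} points behind GPT-5.5-Cyber")
```

The gap to Mythos 5 is the smallest of the four. A difference this small says little on its own without knowing how much scores vary from run to run, which the source does not report.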
Built for defenders, not attackers
A tool that finds security holes could be used by attackers too. So OpenAI lets only “verified defenders” use it. That means the company checks who you are first. Access comes with checks, monitoring, and guardrails. Guardrails are safety limits that block harmful use.
OpenAI also updated a tool called Codex Security. It first came out as a preview in March. Since then, OpenAI says it has scanned over 30 million commits. A commit is a single saved change to a software project. These came from more than 30,000 codebases. The tool marked over 500,000 findings as fixed. People checked 70,000 of them by hand.
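A little arithmetic helps put these numbers in scale. The Python sketch below uses only the figures quoted above. Most of them were reported as "over" or "more than", so treat the results as rough estimates and not as exact statistics. The variable names are chosen just for this example.

```python
# Figures OpenAI reported for Codex Security. Most are lower bounds
# ("over" / "more than"), so the ratios below are rough estimates.
commits_scanned = 30_000_000    # "over 30 million commits"
codebases = 30_000              # "more than 30,000 codebases"
findings_fixed = 500_000        # "over 500,000 findings" marked as fixed
manually_confirmed = 70_000     # checked by hand

commits_per_codebase = commits_scanned / codebases
confirmed_share = manually_confirmed / findings_fixed
commits_per_finding = commits_scanned / findings_fixed

print(f"Average commits per codebase: about {commits_per_codebase:,.0f}")
print(f"Share of fixed findings checked by hand: about {confirmed_share:.0%}")
print(f"Roughly one fixed finding per {commits_per_finding:,.0f} commits")
```

On these rounded figures, about one in seven of the fixed findings was confirmed by a person. The rest were marked as fixed without a manual check being reported.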
Big partners and a global push
OpenAI says it works with more than 25 security firms. These include big names like Cisco, CrowdStrike, Cloudflare, Palo Alto Networks, and IBM. It also works with several governments and public agencies. These are Australia, Canada, France, Germany, Japan, South Korea, and the UK, plus the EU’s cyber agency, ENISA.
There is also a plan called “Patch the Planet.” It works with more than 30 open-source projects. Open-source projects are software whose code is free for anyone to see and use. Many apps are built on top of this shared code. So fixing flaws here helps protect lots of other apps too.
FAQ
What is GPT-5.5-Cyber?
It is a version of OpenAI’s GPT-5.5 model made for cybersecurity. It finds weak spots in software, writes fixes, and checks that the fixes work.
How did it score against Anthropic’s Mythos 5?
On the CyberGym test, GPT-5.5-Cyber scored 85.6 percent. Mythos 5 scored 83.8 percent. OpenAI did not share Mythos 5 scores for the other two tests.
Can anyone use it?
No. OpenAI says only “verified defenders” can use it. The company checks who you are first. Access also comes with monitoring and safety limits.
What are CyberGym, ExploitGym, and SEC-bench Pro?
They are three security tests. CyberGym checks if the AI can reproduce known flaws. ExploitGym checks if it can build a working attack. SEC-bench Pro checks if it can find new flaws over time.
Why it matters (especially for India / founders)
India does a huge share of the world’s software work. Many Indian startups and IT firms write code every day. AI tools that scan code for security holes could save these teams a lot of time and money. A small team could check its software without hiring a big security staff.
For founders, there is also a lesson about trust. OpenAI keeps this power behind strict checks. AI is getting better at both attack and defense. The firms that build safety in from day one will earn customer trust. That is true whether you build apps, run a fintech, or sell to big companies.
The main point is simple. The AI race is no longer just about chatbots. It is now about who can best defend the digital world. OpenAI says it leads on one big test. But Anthropic is close behind. For users and businesses, stronger and safer security tools are the real prize.
Source: The Decoder