OpenAI Unveils GPT-5.5-Cyber to Patch Software Vulnerabilities at Scale

OpenAI has launched GPT-5.5-Cyber, a new AI model built to fix software flaws at scale. A “vulnerability” is a weak spot in code that hackers can use to break in. This model does more than just find those weak spots. It also writes the fix and prepares it for a human to check. OpenAI says finding bugs is good, but landing the fix is what truly protects the world.

This matters because most security tools stop at “here is the problem.” GPT-5.5-Cyber goes further. It scans code, confirms the flaw is real, and then builds a patch. A “patch” is a small code update that closes the weak spot. The goal is to stop bad code from ever reaching the live product. Let us look at what it can do and how well it scores.

What GPT-5.5-Cyber does

The model handles the full job, not just the first step. It scans a codebase, checks whether a flaw is truly dangerous, and writes a patch. It then prepares evidence so a human reviewer can approve the fix quickly. In simple terms, it acts like a tireless junior security engineer.
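The steps described above (scan, confirm, patch, prepare evidence, human approval) can be sketched as a small pipeline. This is only an illustration of the workflow as the article describes it, not OpenAI's implementation: the `Finding` class, the `triage_and_patch` function, and the three callbacks are hypothetical names, and the scanning step is assumed to have already produced the list of candidates.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class Finding:
    """One suspected flaw moving through the workflow (hypothetical shape)."""
    location: str
    description: str
    confirmed: bool = False
    patch: Optional[str] = None
    evidence: List[str] = field(default_factory=list)
    approved: bool = False


def triage_and_patch(
    candidates: Iterable[Finding],
    is_real: Callable[[Finding], bool],
    write_patch: Callable[[Finding], str],
    reviewer_approves: Callable[[Finding], bool],
) -> List[Finding]:
    """Return only findings that were confirmed, patched, and human-approved."""
    landed = []
    for finding in candidates:
        # Confirm the flaw is real before spending effort on a fix.
        if not is_real(finding):
            continue
        finding.confirmed = True
        # Draft the patch for the confirmed flaw.
        finding.patch = write_patch(finding)
        # Bundle evidence so the reviewer can decide quickly.
        finding.evidence.append(f"confirmed flaw at {finding.location}")
        finding.evidence.append(f"proposed patch: {finding.patch}")
        # A human has the final say; nothing lands without approval.
        finding.approved = reviewer_approves(finding)
        if finding.approved:
            landed.append(finding)
    return landed


if __name__ == "__main__":
    # Stub callbacks stand in for the model and the human reviewer.
    candidates = [
        Finding("auth.py:42", "unchecked input"),
        Finding("util.py:7", "style nit"),
    ]
    landed = triage_and_patch(
        candidates,
        is_real=lambda f: "input" in f.description,
        write_patch=lambda f: "validate input before use",
        reviewer_approves=lambda f: True,
    )
    for f in landed:
        print(f.location, "->", f.patch)
```

The point of this structure is that the approval callback is the last gate: a finding that is never confirmed, or that the reviewer rejects, never appears in the returned list.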

OpenAI put it plainly in its announcement: “Finding vulnerabilities is important, but it’s landing the fix that protects the world.” This is the key shift. Many tools flag thousands of issues but leave humans to fix them all. GPT-5.5-Cyber tries to close that gap by doing the repair work too.

Benchmarks and specs

A “benchmark” is a standard test used to compare AI models. On CyberGym, a test of cybersecurity skills, GPT-5.5-Cyber scored 85.6%, ahead of the regular GPT-5.5 model at 81.8%. OpenAI says it also beat a rival model on the same test and did better on two other security benchmarks.

Model | CyberGym benchmark score
GPT-5.5-Cyber (new) | 85.6%
GPT-5.5 (standard) | 81.8%

What it means: a higher CyberGym score means the model is better at spotting and handling security flaws. The new model also topped GPT-5.5 on two more tests, ExploitGym and SEC-bench Pro. OpenAI says it beat a competing model named Mythos 5 on CyberGym as well.
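To put the two CyberGym scores in proportion, the short calculation below works out the gap in three ways. It is a rough sketch: the two scores come from the table above, but treating a score as "percent of tasks solved" (so that 100 minus the score is a failure rate) is an assumption made here for illustration, and the function names are hypothetical.

```python
# Scores reported in the article's CyberGym table.
CYBER_SCORE = 85.6      # GPT-5.5-Cyber
STANDARD_SCORE = 81.8   # GPT-5.5


def point_gap(new: float, base: float) -> float:
    """Absolute difference in percentage points."""
    return round(new - base, 1)


def relative_gain_pct(new: float, base: float) -> float:
    """Improvement relative to the baseline score, in percent."""
    return round((new - base) / base * 100, 1)


def failure_reduction_pct(new: float, base: float) -> float:
    """Drop in the failure rate, ASSUMING a score is the percent of tasks solved."""
    base_failures = 100 - base
    new_failures = 100 - new
    return round((base_failures - new_failures) / base_failures * 100, 1)


if __name__ == "__main__":
    print(point_gap(CYBER_SCORE, STANDARD_SCORE))              # 3.8
    print(relative_gain_pct(CYBER_SCORE, STANDARD_SCORE))      # 4.6
    print(failure_reduction_pct(CYBER_SCORE, STANDARD_SCORE))  # 20.9
```

Under that assumption, a gap of 3.8 percentage points is a gain of about 4.6% over the baseline score, and about a fifth fewer failed tasks.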

Real-world results so far

OpenAI also shared early results from a related tool, a Codex security plugin. The numbers since its research preview began in March are large. The tool has already worked across tens of thousands of code projects and fixed hundreds of thousands of issues automatically.

Codex security plugin | Figure since March
Commits scanned | Over 30 million
Repositories analysed | More than 30,000
Findings fixed automatically | Over 500,000

A “commit” is a saved change to code. A “repository” is a project’s full code folder. So the tool reviewed tens of millions of code changes across tens of thousands of projects. That scale is the whole point: humans alone cannot review that much code by hand.
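A quick back-of-the-envelope calculation shows why that scale is hard to match by hand. The three headline figures come from the table above and are lower bounds ("over", "more than"), so the ratios are rough averages only. The review time per commit and the hours in a working year are assumptions chosen purely for illustration; they are not from OpenAI.

```python
# Figures reported in the article's table; each is a lower bound.
COMMITS_SCANNED = 30_000_000
REPOSITORIES = 30_000
FINDINGS_FIXED = 500_000

# Assumptions for illustration only (not from the source):
MINUTES_PER_COMMIT_REVIEW = 5   # hypothetical time for a human to review one commit
WORK_HOURS_PER_YEAR = 2_000     # hypothetical full-time working year


def scale_summary() -> dict:
    """Rough averages derived from the reported figures and the assumptions above."""
    review_hours = COMMITS_SCANNED * MINUTES_PER_COMMIT_REVIEW / 60
    return {
        "commits_per_repo": COMMITS_SCANNED / REPOSITORIES,
        "fixes_per_repo": round(FINDINGS_FIXED / REPOSITORIES, 1),
        "fixes_per_1000_commits": round(FINDINGS_FIXED / COMMITS_SCANNED * 1000, 1),
        "manual_review_person_years": review_hours / WORK_HOURS_PER_YEAR,
    }


if __name__ == "__main__":
    for name, value in scale_summary().items():
        print(f"{name}: {value}")
```

On these numbers the averages work out to about 1,000 commits and roughly 17 automatic fixes per repository. With the assumed five minutes per commit, reading every commit by hand would take about 1,250 person-years.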

Big partners and programs

OpenAI is not doing this alone. It named a “Daybreak Cyber” partner program that includes major security names like Accenture, Cisco, Cloudflare, CrowdStrike, IBM, Okta, Palo Alto Networks, SentinelOne, Sophos, Wiz, and Zscaler. These are some of the biggest firms in the field.

There is also a “Patch the Planet” effort to fix open-source software. Open-source code is free code that anyone can use, and it powers much of the internet. Partners here include Trail of Bits, HackerOne, and over 30 open-source projects such as Python, Go, and cURL. Several governments are involved too, including Australia, Canada, France, Germany, Japan, South Korea, and the EU’s cybersecurity agency. India’s own push to protect national systems, seen in our report on government agencies adopting sovereign AI, shows why this work matters worldwide.

FAQ

What is a software vulnerability?

It is a weak spot or bug in code that attackers can use to break into a system or steal data. Fixing these flaws quickly is a major part of keeping software safe.

How is GPT-5.5-Cyber different from a normal scanner?

A normal scanner only flags problems. GPT-5.5-Cyber also writes the fix and prepares it for a human to approve. It tries to handle the whole repair, not just the first warning.

Does a human still check the fixes?

Yes. The model prepares the patch and the evidence, but a human reviewer approves it. This keeps a safety check in place before any fix goes live.

Why it matters for India and founders

For Indian founders and IT teams, this could be significant. India runs a huge share of the world’s software and IT services. A tool that fixes flaws at scale could save time and reduce risk for these teams. It could also lower costs, since fewer engineers would be needed to patch routine bugs.

There is a flip side too. As AI takes over routine patching, the value shifts to skills like reviewing fixes and handling complex threats. Indian startups in security have a chance to build tools and services around this new way of working. The deep demand for chips and memory behind such models also drives deals like Micron’s AI memory partnership with Anthropic.

The takeaway

GPT-5.5-Cyber marks a shift from just finding bugs to actually fixing them. Its benchmark scores beat the standard GPT-5.5, and early real-world use shows huge scale. With big security firms and governments on board, the push to patch the world’s code is serious. A human still has the final say, which keeps a safety check in place before any fix goes live. For anyone who builds or runs software, this is a tool worth watching closely.
