AI in healthcare: new studies show ChatGPT and other AI systems rival doctors, with big caveats

People are talking a lot about AI in healthcare. AI means computer programs that can answer questions in ways that feel human. In June 2026, the company OpenAI said its new ChatGPT health update gave better answers than real doctors in its own tests. ChatGPT is a popular AI chatbot, a program you can chat with by typing questions. Around the same time, the science journal Nature published studies where AI did as well as doctors, or better, on hard cases. That sounds amazing. But the small print matters a lot. This article explains the studies in simple words. It also shares the warnings the researchers gave, and what this means for patients, including people in India.

One word you will see is benchmark. A benchmark is just a test that scores how well an AI does a job. Think of it like an exam for software.

What did OpenAI announce?

On June 18, 2026, OpenAI shared a health upgrade for ChatGPT. It uses a system called GPT-5.5 Instant. This system is the “AI brain” that reads your question and writes the answer. OpenAI says this brain is faster and cheaper, but still as good as its most costly “Thinking” brains on health tests. It is free for all ChatGPT users, but you can only ask so many questions each day.

OpenAI says the upgrade is safer too. It says wrong health answers dropped by 71% over two months. In its tests, the new brain beat the older one (called GPT-4o). It also beat answers written by human doctors in all five scoring areas. To check the answers, OpenAI used more than 260 doctors from 60 countries. These doctors looked at over 700,000 AI answers.

One number shows why this matters. OpenAI says more than 230 million people use ChatGPT every week for health questions. People ask it to explain lab results, get ready for doctor visits, or understand their insurance.

The honest caveat

Here is the catch. These numbers come from OpenAI’s own tests. That is not the same as an independent test run by outsiders. The tests are called HealthBench and HealthBench Professional, and OpenAI built them itself. So the results look good, but no outside group has confirmed them yet.

What did the Nature studies find?

The Nature research looked at two different AI systems. Both did well. But the second one comes with a surprising warning.

MIRA — for emergency cases

MIRA stands for Medical Intelligence for Reasoning and Action. Researchers in Germany built it, at TUD Dresden and Heidelberg University. Its job is to find out what is wrong with emergency patients. To do this, it can choose from over 85,000 options across eleven tools. For example, it can order the right scan or test.

MIRA got the right answer in 88.9% of more than 500 emergency cases. In a direct test on 311 cases, MIRA scored 87.8%. Specialist doctors scored 78.1%. A mix of junior and senior doctors scored 71.1%. Best of all, in these tests MIRA never missed a patient who really needed to be admitted to hospital.

AMIE — for managing patients over time

AMIE is Google’s system. It helps care for patients across many visits, not just one. In the study, AMIE’s first-visit plans were judged good in 95% of cases. Human doctors scored 72%. AMIE was as good as doctors on choosing treatments. It beat them on making correct plans and on following medical rules. Specialist doctors, and the actors who played the patients, often liked AMIE more than the human doctors.

Key facts at a glance

System / claim	AI result or figure	Doctors’ score	Source
MIRA, 311 emergency cases	87.8%	78.1% specialists / 71.1% mixed	Nature study
MIRA, 500+ cases overall	88.9% correct	—	Nature study
AMIE, first-visit plan appropriate	95%	72%	Nature study
ChatGPT instruction-following	up to 89.9%	—	OpenAI (HealthBench)
ChatGPT wrong-statement drop	down 71% in 2 months	—	OpenAI
Doctors who reviewed ChatGPT	260+ from 60 countries	—	OpenAI
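
To make the size of these gaps easier to see, here is a small Python sketch that subtracts the doctors' score from the AI score for the three head-to-head rows in the table. The scores are copied from the table. The names `scores` and `gap_in_points` are made up for this example and do not come from the studies. The result is a gap in percentage points, which is a simple difference and says nothing about how the studies were scored.

```python
# Scores copied from the table above (percent of cases).
# Each entry is (AI score, doctors' score).
scores = {
    "MIRA vs specialists (311 cases)": (87.8, 78.1),
    "MIRA vs mixed doctors (311 cases)": (87.8, 71.1),
    "AMIE vs doctors (first-visit plan)": (95.0, 72.0),
}


def gap_in_points(ai_score, doctor_score):
    """Return the AI lead in percentage points, rounded to one decimal."""
    return round(ai_score - doctor_score, 1)


# Hypothetical helper output: one gap per head-to-head comparison.
gaps = {name: gap_in_points(ai, doc) for name, (ai, doc) in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: AI ahead by {gap} points")
```

A gap of about 10 points on 311 cases is large for a test, but as the next section explains, these were simulated cases and not real patients.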

The catch: the tech may not age well

Here is the most interesting finding. Systems like AMIE add extra software around the AI brain. This extra layer is called scaffolding. Think of it as a support frame that helps the AI act like a careful doctor.

This scaffolding helped a lot with an older AI brain (Gemini 1.5 Flash). But when researchers used a newer brain (Gemini 2.5 Flash), the help “almost vanished.” On drug-knowledge tests, newer brains like o3 and GPT-5 already did well on their own. In simple words: as the basic AI gets smarter, the special medical add-ons may stop being useful. They risk becoming “dead weight.” A clever system built today could be out of date fast.

The researchers were honest about other limits too. MIRA still gave care that “deviated from best practices” in a small number of cases. That number is small, but not zero. And outside experts made one key point. These were all simulations. A simulation is a pretend test, not real life. So the results are still far from the messy, complex world of real, everyday healthcare.

Why it matters (especially for India / founders)

India has far fewer doctors per person than rich countries. Many people also live far from good hospitals. So tools that help sort patients by need, or explain lab reports, could be very useful here. They could help tired, busy doctors — not replace them.

For health-tech founders (people who start health technology companies), the aging warning is the real lesson. If your product is just a thin layer on top of an AI brain, a future brain could wipe out your edge overnight. Lasting value comes from things AI cannot easily copy. These include trusted data, support in local languages, partnerships with doctors, and safe use. Investors (people who put money into companies) are watching this area closely — see related coverage in our roundup on HealthQuad’s healthcare fund.

For patients, the simple rule still holds. AI can help you understand and prepare. But it is not your doctor. Always check anything serious with a trained professional.

FAQ

Can ChatGPT replace my doctor now?

No. The high scores come from controlled tests and OpenAI’s own benchmarks (its own exams for the AI). Real care is messier. Use AI to learn and prepare. Then see a real doctor for diagnosis and treatment.

Did AI really beat doctors in these studies?

On certain tasks, yes. MIRA and AMIE scored higher than doctors on some sets of cases. But experts warn these were simulations (pretend tests), not real patients. So the results may not hold up in the real world.

What does “the tech won’t age well” mean?

Special medical add-ons help today’s AI a lot. But as the main AI gets smarter on its own, those add-ons may stop adding value. They could even become useless extra work.

Takeaway

AI in healthcare is moving fast, and the early scores are truly impressive. But the headlines are ahead of the proof. OpenAI’s numbers come from its own tests. The Nature wins came in simulations. And today’s smartest systems may not stay smart for long. The promise is real. The hype needs a check-up.

Sources: The Decoder, “ChatGPT’s new health upgrade”; The Decoder, “AI systems rival doctors in new Nature studies.”
