In real-world test, an AI model did better than doctors at diagnosing patients

Posted by cuolong

3 Comments

  1. This is relevant because one of the hardest parts of the medical profession is differential diagnosis in highly complex cases. If AI can significantly improve patient care, it could be key to both reducing costs and raising the standard of care across the world.

    In the head-to-head comparison, the AI demonstrated superior diagnostic accuracy across every phase of patient care. During the initial interview stage, o1 correctly identified conditions in 67.1% of cases, roughly two out of three patients, while the two human specialists trailed behind at 55.3% and 50%.

    As more clinical data became available, the performance gap widened. When integrated with physician evaluation data, the model’s accuracy climbed to 72.4%. By the critical final stage, determining the necessity of hospitalization or ICU admission, the AI reached an 81.6% accuracy rate, consistently outpacing its human counterparts in high-stakes decision-making.

    >Researchers based at Harvard Medical School and Beth Israel Deaconess Medical Center found that an AI reasoning model, developed by OpenAI, excelled at diagnosing patients and making decisions about managing their care. It matched and often outperformed doctors and the earlier AI model, GPT-4.

    Also of note, the researchers tested o1-preview, an OpenAI reasoning model that was already nearly a year old at this point. I fully expect a medically specialized LLM to emerge, similar to what Opus is for coding, that will be truly transformative.

    Let’s just say The Pitt season 3 might just be 12 hours of Dr. Robby sitting at a computer reading 20 pages of AI-generated diagnoses.

  2. I’d bet AI is better on average, but more likely to make a huge error. I wouldn’t mind my doctor asking AI and then using his own human experience to check it.

  3. I bet I won’t need to book months in advance to go talk to an AI for ten minutes, either.
