epocrates logo

Journal Article Synopsis

JAMA Netw Open

LLMs struggle with clinical reasoning, despite strong final answers

April 14, 2026


Clinical takeaway: Large language models are not reliable for unsupervised clinical decision-making. Weak performance in differential diagnosis and uncertainty handling limits safe use beyond narrow, supervised tasks.

AI tools are increasingly marketed for clinical use, often highlighting high accuracy. This study tested whether those claims hold up across the full clinical workflow.

In an evaluation of 21 large language models (LLMs) using standardized clinical cases, performance varied by task. Models performed relatively well on final diagnosis and management but consistently struggled with differential diagnosis, where failure rates exceeded 80% across all models.

This gap is clinically important. Generating a differential diagnosis requires managing uncertainty and iteratively refining possibilities, which are core elements of clinical reasoning. Instead, models tended to collapse prematurely to a single answer, bypassing the diagnostic process clinicians rely on.

Overall accuracy appeared high (roughly 80%–90%) but masked these weaknesses. A more comprehensive scoring method showed wider variation and exposed gaps in reasoning that standard benchmarks miss. The analysis covered January through December 2025.

Newer “reasoning-optimized” models performed better than earlier versions but did not resolve these core limitations. Improvements were incremental, not transformative.

The findings highlight a mismatch between how models are evaluated and how clinicians think. High performance on isolated tasks does not translate to reliable decision-making across a patient encounter.

“Off-the-shelf LLMs have not yet achieved the intelligence required for safe deployment and remain limited in demonstrating advanced clinical reasoning,” the authors conclude.

Source: Rao AS. Large language model performance and clinical reasoning tasks. JAMA Netw Open. 2026 Apr 13.

© 2026 epocrates, Inc.