NIH
Patient narratives challenge AI’s diagnostic capabilities
August 15, 2024

NIH researchers found that while large language models (LLMs) like ChatGPT can accurately diagnose rare genetic diseases from textbook descriptions, they struggle with patient-written summaries. In a test of 10 LLMs, the best model was only 21% accurate with real patient descriptions, and some models scored as low as 1%. However, accuracy improved when the LLMs were presented with standardized questions about the same conditions. These findings highlight the limitations of LLMs and the need for human oversight in AI healthcare applications.