Nat Med
ChatGPT under-triages high‑risk emergencies in structured evaluation
March 5, 2026

In a stress test of ChatGPT Health, investigators evaluated 960 triage responses spanning 21 clinical domains. Accuracy followed an inverted U‑shape: error rates were highest at the extremes of acuity, in non‑urgent (35%) and emergency scenarios (48%). Among gold‑standard emergencies, the system under‑triaged 52% of cases—including diabetic ketoacidosis and evolving respiratory failure—while correctly identifying classic presentations such as stroke and anaphylaxis. When vignettes minimized symptoms, anchoring bias markedly shifted recommendations toward less urgent care (odds ratio, 11.7). Crisis‑intervention messaging also triggered inconsistently across suicidal‑ideation cases. No significant effects were observed for race, gender, or access barriers. The authors call for prospective validation before broad consumer use.
Clinical takeaway: AI triage tools may miss high‑risk emergencies and show bias‑susceptible variability; clinicians should caution patients against relying on them for urgent decision‑making.
Source:
Ramaswamy A, et al. ChatGPT Health performance in a structured test of triage recommendations. Nat Med. Published online February 23, 2026. https://pubmed.ncbi.nlm.nih.gov/41731097/