Ann Intern Med
AI ambient dictation falls short of human notes in primary care

Clinical Takeaway: Use ambient AI notes as draft documentation only; careful clinician review and editing remain essential for safe, high-quality care.
Across five standardized primary care scenarios, AI-generated notes scored substantially lower than clinician-written notes, by as much as 23.5 points on a 50-point quality scale, raising concerns about accuracy and clinical usefulness.
In a cross-sectional evaluation published in Annals of Internal Medicine, researchers compared clinical notes produced by 11 ambient AI dictation tools with those written by 18 human clinicians using the same five standardized primary care encounters. Thirty blinded raters assessed note quality with the modified Physician Documentation Quality Instrument (PDQI-9), covering 10 domains such as accuracy, organization, and usefulness.
Human-authored notes outperformed AI-generated notes in all five cases. The largest gap appeared in an acute low back pain scenario recorded with background noise, where clinician notes averaged 43.8 of 50 points vs. 20.3 for AI. Significant differences also favored humans in the chest pain (42.2 vs. 34.8) and heart failure care management (38.4 vs. 32.8) scenarios. Pooled analyses showed AI notes scored lower across all 10 quality domains, with the largest deficits in thoroughness, organization, and usefulness.
Authors emphasize caution: “AI scribes should be regarded as tools for generating draft documentation that requires review and editing, rather than a substitute for clinician-authored notes,” they conclude, underscoring the need for ongoing, vendor-neutral evaluation before widespread clinical reliance.
Source: Reddy A, et al. Rapid Evaluation of Artificial Intelligence Technology Used for Ambient Dictation in Primary Care: Comparing the Quality of Documentation of Artificial Intelligence-Generated and Human-Produced Clinical Notes. Ann Intern Med. Published April 17, 2026.