JAMA Netw Open
Repeat psychiatric interviews often shift diagnoses within days

Clinical takeaway: Standardized diagnostic interviews remain useful but should not be treated as a clear gold standard for adult psychiatric assessment, particularly for conditions defined by subjective experience. Pair them with longitudinal clinical judgment rather than relying on a single sitting.
Structured psychiatric assessments have been positioned as the most rigorous way to classify mental and substance use disorders, designed to cut clinician bias and inconsistent application of DSM and ICD criteria. But evidence on how stable the resulting diagnoses are when the same interview is repeated had not been systematically pooled across instruments and disorders.
Overall reliability was moderate, not the near-perfect agreement implied by the gold-standard label. Pooled agreement was 0.69 on a scale where 1 means perfect agreement, with considerable variation between studies and across disorders.
The interviews worked better for conditions defined by observable behavior or clear timelines than for those resting on subjective experience. Pooled agreement was 0.72 for substance use disorders and 0.65 for mental health disorders.
Within mental health disorders, nonaffective psychosis was the least consistent diagnosis at 0.55 and bipolar disorder the most consistent at 0.74. Anxiety, depressive, and personality disorders clustered in the low to mid 0.60s. Among substance use disorders, opioid use disorder showed the highest agreement at 0.81 and hallucinogen use disorder the lowest at 0.59.
For substance use disorders, diagnoses based on ICD-10 and older DSM revisions were more reliable than those based on DSM-5, possibly because DSM-5 collapsed the prior abuse and dependence categories into a single diagnosis.
The meta-analysis pooled 46 studies and 8,146 adults across 26 countries, spanning 17 different structured interview tools. Test-retest reliability was defined as agreement when the same interview was repeated under similar conditions, typically 7 to 14 days apart. Most samples came from high-income countries.
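To make the idea of test-retest agreement concrete, here is a minimal sketch of how agreement between two sittings might be scored. It assumes a chance-corrected statistic, Cohen's kappa, which is commonly used for categorical diagnoses; this summary does not name the statistic the study pooled, so treat that choice as an assumption. The function name `cohen_kappa` and the ten-person data set are hypothetical and for illustration only.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Chance-corrected agreement between two equally long lists of labels."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first)

    # Share of people who received the same label at both sittings.
    observed = sum(a == b for a, b in zip(first, second)) / n

    # Agreement expected by chance, from how often each label was used.
    first_counts = Counter(first)
    second_counts = Counter(second)
    expected = sum(
        first_counts[label] * second_counts[label] for label in first_counts
    ) / (n * n)

    if expected == 1.0:
        return 1.0  # the same single label was used at both sittings
    return (observed - expected) / (1 - expected)


# Hypothetical data: 1 = diagnosis present, 0 = absent, for ten people
# interviewed twice (assumed values, not taken from the study).
interview_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
interview_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

kappa = cohen_kappa(interview_1, interview_2)
print(f"raw agreement: 0.80, kappa: {kappa:.2f}")
```

In this made-up example, 8 of 10 people receive the same result both times, yet the chance-corrected score is about 0.58. That gap shows why a score in the 0.60s can sit alongside raw agreement that looks much higher.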
"Our findings show that these interviews are not as reliable or consistent as many people believe," said Laura Duncan, PhD, assistant professor in the Department of Psychiatry and Behavioral Neurosciences at McMaster University and senior author of the study. "If we give the same interview to the same person twice, we would like to think the interview would produce the same result, but that's not always the case."
"These differences suggest that structured interviews work better for conditions with clearer behaviors or timelines than for disorders that rely heavily on personal experiences and interpretation," she continued. "But we should reconsider treating them as a 'gold standard' of assessment. Reliable diagnosis likely requires combining standardized tools with knowledge about the course and complexity of disorders that could impact how reliably they can be assessed."
Source: Xie W. JAMA Netw Open. 2026 May 28. Test-Retest Reliability of Standardized Diagnostic Interviews for Common Adult Psychiatric Disorders