
AI beats doctors in clinical reasoning but is more error-prone

In a recent study, experts at the Beth Israel Deaconess Medical Center (BIDMC) demonstrated that artificial intelligence (AI), via the ChatGPT-4 program, can outperform attending physicians in clinical reasoning – a core aspect of medical practice.

The study pits the advanced capabilities of a large language model (LLM) against the trained expertise of medical professionals, revealing the potential and limitations of AI in healthcare.

AI’s clinical reasoning against humans

Dr. Adam Rodman, a seasoned internal medicine physician at BIDMC, led the team that set out to compare the clinical reasoning abilities of AI with those of human physicians.

The researchers used a validated assessment tool known as the revised-IDEA (r-IDEA) score, designed to measure the clinical reasoning skills of physicians.

The study engaged 21 attending physicians and 18 residents from two academic medical centers. Each participant was tasked with tackling one of 20 clinical cases, which unfolded over four stages of diagnostic reasoning.

In parallel, the AI (ChatGPT-4) underwent the same tests, demonstrating its prowess in medical data processing and clinical reasoning.

Diagnostic reasoning

Dr. Stephanie Cabral, who is currently in her third year of internal medicine residency at BIDMC and also serves as the study’s lead author, offered an in-depth explanation of the diagnostic reasoning process.

Dr. Cabral detailed the first stage of the diagnostic journey: triage data collection, in which vital information is gathered directly from the patient. In the next stage, the review of systems, additional details from the patient are compiled.

The process then continues with a thorough physical examination to observe clinical signs, and culminates in diagnostic testing and imaging, which are essential for confirming the initial hypotheses.

This approach to diagnosis is meticulously structured, highlighting the process’s intricate complexity. It showcases the significant depth of clinical reasoning that is indispensable in the field of medicine.

Clinical reasoning scores

The results were telling. The AI earned the highest clinical reasoning scores, with a median r-IDEA score of 10 out of 10. In comparison, attending physicians scored a median of 9, and residents a median of 8.

However, the study also highlighted a crucial caveat: while the AI demonstrated exceptional reasoning skills, it was significantly more prone to errors. Instances of answers that were “just plain wrong” occurred more frequently in the AI’s responses than in those of its human counterparts.

Study significance

This finding serves as a reminder of the dual nature of AI in clinical reasoning and overall medicine. On one hand, it holds the promise of augmenting human expertise, potentially acting as a valuable checkpoint to ensure no detail is missed in the diagnostic process.

On the other, its susceptibility to errors underscores the irreplaceable value of human judgment and the nuanced understanding of medicine that professionals bring to patient care.

The study’s authors, including notable contributors from Massachusetts General Hospital and Brigham and Women’s Hospital, emphasize the importance of further research to integrate AI seamlessly into clinical practice.

They envision a future where AI not only enhances the efficiency of healthcare delivery but also enriches the patient-physician interaction by allowing more focus on direct communication and care.

A vision for improved healthcare

Dr. Rodman expressed optimism about the role of AI in healthcare. “Early studies suggested AI could make diagnoses with all the information provided. What our study shows is that AI demonstrates real reasoning – maybe even better reasoning through multiple steps of the process,” said Dr. Rodman.

“We have a unique chance to improve the quality and experience of healthcare for patients.”

Harvard Catalyst | The Harvard Clinical and Translational Science Center, along with Harvard University and its affiliated academic healthcare centers, provided financial support for this work.

AI-assisted healthcare

In conclusion, the study marks a significant step forward in understanding AI’s potential role in healthcare. By demonstrating that AI can outperform humans in clinical reasoning while remaining more error-prone, the research captures both the technology’s promise and its limitations.

Together, these insights lay the groundwork for a future in which AI and human expertise work in tandem to deliver superior healthcare outcomes.

The study is published in the journal JAMA Internal Medicine.
