
A sweeping meta-analysis of 15 studies found patients consistently rate AI chatbots as warmer and more empathic than real clinicians.
In the largest dataset of 2,164 live patient interactions, and across multiple smaller studies, tools like ChatGPT edged out doctors and nurses by roughly two points on 10-point empathy scales.
Overall, artificial intelligence had a 73% chance of being judged more empathic in head-to-head comparisons.
“In text-only scenarios, AI chatbots are frequently perceived as more empathic than human HCPs,” noted the researchers.
The work, led by teams at the Universities of Nottingham and Leicester, undercuts a confident 2019 UK government assertion that empathy is an essential human skill that AI cannot replicate. In text, at least, it looks like AI can.
Across nine separate studies spanning cancer care, thyroid disease, mental health, autism, and general medical questions, ChatGPT-4 routinely outscored licensed clinicians.
On thyroid surgery questions, the AI’s empathy ratings sat 1.42 standard deviations above those of human surgeons. On mental health queries, they were 0.97 standard deviations higher than ratings for credentialed professionals.
When responding to patient complaints routed through hospital departments, the gap widened dramatically: 2.08 standard deviations in favor of the AI over patient-relations staff.
Crucially, this wasn’t a patients-only effect. In a lupus dataset, physicians themselves rated the AI’s tone as more empathic than their peers’ responses to the same questions.
For multiple sclerosis, trained patient representatives using a validated empathy instrument also favored the AI over neurologists.
Dermatology was the outlier. In two studies focused on skin complaints, dermatologists beat ChatGPT-3.5 and Med-PaLM 2 on empathy. Researchers didn’t pinpoint why this specialty bucked the trend, but it’s a useful reminder that the effect isn’t universal.
Every study in the review assessed text-only interactions. Even when one group generated audio from AI text, raters scored the written transcript, not the voice.
In real clinics, a doctor’s nod, eye contact, or silence can communicate care as powerfully as phrasing does. So, these results tell us about written bedside manner, not the full tapestry of human presence.
Most studies also relied on proxy raters, such as doctors, medical students, or patient representatives, rather than the patients who would eventually act on the advice.
And while many used 1–5 or 1–10 empathy scales, only one employed the validated CARE tool designed specifically for therapeutic empathy. What these studies measured, then, was perceived empathy rather than clinical impact.
The authors are clear on that point. Empathic wording correlates with lower pain and anxiety, better adherence, and higher satisfaction. Yet this review didn’t test whether AI’s “empathy edge” improves outcomes. It simply shows readers prefer its tone.
Part of the AI advantage is structural. Large language models are trained on oceans of human conversation and can default to patient-centered phrasing: validating feelings, summarizing concerns, and offering clear next steps.
They don’t get rushed, burned out, or defensive, and they can apply best practice wording consistently.
Meanwhile, clinicians answering inbox messages are juggling time pressure, medicolegal caution, and the messy context of a person’s chart.
That said, empathic delivery means little if the facts are wrong. Hallucinations, omissions, or outdated guidance can quickly erase any goodwill. The meta-analysis treats empathy ratings and accuracy as separate questions. Both matter.
Rather than replacing clinicians, the authors argue for a collaborative workflow: doctors write the core medical advice while AI polishes the tone, adds clear, validating phrasing, and anticipates common fears. Clinicians then review the final message before it goes to patients.
This could lighten inbox burden, reduce terse replies that sour relationships, and lift patient satisfaction without sacrificing accuracy.
Such approaches are already creeping into daily practice. About 20% of UK GPs report using generative AI for tasks like patient letters, and NHS mental health services have deployed AI companions (Wysa alone reports interactions with 117,000+ patients).
Voice is the next frontier: telephone consults account for 26% of UK GP appointments, and voice-enabled AI promises to capture emotional nuance.
But the review notes a blank spot – no head-to-head voice studies yet. If AI’s empathy advantage survives the leap from text to speech, the implications for triage lines and virtual follow-ups could be huge.
Researchers pulled studies from seven databases (searched through November 2024); the underlying data ranged from real patient emails and Reddit posts to clinic chat transcripts and in-person reception interactions.
Fourteen of the studies tested ChatGPT variants (3.5 or 4); other models assessed included Claude, Gemini Pro, Le Chat, ERNIE Bot, and Med-PaLM 2.
Risk of bias was moderate in nine studies and serious in six, with common issues like curated question sets and supervised AI outputs. Still, the pattern – AI rated as warmer, clearer, more validating – showed up across institutions, specialties, and rater types.
Online, words can do a lot. In that narrow but growing slice of medicine – portal messages, email follow-ups, FAQ explainers – AI already writes the way many patients wish all clinicians did: slower, gentler, more explicit about emotions and next steps.
The opportunity isn’t to choose AI instead of clinicians, but to let it coach our written bedside manner while clinicians safeguard judgment, nuance, and truth.
The challenge now is to bring that warmth into care without losing what only humans can do, while testing whether kinder words, from any source, lead to healthier lives.
The study is published in the British Medical Bulletin.
