{"id":217637,"date":"2026-04-16T11:30:23","date_gmt":"2026-04-16T11:30:23","guid":{"rendered":"https:\/\/teknomers.com\/en\/21-popular-ai-chatbots-tested-for-differential-diagnosis-they-perform-worse-than-expected\/"},"modified":"2026-04-16T11:30:24","modified_gmt":"2026-04-16T11:30:24","slug":"21-popular-ai-chatbots-tested-for-differential-diagnosis-they-perform-worse-than-expected","status":"publish","type":"post","link":"https:\/\/teknomers.com\/en\/21-popular-ai-chatbots-tested-for-differential-diagnosis-they-perform-worse-than-expected\/","title":{"rendered":"21 Popular AI Chatbots Tested for Differential Diagnosis: They Perform Worse Than Expected"},"content":{"rendered":"\n<div>\n<p>&#8216;House&#8217; is a series that I love. I don&#8217;t care about the subplots in the slightest, but the process of differential diagnosis, for all its TV dramatization, fascinates me. This ability to rule out diseases that could explain the same symptoms to arrive at the most probable diagnosis seems like witchcraft to me. Well: researchers have put 21 popular AI chatbots through this same differential diagnosis process, and the result is clear.<\/p>\n<p>They miss more often than a fairground shotgun.<\/p>\n<h2>AI Chatbots in Differential Diagnosis<\/h2>\n<p><strong>In short:<\/strong> Mass General Brigham is not just &#8216;anyone&#8217;. It is a non-profit network of American doctors and hospitals, including two of the most prestigious medical teaching institutions in the country. 
From January to December 2025, a group of researchers from the institution <a rel=\"noopener, noreferrer nofollow\" href=\"https:\/\/jamanetwork.com\/journals\/jamanetworkopen\/fullarticle\/2847679?guestAccessKey=ef7d2c87-beca-4c8b-b6a2-85d11bcf8185&amp;utm_source=for_the_media&amp;utm_medium=referral&amp;utm_campaign=ftm_links&amp;utm_content=tfl&amp;utm_term=041326\" target=\"_blank\">conducted<\/a> an evaluation of 21 AI chatbots, including Claude 4.5 Opus, DeepSeek, Gemini 3.0 Pro, GPT-5, and Grok 4, to determine their efficacy in early-stage differential diagnosis.<\/p>\n<h2>Basic Information Challenges<\/h2>\n<p>The information provided to these models was extremely basic, mirroring what human professionals deal with when forming a differential diagnosis. The ultimate goal was to find out whether these advanced language models can reason clinically. Unfortunately, the answer remains a resounding no. While certain models optimized for reasoning scored higher than simpler ones like Gemini 1.5 Flash, the bottom line is that large language models (LLMs) still face significant limitations in this context.<\/p>\n<h3>Evaluation of Efficacy<\/h3>\n<p><strong>The exam:<\/strong> Each model was given 29 clinical cases, amounting to over 16,200 responses in total. The results showed that even the newer, more advanced chatbots <a rel=\"noopener, noreferrer nofollow\" href=\"https:\/\/www.massgeneralbrigham.org\/en\/about\/newsroom\/press-releases\/ai-chatbot-lacks-clinical-reasoning\" target=\"_blank\">failed<\/a> to produce an adequate differential diagnosis in about 80% of cases, especially when only basic patient information was available.<\/p>\n<h2>The Limitations of Initial Diagnosis<\/h2>\n<p>The fundamental data (age, sex, and symptoms), though vague, is what human professionals rely on initially. As tests are conducted and more data accumulates, they refine their findings. 
It\u2019s that crucial first &#8216;discard&#8217; step that often makes the difference.<\/p>\n<h3>Performance with More Data<\/h3>\n<p>As the LLMs were provided with more comprehensive data, their performance greatly improved. When presented with details such as physical examination findings, lab results, and diagnostic images, the models could reach a final diagnosis in over 90% of cases. That contrast exposes the gap in their capabilities when they are tasked with the initial filtering.<\/p>\n<h2>Trust and Safety Concerns in AI<\/h2>\n<p><strong>Don&#8217;t trust<\/strong> ChatGPT. Researchers emphasize that while commercial LLMs can accurately identify a final diagnosis with complete data, they struggle significantly at the outset of open-ended cases. This raises questions regarding their reliability for home care. Although the AI industry continues to promote these tools in healthcare, the study highlights that \u201cdespite ongoing improvements, commercial LLMs are not ready for unsupervised clinical implementation.\u201d<\/p>\n<h3>The Human Element in Healthcare<\/h3>\n<p>Human oversight is essential, as \u201cvery close supervision\u201d is required to safely integrate an LLM into healthcare settings. More and more cases are emerging of people turning to ChatGPT instead of traditional self-diagnosis methods. The persistence of \u201challucinations\u201d in these advanced models raises serious concerns about patient safety and integrity.<\/p>\n<h2>AI as a Tool Rather Than a Replacement<\/h2>\n<p>In any case, medical AI serves as just another tool rather than a complete solution. The models tested benefit from general knowledge but lack specialization. 
AI can assist with tasks like eliminating possibilities or sorting through large volumes of data, but it\u2019s not yet a reliable partner in differential diagnosis.<\/p>\n<h3>El Salvador&#8217;s AI Experiment<\/h3>\n<p>In a bold move, El Salvador\u2019s government, led by President Nayib Bukele, has announced a $500 million initiative to automate healthcare using Gemini. Citizens will access the <a rel=\"noopener, noreferrer nofollow\" href=\"http:\/\/Dr.SV\" target=\"_blank\">Dr.SV<\/a> app, which will act as a family doctor, assigning real doctors for diagnosis and monitoring chronic conditions. Despite aspirations to create an optimal healthcare system, it\u2019s worth noting that over 7,700 employees were laid off from the health sector in 2025.<\/p>\n<p>For the sake of Salvadorans, let\u2019s hope this experiment does not end in dire consequences like earlier technological misadventures.<\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/teknomers.com\/category\/general\/\" rel=\"dofollow\">General News &#8211; 2<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8216;House&#8217; is a series that I love. I don&#8217;t care about the subplots in the slightest, but the process of differential diagnosis, for all its TV dramatization, fascinates me. 
This ability to rule out diseases that could explain the same symptoms to arrive at the most probable diagnosis seems like witchcraft to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":217638,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[36399],"tags":[37133,10989,11152,1593,11645,4679,4441,1565],"class_list":["post-217637","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-chatbots","tag-diagnosis","tag-differential","tag-expected","tag-perform","tag-popular","tag-tested","tag-worse"],"_links":{"self":[{"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/posts\/217637","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/comments?post=217637"}],"version-history":[{"count":1,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/posts\/217637\/revisions"}],"predecessor-version":[{"id":217639,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/posts\/217637\/revisions\/217639"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/media\/217638"}],"wp:attachment":[{"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/media?parent=217637"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/categories?post=217637"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/tags?post=217637"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}