{"id":208604,"date":"2026-03-09T10:43:25","date_gmt":"2026-03-09T10:43:25","guid":{"rendered":"https:\/\/teknomers.com\/en\/openai-aims-to-create-the-perfect-gp-with-chatgpt-but-its-incorrect-half-the-time\/"},"modified":"2026-03-09T10:43:26","modified_gmt":"2026-03-09T10:43:26","slug":"openai-aims-to-create-the-perfect-gp-with-chatgpt-but-its-incorrect-half-the-time","status":"publish","type":"post","link":"https:\/\/teknomers.com\/en\/openai-aims-to-create-the-perfect-gp-with-chatgpt-but-its-incorrect-half-the-time\/","title":{"rendered":"OpenAI Aims to Create the Perfect GP with ChatGPT, But It\u2019s Incorrect Half the Time"},"content":{"rendered":"\n<div>\n<h2>ChatGPT Health Mode: A Troubling Start<\/h2>\n<p>OpenAI kicked off the year with the launch of its ChatGPT Health mode. While this feature is not yet available in Spain, it has been rolled out in the US. Early studies assessing its effectiveness, however, have not delivered promising results for OpenAI.<\/p>\n<h3>Significant Underperformance in Critical Scenarios<\/h3>\n<p><strong>It&#8217;s not that big of a deal.<\/strong> A recent study published in the journal <a rel=\"noopener, noreferrer nofollow\" href=\"https:\/\/www.nature.com\/articles\/s41591-026-04297-7\" target=\"_blank\">Nature Medicine<\/a> and reported by <a rel=\"noopener, noreferrer nofollow\" href=\"https:\/\/www.nbcnews.com\/health\/health-news\/chatgpt-health-under-triaged-half-medical-emergencies-rcna261409\" target=\"_blank\">NBC News<\/a> has highlighted a concerning failure rate. ChatGPT Health misclassified the urgency of 51.6% of the emergency medical cases it evaluated. Researchers presented thousands of clinical scenarios to the AI, which tended to undervalue critical situations\u2014suggesting patients wait 24-48 hours for conditions that required immediate intervention, like diabetic ketoacidosis or respiratory failure. 
It did, however, correctly identify serious conditions such as stroke and severe allergic reactions.<\/p>\n<h3>The Inconsistent Decision-Making of ChatGPT Health<\/h3>\n<p><strong>It doesn&#8217;t make sense.<\/strong> Not only did ChatGPT underestimate urgent cases, but it also overestimated many mild symptoms, incorrectly advising patients to seek urgent medical attention for situations like a persistent sore throat. Dr. Ashwin Ramaswamy, who led the study, told NBC that \u201cit doesn&#8217;t make sense that recommendations were made in some areas and not in others.\u201d<\/p>\n<h3>Poor Handling of Suicidal Ideation<\/h3>\n<p><strong>Suicidal ideation.<\/strong> The study&#8217;s findings extend to alarming scenarios involving suicidal thoughts. For example, one query described a patient who wanted to &#8220;take a lot of pills.&#8221; Initially, a suicide prevention hotline message appeared, but when lab results were included in the query, ChatGPT failed to recognize the suicidal ideation and omitted the safety message. According to Ramaswamy, \u201cA crisis protection barrier that depends on whether lab results are mentioned is not in place, and is arguably more dangerous than having no barrier at all.\u201d<\/p>\n<h2>Why the Accuracy of ChatGPT Health Matters<\/h2>\n<p><strong>Why it matters.<\/strong> ChatGPT has effectively become the first point of contact for many people seeking medical advice. The convenience of checking symptoms on a mobile phone is gradually overshadowing traditional consultation methods. 
If the tool people use to decide whether something is an emergency gets roughly half of serious cases wrong, that signals a profound problem.<\/p>\n<h3>The Dangerous Implications<\/h3>\n<p>In comments to the <a rel=\"noopener, noreferrer nofollow\" href=\"https:\/\/www.theguardian.com\/technology\/2026\/feb\/26\/chatgpt-health-fails-recognise-medical-emergencies\" target=\"_blank\">Guardian<\/a>, researcher Alex Ruani noted that these findings are \u201cincredibly dangerous.\u201d Ruani emphasized the potential for creating a \u201cfalse sense of security\u201d: if someone receives a wait-and-see recommendation during a critical incident, it could prove fatal.<\/p>\n<h2>OpenAI&#8217;s Response to the Criticism<\/h2>\n<p><strong>OpenAI responds.<\/strong> A spokesperson defended ChatGPT Health, asserting that the study does not reflect its typical use. The spokesperson argued that the AI is not intended to diagnose but to clarify follow-up questions and assist in providing context. OpenAI has consistently noted that the tool doesn\u2019t replace professional medical advice, yet how users interact with it remains outside the company\u2019s control.<\/p>\n<h3>The Risks of AI: Flattery and Hallucinations<\/h3>\n<p><strong>Flattery and hallucinations.<\/strong> Chatbots like ChatGPT tend to agree with users, a tendency known as the \u201cflattery problem.\u201d Additionally, \u201challucinations\u201d occur when large language models give a confident answer instead of admitting uncertainty. Research indicates that people often feel more secure using AI, even when it provides incorrect information. 
Combining reassurance, inaccuracies, and health-related queries creates a perilous scenario.<\/p>\n<p>Image | OpenAI<\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/teknomers.com\/category\/general\/\" rel=\"dofollow\">General News &#8211; 2<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>ChatGPT Health Mode: A Troubling Start OpenAI kicked off the year with the launch of its ChatGPT Health mode. While this feature is not yet available in Spain, it has been rolled out in the US. Early studies assessing its effectiveness, however, have not delivered promising results for OpenAI. Significant Underperformance in Critical Scenarios It&#8217;s [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":208605,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[36399],"tags":[23530,12522,1861,3916,18785,7277,269],"class_list":["post-208604","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-aims","tag-chatgpt","tag-create","tag-incorrect","tag-openai","tag-perfect","tag-time"],"_links":{"self":[{"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/posts\/208604","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/comments?post=208604"}],"version-history":[{"count":1,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/posts\/208604\/revisions"}],"predecessor-version":[{"id":208606,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/posts\/208604\/revisions\/2
08606"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/media\/208605"}],"wp:attachment":[{"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/media?parent=208604"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/categories?post=208604"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/tags?post=208604"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}