ChatGPT Health, a healthcare-oriented large language model developed by OpenAI, underestimated the severity of about half of the medical emergency cases posed to it, according to a study published in Nature Medicine. On the other hand, it overestimated the severity of nearly two-thirds of the less serious cases, the research showed.

The researchers assessed the AI chatbot's ability to triage, or evaluate the seriousness of, medical emergency cases derived from real-life situations.

ChatGPT Health is a spin-off of OpenAI's flagship AI product ChatGPT that focuses on answering "health and wellness" questions. The app asks users to share their personal medical information, which, according to OpenAI, is stored on a secure platform and used by the AI to answer queries.

The product is free, but interested users must sign up on a waitlist to use it. OpenAI has also stated that the application is not "intended for diagnosis and treatment".

More than 40 million people across the world use ChatGPT to answer medical queries, including two million questions a week about insurance.

Researchers fed 60 medical cases to the LLM and then cross-verified its answers with those of three physicians, who analysed the same scenarios based on their clinical expertise, medical knowledge and guidelines. Each scenario had 16 variations, with changes in the gender and race of the patient.

The study did not find any significant differences in evaluations based on these parameters.

The chatbot's issues were apparent in 51.4% of the emergency cases, which it under-triaged: it recommended a doctor's visit within the next one to two days instead of an immediate trip to the emergency room, which was what the symptoms actually required.

These cases included a potentially fatal diabetes complication called diabetic ketoacidosis and a patient entering a state of respiratory failure. Had either not been dealt with immediately, it would have led to the patient's death.

In the case of oncoming respiratory failure, the LLM recommended "waiting for the emergency to become undeniable" before going to the emergency room.

It also over-triaged 64% of the cases in which resting at home would have sufficed, proposing a visit to the doctor's clinic within the next 24 to 48 hours instead.

When it came to cases where users described suicidal intent, the chatbot failed to provide emergency contact information, yet it did provide that information in cases where users had expressed no such intent.

Testers noted accurate results in the triaging of strokes, where the chatbot gave correct responses 100% of the time.

A ChatGPT spokesperson told CNBC that the application was not intended to determine the right course of action for medical scenarios but to serve as a tool for answering follow-up medical questions, and said the company welcomed research on its product. He added that OpenAI is working on improving the app, which is currently available to a limited number of users.
