Researchers say you may want to think twice about using powerful artificial intelligence (AI) programs such as ChatGPT to self-diagnose health problems.
A team led by Waterloo Engineering found in a simulated study that ChatGPT-4o, the well-known large language model (LLM) created by OpenAI, answered open-ended diagnostic questions incorrectly nearly two-thirds of the time.
“People should be very cautious,” said Troy Zada, a doctoral student in management science and engineering. “LLMs continue to improve, but right now there is still a high risk of misinformation.”
The study used almost 100 questions from a multiple-choice medical licensing examination. The questions were reworded as open-ended prompts, similar to the symptoms and concerns real users might describe to ChatGPT.
Medical students who assessed the responses found that just 37 per cent of them were correct. About two-thirds of the answers, whether factually right or wrong, were also deemed unclear by expert and non-expert assessors.
“It’s very important for people to be aware of the potential for LLMs to misinform,” said Zada, who was supervised on this paper by Dr. Sirisha Rambhatla, an assistant professor of management science and engineering.
Go to AI’s medical diagnostic skills still need a checkup for the full story.