ChatGPT’s Reliability Reduced When Provided More Evidence: Study

Story By Arti Ghargi

The study examined two question formats for ChatGPT: questions without evidence and questions biased with supporting or contrary evidence.

A recent study, ‘How different prompts impact health answer correctness’, has found that providing more evidence to ChatGPT actually reduces its reliability.

The study was conducted by scientists from CSIRO, Australia's national science agency, and the University of Queensland (UQ) in December 2023.

It sheds light on the accuracy of large language models (LLMs) such as ChatGPT when it comes to providing health-related information.

The study found that when presented with more clinical evidence, the accuracy of ChatGPT plummeted to as low as 28%.

Dr Bevan Koopman, CSIRO principal research scientist and associate professor at UQ, highlighted the pervasive trend of individuals turning to online tools such as ChatGPT for health information despite the known risks.

"The widespread popularity of using LLMs online for answers on people’s health is why we need continued research to inform the public about risks and to help them optimize the accuracy of their answers," Koopman said.

The Study Findings

The research, presented at the Empirical Methods in Natural Language Processing (EMNLP) conference, delved into a hypothetical scenario where individuals, often non-professional health consumers, sought answers to various health-related questions from ChatGPT.

These questions ranged from whether zinc is effective in treating the common cold to whether drinking vinegar can dissolve a stuck fish bone.

The study examined two question formats: questions without evidence and questions biased with supporting or contrary evidence.
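To make the two formats concrete, the following is a minimal Python sketch of how such prompts might be assembled. The `build_prompt` helper, its template wording and the evidence placeholder are all assumptions for illustration; they are not the prompts the researchers used.

```python
from typing import Optional


def build_prompt(question: str,
                 evidence: Optional[str] = None,
                 allow_unsure: bool = False) -> str:
    """Assemble a health question prompt (hypothetical template).

    With no evidence, this yields the question-only format; with evidence,
    the passage is placed ahead of the question, which may bias the answer.
    """
    options = "Yes, No, or Unsure" if allow_unsure else "Yes or No"
    parts = []
    if evidence is not None:
        parts.append(f"Evidence: {evidence}")
    parts.append(f"Question: {question}")
    parts.append(f"Answer with {options}.")
    return "\n".join(parts)


# Question-only format
question_only = build_prompt("Can zinc help treat the common cold?")

# Evidence-biased format; the evidence text here is a placeholder, not real data
with_evidence = build_prompt(
    "Can zinc help treat the common cold?",
    evidence="<passage supporting or contradicting the claim>",
    allow_unsure=True,
)
```

The only difference between the two conditions in this sketch is the optional evidence passage and the optional "Unsure" choice, which mirrors the comparison the article describes.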

Per the study, while ChatGPT exhibited an 80% accuracy rate when responding to questions without evidence, this accuracy dropped to 63% when evidence was provided.

Even more concerning was the sharp decline in accuracy to 28% when an "unsure" answer was permitted, the study found.
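The scale of the decline is easier to see in percentage points. The short Python sketch below uses only the three figures reported above; the dictionary labels and the `point_drop` helper are illustrative names, not the study's.

```python
# Accuracy figures (in percent) as reported in this article.
reported_accuracy = {
    "question only": 80,
    "evidence provided": 63,
    "'unsure' answer permitted": 28,
}


def point_drop(baseline: int, value: int) -> int:
    """Percentage-point drop from the baseline (hypothetical helper)."""
    return baseline - value


baseline = reported_accuracy["question only"]
for condition, accuracy in reported_accuracy.items():
    drop = point_drop(baseline, accuracy)
    print(f"{condition}: {accuracy}% ({drop} points below baseline)")
```

That is a fall of 17 percentage points once evidence is added, and 52 points in the "unsure" setting.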

Caution on Integrating LLMs in Search Engines

"We're not sure why this happens. But given this occurs whether the evidence given is correct or not, perhaps the evidence adds too much noise, thus lowering accuracy," Koopman speculated.

Prof Guido Zuccon, study co-author and director of AI for the Queensland Digital Health Centre (QDHeC), cautioned about the integration of LLMs into major search engines, highlighting their potential to generate inaccurate health information.

The next steps for the research involve investigating how the public utilizes health information generated by LLMs.

Introduced in 2022, ChatGPT quickly became one of the world’s leading generative AI chatbots. As of August 2023, it had a user base of more than 180.5 million, with 100 million of them active on a weekly basis.

However, when it comes to health queries, several studies have raised doubts about ChatGPT’s ability to provide accurate answers to complex questions.

Previous Studies on ChatGPT’s Medical Information Accuracy

A recent study published in the British Medical Journal found that the large language models behind most popular AI-powered chatbots, including ChatGPT, either lacked sufficient safeguards or applied them inconsistently when it came to preventing the production of healthcare disinformation.

Another study published in McKnight's Senior Living in 2024 found that ChatGPT may be a useful tool for basic healthcare questions, but it struggles with more complex queries.

It warned that healthcare professionals and patients should be cautious about using ChatGPT as an authoritative source for medication-related information.

Similarly, a study conducted by researchers at Long Island University last year found that ChatGPT correctly answered only 10 out of 39 medical-related questions.

The findings indicated that the chatbot can produce incomplete information in some medical situations.

These studies and their findings underscore the need for further research to understand the limitations and risks associated with relying on AI-driven platforms for health information.

While the integration of AI into healthcare in the near future seems inevitable, it is crucial to remember that there is not yet sufficient evidence to support the use of current LLM technology in real health settings.
