
Chatbots powered by artificial intelligence (AI) may be unreliable for providing patient drug information, with a study showing that they supply a considerable number of incorrect or potentially harmful answers.

The findings led the researchers to suggest that patients and healthcare professionals should exercise caution when using these computer programs, which are designed to mimic human conversation.

Researchers found that the AI-powered chatbot studied, which was embedded in a search engine, provided answers of generally high, but still insufficient, quality.

The answers also had low readability, according to the study in BMJ Quality & Safety, and on average required a degree-level education to be understood.

“In this cross-sectional study, we observed that search engines with an AI-powered chatbot produced overall complete and accurate answers to patient questions,” the researchers led by Wahram Andrikyan, a PhD student at Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, acknowledged.

“However, chatbot answers were largely difficult to read and answers repeatedly lacked information or showed inaccuracies possibly threatening patient and medication safety.”

In February 2023, the landscape of search engines underwent a significant shift with the introduction of AI-powered chatbots, the researchers noted.

They added that Microsoft’s Bing chatbot, an AI-powered copilot for the web, and Google’s Gemini (formerly known as Bard) both promise enhanced search results, comprehensive answers, and a novel chat experience.

These chatbots are based on large language models, which are neural networks with an underlying architecture that allows them to be trained on extensive datasets from the internet.

This enables them to converse on any topic, including healthcare-related queries. However, their use also carries risks, including the generation of disinformation and nonsensical or harmful content.

Noting that these models have so far been evaluated for drug information primarily from the perspective of healthcare professionals, the researchers conducted their study from the viewpoint of patients.

As Google’s chatbot regularly refuses to answer medical questions, the study was conducted using Microsoft’s Bing AI copilot for the web.

The chatbot was asked 10 questions for each of 50 drugs, covering topics ranging from what the drug was used for, how it worked, and instructions for use to common side effects and contraindications.

The 500 resulting answers had an average Flesch Reading Ease Score of just over 37, indicating that a degree-level education would be required to understand them. Even the most readable answers still required a high-school education.
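For context, the Flesch Reading Ease Score is calculated purely from average sentence length and average syllables per word, with scores between 30 and 50 conventionally interpreted as “difficult,” roughly college level. The minimal sketch below applies the standard formula to made-up counts (none of the figures come from the study) to show how an answer lands near the reported score of 37.

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Standard Flesch Reading Ease formula:
    206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word)."""
    return (206.835
            - 1.015 * (total_words / total_sentences)
            - 84.6 * (total_syllables / total_words))

# Hypothetical counts for a chatbot answer (illustrative only, not study data):
score = flesch_reading_ease(total_words=200, total_sentences=10, total_syllables=352)
print(round(score, 1))  # 37.6 -- falls in the 30-50 "difficult" (college-level) band
```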

The average completeness of chatbot answers was 77% when referenced against drugs.com, a peer-reviewed website with the latest drug information for healthcare professionals and patients.

Twenty chatbot answers deemed to be of low accuracy or completeness, or to pose a potential safety risk, were then assessed by seven medication safety experts.

Just over half of these, 54%, aligned with scientific consensus, while 39% contradicted it and 6% had no established scientific consensus.

Among this subset of 20 chatbot answers, 42% were judged likely to lead to moderate or mild harm, and 22% to death or severe harm, if patients followed the advice.

“Despite their potential, it is still crucial for patients to consult their healthcare professionals, as chatbots may not always generate error-free information,” the researchers concluded.

“Caution is advised in recommending AI-powered search engines until citation engines with higher accuracy rates are available.”
