ChatGPT can answer patients’ queries about radiation therapy for cancer at least as competently as online resources provided by professional societies, research suggests.
The AI system, which uses complex algorithms to generate realistic conversations, provided replies that were on par with, or superior to, the expert answers in terms of factual correctness, completeness, and conciseness.
The findings, published in the journal JAMA Network Open, show the potential of natural language processing to answer patient questions and lighten physician workloads, potentially helping to prevent burnout.
However, the chatbot answers were written at a considerably higher reading level than those from human experts, suggesting some fine-tuning may be necessary to maximize patient accessibility and understanding.
Senior researcher Amulya Yalamanchili, a Northwestern University radiation oncology resident, said the hope was that patients could educate themselves with ChatGPT before and after they saw a physician.
“The goal of this project is to empower patients,” she explained.
“This is a really technical field that can be hard to understand. All this information can be overwhelming to patients. If they have cancer in a sensitive area, they may not feel comfortable asking what their life will look like long term.”
To determine the value of the large language model (LLM), the research team first retrieved common questions from RadiologyInfo.org, a website sponsored by the Radiological Society of North America and the American College of Radiology, as well as Cancer.gov from the National Cancer Institute at the National Institutes of Health.
A database was compiled that included 29 general radiation oncology questions from Cancer.gov, along with 45 treatment modality–specific questions and 41 cancer subsite–specific questions from RadiologyInfo.org.
The answers to these 115 questions were then compared with those generated by ChatGPT version 3.5.
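The article does not describe the exact prompting setup, but the sketch below shows how a single patient question might be posed to ChatGPT 3.5 through OpenAI's Python client; the model name, question wording, and settings are illustrative assumptions rather than the study's actual protocol.

```python
# A minimal sketch of sending one patient question to ChatGPT 3.5 via
# OpenAI's Python client. The question and settings are assumptions for
# illustration, not the study's protocol.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

question = "What special preparation is needed for external beam therapy?"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": question}],
)

print(response.choices[0].message.content)
```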
ChatGPT performed at least as well as the expert website answers in 108 responses (94%) for relative correctness, 89 responses (77%) for completeness, and 105 responses (91%) for conciseness.
Chatbot answers had a high degree of similarity to expert answers, with a mean quantitative similarity score of 0.75.
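The article does not say how that similarity score was computed; as a rough illustration only, the snippet below scores a hypothetical pair of answers on a 0–1 scale using TF-IDF cosine similarity, one common text-similarity measure.

```python
# One common way to put a 0-1 similarity score on a pair of answers:
# cosine similarity of TF-IDF vectors. This is an illustration only; the
# study's own scoring method is not described in this article.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

expert_answer = "You may feel pressure or discomfort when the headframe is placed."
chatbot_answer = "The procedure is non-invasive, so you should not feel any pain."

vectors = TfidfVectorizer().fit_transform([expert_answer, chatbot_answer])
score = cosine_similarity(vectors[0], vectors[1])[0][0]

print(f"Similarity score: {score:.2f}")  # closer to 1.0 means more similar wording
```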
Just two AI answers were identified as being possibly harmful. Potential harm was ranked moderate for one response regarding what the patient would feel during and after stereotactic radiosurgery (SRS) and stereotactic body radiotherapy.
The LLM stated that the patient would not feel any pain because the treatment is non-invasive, an answer deemed potentially harmful because it did not mention the invasive headframe placement that SRS can require. The expert answer, by contrast, noted the possible pain associated with placement of the headframe.
The chatbot also generated a response rated as slightly harmful when asked whether any special preparation was needed before an external beam therapy procedure, because it did not note the need for tattoos at simulation.
The mean reading level of the chatbot answers was equivalent to college level, versus tenth grade for the expert answers.
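The readability formula the researchers used is not named here; the short sketch below applies the Flesch-Kincaid grade level from the textstat package to made-up sample sentences, simply to show how such a comparison can be made.

```python
# A quick way to compare reading levels is a readability formula such as
# Flesch-Kincaid grade level, shown here with the textstat package. The
# sample sentences are invented; the article does not name the metric used.
import textstat

expert_text = "You may need small skin marks, called tattoos, before treatment starts."
chatbot_text = (
    "Preparatory procedures may encompass simulation imaging and the "
    "placement of fiducial skin markings to facilitate reproducible positioning."
)

print("Expert grade level:", textstat.flesch_kincaid_grade(expert_text))
print("Chatbot grade level:", textstat.flesch_kincaid_grade(chatbot_text))
```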
Concerns about accuracy and potential harm to patients have limited the use of LLM chatbots in the clinic, the authors note.
“ChatGPT’s creator, OpenAI, has acknowledged that the application may provide ‘plausible-sounding but incorrect or nonsensical answers.’”
“However, this study found high qualitative ratings of factual correctness as well as conciseness and completeness for the LLM answers to common patient questions.”