Research led by New York University (NYU) Langone Health shows that generative artificial intelligence (AI) could help improve communication between patients and their physicians, as long as accuracy is maintained.
The study, published in JAMA Network Open, showed that a large language model (LLM) using generative AI could help translate patient discharge summaries, to which people in the U.S. have immediate access by law, into a more user-friendly and understandable format.
“Increased patient access to their clinical notes through widespread availability of electronic patient portals has the potential to improve patient involvement in their own care, as well as confidence in their care from their care partners,” write lead author Jonah Zaretsky, a physician and researcher at NYU Langone Health, and colleagues.
“However, clinical notes are typically filled with technical language and abbreviations that make notes difficult to read and understand for patients and their care partners. This issue can create unnecessary anxiety or potentially delay care recommendations or follow-up for patients and their families.”
In this study, Zaretsky and colleagues tested whether an LLM could effectively translate standard patient discharge information into a more readable and useful format. In total, 50 patient discharge summaries were included in the research.
Readability of the LLM output was assessed using the Flesch-Kincaid Grade Level test and understandability using the Patient Education Materials Assessment Tool (PEMAT). Accuracy of the AI-generated summaries was assessed by two physicians using a six-point scale.
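For context, the Flesch-Kincaid Grade Level converts average sentence length and syllables per word into an approximate U.S. school grade, so lower scores indicate easier reading. Below is a minimal Python sketch of the standard formula; the example counts are illustrative only and are not taken from the study.

```python
def flesch_kincaid_grade(total_words: int,
                         total_sentences: int,
                         total_syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula.

    Combines average sentence length (words per sentence) and average
    word length (syllables per word) into a U.S. school-grade estimate.
    """
    return (0.39 * (total_words / total_sentences)
            + 11.8 * (total_syllables / total_words)
            - 15.59)

# Illustrative example (made-up counts): 100 words in 8 sentences,
# 130 syllables total -> roughly a 5th-grade reading level.
print(round(flesch_kincaid_grade(100, 8, 130), 1))  # ~4.6
```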
The Flesch-Kincaid Grade Level was lower, indicating better readability, for the AI-generated summaries than for the original discharge summaries, at an average of 6.2 vs 11. PEMAT understandability scores were also much higher for the AI-generated summaries than for the originals, at 81% vs 13%.
In terms of physician-rated accuracy, 54 of the 100 reviews gave a score of six, indicating 100% accuracy. However, 18 reviews raised safety concerns due to omitted data and some inaccurate statements generated by the AI. The physicians also classed 44% of the AI-generated statements as incomplete.
“We think a major source of inaccuracy due to omission and incompleteness comes from prompt engineering that optimized for readability and understandability. For example, limiting the number of words in a sentence or a document is considered more understandable,” write the authors.
“This makes it difficult to provide a detailed, comprehensive description of a complex patient condition. Future iterations will have to explore the trade-off between readability and understandability on one hand and completeness on the other.”
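To illustrate the trade-off the authors describe, a readability-focused prompt might cap sentence and document length, which in turn pressures the model to drop clinical detail. The sketch below is a hypothetical prompt, not the study's actual one; all of the limits shown are assumptions.

```python
# Hypothetical prompt sketch (not the study's actual prompt): constraints
# that favor readability, such as sentence and word caps, can force the
# model to omit detail from a complex patient record.
PROMPT_TEMPLATE = """Rewrite the discharge summary below for a patient.
- Use plain language at roughly a 6th-grade reading level.
- Keep each sentence under 15 words.
- Keep the whole summary under 200 words.

Discharge summary:
{discharge_summary}
"""

def build_prompt(discharge_summary: str) -> str:
    """Fill the readability-focused template with one patient's summary."""
    return PROMPT_TEMPLATE.format(discharge_summary=discharge_summary)
```

Tightening limits like these tends to improve readability and understandability scores while increasing the risk of exactly the omissions the physician reviewers flagged.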
While this study suggests that generative AI is not yet ready for widespread rollout as a tool to improve patient-physician communication, it shows clear potential for such use if the accuracy issues can be resolved.
In an accompanying commentary in the same journal, Charumathi Raghu Subramanian, a physician and researcher at the University of California, and colleagues write: “LLMs may not be ready for widespread unsupervised use to generate patient facing discharge summaries, given real safety risks and formidable, if solvable, technological and workflow barriers, but perhaps in the near future, with better safety profiles, more automated inputs and outputs, and strict clinician oversight, they may become important tools in enhancing health care communication.”