Digital brain hovering above a series of computer chips, illustrating artificial intelligence (AI)-powered tools to help patients.
Credit: BlackJack3D/Getty Images

Research led by the National Institutes of Health (NIH) shows how artificial intelligence (AI) large language models can extract useful research information from patient imaging reports without compromising privacy.

“Recently released general-purpose large language models (LLMs), such as ChatGPT and GPT-4, have garnered substantial attention due to their remarkable ability to perform language generation and language understanding tasks, including question answering and conversion of free text to structured formats,” wrote Ronald Summers, senior investigator in the Radiology and Imaging Sciences Department at the NIH, and colleagues in the journal Radiology.

“Despite their promise, these models cannot be used with radiology reports in a clinical setting without first de-identifying the patient data. Unfortunately, removing all patient health information and personal identifiable information from radiology reports is a nontrivial task.”

In this study, Summers and team tested whether an alternative LLM called Vicuna-13B could overcome the privacy issue associated with similar LLMs and extract useful data from radiology reports. The big advantage of Vicuna over LLMs like ChatGPT is that it can be run locally, which allows patient privacy to be maintained.

Using chest radiography reports from the Medical Information Mart for Intensive Care (MIMIC) database (n=3,269) and the NIH (n=25,596), the researchers tested how well Vicuna-13B labeled the reports for the presence or absence of 13 imaging findings.

The best-performing prompt was a multistep one that guided the model from label to label, having it answer three questions about whether each finding was present, absent, or uncertain/not applicable.
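To make the idea of a multistep prompt concrete, here is a minimal Python sketch of how such a labeling loop might look. It is not the study's code: the finding names, the question wording, and the `ask_model` callable (a stand-in for whatever function sends a prompt to a locally hosted model and returns its reply) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical finding names; the article does not list the 13 features.
FINDINGS: List[str] = ["pneumonia", "pleural effusion", "cardiomegaly"]

# Hypothetical wording for the three per-finding questions.
QUESTIONS: List[Tuple[str, str]] = [
    ("present", "Does the report state that {finding} is present? Answer yes or no."),
    ("absent", "Does the report state that {finding} is absent? Answer yes or no."),
    ("uncertain", "Is {finding} uncertain or not mentioned? Answer yes or no."),
]


def label_report(report: str, ask_model: Callable[[str], str]) -> Dict[str, str]:
    """Walk through the findings one by one and assign each a label.

    `ask_model` is an assumed helper that takes a prompt string and
    returns the model's text reply.
    """
    labels: Dict[str, str] = {}
    for finding in FINDINGS:
        label = "uncertain"  # fallback when no question gets a "yes"
        for name, template in QUESTIONS:
            question = template.format(finding=finding)
            prompt = f"Report:\n{report}\n\n{question}"
            answer = ask_model(prompt).strip().lower()
            if answer.startswith("yes"):
                label = name
                break  # move on to the next finding
        labels[finding] = label
    return labels
```

Because the model is reached only through `ask_model`, the same loop would work with any model served on local hardware, which is the property the researchers highlight for Vicuna.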

The team found that Vicuna-13B performed comparably to existing non-LLM labeling programs. “Our study demonstrated that the LLM’s performance was comparable to the current reference standard,” said Summers in a press statement. “With the right prompt and the right task, we were able to achieve agreement with currently used labeling tools.”
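The article does not say which statistic was used to measure agreement. As one common illustration, the Python sketch below computes Cohen's kappa, which corrects raw agreement between two labelers for the agreement expected by chance. The function name and the example labels are made up for illustration and are not data from the study.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two labelers rating the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # labelers' marginal frequencies, summed over categories.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )

    if expected == 1.0:
        # Both labelers used a single identical category throughout;
        # kappa is undefined, so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels for four reports from an LLM and an existing tool.
llm_labels = ["present", "absent", "absent", "present"]
tool_labels = ["present", "absent", "present", "present"]
kappa = cohens_kappa(llm_labels, tool_labels)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance.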

If privacy issues can be addressed and accuracy adequately validated, LLMs have great potential to enhance radiology and medicine more generally, according to Summers.

“LLMs have turned the whole paradigm of natural language processing on its head. They have the potential to do things that we’ve had difficulty doing with traditional pre-large language models,” he said.

“My lab has been focusing on extracting features from diagnostic images. With tools like Vicuna, we can extract features from the text and combine them with features from images for input into sophisticated AI models that may be able to answer clinical questions.”
