
A new partnership between Google Cloud and COTA, a real-world oncology data and analytics company, will develop algorithms for extracting and analyzing unstructured data from electronic health records (EHRs). The pair aim to augment manual, human-led abstraction with technology-based approaches by applying machine learning and natural language processing (NLP) to transform text, such as clinician notes, into structured fields for use in research and analytics.

One of the partnership’s targets will be next-generation genomic sequencing data. The reports that providers receive from genetic testing labs often arrive as PDFs. Traditional tools, such as optical character recognition, can’t accurately “read” the text in these PDF images, so this data often goes underused.

“We are collaborating with COTA to build a series of new natural language processing models tailored specifically to unstructured oncology data, including emerging data such as genomic sequencing,” said Shweta Maniar, director of life sciences industry solutions at Google Cloud.

“By training these algorithms specifically on oncology information, we will partner with COTA in generating a much more complete understanding of what is happening in the cancer care setting and how a patient’s unique clinical history may impact their response to treatment,” she added.

The EHR has revolutionized the way healthcare providers capture data. The Health Information Technology for Economic and Clinical Health (HITECH) Act, part of the American Recovery and Reinvestment Act of 2009, created tens of billions of dollars in incentives for healthcare providers to implement EHRs. From 2019 through 2021, 86% of non-Federal general acute care hospitals had adopted a federally certified EHR.

Meanwhile, real-world data, such as clinician notes and patient-generated data, is increasingly used to improve patient care, spur research, inform clinical trials, and support regulatory decision-making. But using real-world data still poses significant challenges.

Although the vast majority of critical health data is now created and stored digitally, much of it is still generated in an unstructured format. Free-text clinical notes and PDF documents are largely invisible to algorithms that mine structured data fields for insights into patient care and research leads. Many real-world data companies have physicians curate the data manually, which makes scaling this approach across large datasets both time- and resource-intensive.

“Imagine a scenario where we can be alerted, in real time, to new diseases or receive signals from geographies where patients are experiencing better outcomes, or poorer outcomes, so that we can take action quickly,” said Miruna Sasu, President and CEO at COTA. “In order for this to become our reality, we must leverage technologies to ingest healthcare data responsibly, accurately, and expeditiously. We are delighted to partner with Google Cloud to combine our respective strengths in technology and data science with the ultimate goal of improving care for patients.”

Founded by oncologists, COTA offers real-world data from leading academic and community-based cancer centers, along with an advanced analytics platform. The company says it has the “largest inventory in hematologic oncology diseases with the highest quality and deepest data model,” as well as “volume for each indication based on disease prevalence within the U.S.” COTA partners with life sciences companies, providers, and payers to guide cancer care and research.
