Biomarkers hold out the promise of significant time and money savings for drug discovery and development, but finding the right ones is notoriously tough. Many putative biomarkers fail validation after years of study, leaving developers poorer but not much wiser—a fact that highlights the complexity of biomarker development. But a growing number of life science artificial intelligence (AI) startups say they have just the tools to uncover the best biomarkers and to determine how to use them.
“AI is perfect for precision medicine because it can reveal meaningful patterns across multiple omics datasets,” says Richard Wendell, founder and CEO of AI startup tellic, a company that leverages AI in molecular biology, functional genomics, and drug discovery.
Most of the relatively young companies currently leveraging AI and machine learning are working on digital pathology or clinical decision support applications for cancer, or at least that is their initial focus. But the hope is that such tools will be built for many other indications as well.
The cancer connection
Cancer is a particularly attractive target now. The market for immuno-oncology therapies alone is expected to exceed $100 billion within the next few years, and there are almost 20,000 cancer therapies in development according to ClinicalTrials.gov. There is also particular interest in the precision medicine approach of pairing therapeutics with diagnostics. According to the FDA, there are already more than 40 FDA-approved companion diagnostic tests, and the majority of them are for specific cancers.
Paige is a digital pathology software company whose AI-based products are built upon several years of collaboration among Memorial Sloan Kettering (MSK)’s computer scientists, machine learning experts, pathologists, and oncologists.
The company has the rights to all of MSK’s digitized pathology slides as well as rights to digitize the remaining archives. That’s more than 25 million slides, with 1.6 million of them already digitized. Paige can also link digitized pathology slide features to additional diagnostic information, such as IHC and genomic assay results, as well as treatment information.
“We have taken the multiple research-use technologies which the MSK group built, significantly advanced the technology, and productized this work,” explains Leo Grady, CEO of Paige.
One significant advance is the number of slide images that can be analyzed. A 2019 paper in Nature Medicine by Gabriele Campanella and colleagues at MSK’s pathology department demonstrated that. The study analyzed more than 44,000 digitized glass slide images from more than 15,000 people with prostate, skin, or breast cancer that had already spread to the lymph nodes. “In the past, typically no more than 500 to 1,000 slides have been used to train deep-learning models in pathology,” Grady says.
Their analysis “…resulted in areas under the curve above 0.98 for all cancer types,” Campanella and colleagues wrote. “Its clinical application would allow pathologists to exclude 65–75% of slides while retaining 100% sensitivity.” The study demonstrates that their AI model can provide clinical-grade performance without requiring manual annotation of the slides by pathologists, a process that takes much more time.
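To make the "exclude slides while retaining 100% sensitivity" idea concrete, here is a minimal sketch in Python. It assumes each slide has a model score between 0 and 1 and a known label; the function name and the toy numbers are hypothetical and are not taken from the study. The cutoff is set at the lowest score seen on any cancer-positive slide, so every positive slide is kept, and the function reports what share of slides falls below that cutoff.

```python
from typing import Sequence, Tuple


def exclusion_at_full_sensitivity(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (threshold, fraction_excluded) such that no positive slide is excluded.

    scores: model-predicted probability of cancer for each slide.
    labels: 1 for a cancer-positive slide, 0 for a negative slide.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        raise ValueError("need at least one positive slide")
    # The lowest score on any positive slide is the highest cutoff
    # that still flags every positive slide (100% sensitivity).
    threshold = min(positives)
    # Slides scoring strictly below the cutoff would not need review.
    excluded = sum(1 for s in scores if s < threshold)
    return threshold, excluded / len(scores)


# Toy, made-up scores for illustration only.
scores = [0.02, 0.05, 0.10, 0.20, 0.35, 0.40, 0.80, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 1]
threshold, fraction = exclusion_at_full_sensitivity(scores, labels)
print(f"threshold={threshold}, excluded={fraction:.0%}")
```

On real data, the cutoff would be chosen on one set of slides and checked on a separate held-out set, since a threshold tuned and evaluated on the same slides overstates the savings.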
Within the last year, Paige inked two deals: one with Royal Philips to “deliver clinical-grade AI applications to pathology laboratories,” according to a company press release, and the other with Invicro LLC, a Konica Minolta company, to jointly provide an integrated pathology solution to pharmaceutical and biotechnology sponsors.
PathAI is also working on digital pathology. About 50% of its projects are in oncology, while the rest span other indications including non-alcoholic steatohepatitis (NASH). Its main customers are pharmaceutical and biotech companies as well as reference labs. Those clients include Bristol Myers Squibb, Genentech Roche, Gilead, Merck, and Labcorp. “There is a lot in every slide that is difficult, if not impossible, for any pathologist to quantitate,” says Mike Montalto, CSO of PathAI. The company’s platform, he says, can look at up to 5,000 pathology-based features at once.
In one project, PathAI developed an AI-powered measure of PD-L1 in tumor cells and compared its results to those of pathologists manually reading the slides in two pivotal clinical trials in which the immuno-oncology drug nivolumab was tested. The PathAI test identified significantly more patients who would be eligible for nivolumab than the manual reading did.
In a separate study, PathAI applied its deep learning-based tumor microenvironment (TME) assay to 1,027 digitized H&E-stained slide images from the IMpower150 trial to generate a high-dimensional set of human-interpretable features (HIFs) characterizing the TME. Those features included quantitative measurements of cancer cells, immune cells, stromal cells, and the cancer vasculature. According to PathAI, the study’s findings support the importance of the TME and vasculature in determining response to PD-L1 and VEGF-targeting therapies.
Two big challenges, says Montalto, are data availability and edge cases, which are extreme cases that “throw off your models.”
Digital health company BioAI Health uses genomics, pathology, and radiology data to answer critical questions in drug discovery and development, while holding all the data in a secure cloud environment. The company is working with pharmaceutical and biotechnology firms to generate CLIA-approved, lab-developed diagnostic tests to guide drug discovery and development. “AI can find things in data that may otherwise be missed,” says Thomas Colarusso, co-founder and CEO of the company.
The company’s first focus was oncology, but it is now moving rapidly into new areas, such as infectious disease and inflammation. It has “ready to go” models, some of which can be used for more than one indication and can also analyze the effects of a combination of two or more features. “That is a big advantage,” says Colarusso. The company can also build custom models to meet its clients’ needs.
The biggest question, he adds, is “How does the FDA look at your AI platform?” He points to the example of IDx-DR, a software program that can diagnose diabetic retinopathy using an artificial intelligence algorithm to analyze images of the eye taken with a retinal camera called the Topcon NW400. IDx-DR was the first FDA-approved device to use AI to diagnose diabetic retinopathy in adults. Colarusso believes this particular test provides a useful precedent and will also encourage more companies to use AI.
Alan Jerusalmi, co-founder and chief services officer at BioAI, says, “The AI system finds the molecular pathology and helps us determine robust signatures, even complex ones. These can be for response to therapy, to recruit patients for a clinical trial, to look at clinical trial data retrospectively, to repurpose a drug, and more.”
The size of training sets is important to all AI companies. One of Cambridge Cancer Genomics (CCG)’s tools determines whether a mutation is a driver of a particular type of cancer. It was initially trained on data from 64,000 patients. The latest version was trained on data from 72,000 patients.
The precision oncology company also has developed a new variant caller that it says is more accurate than industry standard tools. “The standard variant callers show very little corroboration between their results,” says Nirmesh Patel, CCG co-founder and CSO.
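One simple way to put a number on how much two variant callers agree is the Jaccard index: the number of variants both callers report divided by the number reported by either. The Python sketch below is only an illustration of that general measure, not CCG's method. The caller names and variant calls are made up, and each variant is represented as a (chromosome, position, reference allele, alternate allele) tuple.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def jaccard(a: Set[Variant], b: Set[Variant]) -> float:
    """Shared calls divided by all calls made by either caller."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(
    calls: Dict[str, Set[Variant]]
) -> Dict[Tuple[str, str], float]:
    """Jaccard index for every pair of callers, keyed by the sorted name pair."""
    return {
        (x, y): jaccard(calls[x], calls[y])
        for x, y in combinations(sorted(calls), 2)
    }


# Hypothetical call sets for three unnamed callers (illustrative only).
calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "T", "G")},
    "caller_b": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr4", 400, "C", "A")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr5", 500, "G", "A")},
}
concordance = pairwise_concordance(calls)
for pair, score in concordance.items():
    print(pair, round(score, 2))
```

A value near 1 means two callers report nearly the same variants; values well below 1 reflect the kind of disagreement Patel describes.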
But CCG’s tool kit entails more than those particular tools. “We are taking a holistic approach and are looking for complex biomarkers to match patients to drugs and then turn those into companion diagnostic tests,” says Patel. To find these markers, the company analyzes DNA, RNA, immune infiltrates, and more. Their OncOS precision oncology platform can also be used with very small amounts of DNA, even in liquid biopsy samples.
CCG partners with diagnostic labs that do the sequencing while CCG does the analytics and provides decision support to guide treatment, because their analytics platform identifies responders and non-responders. The company uses a federated data model to ensure that clients’ data stay within their own jurisdiction.
CCG recently began a collaboration with Dante Labs, an Italian company that provides direct-to-consumer sequencing, including whole genomes. Together, the two firms will provide clinical services in oncology. Tumor samples and liquid biopsy samples will be sent to Dante and sequenced. Clinical reports generated by CCG will go to physicians who will present them to the patient.
One hurdle in AI is that “people misunderstand what it is,” Patel says. Regulatory agencies in particular want to know what’s going on under the hood, so it helps to have a transparent model.
Quantgene’s initial focus was cfDNA analysis of somatic mutations in cancers, but its offerings have since expanded and now include hereditary genetic testing, lifestyle factors, and COVID-19 testing.
The company was working exclusively with its own network of physicians, but it is now rolling out its consumer-facing product called Serenity. The offering comprises three plans: a cancer risk profile; a full-service hereditary health solution; and the detection of eight cancers at early stages. The first two plans are already available.
“We’ve incorporated an entire pipeline into our process to guide patients through their genetic testing,” says Jo Bhakdi, founder and CEO of Quantgene. “This includes genetic counselling services, testing, and membership with continuous education.”
Quantgene built their AI platform using a Berlin-based multi-disciplinary team that includes physician experts on AI and advanced statistics, Ph.D.s in biology, and a software team. The AI models themselves are also complex. “You can’t just use one type of AI, you need to combine neural nets, machine learning, natural language processing, and deep quantitative AI systems,” says Bhakdi. “These may also be combined with other types of analyses, such as Bayesian Networks.”
Easy does it
One common factor among AI companies is ease of use. Over the last five years, tellic has built a biomedical data-specific automated pipeline, with dozens of different AI models that can be customized for different data types.
One of tellic’s models addresses “the synonym and acronym problem in biomedical text,” Wendell explains. Solving it is critical for making the right connections between various omic datasets and existing biomedical data and publications. “For example, SCD can refer to sickle cell disease or sudden cardiac death, among others,” Wendell says.
All of tellic’s AI models are trained on thousands of annotations by Ph.D.-level scientists who map all known synonyms to a single ontological concept, and are automatically updated as new research is published.
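The synonym and acronym problem can be illustrated with a toy Python sketch. This is not tellic's model, which is trained on expert annotations. It is a hand-written lookup under stated assumptions: the concept identifiers, the synonym table, and the context cue words are all hypothetical. Unambiguous names map straight to one concept, while an ambiguous acronym such as SCD is resolved by counting cue words in the surrounding sentence.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical concept identifiers and synonym table (illustrative only).
SYNONYMS: Dict[str, str] = {
    "sickle cell disease": "CONCEPT:sickle_cell_disease",
    "sickle-cell disease": "CONCEPT:sickle_cell_disease",
    "sudden cardiac death": "CONCEPT:sudden_cardiac_death",
}

# Ambiguous acronyms map to candidate concepts, each with context cue words.
AMBIGUOUS: Dict[str, Dict[str, Set[str]]] = {
    "scd": {
        "CONCEPT:sickle_cell_disease": {"hemoglobin", "anemia", "erythrocyte"},
        "CONCEPT:sudden_cardiac_death": {"arrhythmia", "cardiac", "ventricular"},
    }
}


def normalize(term: str, context: str = "") -> Optional[str]:
    """Map a term to a single concept ID, or None if it cannot be resolved."""
    key = term.strip().lower()
    if key in SYNONYMS:
        return SYNONYMS[key]
    candidates = AMBIGUOUS.get(key)
    if not candidates:
        return None
    words = set(re.findall(r"[a-z]+", context.lower()))
    # Score each candidate by how many of its cue words appear in the context.
    scored = {concept: len(cues & words) for concept, cues in candidates.items()}
    top = max(scored.values())
    winners = [concept for concept, score in scored.items() if score == top]
    # No cue words found, or a tie between candidates: leave unresolved.
    if top == 0 or len(winners) > 1:
        return None
    return winners[0]


print(normalize("SCD", "SCD following ventricular arrhythmia"))
print(normalize("SCD", "low hemoglobin and chronic anemia in SCD patients"))
```

Returning nothing when the context gives no clear signal is a deliberate choice in this sketch: a wrong link between datasets is usually more costly than a missing one.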
The company’s recently launched knowledge graph app leverages a graph database that brings together diverse datasets and makes it easy to run complex analyses. The platform is fully automated, and tellic’s biopharma partners use it for a variety of R&D projects.
AI also enables processing of data from different structures, including images, text, lab notes, and audio. Pharma has a disproportionate amount of unstructured data because of the focus on research. “AI opens the door to finding patterns that couldn’t be found before, and allows processing all files beyond structured data,” says Wendell.
AI spreads its wings
While oncology is by far the most popular target for precision medicine-related AI, researchers are starting to use these tools for other indications, including neurological and cardiovascular conditions. But today, COVID-19 is the most obvious indication for these companies to add to their portfolios.
Quantgene, for example, is offering both molecular PCR and antibody testing. “Current COVID antibody testing gives one answer—a yes or a no, while Quantgene’s NGS platform can deliver six billion datapoints,” explains Bhakdi.
Earlier this year, tellic took the plunge and announced the release of its graph.C19 product containing data on COVID-19 and other coronaviruses. This web-accessible tool comprises more than 12 million distinct connections across two million documents. It lets researchers access integrated knowledge from their previous experience with coronaviruses, including SARS and MERS. Researchers can also gain insights from the ever-growing body of data on COVID-19.
Graph.C19’s release was a direct response to a White House call to action, which was made in partnership with several high-profile non-profits, companies, and other organizations. That call to action asked the AI community to create technologies using natural language processing, a core technology used in tellic graph. This tool allows researchers to query high-priority questions related to COVID-19 and create associations among millions of data sources.
“Last year, we launched the first biopharma knowledge graph that enables scientists to instantly connect the dots on insights scattered across tens of millions of documents,” Wendell said in an April 4 press release. “Today, we are giving scientists a version containing all internetworked knowledge on COVID-19 because we want to do our part to fight back against this pandemic.”
AI still has its skeptics, and precision medicine seems to be getting more complicated as the types of data and ways of analyzing them multiply. “The key now is generating awareness of what AI can do for research and getting AI-based tools into the hands of scientists,” notes Wendell.
Colarusso, however, sees a rosy future ahead. “We are now at the forefront of the era of AI and over the next eight to 10 years, we should see it bloom.”