Inside Precision Medicine

Aiming AI at Cancer-Related Biomarkers

A range of new tools and techniques promise to reveal much more about the biology of cancer and ways to track and treat this wide range of diseases

By Mike May, PhD

February 12, 2024


In oncology, biomarkers play crucial roles in many applications, from early detection and diagnosis to choosing treatments and gauging their efficacy. Biomarkers can be elusive, however, especially because they are often present at low levels. Still, scientists around the world are searching for new tools to take on these challenges, driven by the vast potential applications in cancer research and patient care. As in many other fields, some of the solutions might lie in artificial intelligence (AI) and two of its subfields: machine learning (ML) and deep learning (DL), a branch of ML based on multilayer neural networks. As Dania Daye, MD, PhD, assistant professor of interventional radiology at Massachusetts General Hospital/Harvard Medical School, and her colleagues noted: “In recent years, artificial intelligence has emerged as a promising tool in the cancer sphere.”¹ Much of that promise might lie in identifying new biomarkers or making better use of existing ones.

So far, scientists are just getting started with AI in many realms. “AI-powered biomarkers are in their infancy and currently have only served as an aid to the pathologist in the interpretation of slide-based companion diagnostics like HER2 and PD-L1,” says Ken Bloom, MD, head of pathology at Nucleai, which is headquartered in Tel Aviv, Israel, and partners with biopharma companies across various stages of drug development and diagnostics for AI-powered tissue biomarker analysis. “These tools are important because manual interpretation of current assays is laborious and suffers from variability in pathologist interpretation.” AI-based tools, Bloom notes, “help overcome these limitations by analyzing slides quickly with high precision, accuracy, and reproducibility.”

Assisting pathologists with companion diagnostics, however, is only one of the ways to apply AI in identifying cancer biomarkers.

Organs and more

As scientists explore opportunities for using AI-based tools in the search for new biomarkers, an organ-by-organ approach will be common. For patients with hepatocellular carcinoma (HCC), for example, clinicians struggle to find the best combination of biomarkers for screening, from serum alpha-fetoprotein (AFP) to tumor vascularization. As Daye and her colleagues pointed out: “There is no consensus regarding if and how these biomarkers should be combined in a manner that is clinically useful or regarding the establishment of a pipeline to discover new biomarkers.” Nonetheless, Daye’s team discussed several AI-based approaches to finding better biomarkers for HCC. For example, AI can be used to scour the literature for HCC-related biomarkers, predict the risk of HCC, determine a patient’s disease stage from the expression of a set of genes, or predict the likely outcomes of various treatments and the odds of recurrence. On this basis, the scientists concluded: “Significant potential remains for AI and ML to enhance biomarker detection in the screening, diagnosis, and management of liver cancers.”
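The question of how to combine several HCC biomarkers into one screening score is often framed as a multivariable classification problem. The sketch below is a minimal illustration, not the method of Daye’s team: it fits a logistic regression by batch gradient descent to made-up, already standardized values of AFP and a second, hypothetical marker (`marker_2`), then returns a combined probability for a new patient. The data, learning rate, and epoch count are assumptions chosen only for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def combined_risk(w, b, x):
    """Probability of disease from a weighted combination of biomarkers."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up training data: [standardized AFP, standardized hypothetical marker_2]
X = [[-1.0, -0.5], [-0.8, -1.2], [-0.3, -0.9], [0.9, 0.4], [1.2, 1.1], [0.6, 1.3]]
y = [0, 0, 0, 1, 1, 1]  # 1 = HCC in this toy set

w, b = fit_logistic(X, y)
print(combined_risk(w, b, [1.0, 1.0]))    # patient high on both markers
print(combined_risk(w, b, [-1.0, -1.0]))  # patient low on both markers
```

The fitted weights indicate how much each marker contributes to the combined score; with real cohorts, such a model would need validation on independent patients before any clinical use.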

In some cases, it’s not just the cancer, but also how it relates to other aspects of human biology that might reveal crucial biomarkers. That’s what Takuji Yamada, PhD, associate professor of life science and technology at the Tokyo Institute of Technology in Japan, and his colleagues explored in colorectal cancer (CRC).² “A growing number of studies have reported a link between the alteration in gut microbiome compositions and CRC,” Yamada’s team noted. “The elevated abundance of certain bacterial species such as Fusobacterium nucleatum and Parvimonas micra in CRC patients is often associated with the development of the disease.” So, Yamada and his colleagues used AI-based analysis to study bacterial populations in the gut and their relationship to colorectal cancer. From this work, the scientists reported: “We discovered different patterns of bacterial contributions to the CRC probability among individual CRC subjects.” Consequently, these patterns of bacteria in a person’s gut microbiome might be used as biomarkers of the risk of CRC.
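Yamada’s team broke each subject’s predicted CRC probability into contributions from individual bacteria. In the simplest case, a linear model on the log-odds scale with independent features, such per-feature contributions (SHAP values) have a closed form, φᵢ = wᵢ(xᵢ − E[xᵢ]), and they sum to the gap between a subject’s log-odds and that of an average subject. The sketch below assumes hypothetical weights, background means, and a placeholder taxon (`hypothetical_taxon`); it illustrates the idea and is not the model used in the published study.

```python
# Hypothetical weights: change in CRC log-odds per unit of log10 relative abundance.
WEIGHTS = {
    "Fusobacterium nucleatum": 1.2,
    "Parvimonas micra": 0.8,
    "hypothetical_taxon": -0.5,
}
BIAS = -1.0

def log_odds(sample):
    """Linear model output (log-odds of CRC) for one subject."""
    return BIAS + sum(w * sample[taxon] for taxon, w in WEIGHTS.items())

def contributions(sample, background):
    """Per-taxon contributions relative to a background (average) subject.

    For a linear model with independent features these equal the SHAP values,
    and they sum to log_odds(sample) - log_odds(background).
    """
    return {taxon: w * (sample[taxon] - background[taxon]) for taxon, w in WEIGHTS.items()}

# Made-up log10 relative abundances
background = {"Fusobacterium nucleatum": -4.0, "Parvimonas micra": -4.5, "hypothetical_taxon": -2.0}
subject_a = {"Fusobacterium nucleatum": -1.5, "Parvimonas micra": -4.4, "hypothetical_taxon": -2.1}
subject_b = {"Fusobacterium nucleatum": -3.9, "Parvimonas micra": -1.0, "hypothetical_taxon": -2.0}

for name, subject in (("A", subject_a), ("B", subject_b)):
    contrib = contributions(subject, background)
    top = max(contrib, key=contrib.get)
    print(name, top, round(contrib[top], 2))
```

In this toy example, subject A’s elevated score is driven mainly by F. nucleatum and subject B’s by P. micra, the kind of subject-to-subject difference that per-subject explanations are meant to expose.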

Handling heterogeneity

The molecular variations in some cancers create new challenges and opportunities in using biomarkers. Although heterogeneity can make it difficult to distinguish between different subtypes of a cancer, AI might make use of the complexity to find hidden relationships in the data.

Non-small cell lung cancer (NSCLC) is known for its molecular heterogeneity, and the disease includes two common subtypes: lung adenocarcinoma and lung squamous cell carcinoma. Hoping to use biomarkers to tell these two subtypes apart at diagnosis, Kountay Dwivedi, PhD, a research scholar in the department of computer science at the University of Delhi in India, and his colleagues developed an explainable AI (XAI)-driven framework. Many AI applications are seen as black boxes whose inner workings are hard to explain, but as Dwivedi and his colleagues stated: “XAI methods have shown significant potential in explaining the behavior of the model, thereby building trust over it.”³

Workflow Diagram — Using an explainable AI (XAI)-driven framework, Kountay Dwivedi of the University of Delhi and his colleagues developed a workflow for identifying new biomarkers for non-small cell lung cancer (NSCLC).

With this tool, Dwivedi’s team searched for gene-expression biomarkers relevant to NSCLC and identified 52 of them. “Among this set of biomarkers, 45 are found to conform with the established literature,” Dwivedi says. That left seven new biomarkers, which Dwivedi says “had not yet been discovered by any existing work—to the best of our knowledge.”
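The bookkeeping behind those numbers is simple: of the 52 discovered biomarkers, 45 match the literature, leaving 52 − 45 = 7 candidates. The sketch below shows that final step under stated assumptions: genes are ranked by a precomputed importance score (for example, mean absolute XAI attribution), the top k are kept, and the selection is split against a set of literature-reported genes. The gene names and scores are hypothetical, and the published framework’s selection procedure may differ.

```python
def select_top_genes(importance, k):
    """Return the k genes with the highest importance scores, best first."""
    return sorted(importance, key=importance.get, reverse=True)[:k]

def split_known_novel(selected, literature_genes):
    """Partition selected genes into literature-supported and novel candidates."""
    known = [g for g in selected if g in literature_genes]
    novel = [g for g in selected if g not in literature_genes]
    return known, novel

# Hypothetical mean absolute attributions per gene
importance = {"GENE_A": 0.91, "GENE_B": 0.75, "GENE_C": 0.62, "GENE_D": 0.40, "GENE_E": 0.05}
literature_genes = {"GENE_A", "GENE_C", "GENE_E"}

selected = select_top_genes(importance, k=4)
known, novel = split_known_novel(selected, literature_genes)
print(known, novel)  # ['GENE_A', 'GENE_C'] ['GENE_B', 'GENE_D']
```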

Beyond discovering new NSCLC-related biomarkers, the XAI-based tool revealed several ways to use them. “When utilized for NSCLC subtype classification, the discovered set of biomarkers achieves an accuracy of 95.7%, outperforming state-of-the-art work,” Dwivedi explains. The team also found 14 biomarkers that might be druggable. So, Dwivedi notes: “This subset of biomarkers could be further investigated for their clinical efficacy towards NSCLC-targeted therapy and prognostic effects.”
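Classification accuracy, as in the 95.7% figure, is the fraction of samples whose predicted subtype matches the true one. The minimal sketch below uses made-up labels for lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) and also reports per-subtype recall, which is worth checking when one subtype outnumbers the other, because a high overall accuracy can hide poor performance on the smaller class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall_per_class(y_true, y_pred):
    """For each true class, the fraction of its samples predicted correctly."""
    result = {}
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        result[cls] = sum(y_pred[i] == cls for i in idx) / len(idx)
    return result

# Made-up labels for ten samples
y_true = ["LUAD"] * 6 + ["LUSC"] * 4
y_pred = ["LUAD"] * 6 + ["LUSC", "LUSC", "LUSC", "LUAD"]

print(accuracy(y_true, y_pred))          # 0.9
print(recall_per_class(y_true, y_pred))  # {'LUAD': 1.0, 'LUSC': 0.75}
```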

Pharmaceutical companies also use AI to explore new NSCLC-related biomarkers. In a recent review article, two scientists from Daiichi Sankyo reported: “Here, we employed a text-mining approach and identified 215 studies that reported potential biomarkers of NSCLC using AI/ML algorithms.”⁴ Oncologists, though, want to ensure that such information gets put into practice. As these scientists concluded: “We anticipate that our comprehensive review will contribute to the current understanding of AI/ML advances in NSCLC biomarker research and provide an important catalogue that may facilitate clinical adoption of AI/ML-derived biomarkers.”
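The review’s pipeline is not spelled out here, so the following is only a minimal sketch of a keyword-based text-mining filter, assuming abstracts have already been retrieved as plain text. An abstract is kept if it mentions NSCLC, an AI/ML method, and biomarkers. The term lists and sample abstracts are illustrative assumptions, and a real screen would still need manual review of the hits.

```python
import re

# Illustrative term lists; a real screen would use broader, curated vocabularies.
NSCLC = re.compile(r"\b(NSCLC|non-small[- ]cell lung cancer)\b", re.IGNORECASE)
AI_ML = re.compile(
    r"\b(machine learning|deep learning|artificial intelligence|random forest|neural networks?)\b",
    re.IGNORECASE,
)
BIOMARKER = re.compile(r"\bbiomarkers?\b", re.IGNORECASE)

def is_candidate(abstract):
    """True if the abstract mentions NSCLC, an AI/ML method, and biomarkers."""
    return all(pattern.search(abstract) for pattern in (NSCLC, AI_ML, BIOMARKER))

abstracts = {
    "doc1": "A deep learning model identifies serum biomarkers for non-small cell lung cancer.",
    "doc2": "Random forest classification of NSCLC subtypes with a gene-expression biomarker panel.",
    "doc3": "Machine learning predicts sepsis biomarkers in intensive care patients.",
    "doc4": "Clinical outcomes of NSCLC patients treated with immunotherapy.",
}
hits = [doc_id for doc_id, text in abstracts.items() if is_candidate(text)]
print(hits)  # ['doc1', 'doc2']
```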

deep-learning algorithms illustration — Using deep-learning algorithms, Yafeng Qi of Peking University and collaborators converted signals from Raman spectroscopy into two-dimensional images, which improved the accuracy of distinguishing healthy from cancerous lung tissue.

XAI-based methods can also be applied to other cancers. For instance, Jungeun Kim, PhD, assistant professor of software at Kongju National University in South Korea, and his colleagues combined ML and XAI to search for biomarkers that predict the metastasis of breast cancer, which also displays considerable heterogeneity. After analyzing the genomic data from nearly 100 breast-cancer samples, the scientists found increased expression in 10 genes and decreased expression in eight genes. From the ML/XAI analysis, Kim and his colleagues concluded: “The findings of this study may prevent disease progression and metastases and potentially improve clinical outcomes by recommending customized treatment approaches for [breast cancer] patients.”⁵
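Calling a gene’s expression “increased” or “decreased” usually starts with an effect size such as the log2 fold change between groups. The sketch below assumes made-up expression values for metastatic and non-metastatic samples, a pseudocount of 1 to avoid taking the log of zero, and a threshold of ±1 (a twofold change). It is not the ML/XAI pipeline of Kim’s team, and a real analysis would add a statistical test with multiple-testing correction.

```python
import math
from statistics import mean

def log2_fold_change(case, control, pseudocount=1.0):
    """log2 of the ratio of group means, with a pseudocount to avoid log(0)."""
    return math.log2((mean(case) + pseudocount) / (mean(control) + pseudocount))

def split_by_direction(case_expr, control_expr, threshold=1.0):
    """Return genes whose |log2 fold change| meets the threshold, split by direction."""
    up, down = [], []
    for gene, case_values in case_expr.items():
        lfc = log2_fold_change(case_values, control_expr[gene])
        if lfc >= threshold:
            up.append(gene)
        elif lfc <= -threshold:
            down.append(gene)
    return up, down

# Made-up normalized expression values per gene
metastatic = {"GENE_UP": [15, 17], "GENE_DOWN": [1, 3], "GENE_FLAT": [10, 12]}
non_metastatic = {"GENE_UP": [3, 5], "GENE_DOWN": [11, 13], "GENE_FLAT": [9, 11]}

up, down = split_by_direction(metastatic, non_metastatic)
print(up, down)  # ['GENE_UP'] ['GENE_DOWN']
```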

The heterogeneity of breast cancer also complicates selecting the best treatment for a patient. “These tumors exhibit a wide range of gene expression and prognostic indicators,” noted Dong-Qing Wei, PhD, professor of bioinformatics and life sciences and biotechnology at Shanghai Jiao Tong University in China, and his colleagues.⁶ So, these scientists combined three sources of information (“human breast cancer cell lines and drug sensitivity information from Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) databases”) and analyzed them with ML-based methods. From this work, Wei and his colleagues identified six drugs that worked well against the cell lines and five biomarkers impacted by these drugs or radiation treatments. As a result, Wei’s team reported: “The proposed biomarkers and drug sensitivity analysis are helpful in translational cancer studies and provide valuable insights for clinical trial design.”
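Among other measures, GDSC provides drug sensitivity as the natural log of the IC50, the concentration that inhibits growth by half, so lower values indicate more potent drugs. A minimal sketch of one possible ranking step, assuming a table of ln(IC50) values per drug across breast cancer cell lines (drug names and values are made up, and `None` marks an untested cell line), sorts drugs by their median across cell lines. Wei’s team used ML-based methods that go well beyond this simple summary.

```python
from statistics import median

def rank_drugs(ln_ic50):
    """Rank drugs from most to least potent by median ln(IC50) across tested cell lines."""
    def summary(drug):
        tested = [v for v in ln_ic50[drug] if v is not None]
        if not tested:
            return float("inf")  # never tested: rank last
        return median(tested)
    return sorted(ln_ic50, key=summary)

# Made-up ln(IC50) values across three cell lines; None = not tested
ln_ic50 = {
    "drug_A": [-1.2, -0.8, -1.0],
    "drug_B": [2.1, 1.8, None],
    "drug_C": [0.3, -0.1, 0.5],
}
print(rank_drugs(ln_ic50))  # ['drug_A', 'drug_C', 'drug_B']
```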

Combining datasets

Histology is one of the oldest tools for diagnosing cancer. Peering into microscopes, histologists have explored samples for signs of cancer since the work of German biologist Johannes Peter Müller in the 1830s. Despite nearly 200 years of work, histological analysis of cancer can still improve, and AI might drive more advances. As a team of scientists from Fudan University Shanghai Cancer Center noted, DL can “predict biomarkers with high performance from cancer pathology slides.”⁷

Today’s AI-based tools can combine cancer-related imaging with a much wider range of information. In a review article, for example, Sakda Khoomrung, PhD, associate professor of metabolomics and systems biology at the Siriraj Metabolomics and Phenomics Center (SiMPC), Mahidol University in Bangkok, Thailand, and his colleagues stated: “Recent studies suggest that a DL-based approach that integrates a broad range of datasets, including histology, magnetic resonance imaging (MRI), X-rays, and chromatograms, along with multi-omics data can significantly improve the accuracy of diagnostic models for cancer.”⁸
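One simple way to integrate several data types, often called early fusion, is to standardize each modality’s features and concatenate them into a single vector per patient before training a model. The sketch below assumes made-up imaging and omics features for two patients; it shows only this preprocessing step, not the DL architectures the review surveys, and in practice the standardization statistics should be computed on the training set alone.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each feature (column) across patients; constant columns become 0."""
    columns = list(zip(*rows))
    scaled = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]

def early_fusion(*modalities):
    """Concatenate standardized features from each modality, patient by patient."""
    n = len(modalities[0])
    if any(len(m) != n for m in modalities):
        raise ValueError("every modality needs one row per patient")
    scaled = [standardize(m) for m in modalities]
    return [[v for block in scaled for v in block[i]] for i in range(n)]

# Made-up features for two patients
imaging = [[1.0, 200.0], [3.0, 400.0]]  # e.g., two image-derived features
omics = [[0.5], [1.5]]                  # e.g., one metabolite level
print(early_fusion(imaging, omics))  # [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]
```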

Khoomrung and his colleagues are pushing AI-based image analysis even further into new analytical territory. This team, Khoomrung says, is “exploring the transformative potential of deep learning in converting non-image metabolomics, together with associated metadata, into imaging data, with a focus on feature extraction and the development of classifier models.” For example, Khoomrung and his colleagues applied imaging metabolomics, in which samples are interrogated with mass spectrometry, to chronic kidney disease.⁹ Similar approaches could be applied to cancer-related biomarkers.
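Khoomrung’s conversion methods are not detailed here, so the sketch below shows only the most basic version of the idea, under stated assumptions: a 1D vector of metabolite intensities is min-max scaled to [0, 1], zero-padded to the next perfect square, and reshaped into a 2D grid that an image model could take as input. It is a generic illustration, not the method of reference 9.

```python
import math

def vector_to_grid(features):
    """Min-max scale a feature vector and reshape it into a square 2D grid."""
    if not features:
        raise ValueError("features must be non-empty")
    lo, hi = min(features), max(features)
    span = hi - lo
    scaled = [(v - lo) / span if span > 0 else 0.0 for v in features]
    side = math.ceil(math.sqrt(len(scaled)))
    padded = scaled + [0.0] * (side * side - len(scaled))  # pad with zeros
    return [padded[r * side:(r + 1) * side] for r in range(side)]

# Made-up intensities for five metabolites
grid = vector_to_grid([2.0, 4.0, 6.0, 10.0, 8.0])
for row in grid:
    print(row)
```

Because neighboring pixels in such a grid are simply neighboring features in the input order, the ordering matters for convolutional models, and padded cells are indistinguishable from the minimum intensity; a more careful conversion would need to address both issues.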

Other techniques also provide images—or at least graphs—that can improve the pursuit of cancer-related biomarkers. As an example, Yafeng Qi, PhD, assistant researcher of biomedical engineering and biomedical photonics at Peking University in Beijing, China, says, “Raman spectroscopy can reflect the subtle changes of tissue and cell biochemistry from the molecular level, and allows rapid, noninvasive, label-free and high spatial resolution acquisition of biochemical and structural information through the acquisition of point spectra or spectral images, making Raman spectroscopy a useful tool for cancer-related biomarkers.”

To get the most from information collected with Raman spectroscopy, AI helps scientists process the data. “At present, with the development of Raman technology, obtaining large amounts of Raman data is no longer a problem, however, the significant increase in data volume makes it difficult for conventional methods to calculate and extract subtle variations in complex hidden features in big data,” Qi says. “Luckily, AI happens to be a suitable method for processing large amounts of data, especially for Raman images.” Consequently, Raman spectral data can be processed with various AI methods to identify biomarkers related to cancer.

Qi and his colleagues started by analyzing Raman spectral data from healthy and cancerous tissues with conventional multivariate statistical analysis, but that didn’t produce very accurate distinctions between the samples. “Therefore, we came up with the idea of using deep learning,” Qi says. “For deep learning, however, two-dimensional images as input would be a better choice than one-dimensional data.” So, Qi’s team developed a novel method to convert Raman signals into a two-dimensional form.¹⁰,¹¹ “This novel method combined with deep learning yielded excellently accurate diagnosis of lung tissues,” Qi says. “Subsequently, we extended the transformation approaches of converting the 1D Raman spectroscopy into 2D figures and proposed a new concept called the Raman encoding figure.”¹²,¹³ From this, Qi and his colleagues developed new methods of AI-based analysis of Raman data, and these produced 95% accuracy in distinguishing healthy from cancerous samples.
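The title of reference 10 indicates that the first conversion relied on a short-time Fourier transform (STFT). As a rough sketch of that idea, the code below slides a Hann window along a 1D spectrum (treating the wavenumber axis the way an STFT normally treats time), takes the discrete Fourier transform of each windowed segment, and stacks the magnitudes into a 2D array. The window length, hop size, and window shape are assumptions, not the published settings, and the naive DFT is written out for clarity rather than speed.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def stft_magnitude(spectrum, window=32, hop=8):
    """2D magnitude array: one row per window position, one column per frequency bin."""
    if len(spectrum) < window:
        raise ValueError("spectrum is shorter than the window")
    w = hann(window)
    rows = []
    for start in range(0, len(spectrum) - window + 1, hop):
        segment = [spectrum[start + i] * w[i] for i in range(window)]
        row = []
        for k in range(window // 2 + 1):  # non-negative frequencies only (real input)
            coeff = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window) for n in range(window))
            row.append(abs(coeff))
        rows.append(row)
    return rows

# Synthetic stand-in for a spectrum: a pure oscillation with 4 cycles per 32 points
signal = [math.cos(2 * math.pi * 4 * n / 32) for n in range(256)]
image = stft_magnitude(signal)
print(len(image), len(image[0]))  # 29 17
```

Each row of `image` describes local oscillatory content in one stretch of the spectrum, so a 2D model can learn patterns in both position and frequency; a library FFT would replace the inner loop in practice.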

Sizing up cancer in space

Location distinguishes one cancer from another. To find the most useful biomarkers, though, scientists need to know more than where a cancer started or even where it spread. The processes inside and around a tumor can provide a more detailed picture at the molecular scale, which can point to potential biomarkers.

“Spatial biology platforms—like spatial transcriptomics and multiplexed immunofluorescence—capture the immense complexity of the tumor and the tumor microenvironment by labeling diverse cell types and functional states, allowing scientists to better understand how and why therapies benefit some patients but not others,” says Bloom of Nucleai. “By deciphering complex tissue architectures and cellular interactions, we hope to uncover the key determinants of response, enabling us to develop more accurate biomarkers and diagnostic assays.”

As one example, Bloom says, “There is an urgent need to better identify patients who derive benefit from immunotherapy.” Among the many immune-based approaches to treating cancer, some of the most promising are based on checkpoint inhibitors, which are molecules that prevent cancer from putting the brakes on the immune system’s attack on the disease. “Most indications require a companion diagnostic test, like PD-L1 expression or tumor mutational burden, but better biomarkers are clearly needed,” Bloom says. “Our deep-learning models allow us to understand the interaction of the patient’s tumor and immune system, leading to several promising new biomarkers.”
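Tumor mutational burden, one of the companion-diagnostic measures Bloom mentions, is commonly expressed as the number of qualifying somatic mutations per megabase of sequenced coding DNA. The minimal sketch below assumes a list of variant consequence labels and a known panel size. Which consequence types count toward TMB varies by assay, so the `COUNTED` set is an illustrative assumption.

```python
# Illustrative assumption: which variant consequences count toward TMB differs by assay.
COUNTED = {"missense", "nonsense", "frameshift", "splice_site"}

def tumor_mutational_burden(consequences, panel_size_mb, counted=COUNTED):
    """Qualifying somatic mutations per megabase of sequenced territory."""
    if panel_size_mb <= 0:
        raise ValueError("panel size must be positive")
    n_qualifying = sum(1 for c in consequences if c in counted)
    return n_qualifying / panel_size_mb

# Made-up calls from a hypothetical 1.5 Mb panel: 12 missense, 3 nonsense, 5 synonymous
calls = ["missense"] * 12 + ["nonsense"] * 3 + ["synonymous"] * 5
print(tumor_mutational_burden(calls, panel_size_mb=1.5))  # 10.0
```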

Getting the most from these new biomarkers, however, depends on more than simply identifying them. Their location within the tumor and its surrounding microenvironment can also matter. As an example, Bloom says: “The proximity and density of PD-L1 positive lymphocytes relative to tumor cells has been shown to correlate with patient outcome as has the quantification of lymphocytes within the tumor parenchyma.” He adds, though, that “these findings have yet to be confirmed in randomized clinical trials.”
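Proximity and density measures like the ones Bloom describes can be computed once cells have been detected and classified. The sketch below assumes cell centroids in micrometers for tumor cells and PD-L1-positive lymphocytes (made-up coordinates) and a 25 µm neighborhood radius chosen purely for illustration. For each tumor cell it reports the distance to the nearest lymphocyte and the number of lymphocytes within the radius. The brute-force search is fine for a toy example, but whole slides would call for a spatial index such as a k-d tree.

```python
import math

def nearest_distances(tumor_cells, lymphocytes):
    """Distance from each tumor cell to its closest lymphocyte."""
    return [min(math.hypot(lx - tx, ly - ty) for lx, ly in lymphocytes) for tx, ty in tumor_cells]

def neighbors_within(tumor_cells, lymphocytes, radius):
    """Number of lymphocytes within `radius` of each tumor cell."""
    return [
        sum(1 for lx, ly in lymphocytes if math.hypot(lx - tx, ly - ty) <= radius)
        for tx, ty in tumor_cells
    ]

# Made-up centroids (micrometers)
tumor = [(0.0, 0.0), (100.0, 0.0)]
pdl1_lymphocytes = [(10.0, 0.0), (0.0, 20.0), (300.0, 300.0)]

counts = neighbors_within(tumor, pdl1_lymphocytes, radius=25.0)
print(nearest_distances(tumor, pdl1_lymphocytes))  # [10.0, 90.0]
print(counts)                                       # [2, 0]
# Fraction of tumor cells with at least one nearby lymphocyte
print(sum(c > 0 for c in counts) / len(counts))     # 0.5
```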

Turning spatial biology into an even better tool for finding biomarkers will depend on several improvements. One, according to Bloom, is the ability to directly identify a wide range of cells and cell states in samples stained with hematoxylin and eosin (H&E). “Deep learning, based on a pathologist’s annotations, is too limiting, restricting observations to easily identified cell types like tumor cells and lymphocytes,” he says.

These spatial-biology tools of tomorrow must also identify many more of the cells in a sample. “Currently, many cells are simply not classified because they are not recognized as an object or cells are misclassified because overlapped cells are identified as a single object,” Bloom explains.
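Why touching or overlapping cells end up counted as one object is easy to see with the simplest segmentation step, connected-component labeling of a binary foreground mask. In the sketch below (made-up masks, 4-connectivity), two separated cells count as two objects, but the same two cells count as one once they share an edge. Approaches such as watershed splitting or instance-segmentation networks are commonly used to separate such clumps; the sketch does not attempt that.

```python
from collections import deque

def count_objects(mask):
    """Count 4-connected foreground regions in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            count += 1  # new object: flood-fill it so it is counted only once
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

separate = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
touching = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
print(count_objects(separate), count_objects(touching))  # 2 1
```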

TissueCypher from Castle Biosciences uses AI to analyze samples from patients with Barrett’s esophagus to assess their risk of esophageal cancer.

These tools should also be optimized to analyze slides labeled with various stains and probes. Here, Bloom says, these tools should “quantify and localize the numerous stains that can be applied to pathology slides, including immunohistochemistry and in situ hybridization.”

Other companies also apply spatial biology to biomarkers of cancer. For example, Robert Cook, PhD, senior vice president of research and development at Texas-based Castle Biosciences, describes the company’s TissueCypher as “the first AI-driven precision medicine test designed to predict a Barrett’s esophagus (BE) patient’s risk of developing esophageal cancer.” With this tool, immunofluorescence-based spatialomics provides early indicators of progression risk from tissue biopsies. Cook adds: “AI synthesizes this information into an actionable risk score, risk class, and five-year risk of progression.”

Turning information into knowledge

It’s one thing for AI-based tools to dig out information from data, and another to turn that information into knowledge—actionable information that can be used to expand the understanding of the biology of cancer and then improve patient care.

Wisecube, headquartered in Kirkland, Washington, might help with that through its AI-based knowledge graphs, which it describes as “a structured representation of interconnected knowledge that captures relationships and semantics between entities, enabling advanced reasoning, inference, and data integration.”¹⁴ The company adds: “Using knowledge graphs for biomarker discovery and validation can enable a more accurate and meaningful interpretation of biomarker findings.”
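At its simplest, a knowledge graph is a set of (subject, relation, object) triples, and an explanation for why a biomarker matters can be read off as a path through those triples. The sketch below uses hypothetical entities and relation names (not Wisecube’s schema or API) and a breadth-first search that returns the shortest chain of triples linking a biomarker to a cancer, or `None` if no chain exists.

```python
from collections import defaultdict, deque

# Hypothetical triples for illustration only
TRIPLES = [
    ("BIOMARKER_X", "encoded_by", "GENE_1"),
    ("GENE_1", "participates_in", "PATHWAY_P"),
    ("PATHWAY_P", "dysregulated_in", "CANCER_C"),
    ("GENE_2", "associated_with", "DISEASE_D"),
]

def build_graph(triples):
    """Adjacency list: subject -> list of (relation, object)."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph

def explain_link(graph, start, goal):
    """Shortest chain of triples from start to goal (breadth-first search), or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while parent[node] is not None:
                prev, relation = parent[node]
                path.append((prev, relation, node))
                node = prev
            return list(reversed(path))
        for relation, nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, relation)
                queue.append(nxt)
    return None

graph = build_graph(TRIPLES)
for triple in explain_link(graph, "BIOMARKER_X", "CANCER_C"):
    print(triple)
```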

Such a meaningful interpretation is crucial to applying new biomarkers. Scientists need to know what the biomarkers tell them about cancer and how that information can be used to better predict, treat, and eventually prevent cancer. AI-based tools will not produce magical cures for all cancers, but AI is a powerful approach for the oncology community. Only time will tell just how powerful it will be in identifying new cancer-related biomarkers.

1. Mansur, A., Vrionis, A., Charles, J.P., et al. The role of artificial intelligence in the detection and implementation of biomarkers for hepatocellular carcinoma: outlook and opportunities. Cancers (Basel), 15(11): 2928 (2023).
2. Rynazal, R., Fujisawa, K., Shiroma, H., et al. Leveraging explainable AI for gut microbiome-based colorectal cancer classification. Genome Biology, 24: 21 (2023).
3. Dwivedi, K., Rajpal, A., Rajpal, S., et al. An explainable AI-driven biomarker discovery framework for non-small cell lung cancer classification. Computers in Biology and Medicine, 153: 106544 (2023).
4. Çalışkan, M., Tazaki, K. AI/ML advances in non-small cell lung cancer biomarker discovery. Frontiers in Oncology, 13: 1260374 (2023).
5. Yagin, B., Yagin, F.H., Colak, C., et al. Cancer metastasis prediction and genomic biomarker identification through machine learning and eXplainable artificial intelligence in breast cancer research. Diagnostics, 13(21): 3314 (2023).
6. Mehmood, A., Nawab, S., Jin, S., et al. Ranking breast cancer drugs and biomarkers identification using machine learning and pharmacogenomics. ACS Pharmacology and Translational Science, 6(3): 399–409 (2023).
7. Zhang, C., Xu, J., Tang, R., et al. Novel research and future prospects of artificial intelligence in cancer diagnosis and treatment. Journal of Hematology & Oncology, 16: 114 (2023).
8. Mathema, V.B., Sen, P., Lamichhane, S., et al. Deep learning facilitates multi-data type analysis and predictive biomarker discovery in cancer precision medicine. Computational and Structural Biotechnology Journal, 21: 1372–1382 (2023).
9. Mathema, V.B., Duangkumpha, K., Wanichthanarak, K., et al. CRISP: a deep learning architecture for GC × GC–TOFMS contour ROI identification, simulation and analysis in imaging metabolomics. Briefings in Bioinformatics, 23(2): bbab550 (2022).
10. Qi, Y., Yang, L., Liu, B., et al. Accurate diagnosis of lung tissues for 2D Raman spectrogram by deep learning based on short-time Fourier transform. Analytica Chimica Acta, 1179: 338821 (2021).
11. Qi, Y., Yang, L., Liu, B., et al. Highly accurate diagnosis of lung adenocarcinoma and squamous cell carcinoma tissues by deep learning. Spectrochimica Acta Pt. A: Molecular and Biomolecular Spectroscopy, 265: 120400 (2022).
12. Qi, Y., Zhang, G., Yang, L., et al. High-precision intelligent cancer diagnosis method: 2D Raman figures combined with deep learning. Analytical Chemistry, 94(17): 6491–6501 (2022).
13. Qi, Y., Liu, Y., Lou, J. Recent application of Raman spectroscopy in tumor diagnosis: from conventional methods to artificial intelligence fusion. PhotoniX, 4: 22 (2023).
14. Sajid, H. Knowledge graphs in biomarker discovery & validation (2023).

Mike May is a freelance writer and editor with more than 30 years of experience. He earned an MS in biological engineering from the University of Connecticut and a PhD in neurobiology and behavior from Cornell University. He worked as an associate editor at American Scientist, and he is the author of more than 1,000 articles for clients that include GEN, Nature, Science, Scientific American, and many others. In addition, he served as the editorial director of many publications, including several Nature Outlooks and Scientific American Worldview.
