To extract the most information from rapidly expanding multi-omics datasets, scientists must integrate them. But combining information from genomics, transcriptomics, proteomics, and other omics creates an immense analytical challenge. Taking on that task, however, is well worth the effort. As Nina Gonzaludo, PhD, senior manager of market development for pharmacogenomics and human leukocyte antigens at California-based Pacific Biosciences, says, “Multi-omics approaches offer a more comprehensive view of biology and allow us to better understand the underlying connections and flow of information from the genome to the transcriptome, proteome, and beyond.”

Davide Chicco, PhD
Università di Milano-Bicocca

In essence, multi-omics approaches analyze biological samples in different ways that provide diverse types of information. As Davide Chicco, PhD, an assistant professor at Università di Milano-Bicocca in Milan, Italy, puts it: “Having data from different omics sources means having multiple pieces of information regarding the same biological phenomenon, but from different perspectives.”

In an article about ways to steer clear of the pitfalls that exist in the integration of multi-omics datasets, Chicco and his colleagues used a photographic analogy.1 As Chicco explains it: “Different photographs of the same subject taken from different angles can produce a more thorough, complete view on the subject itself.” Likewise, he says, “Data from different omics sources, integrated together, can unveil enriched information and relevant data trends that otherwise would go unnoticed.”

Given this potential to reveal more of biology’s unknown processes, Chicco says that “incorporating data of genomics, proteomics, metabolomics, metagenomics, phenomics, transcriptomics, epigenomics, and other omics areas is always a good idea, at least in principle, because the whole is greater than the sum of its parts here.”

Wet-lab challenges

Integrating diverse forms of data is rarely easy, especially in multi-omics, which creates a wide range of data from various techniques. “Multi-omics approaches have been hampered by requiring multiple assays, sequencing runs, platforms or technologies, making such studies costly and labor intensive,” Gonzaludo says. “Further, analysis approaches for multi-omics are still in development, as different platforms and technologies offer varying levels of resolution, data standards, and bioinformatics methods.”

Nina Gonzaludo, PhD
Pacific Biosciences

Like most forms of technology, approaches to omics come with pros and cons. “Long-read sequencing can give high resolution into the human genome, as well as isoform-level information of the transcriptome,” Gonzaludo says. “However, mapping genome-wide changes to isoforms is still complex and can be computationally intensive.”

Alternatively, scientists can focus on specific genes or disease- or phenotype-related pathways, which simplifies the task. Here, though, Gonzaludo points out that “large-scale integration is still a work in progress.”

Overall, the task grows more complex as scientists delve deeper into omics. For example, Gonzaludo says, “Adding additional layers of information, like with spatial sequencing or proteomics, can offer deeper insights into biology, but integration remains a key challenge.”

Analytical challenges

Even after scientists run all of the omics experiments that they like and collect giga- or terabytes of data, the information can only be integrated if it’s stored in an accessible and usable way. That’s rarely the case.

The first challenge is data format. With data coming from different sources and collected with a range of technologies, “they need thorough standardization steps and harmonization steps before usage,” Chicco explains. “Sometimes researchers and developers overlook this phase, which is a mistake.” In fact, it’s a big mistake. “The data preprocessing, including standardization and harmonization, is often the most important step of a successful multi-omics study,” he adds.
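
Chicco does not name specific tools here, so what follows is only a minimal sketch of what standardization and harmonization can mean in practice, assuming each omics layer arrives as a table keyed by sample ID. It keeps only the samples measured in every layer and rescales each feature to mean 0 and standard deviation 1, so that layers recorded in very different units become comparable. The function names, gene names, and values are hypothetical.

```python
import statistics

def zscore(values):
    """Rescale one feature across samples to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:  # constant feature: carries no information across samples
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]

def harmonize(layers):
    """Align omics layers on shared samples, then z-score each feature.

    `layers` maps a layer name (e.g. "rna") to {sample_id: {feature: value}}.
    Returns the sorted list of shared sample IDs and the standardized layers.
    """
    shared = sorted(set.intersection(*(set(layer) for layer in layers.values())))
    if not shared:
        return [], {}
    result = {}
    for name, layer in layers.items():
        # Keep only features measured in every shared sample of this layer.
        features = sorted(set.intersection(*(set(layer[s]) for s in shared)))
        standardized = {s: {} for s in shared}
        for feature in features:
            scaled = zscore([layer[s][feature] for s in shared])
            for s, value in zip(shared, scaled):
                standardized[s][feature] = value
        result[name] = standardized
    return shared, result

# Hypothetical layers: S1 lacks protein data and S4 lacks RNA data.
layers = {
    "rna": {"S1": {"TP53": 5.0}, "S2": {"TP53": 7.0}, "S3": {"TP53": 9.0}},
    "protein": {"S2": {"p53": 1.0}, "S3": {"p53": 3.0}, "S4": {"p53": 2.0}},
}
shared, std = harmonize(layers)
print(shared)            # ['S2', 'S3']
print(std["rna"]["S2"])  # {'TP53': -1.0}
```

Z-scoring is just one possible choice. Real pipelines usually also have to handle missing values, batch effects, and mapping identifiers between platforms, all of which this sketch leaves out.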

In addition, other scientists must be able to understand the data, which is not always possible. “The main problem I notice in multi-omics data integrated platforms is that often their data curators build them from their perspective, and not from the perspective of the users,” Chicco says. “Basically, the omics data platform developers often spend months or years to prepare a data resource that is perfectly usable and understandable—only by themselves.” Instead, Chicco notes that “omics data integration professionals should always work from the perspective of the users, by supposing how final users would take advantage of the resource.”

Addressing the issues

How can scientists resolve the key issues—format and perspective—that Chicco raises? One is easier to address than the other.

To resolve the issue of disparate formats, various open-source tools can be used to efficiently harmonize and standardize multi-omics data. “A person in charge of omics data integration should study them carefully and decide which tools to use appropriately,” Chicco says. “Moreover, in this step, best practices of computer science and bioinformatics should be followed, too: starting with a small data segment just to test the proper functioning of the software tools, using only automated scripts version-controlled with Git and GitHub and avoiding manual steps, and documenting everything.”
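
The advice to start with a small data segment and avoid manual steps can be built directly into a script. Below is a minimal sketch, assuming a command-line pipeline written in Python; `load_sample_ids` and `process` are hypothetical placeholders for real loading and harmonization code, and `--max-samples` is an illustrative flag, not part of any tool Chicco mentions.

```python
import argparse
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("integration")

def load_sample_ids():
    """Hypothetical loader; a real script would read versioned input files."""
    return [f"S{i:03d}" for i in range(1, 501)]

def process(sample_ids):
    """Hypothetical processing step; stands in for real harmonization code."""
    return {sid: len(sid) for sid in sample_ids}

def main(argv=None):
    parser = argparse.ArgumentParser(description="Run the integration pipeline.")
    parser.add_argument("--max-samples", type=int, default=None,
                        help="Process only the first N samples (smoke test).")
    args = parser.parse_args(argv)

    ids = load_sample_ids()
    if args.max_samples is not None:
        ids = ids[: args.max_samples]
        log.info("Smoke test on %d samples", len(ids))
    results = process(ids)
    log.info("Processed %d samples", len(results))
    return results

if __name__ == "__main__":
    main()
```

Running the hypothetical `python pipeline.py --max-samples 5` exercises every step on five samples before committing to a full run. Keeping the script, and the exact command used, in a Git repository makes each run reproducible without manual steps.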

As anyone who follows politics knows, changing people’s perspectives is a complex challenge. The same is true of scientists’ perspectives on bioinformatic data. “Only an open mind, emotional intelligence, and professional empathy can help get the proper mindset to understand this issue,” Chicco says. “Awareness is the first step: integrated-data curators should comprehend that their developer-like mindset has surely influenced how the omics data platform was designed and arranged, and that they need to get a usage overview from external users.”

To produce such an overview of usability, a scientist could contact colleagues who did not participate in creating the data-integration platform. That way, a more unbiased assessment is possible. “Ask them to evaluate the platform’s usability, similarly to what is done through website usability surveys,” Chicco says. “Having an external usability survey about the omics data–integration platform would be extremely beneficial.”
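
Chicco compares this to website usability surveys. One widely used instrument of that kind is the System Usability Scale (SUS), a 10-item questionnaire scored from 0 to 100. Applying it to an omics platform is an assumption here, not Chicco’s specific recommendation, and the ratings below are hypothetical. A minimal scoring sketch:

```python
def sus_score(responses):
    """Score one System Usability Scale questionnaire (0 to 100).

    `responses` holds the ten item ratings in order, each from 1 (strongly
    disagree) to 5 (strongly agree). Odd-numbered items are positively worded
    and contribute (rating - 1); even-numbered items are negatively worded and
    contribute (5 - rating). The summed contributions are scaled by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("ratings must be integers from 1 to 5")
    total = 0
    for item, rating in enumerate(responses, start=1):
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5

# Two hypothetical external testers rating an omics data platform.
ratings = [
    [4, 2, 4, 1, 5, 2, 4, 2, 4, 2],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
]
print([sus_score(r) for r in ratings])  # [80.0, 50.0]
```

Averaging scores across several testers who did not build the platform gives the kind of outside view Chicco describes; free-text comments usually matter as much as the number.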

Is AI the answer?

It’s rare these days to discuss high-level analysis of anything without considering artificial intelligence (AI), and multi-omics data integration is no exception: AI is commonly used here, too.

Vasileios Stathias, PhD
Sylvester Comprehensive Cancer Center

For example, Vasileios Stathias, PhD, assistant director of data science at the University of Miami’s Sylvester Comprehensive Cancer Center in Florida, says, “We’re getting inspired by the advances in AI, especially multimodal AI, to find more creative ways to leverage and utilize information.” One of those ways is testing different AI architectures to automatically integrate various forms of data.

Picking the best AI-based approach, though, is difficult. “We’re in an information overload right now,” Stathias says. “There are so many AI models out there that it’s very hard to identify the right path.” To help with picking that path, Stathias suggests having standardized tests to assess the performance of the large number of AI models.
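
Stathias does not spell out what such a standardized test would look like. A minimal sketch, assuming a classification task, is to score every candidate model on the same fixed train/test split with the same metric. The two toy models below (a majority-class baseline and a nearest-centroid classifier) and the data are hypothetical stand-ins for the AI models he has in mind.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(train_X, train_y, test_X):
    """Always predict the most common training label (a floor any model should beat)."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)

def nearest_centroid(train_X, train_y, test_X):
    """Assign each test sample to the class whose mean feature vector is closest."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(centroids, key=lambda lab: sq_dist(x, centroids[lab])) for x in test_X]

def benchmark(models, train_X, train_y, test_X, test_y):
    """Score every model on the same split with the same metric; best first."""
    scores = {
        name: accuracy(test_y, fit_predict(train_X, train_y, test_X))
        for name, fit_predict in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

train_X = [[0, 0], [0, 1], [5, 5], [5, 6], [6, 5]]
train_y = ["a", "a", "b", "b", "b"]
test_X = [[0, 0.5], [5.5, 5.5]]
test_y = ["a", "b"]

models = {"majority": majority_baseline, "nearest_centroid": nearest_centroid}
for name, score in benchmark(models, train_X, train_y, test_X, test_y):
    print(f"{name}: {score:.2f}")
```

The point of the design is that the split, the metric, and the baseline are fixed once, so any new model is compared on equal terms rather than on whatever data its authors happened to choose.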

Moreover, AI does not give scientists a magical tool that can be applied without looking deeper into the potential pitfalls. Chicco believes that AI-based tools “should be employed with caution in this context: if correctly used, they can be powerful, but many of them lack interpretability.” Here, he recommends using tools built from explainable machine-learning models, like decision trees or k-nearest neighbors, because their “behavior can be interpreted by anyone,” he says. “At the beginning of a multi-omics data–integration project, I’d suggest avoiding black box models, such as artificial neural networks and deep learning-related stuff, that can obtain high performances but have a quite mysterious behavior, mathematically speaking.”
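
To illustrate why k-nearest neighbors counts as interpretable, the sketch below returns not only a prediction but also the specific training samples that produced it. The sample IDs, two-feature vectors, and labels are hypothetical; a real multi-omics model would use many more features and a tuned value of k.

```python
import math
from collections import Counter

def knn_explain(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest training samples.

    `train` is a list of (sample_id, feature_vector, label) tuples. Returns the
    prediction together with each neighbor's ID, label, and Euclidean distance,
    so the decision can be traced back to concrete samples.
    """
    ranked = sorted(train, key=lambda row: math.dist(row[1], query))
    neighbors = ranked[:k]
    # most_common orders equal counts by first appearance, i.e. the closer neighbor wins ties.
    votes = Counter(label for _, _, label in neighbors)
    prediction = votes.most_common(1)[0][0]
    evidence = [(sid, label, round(math.dist(vec, query), 3)) for sid, vec, label in neighbors]
    return prediction, evidence

train = [
    ("P1", [1.0, 2.0], "tumor"),
    ("P2", [1.2, 1.8], "tumor"),
    ("P3", [5.0, 5.0], "normal"),
    ("P4", [5.2, 4.9], "normal"),
    ("P5", [1.1, 2.2], "tumor"),
]
prediction, evidence = knn_explain(train, [1.0, 2.1])
print(prediction)  # tumor
for row in evidence:
    print(row)
```

Anyone reviewing the result can check the listed neighbors directly, which is the kind of transparency Chicco contrasts with deep neural networks.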

Maybe, though, the intelligence should be less artificial to get the most from integrating data from multi-omics studies. As Chicco says, “Rather than focusing on artificial intelligence, which is a trendy term nowadays, I’d recommend focusing on human intelligence.” As an example, he points back to building a data-integration platform from the perspective of a user.

Making use of multi-omics cancer data

Efforts to combine various forms of omics data to better understand and treat cancer started more than a decade ago. In 2011, for example, the U.S. National Cancer Institute created its Clinical Proteomic Tumor Analysis Consortium (CPTAC) to integrate genomic and proteomic data from a collection of cancers.2 Today, cancer scientists combine even wider varieties of data. According to Stathias, a CPTAC collaborator,3 “Over the past five or six years, the advent of new technologies gives us the opportunity to really delve into the intricate molecular details of the cancer cell.”

In particular, Stathias points to the growing number of data types. Advances in sequencing technology also let users choose between bulk and single-cell methods and even capture the locations of those cells.4 “So, instead of aggregating all the cells together and getting the average readout, now we can have detailed cell-level information together with spatial information on how different cell subpopulations are talking to each other,” Stathias says.
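
The difference between a bulk “average readout” and cell-level data shows up even in a toy calculation. The expression values below are invented for illustration, and treating a bulk measurement as a simple mean over cells is a simplifying assumption: two subpopulations with very different levels of one gene produce a bulk average that matches neither.

```python
from statistics import fmean

# Hypothetical expression of one gene in six cells from two subpopulations.
cells = [
    ("T cell", 0.0), ("T cell", 0.0), ("T cell", 0.0),
    ("tumor cell", 9.0), ("tumor cell", 9.0), ("tumor cell", 9.0),
]

def bulk_readout(cells):
    """Approximate a bulk assay as the average over all cells in the sample."""
    return fmean(value for _, value in cells)

def per_population(cells):
    """With single-cell data, average within each cell type instead."""
    groups = {}
    for cell_type, value in cells:
        groups.setdefault(cell_type, []).append(value)
    return {cell_type: fmean(values) for cell_type, values in groups.items()}

print(bulk_readout(cells))    # 4.5: a level no individual cell actually shows
print(per_population(cells))  # {'T cell': 0.0, 'tumor cell': 9.0}
```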

Although much of the clinical analysis of tissue samples has always relied on histology slides, Stathias believes that such slides can reveal even more information. By using AI to analyze data from pathology slides and multi-omics data, Stathias says, “we could detect specific cancer mutations just by looking at the pathology slide data.” This example reveals how integrating different types of data—complicated as that is—could produce simpler and faster methods of clinical diagnostics. As Stathias notes, “Maybe in the future, you wouldn’t even need to sequence the patient for initial actionable information.”

A new type of twin

In many kinds of manufacturing, developers use a digital twin: an in-silico model of a process. A digital twin can be used in many ways, such as developing a process or analyzing and controlling a manufacturing line. A twist on the digital-twin concept might change how scientists work with multi-omics data. That’s just what William Kearns, PhD, CEO and chief scientific officer at Genzeva and LumaGene, hopes to accomplish.

William Kearns, PhD
Genzeva and LumaGene

“I’ve always been a crazy geneticist,” Kearns says. “I’ve always believed in genetics and DNA sequencing and things like that, but there’s more there than we see.” He hopes to unveil that hidden information with biomimetic digital twins, which he describes as an AI-based method that is “thinking like a human brain.”

To analyze samples from patients with endometriosis, Kearns has employed methods that incorporate clinical exome sequencing, phenotype-driven variant analysis, and tech innovator RYLTI’s Knowledge Engineering Biomimetic AI Platform and Digital Twin Ecosystem. According to Kearns, the patent-pending technology is reducing the time and cost of research in drug discovery and development.

As Kearns explains, limitations in traditional AI-based methods—particularly normalizing the data, excluding outliers, and the need for a training dataset—drove him to explore biomimetic digital twins. Here, all the data can be included, and there is no need for training.

Kearns and a team of colleagues—including ones from Brigham and Women’s Hospital of Harvard University, QIAGEN Digital Insights, and RYLTI—used a biomimetic digital twin to analyze phenotypic information and exome sequencing in samples from patients with endometriosis or endometrial-related diseases.5 “These are complex, multifactorial disorders where the pathophysiology of it is poorly understood,” Kearns says.

Faced with the many DNA variants related to endometrial diseases, Kearns says, “it was just overwhelming.” So, he started looking at variants of unknown significance (VUSs) associated with endometriosis. From these, the biomimetic digital twin revealed variants that existed only in samples from patients with an endometrial-related disease. “We believe that these could be potential biomarkers for diagnostic purposes,” he says.

Tackling biomedical networks

Integrating omics can also be applied to various networks with a biological aspect. One of those is network medicine, which Joyce Hu, PhD, head of cancer biology at Sonata Therapeutics, described as: “biology-based, integrative approaches aimed at understanding the topology of complex disorders and the dynamic interplay of their various components, identifying cascades of causes and effects, and deciphering crosstalk among cells to suggest novel and more effective therapeutic approaches.”6

As one example, Feixiong Cheng, PhD, an expert in multi-omics at the Cleveland Clinic’s Lerner Research Institute, and his colleagues applied network medicine to potential treatments for amyotrophic lateral sclerosis (ALS). “To date, the challenge to establishing effective treatment for ALS remains formidable, partly due to inadequate translation of existing human genetic findings into actionable ALS-specific pathobiology for subsequent therapeutic development,” Cheng and his colleagues reported.7 Through data integration, however, these scientists developed what they described as “a network-based multi-omics framework to identify potential drug targets and repurposable treatments for ALS and other neurodegenerative disease if broadly applied.”
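
The ALS framework’s scoring details are not described here. As a hedged illustration of how network medicine can relate a drug to a disease, the sketch below computes one widely used measure, the “closest” network distance: for each of a drug’s targets, find the shortest-path distance to the nearest disease gene on a protein interaction network, then average over the targets. Whether the ALS study uses exactly this measure is an assumption. Published analyses typically also convert the distance into a z-score against randomized gene sets, which is omitted, and the five-gene chain is a toy network.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from (node, node) interaction pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def bfs_distances(graph, source):
    """Shortest-path hop counts from `source` to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def closest_distance(graph, targets, disease_genes):
    """d_c = mean over drug targets t of min over disease genes s of d(s, t).

    Targets that cannot reach any disease gene are skipped; if none can,
    the drug is treated as infinitely far from the disease module.
    """
    per_target = []
    for t in targets:
        dist = bfs_distances(graph, t)
        reachable = [dist[s] for s in disease_genes if s in dist]
        if reachable:
            per_target.append(min(reachable))
    return sum(per_target) / len(per_target) if per_target else float("inf")

# Toy interaction chain A-B-C-D-E with a two-gene "disease module" {A, B}.
ppi = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
disease = {"A", "B"}
print(closest_distance(ppi, ["C"], disease))       # 1.0
print(closest_distance(ppi, ["C", "E"], disease))  # 2.0
```

Smaller distances suggest a drug acts closer to the disease module, which is the intuition behind ranking candidates for repurposing this way.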

Even the American Board of Precision Medicine aims to “transform the healthcare paradigm from fragmented, disease-centric, ‘one-size-fits-all’ sick care to one that incorporates a multi-omics, systems biology, and network medicine approach for proactive and personalized whole systems care.”8

Moreover, the integration of data from multi-omic studies can reveal more about biological networks. As one example, Rong Fan, PhD, the Harold Hodgkinson Professor of Biomedical Engineering at the Yale School of Medicine, and his colleagues pointed out that multi-omic analysis of single-cell data can be used to “elucidate diverse biological networks for each cell type.”9

Plus, Marieke Kuijjer, PhD, head of the computational biology and systems medicine group at the University of Oslo in Norway, and her colleagues reviewed the application of single-sample networks that combine a range of omic datasets. For one thing, these scientists noted that “approaches that measure multi-omics data in the same cell enable the identification of direct links between various omics data types.”10 Because this is a relatively new field, Kuijjer and her colleagues argued that “modeling networks based on single-cell data should address challenges with sparsity, variability in sample size, and heterogeneity and how to accurately define cell types used for network modeling.”
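
One single-sample method, LIONESS, developed by Kuijjer and colleagues, estimates an individual sample’s network by comparing a network built from all N samples with one built after leaving that sample out: e_q = N(e_all − e_without_q) + e_without_q. A minimal sketch, assuming Pearson correlation as the underlying network method (LIONESS itself can wrap other methods) and hypothetical gene names and values:

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_network(data):
    """Aggregate network: one Pearson edge weight per gene pair, across samples.

    `data` maps gene name -> list of values, one per sample, in the same order.
    """
    genes = sorted(data)
    return {
        (g1, g2): pearson(data[g1], data[g2])
        for i, g1 in enumerate(genes)
        for g2 in genes[i + 1:]
    }

def lioness(data, q):
    """Single-sample network for sample index q:
    e_q = N * (e_all - e_without_q) + e_without_q
    """
    n = len(next(iter(data.values())))
    e_all = correlation_network(data)
    e_without_q = correlation_network({g: v[:q] + v[q + 1:] for g, v in data.items()})
    return {edge: n * (e_all[edge] - e_without_q[edge]) + e_without_q[edge] for edge in e_all}

# Hypothetical expression of three genes in five samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_C": [5.0, 3.0, 4.0, 1.0, 2.0],
}
for edge, weight in lioness(expr, q=0).items():
    print(edge, round(weight, 3))
```

Because the single-sample edge weights are extrapolated, they can fall outside the usual −1 to 1 range of a correlation; they are meant to be compared across samples rather than read as correlations themselves.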

Expanding data access

Given the powerful benefits that could come from integrating multi-omics data, the question becomes: How can scientists improve this field? Chicco offers a prescription.

“I think it would be useful if researchers and professionals working on this topic worldwide decided to release their software in well-known, international bioinformatics projects—such as Galaxy, Bioconductor, or Bioconda—rather than releasing them on their own, isolated, forgotten websites,” Chicco says.

He adds pragmatic reasoning to this suggestion. “Often researchers of a particular university spend years to prepare a new integrated multi-omics data platform, and eventually they release it as a stand-alone website,” he says. “Sometimes they also produce a peer-reviewed publication about it but, since the website does not have much visibility, few users discover it and therefore employ it.” Over time, the platform practically disappears. “A few years after the release, the platform gets obsolete, and its website then gets dismissed,” Chicco explains.

That disappearance could be avoided. If researchers switched from releasing a platform on a stand-alone resource to one of the international projects that Chicco mentions, “it would be a completely different story,” he says. Then, a platform would more easily live beyond being published.

Although it takes time and energy to make an integration platform more widely available, Chicco believes that the effort is worthwhile. As he says, “Having an omics data integration platform included in one of these resource catalogues would have a significant impact on the number of users who could take advantage of it.”

Optimizing the knowledge gained from multi-omics data depends on expanding access. “An African proverb says: ‘If you want to go fast, go alone; if you want to go far, go together,’” Chicco says. “It is true for bioinformatics as well.”


Read more:

  1. Chicco, D., Cumbo, F., Angione, C. Ten quick tips for avoiding pitfalls in multi-omics data integration analyses. PLoS Computational Biology 19(7): e1011224 (2023).
  2. U.S. National Cancer Institute. CPTAC.
  3. Wang, L.-B., Karpova, A., Gritsenko, M.A., et al. Proteogenomic and metabolomic characterization of human glioblastoma. Cancer Cell 39(4): 509–528.e20 (2021).
  4. May, M. Sequencing spatially-located single cells. Inside Precision Medicine Apr: 7–11 (2024).
  5. Kearns, W.G., Stamoulis, J.G., Glick, J., et al. The applications of knowledge engineering via the use of a biomimetic digital twin ecosystem, phenotype-driven variant analysis, and exome sequencing to understand the molecular mechanisms of disease. The Journal of Molecular Diagnostics (2024). doi: 10.1016/j.jmoldx.2024.03.004.
  6. Hu, J. Network medicines. Journal of Translational Medicine 21: 772 (2023).
  7. Yu, M., Xu, J., Dutta, R., et al. Network medicine informed multi-omics integration identifies drug targets and repurposable medicines for Amyotrophic Lateral Sclerosis. bioRxiv (2024). doi: 10.1101/2024.03.27.586949.
  8. American Board of Precision Medicine.
  9. Baysoy, A., Bai, Z., Satija, R., Fan, R. The technological landscape and applications of single-cell multi-omics. Nature Reviews Molecular Cell Biology 24: 695–713 (2023).
  10. De Marzio, M., Glass, K., Kuijjer, M.L. Single-sample network modeling on omics data. BMC Biology 21: 296 (2023).


Mike May is a freelance writer and editor with more than 30 years of experience. He earned an MS in biological engineering from the University of Connecticut and a PhD in neurobiology and behavior from Cornell University. He worked as an associate editor at American Scientist, and he is the author of more than 1,000 articles for clients that include GEN, Nature, Science, Scientific American, and many others. In addition, he served as the editorial director of many publications, including several Nature Outlooks and Scientific American Worldview.
