A new software tool infers continental ancestry directly from tumor DNA and RNA. Developed by researchers at Cold Spring Harbor Laboratory (CSHL), the software is designed to help reveal links between cancer and race or ethnicity. A paper describing the work is published in Cancer Research.
“We know epidemiologically there is a connection between ancestry and/or race and cancer,” said senior author Alexander Krasnitz of Cold Spring Harbor Laboratory. For example, women of African ancestry tend to have higher rates of triple-negative breast cancer than women in other populations. “Cancers can look different in patients of different ancestry,” he added. For that reason, he and his team sought a mathematical method to mine the vast amount of cancer research data already accumulated in databases and hospital repositories for ancestral genetic clues to cancer types.
This is not the first attempt to link cancer type and ancestry. “But what was done previously is not necessarily optimum for ancestry inference because cancer distorts the data, it distorts the genome,” said first author Pascal Belleau of CSHL. “That creates a lot of difficulty and that’s what we hope this tool tries to overcome; to develop a tool to find the ancestry from the data sample.”
For their tool, known as Robust Ancestry Inference using Data Synthesis (RAIDS), the team relied on cancer-derived data to determine a patient’s ancestry, even though the cancer may have altered the genome. To do so, they studied several types of molecular data from cancer patients in four different databases, including whole-genome sequences, whole-exome sequences, targeted gene panels, and RNA sequences. The tool also draws on the 1000 Genomes data set, which includes genomes from individuals of African, East Asian, European, American, and South Asian ancestry. The molecular data from the cancer databases and from 1000 Genomes were grouped into profiles that could be compared to infer continental-level global ancestry.
At the core of the method is data synthesis: the ancestral background of the profiled patient is replaced with that of any number of individuals of known ancestry. Because the framework applies to multiple profiling platforms, it can assess the performance of inference specifically for a given molecular profile and separately for each continental-level ancestry.
“This ability extends to all ancestries, including those without statistically sufficient representation in the existing cancer data,” the authors wrote.
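The data synthesis idea can be illustrated with a small Python sketch. It is not the RAIDS implementation. It assumes a simulated reference panel with made-up allele frequencies, a simple nearest-centroid classifier standing in for the actual inference step, and hypothetical helper names (`simulate_panel`, `synthesize`, `per_ancestry_accuracy`). A donor of known ancestry is viewed through the patient's profile (which sites were covered and how noisy they are), and accuracy is then tallied separately for each ancestry.

```python
import random

# Labels follow the five 1000 Genomes groups named in the article.
POPULATIONS = ["African", "East Asian", "European", "American", "South Asian"]


def simulate_panel(rng, n_sites=200, n_per_pop=40):
    """Toy stand-in for a reference panel: {population: [genotype vectors]}.

    Genotypes are 0, 1 or 2 copies of an allele, drawn from made-up
    population-specific allele frequencies (not real data).
    """
    panel = {}
    for pop in POPULATIONS:
        freqs = [rng.uniform(0.05, 0.95) for _ in range(n_sites)]
        panel[pop] = [
            [int(rng.random() < f) + int(rng.random() < f) for f in freqs]
            for _ in range(n_per_pop)
        ]
    return panel


def synthesize(donor, observed, error_rate, rng):
    """Hybrid profile: the donor's genotypes seen through the patient's profile.

    `observed` marks which sites the patient's assay covered; `error_rate`
    is an assumed stand-in for tumor-induced distortion, randomizing some
    of the observed genotypes.
    """
    profile = []
    for genotype, seen in zip(donor, observed):
        if not seen:
            profile.append(None)  # site not covered in the patient's profile
        elif rng.random() < error_rate:
            profile.append(rng.choice([0, 1, 2]))
        else:
            profile.append(genotype)
    return profile


def centroids(panel):
    """Mean genotype per site for each population."""
    return {
        pop: [sum(site) / len(site) for site in zip(*people)]
        for pop, people in panel.items()
    }


def classify(profile, cents):
    """Assign the population whose centroid is nearest over observed sites."""
    def distance(center):
        return sum((g - c) ** 2 for g, c in zip(profile, center) if g is not None)
    return min(cents, key=lambda pop: distance(cents[pop]))


def per_ancestry_accuracy(panel, observed, error_rate, rng):
    """Fraction of synthetic profiles assigned to their donor's population."""
    # Half of each population trains the classifier, the other half donates.
    train = {pop: people[: len(people) // 2] for pop, people in panel.items()}
    donors = {pop: people[len(people) // 2:] for pop, people in panel.items()}
    cents = centroids(train)
    accuracy = {}
    for pop, people in donors.items():
        hits = sum(
            classify(synthesize(d, observed, error_rate, rng), cents) == pop
            for d in people
        )
        accuracy[pop] = hits / len(people)
    return accuracy


if __name__ == "__main__":
    rng = random.Random(0)
    panel = simulate_panel(rng)
    observed = [rng.random() < 0.5 for _ in range(200)]  # patient's covered sites
    for pop, acc in per_ancestry_accuracy(panel, observed, 0.1, rng).items():
        print(f"{pop}: {acc:.2f}")
```

Because every synthetic profile carries a known ancestry label, accuracy can be estimated even for a population that is rare among the actual cancer patients, which is the property the authors highlight.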
When tested on four different cancer types (pancreatic, ovarian, breast, and blood) and three molecular profiling modalities, the software inferred the continental-level ancestry of the patients. The team found that it matched their hybrid profiles to continental populations with more than 95% accuracy.
“This study demonstrates that vast amounts of existing cancer-derived molecular data are potentially amenable to ancestry-oriented studies of the disease without requiring matching cancer-free genomes or patient self-reported ancestry,” the authors noted. In addition, the team believes the tool can be applied to any molecular data from which ancestry inference is challenging.
Krasnitz and Belleau recently joined a colorectal cancer study in collaboration with Northwell Health and SUNY Downstate Medical Center. The study allows them to explore how gene mutations in colorectal cancer differ by race or ethnicity. They hope to further refine their software to infer ancestry not only for whole genomes but for every individual sequence.
“If we can identify more localized ancestries that are susceptible to different cancers or other aggressive diseases, it could help us pinpoint the specific part of the genome responsible and target it for treatment,” Belleau said.