Among the hundreds of mutations present in cancers, it is known that only a small number of these mutations contribute to the progression of the disease. Separating these so-called driver mutations of cancer from those that are inconsequential in the development of the disease—passenger mutations—is vital to identifying the genes that should be targeted for new therapeutic development.
Now, a team of scientists at the Massachusetts Institute of Technology (MIT) has developed a new computer model that rapidly scans the genomes of cancer cells to determine which mutations occur more frequently than expected and, in turn, to suggest which mutations may be driving cancer development. Currently, roughly 30% of cancers have no known, detectable driver mutations. The team's findings are published in Nature Biotechnology.
“We created a probabilistic, deep-learning method that allowed us to get a really accurate model of the number of passenger mutations that should exist anywhere in the genome,” said Maxwell Sherman, an MIT graduate and member of the research team that developed the tool. “Then we can look all across the genome for regions where you have an unexpected accumulation of mutations, which suggests that those are driver mutations.”
Identifying new driver mutations
While many driver mutations are already known, such as those in EGFR in lung cancer and BRAF in melanoma, for which targeted therapies have been developed, protein-coding genes comprise only about 2% of the genome. The remaining 98% of the genome also contains mutations that can occur in cancer cells, but the “noise” produced by passenger mutations, some of which occur at high frequency, has been a challenge for researchers looking to identify the mutations that promote cancer development.
“There has really been a lack of computational tools that allow us to search for these driver mutations outside of protein-coding regions,” said Bonnie Berger, the Simons Professor of Mathematics at MIT, head of the Computation and Biology group at the Computer Science and Artificial Intelligence Laboratory (CSAIL), and a senior author of the study. “That’s what we were trying to do here: design a computational method to let us look at not only the 2% of the genome that codes for proteins, but 100% of it.”
To tackle this problem, the MIT team applied deep neural network technology to train a computer model to seek out mutations in cancer cells that occurred more frequently than expected, using genomic data from 37 different types of cancer. Data for the project came from the Roadmap Epigenomics Project and an international dataset from the Pan-Cancer Analysis of Whole Genomes (PCAWG). The MIT model’s analysis of these data helped researchers create a map of the expected passenger mutation rate across the genome, such that the expected rate in any set of regions—down to the single base pair—can be compared to the observed mutation count anywhere across the genome.
“The really nice thing about our model is that you train it once for a given cancer type, and it learns the mutation rate everywhere across the genome simultaneously for that particular type of cancer,” Sherman said. “Then you can query the mutations that you see in a patient cohort against the number of mutations you should expect to see.”
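The comparison Sherman describes, observed mutation counts against an expected passenger count, can be illustrated with a small sketch. This is not the MIT team's method: it assumes the expected count for each region is already available (in the study it comes from the trained deep-learning model) and substitutes a plain Poisson test with a Bonferroni correction for whatever statistics the authors actually use. The region names and numbers are hypothetical.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Accumulate P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def flag_regions(regions, alpha=0.05):
    """Return names of regions with significantly more mutations than expected.

    `regions` is a list of (name, observed_count, expected_count) tuples.
    The Bonferroni correction is a simplifying assumption for illustration.
    """
    if not regions:
        return []
    threshold = alpha / len(regions)
    return [
        name
        for name, observed, expected in regions
        if poisson_upper_tail(observed, expected) < threshold
    ]


# Hypothetical cohort-level counts; expected values stand in for model output.
regions = [
    ("region_A", 3, 2.5),
    ("region_B", 25, 4.0),
    ("region_C", 1, 1.2),
]
print(flag_regions(regions))
```

With these made-up numbers, only the region whose observed count far exceeds its expected passenger count is flagged as a candidate for carrying driver mutations; regions whose counts sit close to expectation are treated as background.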
Providing new targets
Within the vast non-coding regions of the genome, the MIT team found variations that could be driving the cancers of an additional 5% to 10% of patients. One type of mutation that the researchers determined could be driving cancer is called a “cryptic splice mutation.” These mutations involve introns, spacer elements that usually get trimmed out of messenger RNA before it is translated into protein. The team found that when the introns weren’t trimmed out because of this form of mutation, their presence interrupted the activity of tumor suppressor genes, effectively eliminating this form of defense against cancer. The new research showed that cryptic splice mutations accounted for roughly 5% of the driver mutations found in tumor suppressor genes.
The researchers suggest that patients with cryptic splice mutations could be treated with antisense oligonucleotides (ASOs), which could be used to provide a patch of the correct sequence over the mutated area of the DNA.
“If you could make the mutation disappear in a way, then you solve the problem. Those tumor suppressor genes could keep operating and perhaps combat the cancer,” said Adam Yaari, one of the lead authors of the paper. “The ASO technology is actively being developed, and this could be a very good application for it.”
Another set of non-coding driver mutations the team discovered lies in the untranslated regions of some tumor suppressor genes. The tumor suppressor gene TP53, which is defective in many types of cancer, was already known to accumulate many deletions in these sequences, known as 5’ untranslated regions. The MIT team found the same pattern in a tumor suppressor called ELF3.
The researchers also used their model to investigate whether common mutations that were already known might also be driving different types of cancers. As one example, the researchers found that BRAF, previously linked to melanoma, also contributes to cancer progression in smaller percentages of other types of cancers, including pancreatic, liver, and gastroesophageal.
“That says that there’s actually a lot of overlap between the landscape of common drivers and the landscape of rare drivers. That provides opportunity for therapeutic repurposing,” Sherman said. “These results could help guide the clinical trials that we should be setting up to expand these drugs from just being approved in one cancer, to being approved in many cancers and being able to help more patients.”