Machine Learning Tool Maps Cancer Causing Mutations

Computer graphic illustration depicting a point mutation. A point mutation is a genetic mutation in which a single nucleotide base is changed in, inserted into or deleted from a sequence of DNA or RNA. Point mutations have a variety of effects on the downstream protein product, with consequences that are moderately predictable from the specifics of the mutation. These consequences can range from benign (e.g. synonymous mutations) to catastrophic (e.g. frameshift mutations) with regard to protein production, composition, and function.

Researchers based at the Institute for Research in Biomedicine in Barcelona have created a machine learning tool that can evaluate all possible mutations in a given gene and assess how likely they are to contribute to cancer development.

The same group of researchers, led by Nuria López-Bigas, head of the biomedical genomics lab at the institute, have previously developed a method to identify as many genes as possible that are responsible for causing cancer and promoting its progression and spread around the body.

The new tool, BoostDM, goes further. “It simulates each possible mutation within each gene for a specific type of cancer and indicates which ones are key in the cancer process. This information helps us to understand how a tumor is caused at the molecular level and it can facilitate medical decisions regarding the most appropriate therapy for a patient,” explained López-Bigas in a press statement.
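The first step López-Bigas describes, simulating every possible mutation in a gene, is often called in silico saturation mutagenesis. The sketch below is an illustration only, not BoostDM's code: it assumes a plain coding sequence as input, enumerates every single-nucleotide substitution, and labels each one as synonymous, missense, nonsense or stop-lost using the standard genetic code. The function name `saturation_mutagenesis` and the toy sequences are hypothetical; a tool like BoostDM would then pass each simulated mutation to a trained model to score how likely it is to drive cancer.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases T, C, A, G)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def saturation_mutagenesis(cds):
    """Yield (position, ref, alt, consequence) for every possible single-base substitution.

    Hypothetical helper: `cds` is assumed to be an in-frame coding sequence (A/C/G/T only).
    Positions are 1-based.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    for i, ref in enumerate(cds):
        offset = i % 3                      # position of the base within its codon
        ref_codon = cds[i - offset:i - offset + 3]
        ref_aa = CODON_TABLE[ref_codon]
        for alt in "ACGT":
            if alt == ref:
                continue                    # not a mutation
            alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
            alt_aa = CODON_TABLE[alt_codon]
            if alt_aa == ref_aa:
                consequence = "synonymous"  # same amino acid
            elif alt_aa == "*":
                consequence = "nonsense"    # creates a premature stop codon
            elif ref_aa == "*":
                consequence = "stop_lost"   # removes an existing stop codon
            else:
                consequence = "missense"    # different amino acid
            yield i + 1, ref, alt, consequence


if __name__ == "__main__":
    for mutation in saturation_mutagenesis("ATGTGG"):  # toy sequence, not a real gene
        print(mutation)
```

Each base has three possible substitutions, so a coding sequence of length n yields 3n simulated mutations; this is why doing the analysis across whole genes and many cancer types calls for a systematic, computational approach.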

The previous work by this group of researchers, published in Nature Reviews Cancer last year, led to the creation of IntOGen, an online database of information extracted from sequenced tumor samples from patients. The initial analysis of somatic (non-germline) mutations in more than 28,000 tumors from 66 types of cancer identified 568 cancer genes and the likely mechanisms by which they contribute to tumorigenesis. The database is now available for researchers to use.

López-Bigas and team have also created the Cancer Genome Interpreter, another online tool, aimed at clinicians and oncologists making clinical decisions for cancer patients.

The BoostDM tool has now been incorporated into IntOGen and the Cancer Genome Interpreter to help improve their offerings. BoostDM was trained on specific sets of mutations in cancer genes to learn how mutations in these genes act in different tissues. It can keep learning over time as it is exposed to more mutations from databases such as IntOGen and the Cancer Genome Interpreter, among others.

As described in the Nature paper about the tool, BoostDM was initially trained on 28,000 genomes representing 66 types of cancer. The researchers designed the tool around the evolutionary principle of positive selection: mutations that drive cancer growth and development give tumor cells an advantage, so they turn up in tumor samples more often than would be expected for random mutations.
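The positive-selection idea can be made concrete with a simple enrichment test. The sketch below is not the method used in the Nature paper; it assumes that an expected (neutral) mutation count for a site or gene is already available from a background mutation model, treats the neutral count as Poisson-distributed, and asks how surprising the observed count is. The names `poisson_sf` and `selection_signal` and the example counts are hypothetical.

```python
import math


def poisson_sf(k, mu):
    """Return P(X >= k) for X ~ Poisson(mu), computed as 1 - P(X <= k - 1).

    Fine for a sketch; very small p-values lose precision to floating-point cancellation.
    """
    cdf = 0.0
    term = math.exp(-mu)          # P(X = 0)
    for i in range(k):
        cdf += term
        term *= mu / (i + 1)      # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def selection_signal(observed, expected):
    """Return (observed / expected ratio, Poisson p-value for an excess of mutations).

    `expected` is assumed to come from a hypothetical neutral background model.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected, poisson_sf(observed, expected)


if __name__ == "__main__":
    # Hypothetical hotspot: 12 tumors carry the mutation where 0.5 would be expected by chance
    print(selection_signal(12, 0.5))
    # Hypothetical neutral site: 1 observed versus 0.5 expected
    print(selection_signal(1, 0.5))
```

A large ratio with a tiny p-value is the kind of excess that marks a mutation as a likely driver; a ratio near 1 is consistent with a passenger mutation that was simply carried along.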

“We started from the premise that we only get to observe some mutations because the tumor cells with this mutation guide the development of the tumor, and we questioned what distinguishes these mutations from other possible mutations,” says Ferran Muiños, postdoctoral researcher and co-first author of the paper.

“Doing this analysis manually would be excessively laborious, but there are computational strategies that allow it to be organized systematically and efficiently.”

So far, BoostDM has produced 185 models for identifying driver mutations in different types of cancer. For example, one model identifies which of all possible mutations in the EGFR gene drive lung cancer tumors, and another does the same for EGFR in glioblastoma brain tumors. Knowing about these mutations is useful for researchers, but also for medical decision making.

The researchers plan to continue developing and improving BoostDM, which should become even more accurate over time with the addition of more sequencing data from different tumors.