Scientists from Cambridge University and University College London report that they have trained an artificial intelligence (AI) model to examine DNA methylation patterns, and that the model identified 13 different types of cancer with 98.2% accuracy. Their work, published in Biology Methods & Protocols, demonstrates that it may soon be possible to deploy AI-based technology in the clinic to allow for earlier cancer detection and treatment.
The investigators’ working AI model relies on testing of tissue samples and will require additional training on a broader set of samples before it is ready to be deployed in the clinic. But the researchers also note that an important characteristic of their work was the use of an “explainable and interpretable” core AI model that provides users with the reasoning its predictions are based upon. As such, it can help build and reinforce understanding of the key underlying processes that contribute to cancer development.
“Computational methods such as this model, through better training on more varied data and rigorous testing in the clinic, will eventually provide AI models that can help doctors with early detection and screening of cancers,” said lead author, Shamith Samarajiwa, PhD, a senior lecturer and group leader of computational biology and data science at Imperial College London. “This will provide better patient outcomes.”
The model developed by Samarajiwa and team showed utility in detecting solid tumors such as breast, liver, lung, and prostate cancer, among others, by examining DNA methylation patterns in a variety of different cancers. By comparing the methylation marks in cancer with those in healthy tissue, the AI model is able to identify methylation signatures (observed patterns of aberrant methylation) that are indicative of specific cancers.
“CpG island promoter hypermethylation of tumor suppressor genes is an early neoplastic event in many tumors,” the researchers wrote. “In addition, global DNA hypomethylation can lead to chromosomal instability, activation of oncogenes and latent retrotransposons that promote carcinogenesis. Hypomethylation is seen in many cancer types, including cervical, prostate, hepatocellular, breast, brain, and leukemia.”
The researchers noted that these hyper- and hypomethylation patterns can serve as accurate biomarkers of cancers, if properly identified. Furthermore, they wrote, “they are of particular use for early detection of cancer, as epigenetic modifications are some of the earliest neoplastic events associated with carcinogenesis.”
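To make the idea of a methylation signature concrete, the comparison of tumor and healthy-tissue methylation described above can be sketched in a few lines of Python. This is only an illustrative toy, not the researchers' method: the CpG site names, the beta values (fraction of methylated reads at a site, from 0 to 1), the function name methylation_signature, and the 0.3 threshold are all assumptions made up for the example.

```python
from statistics import mean

# Hypothetical beta values (fraction methylated, 0 to 1) for three CpG sites.
# Site names and numbers are invented for illustration only.
tumor = {
    "cpg_A": [0.82, 0.79, 0.88],
    "cpg_B": [0.12, 0.18, 0.15],
    "cpg_C": [0.51, 0.49, 0.50],
}
normal = {
    "cpg_A": [0.21, 0.25, 0.19],
    "cpg_B": [0.71, 0.66, 0.73],
    "cpg_C": [0.50, 0.52, 0.48],
}


def methylation_signature(tumor, normal, threshold=0.3):
    """Label each site by the difference in mean beta value (tumor minus normal).

    The threshold is an arbitrary assumption, not a value from the study.
    """
    signature = {}
    for site in tumor:
        delta = mean(tumor[site]) - mean(normal[site])
        if delta >= threshold:
            signature[site] = "hypermethylated"
        elif delta <= -threshold:
            signature[site] = "hypomethylated"
        else:
            signature[site] = "unchanged"
    return signature


signature = methylation_signature(tumor, normal)
print(signature)
```

In this toy data, cpg_A is labeled hypermethylated, cpg_B hypomethylated, and cpg_C unchanged. A real analysis would cover hundreds of thousands of sites and would rely on statistical testing or a trained classifier rather than a fixed cutoff.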
For their work, the investigators used methylome microarray data obtained from The Cancer Genome Atlas (TCGA) and other sources, with at least 15 normal samples analyzed for each of the 13 cancer types in the study. After training the XGBoost models on these data, the team evaluated their robustness on several independent data sets that were more heterogeneous than the TCGA data.
“We demonstrated that XGBoost models are suitable for classifying a multitude of cancer types using only DNA methylation data as input. We additionally designed EMethylNET, a robust deep neural network that was able to generalize to most independent data sets,” the investigators noted.
Based on their results, the research team contends that this approach can potentially be applied to the detection of hundreds of different cancer types. Their hope is to apply the solid tumor approach they demonstrated in this study to DNA methylation in cell-free DNA, with the aim of detecting cancer early via liquid biopsy.