Researchers have created an open-access resource of genetic information and an algorithm that together could provide valuable tools for personalized medicine.
EN-TEx, outlined in the journal Cell, is the world’s most extensive catalog of genetic mutations known as allele-specific variants.
It deeply profiles approximately 30 tissues, matched across the same four people, tracking mutations and recording their genetic consequences.
An algorithm derived from the catalog can be used to predict how genetic variants affect tissues and a person’s risk of developing particular diseases.
“It is very clear, for a long time, that the ideal would be to get everybody’s genome sequence and do the analysis of cause and effect [on] the variations as the basis of diagnoses and their treatment,” said senior researcher Thomas Gingeras, PhD, a professor at Cold Spring Harbor Laboratory.
“This is where medicine is going. And this is an attempt to provide a paradigm for doing that.”
Understanding the impact of genetic variants on parameters such as RNA expression or protein levels is important to functional genomics, the researchers note.
The Human Genome Project assembled one representative haploid sequence 20 years ago and, since then, many individual genomes have been sequenced.
Compared with the reference, an individual’s personal genome typically contains around 4.5 million variants. The vast majority of these are in non-coding regions and are most often present in the heterozygous state.
While many studies have attempted to associate genetic variants with phenotypic traits and changes in gene expression, most have relied on the generic reference genome rather than directly using the variants observed in an individual’s diploid sequence.
Yet using a diploid genome makes it possible to distinguish sequences from each of the two parental chromosomes, or haplotypes, each of which gives rise to its own molecular signals, such as RNA expression or transcription factor binding.
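To make the idea of haplotype-specific signals concrete, here is a minimal sketch of how allelic imbalance at a single heterozygous site might be scored. It is an illustration only and not the method used by the EN-TEx researchers: the function name `allelic_imbalance_pvalue` and the read counts are hypothetical, and the sketch assumes that sequencing reads have already been assigned to one haplotype or the other. It applies an exact two-sided binomial test against the expectation that both haplotypes contribute equally.

```python
from math import comb


def allelic_imbalance_pvalue(hap1_reads: int, hap2_reads: int) -> float:
    """Exact two-sided binomial test against a 50/50 allelic ratio.

    Hypothetical helper for illustration; not part of EN-TEx.
    """
    n = hap1_reads + hap2_reads
    if n == 0:
        return 1.0  # no reads, so no evidence of imbalance
    # Under the null hypothesis (p = 0.5), the outcome "k reads from
    # haplotype 1" has probability C(n, k) / 2**n.
    observed = comb(n, hap1_reads)
    # Add up every outcome that is no more likely than the observed one.
    as_extreme = sum(
        comb(n, k) for k in range(n + 1) if comb(n, k) <= observed
    )
    return as_extreme / 2 ** n


if __name__ == "__main__":
    # Hypothetical counts: 30 reads from one haplotype, 10 from the other.
    print(allelic_imbalance_pvalue(30, 10))
```

A small p-value suggests that the two haplotypes are not equally active at that site, which is the kind of signal an allele-specific variant produces. A real analysis would also need to handle mapping bias, overdispersion, and multiple testing, none of which this sketch attempts.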
To improve on this, the researchers created EN-TEx, which couples epigenomes across tissues with long-read genome assemblies to build generalizable models of the impact of different genetic variants.
“We mapped over a million allele-specific variants in each of the four sequenced individuals,” explained Gingeras.
“Our findings indicate that parts of the genome, called cis-regulatory elements, can be particularly sensitive to these genetic variants. Overall, EN-TEx provides rich data and models for more accurate personal genomics.”
EN-TEx includes 1,635 datasets mapped to four personal genomes and represents a comprehensive catalog of allele-specific activity.
It provides a model for transferring known expression quantitative trait loci to difficult-to-profile tissues, for example from skin to heart. It also provides a transformer model for predicting allelic activity based on local sequence context.
The researchers maintain that matching individuals and tissues in EN-TEx allows the relative contribution of inter-tissue and inter-individual variation to be precisely ascertained.
“We envision that in the near future, with the decreased cost of sequencing, generating a matched personal genome sequence as an accompaniment to each functional genomics experiment will become the norm,” they predict.
“Thus, the EN-TEx personalized epigenomics approach for analyzing the impact of genome variation will necessarily become commonplace, potentially providing benefits for precision medicine.”