In a collaboration between Regeneron Genetics Center and GlaxoSmithKline, researchers combine large-scale genomic analysis with biobank data review to discover rare, novel and clinically actionable gene variants within the population. “(The project) links genomic and molecular data to health-related data at population scale, allowing for more complete and systematic study of genetic variation and its functional consequences on health and disease,” explains first author Christopher V. Van Hout of the Regeneron Genetics Center.
Drawing from the UK Biobank, a repository of genetic and phenotypic data from over 500,000 people, the team selected 49,960 individuals and performed whole-exome sequencing (WES) covering 39 megabases of the genome, including 19,396 autosomal genes and 82 sex-chromosome genes. “Exome sequencing allows direct assessment of protein-altering variants, whose functional consequences are more readily interpretable than non-coding variants, providing a clear path to mechanistic and therapeutic insights, as well as potential utility in therapeutic target discovery and validation in precision medicine,” says Van Hout.
The large-scale effort revealed approximately 4 million coding variants. Most (98.4%) of these are rare, present in less than 1% of individuals, and such variants are often overlooked in smaller studies. “These data greatly extend the current genetic resource, particularly in ascertainment of rare coding variation,” explains Van Hout. But rare does not equal unimportant. In fact, rare variants are more likely to have larger phenotypic consequences, according to Van Hout. By discovering these rarities hidden among the population, the researchers expose important contributors to disease traits and risk.
Among these are 231,631 predicted loss-of-function (LOF) variants, far more than expected. “(This is) a >10-fold increase compared to imputed sequence for the same participants,” says Van Hout. In fact, the team finds at least one LOF variant for almost all genes (97%), and more than two-thirds of genes (>69%) have 10 or more LOF variants dispersed among the population.
Some of these LOF variants have never been documented before. Examining UK Biobank phenotypic data, which include primary care records, laboratory data, hospital episode statistics, whole-body MRI data and other measures, the team finds that these novel LOF variants have important functional consequences for patient health. “We discover novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits,” says Van Hout. These variants also point to molecular disease mechanisms that might be targeted by new precision medicine approaches. “LOF variation is an extremely important class of variation for identifying drivers of high genetic risk, novel disease genes and therapeutic targets,” Van Hout explains.
In addition, the researchers examine variants in known disease-risk genes, such as BRCA1 and BRCA2, and assess their prevalence in combination with biobank data. In doing so, they generate disease risk profiles for patients with pathogenic variants and also reveal over two dozen novel disease linkages. “This resource is valuable for assessing variant pathogenicity, particularly for variants of unknown significance and novel variants, and in exploring the full spectrum of disease risk and phenotypic expression,” says Van Hout.
The team has made their data available to the scientific community through the UKB Data Showcase. “To our knowledge this is currently the largest open access resource of exome sequence data linked to health records and extensive longitudinal study measures,” says Van Hout. The resource will continue to grow and reveal new insights as the team continues their sequencing efforts, aiming to eventually cover all 500,000 UK Biobank participants.