A team of scientists led by deCODE genetics, now part of Amgen, reports the first findings from a project to collect whole genome sequence (WGS) data from more than 150,000 individuals in the UK Biobank.
The UK Biobank is a much-used resource in human genetics research. Set up in 2006, it has recruited around 500,000 participants, aged 40-69 years at enrolment, who have had an array of health and medical information collected. From 2012 onwards, researchers were able to access anonymized data from the biobank for research purposes.
While genome-wide single-nucleotide polymorphism (SNP) data collection and whole-exome sequencing have already been completed for many of the UK Biobank cohort, this work focused largely on the small portion of the genome involved in protein coding.
The current project, whose first report is published in Nature, is the first to complete whole genome sequencing (average depth 23.5x) for a subset of the participants, and so also captures so-called ‘non-coding’ regions and variation.
The deCODE-led team carried out whole genome sequencing of 150,119 biobank participants. “All variant calls were performed jointly across individuals, allowing for consistent comparison of results. The resulting dataset provides an unparalleled opportunity to study sequence diversity in humans and its effect on phenotype variation,” write the authors.
The results of the analysis revealed 585,040,410 SNPs representing 7.0% of all possible human single-nucleotide polymorphisms, according to the authors, as well as 58,707,036 insertion or deletion mutations.
A total of 895,055 structural variants and 2,536,688 microsatellites were also identified, both of which are variant types not normally recorded in less in-depth sequencing studies.
While the UK Biobank cohort is largely (85%) of European origin, it includes a small number of individuals of South Asian (n=9,252) and African (n=9,633) origin. Collecting data on wide-ranging variants in different population groups helps clarify the significance of these variants when they are seen in clinical practice, as the frequency and type of genetic variants differ between population groups.
“Data of this type and quantity are going to revolutionize our ability to identify and characterize intergenic sequences of importance to human diversity, be it to risk of disease and response to treatment or some other attributes,” said Kari Stefansson, neurologist and founder and CEO of deCODE genetics, and lead author of the paper.
The researchers will now make this in-depth sequencing data available to professional researchers on the UK Biobank research analysis platform. They have also made both SNP and indel mutation frequency data available to help identify variants that are clinically relevant.