Artificial intelligence DNA
Credit: Rost-9D / iStock / Getty Images Plus

Entire genome sequences for nearly half a million people have been released by the UK Biobank, representing the largest dataset of its kind in the world.

The resource has the potential to offer new insights into the causes of major common diseases and guide the choice of potential therapeutic targets.

It has been hailed as a step change in genomics and is available to approved researchers around the world through the UK Biobank Research Analysis Platform.

“This is a veritable treasure trove for approved scientists undertaking health research, and I expect it to have transformative results for diagnoses, treatments and cures around the globe,” said UK Biobank principal investigator Sir Rory Collins, PhD.

John Reed, PhD, executive vice president for innovative medicine research and development at industry partner Johnson & Johnson, maintained that the data could pave the way for more efficient clinical development and drive progress towards precision medicine.

“This landmark dataset will enable us to leverage the power of artificial intelligence and machine learning for rapidly identifying novel disease targets and helping researchers predict how a candidate medicine might impact certain subpopulations of patients, based on their genetics,” he said. 

The UK Biobank whole genome sequencing (WGS) consortium was formed in 2018 with the goal of sequencing the genomes of all UK Biobank participants.

The five-year project cost £200m, involved 11 partners and took 350,000 hours of sequencing time to create 27.5 petabytes of genetic data. At its peak, over 20,000 whole genomes, each with around three billion base pairs of DNA, were being sequenced each month. It resulted in the genomes of 491,554 UK Biobank volunteers being sequenced overall.

Half the funding came from the UK government and the Wellcome research organisation. The remaining £100m was given by the biopharmaceutical and healthcare companies Amgen, AstraZeneca, GlaxoSmithKline, and Johnson & Johnson.

In return for their £25m investment, each of the four companies received a nine-month head start with the data before its public release.

The UK Biobank, a large-scale biomedical database and research resource, follows the health of half a million volunteers recruited between 2006 and 2010 and has already provided numerous clinical insights.

Data collected on over 10,000 variables, including blood pressure, cognitive function, diet and bone density, have been studied to examine why people with the same genetic predisposition to a disease can have different outcomes, and different reactions and side effects to identical treatments.

The resource has led to thousands of published scientific studies and to major insights, such as the discovery that Type 1 diabetes is as common in adults as in children.

David Rees, PhD, executive vice president of research and development at Amgen, said: “This ground-breaking dataset allows scientists to explore how genetics affect levels of proteins, metabolites and other physiological factors, more closely than ever before, promising to accelerate our understanding of the genetic underpinnings of disease.”

Professor Dame Ottoline Leyser, PhD, chief executive of UK Research and Innovation (UKRI), noted: “Researchers can now apply to access de-identified full genome data from half a million participants, alongside a rich combination of medical, biochemical, lifestyle and environmental data from volunteers involved.”

“Today marks an important milestone in UKRI’s commitment to realise the potential of genetics for biomedical research, innovation and translation to the clinic.”
