Genomic Data From 200,000 UK Biobank Participants Available to Researchers

The UK Biobank has made whole genome sequence data from 200,000 participants available to researchers online, the world’s largest single release of human genomic sequence data.

The release can be accessed through the Biobank’s Research Analysis Platform. This cloud-based platform was developed by DNAnexus on Amazon Web Services and launched in September this year. It gives approved researchers secure access to participant data.

Today’s data release is the first step in an ambitious project to sequence the genomes of all 500,000 Biobank participants. The project is funded by the pharmaceutical companies Amgen, AstraZeneca, GlaxoSmithKline and Johnson & Johnson, as well as the Wellcome Trust and UK Research and Innovation, which directs research and innovation funding for the UK Government.

Sequencing for the project was carried out by the Icelandic company deCODE genetics and by the Wellcome Sanger Institute in Hinxton, UK, an organization known for its contribution to the original Human Genome Project. The remaining 300,000 genomes are due for release in early 2023.

The UK Biobank is already widely used by life science researchers around the world, and the new data should make it an even more valuable resource. While some genetic data from participants was already available, this is the first time whole-genome data has been released.

Combining the lifestyle and clinical data already collected about these participants with the detailed genomic data could yield new findings that help advance precision medicine.

For example, researchers could uncover previously unknown genetic variants that contribute to disease and gain new insights into how different diseases progress over time depending on genetics. It is also hoped the data will help in developing new drugs by revealing new therapeutic targets.

“The release of the first 200,000 whole genome sequences is a tremendous achievement, not only for UK Biobank, but also for the sequencing partners, deCODE Genetics and the Wellcome Sanger Institute,” said Michael Dunn, Director of Discovery Research at Wellcome. “The integration of the sequences with the other characteristic data sets from participants will create a powerful resource to enable major discoveries that will benefit health outcomes.”

The new analytics platform has helped to democratize access to this large data repository. Until recently, researchers had to download de-identified participant data to carry out their studies, which required significant computer storage space and processing power, as well as associated technical resources. The new platform runs in the cloud and allows researchers to access and work on the data without needing to download it.

Costs for using the platform are low. For example, according to the platform website, cloud storage is currently billed at approximately $0.01 per GB per month. However, the organisers recognise that these costs can mount up, and Amazon Web Services has “pledged $1.5 million in research credits to support access for more researchers from low- and middle-income countries and early career researchers.”
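
To give a rough sense of how the quoted storage rate adds up, the short Python sketch below multiplies a data volume by the approximate $0.01 per GB per month figure. The project sizes are hypothetical illustrations, not UK Biobank figures, and the calculation covers storage only. It ignores compute and any other charges the platform may apply.

```python
# Approximate storage rate quoted in the article (USD per GB per month).
STORAGE_RATE_USD_PER_GB_MONTH = 0.01


def storage_cost_usd(size_gb: float, months: int,
                     rate: float = STORAGE_RATE_USD_PER_GB_MONTH) -> float:
    """Estimate the cost in USD of storing size_gb of data for a number of months."""
    if size_gb < 0 or months < 0:
        raise ValueError("size_gb and months must be non-negative")
    return size_gb * rate * months


if __name__ == "__main__":
    # Hypothetical project sizes, chosen only to illustrate the arithmetic.
    examples = [
        ("100 GB of derived results", 100),
        ("10 TB of intermediate files", 10_000),
    ]
    for label, size_gb in examples:
        yearly = storage_cost_usd(size_gb, months=12)
        print(f"{label}: ${yearly:,.2f} per year")
```

At this rate, the two hypothetical cases come to about $12 and $1,200 a year respectively, which shows how a low per-gigabyte price can still grow into a meaningful bill for larger working sets.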
