When AstraZeneca (AZ) announced in April 2016 its intention to sequence the genomes of more than 2 million patient samples over the course of 10 years, it was enabled by the continuing steep decline in the price of sequencing. The more significant aspect of the ambitious effort, however, is the building of a bespoke clinical/research database that combines these sequencing data with the phenotypic and drug response data the company has accumulated over the years through its clinical trials program.
More than a year later, AZ’s Centre for Genomic Research (CGR) has tapped genomic data and informatics company DNAnexus, whose cloud-based platform will manage the vast data sets that AZ intends to apply across its entire research and development pipeline.
“The scale of data has increased rapidly and the tools that are required have increased in complexity over the past few years,” said Richard Daly, CEO of DNAnexus. “But more importantly, the value of genomic data that everyone is focusing on now is the ability to create much more complex data sets that relate genomic data with phenotypic data.”
The AZ genomic initiative will rely on extensive clinical data, including up to 500 specific measurements for each patient who has participated in a clinical trial with the company. According to Ruth March, VP of personalized healthcare and biomarkers with AZ, the DNAnexus platform will allow the company to “continue progressing towards our ambitious goal of analyzing two million genomes to help us better understand the underlying causes of disease.”
According to DNAnexus, the collaboration with AZ is an open-ended agreement that will allow the CGR to manage and analyze the massive amounts of data it will generate by sequencing thousands of patient samples per week. These data will then be available to AZ and its research collaborators globally via DNAnexus’s cloud platform.
In Daly’s view, the scale and scope of a project like AZ’s can only be managed via the cloud. “With the rise in value of genomic data, and the need to process really large data sets, once you get above a few hundred whole genomes, into the thousands and now into the millions, you can no longer do that on a local computer cluster. Some of the work our customers are running on our platform uses 50,000 cores, when even the largest genomic computing centers only have 8,000 cores.”
To give a point of reference for the explosive increase in data velocity, Daly cites an agreement the company entered less than four years ago with the Human Genome Sequencing Center (HGSC) at Baylor College of Medicine to analyze the sequencing data of 14,000 patients—a project that was referred to at the time as “large-scale.”
“When we announced that collaboration, it was considered a breakthrough and it was based on 15,000 samples,” he noted.
Now, with the numbers moving into the millions, and with past work with leading research centers and the FDA under its belt, DNAnexus is out to prove that the ability to securely scale is at the heart of its service. “Four years ago, when we ran the 15,000, we did not have the same architecture underlying what we do now,” Daly concluded. “With the global network and architecture [we deploy], whether it is 1 million, or 10 million, scale is not a factor.”