The Broad Institute of MIT and Harvard and Intel have announced a 5-year, $25 million research and development collaboration to optimize best practices for using hardware, software, and data analytics, with the aim of significantly improving the analysis of diverse genomic datasets.
As a result of this investment, the partners will create the Intel–Broad Center for Genomic Data Engineering, where researchers and software engineers plan to build, optimize, and widely share new tools and infrastructure that will help scientists integrate and process genomic data. The Center will focus on three goals:
- Optimizing Broad’s Genome Analysis Toolkit (GATK) best-practices hardware recommendations for genomic workloads in on-premises, public cloud, and hybrid cloud use cases.
- Optimizing industry-standard Intel-based platforms, GATK, and other genomics software tools, such as Cromwell, the Broad’s workflow execution engine, and GenomicsDB, a Broad-Intel solution for storing and quickly processing patient variant data.
- Promoting greater collaboration among healthcare providers, pharmaceutical companies, and academic research organizations through partnerships to develop workflow execution models across complex and distributed datasets.
The institute and Intel said they hope to enable researchers worldwide to run more data-intensive studies and generate robust results more quickly by accessing data that may have previously been unavailable to them.
“The size of genomic datasets doubles about every 8 months and, as it does, the challenge of acquiring, processing, storing, and analyzing this information increases as well,” Eric Banks, Ph.D., director of the Data Sciences and Data Engineering group at the Broad Institute, said in a statement. “Our work is a step toward building something analogous to a superhighway to connect disparate databases of genomic information for the advancement of research and precision medicine.”
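To put the quoted doubling rate in perspective, the short sketch below works out the implied growth. It assumes, purely for illustration, that the 8-month doubling time stays constant; the function name `growth_factor` is hypothetical and does not come from the Broad Institute or Intel.

```python
def growth_factor(months: float, doubling_months: float = 8.0) -> float:
    """Multiplicative growth after `months`, assuming a constant doubling time."""
    return 2.0 ** (months / doubling_months)


# Growth over the collaboration's 5-year (60-month) span: 2 ** (60 / 8) = 2 ** 7.5
print(round(growth_factor(60)))  # prints 181
```

Under that assumption, a dataset collection would grow roughly 181-fold over the five years of the collaboration, which gives a sense of the scale of the processing and storage challenge Banks describes.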
Intel and the Broad Institute are building on a 2.5-year-old partnership, through which they announced plans earlier this year to co-develop new tools and advance fundamental capabilities so that large genomic workflows can run at cloud scale. The new tools aim to simplify the execution of large genomic workflows such as GATK, as well as improve the storage, scalability, and processing of genomic data.
At the same time, Broad launched collaborations with Intel and other cloud providers, including Google, IBM, and Microsoft, to enable cloud-based access to the GATK.