The National Cancer Institute (NCI), part of the National Institutes of Health (NIH), announced today the launch of the Genomic Data Commons (GDC), a unified data system that promotes sharing of genomic and clinical data among researchers. The announcement was made during a visit by Vice President Joe Biden to the GDC’s operations center at the University of Chicago. An initiative of the NCI, the GDC will be a core component of the National Cancer Moonshot and the President’s Precision Medicine Initiative (PMI), and benefits from $70 million allocated to NCI to lead efforts in cancer genomics as part of PMI for Oncology. The GDC will centralize, standardize, and make accessible data from large-scale NCI programs such as The Cancer Genome Atlas (TCGA) and its pediatric equivalent, Therapeutically Applicable Research to Generate Effective Treatments (TARGET).
The GDC is being built and managed by the University of Chicago Center for Data Intensive Science, in collaboration with the Ontario Institute for Cancer Research, all under an NCI contract with Leidos Biomedical Research, Frederick, Maryland.
“With the GDC, NCI has made a major commitment to maintaining long-term storage of cancer genomic data and providing researchers with free access to these data,” explained NCI acting director Douglas Lowy, M.D. “Importantly, the explanatory power of data in the GDC will grow over time as data from more patients are included, and ultimately the GDC will accelerate our efforts in precision medicine.”
Together, TCGA and TARGET represent some of the largest and most comprehensive cancer genomics datasets in the world, comprising more than two petabytes of data (one petabyte is equivalent to 223,000 DVDs filled to capacity with data). The GDC will also accept submissions of cancer genomic and clinical data from researchers around the world who wish to share their data broadly. Researchers who submit data will be able to use the state-of-the-art analytic methods of the GDC, allowing them to compare their findings with other information in the GDC.
Moreover, data in the GDC, which represent thousands of cancer patients and tumors, will be harmonized using standardized software algorithms so that they are accessible and broadly useful to any cancer researcher. The storage of raw genomic data in the GDC will also allow the data to be reanalyzed as computational methods and genome annotations improve. In this era of heightened concern about data security and authorized access, the GDC has safeguards in place to ensure secure data storage and downloading.
“Of particular significance, the GDC will also house data from a number of newer NCI programs that will sequence the DNA of patients enrolled in NCI clinical trials,” said Louis M. Staudt, M.D., Ph.D., NCI. “These datasets will lead to a much deeper understanding of which therapies are most effective for individual cancer patients. With each new addition, the GDC will evolve into a smarter, more comprehensive knowledge system that will foster significant discoveries in cancer research and increase the success of cancer treatment for patients.”
The hope is that the GDC will form the basis for a comprehensive knowledge system for cancer. Researchers using the GDC will be able to integrate genetic and clinical data, such as cancer imaging and histological data, with information on the molecular profiles of tumors as well as treatment response. In this way, the GDC would become a valuable resource for generating potentially actionable and life-changing information that ultimately could be used by doctors and their patients.