During a panel discussion among scientists at the World Economic Forum in Davos, Switzerland, in January 2016, the moderator, U.S. Vice President Joe Biden, asked the panelists for examples of the obstacles researchers and clinicians face in the effort to cure cancer. Several topics emerged, but the big issue was Big Data: more particularly, the collection, analysis, and application of Big Data.
The “Big” in Big Data may be taken to refer to the size of the datasets that are being amassed, or the importance of what these datasets, properly analyzed, might reveal. In either case, Big Data in practice amounts to the analysis of huge datasets to identify trends, find associations, and spot patterns.
Big Data is effective, some researchers say, because there is simply so much information available to analyze. Large sample sizes, they point out, may reveal details that would go unnoticed in smaller samples. Other researchers, however, contend that Big Data needs more than, well, lots of data. One such researcher is Keith Perry, the senior vice president and chief information officer at St. Jude Children’s Research Hospital in Memphis, TN.
While Mr. Perry was still at the MD Anderson Cancer Center in Houston, TX, he was quoted in an institutional newsletter as follows: “Big Data is not just ‘big.’ The term also implies three additional qualities: multiple varieties of data types, the velocity at which the data is generated, and the [degree to which voluminous datasets are integrated].”
“Many of our databases currently don’t interface with each other because they’re generated by and housed in separate prevention, research, and clinical departments,” added Mr. Perry, contrasting the reality of these disparate structures with the potential of a centralized platform.
Another researcher who believes that size is not all that matters is Narayan Desai, Ph.D., a computer scientist at the communications company Ericsson in San Jose, CA. He was quoted in a 2015 news article in Nature as follows: “Genomics will have to address the fundamental question of how much data it should generate.”
“The world has a limited capacity for data collection and analysis, and it should be used well,” he continued. “Because of the accessibility of sequencing, the explosive growth of the community has occurred in a largely decentralized fashion, which can’t easily address questions like this.”