Hybrid-cloud companies Cloudian and ScaleMatrix and genomic data analysis and management company OnRamp Bioinformatics have announced a partnership that the companies say will significantly increase the speed of genomic data analysis while providing data handling and storage solutions that can reduce costs by as much as 50% compared to traditional cloud-based offerings.
The combined offering pairs the data management and analysis expertise of OnRamp with the object-based, on-premises storage models of Cloudian and ScaleMatrix. It allows research organizations, large and small, to maintain efficient in-house control of their own data.
“This solution combines everything needed, from sequencing to analysis to storage to compute, to collect information and perform analysis on it in a single location, so you are eliminating the challenges of data moving around,” said Jon Toor, CMO of Cloudian. “We provide an infinitely scalable storage solution that sits behind the compute environment, which can sit behind the Illumina scanners and collect all of that information to make it readily accessible to the compute environment.”
Four-year-old, San Diego-based OnRamp Bioinformatics brings not only informatics and computational power to the equation, but also a tiering system that determines which data derived from experiments should be retained, cataloged via metadata, or moved to Cloudian or ScaleMatrix storage. Perhaps just as important, OnRamp automates the process of discarding unneeded data that can burden genomic IT and storage systems.
“The first part is understanding what data are valuable,” said Tim Wesselman, CEO of OnRamp. “We know which information you don’t need, so we just sweep it away while also efficiently managing the larger files. Another important step is our effective management of metadata around the applications. So we are also mindful of not only what should be moved to lower tiers of storage, that is to Cloudian, but also looking at what we can delete then recreate later, if needed, because of what we are able to record in our metadata.”
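The tiering logic Wesselman describes can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not OnRamp's actual software: the `DataFile` fields, the `Action` names, and the 30-day threshold are all hypothetical, chosen to mirror the four outcomes mentioned in the quote (keep, move to a lower storage tier, delete and recreate later from metadata, or sweep away).

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP_PRIMARY = "keep on primary storage"
    MOVE_TO_LOWER_TIER = "move to lower-tier object storage"
    DELETE_AND_RECREATE = "delete now, recreate later from recorded metadata"
    DELETE = "sweep away as unneeded"


@dataclass
class DataFile:
    # All fields are hypothetical, invented for this illustration.
    name: str
    needed: bool            # does the file still hold value for the researcher?
    recreatable: bool       # does the metadata record enough to regenerate it?
    days_since_access: int  # how long since the file was last read


def choose_action(f: DataFile, cold_after_days: int = 30) -> Action:
    """Pick a storage action for one file (assumed policy, not OnRamp's)."""
    if not f.needed:
        return Action.DELETE
    if f.days_since_access < cold_after_days:
        # Recently used data stays next to the compute environment.
        return Action.KEEP_PRIMARY
    if f.recreatable:
        # Cold data that can be regenerated need not be stored at all.
        return Action.DELETE_AND_RECREATE
    return Action.MOVE_TO_LOWER_TIER


files = [
    DataFile("run42.fastq.gz", needed=True, recreatable=False, days_since_access=90),
    DataFile("run42.aligned.bam", needed=True, recreatable=True, days_since_access=90),
    DataFile("run42.tmp", needed=False, recreatable=False, days_since_access=1),
    DataFile("run43.fastq.gz", needed=True, recreatable=False, days_since_access=2),
]
decisions = {f.name: choose_action(f) for f in files}
```

In this sketch the raw sequencing output is moved to the lower tier once it goes cold, while a derived file that could be rebuilt from recorded parameters is dropped instead of stored, which is the space saving the quote points to.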
Effectively lessening the data burden without decreasing the value of the genomic data is only part of the equation, said Wesselman, who previously worked in Hewlett Packard’s hyperscale computing division. OnRamp initially worked with ScaleMatrix, using its on-premises computing offering, before Cloudian joined to complete the picture.
“We discovered ScaleMatrix was super close to us and had an amazing, state-of-the-art data center—some of the technologies we had attempted to work on in the lab at HP, they had developed and patented,” said Wesselman. “But the storage costs were killing us. We were always pounding them, telling them we needed to bring the costs down, and that is when Chris [Orlando, co-founder of ScaleMatrix] said we needed to bring
For its part, Cloudian offers a significantly lower price for data storage than cloud services do, via its hybrid cloud object storage systems.
“We are object storage that sits in the data center. What is important is the scalability—it can grow infinitely to accommodate the data you are generating,” said Toor. “But from a cost perspective, one of the really appealing aspects of object storage is that we run on industry standard servers.
“So you think of the commodity servers which are produced in the 10-million-per-year quantities. We capitalize on that economy of scale to provide a storage environment which is inherently 70% less expensive than the traditional storage devices that sit in data centers today.”
All of this enables a service offering that brings valuable genomic data to a broader range of researchers. In Wesselman’s view, it is imperative that genomic research solutions move beyond the realm of the 15,000 bioinformaticians who currently ply their trade and reach the broader community.
“We have designed a system that is first for the biologists and researchers, so that they don’t need to answer questions more complex than ‘What is the build of the genome?’” he said. “The intuitive interface asks them the biology questions, they don’t select the pipeline. The analysis happens for them once all the data is there, because we know what they need to do.”
To deliver on the promise of an interface that can intuitively provide the data each scientist needs for their work, OnRamp brought on as CTO Jean Lozach, who previously held bioinformatics positions of increasing responsibility at Illumina.
Lozach said he has witnessed the evolution of sequencing and the use of sequencing data from the front lines, and noted the significant changes that have occurred in how genomic data is handled from the early days when researchers “wanted to keep everything inside the sequencer.”
He said: “People need to see next generation sequencing and proteomics and all the other systems as simply tools to use—it is nothing more than a tool. Thanks to Illumina it has become a very valuable tool, in terms of generating a lot of data. But [researchers] can’t worry about everything. They need to have confidence that we are doing things the right way, that we can effectively capture the right data and reanalyze it because we know all the parameters.”