Data Medicine
Credit: National Human Genome Research Institute

It is estimated that in the next ten years, as much as 40 exabytes of new genomic data will be generated. (An exabyte equals one billion gigabytes, or 10¹⁸ bytes.) These data, combined with other data streams including electronic medical records, health and fitness apps, wearables data, location, and socio-economic data, are today being cobbled together by researchers looking for new medicines and new treatment regimens to drive the future of precision medicine. Researchers are hungry to get their hands on as much of it as they can to help drive these new discoveries and expose the vulnerabilities of chronic diseases. But to what extent is the research community working to secure these data while also protecting the privacy of the everyday people who have consented to their information being used in research?

A good starting guide arrived in May 2018, when the European Union made the General Data Protection Regulation (GDPR) enforceable within its jurisdiction. At its core, GDPR is a human rights law developed to give individuals control over their own data and how it is collected and used. Since then, GDPR has served as the model for similar laws in countries around the world and, in the absence of a comprehensive federal privacy law in the US, as a basic framework for some state laws, including the California Consumer Privacy Act (CCPA).

A central feature of GDPR, and one that both guides and confounds the life sciences research community, is found in Article 17, the right to erasure, commonly referred to as “the right to be forgotten.” Essentially, this right allows any person who once consented to have their data collected and used by third parties to rescind that consent with the assurance that their identifying data will be erased without undue delay.

As a policy, it is sound. In practice, it is much harder to do.

Eric Perakslis, PhD
Chief Research Technology Strategist
Duke University School of Medicine

According to Eric Perakslis, PhD, chief research technology strategist of the Duke University School of Medicine (among other roles at Duke), the right to be forgotten is “one of the secret sauces” of GDPR, albeit one that is difficult to execute.

“If you take a US analogy of genomic data, we think about deidentifying a genome and deidentifying phenotypic data, then putting those two things together,” he said. “Well, if you really are complying with right to be forgotten, you really can’t deidentify anybody, because you have to be able to find them in the dataset if they want to be pulled out of it. It’s a wonderfully diabolical control, that throws a lot of things off.”
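Perakslis’s point can be illustrated with a minimal Python sketch. It assumes a hypothetical setup in which the research dataset stores records under a keyed pseudonym (an HMAC-SHA256 of each participant’s enrollment ID) rather than a name. The data steward can still find and erase one person’s rows on request, but only because it keeps the secret key and the enrollment IDs that link pseudonyms back to people, so the data are pseudonymized rather than truly deidentified. The class, field, and ID names are illustrative and do not describe any organization’s actual system.

```python
import hashlib
import hmac
import secrets


class PseudonymizedDataset:
    """Hypothetical research dataset keyed by HMAC pseudonyms instead of identities."""

    def __init__(self) -> None:
        # Secret linkage key held by the data steward, never shared with researchers.
        self._key = secrets.token_bytes(32)
        # pseudonym -> list of phenotype/genotype records
        self.records: dict[str, list[dict]] = {}

    def pseudonym(self, enrollment_id: str) -> str:
        # Deterministic: the same enrollment ID always maps to the same pseudonym,
        # which is exactly what makes a later erasure request possible.
        return hmac.new(self._key, enrollment_id.encode(), hashlib.sha256).hexdigest()

    def add(self, enrollment_id: str, record: dict) -> None:
        self.records.setdefault(self.pseudonym(enrollment_id), []).append(record)

    def erase(self, enrollment_id: str) -> int:
        # Right-to-erasure request: recompute the pseudonym and drop every row for it.
        removed = self.records.pop(self.pseudonym(enrollment_id), [])
        return len(removed)


ds = PseudonymizedDataset()
ds.add("participant-001", {"genotype": "AG", "phenotype": "T2D"})
ds.add("participant-001", {"genotype": "GG", "phenotype": "T2D"})
ds.add("participant-002", {"genotype": "AA", "phenotype": "none"})
print(ds.erase("participant-001"))  # 2
print(len(ds.records))  # 1
```

The trade-off is the one Perakslis describes: destroying the key makes the pseudonyms far harder to link back to individuals, but it also makes erasure requests impossible to honor.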

Finding methods to allow individuals to be forgotten is not an impossible task, though it likely requires a sea change in how data is collected and then provided to researchers. To flip the script on how data is used in biomedical research, companies are increasingly being built around a patient-centric model that first protects the privacy rights of individuals and then builds structures on top of that protection that allow researchers to analyze and query the data for their purposes.

Protecting privacy in clinical trials

Efforts to protect the privacy of patients who consent to enter a clinical study face obstacles. Research scientists are hungry for data and in many cases want to share the data from their work for the greater good. But there is a natural tension between sharing that data publicly and the potential for reidentification of the individuals who took part in the studies.

Borislava Pavlova, PhD, is senior director, head of clinical trial transparency innovation and data, at Takeda Pharmaceuticals and was the principal investigator of a June paper published in BMC Medical Ethics that examined Takeda’s performance in protecting patient privacy and identified the most common factors that could lead to patient exposure. Broadly, Takeda’s internal analysis of publicly accessible clinical trial manuscripts, abstracts, posters, and presentations found that in 13% of the Takeda clinical trial publications reviewed, individuals could potentially be reidentified or inappropriate data sharing could pose a data privacy risk to study participants.

“The use and disclosure of clinical trial data via scientific publications and their availability via open access allows for greater awareness within the scientific and medical communities, as well as for patients and patient advocates, thus also encouraging rapid advances in research,” Pavlova noted. “However, even when study participant data presented in scientific publications are anonymized (according to current data protection laws and standards), there is still a substantial risk that a specific individual/patient could be reidentified.”

In some cases, the publications contained elements that Takeda considers direct identifiers, such as participant identifiers, initials, or study IDs. The review also looked for indirect identifiers, such as sex, age, or location, and found that including more than two of them per individual could allow links to other publicly available data, which could then be combined to reidentify a study participant. In total, Takeda’s internal audit found that eight manuscripts included direct identifiers of participants and another 26 contained indirect identifiers, all of which had the potential to jeopardize the privacy of study participants.
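The screening rule described above can be expressed as a simple check. The following Python sketch is hypothetical and is not Takeda’s actual process: it flags an individual case in a publication if it reports any direct identifier, or more than two indirect identifiers. The identifier sets contain only the examples named in the text; a real review checklist would be broader.

```python
# Illustrative identifier categories, limited to the examples named in the text.
DIRECT_IDENTIFIERS = {"participant_id", "initials", "study_id"}
INDIRECT_IDENTIFIERS = {"sex", "age", "location"}
MAX_INDIRECT = 2  # more than two indirect identifiers per individual is flagged


def review_case(reported_fields: set[str]) -> list[str]:
    """Return the privacy concerns for one individual described in a publication."""
    concerns = []
    direct = sorted(reported_fields & DIRECT_IDENTIFIERS)
    if direct:
        concerns.append("direct identifiers: " + ", ".join(direct))
    indirect = sorted(reported_fields & INDIRECT_IDENTIFIERS)
    if len(indirect) > MAX_INDIRECT:
        concerns.append("too many indirect identifiers: " + ", ".join(indirect))
    return concerns


print(review_case({"age", "sex", "adverse_event"}))
# []
print(review_case({"initials", "age", "sex", "location"}))
# ['direct identifiers: initials', 'too many indirect identifiers: age, location, sex']
```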

While the patchwork of regulations around the world and in the U.S. provides some guidance to the research community for implementing privacy protocols, the reality is that most of these rules are inadequate. Further, many simply seek to protect the data by stripping away information that may be revealing.

“Data anonymization and de-identification cannot fully prevent the reidentification of clinical trial participants. Studies have shown that reidentification of hospital discharge de‑identified data is possible using certain demographic parameters and that participant ID code, year of birth, gender and ethnicity could uniquely identify the genomic data of study participants,” the researchers of the BMC Medical Ethics report noted. “The fact that many supposedly anonymous data sets have been released and then reidentified has raised further concern about the confidentiality and privacy of participants in clinical trials.

“Personal data described in scientific publications may also be used to reidentify study participants (whether patients or volunteers), and the risk is highest for patients affected by a rare disease or condition.”
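One common way to quantify the risk described in the quote is k-anonymity: group records by their quasi-identifiers (such as year of birth, gender, and ethnicity) and find the smallest group size k. A record in a group of size 1 is unique and therefore easier to link to outside data. A minimal Python sketch with made-up rows; the column names are assumptions for illustration:

```python
from collections import Counter

# Made-up participant rows; only the quasi-identifier columns matter here.
rows = [
    {"birth_year": 1980, "gender": "F", "ethnicity": "A"},
    {"birth_year": 1980, "gender": "F", "ethnicity": "A"},
    {"birth_year": 1975, "gender": "M", "ethnicity": "B"},
    {"birth_year": 1990, "gender": "F", "ethnicity": "C"},
]
QUASI_IDENTIFIERS = ("birth_year", "gender", "ethnicity")


def equivalence_classes(records, columns):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[c] for c in columns) for r in records)


def k_anonymity(records, columns) -> int:
    """Smallest class size; k = 1 means at least one record is unique."""
    return min(equivalence_classes(records, columns).values())


def unique_records(records, columns):
    """Records whose quasi-identifier combination appears exactly once."""
    counts = equivalence_classes(records, columns)
    return [r for r in records if counts[tuple(r[c] for c in columns)] == 1]


print(k_anonymity(rows, QUASI_IDENTIFIERS))  # 1
print(len(unique_records(rows, QUASI_IDENTIFIERS)))  # 2
```

Coarsening a column, for example reporting birth decade instead of birth year, is one way to raise k. In a small rare-disease cohort, k may stay low even after such changes, which is consistent with the researchers’ warning.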

Pavlova noted that Takeda has developed an internal process for deidentifying data and for reviewing clinical research to ensure all possible measures have been taken prior to the public release of its clinical trial data. While the Takeda researchers are unsure whether other pharmaceutical and biotech companies have similar protocols, Takeda’s method could be a potential blueprint.

“We recommend other pharmaceutical companies consider our approach or something similar, and benefit from implementing such a process, especially for publication materials focused on clinical trials in sensitive rare disease populations,” Pavlova noted.

Data security is not data privacy

LunaDNA was in the vanguard of a small handful of companies that recognized that the standard model for data sharing in medical research, rooted in institutional control of data, was not honoring or valuing individual rights. As a startup in 2017, one year before GDPR became enforceable, the company put the individual, not the institution, at the center, deciding how, when, and by whom their data could be used for research. LunaDNA also provided a way to share value with individuals based on how much data, and what kind, they made available to researchers, whether simply answers to a health questionnaire or access to whole-exome or whole-genome data.

Scott Kahn
Chief Information and Privacy Officer, LunaDNA

Scott Kahn noted that while LunaDNA’s early efforts to provide a secure platform on which to build privacy protocols focused on blockchain and distributed ledger technology, that technology had limitations that effectively precluded compliance with GDPR.

“With GDPR you have the right to be forgotten,” Kahn said. “The right to be forgotten equals delete. But distributed ledgers are immutable. You can see that there is a conflict.”
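The conflict Kahn describes follows from how ledgers achieve immutability: each entry commits to a hash of the previous one, so removing a record invalidates every later link. A minimal Python hash-chain sketch, a simplification of a real distributed ledger with no consensus or networking; the payload fields are invented for illustration:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Each entry's hash covers the previous hash, chaining the entries together.
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def build_chain(payloads):
    chain, prev = [], GENESIS
    for p in payloads:
        h = entry_hash(prev, p)
        chain.append({"prev": prev, "payload": p, "hash": h})
        prev = h
    return chain


def is_valid(chain) -> bool:
    # Walk the chain, checking each link and each stored hash.
    prev = GENESIS
    for e in chain:
        if e["prev"] != prev or e["hash"] != entry_hash(prev, e["payload"]):
            return False
        prev = e["hash"]
    return True


chain = build_chain([{"donor": "A"}, {"donor": "B"}, {"donor": "C"}])
print(is_valid(chain))  # True

# "Forgetting" donor B by deleting its entry breaks the link from C back to A.
forgotten = [e for e in chain if e["payload"]["donor"] != "B"]
print(is_valid(forgotten))  # False
```

Rewriting every later hash could hide the deletion in one copy, but on a distributed ledger the other participants would still hold the original chain, which is why deletion and immutability collide.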

Recognizing this, LunaDNA set out to create its own data technology architecture, borrowing some of the best aspects of distributed ledgers, to protect privacy and comply with GDPR’s data rights, including the right to be forgotten. While the company doesn’t discuss its proprietary methods in much detail, Kahn did say that a key component of its model disaggregates the data into a packet for each person who has consented to a study, then makes those packets available to researchers within LunaDNA’s environment. If, mid-study, a participant no longer wants to be included, the company can easily delete that participant’s discrete data.
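LunaDNA does not disclose its proprietary design, so the following is only a hypothetical Python sketch of the general idea Kahn describes: each consented participant’s data is stored as a separate packet inside a hosted environment, researchers run aggregate queries there rather than exporting copies, and a mid-study withdrawal deletes exactly one packet. All class and method names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ParticipantPacket:
    """All of one consented participant's data for a study, stored as a unit."""
    participant_id: str
    data: dict


@dataclass
class StudyEnvironment:
    """Hypothetical hosted environment: researchers query it but never copy packets out."""
    packets: dict = field(default_factory=dict)

    def enroll(self, packet: ParticipantPacket) -> None:
        self.packets[packet.participant_id] = packet

    def withdraw(self, participant_id: str) -> bool:
        # Deleting one packet removes that person's data and nobody else's.
        return self.packets.pop(participant_id, None) is not None

    def count(self, predicate) -> int:
        # Aggregate-only query evaluated over whoever is currently enrolled.
        return sum(1 for p in self.packets.values() if predicate(p.data))


env = StudyEnvironment()
env.enroll(ParticipantPacket("p1", {"has_variant": True}))
env.enroll(ParticipantPacket("p2", {"has_variant": True}))
env.enroll(ParticipantPacket("p3", {"has_variant": False}))
print(env.count(lambda d: d["has_variant"]))  # 2
env.withdraw("p2")  # participant leaves mid-study
print(env.count(lambda d: d["has_variant"]))  # 1
```

Because queries run against whoever is currently enrolled, a withdrawal takes effect on the next query, and there are no exported copies to track down.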

Ardy Arianpour
Co-founder and CEO, Seqster

Building a patient-centric model that provides data to the research community was also core to Seqster, a San Diego-based health technology company that has found a method to collect patient EMR data from disparate EMR vendors and combine it with other data including genomic, wearable, and other real-world data. According to Ardy Arianpour, company co-founder and CEO, the company has found success via its ability to stitch together disparate data sets and provide these data—often at a highly granular level—to pharmaceutical researchers within an environment that provides for patient privacy.

“The reason why we’ve been successful is because we don’t own the patient’s data,” Arianpour said. “As you can imagine, Seqster’s entire reputation and our existence has been built around the fact that patients own their data. We have been able to configure our operating system to ensure data security as well as individual privacy and data control.”

Seqster, too, found that the most effective way to give patients privacy and control of their own data was to build proprietary technology. Beyond the baseline of separating personally identifiable information from clinical data, Seqster prevents bulk queries for this data, and its encryption technique gives each user an individual encryption key. Because there is no shared key, discovering one key does not expose all the data in a particular study, while thousands of individual encryption keys, though not invulnerable, provide protection in numbers.
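The benefit of per-user keys is that the damage from one leaked key is limited to one person’s data. The sketch below illustrates that containment with Python’s standard library only. Because the standard library has no block cipher, it uses a toy, unauthenticated keystream construction (SHAKE-256 output XORed with the data) that merely stands in for a real cipher and must not be used in production. The user names are hypothetical, and nothing here reflects Seqster’s proprietary technique, which the article does not describe.

```python
import hashlib
import secrets

# TOY CIPHER FOR ILLUSTRATION ONLY: SHAKE-256 output is used as a keystream and
# XORed with the data. It is unauthenticated; real systems should use a vetted
# authenticated cipher such as AES-GCM from a maintained library.
NONCE_SIZE = 16


def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_SIZE)  # fresh nonce per message
    return nonce + _xor_keystream(key, nonce, plaintext)


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return _xor_keystream(key, nonce, ciphertext)


# One random 256-bit key per (hypothetical) user instead of one shared study key.
users = ["alice", "bob", "carol"]
keys = {u: secrets.token_bytes(32) for u in users}
records = {u: f"{u}: clinical record".encode() for u in users}
vault = {u: encrypt(keys[u], records[u]) for u in users}

# Suppose an attacker obtains Bob's key: only Bob's record decrypts correctly.
leaked = keys["bob"]
readable = [u for u in users if decrypt(leaked, vault[u]) == records[u]]
print(readable)  # ['bob']
```

The property being illustrated is containment: a single compromised key exposes one person’s data, not the whole study.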

The company is HIPAA compliant and a covered entity based on its handling of EMR data, but is also able to comply with GDPR’s right to be forgotten. “But these don’t really mean anything to the individuals who share their data other than the fact that you’ve got to make the system as secure as possible. You’ve got to make it as private as possible. And you have to deploy multiple different technologies,” Arianpour noted.

Of course, Perakslis has noted that no technology for protecting data is either foolproof or future-proof, so companies like Seqster and LunaDNA need to remain vigilant about potential threats while also adjusting to and adopting new technologies as they become available.

His advice to companies seeking the highest standards of data protection and privacy is straightforward: “I think that what I would ask companies to do, would be to test their controls, work with really good white hat hackers, and make sure their data stays secure over time,” Perakslis said. “Because people often think they have built the perfect fence, but we see too many times that white hat hackers can find a way to get in.”

Perakslis notes that companies’ protection of personal data can help bolster existing nondiscrimination laws like the Genetic Information Nondiscrimination Act (GINA) and offset what he perceives as governments’ lack of interest in further regulating genetic privacy and the protection of personal health information. Still, it is a vital component of today’s health data and research landscape.

“Look, privacy in and of itself doesn’t have a ton of value, right? But bad things can happen when you lose it,” he concluded.


Chris Anderson was the founding editor of Security Systems News and Drug Discovery News, and led the print launch and expanded coverage as editor in chief of Clinical OMICs, now named Inside Precision Medicine.
