IPM’s editor in chief Damian Doherty caught up with Josh Denny, M.D., M.S., chief executive officer of the National Institutes of Health’s All of Us Research Program, to get a perspective on current progress and future goals. Josh has played an integral part in the development of the All of Us program from its genesis, principally as a member of the National Institutes of Health (NIH) Advisory Committee and indeed the Director’s Precision Medicine Initiative Working Group, which developed the program’s scientific rationale. He led the program’s early prototyping project and acted as the principal investigator (PI) for the All of Us Data and Research Center.
Doherty: The All of Us project was launched in 2015 with the goal of sequencing 1 million genetically diverse Americans to enable a better picture of human disease and to improve diagnostics and drug development. Can you give us an idea of where we are to date in terms of numbers of individuals sequenced?
Denny: After an initial period of infrastructure development, All of Us launched participant enrollment nationwide in May 2018. So far, more than 335,000 of our participants have completed initial steps of the program, including providing biosamples for analysis, and we just released to researchers our first genomic dataset including nearly 100,000 whole genome sequences and more than 165,000 genotyping arrays. This dataset is so important – worldwide, fewer than 2% of genome-wide analyses have been on individuals of African ancestry or who identify as Hispanic/Latino. In our dataset, about half of our individuals identify as a race or ethnicity other than White, Non-Hispanic. In this first dataset, we’ve already observed 100 million variants occurring in 3 or more of our participants that are not currently in the genome-sequence aggregation database gnomAD. That really speaks to the novelty of our population and will have very real clinical impacts.
And, of course, the dataset will only get bigger: we’ve generated sequences on more than 200,000 individuals now and are working on curating our next big data release, sometime in the wintertime. Meanwhile, we’re also gathering rich phenotypic data from participant surveys, electronic health records, and more, so researchers can study behavioral, environmental, and sociocultural influences of health along with genetics.
Doherty: You were appointed head of the project just as the world was beginning to grapple with an emerging pandemic. How much did that stymie progress, or did it give the project time to double down on other areas of development?
Denny: In early 2020, we were hitting our enrollment stride, but it was almost all in-person enrollment. For safety reasons, we paused all in-person activities in March 2020. We had to pivot to create new opportunities for remote enrollment and retention, including mail-in saliva donation kits, a protocol for telephone-assisted surveys, and more. These new approaches to enrollment have been really important for us. Each week, we have hundreds of people enrolling and donating saliva samples in a completely contactless way.
We were also able to step back and evaluate how we could contribute to the scientific knowledge of COVID-19. In one study, we analyzed 24,000 stored blood samples contributed by All of Us participants across all 50 states between January 2 and March 18, 2020. Our analysis found evidence of SARS-CoV-2 infections in five states earlier than had initially been reported, suggesting a low level of community spread before it was generally recognized.
We also fielded a total of six COVID-19 Participant Experience (COPE) surveys to our participants, beginning in May 2020. These surveys tracked changes in participants’ daily life, health, and well-being throughout the pandemic. Coupled with participants’ electronic health records, information from Fitbit devices, and biosamples, this data allows All of Us registered researchers to delve deeply into the wide-reaching physical and emotional impacts of the pandemic.
While All of Us wasn’t designed for pandemic preparedness and response, COVID-19 helped underscore the value a large, diverse, and longitudinal dataset like ours can bring for research on emerging diseases.
Doherty: Are we at a stage where we can mine the data from the sequences already completed?
Denny: Yes. We recently released our first genomic dataset, including nearly 100,000 whole genome sequences and 165,000 genotyping arrays. Our program is unique in that it curates and releases data as it is collected. We anticipate adding to this dataset twice a year, with our next large release of genomic data available by early 2023.
Doherty: How do researchers who are interested in gaining access to the data get started and what is the composition of that data?
Denny: We house the data on a secure cloud-based platform–the Researcher Workbench–to support broad access and collaboration. Currently, access is available to researchers from U.S.-based academic, nonprofit, and health care organizations that sign our data-use agreement. We intend to expand access in the future to engage for-profit, international, and other researchers. Eligible researchers can visit ResearchAllofUs.org to register for an account, complete our online training on responsible data use, and begin setting up workspaces.
On the platform, researchers can access robust, in-depth data to conduct a wide range of studies. This includes genomics, information from electronic health records, baseline physical measurements, and participant-provided data from surveys and wearable devices. All of this data offers researchers a more complete picture of health to better understand how genes can cause or influence diseases in the context of other health determinants. We remove personal identifiers from the data to protect privacy.
Doherty: How successful has the recruitment of individuals from diverse ethnic backgrounds been, and is hesitation to enroll an ongoing concern?
Denny: We aim to enroll participants who reflect the rich diversity of our country and have made great strides so far. Of our participants who’ve completed the initial steps of the program, about 50% are from racial and ethnic minority groups, and overall, 80% are from communities that have been historically underrepresented in research. That includes sexual and gender minorities, residents of rural areas, people with lower incomes, and older adults, among others. We work with a wide range of partners who are trusted in their local communities to support engagement efforts. Building relationships with communities and participants is at the heart of our program.
Doherty: What is so unique and compelling about how this data will transform our understanding of disease and how much will the open and collaborative approach, which the project embraces, play a part in accelerating research?
Denny: The most compelling aspect of our data is its diversity: diversity of the participants that have joined, and the diversity of rich, longitudinal data they have donated. Biomedical research, and especially genetics, has primarily been based on data from people who are of European ancestry. If we’re aiming to individualize health care, treatment, and diagnosis for all, we need to increase data across diverse populations and explore broader genetic variability.
Also, as you noted, we aim to foster an open, collaborative model for research by making data and tools broadly accessible, so more researchers can bring their questions to the data and contribute insights. Paired with our transparent approach to publicly publishing researchers’ project descriptions, we are providing a platform to engage and build on our communities’ collective efforts.
Doherty: How many sequencing centers are involved in the project and what is the sequence run rate per month on average?
Denny: The program’s three genome centers generate the genomic data, including whole genome sequencing and genomic arrays. They process about 5,000 participant samples each week. Then, the data undergo a number of additional processing steps and quality checks. Clinical validation labs, part of our genome centers, will also be validating impactful genetic variants for future return of health information to participants.
Doherty: What do the participants receive in exchange for their enrollment into the study?
Denny: Participants who complete the initial steps of the program, including providing biosamples at one of our partner sites, receive $25. We offer information back to participants who are interested in receiving it. Right now, we’re returning information about genetic ancestry and traits. Later this year, we’ll begin to return information about hereditary disease risk and pharmacogenetics. So far, more than 200,000 participants have expressed interest in getting DNA results returned to them. We plan to share more information over time. The longer participants stay engaged in the program, the more they stand to learn.
Doherty: What would you say have been the significant challenges in the last seven years, and on reflection, are you happy with the progress made thus far?
Denny: The first challenge, which was anticipated, was how we would enroll such a diverse population across the United States that matched our aspirations. We’ve been able to achieve this through the dedicated work of so many partners across the country, including large healthcare provider organizations, community engagement partners, health and patient advocacy groups, and so many other committed individuals. We’ve had participants and communities involved from Day one, through everything we do. I think we’ve made excellent progress on our enrollment goals so far.
COVID-19 has been the most significant challenge we’ve faced to our enrollment. I’m very proud of how our team responded to inform and protect our participants and staff, creating new approaches to engage and enroll participants. However, it has had a multi-year effect on enrollment, and despite the hard work of so many, we still aren’t back to our recruitment rates before the pandemic. We continue to learn and test new routes to allow people to contribute to the program.
There have been other challenges, some not unique to us: data harmonization, the logistics of collecting data and samples in harmonized, consistent ways across 300+ sites, pioneering a new data access paradigm in a cloud-based model, generating and processing hundreds of thousands of genomic sequences, figuring out how to return genomic information and support participants who receive it… all these and more have been so adeptly handled by our teams. I’m really proud of their work, creativity, and stick-to-itiveness.
Doherty: Do you think we’ve learned lessons from COVID in terms of how data can be shared and harmonized in a secure way? Do you think All of Us has illuminated the power of what can be achieved with other projects, namely N3C and OHDSI?
Denny: When we started All of Us, no one had ever sought to “liberate” full Electronic Health Records from the many parts of the U.S. health system to share within a centrally accessible research resource like ours. Thanks to the hard work of all our partners and healthcare centers in our program, this was accomplished, and rather rapidly. It laid the groundwork for programs like N3C, which really did something transformative, using the urgency of the pandemic to pull together data across the United States to tackle COVID-19 with unparalleled speed. OHDSI – and all the other common data model movements – have been key to making these harmonizations possible. There’s still a lot of work to do to make these data integrated, but we have made huge steps forward.
Doherty: Have you been collecting any COVID data in parallel with sequencing data from participants?
Denny: Yes. We have collected participant responses to the six COPE Surveys, and three shorter Minute Surveys that focus on COVID-19 vaccines. Additionally, we have electronic health record data that includes information about COVID-19 symptoms, tests, diagnoses, and treatments. We work to make this information readily available so researchers can learn more about COVID-19 and its long-term effects.
Doherty: COVID has certainly illuminated some of the shortcomings in our health systems in terms of access and inclusion for minorities. You recently made an announcement on a new initiative called UNITE. Could you explain what this is?
Denny: The UNITE initiative began in early 2021 as part of an NIH-wide commitment to address structural racism within the NIH-supported scientific community. All of Us senior leadership, as well as representatives from across NIH, are united in identifying barriers to inclusion, cultivating diversity among researchers, and being transparent about the systematic changes needed. The first step at All of Us was to establish a senior leadership position addressing these issues. In September, we hired our first Director of Health Equity. Dr. Martin Mendoza leads our efforts to improve inclusion in precision medicine research. It’s important for us to support this work so we can live up to our name–to be a research program that’s open to all, for the benefit of all.
Doherty: How problematic has standardization been in collecting EHR data across the many health institutions in the network?
Denny: The All of Us Research Program employs Observational Medical Outcomes Partnership (OMOP) Common Data Model Version 5 infrastructure to ensure feasibility and standardization across all program data types, including physical measurements, electronic health records, and participant-provided information. Data coming from different EHR sources are standardized and stored in a set of formally described tables with defined relationships. This allows data to be accessed and connected in many different ways by researchers.
Our teams are continually evaluating the EHR data for completeness and quality issues to equip researchers with best practices and tools for using these data resources effectively.
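As an illustration of what "formally described tables with defined relationships" can mean in practice, here is a minimal sketch in Python using an in-memory SQLite database. The table and column names (`person`, `condition_occurrence`, `concept`) follow the public OMOP Common Data Model, but only a small subset of columns is shown. The rows, concept IDs, and function names are invented for the example, and nothing here represents the Researcher Workbench's actual schema or tooling.

```python
import sqlite3


def build_demo_db():
    """Create a tiny OMOP-style database in memory (invented sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Simplified subset of OMOP CDM tables; real tables have more columns.
        CREATE TABLE concept (
            concept_id   INTEGER PRIMARY KEY,
            concept_name TEXT
        );
        CREATE TABLE person (
            person_id     INTEGER PRIMARY KEY,
            year_of_birth INTEGER
        );
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id               INTEGER REFERENCES person(person_id),
            condition_concept_id    INTEGER REFERENCES concept(concept_id),
            condition_start_date    TEXT
        );
        """
    )
    # Hypothetical concept IDs and names, not real vocabulary entries.
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?)",
        [(1, "Example condition A"), (2, "Example condition B")],
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [(101, 1960), (102, 1985), (103, 1972)],
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
        [
            (1, 101, 1, "2020-01-15"),
            (2, 102, 1, "2020-03-02"),
            (3, 102, 2, "2021-06-20"),
            (4, 101, 1, "2021-09-09"),  # repeat record for the same person
        ],
    )
    return conn


def people_per_condition(conn):
    """Count distinct people per condition, joining through the shared concept IDs."""
    rows = conn.execute(
        """
        SELECT c.concept_name, COUNT(DISTINCT co.person_id)
        FROM condition_occurrence AS co
        JOIN concept AS c ON c.concept_id = co.condition_concept_id
        GROUP BY c.concept_name
        ORDER BY c.concept_name
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(people_per_condition(build_demo_db()))
```

Because records from every source point to the same concept identifiers, one query can count people per condition regardless of which health system supplied the record.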
Doherty: Adverse drug responses kill thousands of patients each year and cost healthcare systems millions of dollars – how important will the All of Us PGx findings be in helping build the business case that PGx testing is something we should be implementing as routine clinical care?
Denny: For many providers, the business case of pharmacogenetics is already clear. Our dataset will be valuable to researchers looking to study, at the population level, just how frequent specific variants are, as well as the real-life clinical impact of drug exposures.
Doherty: Clearly, security and privacy are of paramount importance to the program. Are you working with commercial partners that can help you mitigate any risk from cyber hacking?
Denny: Yes. The security of the Researcher Workbench has been evaluated according to Federal Information Security Management Act standards and is regularly tested by both internal and external security experts. A centralized data system is easier to secure and permits better audit trails than systems that require users to download the data for analysis.
Doherty: Once the participants have completed their enrollment, are there any plans to collect other contextual data that could be layered into existing data, for example, behavioral, microbiome, or environmental data?
Denny: Yes, absolutely. We have a longitudinal design to enable ongoing data collection and the integration of new data types over time. Our initial data types were built into the platform based on the recommendations outlined in the Precision Medicine Initiative (PMI) Working Group Report to the Advisory Committee to the Director of the National Institutes of Health. Additional data and data types will be added based on the areas of greatest scientific need and interest.
One way we’ll do this is by supporting ancillary studies. For example, the NIH Common Fund’s new initiative, Nutrition for Precision Health, powered by the All of Us Research Program, will recruit a diverse pool of 10,000 All of Us participants to inform more personalized nutrition recommendations. These participants will provide microbiome samples and other dietary data that will be added to the Researcher Workbench after initial analysis. A major challenge in precision nutrition has been the inability to combine the many factors that affect how individuals respond to diet into a personalized nutrition regimen. By building this effort into the All of Us Researcher Workbench infrastructure, researchers will be able to study how an individual’s genetics, gut microbes, and other lifestyle, biological, environmental, or social factors impact overall health.
In the future, the program also plans to integrate more environmental data. This could include adding data linkages to existing national datasets that collect environmental information, as well as collecting environmental data directly from participants.
Doherty: Reporting responsibly on genetic findings is fundamental to the success of a project like this. How will you be able to scale the level of genetic counseling to provide the potential support and guidance needed?
Denny: We are committed to returning genetic results in a meaningful and responsible way. To do this, we are working with our partners to ensure the appropriate resources are in place to coordinate a smooth process for returning results. This includes the ability to offer genetic counseling to all participants who want help understanding their results, at no cost.
Doherty: What would success look like in, say, 2030?
Denny: By 2030, we will have enrolled more than 1 million participants who reflect America’s diversity, cover the lifespan, and have shared health data, biospecimens, and other types of data. Subsets of these participants will also choose to enroll in Ancillary Studies, which will add new data to the program through partnerships with other NIH Institutes and Centers and other entities. An example of an Ancillary Study is the recently launched Nutrition for Precision Health, powered by All of Us, which will have completed the largest precision nutrition study by that point, complete with metabolic, metagenomic, and dietary assessments. By 2030, we will have also launched and completed an initial enrollment of children from birth through 18 years old, creating one of the largest pediatric cohorts in the world. A global community of many thousands of researchers will be publishing a steady stream of high-profile papers. While harder to predict, I expect we’ll have begun to see real clinical impact from some of the All of Us data at this point: better understanding of the genetic variants, using All of Us in their routine assessment; polygenic risk scores applicable to diverse populations; new drug targets beginning to see exploration; better understanding of the interactions of lifestyle, environment, and genetics that will lead to interventions and/or better screening guidelines, especially across different populations.
And a really novel impact for such a research study: I believe we will have directly helped some of our participants live healthier lives, especially through the return of genetic results. There could be up to 30,000 people who receive life-altering results about hereditary disease risk, and many more who may end up on altered medication regimens. I think we will have a formidable role in making precision medicine routine practice, as these All of Us participants start sharing genomic results with doctors across the country.
Ultimately, we want All of Us to become an indispensable resource to the research community. By 2030, we hope to be well on our way to making that vision a reality!