With sequencing costs reportedly dropping to as low as $100 a genome, today’s medicine can enhance biological outcomes by harnessing data on genetic variation. A wealth of information on humanity’s responses to problems faced by some or all groups, such as aging, infectious diseases, diet, exercise, and smoking, is stored in the genomes of varied individuals. Gaining a better grasp of how our bodies have responded to these challenges under different conditions could have a big impact on how we understand biology and how we design clinical interventions.
But as it currently stands, biomedicine lacks diversity, equity, and inclusion with respect to race, demographics, socioeconomic status, and other factors on both sides of the table: researchers on one end and patients on the other. To ensure that the stories we discover encoded in our DNA (of history, health risk, and adaptation) represent us all, there must be equitable representation in both the workforce and patient research in genomics.
Stirring diversity into patient sampling
According to Heidi Rehm, PhD, Chief Genomics Officer at Massachusetts General Hospital and Co-Director of the Program in Medical and Population Genetics at the Broad Institute, the data sets used to understand genetic variation are skewed toward a narrow set of populations, which has several consequences. “If you’re doing a disease study and only enroll white people, you’re only going to find pathogenic variants in white people, and you’re not going to find pathogenic variants in black people, for example, and many other populations,” Rehm told Inside Precision Medicine. “Then you’re not going to understand the contributors to disease in black individuals as well as you will understand contributors to white individuals. You’re not as effectively building your knowledge base of the causes of disease. Without this data, we are hampered at interpreting variants either because we’re [impeded] in ruling them out as pathogenic or…in the evidence we’re building to prove them as pathogenic.”
Rehm says that the racial bias in data leads to a higher rate of uncertainty in clinical genomic test results for underrepresented populations than for well-represented ones.
A few years ago, Rehm took part in a study looking at the rate of uncertainty in a few diseases with high-volume testing, such as cardiomyopathy and hearing loss. The study showed that the positive yield was lower in underrepresented individuals and higher in European individuals. Conversely, the rate of inconclusive results was higher in underrepresented individuals than in Europeans. Rehm suspects this inequity in genomic certainty is due in part to the low rate of referrals of underrepresented individuals to specialty genomic clinics. When looking at referrals to specialty clinics, Rehm said, the share of underrepresented individuals drops well below their share of the general population in that healthcare system. These data lead her to think that underrepresented individuals are not reaching specialists, either because they are not getting referrals or because they are not attending the appointments.
The root of this sampling issue lies in how genetic databases are assembled. Daniel MacArthur, PhD, who previously served as co-director of the Program in Medical and Population Genetics at the Broad Institute, has sequenced tens of thousands of individuals and has spent years building large reference databases of non-pathogenic variation, which are essential for rare disease diagnosis. “When we first started sequencing rare disease patients, we would start to go through all the genetic changes we found in their genome and look them up in the existing databases of normal variation, and it was clear at the time those databases were just inadequate,” MacArthur told Inside Precision Medicine. “They weren’t accurate enough; they weren’t big enough; they didn’t have enough diversity.”
This sent MacArthur on a mission to build better reference databases of variation. In 2014, MacArthur and colleagues at the Broad launched ExAC (the Exome Aggregation Consortium), which, according to MacArthur, was based on 60,000 exomes and was the first large release of human variation data accessible to anyone. “Once we built this big resource, we could take the list of every genetic variant that we discovered and how common it was in each of the different populations and put that up so that anyone could access it,” said MacArthur. “That free access was critical for it being so useful because it meant that any clinical lab anywhere in the world who was sequencing a patient could just go to our website and look up a variant they found and see how common it was.”
With time, MacArthur says, the resources grew and more types of data were included. Still, the work stayed fundamentally the same: generate clean data, compile an extensive list of all the genetic changes found, and make it available so that anyone in the world can see how common each variant is. What followed was a series of later releases now called the Genome Aggregation Database (gnomAD), which aggregates and harmonizes exome and genome sequencing data from a variety of large-scale sequencing projects and makes summary data available to the broader scientific community. The biggest change was the number of people. “Now, we’ve got about 200,000 people in those databases, which still blows my mind to think about how far we’ve come in the course of the last 20 years. To go from having one genome to having all of these genomes that are publicly accessible is quite amazing,” said MacArthur.
Reaching the underrepresented
But Rehm and MacArthur are not satisfied with the current diversity of the gnomAD database. Rehm, who is continuing the work on gnomAD along with MacArthur and colleagues, is seeking out more diverse populations to include in the dataset for interpreting variants. Typically, the gnomAD team includes any data sequenced at the Broad because it is easy to obtain with the requisite permission. However, that covers only data generated at the Broad. “Other countries have more diversity than in the U.S., and there’s data that’s getting sequenced but never gets sent to the Broad,” said Rehm. “In fact, there’s data that can’t be sent to the Broad — it actually can’t physically leave the country that generated it due to national restrictions in certain countries. So, we are developing methods to try to teach those other countries and the genomic experts there to run the same pipelines and approaches that we are running on our data and to be able to generate similar data sets with aggregate results that can be shared alongside the gnomAD dataset.”
At the Broad, Rehm also does a lot of rare disease research studies on individuals with a diagnosis. Through these studies, her team is working to recruit more individuals of diverse backgrounds. To do so, they’ve translated materials into different languages and are using social media platforms to reach geographically diverse regions so that no one has to travel, which is resource intensive. “We just send kits to their house and collect samples wherever they are,” said Rehm, who is trying to reduce the barriers to enrollment in studies. “You don’t have to be next to an academic medical center, for example. We also collaborate with colleagues who are recruiting individuals with neurodevelopmental disorders to enable diagnosis of individuals there.”
As Chief Genomics Officer at Mass General Hospital (MGH), Rehm is working on the issue of access to care, since fewer patients of diverse backgrounds get referrals to specialty clinics. She is trying to improve genetic literacy among physicians at the primary care stage. Accordingly, MGH has launched an e-consult service targeted at primary care physicians so they can ask questions about genetic services, including whether to refer a patient to a specialty clinic for genetic testing or counseling. MGH has also launched the Preventive Genomics Clinic so that individuals who are at risk for disease but do not have one, and who therefore would not be referred to specialty clinics, can still get access to genetic testing. “We would see those patients at risk and target the primary care clinics for those referrals,” said Rehm. “We’re working on converting that to a fast track system, which is a virtual clinic to be able to see more patients in direct collaboration with primary care physicians who see the more diverse populations and address the very high volume needs.”
MacArthur, on the other hand, left the Broad. Although the team was focused on bringing in many different groups to acquire data from as diverse a range of individuals as possible, its projects were limited by the data being generated by the genomics community as a whole. “The data we were pulling together represented hundreds of millions of dollars worth of sequencing data being generated,” said MacArthur.
“The majority of it, at about 60%, was from people of European ancestry, and the non-Europeans included tended to come from only a relatively small number of groups from African Americans, of Latino or Hispanic ancestry, and East and South Asia. Many parts of the world weren’t represented, including most non-European communities living in my native Australia.”
“At The Centre for Population Genomics at the Garvan Institute, the goal is to engage these underrepresented communities to ensure that they are partners in the research journey and that we can work together on understanding what they want to get out of being part of genomics, and that they understand the benefits of participation,” said MacArthur. “The project is just in the process of ramping up the phase one component of what we’re now calling ‘Our DNA,’ which will be this big project that will eventually bring together 7,000 individuals from a set of communities specifically selected because they’re not represented in these big international databases.”
According to MacArthur, academic research in general needs to do better at genuinely engaging with these underrepresented communities and ensuring that they understand what is being asked of them. “There’s lots of social science literature on the attitudes of underrepresented communities about participation in research, and one of the interesting themes from that literature is that often these communities, often with good reason, have low trust towards academic research,” said MacArthur. “They’re worried about research data being misused in ways that might hurt them or their community.” But MacArthur has found that many of these communities are very optimistic about being part of research and are driven by a desire not to be left behind as the world moves in a particular direction. Researchers who put together big cohorts or resources often design their recruitment strategies in a way that does not deliberately exclude these groups but does raise the barriers to entry. The materials aren’t translated into suitable languages. There’s no upfront engagement to understand what these communities need to know to feel comfortable participating in the study. There’s no attempt to design recruitment in a way that fits their lifestyles or focuses on the places where these communities gather.
So, with the “Our DNA” project, MacArthur is making sure to spend enough time working with community representatives and holding focus groups with stakeholders. He is meeting with these communities so that his team understands what they want from the research process and what guarantees they need, in terms of data security and sharing, to feel comfortable being part of it. “It’s not something academics are usually good at doing,” said MacArthur. It meant building a team of people with experience in social sciences, anthropology, communications, and community advocacy who could drive the project. It has also led to working with a new Indigenous-led network of researchers, which is building the same connections with local Indigenous communities.
The world is moving in the right direction, and we’re seeing that happen now in many different countries. The All of Us project in the U.S. has been investing heavily in some of that engagement work, particularly with Native Americans, and in the U.K. there are projects like Genomics England, which have thought a lot about community engagement and inclusion. These projects may one day scale up to the point where the question becomes how many people from these communities can be brought in and sequenced, and how much of that data can be made available. The bigger that number is, the more powerful those resources will be in improving the effectiveness of genomic medicine for those communities. MacArthur says that he is keen to ensure that there are enough resources in place to build cohorts that extend into the tens or hundreds of thousands of people.
Not buying into the concept of diversity and inclusion in patient research is starting to have consequences beyond how it affects the global good. There are more direct effects on researchers: certain grants, for example, will not fund a study that lacks a plan for enrolling diverse populations. “You see much more deliberate approaches to ensure the recruitment and engagement of diverse individuals, as there are grants that will even designate a minimum percentage for recruiting underrepresented individuals,” said Rehm. “For the All of Us research program, roughly 80% of individuals in that program are underrepresented in biomedical research.” And for clinical research in general, the FDA’s Office of Minority Health and Health Equity (OMHHE) leads efforts to advance minority health and health equity-focused regulatory science research.
Weaving equity into the workforce
Not only is the representation of underrepresented populations coming up short on the patient research front in many countries, but the same also holds for the biomedical workforce, particularly (and ironically) in the genomics and genetics sectors. At the National Human Genome Research Institute (NHGRI), acting deputy director Vence L. Bonham Jr. is focused not only on the diversity of patients and study participants but also on that of the healthcare workforce, particularly in genetics and genomics. “There’s a clear recognition that the field of genetics and genomics is not that diverse,” said Bonham. “It’s important from the perspective of innovation that you have individuals from different backgrounds to be part of the research being conducted in the field of genetics and genomics.”
To succeed in biomedicine, people from all backgrounds, especially those from underrepresented groups, must be fairly represented both in the data and in positions tackling scientific challenges and using new knowledge for the benefit of a society that is becoming more and more diverse. This need is not unique to biomedical research and science. It applies to many areas of industry and business, reflecting the view that bringing together individuals with different perspectives creates opportunities for developing research activities and efforts within that space. This isn’t new information (the data has been around for over 15 years), and some studies are being conducted to gather more empirical data on the importance of diversity for innovation.
Historically, scientists from many parts of the world have not been as engaged in the field of genetics and genomics. To both engage participants from different backgrounds and bring in individuals with different expertise and perspectives, it is important to provide opportunities for everyone to participate in this science. Along these lines, the NHGRI has several efforts and initiatives to expand the tent: inviting people to be part of this science, bringing in new faculty and students, and widening the pool of those conducting genetics and genomics research.
A vital mission of the NIH and NHGRI is to train the next generation of scientists, so a number of training efforts are under way. Bonham says that the NHGRI funds and supports various training programs, both institutional programs and individual trainees, and also seeks to train individuals in-house. Within the NHGRI’s training, diversity, and health equity office and other parts of the NIH, there’s an effort to increase diversity. “Our institute has an initiative that is currently open for solicitation for grant applications that we call the Diversity Genome Centers, and it is focused on providing opportunities at minorities serving institutions to build on genetics and genomics programs at those institutions,” said Bonham. “This is exciting with regards to building and expanding the tent of institutions involved in genetics and genomics research.”
The NHGRI also developed and published an action agenda in January 2021 that sets out the next decade’s approach to enhancing workforce diversity. To be at the forefront of efforts to improve the diversity of the genomics workforce, the NHGRI Action Agenda for a Diverse Genomics Workforce has four major goals. The first goal is focused on the general public as well as kindergarten to 12th grade. “We recognize that you can’t just start in graduate school for individuals to go into genetics and genomics, so we seek to excite young people and families about genetics and genomics as careers,” said Bonham. The second goal is to take undergraduates who are excited about genetics and genomics and provide training opportunities for them. “We have an important role as an institute in providing training opportunities for undergraduate and post-baccalaureate students,” said Bonham. “That is an important role to get individuals interested in independent careers ultimately.” The third goal is to help graduate students, post-doctoral fellows, and individuals very early in their genetics and genomics careers become independent investigators in different types of scientific enterprises, whether in academic institutions or industry. The fourth and final goal is to evaluate everything the NHGRI is doing and to adjust it along the way. “This is a commitment that our institute has made, and our leadership is made that we see this as a long-term area of priority for our institute,” said Bonham. “We want to continue to work with industry, with academic institutions, with other government agencies, really trying to lift the boat for everyone and to bring more people into our science.”
Can machines solve genomic inequity problems?
Even with greater numbers of diverse research eyes, the data in genomics has reached levels of complexity and variability that make it hard for any human to process and analyze. So, what about the non-human? Artificial intelligence (AI), the simulation of intelligence in a non-living agent, is rapidly changing the world and impacting fields like genomics. The FDA is exploring how AI can be used to advance precision medicine, by predicting patient responses based on baseline patient characteristics. But even AI can be biased, resulting in spurious or even unethical and discriminatory conclusions when applied to human health data. Poorly represented data sets and a lack of diversity in the developer community can, just as with human eyes, lead AI to propagate disparity and bias for certain demographics, needs, and values, accruing the benefits of AI to the few instead of the many. The solution in the short term hinges on using more diverse data collection and AI monitoring as well as longer term structural changes in funding, publications, and education to address these challenges.
To ensure a path to genomic equity, data accessibility is paramount. Without data sharing, the potential to help diagnose and treat patients around the world shrinks. What’s more, there are concerns that we’re entering a world where the amount of data generated about human variation but not shared will rapidly outstrip the information that is actually available to the world to help inform decisions. Some countries, for instance, are generating their own databases but not sharing them, so that only people within the country can use them. That is a dangerous precedent because it means that everyone around the world who comes from the same backgrounds as people in those countries misses out on information that could be used to provide a diagnosis for them. To move towards a future of equitable genomics, the field must hold as a fundamental value that the data generated is made accessible to the world, can be used for the benefit of patients, and will not be arbitrarily locked away for nationalistic or commercial reasons under the pretense of data security and privacy. It is our moral obligation to figure out how to make genomics data as accessible as possible.
Jonathan D. Grinstein’s wonder for the human mind and body led him to an undergraduate education in Neural Science and Philosophy and a doctorate in Biomedical Science. He has 10 years of experience in experimental and computational research, during which he co-authored research articles in journals such as Nature and Cell. Since then, Jonathan has hung up his lab coat and explored positions in science writing and editing. Jonathan’s science writing work has been featured in Scientific American, Genetic Engineering and Biotechnology News (GEN), and NEO.LIFE.