In 2020, non-Hispanic Whites made up 57.8 percent of the U.S. population, according to the U.S. Census Bureau. In 2021, researchers estimated that 86.3 percent of the samples included in genome-wide association studies (GWAS) came from individuals of European ancestry. Researchers have now published the results of a GWAS using samples from the United States Department of Veterans Affairs (VA) Million Veteran Program (MVP) that includes more than 635,000 samples, 29 percent of them from individuals of color, i.e., those who do not self-identify as non-Hispanic White.
“Diversity is critical in advancing genomic studies, providing foundational data for downstream implementation ranging from risk prediction to targeted therapeutics,” wrote a group of 66 authors in the paper appearing in the 19 July 2024 issue of Science. Anurag Verma, PhD, is the study’s lead author and Associate Director, Clinical Informatics and Genomics for Penn Medicine BioBank, University of Pennsylvania.
Verma and his co-authors pointed out in the paper that the lack of diversity in genomic studies remains a challenge, despite the existence of biobanks such as UK Biobank, FinnGen, and Biobank Japan. “To date, the MVP has enrolled its 1 millionth participant, with over 175,000 participants genetically similar to the African population, making it the biobank with the greatest representation of this population group,” they wrote in the paper.
According to the authors, the current study fills in crucial gaps in knowledge of the relationships between genes, traits, and disease across diverse populations. The findings underscore the importance of diversity in genetic studies and the need to expand representation in future GWAS. GWAS have provided foundational knowledge about the genetic basis of disease and have helped inform precision approaches to prevention and treatment. However, GWAS to date have focused primarily on people of European descent, limiting the applicability of findings to more diverse population groups.
To address this, researchers used data from the MVP, which includes more than 635,000 participants, nearly a third of whom are from non-European genetic backgrounds, roughly double the proportional representation seen in the most recent GWAS datasets. Using these data, researchers conducted GWAS to analyze 2,068 traits in participants from four population groups: African, Admixed American (genetically equivalent to those who identify as Latino), East Asian, and European. This allowed the authors to characterize the genetic architecture of complex traits within diverse populations and compare genetic predisposition between population groups.
Among 635,969 participants, the study identified 26,049 variant-trait associations across 1,270 traits. Overall, researchers discovered more similarities than differences in gene-trait associations between ancestry groups, indicating that the genetic architecture of these traits is largely shared across populations. Notably, 3,477 of those associations (roughly 13 percent) were significant only when individuals from non-European populations were included, highlighting the importance of genetic diversity in population-wide GWAS analyses.
Still, the current study has limitations, wrote Alice Williamson and Segun Fatumo in a related Perspective in the same issue of Science. Participants in the MVP are on average older than the general population, and only eight percent are female. “Nevertheless, these data provide a valuable complement to other large-scale biobank efforts and highlight the benefit of including more diverse populations in genomic discovery.”
The authors pointed out that other diverse biobanks are growing and the current study highlights the importance of these efforts. “Our comprehensive phenome-wide GWASs presented here underscores the increased power of discovery that comes from including individuals from diverse populations, enriching our understanding of the genetics of complex health and disease traits, while highlighting the large degree of similarities in genetic architecture of these traits across populations,” the authors wrote.