A software package called Tractor, developed by researchers at Massachusetts General Hospital, can help make genome-wide association studies more diverse.
Many genome-wide association studies (GWAS) have been carried out over the last 20 years, and a large amount of data on genetic variants that contribute to different diseases has been collected. This information is increasingly being used to develop genetic risk scores that predict how likely someone is to develop diseases ranging from cancer to Alzheimer’s.
However, these studies have largely been carried out in people with white European ancestry, which means that any risk scores developed are less accurate for people from other backgrounds, such as those of Asian or African descent.
“If you build disease-risk models on available data and attempt to extrapolate them to diverse populations, the accuracy of predicting who will get sick is reduced,” says Elizabeth Atkinson, Ph.D., lead author of the paper describing the research published in Nature Genetics and a researcher in the Analytic and Translational Genetics Unit at Massachusetts General Hospital.
“These errors exacerbate existing health disparities, in part because we aren’t finding specific gene variants that may contribute to higher risk of a particular disease in diverse populations.”
Part of the reason studies have been kept to these homogeneous groups until now is that patterns of genetic variation differ between ancestry groups, shaped by human migrations thousands of years ago, and it has been difficult for researchers to correct for this in statistical analyses. For example, someone of African descent will likely carry many more genetic variants than someone with European ancestry, and these differences can be difficult to account for when analyzing genomic data.
This is what the freely available Tractor software tool was developed to address. It is designed to allow diverse groups to be included in GWAS without making data analysis difficult or inaccurate.
“Different ancestry groups have gene variants that occur at different frequencies due to the populations’ demographic history,” explains Atkinson.
“Not taking ancestry into account in a GWAS can lead to false-positive hits or to gene variants cancelling themselves out and dismissed as not important. So, until now, it’s been easier to exclude people with multiple ancestries from GWAS to avoid being confounded by different patterns of gene variants.”
Tractor uses genomic information to identify which ancestry each stretch of sequence from individuals included in a GWAS comes from and labels it accordingly. This allows researchers to estimate the ancestry-specific effect sizes of different variants more accurately.
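The labelling idea can be pictured with a small Python sketch. It is an illustration only and not Tractor's own code: the function name `ancestry_dosages`, the ancestry labels and the input layout are all assumptions made for this example. Given the allele carried on each of a person's two chromosome copies at one variant, plus the ancestry label assigned to each copy, it counts how many copies of the variant sit on each ancestry background.

```python
from collections import Counter


def ancestry_dosages(alleles, ancestries):
    """Count copies of a variant on each ancestry background.

    Hypothetical helper for illustration, not part of Tractor.
    alleles    -- two 0/1 values, one per chromosome copy (1 = variant present)
    ancestries -- the ancestry label assigned to each of those two copies
    """
    counts = Counter()
    for allele, label in zip(alleles, ancestries):
        counts[label] += allele
    return dict(counts)


# A person carrying the variant on both copies, one copy labelled "AFR"
# and the other "EUR" (labels are made up for this example).
person = ancestry_dosages(alleles=(1, 1), ancestries=("AFR", "EUR"))
print(person)
```

Once the counts are split this way, each ancestry background can be given its own effect estimate instead of being lumped into a single total.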
Atkinson and colleagues have tested the software in simulations and on population samples with different ancestries and found it was accurate and improved GWAS predictive power. It also allows researchers to estimate how much a specific variant influences disease risk in each population, something not possible in a standard GWAS.
“Instead of getting a weighted average of the disease-risk effect size for a particular gene variant, Tractor can determine how large or small the effect of a variant is in various ancestry groups,” says Atkinson. “This will be informative for building genetic risk scores in diverse populations.”
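The difference between a single pooled estimate and ancestry-specific estimates can be shown with a toy calculation. The sketch below is not Tractor's implementation. It uses six made-up, noise-free individuals and assumes, purely for illustration, that a variant raises a trait by 0.5 per copy on ancestry background A and lowers it by 0.5 per copy on background B. The function names and the numbers are all hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def pooled_slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ancestry_specific_slopes(d1, d2, y):
    """Least-squares slopes of y on two predictors fitted jointly.

    Solves the 2x2 normal equations on mean-centred data, which is
    equivalent to fitting y = intercept + b1*d1 + b2*d2.
    """
    m1, m2, my = mean(d1), mean(d2), mean(y)
    c1 = [v - m1 for v in d1]
    c2 = [v - m2 for v in d2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * t for a, t in zip(c1, cy))
    s2y = sum(b * t for b, t in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Copies of the variant on background A and on background B per person
# (each person has two chromosome copies, so the two counts sum to at most 2).
dosage_a = [0, 1, 2, 0, 0, 1]
dosage_b = [0, 0, 0, 1, 2, 1]

# Assumed effects for the toy example: +0.5 per copy on A, -0.5 per copy on B.
trait = [0.5 * a - 0.5 * b for a, b in zip(dosage_a, dosage_b)]

# Standard approach: one effect for the total number of copies.
total = [a + b for a, b in zip(dosage_a, dosage_b)]
pooled = pooled_slope(total, trait)

# Ancestry-aware approach: one effect per background.
effect_a, effect_b = ancestry_specific_slopes(dosage_a, dosage_b, trait)

print(round(pooled, 6), round(effect_a, 6), round(effect_b, 6))
```

In this toy data the pooled estimate comes out at essentially zero because the opposite effects cancel, while the joint fit recovers the two assumed values. Real data are noisy and real effects rarely differ this starkly, so the example shows only the mechanism, not the size of any gain.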