University of Oxford researchers have discovered that the tendency to take part in genetic studies is partly encoded in a person’s DNA and leaves detectable “footprints” in the data.
The findings, in the journal Nature Genetics, could help increase representation in genetic studies and improve their design.
They could also improve the generalizability of research findings and help unpack how complex genetic traits relate to health and disease.
“For genetic studies, we demonstrate that within the participants’ genotypes there is information that captures the effect of participation bias, the phenomenon where the participants are not fully representative of the target population,” researcher Augustine Kong, a professor at the Leverhulme Centre for Demographic Science at the University of Oxford, told Inside Precision Medicine.
“Utilizing this previously unnoticed information leads to a better understanding of the nature of the bias and enhances the investigations of the relationships between genes and various human traits, health related and otherwise.”
Ascertainment bias occurs when data that have been collected are not representative of the intended study population, leading to misleading results.
This bias is particularly difficult to identify and correct for in genetic studies, said study co-author Stefania Benonisdottir, a DPhil student at Oxford’s Big Data Institute.
While a genetic variant that increases the chances of participating in a study might be more common in participants than in non-participants, a direct comparison is not possible because the genotypes of non-participants are not known.
However, Kong and Benonisdottir exploited the idea that two closely related participants, such as siblings or a parent and child, would be more likely than expected to have both inherited the parts of the genome that increase the probability of participation.
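The intuition can be illustrated with a toy simulation. The sketch below is not the researchers' analysis; it assumes a single hypothetical variant, a made-up linear participation model, and illustrative parameter values (`freq`, `base`, `effect`). It looks only at sibling pairs in which both siblings take part, and compares how common the variant is on the parental chromosome copies the siblings share versus the copies they do not share.

```python
import random


def participates(rng, genotype, base, effect):
    """Hypothetical model: participation probability rises linearly with allele count."""
    return rng.random() < base + effect * genotype


def simulate_sibling_pairs(n_families=50_000, freq=0.5, base=0.2, effect=0.2, seed=1):
    """Return (variant frequency on shared copies, variant frequency on non-shared copies)
    among sibling pairs in which both siblings participate. All parameters are illustrative."""
    rng = random.Random(seed)
    shared_sum = shared_n = 0
    unshared_sum = unshared_n = 0
    for _ in range(n_families):
        # Four parental chromosome copies: mother holds 0 and 1, father holds 2 and 3.
        hap = [int(rng.random() < freq) for _ in range(4)]
        # Each sibling inherits one copy from each parent, chosen at random.
        sib1 = (rng.randrange(2), 2 + rng.randrange(2))
        sib2 = (rng.randrange(2), 2 + rng.randrange(2))
        g1 = hap[sib1[0]] + hap[sib1[1]]
        g2 = hap[sib2[0]] + hap[sib2[1]]
        # Only pairs in which both siblings take part are ever observed.
        if not (participates(rng, g1, base, effect) and participates(rng, g2, base, effect)):
            continue
        for h1, h2 in zip(sib1, sib2):
            if h1 == h2:
                # Both siblings inherited the same parental copy.
                shared_sum += hap[h1]
                shared_n += 1
            else:
                # The siblings inherited different copies from this parent.
                unshared_sum += hap[h1] + hap[h2]
                unshared_n += 2
    return shared_sum / shared_n, unshared_sum / unshared_n


shared, unshared = simulate_sibling_pairs()
print(f"variant frequency on shared copies:     {shared:.3f}")
print(f"variant frequency on non-shared copies: {unshared:.3f}")
```

In this toy setup the variant is more common on shared copies than on non-shared ones, and both exceed the assumed population frequency, even though no non-participant is ever genotyped. That difference is the kind of within-sample signal the idea relies on.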
By applying this approach to the UK Biobank, they created a polygenic score demonstrating that participation correlated with body mass index and educational attainment.
This is consistent with previously reported differences between the UK Biobank and the general population, validating the method as capturing genetic associations relating to participation.
The polygenic score was also associated with being invited to participate in a follow-up study of diet, possibly because participants had to provide an email address to be contacted again in order to be invited into that study.
The correlation between the genetic components underlying participation in UK Biobank and educational attainment was estimated at a substantial 36.6%.
But Benonisdottir noted that these correlations were far from being total, so “participation in genetic studies is not simply a consequence of other established traits but is rather a complex trait in its own right and should be studied as such.”
She added: “In short, ascertainment bias leaves genetic footprints within datasets and those footprints can be utilized to investigate the bias itself.
“By taking participation behavior into account we could improve future data analysis as well as improve future study designs.”
In an editorial accompanying the study, Mark Adams, PhD, from the University of Edinburgh, noted: “The future goal should be to put together genetically informative biobank health datasets that are an accurate representation of each population.
“In the meantime, by deducing and correcting non-representativeness in existing studies, researchers can enhance the generalizability of their findings and improve our understanding of complex genetic traits and their relationship to health and disease.”