DNA microarrays are used in biological research to simultaneously measure the expression of thousands of genes.

A new statistical tool developed at the University of Chicago makes it quicker and easier to find genetic variants underlying disease. The tool, described in a paper published today in Nature Genetics, combines data from genome-wide association studies (GWAS) with predictions of gene expression to better identify disease-causing variants. The method, called causal transcriptome-wide association studies (cTWAS), uses a Bayesian multiple regression model and can account for multiple genes and variants at once.

GWAS is often used to associate genes with human traits, including common diseases. Most human diseases, however, are not caused by a single genetic variation; they result from a complex interaction of multiple genes, environmental factors, and other variables. GWAS also identifies only association, not causality: in a typical genomic region, many variants are highly correlated with each other because of linkage disequilibrium.

“You may have many genetic variants in a block that are all correlated with disease risk, but you don’t know which one is actually the causal variant,” said Xin He, PhD, associate professor of Human Genetics, and senior author of the new study.

“That’s the fundamental challenge of GWAS, that is, how we go from association to causality,” he added.

Making the challenge even harder, most genetic variants are located in non-coding regions of the genome, which makes their effects difficult to interpret. A common strategy to address this is to use expression quantitative trait loci (eQTL), genetic variants associated with gene expression. Many methods have been developed to nominate risk genes from GWAS using eQTL data, but they all suffer from the same fundamental problem: confounding by nearby associations.

In the new study, He and Matthew Stephens, PhD, professor of Human Genetics, developed cTWAS, which uses advanced statistical techniques to reduce false-positive rates. Instead of focusing on just one gene at a time, it accounts for multiple genes and variants together in a Bayesian multiple regression model, which lets it weed out confounding genes and variants.

“If you look at one at a time, you’ll have false positives, but if you look at all the nearby genes and variants together, you are much more likely to find the causal gene,” He said.
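
The intuition behind looking at nearby variants together can be illustrated with a toy simulation. The sketch below is not the cTWAS method: it uses ordinary least squares rather than a Bayesian model, and the function names, the Gaussian stand-ins for genotypes, and the parameter values (`ld`, `effect`, `n`) are all assumptions made for illustration. Two variants are simulated as correlated, as they would be under linkage disequilibrium, but only the first one affects the trait.

```python
import random


def simulate(n=20000, ld=0.9, effect=0.5, seed=0):
    """Simulate two correlated variants; only the first affects the trait.

    Genotypes are modelled as standardized Gaussian scores (an assumption
    for simplicity; real genotypes are counts of 0, 1, or 2 alleles).
    """
    rng = random.Random(seed)
    causal, tagging, trait = [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        # The tagging variant has correlation `ld` with the causal one.
        b = ld * a + (1 - ld ** 2) ** 0.5 * rng.gauss(0, 1)
        # The trait depends on the causal variant only, plus noise.
        y = effect * a + rng.gauss(0, 1)
        causal.append(a)
        tagging.append(b)
        trait.append(y)
    return causal, tagging, trait


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def marginal_slope(x, y):
    """Effect estimate when a variant is tested on its own."""
    return cov(x, y) / cov(x, x)


def joint_slopes(x1, x2, y):
    """Effect estimates when both variants are fitted together.

    Solves the 2x2 least-squares normal equations in closed form.
    """
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


if __name__ == "__main__":
    causal, tagging, trait = simulate()
    print("one at a time:", marginal_slope(causal, trait),
          marginal_slope(tagging, trait))
    print("fitted jointly:", joint_slopes(causal, tagging, trait))
```

Tested one at a time, both variants appear associated with the trait, because the non-causal variant inherits signal from its correlated neighbor. Fitted jointly, the estimate for the non-causal variant shrinks toward zero while the causal one keeps its effect.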

The paper applies the method to the genetics of LDL cholesterol levels. In one example, existing eQTL methods nominated a gene involved in DNA repair, but the new cTWAS approach pointed instead to a different variant in the gene targeted by statins, a common class of drugs used to treat high cholesterol. In total, cTWAS identified 35 putative causal genes for LDL, more than half of which had not been previously reported. These results point to new biological pathways and potential treatment targets for LDL.

The cTWAS software is now available to download from He’s lab website. He hopes to continue working on it to extend its capabilities to incorporate other types of ‘omics data, such as splicing and epigenetics, as well as using eQTLs from multiple tissue types.

“The software will allow people to do analyses that connect genetic variations to phenotypes. That’s really the key challenge facing the entire field,” He said. “We now have a much better tool to make those connections.”
