A set of eight “super-variants” that strongly influence mortality risk in patients with COVID-19 has been uncovered by a team of researchers from Yale University’s biostatistics department. The team made this finding through an analytically “enhanced” genome-wide association study (GWAS) of over 1,700 patients whose data was included in the UK Biobank. While the report, posted Nov. 9, is still at the pre-print stage and has not yet undergone peer review, it describes a potentially powerful new analysis technique, and its results may help explain why there is so much variation in the outcomes of COVID-19 patients, even when they receive identical treatment.
The senior author of the paper is Yale professor Heping Zhang, and the lead authors are Jianchang Hu and Cai Li, also at Yale. The paper, “Genetic variants are identified to increase risk of COVID-19 related mortality from UK Biobank data,” is available at medRxiv.
The authors point out that the severity of coronavirus disease 2019 (COVID-19) is highly heterogeneous, with some patients showing no symptoms at all while others may become disabled or die. Further, they note: “Studies have reported that males and some ethnic groups are at increased risk of death from COVID-19, which implies that individual risk of death might be influenced by host genetic factors.”
To help explain this trend, they looked for variants associated with mortality through a genome-wide association study (GWAS) using data from 1,778 patients infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19. Of these patients, 25.03% (445) ultimately died. That first analysis, they report, did not identify any significant variants. “To enhance the power of GWAS and account for possible multi-loci interactions, we adopt the concept of super-variant for the detection of genetic factors,” they write.
This second type of analysis revealed eight “super-variants” (or supervariants) consistently identified across multiple replications as susceptibility loci for COVID-19 mortality. Super-variants, the authors write, are “a combination of alleles in multiple loci in analogue to a gene.” In contrast to a gene, which represents a specific region on a chromosome, the loci contributing to a super-variant are not restricted to any particular location in the genome.
The authors took this approach, they report, because development of COVID-19 is related to environmental exposure as well as genetics, but genetics could have a more obvious effect with respect to mortality. Also, because COVID-19 is a complex syndrome, outcomes could involve the interaction of multiple genetic factors. “Our analysis with super-variants enables leveraging gene interactions beyond the additive effects,” they write.
They focused their study on patients of white British ancestry. For their analysis, the team looked at more than 18,600,000 SNPs, representing 146 deaths and 402 survivors. They used local ranking and aggregation methods to identify the significant super-variants, and then applied a random forest technique to rank the SNPs in terms of their importance. The top SNPs in each set were then aggregated into super-variants. Further detail is available in an earlier paper, for which Zhang was also senior author: Hu, J., et al., Supervariants identification for breast cancer. Genetic Epidemiology, 2020. DOI: 10.1002/gepi.22350
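The rank-then-aggregate idea can be illustrated with a toy sketch. This is not the authors' pipeline: the data are synthetic, the function names `snp_score` and `build_super_variant` are hypothetical, a simple difference in mean allele counts stands in for the random forest importance ranking, and the "carries a minor allele at any selected SNP" rule is an assumed aggregation chosen only for illustration.

```python
import random

def snp_score(genotypes, died, j):
    """Stand-in importance score (NOT a random forest): absolute gap in mean
    minor-allele count at SNP j between patients who died and survivors."""
    dead = [g[j] for g, d in zip(genotypes, died) if d]
    alive = [g[j] for g, d in zip(genotypes, died) if not d]
    return abs(sum(dead) / len(dead) - sum(alive) / len(alive))

def build_super_variant(genotypes, died, snp_set, top_k):
    """Rank the SNPs in one set, keep the top_k, and aggregate them into one
    binary feature: 1 if a patient carries a minor allele at any of them
    (an assumed aggregation rule, for illustration only)."""
    ranked = sorted(snp_set, key=lambda j: snp_score(genotypes, died, j),
                    reverse=True)
    top = ranked[:top_k]
    feature = [int(any(g[j] > 0 for j in top)) for g in genotypes]
    return top, feature

# Synthetic cohort: 12 SNPs coded as minor-allele counts 0, 1 or 2.
rng = random.Random(0)
n_patients, n_snps = 2000, 12
genotypes = [[rng.choice([0, 0, 0, 1, 2]) for _ in range(n_snps)]
             for _ in range(n_patients)]

# Hypothetical ground truth: carrying a minor allele at SNP 3 or SNP 7
# raises the chance of death from 10% to 80%.
died = [rng.random() < (0.8 if (g[3] > 0 or g[7] > 0) else 0.1)
        for g in genotypes]

top, feature = build_super_variant(genotypes, died, range(n_snps), top_k=2)

# Compare death rates between carriers and non-carriers of the super-variant.
carriers = [d for f, d in zip(feature, died) if f]
others = [d for f, d in zip(feature, died) if not f]
print("SNPs aggregated:", sorted(top))
print("death rate, carriers:", sum(carriers) / len(carriers))
print("death rate, non-carriers:", sum(others) / len(others))
```

The point of the aggregation step is that a single binary feature can capture a joint effect of loci that need not sit near one another, which is what distinguishes a super-variant from a gene-based grouping.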
The super-variants the team found were on chromosomes 2, 6, 7, 8, 10, 16 and 17. They contain variants and genes related to cilia dysfunctions (DNAH7 and CLUAP1), cardiovascular disease (DES and SPEG), thromboembolic disease (STXBP5), mitochondrial dysfunctions (TOMM7), and innate immune system function (WSB1). The researchers add that, “It is noteworthy that DNAH7 has been reported recently as the most downregulated gene after infecting human bronchial epithelial cells with SARS-CoV2.” In their paper, the authors discuss how these supervariants may impact mortality.
This study was made possible in part by the August 2020 UK Biobank release of COVID-19 test results from 12,428 people, including 1,778 cases of the disease. The dataset included available healthcare data, genetic data, and mortality information, comprising, as the authors say, “a unique resource and timely opportunity for learning the host genetic determinants of COVID-19 susceptibility, severity, and mortality.”