Researchers at the Broad Institute of MIT and Harvard have created a freely accessible online browser called Genebass to help other investigators and clinicians link rare genetic variants with disease phenotypes.
The browser is powered by an analysis of exome data from almost 400,000 individuals participating in the UK Biobank project. The analysis was published in the journal Cell Genomics this week.
The researchers carried out a systematic association study to assess links between 4,529 disease phenotypes and single-variant and exome sequence data from 394,841 individuals. These data are available for searching and analysis on Genebass.
The team used cloud computing to carry out the large-scale data analysis, which would otherwise have taken longer and required high-powered computing resources.
The UK Biobank analysis revealed that more severe mutations, such as those resulting in the loss of function of a gene, were more likely to be linked to diseases and phenotypes. The researchers also identified some new associations, such as a link between SCRIB gene variants and the integrity of neurological white matter, which is thought to support brain function.
They also found that “the discovery of genetic associations is tightly linked to frequency and is correlated with metrics of deleteriousness and natural selection.”
“Building this data set and releasing it to the general public is exciting because while we haven’t discovered the next PCSK9 yet, someone else may, using this resource,” commented Konrad Karczewski, co-first author of the paper and a computational scientist at Broad, in a press statement.
The researchers now want to analyze and add more datasets to the browser, particularly those with more diverse participants, such as the All of Us project. The UK Biobank is a fantastic resource, but it has the disadvantage of being composed mostly of participants from White European backgrounds, making the findings less applicable to other population groups.
As more data is added, the usefulness of the browser can only improve, the team notes. “Using these fuller, broader datasets may give us a lot of interesting information in terms of better representation and better power for gene discovery,” explained Karczewski.