A team of researchers has characterized thousands of genetic variants in 273 genes known to be medically important as a benchmark for future research and to help improve the accuracy of new studies.
Led by researchers at the National Institute of Standards and Technology (NIST), Baylor College of Medicine, and cloud-based data analysis and management company DNAnexus, the study was carried out as part of the Genome in a Bottle Consortium.
Overall, 273 of almost 400 “challenging autosomal genes” were sequenced and assembled using a haplotype-resolved whole-genome assembly. The team annotated 17,000 single-nucleotide polymorphisms, 3,600 insertions and deletions and 200 structural variations during their study.
“Some of these genes, which have previously been very difficult to access, are suspected to have some connection to disease. Others have very clear clinical importance,” said Justin Zook, a biomedical engineer based at NIST and co-lead researcher on the study, in a press statement. “SMN1, for example, is a gene we characterized that is directly associated with spinal muscular atrophy, a rare but severe condition.”
The consortium aims to use new and established genome sequencing methods to create in-depth and highly accurate benchmarks of important genes and their variants, giving researchers a guide to refer to in their own work.
“The repetitive nature and complexity of some medically relevant genes poses a challenge for their accurate analysis in a clinical setting,” the authors explain in the article describing the study, published in Nature Biotechnology.
Structural variants can introduce large differences between an individual genome and the reference genome, and errors can easily creep in during laboratory work. To get round this problem, the researchers used high-fidelity, or HiFi, sequencing, which is better at sequencing longer stretches of DNA, for genes with these challenging regions and variations.
“Instead of having a thousand-piece puzzle, where you have these little, tiny pieces that you have to put together, it’s more like having a hundred-piece puzzle where you have bigger pieces that you can put together,” Zook said.
The team also used an assembly method known as hifiasm to piece together the stretches of sequence. It avoids amalgamating different sequences, which helps maintain accuracy.
The researchers hope their work will help improve medical genetics research in the future. In addition to the gene associated with spinal muscular atrophy, they recorded variants in genes linked to heart disease, diabetes, celiac disease and other conditions.
The benchmarked sequences are now freely available for researchers to use. To make use of them, researchers would need to compare them with their own sequencing results from the HG002 genome, samples of which can be accessed from NIST.
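For readers who want a concrete picture of what such a comparison involves, the following is a minimal sketch in Python. It is an illustration only, not the consortium's method: it assumes variants can be matched exactly by chromosome, position, reference allele and alternate allele, whereas real benchmarking also has to reconcile different ways of writing the same variant. The `Variant` type, the `compare_to_benchmark` function and the toy records are all hypothetical and are not taken from the study.

```python
from typing import Iterable, NamedTuple, Tuple


class Variant(NamedTuple):
    """A simplified variant record (hypothetical, for illustration)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_to_benchmark(
    calls: Iterable[Variant], benchmark: Iterable[Variant]
) -> Tuple[float, float]:
    """Return (precision, recall) using exact matching of variant records.

    Precision: fraction of a lab's calls that appear in the benchmark.
    Recall: fraction of benchmark variants that the lab's calls recovered.
    """
    call_set, bench_set = set(calls), set(benchmark)
    true_positives = len(call_set & bench_set)
    precision = true_positives / len(call_set) if call_set else 0.0
    recall = true_positives / len(bench_set) if bench_set else 0.0
    return precision, recall


# Toy data invented for this example; not real benchmark variants.
benchmark = [
    Variant("chr5", 100, "A", "G"),
    Variant("chr5", 250, "C", "CT"),
    Variant("chr5", 400, "G", "T"),
    Variant("chr5", 900, "T", "C"),
]
calls = [
    Variant("chr5", 100, "A", "G"),   # present in the benchmark
    Variant("chr5", 250, "C", "CT"),  # present in the benchmark
    Variant("chr5", 610, "T", "A"),   # absent from the benchmark
]

precision, recall = compare_to_benchmark(calls, benchmark)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three calls match the benchmark, and two of the four benchmark variants are recovered. A laboratory would run the same kind of tally on its own HG002 results to see where its sequencing and analysis methods miss variants or report ones that are not there.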