A team of researchers led by scientists at the University of Washington (UW) has completed the sequencing and assembly of the genome of the Western lowland gorilla, our closest evolutionary relative after the chimpanzee. The team used new sequencing technology based on longer sequence reads, which allowed missing genes and previously undetected forms of genetic variation to be discovered for the first time.
These new technical advances in reading long DNA sequences could have widespread ramifications, not only for better understanding primate evolution but also for providing valuable insight into the development of many human diseases.
Typical genome projects today are performed using next-generation sequencing platforms, which can rapidly, accurately, and cheaply produce short sequence reads. Computer algorithms are then needed to assemble these fragments back together: the assembly program attempts to reconstruct the original genome by using the overlap between the sequence reads. However, long, repetitive DNA, which is all too common in human and other primate genomes, confuses assembly software and causes it to break the genome into tiny fragments.
“Such assemblies can be like Swiss cheese, with a lot of missing biological information in the gaps,” remarked senior study author Evan Eichler, Ph.D., professor of genome sciences at UW.
“These gaps are not random, but are clustered at sites of repeats,” Dr. Eichler continued. “If geneticists can't capture these repeats and determine structural differences in genomes, they have problems understanding the organization of genes and comparing genetic variation within and across species.”
The research team used Single Molecule, Real-Time (SMRT) sequencing technology, the assembly tools Falcon and QUIVER, and other techniques to generate long sequence reads, which were more than a hundred times the length of the reads produced by the most popular sequencing technologies. The long reads allowed them to traverse most of the repeat regions of the gorilla genome during the assembly.
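To make the repeat problem concrete, here is a minimal Python sketch of overlap-based assembly on a toy sequence. It is an illustration only, not the method used by Falcon, QUIVER, or the study itself; the sequences, the `overlap` and `successors` helpers, and the minimum overlap of 4 bases are all invented for this example. The toy genome has three unique segments separated by two copies of the same repeat.

```python
# Toy illustration (hypothetical data and helper names, not from the study).
# Toy genome layout: A + R + B + R + C, where R is a repeated segment.
#   A = "ATTC", R = "GAGAGA", B = "CTTA", C = "TCCA"

def overlap(a: str, b: str, min_len: int = 4) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, or 0 if that overlap is shorter than `min_len`."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def successors(read: str, reads: list[str], min_len: int = 4) -> list[str]:
    """List every other read that could follow `read` based on overlap."""
    return [r for r in reads if r != read and overlap(read, r, min_len) > 0]

# Short reads: each one ends or starts inside the repeat R.
r1 = "ATTCGAGA"   # A plus the start of R
r2 = "GAGACTTA"   # the end of R plus B
r3 = "GAGATCCA"   # the end of R plus C
short_reads = [r1, r2, r3]

# Long reads: each one spans an entire copy of R plus its unique flanks.
long1 = "ATTCGAGAGACTTA"  # A + R + B
long2 = "CTTAGAGAGATCCA"  # B + R + C
long_reads = [long1, long2]

# With short reads, r1 overlaps equally well with r2 and r3, so the
# assembler cannot tell which unique segment follows the repeat.
ambiguous = successors(r1, short_reads)

# With long reads, the only overlap is in the unique segment B, so the
# order of the pieces is determined.
resolved = successors(long1, long_reads)

print("short-read successors of r1:", ambiguous)
print("long-read successors of long1:", resolved)
```

In this toy case the short read has two equally plausible successors, which is the kind of ambiguity that leads an assembler to stop and leave a gap, while the long read that spans the repeat has exactly one.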
The result was a new gorilla genome assembly that was larger and had far fewer pieces. Instead of 400,000 fragments, there are now only 1,800 pieces. The average size of the genome fragments was 800 times larger, and approximately 90% of all gaps in the original assembly were closed.
The findings from this study were published today in Science in an article titled “Long-read sequence assembly of the gorilla genome.”
“My motivation for studying human and great ape genomes,” Dr. Eichler explained, “is to try to learn what makes us tick as a species. I'd like to see a re-doing of all the great ape genomes, including chimpanzee and orangutan, to get a comprehensive view of the genetic variants that distinguish humans from the great apes. I believe there is far more genetic variation than we had previously thought. The first step is finding it.”
Among the areas where the researchers have seen intriguing dissimilarities between humans and gorillas are genes associated with sensory perception, keratin (a skin protein) production, insulin regulation, immunity, reproduction, and cell signaling.
Patterns of genetic variation within the gorilla genome can provide evidence of how disease, climate change, and human activity affect lowland gorilla populations.
“I think the take-home message is that the new genome technology and assembly bring us back to the place we should have been 10 years ago,” Dr. Eichler stated.
Dr. Eichler and his colleagues believe that these advances are likely to contribute substantially to research on the genetic underpinnings of human disease, especially if more human genomes are sequenced using this technology.
“As medical researchers, if we depend only on short read sequences, there's a chink in our armor,” Dr. Eichler stated. “The work on the gorilla and other human genomes clearly demonstrates that large swathes of genetic variation can't be understood with the short sequence read approaches. Long read sequencing is allowing us to access new levels of genetic variation that were previously inaccessible.”
However, Dr. Eichler added that “at $80,000 a pop, the price is not yet right today for clinical sequencing of human genomes using the long reads. Given a few years of cost reduction and further advances in technology, I am willing to bet this is the way we will sequence human genomes to discover disease-causing mutations in the future.”