An international team of about 100 scientists in the Telomere-to-Telomere (T2T) consortium has published the first complete, gapless human genome sequence, adding a whole chromosome’s worth of previously hidden DNA: the missing eight percent.
The team was led by Adam Phillippy, PhD, senior investigator in the Computational and Statistical Genomics Branch of the National Human Genome Research Institute (NHGRI), and Karen Miga, PhD, assistant professor in the Biomolecular Engineering Department at the University of California, Santa Cruz.
Some of this work appeared previously in a preprint on bioRxiv, but the work published today adds new material, including the first complete Y chromosome. In addition, the group is unveiling a new UCSC Genome Browser featuring all 24 T2T human chromosomes.
Their results appear in six papers in Science and more than a dozen papers elsewhere.
Robert Waterston, PhD, a geneticist and professor of genome sciences at the University of Washington, says, “There are no longer any hidden or unknown bits.”
When scientists declared the Human Genome Project complete two decades ago, their announcement was a tad premature. A milestone had certainly been reached, with researchers around the world gaining access to the DNA sequence of most protein-coding genes in the human genome. But even after 20 years of upgrades, eight percent of our genome remained unsequenced and unstudied. Derided by some as “junk DNA” with no clear function, roughly 151 million base pairs of sequence scattered throughout the genome were still a black box.
“You would think that, with 92 percent of the genome completed long ago, another eight percent wouldn’t contribute much,” says Erich D. Jarvis, PhD, professor at The Rockefeller University, who helped develop a number of techniques central to unlocking the final pieces of the human genome. “But from that missing eight percent, we’re now gaining an entirely new understanding of how cells divide, allowing us to study a number of diseases we had not been able to get at before.”
For example, the team discovered unexpectedly high levels of genetic variation in centromeres and other regions—“a whole new treasure chest of variants that we can study to see if they have functional significance,” says Phillippy.
Key to this work were rapid improvements in sequencing technology from Oxford Nanopore Technologies and Pacific Biosciences (PacBio). In particular, the project got a boost when PacBio introduced a new sequencing machine that generated long reads with greater than 99 percent accuracy. “It was the last piece of the puzzle – like putting on a new pair of glasses,” says Phillippy. The PacBio technology couldn’t cover all parts of the genome equally well, but the scientists realized that by combining the PacBio reads with the Oxford Nanopore data, they could fill all the gaps.
One key moment came when the team tried to assemble the most difficult regions of the genome: the highly repetitive DNA in the centromeres. The researchers realized that the algorithms for assembling the pieces couldn’t handle the repetition, but the human eye could. On the computer screen, the scientists saw where the different repetitive sequences had become tangled together. Then they untangled them manually, “like untangling a string in your yo-yo,” Jarvis says. By summer’s end, the team had sequenced every chromosome.
The data offer “the foundation for a new era” in studying centromeres, says Miga, who co-led the T2T centromere satellite working group. Scientists will now be able to explore how this newly discovered variation contributes to disease, and how centromere DNA changes over time, she says.
The T2T results also point to more complex patterns of variation in genes that may have helped create the human species – and could explain our rapid evolution. The full genome sequence reveals that some genes associated with bigger brains are highly variable, explains Evan Eichler, PhD, professor of genome sciences at the University of Washington. One person might have 10 copies of a particular gene, while others might have only one or two. The mismatched genes can lead to “an earthquake” of gene alterations, Eichler adds. As a result, “these regions become a crucible for both rapid evolutionary changes and disease susceptibility, both within and between species,” he says.
The successful completion of a single genome is hardly the last word. Consortium members are already working to sequence a genome with different chromosomes inherited from each parent. They’re also beginning a pan-genome effort to read the entire DNA sequences of hundreds of people from around the world. “The goal is to create as complete a human genome as possible, representing much more of human diversity,” explains Jarvis, co-leader of the pan-genome effort.
But the new sequence is the indispensable first step, says Eichler. “Now we have a Rosetta stone for looking at complete variation in hundreds of thousands of other genomes going forward.”