In genome sequencing, “complete” rarely means 100% complete. Genomes described as complete usually omit highly repetitive DNA, which sequencers read in short fragments that are hard to piece together. Not content with this standard of completeness, a large group of genomics researchers recently decided to try something different.
The researchers, representing the National Human Genome Research Institute (NHGRI) and other institutions, made a point of sequencing an X chromosome from end to end—from telomere to telomere—without missing any highly repetitive DNA.
“This accomplishment begins a new era in genomics research,” said Eric Green, MD, PhD, the director of the NHGRI. “The ability to generate truly complete sequences of chromosomes and genomes is a technical feat that will help us gain a comprehensive understanding of genome function and inform the use of genomic information in medical care.”
The work described by Green was detailed July 14 in Nature, in an article titled “Telomere-to-telomere assembly of a complete human X chromosome.” According to the article’s authors, finishing the entire human genome is now within reach.
“[The first gapless assembly of a human chromosome] was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation,” the article’s authors stated. “Focusing our efforts on the human X chromosome, we reconstructed the ~3.1 megabase centromeric satellite DNA array and closed all 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and cancer-testis ampliconic gene families (CT-X and GAGE).”
“Complete hydatidiform mole CHM13” refers to CHM13hTERT, an effectively haploid cell line. Selecting CHM13 for sequencing allowed the scientists to avoid the complexity of assembling both haplotypes of a diploid genome.
In other words, the scientists avoided sequencing two dissimilar X chromosomes from a normal human cell. Instead, they used a special cell type, one that carries two identical X chromosomes. Such a cell provides more X chromosome DNA for sequencing than a male cell, which has only a single copy of the X chromosome. It also avoids the sequence differences between the two X chromosomes of a typical female cell.
The scientists also eased their burden by using technology that sequences relatively long stretches of DNA. That is, they opted for “long reads” over “short reads.” Each long read carries more context, which makes it easier to determine where the read belongs in the genome.
“Imagine having to reconstruct a jigsaw puzzle,” explained Adam Phillippy, PhD, senior author of the current study and a principal investigator and head of the Genome Informatics Section at the NHGRI. “If you are working with smaller pieces, each contains less context for figuring out where it came from, especially in parts of the puzzle without any unique clues, like a blue sky.
“The same is true for sequencing the human genome. Until now, the pieces were too small, and there was no way to put the hardest parts of the genome puzzle together.”
To assemble long reads into a complete human X chromosome, Phillippy and his team used a computer program that they had developed. They closed many gaps, but a large gap remained that corresponded to the X chromosome’s centromere. This gap, which included roughly three million bases of repetitive DNA, was closed by a team led by Karen Miga, PhD, a research scientist at the UC Santa Cruz Genomics Institute and the first author of the Nature article.
“We’re starting to find that some of these regions where there were gaps in the reference sequence are actually among the richest for variation in human populations,” said Miga. “So, we’ve been missing a lot of information that could be important to understanding human biology and disease.”
Filling in the remaining gaps in the human genome sequence opens up new regions where researchers can look for associations between sequence variation and disease, as well as for clues to other important questions about human biology and evolution.
Of the 24 human chromosomes (including X and Y), the scientists chose to complete the X chromosome first because of its links to many diseases, including hemophilia, chronic granulomatous disease, and Duchenne muscular dystrophy.
“Although we have focused here on finishing the X chromosome, our whole-genome assembly has reconstructed several other chromosomes with only a few remaining gaps and can serve as the basis for completing additional human chromosomes,” the article’s authors noted. “Efforts to finally complete the human reference genome will help advance the necessary technology toward our ultimate goal of telomere-to-telomere assemblies for all human genomes.”
The current effort is part of a broader initiative by the Telomere-to-Telomere (T2T) consortium, partially funded by NHGRI. The consortium aims to generate a complete reference sequence of the human genome.
“We don’t yet know what we’ll find in the newly uncovered sequences,” Phillippy remarked. “It is the exciting unknown of discovery. This is the era of complete genome sequences, and we are embracing it wholeheartedly.”
Challenges remain. Chromosomes 1 and 9, for example, contain repetitive DNA segments that are much larger than those encountered on the X chromosome.
“We know these previously uncharted sites in our genome are very different among individuals,” Miga emphasized, “but it is important to start figuring out how these differences contribute to human biology and disease.” Both Phillippy and Miga agree that enhancing sequencing methods will continue to create new opportunities in human genetics and genomics.