Since the Human Genome Project began in 1990, sequencing technology has evolved beyond all recognition. No one knows this better than Jonas Korlach, Chief Scientific Officer of Pacific Biosciences and co-inventor of its revolutionary single-molecule, real-time (SMRT) sequencing technology.
At the end of March, a paper published in Science by the Telomere-to-Telomere (T2T) Consortium announced that a complete human genome had finally been finished using PacBio’s HiFi sequencing technology. Technically, the first ‘complete’ human reference genome was released at the turn of the millennium, based on the work of the Human Genome Project. But it was only 92% complete: difficult-to-sequence heterochromatic regions, such as the telomeres and centromeres of the chromosomes, remained unfinished.
Korlach has followed, and played a significant role in, the development of sequencing since the 1990s. After starting his undergraduate education at Humboldt University in his home city of Berlin, he moved to the U.S. and became a graduate student in molecular biology at Cornell University. It was here that he developed SMRT sequencing, along with Professor Harold Craighead and fellow graduate student Steve Turner, who went on to found PacBio in 2000.
Since the company’s first sequencing machine was sold in 2010, PacBio has gone from strength to strength and is now widely considered the world leader in long-read sequencing. This position was cemented by the HiFi sequencing technology breakthrough in 2019. Indeed, Illumina, which has dominated the short-read sequencing market since the end of the official Human Genome Project, had aimed to acquire and merge with PacBio, announcing the proposed acquisition in 2018. However, pressure from competition authorities in the U.S. and U.K., concerned that the merged company would completely dominate the sequencing market, most likely caused the deal to fall through, and in 2020 the two companies announced that it would not go ahead.
Instead, PacBio is continuing its journey as a leader in high-accuracy, long-read sequencing and has recently ventured into the short-read sequencing world, acquiring short-read sequencing biotech Omniome last year. Korlach spoke to Inside Precision Medicine senior editor Helen Albert about the evolution of SMRT, the development of PacBio, the long-awaited completion of the Human Genome Project and the future of sequencing.
How did your journey into the world of SMRT sequencing begin?
I went to graduate school at Cornell University. When I got there in the fall of 1997, one of the first courses that I took was called biosynthesis of macromolecules. We went through all the enzymes: DNA polymerase, RNA polymerase, the ribosomes. The late ’90s was a really exciting time because high-resolution crystal structures came out that provided a still picture of what these molecules look like.
It struck me that, in contrast, there was almost no information about the dynamics, about how they move in time. I got interested in method development and started thinking about what it would take to try to watch these molecules in real time. I went to the man who would become my graduate advisor, Professor Watt Webb, and pitched this idea. I’m eternally grateful that he didn’t throw me out of his office, but instead said ‘my lab is known to tackle impossible problems’ and ‘let’s think about that.’ We ended up talking for over two hours.
He explained very quickly that there were no available microscopes out there that could do what I was thinking of doing. He suggested that we talk to Professor Harold Craighead, who was right next door; Steve Turner was his graduate student. We met in November ’97, for the very first time. And that started the collaboration.
Getting something like that off the ground must have been a challenge. How was the transition from discovery to successful company?
It took a few years. The first task we had was to build a lab. I finished my PhD in 2003 and worked for a short time as a postdoc while helping to develop the technology. Shortly after we got the Series A funding…[in 2004] I officially joined the company. We went to California, founded a lab, and the first task was to replicate what we had done at Cornell. We built the setup and continued to work on making the sequencing work and building a system that you could ship.
We got to that point in 2010–2011. That’s when we became a commercial entity and built the first instrument. Since then, it’s been iterations on the hardware. Similar to other companies in the space, you then build systems with ever-increasing throughput, lower cost, greater ease of use and so forth.
Underneath that is the software and the chemistry. We just had the 11.0 software released. The big news there is that we can now also address the epigenome: we can now measure 5-methylcytosine methylation at the same time as we measure the DNA sequence. With that, the Sequel IIe machine becomes the world’s first on-instrument five-base sequencer. There’s been a lot of excitement around that.
When did HiFi sequencing really take off around the world?
At first, we didn’t have HiFi sequencing; we too had long, noisy reads. It took until 2018–2019 because we had to develop the chemistry and the system to allow for really long reads. That really was the watershed moment. We had two worlds: many short, accurate reads, and long, noisy reads. With HiFi sequencing, you break that division, because you take the best of both worlds. You take the accuracy from the short-read side and the length from the long-read side, and you put them together.
Every biologist knows that humans are diploid: we have two copies of the genome, and they are not the same. This has been a challenge for the genomics community. Most of the previous genome assemblies were collapsed, so you’d get one sequence for the organism. But of course, that’s not the biological truth. With HiFi sequencing, what we’re seeing is that we can generate true diploid genome assemblies. Being able to sequence and phase the alleles, meaning separate the alleles and get one full contiguous sequence each from the mother and the father, is so important for recessive disease diagnostics and for pharmacogenomics.
HiFi’s really powerful and has found so many applications. Because the data are so accurate, the bioinformatics is so much faster. Genome assemblies now take hours, instead of many days. The file sizes become smaller, so all the handling, transfer and storage, everything becomes so much cheaper.
How did PacBio get involved with finishing the Human Genome Project?
A significant portion of the genome got duplicated relatively recently in evolutionary terms, which means the sequence is very similar. There are megabases of sequence that are there twice, or sometimes three times or more. That is not resolvable with the Sanger sequencing technology… HiFi sequencing gives you much longer sequence reads and they’re also highly accurate. This is required to really resolve and detect small differences between those pieces that exist in the human genome from these duplications. You also need long reads to get through the repetitive regions.
Of course, the draft genome was announced 20 years ago. But the scientific community realized that it was a draft genome and about 8% was missing entirely and there were other smaller holes. So, that is now what is described in the Science paper, that finally we’ve been able to sequence a human genome to completion, all the bases from start to finish, no gaps. The assembly was built on HiFi data, and then other technologies like those of Oxford Nanopore Technologies, Bionano Genomics and 10x Genomics were put on top of the HiFi genome to orthogonally validate and to make sure that it was correct. This is, for the first time, a book of complete instructions of what it means to be human.
Since the draft human genome sequence was announced in 2000, there have been many developments in the sequencing world, such as those from Illumina and Oxford Nanopore Technologies, among others. What makes PacBio’s technology stand out?
Illumina sequencing has been utilized widely, and is very useful for many things. That’s why it is the market leader. But for resolving many regions in the human genome, it is deficient, because it doesn’t have the read lengths. We see this in a large number of other applications where there are limitations to Illumina sequencing: solving rare disease cases, for example. The solve rate for rare diseases, which we know are genetic, is below 50% for Illumina sequencing. More than half the time, the researchers and the clinicians have to shrug their shoulders and tell the patient, ‘we don’t know what is wrong with you.’
We’ve seen great traction and adoption of PacBio HiFi sequencing to now increase that solve rate. With HiFi sequencing, you have true whole-genome sequencing. You see a lot more variation, you can access a lot more regions in the human genome, and many other areas. This is why we’re now seeing this rise of long-read sequencing. For example, for tandem repeat disorders, you need the length and you need the accuracy to get exact breakpoints to see the variation within the tandem repeats. In fragile X syndrome, the risk of a mother giving birth to a son who has fragile X syndrome goes down from 75% to 25% if she has one or two AGG interruptions in the CGG repeat, so it’s very critical to get the sequence correct.
Oxford Nanopore produces ‘noisy’ long reads. They are long, but they have systematic errors, which means they consistently make errors in the same place. This means that no matter how much you sequence, you will get it wrong every time, versus [PacBio’s] random errors, which you can get rid of by just sequencing a lot.
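The statistical point in this answer, that repeated sequencing averages out random errors but not systematic ones, can be illustrated with a toy simulation. The sketch below is only an assumption-laden illustration: it is not PacBio's actual consensus algorithm and it does not model the real error profile of any instrument. The error rate, read count, sequence length, error positions and all function names are hypothetical choices made for the example, and reads are assumed to be pre-aligned and to contain substitutions only.

```python
import random
from collections import Counter

BASES = "ACGT"

def add_random_errors(seq, rate, rng):
    """Substitute each base independently with probability `rate` (toy model)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def add_systematic_errors(seq, positions):
    """Miscall the same positions in the same way in every read (toy model)."""
    out = list(seq)
    for pos in positions:
        out[pos] = BASES[(BASES.index(out[pos]) + 1) % 4]
    return "".join(out)

def consensus(reads):
    """Per-position majority vote across equal-length, pre-aligned reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)  # fixed seed so the run is repeatable
truth = "".join(rng.choice(BASES) for _ in range(200))

# 25 reads of the same molecule, each with 10% independent random errors.
random_reads = [add_random_errors(truth, 0.10, rng) for _ in range(25)]
# 25 reads that all share the same three miscalled positions.
systematic_reads = [add_systematic_errors(truth, [10, 50, 120]) for _ in range(25)]

random_consensus = consensus(random_reads)
systematic_consensus = consensus(systematic_reads)

print("random errors, consensus mismatches:", mismatches(truth, random_consensus))
print("systematic errors, consensus mismatches:", mismatches(truth, systematic_consensus))
```

In this toy setting the majority vote recovers the true sequence from the randomly erroneous reads, because the errors fall in different places in each read, while the three shared miscalls survive in the consensus regardless of how many reads are added.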
What are you focusing on as a company going forward?
We see there are a lot of opportunities on the epigenome side. So, that’s one area. The second area, I would say, is the transcriptome. The available epigenome and transcriptome data is currently limited, and people are actually now saying that what has been done needs to be redone. For example, in the single-cell RNA-seq space, there was the Human Cell Atlas done with Illumina sequencing. You’re basically just counting and characterizing those cells relative to what genes are expressed and what are not expressed. What you’re completely ignoring, and what is invisible with short-read sequencing, is the transcript splicing of the different isoforms.
We are also an official partner in the Human Pangenome Reference Consortium, the HPRC. That’s an NIH-funded initiative to basically create at least 350 reference-quality human genomes, specifically calling out different ethnicities. This effort wants to capture the entirety of the genetic variation and the genetic makeup of the global human population, essentially for the first time.
In the long-read space, what we’ve seen is that people love HiFi data; people want more, and cheaper, as people always do. So, we’re working on increasing the throughput, decreasing the cost, making workflows easier, and so forth. On the short-read side, we made a decision that we want to address the very large markets that exist for short-read sequencing that are not going away. Last year, we acquired a company called Omniome that has been developing a short-read sequencing technology. It’s going to be really interesting in 2023 to see how these new technologies work in people’s hands. We are confident that we made the right acquisition and we acquired the best short-read sequencing technology out there.