The Biden Administration’s new $1.7 billion commitment to expand genomic sequencing to fight COVID-19 is a welcome step, though it unfortunately comes late in the course of the pandemic. Still, better late than never. Because the infrastructure is still not in place, the United States, like much of the world, is flying blind when it comes to COVID-19. We have little understanding of which variants, foreign or homegrown, are circulating in our communities, how they are affecting the course of disease, and whether they are breaking through the protection offered by vaccines.
As a member of the public health community, I find our failure a great source of professional frustration. But as a pioneer in the field of viral genome sequencing, I also find it very much a source of personal sadness.
One of the first people to sequence a full strand of DNA was one of my professors and a lifelong mentor of mine, Walter Gilbert. He and a colleague, Allan Maxam, published a major paper in February 1977 describing every step of their chemical method for sequencing DNA, the Maxam-Gilbert method. The paper was a profound revelation. At the time, I was running my own lab at the Dana-Farber Cancer Institute, studying retroviruses and their potential links to leukemia and other cancers.
When I heard what Allan and Wally had published, I called Allan and asked whether he’d be interested in joining me on some of the work I was doing. Over the next few years, we published a series of papers together, including one in which we sequenced, for the very first time, a novel piece of DNA: DNA that had been reverse transcribed from the RNA of a cancer-causing retrovirus, Rous sarcoma virus. Our efforts had an important impact on my work at the time but an outsize influence on my career as a whole, as they inspired me to create the biotech company Human Genome Sciences and to use these powerful new techniques for both fundamental biological discovery and drug development.
Today, DNA sequencing can be done in mere hours with all manner of material, from blood to sewage. But back when I was working with Allan, the same task took us several days. First you needed to prepare the DNA, denaturing it into a single-stranded chain and labeling it on the 5′ end. Then you applied a set of chemicals that broke the labeled strands apart at specific bases, producing a collection of fragments. Those fragments then had to be processed before we could read the sequence. Initially, that reading was done by computer, another major effort given the limited computational power of even our largest mainframes, which took up entire floors of our buildings yet were far less powerful than an average mobile phone today. The result was a printout fourteen inches wide and up to one hundred feet long, produced on the dot matrix printers of the day. The Maxam-Gilbert method was called “rapid sequencing,” but the whole process, from preparing the DNA to the final printout, still took several days.
But that sequence held enormous potential. In my lab, we immediately applied the Maxam-Gilbert method to explore how anticancer drugs worked. First, we isolated a DNA segment of known sequence and mixed it with the anticancer drug in question. Then we compared the fragment pattern of the drug-treated DNA with that of the untreated DNA to identify exactly where the drug attacked. In short order, we were able to show which parts of the DNA were damaged in a cancerous cell and the molecular impact of various anticancer drugs on the DNA. This initial understanding led to even greater discoveries, including the potential effect of combining different cancer drugs to kill isolated cancer cells.
By the 1980s, our DNA research on anticancer drugs was thriving, but our work on retroviruses and their link to human cancers was withering on the vine. The number of scientists who believed in that link had been steadily shrinking, though, thankfully, those who remained included Bob Gallo. In the spring of 1981, Bob called me with news I had long waited to hear. He had discovered a human retrovirus that caused leukemia, a virus he named Human T-cell Leukemia Virus (HTLV) because it seemed to affect only T cells. Armed with his discovery and Maxam and Gilbert’s sequencing method, I was off.
Sequencing HTLV led my team and me to a series of new and important discoveries. First, we discovered that HTLV could activate itself (transactivation), unlike any other retrovirus we had studied. Then we isolated the element in the viral genome that responded to the transactivating signal. Finally, we identified what triggered transactivation: the tax gene, which caused normal lymphocytes to grow uncontrollably. Later, we learned that HTLV was transmitted not by the virus alone but through infected cells, a very unusual mode of viral transmission.
When AIDS first appeared on our radar around this same time, there was no stopping my colleagues and me in the lab. Our collective understanding of retroviruses and how they caused disease is what allowed us to identify HIV, another retrovirus, as the source of the new disease so quickly. In less than two years, teams of scientists in France and across the United States had isolated the virus and determined its entire sequence. My lab worked closely with a group at Washington University in St. Louis to sequence a DNA copy of a strain that grew especially well in the lab. When we combined our results, at last we had it: the full sequence of the HIV genome.
The DNA sequence was printed on a continuous piece of paper about fourteen inches wide and sixty feet long. The RNA sequence ran down the center of the page. The predicted protein sequence was printed above the RNA sequence. Once we had the sequence of the virus, we immediately understood many of its working parts. We marked that printout up from top to bottom, pointing out precisely where we thought each protein was made. Over time, we were able to accurately pinpoint each of the virus’ proteins.
That was the moment I fully understood the power of sequencing to reveal not just how viruses work but also how to stop them. Once we had the sequence, we were only weeks away from having a protein and an assay to identify drugs targeting different parts of the virus. It was also the moment I realized that what we could do for a virus, we could do for the full complement of human genes. With sequencing, we could shortcut decades of arduous work between the discovery of a disease and the identification of drug targets to treat or cure it. It was a concept I applied over and over again, especially while building Human Genome Sciences.
But sequencing does more than help us develop drugs. With HIV, which has killed more than 30 million people worldwide and continues to kill nearly 700,000 people each year, DNA sequencing has become integral to tailoring treatments and controlling the pandemic as a whole. HIV quickly develops resistance to new drugs, whether they are given alone or in combination. On an individual level, knowing a patient’s viral sequence can help determine which course of treatment will work best. On a population level, understanding the prevalence of drug-resistant strains can help reveal the dynamics of transmission and give policymakers and drug developers early warning when new antiretroviral regimens need to start moving down the pipeline or other interventions need to be implemented.
We’ve taken a similar approach to influenza, another virus that is constantly changing. Each year, almost every country, including the United States, performs whole-genome sequencing on thousands of influenza viruses collected through virologic surveillance. That sequencing shows us not only how the virus is evolving and whether it is causing more severe disease or becoming more transmissible, but also how we should develop that year’s vaccines and how we may need to adapt treatment regimens.
Over the years, with our work on HIV, influenza and a number of other diseases, we have built up a vast reservoir of knowledge—and the fundamental technologies—to track and treat viral infectious diseases. This makes our failure to apply this knowledge immediately after the emergence of SARS-CoV-2 all the more baffling.
At the end of 2020, the United States ranked 43rd among all countries in sequencing SARS-CoV-2 and its variants. Even today, as I write, it lags behind nearly thirty other countries. The U.K., on the other hand, is among the best, which no doubt explains why it was among the first to identify a new SARS-CoV-2 variant. For a while, South Africa and its network of five regional genomic surveillance centers were outpacing our own efforts, which is why that country, too, was quick to identify new variants. Now, new variants are popping up all over the world, causing enormous numbers of new infections in Europe, Canada, India, Brazil, and beyond. Yet here in the United States we are only now ramping up our efforts, and we continue to fly blind.
Nearly 40 years of work on HIV sequencing and genetic surveillance of influenza should have been enough to alert us to the need for deep and detailed genetic surveillance of SARS-CoV-2 when it emerged as a threat. Failing to establish that infrastructure immediately was one of the greatest oversights of the pandemic and is tied directly to the hundreds of thousands of lives unnecessarily lost. Many assumed that the virus would be relatively stable, likening it to SARS-1 or MERS. But others, including me, knew to be watchful and wary. The virus could reinfect a patient, or linger latent only to reemerge weeks or months down the line. This was a sign that the virus was changing, but without sequencing we were blind to its shifting shape.
Like others, I am grateful for the new commitment to sequencing, even at this late date. It includes $1 billion to improve states’ capacity to identify COVID mutations and monitor the circulation of variants; $400 million to establish six Centers of Excellence in Genomic Epidemiology to fuel new research, potentially including new tools for genomic surveillance; and $300 million to build out a National Bioinformatics Infrastructure, a data system to track how pathogens are spreading or mutating. It is a start, but it is only a small initial injection of interest and funding for an effort that must be sustained and strengthened in the decades to come. If it is not, we will pay in lives lost, not only to COVID-19 but to the next great pandemic that washes onto our shores.