The dark genome—it sounds like a work of science fiction. Perhaps something Keanu starred in shortly after The Matrix? But for a dozen or so biotech companies, investigating this mysterious genetic “dark matter” is at least as exciting as anything dreamt up by a movie studio.
Once dismissed as “junk DNA,” the dark genome is increasingly looking like an untouched resource that can provide answers to fundamental processes in life.
And, through the application of the latest technology, it has the potential to lead to new, ground-breaking treatments for disease on an unimagined scale.
“We believe that the dark genome is a bottomless biological treasure trove that has hidden patiently, waiting to be discovered and appreciated for the lessons it can reveal about why various diseases develop and progress,” says Rosana Kapeller, MD, PhD, CEO of Rome Therapeutics, which is developing dark genome therapeutics.
Oxford Science Enterprises life sciences operating partner Craig Fox, PhD, whose company acted as a founding investor in dark genome company Nucleome, agrees.
“There is no doubt that the dark genome contains a rich source of untapped disease-relevant biological insights that are likely to provide the basis for new effective therapeutics and stratified medicine approaches,” he says.
The scientific world has long known about this ramshackle collection of non-coding DNA, untranslated regions, splice sites, and transposable elements, which had previously been written off as a meaningless evolutionary legacy.
But major clues as to its importance came with the first sequencing of the human genome a little over 20 years ago.
The 13-year project to spell out three billion letters of genetic code was heralded as one of the greatest feats of scientific history.
It brought with it enormous quantities of genetic data, but also a major gap—a glaring lack of protein-encoding genes. They made up just 2% of the genome, a fifth of early estimates, and a remarkably small number considering the complexity of a human.
It is only now that scientists are probing the vast remaining portion, a situation that Sudhakaran Prabakaran, PhD, founder of dark genome company NonExomics, compares to an iceberg.
Around 20,000 proteins encoded by traditional genes exist in the visible “exome” part, of which 812 have been studied as therapeutic targets over the past fifty years.
However, his company believes it has discovered a massive 248,000 proteins that lie beneath in the hidden part of the iceberg, made independently in the dark genome.
It is exploring what it terms “non-exomic proteins,” focusing on a shortlist of 3,000 that are produced from areas of the genome called novel open reading frames, or nORFs.
“We know that almost 95% of all clinical trials fail at Phase III stage and we don’t know why they’re not able to accomplish this task of curing diseases,” he explains.
“What our work shows is that now there are more proteins the genome is capable of making, and we call this region—the rest, the 98% region—the non-exome.”
Prabakaran points out that the definition of a gene has been set in stone for the past fifty years and does not include these non-exomic proteins made in the dark genome.
“When we imagine a gene, we think of it as a car with four doors and four wheels and a steering wheel. And we can’t imagine anything outside of this definition, but it turns out that genomes are more plastic than what we know of and so can make proteins anywhere.”
He believes the whole definition of a gene needs to change “but as you know, pushing through a 5-door car or a 6-wheel car is not easy.”
The computational biologist’s approach has already reaped real-life rewards, helping to guide treatment in a patient whose cancer had perplexed oncologists and surgeons.
Mapping 20,000 exome protein signatures from the patient’s biopsy samples onto 16,000 samples from different people across 33 cancers offered no clues as to the type of malignancy.
It was only when the team pulled out non-exomic protein signatures and overlaid these on the other samples that it became clear the patient’s data clustered with a type of lung cancer.
“It’s an exciting field,” he says. “People are now opening up to this idea… there’s exciting stuff happening outside of the conventional, whatever we know of as biology.”
Over at Nucleome, they are adopting a different approach. The company, founded by gene regulation experts at the University of Oxford, is investigating how the dark genome controls traditional gene expression in specific cell types, switching them on and off at the right time and level.
“Over 90% of disease-linked genetic changes are in this uncharted part of the genome and have no function ascribed, representing a significant opportunity for discovery of genetically validated targets with biomarkers,” says its chief scientific officer Stephen Harrison, PhD.
Nucleome links these variants to gene function in specific cell types to precisely map affected pathways. It is initially focusing on lymphocytes and related autoimmune disease and aims to build a pipeline of both therapeutic assets and biomarkers.
“Since the dark matter constitutes the majority of the human genome, targeting it offers a vast landscape of potential novel therapeutic avenues to be uncovered,” explains Harrison.
“Additionally, since a significant portion of disease-linked genetic variations reside within this uncharted territory, understanding its role will unveil crucial insights into disease mechanisms and facilitate the identification of precise therapeutic targets and help identify the patients that will benefit from treatment.”
In the States, Rome Therapeutics has chosen yet another way of studying genomic dark matter. It is exploring the many repeats in DNA sequences that originate with viruses that have integrated into the human genome over the course of evolution.
These DNA repeats lie dormant in healthy cells but can be activated in disease states or under environmental stress, playing a key role in autoimmune and neurodegenerative diseases and cancer.
The Boston-based company’s lead drug targets an enzyme produced by a virus-like element called LINE-1, an “ancient genetic parasite” that can copy itself and reinsert itself so successfully that it makes up nearly a fifth of the dark genome.
Rome’s lead drug aims to inhibit a LINE-1 enzyme implicated in triggering many autoimmune diseases, and the company plans to explore its benefits for lupus in human studies later this year.
Kapeller echoes the concerns voiced by Prabakaran over traditional approaches to treating disease, saying existing drugs often provide “symptomatic or band-aid approaches” that do not address the root cause of the illness.
The situation is particularly evident in autoimmune disease, where treatments currently suppress different parts of the immune system to prevent autoimmunity.
“The problem is suppressing the immune system then leaves patients at risk for developing serious and even life-threatening infections, such as those caused by external viruses, among other unpleasant side effects,” says Kapeller.
Rome’s approach instead would prevent the innate immune system from being triggered in the first place, which she says would potentially make it the first non-immunosuppressive drug to treat autoimmune diseases.
“I think the biggest benefit of illuminating the dark genome is that it’s helping us gain a much deeper understanding of disease-driving biology in certain diseases and that, hopefully, will lead to more treatments for more patients and, importantly, to precision treatments that can modulate underlying disease biology,” she maintains.
Back in Oxford, EnaraBio is exploiting yet another aspect of the dark genome as a source of potential treatments, through “dark antigen” and T-cell biology.
It is looking at the genomic instability that is a hallmark of cancer, where regions that are usually silenced become exposed and activated.
The company is exploiting the observation that some of these newly activated sequences make small proteins that are rapidly broken down by the cancer cell, parts of which are presented on the surface by a molecule called HLA.
It is developing ways to identify fragments of the proteins or “dark antigens” made by these newly activated sequences that are presented on the surface of the cell.
Enara president and CEO Kevin Pojasek, PhD, says dark antigens represent an entirely new, untapped set of targets for treating a range of solid tumors.
He says that cancers overload the cell’s systems. “You have all this genomic instability, rapid turnover, there’s lots of things going on in the cell, and it really has evolved to rapidly proliferate and invade and just take care of everything else that’s going on,” he explains. “And so, what we see is a range of dark antigens in terms of their source, in terms of their links to cancer.
“But I think holistically what’s going on is you just have a lot of new protein sequences being made and the cell machinery is saying, okay, some of these are happening, some of them are going to do stuff, but the others are maybe just trying to get cleaned out of the cell.”
He admits they are “still pretty early on in our journey” to develop therapies. Nonetheless, the company has made a couple of pharma deals, including a long-standing collaboration with German behemoth Boehringer Ingelheim. Boehringer has licensed a set of Enara’s dark antigens, which will be put into a candidate cancer vaccine that should progress towards the clinic next year.
Advances in technology developed within the past decade have been key to unlocking the potential of the dark genome for many of these companies. Next-generation sequencing has ushered in an era of precise genome mapping and comparative genomics. Long-read sequencing, which can sequence long strands of DNA or RNA without breaking them into fragments, has improved the ability to detect specific gene variants and uncovered previously missed dark genome events.
Meanwhile, improvements in transcriptomics have helped remove the bias that has missed repeats in cancer and had consequences for the immune system.
“We’ve done a lot of work mapping the targets, validating the targets, so both the discovery of these is very complex and requires a lot of cutting-edge technology,” says Pojasek.
He believes that “hugely powerful” new technologies developed in the past five to 10 years have helped scientists to capitalize on the discoveries they are making in the field today.
“I think part of the reason they have been dark for so long is that the technologies really weren’t there to probe and to understand the features of the dark genome,” he elaborates.
“And so, I think some of the foundational advances in the field that we take advantage of are RNA sequencing data.”
The company is performing in-depth RNA sequencing of both bulk tumor samples and single cells and, increasingly, ribosomal footprinting to see what is actually being translated in a cell. Mass spectrometry identifies peptide antigens on the surface of cells in primary human tumors, and RNA sequencing then allows the company to create a targeted database of candidate antigens.
For antigen validation, it uses an advanced in situ hybridization technique that has probably only been around for a decade, while artificial intelligence and machine learning enhance the ability of potential therapies to see targets.
Prabakaran says traditional narrow definitions for genes have previously reinforced technological constraints, with large-scale sequencing strategies looking for regions that are already well known.
“It’s like peaks in a mountain range, we know there is a peak, we go look for it and we don’t see anything underneath it, we are flying up above,” he explains.
“If you keep going down and down, you’re seeing more valley and more smaller peaks. And part of the problem had been ability, with technologies that are not that sensitive.
“And that’s, that’s where we come in. We develop the technology and the algorithms to put all of them together and see all these signatures.”
Kapeller adds: “The key challenge is that this is an entirely new area of biology and the technologies needed to explore and understand the dark genome are still emerging.
“We launched in 2020, and I can tell you unequivocally that our understanding of the dark genome and LINE-1, in particular, has radically evolved over the past four years.”
And what of investors in the field? Fox admits that it is still “early days” in terms of dark genome companies having a clinical impact.
“Many of these approaches are still at the early stage, with only limited examples that these approaches can deliver meaningful impact to patients, and so [they are] more appealing at this moment in time to investors with a higher risk appetite and those that like to ride the journey of exploratory and cutting-edge research.”
He adds: “It is still early days and so many investors may well wait for evidence that these insights can lead to clinical impact. However, there are 10–15 biotech companies focused on the dark genome specifically and this is likely to increase.”
Anita Chakraverty is a U.K.-based journalist who has been writing about medicine and health across several international publications for more than 20 years. In her spare time, she enjoys reading, films, and walks in the countryside.