Three Million African Genomes: A New Project for A New Generation

Much has been achieved since the human genome was first sequenced, but we are still only at the beginning of implementing truly personalized medicine. Could the proposed Three Million African Genomes project take us closer to the goal of accurate genetic risk prediction that includes everyone, regardless of their genetic ancestry?

This month is the 20th anniversary of the first publication of the draft human genome sequence. In the last two decades, genetics has advanced dramatically as a result, but it has become clear that human biobanks and genomic sequence databases have a diversity problem.

Earlier this month, Nature published an article by Ambroise Wonkam, M.D., Ph.D., a professor at the University of Cape Town and head of the African Society of Human Genetics. In the article, he outlines a proposal to set up a new sequencing project that aims to sequence 3 million genomes across Africa.

“Less than 2% of human genomes analyzed so far have been those of African people, despite the fact that Africa, where humans originated, contains more genetic diversity than any other continent,” he writes.

“The current lack of diversity in the global genomic resource really underscores the need for a project like this,” says Zané Lombard, Ph.D., an associate professor of human genetics at the University of the Witwatersrand.

Lombard and colleagues recently showed that high-depth sequencing of African genomes even on a small scale (they sequenced 426 individuals) can reveal a huge amount of previously unknown genetic variation – in their study alone they discovered 3 million new variants.

“We also show that we are not yet reaching a plateau in new discoveries, and therefore can learn much more by increasing the number of African genomes available in the global resource,” she adds.

Three million genomes is an ambitious goal. Most genome projects to date have focused on sequencing thousands of people, such as the UK’s 100,000 Genomes Project. However, advances in next-generation sequencing that have increased speed while also decreasing costs make it more achievable than it was 20 years ago.

“As I understand, Professor Wonkam has done some calculations, based on the number of ethnolinguistic groups in Africa versus the number represented in current public genomic datasets, to arrive at the number of 3 million genomes,” says Lombard.

For such a project to work, a large degree of collaboration between different African countries will be needed. The Human Heredity and Health in Africa (H3Africa) consortium was set up in 2010 to study the genomics and medical genetics of the African people, and it has already started the groundwork of building such a network. Indeed, Lombard’s recent small-scale sequencing of African genomes was a direct result of her work as a principal investigator for H3Africa.

Ananyo Choudhury, Ph.D., is a senior researcher at the Sydney Brenner Institute for Molecular Bioscience, also based at the University of the Witwatersrand. He was first author on the recent Nature paper that was co-authored by Lombard.

“This is definitely a very ambitious project and securing funding worth hundreds of millions of dollars would be first and perhaps the most difficult challenge,” he suggests.

“Although we are gradually developing the capacity to do cutting-edge genomic research in the continent, the scale of the proposed study is unprecedented. Therefore, setting up of computational infrastructure for efficient storage and processing of a genomic dataset of this size as well as finding a critical mass of people with the necessary computational and analytic skills would also be major challenge.”

Another potential problem is persuading people to have their genomes sequenced. “Community engagement is another very important aspect, as exploitation of vulnerable populations have unfortunately occurred in the past,” says Lombard. “In the pursuit of improving diversity, it is very important that the preferences and permissions of local participants should be rigorously upheld.”

We are now moving closer to an era in which genetic screening or whole-genome sequencing is a standard part of our medical record. Polygenic risk scores (PRS), once a tool used mostly for research, are now being rolled out in the clinic.

The problem with most PRS, which work by adding up the risk for a given disease associated with each of a selection of genetic variants, is that they are only as good as the population they were developed on. In other words, if the genome-wide association study (GWAS) the score is based on mostly includes people of European ancestry, then the score will predict risk less accurately for people of other ancestries.
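To make the "adding up" concrete, here is a minimal sketch in Python of how a score of this kind is commonly computed: each variant's effect size is multiplied by the number of risk alleles a person carries, and the products are summed. The variant names, the weights, and the function name `polygenic_risk_score` are hypothetical and chosen only for illustration; they do not come from any of the studies mentioned here.

```python
# Hypothetical effect sizes (for example, log odds ratios) per risk allele,
# of the kind a GWAS might report. The names and values are invented.
EFFECT_SIZES = {
    "variant_a": 0.20,
    "variant_b": 0.10,
    "variant_c": -0.05,
}


def polygenic_risk_score(genotype, effect_sizes=EFFECT_SIZES):
    """Sum (effect size x risk-allele count) over all scored variants."""
    score = 0.0
    for variant, weight in effect_sizes.items():
        # Risk-allele count: 0, 1 or 2 copies. A variant missing from the
        # genotype is treated as 0 copies, which is a simplification.
        dosage = genotype.get(variant, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"unexpected allele count for {variant}: {dosage}")
        score += weight * dosage
    return score


# One hypothetical person: two copies of variant_a, one of variant_b.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
prs = polygenic_risk_score(person)
print(f"PRS = {prs:.2f}")
```

The sketch also shows where the limitation comes from: the score can only count variants that appear in its weight table, and the weights reflect the population in which they were estimated. A variant that matters in an under-studied population but was never found in the original GWAS contributes nothing to the total.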

“People of European ancestry make up nearly 80% of all published genome-wide association studies,” says Elizabeth Atkinson, Ph.D., a researcher at Massachusetts General Hospital and Harvard Medical School who is working on software to improve these disparities.

“This means that we know more about the underlying genetic basis for traits and diseases in European-descent individuals than those of other ancestries. This is obviously not only not equitable, but feeds into the concerning health disparities that are currently observed across ancestries.”

Tyler Seibert, M.D., Ph.D., is an oncologist and researcher based at the University of California San Diego. He has been working with colleagues to make a PRS for prostate cancer risk more representative of people with African ancestry and published results in Nature Communications this week.

“In our recent study, we tested a score for prostate cancer risk that was developed in a European dataset, and we applied it to people of African ancestry. The score did not work as well in the African-ancestry dataset, consistent with studies in other diseases,” he explained.

“The good news is we can improve—we just need the data. In a separate study, we searched for genetic markers that might specifically improve performance in the African-ancestry population. By using African data for genetic score development, we were able to find genetic markers that look like they could bring the overall score performance to be nearly comparable in people of African and European ancestry.”

Awareness of the lack of diversity in genetic biobanks and sequence databases is much higher than it was, and a number of efforts have been made to start new biobanks in the U.S. and elsewhere that include a broader range of ethnicities. For example, the ongoing All of Us program is planning to collect genetic data from 1 million volunteers in the U.S., and aims for 70-80% of these to come from groups of non-European ancestry.

But are these kinds of projects enough? “These efforts should be applauded. However, there is no substitute for studying African genetics in Africa. That is the only way to capture the full extent of genetic diversity there,” says Seibert.

“That will benefit people around the world, too. For example, African Americans in the US generally have mixed ancestry—their genetic ancestry is part European, part African. That mixture exists throughout their genome. So, if I want to look at a genetic marker for risk of prostate cancer that differs by ancestry, knowing that the individual is, say, 75% African and 25% European, may not be enough… Understanding what different kinds of African genomes look like will help us better sort this out.”

There is no doubt it will be expensive and take time to complete – Wonkam estimates core funding would need to be roughly $450 million per year and the whole project would take around 10 years. But the value it could bring to African people, and people of African ancestry, in the long term is enormous.

For example, almost 50% of African and African American populations carry a variant called CYP2B6*6 that is linked with severe side effects from the HIV drug efavirenz. Knowing about variants like this is helpful because it allows different drugs to be prescribed, which should make people less likely to skip doses and so reduce the risk of viral resistance.

“The project raises an obvious question: how can expansive genomic sequencing be justified when people still die of malnutrition, malaria and HIV?” writes Wonkam. But he adds that he thinks it “will improve capacity in a whole range of biomedical disciplines that will equip Africa to tackle public-health challenges more equitably, and yield knowledge that could benefit vulnerable populations.”