Over 200 million predicted protein structures have been made freely available by Google offshoot DeepMind and the intergovernmental European Molecular Biology Laboratory (EMBL).
Their AlphaFold Protein Structure Database uses artificial intelligence (AI) to forecast the 3D shape of nearly all catalogued proteins known to science, covering almost every organism that has had its genome sequenced.
DeepMind—a subsidiary of Google’s parent company Alphabet Inc.—and EMBL’s European Bioinformatics Institute (EMBL-EBI) have expanded the database almost 200-fold in the hope that it will spur scientific research, tackle global challenges and accelerate the diagnosis and treatment of disease.
Scientists are currently using the AlphaFold method and database for a variety of applications, including fighting antibiotic resistance, reducing plastic pollution, tackling drug resistance and understanding honey bee immunity.
A protein’s structure, which is dictated by its amino acid sequence, determines its biological function.
“This is one of the most important datasets since the mapping of the Human Genome,” EMBL Deputy Director General and EMBL-EBI Director Ewan Birney told Inside Precision Medicine.
“In my wildest dreams, I didn’t expect the AlphaFold Database to grow so quickly and become such a comprehensive data resource for the scientific community. It’s a real joy to see the data open and accessible for everyone to explore and build on.”
Since its launch in July 2021, the AlphaFold database has been accessed by more than half a million researchers from 190 countries.
Over 2 million protein structures have so far been viewed, and the database has been cited in more than 1,000 scientific papers.
Initially, the database contained more than 350,000 protein structure predictions, including the 20,000 proteins found in humans.
Later, it was expanded to include UniProtKB/Swiss-Prot, a freely accessible resource of protein sequences and functional information.
Then, earlier this year, it added the predicted structures of nearly 200,000 proteins from 25 organisms linked to the World Health Organization’s list of neglected tropical diseases and 10 on its antimicrobial resistance list.
Researchers are now building on AlphaFold to create and adapt tools such as Foldseek, which allows fast and sensitive comparison of large structure sets, and Dali, which relates new 3D protein structures to existing ones.
New algorithms based on AlphaFold’s core machine learning ideas have also been applied in areas such as RNA structure prediction and novel protein design.
“We’ve been amazed by the rate at which AlphaFold has already become an essential tool for hundreds of thousands of scientists in labs and universities across the world,” said DeepMind founder Demis Hassabis in a press statement.
He added: “Our hope is that this expanded database will aid countless more scientists in their important work and open up completely new avenues of scientific discovery.”