An artificial intelligence (AI) program has been developed that could speed vaccine development in the face of emerging variants. The new machine-learning model, DeepVacPred, can complete in seconds or minutes vaccine design cycles that previously took months or years, according to its designers, researchers at the University of Southern California Viterbi School of Engineering. In their study, they used the model to design a multi-epitope vaccine against SARS-CoV-2, the virus that causes COVID-19.
“This AI framework, applied to the specifics of this virus, can provide vaccine candidates within seconds and move them to clinical trials quickly to achieve preventive medical therapies without compromising safety,” said Paul Bogdan, associate professor of electrical and computer engineering at USC Viterbi and corresponding author of the study. “Moreover, this can be adapted to help us stay ahead of the coronavirus as it mutates around the world.”
The group’s findings appear today in Nature Research’s Scientific Reports.
Several new variants of SARS-CoV-2 emerged last fall, raising concerns about the efficacy of the vaccines currently being rolled out. Those variants include B.1.1.7 in the UK, B.1.351 in South Africa, and P.1 in Brazil. These variants are spreading across the world now, and they each contain multiple mutations. P.1, for example, has at least 17 unique mutations. The rise and spread of these variants make it crucial for vaccine developers to stay ahead of them.
Combining in silico immunoinformatics and deep neural network strategies, the DeepVacPred computational framework directly predicted 26 potential vaccine subunits from the available SARS-CoV-2 spike protein sequence. The researchers then used further in silico methods to investigate the linear B-cell epitopes, cytotoxic T lymphocyte (CTL) epitopes, and helper T lymphocyte (HTL) epitopes in the 26 subunit candidates, and identified the best 11 of them to construct a multi-epitope vaccine against the virus.
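The article does not describe DeepVacPred's internals, so the following is only a minimal sketch of the general idea of enumerating candidate subunits from a protein sequence and keeping the highest-scoring ones. The window length, the demo sequence, and especially the scoring function `toy_score` are hypothetical placeholders introduced for illustration; the real framework uses a trained deep neural network, not this heuristic.

```python
from typing import Callable, List, Tuple


def sliding_windows(sequence: str, length: int) -> List[Tuple[int, str]]:
    """Return (start_index, peptide) for every window of the given length."""
    return [(i, sequence[i:i + length]) for i in range(len(sequence) - length + 1)]


def top_candidates(
    sequence: str,
    length: int,
    score: Callable[[str], float],
    k: int,
) -> List[Tuple[int, str]]:
    """Rank all windows by the supplied scoring function and keep the best k."""
    windows = sliding_windows(sequence, length)
    ranked = sorted(windows, key=lambda w: score(w[1]), reverse=True)
    return ranked[:k]


def toy_score(peptide: str) -> float:
    """Placeholder scorer (assumption): fraction of charged residues.

    This stands in for a trained model and has no immunological validity.
    """
    return sum(peptide.count(residue) for residue in "DEKR") / len(peptide)


if __name__ == "__main__":
    # Made-up sequence for demonstration only; not the SARS-CoV-2 spike protein.
    demo_sequence = "MKTAYIAKQRDEGLLNWSK"
    for start, peptide in top_candidates(demo_sequence, length=6, score=toy_score, k=3):
        print(start, peptide, round(toy_score(peptide), 2))
```

Passing the scorer in as a function keeps the enumeration step separate from the prediction step, so a real model could be swapped in without changing the rest of the sketch.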
They also used bioinformatics to evaluate the anticipated human population coverage, antigenicity, allergenicity, toxicity, physicochemical properties and secondary structure of the candidate vaccine; the results indicated that the design was promising. The 3D structure of the vaccine was predicted, refined and validated with in silico tools. Finally, they optimized the codon sequence and inserted it into a plasmid to ensure efficient cloning and expression.
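To illustrate what the codon step involves, here is a simplified sketch of back-translating a protein sequence into DNA with one fixed codon per amino acid. The table `PREFERRED_CODON` is an illustrative assumption: each codon encodes the stated residue under the standard genetic code, but the choice among synonymous codons is arbitrary here and is not taken from the study. Real codon optimization tools also weigh host codon usage, GC content and other constraints that this sketch ignores.

```python
# Hypothetical one-codon-per-residue table (assumption, for illustration only).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein: str) -> str:
    """Map each amino acid to its table codon and join the codons into DNA."""
    try:
        return "".join(PREFERRED_CODON[residue] for residue in protein.upper())
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None


def gc_content(dna: str) -> float:
    """Fraction of G and C bases, a quantity often checked after optimization."""
    return (dna.count("G") + dna.count("C")) / len(dna)


if __name__ == "__main__":
    # Short made-up peptide; not a sequence from the study.
    dna = back_translate("MKWTAY")
    print(dna, round(gc_content(dna), 2))
```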
DeepVacPred accelerated the vaccine design process and constructed a 694-amino-acid multi-epitope vaccine containing 16 B-cell epitopes, 82 CTL epitopes and 89 HTL epitopes. The researchers describe the candidate as promising and suitable for further evaluation in clinical studies. They also reported that they could trace the RNA mutations of SARS-CoV-2 to ensure that any designed vaccine can tackle specific RNA mutations.
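The article does not say how the mutation tracing is done. One simple way to think about it, sketched below under that caveat, is to check which vaccine subunits overlap known mutation sites on the protein. The function name, the subunit names, their coordinates and the mutation positions are all hypothetical values chosen for the example, not data from the study.

```python
from typing import Dict, List, Tuple


def subunits_hit_by_mutations(
    subunits: Dict[str, Tuple[int, int]],
    mutation_positions: List[int],
) -> Dict[str, List[int]]:
    """Return, per subunit, the mutation positions that fall inside it.

    Subunit coordinates are 1-based and inclusive: (start, end).
    Subunits with no overlapping mutation are omitted from the result.
    """
    hits: Dict[str, List[int]] = {}
    for name, (start, end) in subunits.items():
        inside = [pos for pos in mutation_positions if start <= pos <= end]
        if inside:
            hits[name] = inside
    return hits


if __name__ == "__main__":
    # Hypothetical subunit coordinates and mutation sites (assumptions).
    demo_subunits = {"subunit_a": (10, 39), "subunit_b": (100, 129), "subunit_c": (300, 329)}
    demo_mutations = [25, 120, 125, 500]
    print(subunits_hit_by_mutations(demo_subunits, demo_mutations))
```

A subunit flagged this way would be a candidate for redesign against the mutated sequence, while unflagged subunits are unaffected by the listed sites.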
The engineers say they can construct a new multi-epitope vaccine for a new virus in less than a minute and validate its quality within an hour. By contrast, traditional lab-based methods can take more than one year.
Bogdan said that if SARS-CoV-2 becomes uncontrollable by current vaccines, or if new vaccines are needed to deal with other emerging viruses, USC’s AI-assisted method can be used to design new vaccines quickly.
In their current study, the USC scientists used only one B-cell epitope and one T-cell epitope. But they note that by applying a bigger dataset and more possible combinations, they can develop a more comprehensive and quicker vaccine design tool. They estimate their method can perform accurate predictions with over 700,000 different proteins in the dataset.
“The proposed vaccine design framework can tackle the three most frequently observed mutations and be extended to deal with other potentially unknown mutations,” Bogdan said.
The raw data for the research comes from the Immune Epitope Database (IEDB), in which scientists around the world have been compiling data about the coronavirus, among other diseases. IEDB contains over 600,000 known epitopes from some 3,600 different species, along with the Virus Pathogen Resource, a complementary repository of information about pathogenic viruses. The genome and spike protein sequence of SARS-CoV-2 come from the National Center for Biotechnology Information.