New algorithms that harness the power of protein folding in 2022

Big pharmaceutical companies have been studying protein folding for a long time. Discoveries and innovations in the field could revolutionize drug development and other areas of biology. Work on this problem also supported the recent development of COVID-19 vaccines.

Predicting how a protein folds involves a combination of complex algorithms. Recent models from big tech companies like Meta and Google have made strides in solving the protein folding problem and piqued the interest of researchers after being open-sourced.

Here is a list of some of the leading protein folding prediction models, all highly accurate and competing on method and speed.

AlphaFold 2

Google’s DeepMind made a major breakthrough with AlphaFold, which uses a deep learning, neural-network-based approach to predict protein structures. In 2018, AlphaFold 1 was highly praised at CASP13 for its remarkable innovations, and with AlphaFold 2, DeepMind has increased the speed and accuracy even further.

AlphaFold 2 won CASP14 in 2020 and has since been considered the best protein folding model.

DeepMind has made the model open-source to invite more contributions and further innovation. In July 2022, DeepMind collaborated with the European Bioinformatics Institute (EMBL-EBI) to publish predicted structures for nearly all cataloged proteins, expanding the previous database more than 200-fold.

Discover the code for AlphaFold here.

ESMFold

Meta AI’s Evolutionary Scale Modeling (ESM) project has proven to be one of the strongest contenders and alternatives to AlphaFold 2. Just like AlphaFold, the model is open to the public.

ESMFold performs end-to-end, atomic-level protein structure prediction. It uses ESM-2, a transformer-based language model with up to 15 billion parameters. Because it is based on a language model and does not need a multiple sequence alignment, ESMFold differs from other protein folding prediction models in that it aims to combine high accuracy with much faster inference.

ESMFold produces an accurate protein structure even from a single input sequence because it exploits the internal representations of the language model. On the CASP14 test set, ESMFold received a score of 68, below AlphaFold 2's score of 84.

To see the code, click here.
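
To make "a single input sequence" concrete, here is a minimal sketch of how one amino acid sequence might be turned into integer tokens before a language model sees it. The vocabulary and the `tokenize` helper are hypothetical and chosen for illustration only; they are not the real ESM-2 tokenizer, which has its own vocabulary and special tokens.

```python
from typing import List

# Illustrative vocabulary only: NOT the real ESM-2 tokenizer.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
TOKEN_IDS = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence: str) -> List[int]:
    """Convert one amino acid sequence into a list of integer token ids."""
    cleaned = sequence.strip().upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    # Reject anything outside the 20 standard residues.
    unknown = sorted(set(cleaned) - set(AMINO_ACIDS))
    if unknown:
        raise ValueError(f"non-standard residues: {unknown}")
    return [TOKEN_IDS[aa] for aa in cleaned]


if __name__ == "__main__":
    # A short made-up peptide; no alignment or database search is involved.
    print(tokenize("MKTAYIAKQR"))
```

The point of the sketch is that the whole input is one string of residues. A model that depends on multiple sequence alignments would first need to search a sequence database for related proteins, a step that single-sequence models skip.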

RoseTTAFold

Minkyung Baek of the Baker Lab developed RoseTTAFold, a deep learning tool for predicting protein structures. It is based on a three-track neural network, can model proteins that have no experimentally determined structure, and is fast at prediction.

The three-track network simultaneously integrates and processes one-dimensional sequence information, two-dimensional distances between amino acids, and three-dimensional coordinates. This lets the network reason jointly about the relationship between a protein's amino acid sequence and its folded structure.

According to several reports, RoseTTAFold was able to predict hundreds of previously unknown protein structures. Researchers also expect that the software could help solve difficult modeling problems in X-ray crystallography and cryo-electron microscopy.

Click here for the GitHub repository.

OmegaFold

In July, the Chinese biotechnology company Helixon released OmegaFold and joined the race to predict protein folding, beating competitors in several areas. After it surpassed RoseTTAFold and rivaled AlphaFold 2 in high-resolution protein structure prediction, the developers released the code on GitHub.

Unlike AlphaFold and RoseTTAFold, which rely on multiple sequence alignments, OmegaFold makes predictions from single sequences using a geometry-inspired transformer model trained on protein structures. This allows it to work on divergent sequences.

OmegaFold is built on a protein language model, OmegaPLM, which can capture structural information encoded in amino acid sequences. Because it needs only a single amino acid sequence, the model can predict protein structures about ten times faster than RoseTTAFold.

Click here for the reference.

D-I-TASSER

Zhang Lab at the University of Michigan developed D-I-TASSER (Deep learning-based Iterative Threading ASSEmbly Refinement), which is used for high-accuracy prediction of protein folding and structure. It is built by integrating threading with deep learning. D-I-TASSER succeeds the lab’s older model, I-TASSER, and offers superior speed and accuracy.

Starting from a query sequence, inter-residue contact and distance maps are generated by two deep neural network predictors: DeepPotential and AttentionPotential.
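
To show what contact and distance maps are as data structures, here is a minimal sketch that builds both from known 3D coordinates. This is not how the deep learning predictors work: they estimate these maps from sequence alone, with no coordinates available. The function names, the toy coordinates, and the 8 Å contact cutoff (a common convention) are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_map(coords: Sequence[Coord]) -> List[List[float]]:
    """Pairwise Euclidean distances (in angstroms) between residues."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(distances: List[List[float]],
                threshold: float = 8.0) -> List[List[bool]]:
    """Mark residue pairs closer than `threshold` as contacts.

    The 8 angstrom default is an assumed, commonly used cutoff.
    A residue is never counted as being in contact with itself.
    """
    n = len(distances)
    return [[i != j and distances[i][j] < threshold for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy coordinates: four residues spaced 3.8 angstroms apart on a line.
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    dmap = distance_map(toy)
    cmap = contact_map(dmap)
    for row in cmap:
        print("".join("#" if c else "." for c in row))
```

Both maps are symmetric N x N matrices for a protein of N residues. A predicted map of this kind can then serve as a set of spatial restraints that guide structure assembly.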

An optional companion server, D-I-TASSER-AF2, incorporates AlphaFold2 restraints and achieves higher overall accuracy than either model alone.

Click here to visit the lab’s website.

IntFOLD

This server provides a unified resource for automatically predicting protein tertiary structures, with built-in estimates of model accuracy (EMA). It is a fully automated and powerful tool for predicting protein structures from their amino acid sequences.

The server has been tested in CASP and performed very well in blind tests. Results are presented as graphical output, which gives non-expert users a visual summary of a complex data set.

Click here to read the IntFOLD research paper.

RaptorX

RaptorX offers protein secondary structure prediction and template-based tertiary structure modeling. With its template-based approach, it can process a 200-amino-acid sequence in approximately 35 minutes.

What sets RaptorX apart from other protein folding prediction models is a new non-linear scoring function and a probabilistic consistency algorithm for aligning the target sequence with multiple distantly related template proteins.

Learn more about RaptorX here.

Sharon D. Cole