AlphaFold

From Proteopedia


The AlphaFold Database in context by Sameer Velankar and Gerard Kleywegt, EMBL-EBI, October 26, 2021. Hosted by FEBS Junior Sections.

In 2020, the AlphaFold2[1][2] system of DeepMind[3][4] demonstrated a major breakthrough[5][6][7][8]. At CASP14, AlphaFold2 stood far ahead of the more than 100 competing groups. For about two-thirds of the targets in the competition, its predicted structures, including sidechain positions, were so close to the subsequently revealed X-ray crystallographic structures that they differed by little more than two independently determined X-ray structures of the same molecule differ from each other. AlphaFold2 has been hailed as largely solving the protein structure prediction problem for single-chain proteins[5][6][7][8]. "Never in my life had I expected to see a scientific advance so rapid," said Mohammed AlQuraishi of Columbia University[5]. For a more cautious view, see "The joys and perils of AlphaFold"[9].

In 2022, at CASP15, AlphaFold2 continued to outperform all other methods in the majority of cases (see a summary of results at Theoretical models).

In September, 2023, John Jumper and Demis Hassabis received the Lasker Award for revolutionizing protein structure prediction[10][11].

If you want an AlphaFold-predicted structure for a protein sequence, see Free AlphaFold-based Servers below.


AlphaFold Database of Predictions

By 2023, the free AlphaFold Database had been expanded to more than 200 million structures. Proteins in UniProt now link to their AlphaFold models in the Structure section. For an overview of which proteins are and are not in the AlphaFold Database, see "Which proteins are included?" in the FAQ at the main page of the AlphaFold Database.
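
Because database entries are keyed by UniProt accession, locating a model can be scripted. The sketch below only builds URLs and downloads nothing. The entry-page and file-name patterns, and the model version number, are assumptions based on how the database has served files in the past and should be checked against its current download documentation. The accession P69905 (human hemoglobin subunit alpha) is just an example.

```python
BASE_URL = "https://alphafold.ebi.ac.uk"  # AlphaFold Database host


def normalize_accession(accession):
    """Trim whitespace and upper-case a UniProt accession."""
    return accession.strip().upper()


def entry_page_url(accession):
    """URL of the human-readable entry page for one UniProt accession."""
    return f"{BASE_URL}/entry/{normalize_accession(accession)}"


def model_file_url(accession, version, fragment=1, fmt="pdb"):
    """URL of a coordinate file.

    ASSUMPTION: the 'AF-<accession>-F<fragment>-model_v<version>' naming
    pattern. The version must be supplied by the caller because it
    changes as the database is updated.
    """
    acc = normalize_accession(accession)
    return f"{BASE_URL}/files/AF-{acc}-F{fragment}-model_v{version}.{fmt}"


if __name__ == "__main__":
    print(entry_page_url("P69905"))
    print(model_file_url("P69905", version=4))
```

Keeping the version as a required argument avoids silently pointing at an outdated file name.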

In July, 2021, DeepMind made available over 300,000 structure predictions from amino acid sequences in their free AlphaFold DB[12][13][14][15]. These predictions include nearly all ~20,000 proteins in the human proteome, 36% with very high confidence, and another 22% with high confidence[15][16]. Also included are E. coli, fruit fly, mouse, zebrafish, malaria parasite and tuberculosis bacteria[15]. Limitations of these predictions were enumerated[14][9], including:

  • Inability to predict protein-protein or protein-DNA/RNA/ligand complexes. RoseTTAFold (see below) and AlphaFold both claim progress on predicting protein-protein complexes.
  • Does not predict ligands, cofactors, metals, ions, glycosylation, etc. (For efforts to add these, see AlphaFill below, and work on glycosylation.)
  • Does not deal with conformational dynamics.
  • Does not predict intrinsically unstructured segments.
  • Does not predict the folding pathway.
  • Has not been trained to predict structural consequences of point mutations.

Nevertheless, these predictions have many potential benefits[14], including:

  • Simplifying X-ray crystallography by enabling solution of the phase problem by molecular replacement using the predicted model.
  • Assisting crystallographers in defining domain boundaries in order to crystallize domains when crystallization of full length proteins is problematic.
  • Helping to interpret >5,000 cryo-EM maps previously deposited in the EMDB that could not be interpreted as atomic models, as well as helping to interpret lower resolution EM maps as atomic models.

Ligands: AlphaFill

The AlphaFold Database has been enhanced by "transplanting" ligands from empirical structures similar to predicted structures. Results are in the AlphaFill Database (preprint). The authors caution:

"AlphaFill models are not meant or suitable for precise quantification of interactions between the transferred ligand(s) and the protein (e.g. hydrogen bonds, π-π or cation-π interactions, van der Waals interactions, hydrophobic interactions, halogen bonds). These require coordinate precision that is not provided by either the AlphaFold or the AlphaFill models at the current stage, and the models should only be interpreted in a qualitative manner."

AlphaFold published July 2021

AlphaFold was published in July, 2021[17]. Methods were described in considerable detail. The source code, trained weights, and inference script were made available under an open-source license. Structure prediction required about one GPU (Graphics Processing Unit) minute per model of about 384 amino acids.

AlphaFold had remarkable success predicting a set of 10,795 protein chain structures (filtered for high reliability, lengths restricted to 80-1,400 residues) that were published in the PDB after the cutoff date of AlphaFold's training set[18]. Overall alpha carbon accuracy had a median of 1.46 Å RMSD at 95% coverage. The majority of chain structures were predicted with full-chain alpha carbon RMSD values <2 Å. About 25% were predicted with RMSD >4 Å.
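
To make "RMSD at 95% coverage" concrete, here is a minimal sketch of an alpha carbon RMSD that can discard the worst-fitting residues. It is a simplification, not the procedure used in the publication: it assumes the two coordinate lists are already superposed and matched residue by residue, and the function name and the way coverage is applied are illustrative choices.

```python
import math


def ca_rmsd(coords_a, coords_b, coverage=1.0):
    """RMSD (in the units of the input) over matched alpha carbon positions.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples, assumed
    to be already superposed. coverage: fraction of residues to keep,
    retaining those with the smallest deviations.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Squared distance for each matched residue pair, smallest first.
    squared = sorted(
        sum((p - q) ** 2 for p, q in zip(a, b))
        for a, b in zip(coords_a, coords_b)
    )
    keep = max(1, int(round(coverage * len(squared))))
    return math.sqrt(sum(squared[:keep]) / keep)


if __name__ == "__main__":
    # 20 residues: 19 deviate by 1 Å, one outlier deviates by 10 Å.
    ref = [(float(i), 0.0, 0.0) for i in range(20)]
    model = [(i + 1.0, 0.0, 0.0) for i in range(19)] + [(29.0, 0.0, 0.0)]
    print(ca_rmsd(ref, model))        # full-chain RMSD
    print(ca_rmsd(ref, model, 0.95))  # RMSD ignoring the worst 5%
```

The example shows why the two figures in the text can differ: a single badly placed segment inflates the full-chain value but barely affects the 95% coverage value.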

Importantly, each prediction comes with a confidence score that reliably predicts the accuracy of the predicted structure.
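
In AlphaFold model files the per-residue confidence score (pLDDT, on a 0-100 scale) is stored in the B-factor column of the PDB format, so it can be read with a few lines of code. The sketch below reads pLDDT from alpha carbon records and reports the fraction of residues in each confidence band. The band cutoffs (90, 70, 50) follow the convention commonly shown with AlphaFold Database models and are treated here as an assumption. The helper that fabricates demo records exists only so that the example runs without a downloaded file.

```python
from collections import Counter

# ASSUMPTION: cutoffs follow the commonly displayed confidence bands.
BANDS = [(90.0, "very high"), (70.0, "high"), (50.0, "low")]


def band(plddt):
    """Map a pLDDT value to a confidence band label."""
    for cutoff, label in BANDS:
        if plddt > cutoff:
            return label
    return "very low"


def residue_plddt(pdb_text):
    """Return {(chain, residue_number): pLDDT} from CA ATOM records."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            # Fixed PDB columns: chain 22, residue number 23-26, B-factor 61-66.
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def band_fractions(pdb_text):
    """Fraction of residues falling in each confidence band that occurs."""
    scores = residue_plddt(pdb_text)
    counts = Counter(band(value) for value in scores.values())
    return {label: n / len(scores) for label, n in counts.items()}


def demo_ca_line(resseq, plddt, chain="A"):
    """Build a syntactically valid CA record for demonstration only."""
    return ("ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
            .format(resseq, " CA ", "ALA", chain, resseq, 0.0, 0.0, 0.0, 1.0, plddt))


if __name__ == "__main__":
    demo = "\n".join(demo_ca_line(i, s) for i, s in enumerate([95.2, 75.0, 42.1, 91.0], 1))
    print(band_fractions(demo))
```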

Accurate prediction of sidechains required accurate prediction of the main chain. Accurate prediction required a multiple sequence alignment depth >~30 sequences, with a depth of ~100 sequences being adequate.
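
As a rough illustration of the depth thresholds above, the following sketch counts the sequences in a FASTA-style alignment and compares the count with the approximate figures of 30 and 100. The raw sequence count is only a proxy, since the published analysis may measure depth differently (for example, after redundancy filtering). The category labels and function names are illustrative assumptions.

```python
def alignment_depth(fasta_text):
    """Count sequences in a FASTA-style alignment (one '>' header each)."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def depth_category(depth, minimum=30, adequate=100):
    """Classify alignment depth against the approximate thresholds."""
    if depth < minimum:
        return "likely too shallow"
    if depth < adequate:
        return "marginal"
    return "adequate"


if __name__ == "__main__":
    # A toy alignment of 45 identical sequences, for demonstration only.
    toy = "".join(f">seq{i}\nMKTAYIAKQR\n" for i in range(45))
    depth = alignment_depth(toy)
    print(depth, depth_category(depth))
```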

Free AlphaFold-based Servers

If you want an AlphaFold-predicted structure for a protein sequence:

RoseTTAFold

Also in July, 2021, Minkyung Baek and a large team in the group of David Baker published RoseTTAFold, which employs a three-track network and draws in part on ideas from AlphaFold, whose methods DeepMind had not yet fully detailed. They reported "accuracies approaching those of DeepMind in CASP14"[19]. At the time of its release, it outperformed all other available structure prediction servers[19].

The RoseTTAFold Server was made freely available. (Open the Structure Prediction menu at the top and choose Submit. At the form, be sure to check RoseTTAFold before submitting your job.)

AlphaFold Colab

Google provides "Colaboratories" (Colabs). A Colab "allows anybody to write and execute arbitrary python code through the browser, and is especially well suited to machine learning, data analysis and education"[20].

DeepMind has provided an AlphaFold Colab that uses a "slightly simplified" version of AlphaFold version 2.0: "While accuracy will be near-identical to the full AlphaFold system on many targets, a small fraction have a large drop in accuracy due to the smaller MSA and lack of templates." The AlphaFold Colab is free to use. The code is executed in a virtual machine private to your account, and data are stored on Google Drive. Nothing is installed on your computer; "everything happens in the cloud on Google Colab"[21].

The user interface may look strange to those new to Colabs, but the instructions are clear and straightforward to follow. The mentions of "Runtime -> Run after" refer to the Runtime pull-down menu at the very top of the page. Getting a result may take several hours.

ColabFold: AlphaFold2 with MMSeqs2

ColabFold is a Colab by Sergey Ovchinnikov, Milot Mirdita and Martin Steinegger. In their accompanying publication they state:

"MMseqs2’s MSAs [multiple sequence alignments] produce more accurate predictions while being ~16 [times] faster compared to the AlphaFold2’s MSA stage. ColabFold also offers many advanced features, such as homo- and hetero-complex modeling and exposes AlphaFold2 internals."

Current offerings via Colab

Work is ongoing and other offerings are now available on Colab for RoseTTAFold and AlphaFold2 besides the ones detailed above. This summary guide & video should help in choosing how to analyze your proteins of interest:

- A Guide to the free RoseTTAFold and AlphaFold 2 Colab notebooks

- ColabFold: A video is available that gives an overview, compares some of the methods, shows how people are already extending them, explains how to submit jobs and interpret results, and includes a tutorial on using the AlphaFold2 Colab. It was recorded on August 4th, 2021, presented by Sergey Ovchinnikov and Martin Steinegger, and hosted by Chris Bahl for the Boston Protein Design and Modeling Club.

Advances since 2021

  • RoseTTAFoldNA[22] offers a leap forward in predicting structures of complexes of proteins and nucleic acids, but as of November 2023 it was not yet available as a free server.


References

  1. Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Zidek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature. 2020 Jan;577(7792):706-710. Epub 2020 Jan 15. PMID:31942072 doi:10.1038/s41586-019-1923-7
  2. AlphaFold at Wikipedia.
  3. AlphaFold: a solution to a 50-year-old grand challenge in biology, DeepMind Blog, November 30, 2020.
  4. DeepMind at Wikipedia.
  5. AlphaFold2 @ CASP14: “It feels like one’s child has left home.” by Mohammed AlQuraishi, December 8, 2020.
  6. Artificial intelligence solution to a 50-year-old science challenge could ‘revolutionise’ medical research, CASP Press Release, November 30, 2020.
  7. Callaway E. 'It will change everything': DeepMind's AI makes gigantic leap in solving protein structures. Nature. 2020 Dec;588(7837):203-204. PMID:33257889 doi:10.1038/d41586-020-03348-4
  8. DeepMind and CASP14 by John R. Helliwell, International Union of Crystallography Newsletter, December 4, 2020.
  9. Perrakis A, Sixma TK. AI revolutions in biology: The joys and perils of AlphaFold. EMBO Rep. 2021 Oct 20:e54046. PMID:34668287 doi:10.15252/embr.202154046
  10. Lasker Award for Revolutionizing Protein Structure Predictions, Laura Tran, The Scientist, September, 2023.
  11. Strzyz P. Lasker Award for AlphaFold. Nat Rev Mol Cell Biol. 2023 Nov;24(11):774. PMID:37752227 doi:10.1038/s41580-023-00671-2
  12. We’ve made AlphaFold predictions freely available to anyone in the scientific community at DeepMind.com (date of release not specified, approximately July 2021).
  13. AlphaFold’s protein structure predictions now available to explore at the European Bioinformatics Institute, July 23, 2021.
  14. Great expectations – the potential impacts of AlphaFold DB at the European Bioinformatics Institute, July 22, 2021.
  15. DeepMind and EMBL release the most complete database of predicted 3D structures of human proteins at the European Bioinformatics Institute, July 22, 2021.
  16. Tunyasuvunakool K, Adler J, Wu Z, Green T, Zielinski M, Zidek A, Bridgland A, Cowie A, Meyer C, Laydon A, Velankar S, Kleywegt GJ, Bateman A, Evans R, Pritzel A, Figurnov M, Ronneberger O, Bates R, Kohl SAA, Potapenko A, Ballard AJ, Romera-Paredes B, Nikolov S, Jain R, Clancy E, Reiman D, Petersen S, Senior AW, Kavukcuoglu K, Birney E, Kohli P, Jumper J, Hassabis D. Highly accurate protein structure prediction for the human proteome. Nature. 2021 Jul 22. PMID:34293799 doi:10.1038/s41586-021-03828-1
  17. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Zidek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Jul 15. PMID:34265844 doi:10.1038/s41586-021-03819-2
  18. The training set cutoff was 2018/04/30. The test set was obtained between then and 2021/02/15.
  19. Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, Wang J, Cong Q, Kinch LN, Schaeffer RD, Millan C, Park H, Adams C, Glassman CR, DeGiovanni A, Pereira JH, Rodrigues AV, van Dijk AA, Ebrecht AC, Opperman DJ, Sagmeister T, Buhlheller C, Pavkov-Keller T, Rathinaswamy MK, Dalwadi U, Yip CK, Burke JE, Garcia KC, Grishin NV, Adams PD, Read RJ, Baker D. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021 Jul 15. PMID:34282049 doi:10.1126/science.abj8754
  20. Colaboratory FAQ at Google.
  21. AlphaFold Colab.
  22. Baek M, McHugh R, Anishchenko I, Jiang H, Baker D, DiMaio F. Accurate prediction of protein-nucleic acid complexes using RoseTTAFoldNA. Nat Methods. 2023 Nov 23. PMID:37996753 doi:10.1038/s41592-023-02086-5

Further reading

  • AlphaFold protein structure predictions - a step change for biology.
(Report by Oana Stroe, Senior Communications Officer at EMBL-EBI. 28 July 2021 at FEBS Network)
Sameer Velankar and Gerard Kleywegt, from the Protein Data Bank in Europe, and Alex Bateman, Head of Protein Sequence Resources, all at EMBL’s European Bioinformatics Institute (EMBL-EBI), explore the research avenues opened up by the AlphaFold database and explain the method's limitations.


  • A structural biology community assessment of AlphaFold 2 applications.
Akdel et al., 2021
https://biorxiv.org/cgi/content/short/2021.09.26.461876
Several findings:
AlphaFold 2 can often predict the correct homo-oligomer structure when given the correct oligomeric state (number of copies in complex); however, it's not always able to predict the correct oligomeric state a priori.
"AF2 models can be used across diverse applications equally well compared to experimentally determined structures, when the confidence metrics are critically considered."


  • Protein complex prediction with AlphaFold-Multimer
Evans et al., 2021
https://www.biorxiv.org/content/10.1101/2021.10.04.463034
Highlights:
Fine-tuned AlphaFold 2 model for predicting protein interactions.
"The source code and weights for the trained models will be made available shortly."

Proteopedia Page Contributors and Editors

Eric Martz, Wayne Decatur, Joel L. Sussman, Angel Herraez
