AlphaFold

From Proteopedia

In 2020, the AlphaFold2[1][2] system of DeepMind[3][4] demonstrated a major breakthrough[5][6][7][8]. At CASP14, among over 100 competing groups, AlphaFold2 was by far the most successful at predicting structures, including sidechain positions. Its predictions were so close to the subsequently revealed X-ray crystallographic structures that they differed by little more than two independently determined X-ray structures of the same molecule differ from each other. It achieved this for about two-thirds of the targets in the competition. AlphaFold2 has been hailed as largely solving the protein structure prediction problem for single-chain proteins[5][6][7][8]. "Never in my life had I expected to see a scientific advance so rapid," said Mohammed AlQuraishi of Columbia University[5].

AlphaFold Database of Predictions

In July 2021, DeepMind made available over 300,000 structure predictions from amino acid sequences in its free AlphaFold DB[9][10][11][12]. These predictions include nearly all of the ~20,000 proteins in the human proteome, 36% with very high confidence and another 22% with high confidence[12][13]. Also included are predictions for E. coli, fruit fly, mouse, zebrafish, the malaria parasite, and the tuberculosis bacterium[12]. Limitations of these predictions were enumerated[11], including:

  • Does not predict protein-protein or protein-DNA/RNA/ligand complexes. RoseTTAFold (described below) claims to have made progress on this.
  • Does not predict ligands, cofactors, metals, ions, glycosylation, etc.
  • Does not deal with conformational dynamics.
  • Does not predict intrinsically unstructured segments.
  • Does not predict the folding pathway.
  • Has not been trained to predict structural consequences of point mutations.

Nevertheless, these predictions have many potential benefits[11], including:

  • Simplifying X-ray crystallography by enabling solution of the phase problem by molecular replacement using the predicted model.
  • Assisting crystallographers in defining domain boundaries in order to crystallize domains when crystallization of full length proteins is problematic.
  • Helping to interpret >5,000 cryo-EM maps previously deposited in the EMDB that could not be interpreted as atomic models, as well as helping to interpret lower resolution EM maps as atomic models.

AlphaFold published July 2021

AlphaFold was published in July 2021[14]. Methods were described in considerable detail. The source code, trained weights, and inference script were made available under an open-source license. Structure prediction required about one GPU (Graphics Processing Unit) minute per model for a protein of about 384 amino acids.

AlphaFold had remarkable success predicting a set of 10,795 protein chain structures (filtered for high reliability, lengths restricted to 80-1,400 residues) released in the PDB after the cutoff date of AlphaFold's training set[15]. Overall alpha carbon accuracy had a median of 1.46 Å RMSD at 95% coverage. The majority of chain structures were predicted with full-chain alpha carbon RMSD values <2 Å. About 25% were predicted with RMSD >4 Å.
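
To make the RMSD figures above concrete, here is a minimal Python sketch of an alpha carbon RMSD restricted to the best-fitting 95% of residues. It assumes the predicted and experimental coordinates are already superposed and paired residue by residue. The function name ca_rmsd_at_coverage is hypothetical, and the single-pass trimming is a simplification, not necessarily the procedure used in the paper[14].

```python
import math


def ca_rmsd_at_coverage(pred, ref, coverage=0.95):
    """RMSD over the best-fitting fraction of paired alpha carbon positions.

    pred, ref: equal-length lists of (x, y, z) tuples in angstroms,
    assumed (for this sketch) to be already superposed.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Squared distance between each pair of alpha carbons, smallest first.
    squared = sorted(
        sum((a - b) ** 2 for a, b in zip(p, r)) for p, r in zip(pred, ref)
    )
    # Keep the closest pairs; always keep at least one.
    keep = max(1, round(coverage * len(squared)))
    return math.sqrt(sum(squared[:keep]) / keep)
```

With coverage=1.0 the function returns the ordinary full-chain RMSD, so a single badly placed loop or terminus can dominate the value. Trimming the worst 5% of residues reduces that effect.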

Importantly, each prediction comes with a confidence score that reliably predicts the accuracy of the predicted structure.
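
As an illustration of how the confidence score can be used, the sketch below reads per-residue confidence values (pLDDT, on a 0-100 scale) from a predicted model in PDB format and reports the fraction of residues in each confidence band. It assumes the model stores pLDDT in the B-factor column, as AlphaFold DB files do, and it uses the band cutoffs of 90, 70, and 50 shown on the AlphaFold DB site. The function names are hypothetical, and how values exactly on a cutoff are assigned is a choice made here.

```python
from collections import Counter


def plddt_per_residue(pdb_text):
    """Return one pLDDT value per residue, read from CA atom records."""
    scores = []
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name in 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def confidence_band(plddt):
    """Map a pLDDT value to a confidence band (cutoffs assumed: 90, 70, 50)."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "high"
    if plddt > 50:
        return "low"
    return "very low"


def band_fractions(scores):
    """Fraction of residues falling in each confidence band."""
    if not scores:
        return {}
    counts = Counter(confidence_band(s) for s in scores)
    return {band: n / len(scores) for band, n in counts.items()}
```

Given the text of a downloaded model in a variable such as pdb_text, band_fractions(plddt_per_residue(pdb_text)) summarizes how much of the chain is predicted with very high or high confidence.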

Accurate prediction of sidechains required accurate prediction of the main chain. Accurate prediction required a multiple sequence alignment (MSA) deeper than about 30 sequences; a depth of about 100 sequences was adequate.
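
The depth figures above suggest a quick sanity check before trusting a prediction. The sketch below counts the sequences in a FASTA-style alignment and compares the count against the ~30 and ~100 figures quoted above. The function names and labels are hypothetical. A raw sequence count is only a rough stand-in for alignment depth, since it does not discount near-identical sequences.

```python
def count_alignment_sequences(fasta_text):
    """Count sequences in a FASTA-style alignment by counting '>' headers."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def describe_msa_depth(n_sequences, minimum=30, adequate=100):
    """Label an alignment depth using the approximate thresholds in the text."""
    if n_sequences >= adequate:
        return "adequate"
    if n_sequences > minimum:
        return "usable"
    return "likely too shallow"
```

For example, describe_msa_depth(count_alignment_sequences(alignment_text)) flags a query with only a handful of homologs as likely too shallow.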

Free AlphaFold-based Servers

RoseTTAFold

Also in July 2021, Minkyung Baek and a large team in the group of David Baker published RoseTTAFold, which employs a three-track network and is based in part on methods inspired by AlphaFold that DeepMind had not yet fully detailed. They reported "accuracies approaching those of DeepMind in CASP14"[16]. At the time of its release, it had outperformed all other available structure prediction servers[16].

The RoseTTAFold Server was made freely available. (Open the Structure Prediction menu at the top and choose Submit. On the form, be sure to check RoseTTAFold before submitting your job.)

AlphaFold Colab

Google provides "Colaboratories" (Colabs). A Colab "allows anybody to write and execute arbitrary python code through the browser, and is especially well suited to machine learning, data analysis and education"[17].

DeepMind has provided an AlphaFold Colab that uses a "slightly simplified" version of AlphaFold version 2.0: "While accuracy will be near-identical to the full AlphaFold system on many targets, a small fraction have a large drop in accuracy due to the smaller MSA and lack of templates." The AlphaFold Colab is free to use. The code is executed in a virtual machine private to your account, and data are stored on Google Drive. Nothing is installed on your computer; "everything happens in the cloud on Google Colab"[18].

The user interface may look strange to those new to Colabs, but the instructions are clear and straightforward to follow. The mentions of "Runtime -> Run after" refer to the Runtime pull-down menu at the very top of the page. Getting a result may take several hours.

Current offerings via Colab

Work is ongoing, and other Colab offerings for RoseTTAFold and AlphaFold 2 are now available besides the ones detailed above. This summary guide and video should help in choosing how to analyze your proteins of interest:

- A Guide to the free RoseTTAFold and AlphaFold 2 Colab notebooks

- A video is available that covers an overview, a comparison of some of the methods and how people are already extending them, how to submit jobs and interpret results, and a tutorial on how to use the AlphaFold2 Colab. The video was recorded on August 4th, 2021, presented by Sergey Ovchinnikov and Martin Steinegger, and hosted by Chris Bahl for the Boston Protein Design and Modeling Club.

References

  1. Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Zidek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature. 2020 Jan;577(7792):706-710. Epub 2020 Jan 15. doi: 10.1038/s41586-019-1923-7. PMID: 31942072.
  2. AlphaFold at Wikipedia.
  3. AlphaFold: a solution to a 50-year-old grand challenge in biology, DeepMind Blog, November 30, 2020.
  4. DeepMind at Wikipedia.
  5. AlphaFold2 @ CASP14: “It feels like one’s child has left home.” by Mohammed AlQuraishi, December 8, 2020.
  6. Artificial intelligence solution to a 50-year-old science challenge could ‘revolutionise’ medical research, CASP Press Release, November 30, 2020.
  7. Callaway E. 'It will change everything': DeepMind's AI makes gigantic leap in solving protein structures. Nature. 2020 Dec;588(7837):203-204. doi: 10.1038/d41586-020-03348-4. PMID: 33257889.
  8. DeepMind and CASP14 by John R. Helliwell, International Union of Crystallography Newsletter, December 4, 2020.
  9. We’ve made AlphaFold predictions freely available to anyone in the scientific community at DeepMind.com (date of release not specified, approximately July 2021).
  10. AlphaFold’s protein structure predictions now available to explore at the European Bioinformatics Institute, July 23, 2021.
  11. Great expectations – the potential impacts of AlphaFold DB at the European Bioinformatics Institute, July 22, 2021.
  12. DeepMind and EMBL release the most complete database of predicted 3D structures of human proteins at the European Bioinformatics Institute, July 22, 2021.
  13. Tunyasuvunakool K, Adler J, Wu Z, Green T, Zielinski M, Zidek A, Bridgland A, Cowie A, Meyer C, Laydon A, Velankar S, Kleywegt GJ, Bateman A, Evans R, Pritzel A, Figurnov M, Ronneberger O, Bates R, Kohl SAA, Potapenko A, Ballard AJ, Romera-Paredes B, Nikolov S, Jain R, Clancy E, Reiman D, Petersen S, Senior AW, Kavukcuoglu K, Birney E, Kohli P, Jumper J, Hassabis D. Highly accurate protein structure prediction for the human proteome. Nature. 2021 Jul 22. pii: 10.1038/s41586-021-03828-1. doi: 10.1038/s41586-021-03828-1. PMID: 34293799.
  14. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Zidek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Jul 15. pii: 10.1038/s41586-021-03819-2. doi: 10.1038/s41586-021-03819-2. PMID: 34265844.
  15. The training set cutoff was 2018/04/30. The test set was obtained between then and 2021/02/15.
  16. Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, Wang J, Cong Q, Kinch LN, Schaeffer RD, Millan C, Park H, Adams C, Glassman CR, DeGiovanni A, Pereira JH, Rodrigues AV, van Dijk AA, Ebrecht AC, Opperman DJ, Sagmeister T, Buhlheller C, Pavkov-Keller T, Rathinaswamy MK, Dalwadi U, Yip CK, Burke JE, Garcia KC, Grishin NV, Adams PD, Read RJ, Baker D. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021 Jul 15. pii: science.abj8754. doi: 10.1126/science.abj8754. PMID: 34282049.
  17. Colaboratory FAQ at Google.
  18. AlphaFold Colab.

Further reading

  • AlphaFold protein structure predictions - a step change for biology.
(Report by Oana Stroe, Senior Communications Officer at EMBL-EBI. 28 July 2021 at FEBS Network)
Sameer Velankar and Gerard Kleywegt, from the Protein Data Bank in Europe, and Alex Bateman, Head of Protein Sequence Resources, all at EMBL’s European Bioinformatics Institute (EMBL-EBI), explore the research avenues opened up by the AlphaFold database and explain the method's limitations.

Proteopedia Page Contributors and Editors

Eric Martz, Wayne Decatur, Joel L. Sussman, Angel Herraez
