AlphaFold2 examples from CASP 14
From Proteopedia
===X-ray crystal structure===
The crystallographic structure, not available to the prediction teams during the CASP competition, is [[6vr4]], with [[resolution]] 3.5 Å, and an Rfree "reliability" of "much better than average for this resolution" (according to [[FirstGlance in Jmol]]). The termini of the chain are far apart from each other (~85 Å), and there are no disulfide bonds. The [[asymmetric unit]] contains 2 chains. The reference structure was taken from '''chain B''' because it has a lower average [[temperature factor]] than chain A.
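Choosing the chain with the lower average temperature factor can be done directly from the coordinate file. The page does not say which atoms were averaged, so this is a minimal sketch assuming a PDB-format file and a plain mean over every ATOM record of each chain (function names are illustrative). Large entries may be distributed only as mmCIF, which this sketch does not parse.

```python
from collections import defaultdict


def mean_b_factor_by_chain(pdb_lines):
    """Mean temperature factor (B-factor) of all ATOM records, per chain.

    Assumes fixed-column PDB format: chain ID in column 22 and the
    temperature factor in columns 61-66 (1-based). HETATM records
    (waters, ligands) are ignored; alternate locations, if present,
    are all counted.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue
        chain_id = line[21]            # column 22
        b_factor = float(line[60:66])  # columns 61-66
        totals[chain_id] += b_factor
        counts[chain_id] += 1
    return {chain_id: totals[chain_id] / counts[chain_id] for chain_id in totals}


def pick_reference_chain(pdb_lines):
    """Return the chain ID with the lowest mean B-factor."""
    means = mean_b_factor_by_chain(pdb_lines)
    return min(means, key=means.get)
```

Passing the lines of an opened coordinate file (for example, `pick_reference_chain(open(path))` with a hypothetical local `path`) returns the chain to use as the reference.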
The CASP 14 target T1037 sequence of 404 residues begins at sequence number 337 and ends at 901, a span of 565 residues. The target is only 404 residues long because it excludes residues 370-530 (length 161), which form a different domain. <scene name='87/875686/6vr4_b_2180_residues/4'>Here is the full 2,180-residue chain with residues 337-901 (565 residues) opaque</scene>. Here is <span style="color:#d000d0;"><b>the 404-residue target sequence</b></span> with the <span class="text-gray"><b>intervening domain (excluded from the CASP target)</b></span> <scene name='87/875686/6vr4_b_2180_residues/5'>highlighted within the full 2,180-residue chain</scene>.
<scene name='87/875686/T1037_length_404/1'>The X-ray structure of CASP 14 domain T1037</scene> (length 404 residues) consists of residues 337-369 + 531-901 of [[6vr4]] (taken from chain B). It is an <scene name='87/875686/T1037_length_404/2'>alpha/beta domain with secondary structure</scene> <span style="color:#ff0080;font-weight:bold;">45% helices</span>, <span style="color:#ffc800;background-color:black;font-weight:bold;"> 19% beta strands </span>, and 37% loops and turns. The N- and C-termini are 10 Å apart, and there are no cysteines (thus no disulfide bonds).
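The residue counts quoted above follow from inclusive-range arithmetic: a range from residue a to residue b contains b - a + 1 residues. A quick check, assuming both endpoints of every range are counted (the convention the page's numbers imply):

```python
def span_length(first, last):
    """Number of residues in the inclusive range first..last."""
    return last - first + 1

full_span = span_length(337, 901)     # residues 337-901 of the chain
excluded = span_length(370, 530)      # intervening domain left out of T1037
segments = [(337, 369), (531, 901)]   # the two pieces that make up T1037
target = sum(span_length(a, b) for a, b in segments)

print(full_span, excluded, target)    # 565 161 404
assert target == full_span - excluded
```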
===AlphaFold2 prediction for T1037===
Revision as of 21:47, 7 March 2021
This page is under construction. Eric Martz 01:03, 22 February 2021 (UTC)
Predicting protein structures from amino acid sequences by theoretical modeling has been extremely challenging. In 2020, a breakthrough was achieved by AlphaFold2[1], a project of DeepMind. For an overview of this breakthrough, documented by the biennial prediction competition CASP, please see 2020: CASP 14. Some examples of predictions from that competition are illustrated below.
ORF8 Sidechain Accuracy
AlphaFold2's predictions for sidechain positions seem fairly good, while sidechain positions in the 2nd best prediction seem poor. This conclusion is based on three types of observations:
- Table I gives RMSD values for all atoms, which is one indication of sidechain accuracy.
- Prediction of salt bridges and cation-pi interactions.
- Visualization of the distributions of charges on the surfaces.
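All-atom RMSD includes the sidechain atoms, which is why it says more about sidechain placement than a Cα-only RMSD. The page does not describe how Table I was computed; this is a minimal sketch assuming the two structures are already superposed and their atoms already paired one-to-one (the function name is illustrative).

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Atom i of coords_a is compared with atom i of coords_b. No
    superposition is performed here; the inputs must already be fitted.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))
```

Shifting every atom of a structure by 3 Å along one axis gives an RMSD of exactly 3 Å, while identical coordinates give 0.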
Salt Bridges and Cation-Pi Interactions
- AlphaFold2's prediction was correct for 4/5 interactions, with one incorrect interaction:
  - Salt bridges: correct for one of two, with no incorrect salt bridges predicted.
  - Cation-pi interactions: correct for three of three, with one incorrect interaction predicted.
- The 2nd best prediction was correct for 1/5 interactions, with two incorrect interactions:
  - Salt bridges: correct for one of two, with two incorrect salt bridges predicted.
  - Cation-pi interactions: none of the three was predicted; the model contains no cation-pi interactions at all.
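These tallies amount to comparing two sets of residue pairs: a predicted interaction is correct if the same pair occurs in the X-ray structure, and incorrect otherwise. A minimal sketch using the cation-pi pairs from the second table below, written in prediction numbering, and assuming that R86:Y31+Y96 counts as two interactions (which is what makes the X-ray total three). The function and variable names are illustrative.

```python
def score_interactions(observed, predicted):
    """Return (correct, incorrect, total_observed) for unordered residue pairs."""
    observed_set = {frozenset(pair) for pair in observed}
    predicted_set = {frozenset(pair) for pair in predicted}
    correct = len(observed_set & predicted_set)     # present in both
    incorrect = len(predicted_set - observed_set)   # predicted but not observed
    return correct, incorrect, len(observed_set)


# Cation-pi pairs in prediction numbering (X-ray numbers minus 15).
xray_cation_pi = [("R86", "Y31"), ("R86", "Y96"), ("K29", "F93")]
alphafold2_cation_pi = [("R86", "Y31"), ("R86", "Y96"), ("K29", "F93"), ("K79", "F105")]

correct, incorrect, total = score_interactions(xray_cation_pi, alphafold2_cation_pi)
print(f"{correct}/{total} correct, {incorrect} incorrect")  # 3/3 correct, 1 incorrect
```

Using frozensets makes the comparison independent of which residue is written first in each pair.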
Salt bridges:

| 7JX6 (X-ray) | 7JTL (X-ray) | AlphaFold2 | 2nd Best |
|---|---|---|---|
| R101:D112 (AB) | R101:D113 (AB) | R86:D98 | R86:D98 |
| R115:D119 (AB) | R115:D119 (AB) | – | R100:E4 |
| K44:E59 (AB) | K44:E59 (AB) | K29:E44 | – |
| – | – | – | K78:E77 |
- Bridges in the same row are identical (except for red residues). Subtract 15 from the sequence numbers in the X-ray structures for the equivalent sequence numbers in the predictions.
- Black: Shortest sidechain nitrogen to sidechain oxygen distance ≤4.0 Å.
- Gray: Shortest sidechain nitrogen to sidechain oxygen distance 4.4 to 4.8 Å.
- –: Shortest sidechain nitrogen to sidechain oxygen distance 6 to 16 Å.
- (AB): Present in both chains (A and B) of the X-ray model.
- Italics: erroneous prediction.
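The legend categories depend on a single measurement, the shortest sidechain nitrogen to sidechain oxygen distance. A minimal sketch, assuming coordinates are already available as Python dictionaries and that the relevant atoms are the standard charged-group atoms (Arg NE/NH1/NH2, Lys NZ, Asp OD1/OD2, Glu OE1/OE2); the page does not list which atoms were used. The legend does not cover distances between its bands (4.0-4.4 Å, 4.8-6 Å, or above 16 Å), so the sketch reports those as unclassified. Names are illustrative.

```python
import math

# Standard PDB atom names of the charged sidechain groups.
CATIONIC_N = {"ARG": {"NE", "NH1", "NH2"}, "LYS": {"NZ"}}
ANIONIC_O = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}


def shortest_n_o_distance(cation, anion):
    """Minimum distance (Å) between any cationic sidechain N and any anionic sidechain O.

    Each argument is a (residue_name, {atom_name: (x, y, z)}) pair.
    Raises ValueError if either residue lacks all of the listed atoms.
    """
    cat_name, cat_coords = cation
    an_name, an_coords = anion
    n_xyz = [cat_coords[a] for a in CATIONIC_N[cat_name] if a in cat_coords]
    o_xyz = [an_coords[a] for a in ANIONIC_O[an_name] if a in an_coords]
    return min(math.dist(n, o) for n in n_xyz for o in o_xyz)


def classify(distance):
    """Map a distance onto the legend categories of the salt-bridge table."""
    if distance <= 4.0:
        return "black"         # salt bridge
    if 4.4 <= distance <= 4.8:
        return "gray"          # near miss
    if 6.0 <= distance <= 16.0:
        return "-"             # no interaction
    return "unclassified"      # outside every band in the legend
```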
Cation-pi interactions:

| 7JX6 (X-ray) | 7JTL (X-ray) | AlphaFold2 | 2nd Best |
|---|---|---|---|
| R101:Y46+Y111 (AB) | R101:Y46+Y111 (AB) | R86:Y31+Y96 | – |
| K44:F108 (B) | K44:F108 (AB) | K29:F93 | – |
| – | – | K79:F105 | – |
- All interactions listed are deemed energetically significant by the CaPTURE Server.
- Interactions in the same row are identical. Subtract 15 from the sequence numbers in the X-ray structures for the equivalent sequence numbers in the predictions.
- Italics: erroneous prediction.
- The 2nd best prediction has no cation-pi interactions.
- (AB): Present in both chains (A and B) of the X-ray model; (B): present in chain B only.
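The renumbering rule in both legends (subtract 15 from X-ray sequence numbers) can be applied mechanically to the table labels. A minimal sketch, assuming every residue in a label is written as a one-letter amino acid code followed by its sequence number; the function name is illustrative.

```python
import re

OFFSET = 15  # X-ray number minus 15 = prediction number, per the legend


def xray_to_prediction(label, offset=OFFSET):
    """Convert labels such as 'R101:D113' from X-ray to prediction numbering."""
    return re.sub(
        r"([A-Z])(\d+)",
        lambda m: f"{m.group(1)}{int(m.group(2)) - offset}",
        label,
    )
```

For example, `xray_to_prediction("R101:D113")` gives `"R86:D98"`, matching the AlphaFold2 column of the salt-bridge table. Chain annotations such as "(B)" contain no digits and pass through unchanged.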
Visualization of Surface Charge Distributions
GDT_TS Calculations
GDT_TS values for the predictions are taken from the CASP 14 results. GDT_TS values for 7JTL and 5A2F vs. 7JX6 chain A were calculated with the AS2TS server of Adam Zemla[21]. See the instructions for Calculating GDT_TS. CASP 14 reported a GDT_TS of 86.96 for the AlphaFold2 prediction, while the AS2TS server gave 86.41 against 7JX6 chain A and 88.59 against 7JTL chain A.
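GDT_TS is the average of the percentages of target residues whose Cα atoms lie within 1, 2, 4 and 8 Å of the corresponding model residues, which is why the score shifts with the reference structure chosen. The sketch below is a simplification: it takes per-residue Cα distances from one fixed superposition, whereas LGA (the method behind AS2TS[21]) searches for a separate best superposition for each cutoff, so server values can be higher than this estimate. Treating residues missing from the model as `None` (outside every cutoff) is an assumption; the function name is illustrative.

```python
CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # Å


def gdt_ts(ca_distances):
    """Simplified GDT_TS from per-residue Cα distances (Å) under one superposition.

    ca_distances has one entry per target residue; None marks a residue
    absent from the model, which counts as outside every cutoff.
    """
    if not ca_distances:
        raise ValueError("need at least one residue")
    n = len(ca_distances)
    percents = []
    for cutoff in CUTOFFS:
        within = sum(1 for d in ca_distances if d is not None and d <= cutoff)
        percents.append(100.0 * within / n)
    return sum(percents) / len(CUTOFFS)
```

With distances `[0.5, 1.5, 3.0, 5.0, None]`, the four percentages are 20, 40, 60 and 80, giving GDT_TS 50.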
References
- [1] Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Zidek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature. 2020 Jan;577(7792):706-710. doi: 10.1038/s41586-019-1923-7. Epub 2020 Jan 15. PMID:31942072
- [2] Flower TG, Buffalo CZ, Hooy RM, Allaire M, Ren X, Hurley JH. Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein. Proc Natl Acad Sci U S A. 2021 Jan 12;118(2). pii: 2021785118. doi: 10.1073/pnas.2021785118. PMID:33361333
- [3] For SARS-CoV-2 ORF8, at the CASP 14 Table Browser, check T1064-D1 and press Show Results.
- [4] CASP14: what Google DeepMind’s AlphaFold 2 really achieved, and what it means for protein folding, biology and bioinformatics, a blog post by Carlos Outeiral Rubiera, December 3, 2020.
- [5] Drobysheva AV, Panafidina SA, Kolesnik MV, Klimuk EI, Minakhin L, Yakunina MV, Borukhov S, Nilsson E, Holmfeldt K, Yutin N, Makarova KS, Koonin EV, Severinov KV, Leiman PG, Sokolova ML. Structure and function of virion RNA polymerase of a crAss-like phage. Nature. 2020 Nov 18. pii: 10.1038/s41586-020-2921-5. doi: 10.1038/s41586-020-2921-5. PMID:33208949
- [6] Summary and Classifications of Domains for CASP 14.
- [7] Superposition by Swiss-PdbViewer's iterative magic fit. This starts with a sequence alignment-guided structural superposition, and then superposes subsets of the structures to minimize the RMSD. Eight intermediate structures were generated by the Theis Morph Server by linear interpolation.
- [8] Cuff AL, Sillitoe I, Lewis T, Clegg AB, Rentzsch R, Furnham N, Pellegrini-Calace M, Jones D, Thornton J, Orengo CA. Extending CATH: increasing coverage of the protein structure universe and linking structure with function. Nucleic Acids Res. 2011 Jan;39(Database issue):D420-6. doi: 10.1093/nar/gkq1001. Epub 2010 Nov 19. PMID:21097779
- [9] Holm L. DALI and the persistence of protein shape. Protein Sci. 2020 Jan;29(1):128-140. doi: 10.1002/pro.3749. Epub 2019 Nov 5. PMID:31606894
- [10] Using Swiss-PdbViewer's Fit from Selection with 102 residues selected from each structure, followed by Improve Fit.
- [11] Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013 Apr;30(4):772-80. doi: 10.1093/molbev/mst010. Epub 2013 Jan 16. PMID:23329690
- [12] Structural superposition by Dali. Interpolation by the Yale Morph2 Server. Homogenization method: homology modeling. No minimization. This produced a 9-model file where model 1 was 7jx6, and models 2-9 were interpolations. 5a2f residues 28-133 were added as model 10 (black in the molecular scene).
- [13] The interpretation of Dali's result to mean that ORF8 does not have a novel fold was kindly confirmed by Liisa Holm, personal communication to Eric Martz, February 2021.
- [14] Download AlphaFold2's predicted structure for ORF8 from T1064TS427_1-D1.pdb.
- [15] See the section GDT_TS Calculations.
- [16] See the section ORF8 is not a novel fold.
- [17] Superposition by Swiss-PdbViewer's magic fit. This is a sequence alignment-guided structural superposition. Eight intermediate structures were generated by the Theis Morph Server by linear interpolation.
- [18] Superposition by Swiss-PdbViewer's Explore Fragment Alternate Fits, which does not use sequence information. Eight intermediate structures were generated by the Theis Morph Server by linear interpolation.
- [19] For all targets in CASP 14, the top two servers were QUARK and Zhang-server (which were not significantly different at a Z-score sum of 62.9), followed by Zhang-CEthreader (55.9) and BAKER-ROSETTASERVER (55.3).
- [20] Johansen JE, Nielsen P, Sjoholm C. Description of Cellulophaga baltica gen. nov., sp. nov. and Cellulophaga fucicola gen. nov., sp. nov. and reclassification of [Cytophaga] lytica to Cellulophaga lytica gen. nov., comb. nov. Int J Syst Bacteriol. 1999 Jul;49 Pt 3:1231-40. doi: 10.1099/00207713-49-3-1231. PMID:10425785
- [21] Zemla A. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003 Jul 1;31(13):3370-4. doi: 10.1093/nar/gkg571. PMID:12824330

