AlphaFold2 examples from CASP 14
From Proteopedia
(Difference between revisions)
| Line 15: | Line 15: | ||
<scene name='87/875686/Chain_a_of_7jx6/1'>Here is one chain of ORF8</scene> from the higher resolution X-ray structure, [[7jx6]]. These chains form [http://firstglance.jmol.org/fg.htm?mol=7jx6 disulfide-linked dimers], and the dimers form higher order multimers<ref name="multimers">PMID: 33361333</ref> (not shown). Notice that the <span class="text-blue"><b>amino</b></span> and <span class="text-red"><b>carboxy</b></span> '''ends of the chain come together''' to form two parallel beta strands of a beta sheet. Also notice that there are '''3 disulfide bonds'''. An accurate prediction would include both of these features. | <scene name='87/875686/Chain_a_of_7jx6/1'>Here is one chain of ORF8</scene> from the higher resolution X-ray structure, [[7jx6]]. These chains form [http://firstglance.jmol.org/fg.htm?mol=7jx6 disulfide-linked dimers], and the dimers form higher order multimers<ref name="multimers">PMID: 33361333</ref> (not shown). Notice that the <span class="text-blue"><b>amino</b></span> and <span class="text-red"><b>carboxy</b></span> '''ends of the chain come together''' to form two parallel beta strands of a beta sheet. Also notice that there are '''3 disulfide bonds'''. An accurate prediction would include both of these features. | ||
| - | <scene name='87/875686/Morf_lin_7jx6_imf_7jtl/3'>The two X-ray structures agree very well</scene><ref name="imf"> | + | <scene name='87/875686/Morf_lin_7jx6_imf_7jtl/3'>The two X-ray structures agree very well</scene><ref name="imf">Superposition by Swiss-PdbViewer's ''iterative magic fit''. This starts with a sequence alignment-guided structural superposition, and then superposes subsets of the structures to minimize the RMSD. Eight intermediate structures were generated by the [[Morphs#Linear_Morph_Server|Theis Morph Server]] by linear interpolation.</ref>. The only substantial disagreement is for a large surface loop, sequence range 48-57. See the Table I below for [https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions RMSD] values. |
===ORF8 is not a novel fold=== | ===ORF8 is not a novel fold=== | ||
| - | Less than 2% of new [[empirically-determined structures]] have novel folds; that is, folds not already represented in the [[PDB]]<ref name="cath2011">PMID: 21097779</ref>. When chain A of [[7jx6]] was submitted to Dali<ref name="dali2020">PMID: 31606894</ref> (February, 2021), the top hit was the N-terminal domain of the two domains in [[5a2f]], the CD166 human cell surface receptor involved in activation of T lymphocytes. The Z-score was 7.1, and 88 alpha carbons | + | Less than 2% of new [[empirically-determined structures]] have novel folds; that is, folds not already represented in the [[PDB]]<ref name="cath2011">PMID: 21097779</ref>. When chain A of [[7jx6]] was submitted to Dali<ref name="dali2020">PMID: 31606894</ref> (February, 2021), the top hit was the N-terminal domain of the two domains in [[5a2f]], the CD166 human cell surface receptor involved in activation of T lymphocytes. The Z-score was 7.1, and 88 alpha carbons superposed with RMSD 3.2 Å. Swiss-PdbViewer obtained RMSD 1.95 Å for 48 alpha carbons<ref name="fitselimprov">Using Swiss-PdbViewer's ''Fit from Selection'' with 102 residues selected from each structure, followed by ''Improve Fit''.</ref>. Dali reported the identity as 6% in its structure-based sequence alignment. Sequence alignment by MAFFT<ref name="mafft">PMID: 23329690</ref> obtained 18% sequence identity using more and larger gaps. <scene name='87/875686/Dali_5a2f_vs_7jx6_yale/2'>The structural similarity between Dali's top hit and 7jx6</scene><ref name="yale">Structural superposition by Dali. Interpolation by the [http://www2.molmovdb.org/wiki/info/index.php/Morph2_Server Yale Morph2 Server]. Homogenization method: homology modeling. No minimization. This produced a 9-model file where model 1 was 7jx6, and models 2-9 were interpolations. 5a2f residues 28-133 were added as model 10 (black in the molecular scene).</ref> is not as close as for AlphaFold2's prediction, but is closer than the 2nd best prediction (see Table I below). 
Dali's top hit has a single disulfide bond (compare with Table I). In conclusion, '''ORF8 does not have a novel fold'''<ref name="holm">The interpretation of Dali's result to mean that ORF8 does not have a novel fold was kindly confirmed by Liisa Holm, personal communication to [[User:Eric Martz|Eric Martz]].</ref>. |
===AlphaFold2 Prediction for ORF8=== | ===AlphaFold2 Prediction for ORF8=== | ||
| Line 25: | Line 25: | ||
{| style="text-align:center;" class="wikitable" | {| style="text-align:center;" class="wikitable" | ||
| - | |+ Table I. ORF8 Predictions | + | |+ Table I. ORF8 Predictions Superposed With Chain A of [[7jx6]] |
|- | |- | ||
| - | ! Model || GDT_TS || Disulfide<br>Bonds || Cα [https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions RMSD], Å || Cα | + | ! Model || GDT_TS || Disulfide<br>Bonds || Cα [https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions RMSD], Å || Cα Superposed || [https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions RMSD] Including<br>Sidechains, Å || Atoms Superposed |
|- | |- | ||
| [[7jtl]]:A || || 3 || 4.02<br>'''0.66''' || 102/102 (100%)<br>'''87/102 (85%)''' || 4.3<br>'''1.58''' || 829/829 (100%)<br>'''709/829 (86%)''' | | [[7jtl]]:A || || 3 || 4.02<br>'''0.66''' || 102/102 (100%)<br>'''87/102 (85%)''' || 4.3<br>'''1.58''' || 829/829 (100%)<br>'''709/829 (86%)''' | ||
| Line 41: | Line 41: | ||
| Rosetta<br>Server || 26 || (2‡) || 14.99<br>† || 92/92 (100%)<br>† || 16.07<br>† || 747/748 (100%)<br>† | | Rosetta<br>Server || 26 || (2‡) || 14.99<br>† || 92/92 (100%)<br>† || 16.07<br>† || 747/748 (100%)<br>† | ||
|} | |} | ||
| - | : | + | :Superpositions by "Magic Fit"<ref name="mf">Superposition by Swiss-PdbViewer's ''magic fit''. This is a sequence alignment-guided structural superposition. Eight intermediate structures were generated by the [[Morphs#Linear_Morph_Server|Theis Morph Server]] by linear interpolation.</ref> of Swiss-PdbViewer 4.1.<br> |
| - | :''' | + | :'''Superpositions by "Iterative Magic Fit"<ref name="imf" /> of Swiss-PdbViewer 4.1.'''<br> |
:*Second best: Group of Xian Ming Pan, Tsinghua University, Beijing.<br> | :*Second best: Group of Xian Ming Pan, Tsinghua University, Beijing.<br> | ||
:§Third best: Group of Alberto Perez, University of Florida, Gainesville.<br> | :§Third best: Group of Alberto Perez, University of Florida, Gainesville.<br> | ||
| - | :† Iterative Magic Fit was unable to | + | :† Iterative Magic Fit was unable to superpose.<br> |
:‡ Neither disulfide bond is correct. | :‡ Neither disulfide bond is correct. | ||
| Line 53: | Line 53: | ||
===Third Best Prediction for ORF8=== | ===Third Best Prediction for ORF8=== | ||
| - | The third best prediction for ORF8 was by the Perez Lab, with GDT_TS 33 (see Table I above). It '''correctly predicted the parallel beta strands formed by the amino and carboxy terminal ends of the chain'''. <scene name='87/875686/3rd_best_orf8/1'>When the 2-stranded parallel beta strands formed by the ends of the chains are | + | The third best prediction for ORF8 was by the Perez Lab, with GDT_TS 33 (see Table I above). It '''correctly predicted the parallel beta strands formed by the amino and carboxy terminal ends of the chain'''. <scene name='87/875686/3rd_best_orf8/1'>When the 2-stranded parallel beta strands formed by the ends of the chains are superposed, the remainder superposes poorly</scene>. This prediction has '''no disulfide bonds'''. The '''salt bridge''' Arg86:Asp98 is correctly predicted, along with two incorrectly predicted salt bridges. |
===Top Prediction by an Automated Server=== | ===Top Prediction by an Automated Server=== | ||
| - | Among predictions by automated servers for all ~100 CASP 14 targets, the top ranking server was QUARK from the Yang Zhang group (Univ. Michigan). For ORF8, the Zhang-TBM server made the best server prediction with a '''GDT_TS of 27'''. (The prediction by QUARK was almost as good, GDT_TS 26.) The prediction has the '''two chain termini not parallel, and the amino terminus is not a beta strand''', differing in both respects from the X-ray model. Also, '''no disulfide bonds''' are predicted. The '''salt bridge''' Arg86:Asp98 is correctly predicted, along with several incorrectly predicted salt bridges. The structural | + | Among predictions by automated servers for all ~100 CASP 14 targets, the top ranking server was QUARK from the Yang Zhang group (Univ. Michigan). For ORF8, the Zhang-TBM server made the best server prediction with a '''GDT_TS of 27'''. (The prediction by QUARK was almost as good, GDT_TS 26.) The prediction has the '''two chain termini not parallel, and the amino terminus is not a beta strand''', differing in both respects from the X-ray model. Also, '''no disulfide bonds''' are predicted. The '''salt bridge''' Arg86:Asp98 is correctly predicted, along with several incorrectly predicted salt bridges. The structural superposition is very poor and is not shown. |
===Baker Rosetta Server Prediction for ORF8=== | ===Baker Rosetta Server Prediction for ORF8=== | ||
| - | Among predictions for all ~100 CASP 14 targets, the group of David Baker [https://predictioncenter.org/casp14/zscores_final.cgi ranked second]. The Rosetta Server of the Baker group ranked 18th overall, but was the 4th ranked server<ref name="serverranks">For all targets in CASP 14, the top two servers were QUARK and Zhang-server (which were not significantly different at a Z-score sum of 62.9), followed by Zhang-CEthreader (55.9) and BAKER-ROSETTASERVER (55.3).</ref>. [https://predictioncenter.org/casp14/results.cgi?view=tables&target=T1064-D1&model=1&groups_id= For ORF8, the Rosetta Server prediction GDT_TS was 26], a bit better than the median of 23. The Rosetta Server's prediction for ORF8 has '''the two termini far apart''' (Cα 13 Å or farther apart), a substantial difference from the X-ray structure (Cα mostly ~5 Å apart). It predicts '''two disulfide bonds, but neither matches''' the pairs of Cys residues in the actual disulfide bonds. The '''salt bridge''' Arg86:Asp98 is correctly predicted, along with one incorrectly predicted salt bridge. The structural | + | Among predictions for all ~100 CASP 14 targets, the group of David Baker [https://predictioncenter.org/casp14/zscores_final.cgi ranked second]. The Rosetta Server of the Baker group ranked 18th overall, but was the 4th ranked server<ref name="serverranks">For all targets in CASP 14, the top two servers were QUARK and Zhang-server (which were not significantly different at a Z-score sum of 62.9), followed by Zhang-CEthreader (55.9) and BAKER-ROSETTASERVER (55.3).</ref>. [https://predictioncenter.org/casp14/results.cgi?view=tables&target=T1064-D1&model=1&groups_id= For ORF8, the Rosetta Server prediction GDT_TS was 26], a bit better than the median of 23. The Rosetta Server's prediction for ORF8 has '''the two termini far apart''' (Cα 13 Å or farther apart), a substantial difference from the X-ray structure (Cα mostly ~5 Å apart). 
It predicts '''two disulfide bonds, but neither matches''' the pairs of Cys residues in the actual disulfide bonds. The '''salt bridge''' Arg86:Asp98 is correctly predicted, along with one incorrectly predicted salt bridge. The structural superposition is very poor and is not shown. |
</StructureSection> | </StructureSection> | ||
Revision as of 20:17, 1 March 2021
This page is under construction. Eric Martz 01:03, 22 February 2021 (UTC)
Predicting protein structures from amino acid sequences (theoretical modeling) has been extremely challenging. In 2020, breakthrough success was achieved by AlphaFold2[1], a project of DeepMind. For an overview of this breakthrough, documented by the biennial prediction competition CASP, please see 2020: CASP 14. Some examples of predictions from that competition are illustrated below.
ORF8 Sidechain Accuracy
Table I gives RMSD values for all atoms, which is one indication of sidechain accuracy. Another is prediction of salt bridges and cation-pi interactions. As detailed in Tables II and III:
- AlphaFold2's prediction was correct for 4/5 interactions, with one incorrect interaction.
- AlphaFold2's prediction was correct for one of two salt bridges, and predicted no incorrect salt bridges.
- AlphaFold2's prediction was correct for three of three cation-pi interactions, but predicted one incorrect interaction.
- The 2nd best prediction was correct for 1/5 interactions, with 2 incorrect interactions.
- The 2nd best prediction was correct for one of two salt bridges, but predicted two incorrect salt bridges.
- The 2nd best prediction failed to predict any of the three cation-pi interactions, predicting zero interactions.
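
The RMSD values cited from Table I can be illustrated with a minimal sketch. It assumes the two structures have already been superposed and that their atoms are paired one-to-one; it does not reproduce Swiss-PdbViewer's fitting. The function name `rmsd` and the toy coordinates are hypothetical, chosen only for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in Å) between two paired lists of
    (x, y, z) coordinates. Assumes the structures are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between each pair of equivalent atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy coordinates (not taken from any real structure)
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)]
print(round(rmsd(model, target), 3))
```

Passing only alpha carbons gives a Cα RMSD; passing all paired atoms gives an RMSD that includes sidechains.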
Table II. Salt bridges.

| 7JX6 | 7JTL | AlphaFold2 | 2nd Best |
|---|---|---|---|
| R101:D112 (AB) | R101:D113 (AB) | R86:D98 | R86:D98 |
| R115:D119 (AB) | R115:D119 (AB) | – | R100:E4 |
| K44:E59 (AB) | K44:E59 (AB) | K29:E44 | – |
| – | – | – | K78:E77 |
- Bridges in the same row are identical (except for red residues). Subtract 15 from the sequence numbers in the X-ray structures for the equivalent sequence numbers in the predictions.
- Black: Shortest sidechain nitrogen to sidechain oxygen distance ≤4.0 Å.
- Gray: Shortest sidechain nitrogen to sidechain oxygen distance 4.4 to 4.8 Å.
- –: Shortest sidechain nitrogen to sidechain oxygen distance 6 to 16 Å.
- (AB): The two chains in each X-ray model.
- Italics: erroneous prediction.
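
The distance criterion in the legend above can be sketched as follows. This is an illustration only: it assumes the sidechain nitrogen and oxygen coordinates have already been extracted from the structure, and the function names and toy coordinates are hypothetical. The 4.0 Å cutoff is the "Black" criterion from the legend.

```python
import math

SALT_BRIDGE_CUTOFF = 4.0  # Å, the "Black" criterion in the legend above

def shortest_n_o_distance(nitrogens, oxygens):
    """Shortest distance (Å) between any sidechain nitrogen of the cationic
    residue and any sidechain oxygen of the anionic residue."""
    return min(math.dist(n, o) for n in nitrogens for o in oxygens)

def is_salt_bridge(nitrogens, oxygens, cutoff=SALT_BRIDGE_CUTOFF):
    """True when the shortest N-to-O distance is within the cutoff."""
    return shortest_n_o_distance(nitrogens, oxygens) <= cutoff

# Toy coordinates (not taken from any real structure)
arg_nitrogens = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
asp_oxygens = [(0.0, 3.0, 0.0), (5.0, 0.0, 0.0)]
print(shortest_n_o_distance(arg_nitrogens, asp_oxygens))
print(is_salt_bridge(arg_nitrogens, asp_oxygens))
```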
Table III. Cation-pi interactions.

| 7JX6 | 7JTL | AlphaFold2 | 2nd Best |
|---|---|---|---|
| R101:Y46+Y108 (AB) | R101:Y46+Y108 (AB) | R86:Y31+Y96 | – |
| K44:F108 (B) | K44:F108 (AB) | K29:F93 | – |
| – | – | K79:F105 | – |
- All interactions listed are deemed energetically significant by the CaPTURE Server.
- Interactions in the same row are identical. Subtract 15 from the sequence numbers in the X-ray structures for the equivalent sequence numbers in the predictions.
- Italics: erroneous prediction.
- The 2nd best prediction has no cation-pi interactions.
- (AB): The two chains in each X-ray model.
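
The renumbering rule in the table notes (subtract 15 from the X-ray sequence numbers) can be applied to an interaction label with a small helper. This is a sketch; the function name `to_prediction_numbering` is hypothetical, and the labels are examples taken from the tables above.

```python
import re

OFFSET = 15  # X-ray numbering minus prediction numbering, per the table notes

def to_prediction_numbering(label, offset=OFFSET):
    """Shift every residue number in a label such as 'K44:E59' from X-ray
    numbering to the numbering used in the predictions."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) - offset), label)

print(to_prediction_numbering("K44:E59"))   # salt bridge row
print(to_prediction_numbering("K44:F108"))  # cation-pi row
```

Comparing the shifted X-ray labels with the prediction columns makes it straightforward to see which predicted interactions match an X-ray interaction.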
References
1. Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Zidek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature. 2020 Jan;577(7792):706-710. doi: 10.1038/s41586-019-1923-7. Epub 2020 Jan 15. PMID: 31942072
2. CASP14: what Google DeepMind’s AlphaFold 2 really achieved, and what it means for protein folding, biology and bioinformatics, a blog post by Carlos Outeiral Rubiera, December 3, 2020.
3. Flower TG, Buffalo CZ, Hooy RM, Allaire M, Ren X, Hurley JH. Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein. Proc Natl Acad Sci U S A. 2021 Jan 12;118(2). pii: 2021785118. doi: 10.1073/pnas.2021785118. PMID: 33361333
4. Summary and Classifications of Domains for CASP 14.
5. Flower TG, Buffalo CZ, Hooy RM, Allaire M, Ren X, Hurley JH. Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein. Proc Natl Acad Sci U S A. 2021 Jan 12;118(2). pii: 2021785118. doi: 10.1073/pnas.2021785118. PMID: 33361333
6. Superposition by Swiss-PdbViewer's iterative magic fit. This starts with a sequence alignment-guided structural superposition, and then superposes subsets of the structures to minimize the RMSD. Eight intermediate structures were generated by the Theis Morph Server by linear interpolation.
7. Cuff AL, Sillitoe I, Lewis T, Clegg AB, Rentzsch R, Furnham N, Pellegrini-Calace M, Jones D, Thornton J, Orengo CA. Extending CATH: increasing coverage of the protein structure universe and linking structure with function. Nucleic Acids Res. 2011 Jan;39(Database issue):D420-6. doi: 10.1093/nar/gkq1001. Epub 2010 Nov 19. PMID: 21097779
8. Holm L. DALI and the persistence of protein shape. Protein Sci. 2020 Jan;29(1):128-140. doi: 10.1002/pro.3749. Epub 2019 Nov 5. PMID: 31606894
9. Using Swiss-PdbViewer's Fit from Selection with 102 residues selected from each structure, followed by Improve Fit.
10. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013 Apr;30(4):772-80. doi: 10.1093/molbev/mst010. Epub 2013 Jan 16. PMID: 23329690
11. Structural superposition by Dali. Interpolation by the Yale Morph2 Server. Homogenization method: homology modeling. No minimization. This produced a 9-model file where model 1 was 7jx6, and models 2-9 were interpolations. 5a2f residues 28-133 were added as model 10 (black in the molecular scene).
12. The interpretation of Dali's result to mean that ORF8 does not have a novel fold was kindly confirmed by Liisa Holm, personal communication to Eric Martz.
13. Download AlphaFold2's predicted structure for ORF8 from T1064TS427_1-D1.pdb.
14. Superposition by Swiss-PdbViewer's magic fit. This is a sequence alignment-guided structural superposition. Eight intermediate structures were generated by the Theis Morph Server by linear interpolation.
15. For all targets in CASP 14, the top two servers were QUARK and Zhang-server (which were not significantly different at a Z-score sum of 62.9), followed by Zhang-CEthreader (55.9) and BAKER-ROSETTASERVER (55.3).
