Theoretical models

From Proteopedia


This article needs to be improved, expanded, and more references need to be cited.

The term theoretical model refers to a molecular model obtained, wholly or in part, by the use of theory, such as homology modeling, energy minimization, molecular mechanics or molecular dynamics. Such theoretical models are distinguished from empirical models, which are usually obtained by X-ray crystallography, nuclear magnetic resonance (NMR), or cryo-electron microscopy.

The distinction between theoretical and empirical models is important because when theoretical models are compared with empirical models, the theoretical models often contain significant errors. In contrast, when the structure of a particular macromolecule is determined using empirical methods by different laboratories, or both by crystallography and NMR, the agreement is usually quite good.

1,390 theoretical models were historically deposited in the Protein Data Bank but removed from the main database in 2002. The structure displayed in the pages automatically generated in Proteopedia for these theoretical models should be interpreted with caution (see Category:Theoretical Model).


Empirical Models

Empirical models are not theoretical models, but are mentioned here for the sake of completeness. Empirical models, usually determined by X-ray crystallography, nuclear magnetic resonance or cryo-electron microscopy, are the most reliable and accurate models available. Methods for judging the reliability and quality of empirical models are discussed at Quality assessment for molecular models. Independent determinations of the same protein by empirical methods generally agree within about 0.5 Å RMS for alpha carbons (reference needed).
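
As a rough illustration of what an "RMS for alpha carbons" figure measures, the following sketch computes the root mean square deviation between two sets of alpha-carbon coordinates. It assumes the two models have already been superposed and that residues are paired by list position; the function name `ca_rmsd` and the coordinates are made up for this example and do not come from any deposited structure.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMS deviation (in the input units) between paired alpha-carbon positions.

    Assumes both models are already superposed and that the i-th entry of
    each list refers to the same residue.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Made-up coordinates in Å, for illustration only.
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_2 = [(0.3, 0.0, 0.0), (3.8, 0.4, 0.0), (7.6, 0.0, 0.5)]
print(round(ca_rmsd(model_1, model_2), 2))  # prints 0.41
```

In practice the superposition step matters as much as the formula; this sketch leaves it out on purpose to keep the definition visible.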

Homology Models

Method & Limitations

Homology models, also called comparative models, are obtained by folding a query protein sequence (also called the target sequence) to fit an empirically-determined template model. The registration between residues in the query and template is determined by an amino acid sequence alignment between the query and template sequences.

Imagine that the template’s polypeptide backbone is a folded glass tube. Now imagine that the query sequence is a thin metal chain that can be pulled through the tube. The chain (query) will adopt the same fold as the tube (template). The sequence alignment specifies how far the chain should be pulled into the tube; that is, how the residues in the query sequence match up with the structure of the template.
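
The way an alignment fixes the registration between query and template can be sketched in a few lines. The example below is a simplified illustration, not the procedure used by any particular modeling server: it takes the two rows of a pairwise alignment (with `-` marking gaps) and reports which template residue each query residue would be threaded onto. The function name `map_query_to_template` and the toy sequences are hypothetical.

```python
def map_query_to_template(aligned_query, aligned_template):
    """Return (query_index, template_index) pairs, one per query residue.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    Indices are 0-based positions in the ungapped sequences. A query residue
    aligned to a gap (an insertion relative to the template) is paired with
    None, because the template offers no coordinates for it.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned rows must have the same length")
    pairs = []
    q_pos = 0
    t_pos = 0
    for q_char, t_char in zip(aligned_query, aligned_template):
        if q_char != "-" and t_char != "-":
            pairs.append((q_pos, t_pos))
        elif q_char != "-":
            pairs.append((q_pos, None))
        # Advance each counter only when that row holds a residue.
        if q_char != "-":
            q_pos += 1
        if t_char != "-":
            t_pos += 1
    return pairs

# Hypothetical toy alignment, for illustration only.
query_row = "MKT-AYIAG"
template_row = "MRTLAY-AG"
for q_idx, t_idx in map_query_to_template(query_row, template_row):
    print(q_idx, t_idx)
```

In this toy case, template residue 3 has no counterpart in the query (a deletion), and query residue 5 is paired with `None` (an insertion), so neither position can be placed by the template alone.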

Errors or uncertainties in the sequence alignment result in errors or uncertainties in the homology model. Portions of the query sequence cannot be modeled reliably where there are insertions or deletions in either sequence, or where portions of the template lack coordinates due to crystallographic disorder. Provided there is sufficient sequence identity between the query and template, the main chain in homology models is usually mostly correct. However, the positions of sidechains in homology models are usually incorrect.
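
Sequence identity is usually read off the alignment itself. The sketch below shows one common way to compute it, under the assumption that identity is counted over columns where both sequences have a residue. Other conventions use a different denominator (for example, the length of the shorter sequence), so figures from different tools are not always comparable. The function name `percent_identity` and the sequences are hypothetical.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of identical residues over columns without a gap in either row."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned rows must have the same length")
    aligned_columns = 0
    identical = 0
    for q_char, t_char in zip(aligned_query, aligned_template):
        if q_char == "-" or t_char == "-":
            continue  # gap columns are excluded under this convention
        aligned_columns += 1
        if q_char.upper() == t_char.upper():
            identical += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical / aligned_columns

# Hypothetical toy alignment: 7 gap-free columns, 6 of them identical.
print(round(percent_identity("MKT-AYIAG", "MRTLAY-AG"), 1))  # prints 85.7
```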

Nevertheless, homology models are useful for seeing low-resolution features, such as which residues are on the surface or buried, which are close to other features of interest (such as a putative active site), and the overall distribution of charges and evolutionary conservation.

Attempts to improve homology models by molecular dynamics simulations have not been successful: "in most cases, simulations initiated from homology models drift away from the native structure"[1].

For further information, please see Practical Guide to Homology Modeling.

Paucity of Templates

Empirically-determined templates with adequate sequence identity are available for less than half of all protein sequences. One of the major goals of structural genomics is to increase the sequence diversity of the available empirically-determined structures that can be used as templates for homology modeling.

A number of free servers have libraries of homology models generated in advance for protein sequences, and many will create homology models for a submitted protein sequence. For more, please see

When no suitable template exists, the Structural Genomics Target Database should be searched with your sequence. In some cases, a sequence-similar protein has already been crystallized and diffracted, but the model may not have been completed, or the completed model may not yet have been deposited in the PDB. In such cases, it may be worthwhile to contact the team that has made the most progress on a closely related sequence.


Ab Initio Models

When there is no template with sufficient sequence identity to use for homology modeling, one can use ab initio or de novo folding theory to predict the structure of a target protein sequence. Such theory is about 70% successful at predicting secondary structure[2].
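
The source does not say which measure lies behind the roughly 70% figure. One commonly used per-residue measure is the three-state accuracy, often called Q3: the fraction of residues whose predicted state (helix H, strand E, or coil C) matches the observed state. The sketch below assumes that measure; the function name `q3_accuracy` and the state strings are made up for illustration.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty state strings of equal length")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(predicted)

# Made-up state strings: 7 of the 10 positions agree.
predicted_states = "CHHHHCEEEC"
observed_states = "CCHHHHCEEC"
print(q3_accuracy(predicted_states, observed_states))  # prints 0.7
```

A per-residue score like this says nothing about whether the helices and strands pack into the right fold, which is why secondary structure accuracy and fold accuracy are reported separately.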


The success of fold prediction methods is assessed every two years in the Critical Assessment of techniques for protein Structure Prediction (CASP) competitions[3]. Crystallographers submit the sequences of proteins whose structures they have solved but not yet published. Modelers predict the folds, which are then compared with the subsequently published structures. Beginning in CASP5 (2002), the ability to predict intrinsic disorder was included[4]. Assessment of CASP results is done in a double-blind manner: the predictors do not know the empirical structures, and the assessors do not know the identities of the predictors, which are coded.

There are also competitions to predict protein-protein docking interactions[5].


In 2005, for about one out of four cases of small domains of fewer than 85 amino acids, the best predictions were within about 1.5 Å (RMS for alpha carbons) of the true structure[6]. (Independent determinations of the same protein by empirical methods generally agree within about 0.5 Å RMS for alpha carbons.)


In CASP8 (2008), there were 13 "template free" targets, that is, sequences with no significant sequence identity to any empirically solved entry in the PDB. These are the most difficult to predict, as they must be predicted by ab initio methods. 102 groups submitted predictions. Assessing the quality of a prediction is not simple, given that even "good" predictions can have high root mean square (RMS) deviations for alpha carbon alignment, e.g. due to a hinge[7]. Several assessment methods were used, each emphasizing different qualities. A number of groups submitted good predictions for six of the thirteen targets[7]. None of the submitted models was judged to be satisfactory for four of the thirteen targets[7].
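
The effect of a hinge on RMS deviation can be shown with a toy calculation. The sketch below is an illustration under stated assumptions, not one of the assessment methods used in CASP8: it builds a made-up ten-residue reference, then a "prediction" in which the first five residues are exact and the last five are shifted rigidly by 10 Å (a crude stand-in for a hinge motion), with both models assumed to be superposed on the first half. All names and numbers are hypothetical.

```python
import math

def per_residue_distances(predicted, reference):
    """Distance between each pair of corresponding alpha-carbon positions."""
    return [math.dist(p, r) for p, r in zip(predicted, reference)]

def rmsd(distances):
    """Root mean square of the per-residue distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

def fraction_within(distances, cutoff):
    """Fraction of residues placed within `cutoff` of their reference position."""
    return sum(1 for d in distances if d <= cutoff) / len(distances)

# Made-up reference: ten alpha carbons spaced 3.8 Å apart along x.
reference = [(3.8 * i, 0.0, 0.0) for i in range(10)]
# Made-up prediction: residues 0-4 exact, residues 5-9 shifted 10 Å along y.
predicted = [
    (x, y + (10.0 if i >= 5 else 0.0), z)
    for i, (x, y, z) in enumerate(reference)
]

distances = per_residue_distances(predicted, reference)
print(round(rmsd(distances), 2))         # prints 7.07
print(fraction_within(distances, 2.0))   # prints 0.5
```

Half of the residues are placed exactly, yet the overall RMS deviation is about 7 Å, which is one reason a single RMS figure can understate the quality of a prediction.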


CASP 13 was held in 2018. Excerpts from the conclusions: "... the ability of predicting hard protein folds at the tertiary level has increased enormously ..." "On the other hand, important global and local features of prediction models are still seldom as accurate as in the experimental structure. This is the case of enzyme active sites and ligand binding sites, where the predicted arrangement of the amino acids side chains involved in ligand binding and substrate specificity has not achieved the level of accuracy required to confidently infer their function .... Accurate prediction of loops is still a challenging task*. As they are often involved in protein interactions, their incorrect prediction can compromise the accuracy of the interacting surface and overall structure of the complex." "... the ability of current methods in modeling the correct quaternary structure of proteins remains rudimentary and shows little progress compared to what observed at the tertiary level."[8]

"The most recent experiment (CASP13 held in 2018) saw dramatic progress in structure modeling without use of structural templates (historically 'ab initio' modeling). Progress was driven by the successful application of deep learning techniques to predict inter-residue distances. In turn, these results drove dramatic improvements in three-dimensional structure accuracy: With the proviso that there are an adequate number of sequences known for the protein family, the new methods essentially solve the long-standing problem of predicting the fold topology of monomeric proteins."[9]

*Fig. 4 in Kryshtafovych et al.[9] illustrates how, in the case of 6cci (~350 residues), the core of the protein is well-predicted, while the surface loops are poorly predicted. Surfaces of folded proteins are generally critical in their functions.

References & Links

  1. Raval A, Piana S, Eastwood MP, Dror RO, Shaw DE. Refinement of protein structure homology models via long, all-atom molecular dynamics simulations. Proteins. 2012 Aug;80(8):2071-9. Epub 2012 May 15. PMID:22513870 doi:10.1002/prot.24098
  2. Accuracy of Protein Structure Prediction at Stanford University.
  3. Critical Assessment of techniques for protein Structure Prediction (CASP).
  4. Noivirt-Brik O, Prilusky J, Sussman JL. Assessment of disorder predictions in CASP8. Proteins. 2009 Aug 21. PMID:19774619 doi:10.1002/prot.22586
  5. CAPRI: Critical Assessment of PRediction of Interactions.
  6. Bradley P, Misura KM, Baker D. Toward high-resolution de novo structure prediction for small proteins. Science. 2005 Sep 16;309(5742):1868-71. PMID:16166519 doi:309/5742/1868
  7. Ben-David M, Noivirt-Brik O, Paz A, Prilusky J, Sussman JL, Levy Y. Assessment of CASP8 structure predictions for template free targets. Proteins. 2009 Aug 21. PMID:19774550 doi:10.1002/prot.22591
  8. Lepore et al., in press in Proteins: Structure, Function, and Bioinformatics, 2019. DOI: 10.1002/prot.25805
  9. Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical Assessment of Methods of Protein Structure Prediction (CASP) - Round XIII. Proteins. 2019 Oct 7. PMID:31589781 doi:10.1002/prot.25823
  10. Cooper S, Khatib F, Treuille A, Barbero J, Lee J, Beenen M, Leaver-Fay A, Baker D, Popovic Z, Players F. Predicting protein structures with a multiplayer online game. Nature. 2010 Aug 5;466(7307):756-60. PMID:20686574 doi:10.1038/nature09304
  11. Zhou M, Robinson CV. When proteomics meets structural biology. Trends Biochem Sci. 2010 Jun 3. PMID:20627589 doi:10.1016/j.tibs.2010.04.007

Proteopedia Page Contributors and Editors

Eric Martz, Wayne Decatur, Jaime Prilusky
