Theoretical models

From Proteopedia


This article needs to be improved, expanded, and more references need to be cited.

The term theoretical model refers to a molecular model obtained, wholly or in part, by the use of theory, such as homology modeling, energy minimization, molecular mechanics or molecular dynamics. Such theoretical models are distinguished from empirical models, which are usually obtained by X-ray crystallography or nuclear magnetic resonance (NMR).

The distinction between theoretical and empirical models is important because when theoretical models are compared with empirical models, the theoretical models often contain significant errors. In contrast, when the structure of a particular macromolecule is determined using empirical methods by different laboratories, or both by crystallography and NMR, the agreement is usually quite good.

1,390 theoretical models were historically deposited in the Protein Data Bank but removed from the main database in 2002. The structure displayed in the pages automatically generated in Proteopedia for these theoretical models should be interpreted with caution (see Category:Theoretical Model).


Empirical Models

Empirical models are not theoretical models, but are mentioned here for the sake of completeness. Empirical models, usually determined by X-ray crystallography, nuclear magnetic resonance or cryo-electron microscopy, are the most reliable and accurate models available. Methods for judging the reliability and quality of empirical models are discussed at Quality assessment for molecular models. Independent determinations of the same protein by empirical methods generally agree within about 0.5 Å RMS for carbon alphas (reference needed).
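The RMS figure quoted above can be illustrated with a short calculation. The following is a minimal sketch, assuming the two models have already been superposed in a common frame and that their carbon alphas are listed in the same order; the function name and the coordinates are hypothetical, chosen only for illustration.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMS deviation between two equal-length lists of (x, y, z) positions.

    Assumes both models are already superposed in the same frame and that
    atom i in coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical carbon alpha coordinates (in Å) from two determinations
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_2 = [(0.3, 0.4, 0.0), (3.8, 0.5, 0.0), (7.6, 0.0, 0.5)]

# Each atom is displaced by 0.5 Å, so the RMS deviation is 0.5 Å
print(round(ca_rmsd(model_1, model_2), 2))
```

In practice the superposition step itself matters: RMS values are normally reported after an optimal least-squares fit, which this sketch deliberately leaves out.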

Homology Models

Method & Limitations

Homology models, also called comparative models, are obtained by folding a query protein sequence (also called the target sequence) to fit an empirically-determined template model. The registration between residues in the query and template is determined by an amino acid sequence alignment between the query and template sequences.

Imagine that the template’s polypeptide backbone is a folded glass tube. Now imagine that the query sequence is a thin metal chain that can be pulled through the tube. The chain (query) will adopt the same fold as the tube (template). The sequence alignment specifies how far the chain should be pulled into the tube; that is, how the residues in the query sequence match up with the structure of the template.
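How an alignment fixes the registration between query and template can be sketched in a few lines. This is an illustrative example only, not the procedure of any particular modeling server: the function name, the gap character and the two aligned sequences are assumptions made up for the example.

```python
def map_query_to_template(aligned_query, aligned_template, gap="-"):
    """Return (query_index, template_index) pairs, 0-based, one per query residue.

    template_index is None where the query residue faces a gap in the
    template, meaning no template coordinates exist for that residue.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = []
    q_pos = 0  # position in the ungapped query sequence
    t_pos = 0  # position in the ungapped template sequence
    for q_char, t_char in zip(aligned_query, aligned_template):
        if q_char != gap:
            pairs.append((q_pos, t_pos if t_char != gap else None))
            q_pos += 1
        if t_char != gap:
            t_pos += 1
    return pairs

# Hypothetical gapped alignment of a query against a template
aligned_query = "MKT-LLVAG"
aligned_template = "MRTALL--G"

registration = map_query_to_template(aligned_query, aligned_template)
for q_index, t_index in registration:
    print(q_index, t_index)
```

In terms of the analogy, each pair says which point along the tube a given link of the chain ends up at; a `None` marks a link for which the tube offers no position.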

Errors or uncertainties in the sequence alignment result in errors or uncertainties in the homology model. Portions of the query sequence cannot be modeled reliably where the alignment contains insertions or deletions in either sequence, or where the template lacks coordinates due to crystallographic disorder. Provided there is sufficient sequence identity between the query and template, the main chain in homology models is usually mostly correct. However, the positions of sidechains in homology models are usually incorrect.
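A quick way to see where a model is likely to be weak is to summarize the alignment itself. The sketch below assumes one common convention, namely that percent identity is counted over columns in which both sequences have a residue; other conventions exist. The function name and the example alignment are hypothetical.

```python
def alignment_summary(aligned_query, aligned_template, gap="-"):
    """Report percent identity and the gapped regions of a pairwise alignment.

    Percent identity is computed over columns where both sequences have a
    residue (an assumed convention; other denominators are also in use).
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    aligned = 0
    unmodeled_query_positions = []  # query residues with no template residue
    skipped_template_residues = 0   # template residues with no query residue
    q_pos = 0
    for q_char, t_char in zip(aligned_query, aligned_template):
        q_gap = q_char == gap
        t_gap = t_char == gap
        if not q_gap and not t_gap:
            aligned += 1
            if q_char == t_char:
                identical += 1
        elif not q_gap:
            unmodeled_query_positions.append(q_pos)
        elif not t_gap:
            skipped_template_residues += 1
        if not q_gap:
            q_pos += 1
    percent_identity = 100.0 * identical / aligned if aligned else 0.0
    return {
        "percent_identity": percent_identity,
        "unmodeled_query_positions": unmodeled_query_positions,
        "skipped_template_residues": skipped_template_residues,
    }

# Hypothetical alignment: one deletion and a two-residue insertion in the query
summary = alignment_summary("ACDEF-GHIK", "ACDQFWGH--")
print(summary)
```

Query positions listed as unmodeled have no template coordinates to follow, so any coordinates a modeling program assigns to them are the least trustworthy part of the model.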

Nevertheless, homology models are useful for seeing low-resolution features, such as which residues are on the surface or buried, which are close to other features of interest (such as a putative active site), and the overall distribution of charges and evolutionary conservation.

Attempts to improve homology models by molecular dynamics simulations have not been successful: "in most cases, simulations initiated from homology models drift away from the native structure"[1].

For further information, please see Practical Guide to Homology Modeling.

Paucity of Templates

Empirically-determined templates with adequate sequence identity are available for fewer than half of all protein sequences. One of the major goals of structural genomics is to increase the sequence diversity of the available empirically-determined structures that can be used as templates for homology modeling.

A number of free servers have libraries of homology models generated in advance for protein sequences, and many will create homology models for a submitted protein sequence. For more, please see

When no suitable template exists, the Structural Genomics Target Database should be searched with your sequence. In some cases, a sequence-similar protein has already been crystallized and diffracted, but the model may not have been completed, or the completed model may not yet have been deposited in the PDB. In such cases, it may be worthwhile to contact the team that has made the most progress on a closely related sequence.

See Also


Ab Initio Models

When there is no template with sufficient sequence identity to use for homology modeling, one can use ab initio or de novo folding theory to predict the structure of a target protein sequence. Such theory is about 70% successful at predicting secondary structure[2]. Tertiary structure prediction has modest success for small protein chains (80 amino acids or fewer), but is generally unable to predict the fold for longer chains. In about one out of four cases of small domains of fewer than 85 amino acids, the best predictions are within about 1.5 Å (RMS for carbon alphas) of the true structure[3]. (Independent determinations of the same protein by empirical methods generally agree within about 0.5 Å RMS for carbon alphas.)
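A success rate for secondary structure prediction is typically a per-residue score. The sketch below assumes a three-state measure of the kind usually called Q3 (helix H, strand E, coil C), which may or may not be exactly the measure behind the figure cited above; the two state strings are invented for illustration.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one.

    Both arguments are equal-length strings over the states H (helix),
    E (strand) and C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)

# Hypothetical predicted and observed states for a 10-residue peptide
predicted = "CHHHHCEEEC"
observed = "CCHHHHHEEC"

# 7 of the 10 residues agree
print(q3_accuracy(predicted, observed))  # prints 70.0
```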

The success of fold prediction methods is assessed every two years in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competitions[4]. Crystallographers submit sequences for which they have solved structures that have not yet been published. Modelers predict the folds, which are then compared with the subsequently published structures. Beginning in CASP5 (2002), the ability to predict intrinsic disorder was included[5]. There are also competitions to predict protein-protein docking interactions[6].

Assessment of CASP results is done in a double-blind manner: the predictors do not know the empirical structures, and the assessors do not know the identities of the predictors, which are coded. In CASP8 (2008), there were 13 "template free" targets, that is, sequences with no significant sequence identity to any empirically solved entry in the PDB. These are the most difficult to predict, as they must be predicted by ab initio methods. 102 groups submitted predictions. Assessing the quality of a prediction is not simple, given that even "good" predictions can have high root mean square (RMS) deviations for alpha carbon alignment, e.g. due to a hinge[7]. Several assessment methods were used, each emphasizing different qualities. A number of groups submitted good predictions for six of the thirteen targets[7]. None of the submitted models was judged to be satisfactory for four of the thirteen targets[7].
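The hinge problem can be made concrete with a toy calculation. The sketch below is not one of the CASP8 assessment methods; it is an illustrative example with made-up coordinates, assuming a two-domain chain in which the prediction gets both domains right internally but swings the second domain by 90 degrees about the hinge, and assuming the two models are superposed on the first domain.

```python
import math

def rmsd(coords_a, coords_b):
    """RMS deviation between paired (x, y, z) positions in a shared frame."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def fraction_within(coords_a, coords_b, cutoff):
    """Fraction of paired atoms lying within `cutoff` Å of each other."""
    close = sum(math.dist(p, q) <= cutoff for p, q in zip(coords_a, coords_b))
    return close / len(coords_a)

SPACING = 3.8  # assumed distance in Å between consecutive alpha carbons

# Hypothetical "true" structure: eight alpha carbons along the x axis.
# Atoms 0-3 form domain 1; atoms 4-7 form domain 2; the hinge is at atom 3.
true_model = [(SPACING * i, 0.0, 0.0) for i in range(8)]

# Hypothetical prediction: domain 1 is exact, and domain 2 has the right
# internal geometry but is rotated 90 degrees about the hinge.
hinge_x = SPACING * 3
predicted = true_model[:4] + [(hinge_x, SPACING * k, 0.0) for k in range(1, 5)]

print("global RMSD:", round(rmsd(true_model, predicted), 2))
print("domain 1 RMSD:", round(rmsd(true_model[:4], predicted[:4]), 2))
print("fraction within 1 Å:", fraction_within(true_model, predicted, 1.0))
```

Half of the chain is placed exactly, yet the global RMS deviation comes out above 10 Å, because the squared displacements of the swung domain dominate the average. This is one reason a single RMS value is a poor summary of prediction quality and why several complementary measures are used.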

See Also

References & Links

  1. Raval A, Piana S, Eastwood MP, Dror RO, Shaw DE. Refinement of protein structure homology models via long, all-atom molecular dynamics simulations. Proteins. 2012 Aug;80(8):2071-9. Epub 2012 May 15. PMID:22513870 doi:10.1002/prot.24098
  2. Accuracy of Protein Structure Prediction at Stanford University.
  3. Bradley P, Misura KM, Baker D. Toward high-resolution de novo structure prediction for small proteins. Science. 2005 Sep 16;309(5742):1868-71. PMID:16166519 doi:309/5742/1868
  4. Critical Assessment of Techniques for Protein Structure Prediction (CASP).
  5. Noivirt-Brik O, Prilusky J, Sussman JL. Assessment of disorder predictions in CASP8. Proteins. 2009 Aug 21. PMID:19774619 doi:10.1002/prot.22586
  6. CAPRI: Critical Assessment of PRediction of Interactions.
  7. Ben-David M, Noivirt-Brik O, Paz A, Prilusky J, Sussman JL, Levy Y. Assessment of CASP8 structure predictions for template free targets. Proteins. 2009 Aug 21. PMID:19774550 doi:10.1002/prot.22591
  8. Cooper S, Khatib F, Treuille A, Barbero J, Lee J, Beenen M, Leaver-Fay A, Baker D, Popovic Z, Players F. Predicting protein structures with a multiplayer online game. Nature. 2010 Aug 5;466(7307):756-60. PMID:20686574 doi:10.1038/nature09304
  9. Zhou M, Robinson CV. When proteomics meets structural biology. Trends Biochem Sci. 2010 Jun 3. PMID:20627589 doi:10.1016/j.tibs.2010.04.007

Proteopedia Page Contributors and Editors

Eric Martz, Wayne Decatur, Jaime Prilusky
