Calculating GDT_TS

From Proteopedia

What is GDT_TS?

The Global Distance Test - Total Score (GDT_TS)[1][2][3][4] is used to quantify the similarity between a predicted protein structure (or any query protein structure) and a reference structure, which is typically an empirical model. The sequences of the two structures need not be the same. GDT_TS gives an overall average measure of how close the amino acids in the predicted model are to their counterparts in the empirical model, taking into account many different superpositions of the two models. When the two structures differ in detail, GDT_TS is better than the Root Mean Square Deviation (RMSD) at detecting similarities in fold. "RMSD uses the actual distances between alpha carbons, where GDT works with the percentage of alpha carbons that are found within certain cutoff distances of each other."[5] Both tests compare the positions of only the alpha carbon atoms. GDT_TS values range from 0 (a meaningless prediction) to 100 (a perfect prediction). "Random predictions give around 20; getting the gross topology right gets one to ~50; accurate topology is usually around 70; and when all the little bits and pieces, including side-chain conformations, are correct, GDT_TS begins to climb above 90."[6]
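The contrast between the two measures can be illustrated with a minimal Python sketch. It assumes the commonly cited GDT_TS cutoffs of 1, 2, 4, and 8 Å and takes a single, already computed list of alpha-carbon distances (the function names and the example distances are hypothetical). The real GDT_TS searches many superpositions to maximize the number of residues under each cutoff, so this single-superposition version can only underestimate it; it is not a substitute for the server described below.

```python
import math

# Commonly cited GDT_TS distance cutoffs, in angstroms (an assumption of
# this sketch; the page itself does not list them).
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def rmsd(distances):
    """Root mean square of the per-residue alpha-carbon distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt_ts_single_superposition(distances, cutoffs=GDT_TS_CUTOFFS):
    """Average percentage of residues within each cutoff, for ONE superposition.

    The real GDT_TS picks, for each cutoff, the superposition that places
    the most residues under that cutoff. This sketch skips that search.
    """
    n = len(distances)
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical distances: four well-placed residues and one far outlier.
example = [0.5, 0.8, 1.5, 3.0, 30.0]
print(f"RMSD   = {rmsd(example):.2f} A")
print(f"GDT_TS = {gdt_ts_single_superposition(example):.1f}")
```

In this made-up example the single outlier dominates the RMSD (above 13 Å), while the cutoff-based score stays near 65 because four of the five residues are still counted as well placed.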

The accuracies of predictions submitted to the biennial CASP competitions are judged largely by GDT_TS. Proteopedia pages using GDT_TS include Theoretical models and AlphaFold2 examples from CASP 14.

Server for calculating GDT_TS

GDT_TS can be calculated with the AS2TS Server provided by Adam Zemla[1][7]. Below are detailed instructions kindly provided by Zemla in March, 2021.

Run 1: Superposition

You will need to do two runs on the server. The first run produces the best superposition. The second run calculates GDT_TS based on that superposition.

1. Go to the AS2TS server: linum.proteinmodel.org.

2. Under Protein Structure Analysis services, click LGA = pairwise protein structure comparison. (Technical information is available at the Service description link on the same line.)

3. Fill in your email address.

4. Provide the two structures to be compared using method 1, 2, or 3. Specify the predicted or query structure first -- this will be superposed on the reference structure. Specify the reference structure second.

For a concrete example, we'll use SARS-CoV-2 ORF8, which was a target in CASP 14. There are two X-ray structures, 7jtl and 7jx6. Chain A of the latter has the highest resolution and fewest missing residues. When it was submitted (February, 2021) to the DALI Server in PDB 25 mode, the top hit for structural similarity was the N-terminal domain of 5a2f. So 5a2f_A 7jx6_A was entered in slot 1: the query 5a2f_A first, then the reference 7jx6_A.

5. Caveat: If you let your browser auto-fill the email address slot, make sure to clear any other slots that got auto-filled inadvertently. Otherwise you may get an error message.

6. Append "-d:4.0" to the default parameters. You will then submit your job with these parameters:

-4 -o2 -gdc -lga_m -stral -d:4.0
(Without -d:4.0, LGA defaults to a distance cutoff of 5.0 Å, but CASP uses 4.0 Å.)

7. Press the START button.

In the results for Run 1, you may be interested in the RMSD and Seq_Id for the superposition deemed optimal by this server.

The LGA_S value (range 0 to 100) is a structure similarity score for the number of alpha carbons given under N. LGA_S values below ~40 indicate that the two structures have different folds. The LGA_S score for our example is 49.46 for 87 alpha carbons, indicating similar folds.

Caveat: If the first structure is half the length of the second (reference) structure, then the maximum possible LGA_S score is 50. On the other hand, if the second structure is half the length of the first one, then the maximum possible LGA_S score is 100. In our example, the length of 5a2f_A is 218 amino acids, and the length of the reference structure 7jx6_A is 104. Therefore, a score of 100 is not impossible.
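The length caveat can be read as a simple ceiling on the score. The sketch below is one interpretation of the two cases stated above, assuming the score is normalized by the length of the reference (second) structure; the function name is hypothetical and this is not taken from the LGA source code.

```python
def max_possible_lga_s(n_query, n_reference):
    """Upper bound on LGA_S implied by the length caveat.

    Assumes at most min(n_query, n_reference) residues can be matched and
    that the score is expressed relative to the reference length.
    """
    if n_query <= 0 or n_reference <= 0:
        raise ValueError("structure lengths must be positive")
    return 100.0 * min(1.0, n_query / n_reference)


# The example from the text: 5a2f_A (218 residues) superposed on
# the reference 7jx6_A (104 residues).
print(max_possible_lga_s(218, 104))  # query longer than reference
print(max_possible_lga_s(52, 104))   # query half the reference length
```

With the query longer than the reference the ceiling is 100, and with a query half the length of the reference it drops to 50, matching the two cases in the caveat.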

Run 2: GDT_TS

Run 2 uses the superposition determined in Run 1.

8. Copy the entire output of Run 1 to the clipboard.

9. In a separate browser tab, open the same form, LGA = pairwise protein structure comparison.

10. Make sure there are no molecules specified in sections 1, 2, or 3 of the form. If necessary, press the Clear Form button. If you leave molecules from the previous run in sections 1, 2, or 3, what you paste into Box 4 will be ignored!

11. Enter your email address.

12. Paste the entire output of Run 1 into box 4.

13. In the parameters slot, change -4 to -3, and add -d:4.0  -al at the end. So the complete parameters for Run 2 should be:

-3 -o2 -gdc -lga_m -stral  -d:4.0  -al

14. Press the START button. The output for our example:

15. The GDT_TS score reported by the server needs to be adjusted to reflect the similarity for the entire reference structure, namely 104 residues in our example. The final GDT_TS score for structure similarity between 5a2f_A and 7jx6_A can be estimated as follows:

GDT_TS = 63.068 * 88/104 = 53.37

However, the CASP 14 target for ORF8 had only 92 residues[8]. Therefore, to calculate GDT_TS values for comparison with CASP 14 ORF8 prediction values, the correction denominator should be 92 instead of 104:

GDT_TS = 63.068 * 88/92 = 60.33
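The adjustment in step 15 is a plain rescaling, sketched below in Python. The function and parameter names are hypothetical; the numbers are those of the ORF8 example, where 88 is taken to be the residue count on which the server's reported score is based.

```python
def rescale_gdt_ts(reported_score, n_scored, n_target):
    """Rescale a server-reported GDT_TS to a chosen target length.

    reported_score: GDT_TS as reported by the server (63.068 in the example)
    n_scored:       residue count the reported score is based on (88)
    n_target:       residue count to normalize by (104 for the full
                    reference structure, 92 for the CASP 14 target)
    """
    if n_scored <= 0 or n_target <= 0:
        raise ValueError("residue counts must be positive")
    return reported_score * n_scored / n_target


# Normalized to the full reference structure 7jx6_A (104 residues).
print(f"{rescale_gdt_ts(63.068, 88, 104):.2f}")
# Normalized to the CASP 14 ORF8 target (92 residues).
print(f"{rescale_gdt_ts(63.068, 88, 92):.2f}")
```

The two printed values reproduce the 53.37 and 60.33 given above. Which denominator is appropriate depends on what the result will be compared with.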

References

  1. Zemla A. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003 Jul 1;31(13):3370-4. doi:10.1093/nar/gkg571. PMID:12824330
  2. GDT_TS definition at the CASP 14 website.
  3. GDT description at the CASP website.
  4. Global distance test at Wikipedia.
  5. GDT in the Foldit Wiki.
  6. AlphaFold2 @ CASP14: “It feels like one’s child has left home.” by Mohammed AlQuraishi, December 8, 2020.
  7. Zemla A, Zhou CE, Slezak T, Kuczmarski T, Rama D, Torres C, Sawicka D, Barsky D. AS2TS system for protein structure modeling and analysis. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W111-5. doi:10.1093/nar/gki457. PMID:15980437
  8. CASP 14 Domain Definitions and Classifications.

Proteopedia Page Contributors and Editors

Eric Martz
