Standard residues (standard amino acids and nucleotides) are defined in the PDB data file format, and have record type ATOM in PDB-format atomic coordinate files. Standard residues are:

  • The 22 standard amino acids: the 20 canonical amino acids plus the genetically encoded proteinogenic SEC (U) and PYL (O), which the PDB added as "standard" in 2014[1][2]. The ambiguous residue codes ASX and GLX and the undetermined code UNK are also standard.
  • Eleven standard nucleotides A, C, G, I, U, DA, DC, DG, DI, DT, and DU[3], plus N for an unknown nucleotide. (I is inosine.) The PDB provides this list under HET. Note that the PDB does not use "T" to designate ribo-thymidine, but rather 5MU for 5-methyl uridine[3].
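
The lists above can be turned into a simple lookup. The following is a minimal Python sketch, not an official PDB tool: the function names classify_residue and residue_from_record are hypothetical, and the three-letter codes of the 20 canonical amino acids are filled in from general knowledge because the text does not enumerate them. It assumes the fixed-column PDB format, in which the record name occupies columns 1-6 and the residue name columns 18-20.

```python
# Residue names from the lists above. The 20 canonical three-letter codes
# are an assumption added for illustration; the text does not spell them out.
CATEGORIES = {
    "amino acid": frozenset(
        "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO "
        "SER THR TRP TYR VAL SEC PYL".split()
    ),
    "ambiguous or undetermined amino acid": frozenset({"ASX", "GLX", "UNK"}),
    "ribonucleotide": frozenset({"A", "C", "G", "I", "U"}),
    "deoxyribonucleotide": frozenset({"DA", "DC", "DG", "DI", "DT", "DU"}),
    "unknown nucleotide": frozenset({"N"}),
}


def classify_residue(name):
    """Return the standard-residue category of a residue name,
    or "non-standard" if the name is not in any list above."""
    key = name.strip().upper()
    for category, names in CATEGORIES.items():
        if key in names:
            return category
    return "non-standard"


def residue_from_record(line):
    """Return (record_type, residue_name) for an ATOM or HETATM line
    of a PDB-format file, or None for any other record."""
    record = line[:6].strip()        # columns 1-6: record name
    if record not in ("ATOM", "HETATM"):
        return None
    return record, line[17:20].strip()  # columns 18-20: residue name
```

For example, classify_residue("SEC") returns "amino acid", while classify_residue("5MU") returns "non-standard", consistent with the note that 5MU is not in the standard list.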

The distinction between ribonucleotides (A, C, G, I, U) and deoxyribonucleotides (DA, DC, DG, DI, DT, DU[3]) was first made when the PDB was remediated, effective August 1, 2007. The unremediated files can still be obtained; see Getting Unremediated PDB Files.

Note that, in Jmol, A, C, G, I, T, U select nucleotides in either DNA or RNA for backward compatibility, while DA, DC, DG, DI, DT, and DU select only DNA nucleotides. You can select RNA nucleotides with e.g. "(A, U) and RNA", or by enclosing the single-letter nucleotide names in brackets, e.g. "([A],[C],[G],[I],[U])".
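
The two selection forms quoted above can be generated programmatically, for instance when writing Jmol scripts from another tool. This is only an illustrative Python sketch: the helper names are hypothetical, no Jmol code is invoked, and the output is simply the expression text shown in the paragraph above.

```python
RNA_NAMES = ("A", "C", "G", "I", "U")


def jmol_rna_by_brackets(names=RNA_NAMES):
    """Build a Jmol expression that encloses each single-letter
    nucleotide name in brackets, so that only RNA is selected."""
    return "(" + ",".join("[" + n + "]" for n in names) + ")"


def jmol_rna_by_keyword(names=RNA_NAMES):
    """Build a Jmol expression that restricts the backward-compatible
    single-letter names to RNA with 'and RNA'."""
    return "(" + ", ".join(names) + ") and RNA"
```

Called with the defaults, jmol_rna_by_brackets() yields ([A],[C],[G],[I],[U]), and jmol_rna_by_keyword(("A", "U")) yields (A, U) and RNA; either string can be placed after Jmol's select command.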

At RCSB.org, using the Advanced Search with query type Chemical ID, you can find all entries in the database that contain a particular chemical component. For example, in December 2019, five entries contained PYL (explained in Non-Standard Residues).

A complete list of all compounds in the PDB is available, including hydrogens, 3D structures, and bond orders, in the Chemical Components Dictionary of the Worldwide Protein Data Bank. This includes all Standard Residues as well as Non-Standard Residues, carbohydrate adducts, Ligands and Hetero Groups. It is updated weekly for newly released entries.

Notes & References

  1. Announcement: Standardization of Amino Acid Nomenclature, World Wide Protein Data Bank News, January 8, 2014.
  2. 1fdo, released 1997, had selenocysteine 140 in chain A coded as HETATM CSE through the WWPDB snapshot of 2014-01-02, but had it coded as ATOM SEC in the 2014-12-03 snapshot. See Getting Unremediated PDB Files.
  3. In December 2019, there were over 80 entries in the Protein Data Bank containing deoxyribo-U ("DU"), and over 250 containing ribo-T. However, note that the PDB does not use "T" to designate ribo-T, but rather 5MU for 5-methyl uridine. 5-methyl-uracil is thymine.

