Non-Standard Residue
From Proteopedia
Any residue of protein or nucleic acid that is not included in the list of Standard Residues is considered Non-Standard. The atomic coordinates for atoms in non-standard residues are given in records of type HETATM in the PDB file format.
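As a rough illustration of how HETATM records can be inspected, the following minimal Python sketch collects the residue names found in HETATM lines of PDB-format text. It assumes the fixed-column layout of the legacy PDB format (residue name in columns 18-20, chain ID in column 22, sequence number in columns 23-26). The function name `hetero_residues` and the helper `make_record` are hypothetical, and the sample records carry placeholder atom names, serial numbers, and coordinates. Only the GLY 5 / PYJ 1005 numbering on chain C is taken from the 1b07 example described on this page; the water record is invented.

```python
def make_record(record, serial, atom, res_name, chain, res_seq):
    """Hypothetical helper: lay out one fixed-width PDB line (placeholder coordinates)."""
    return (f"{record:<6}{serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def hetero_residues(pdb_lines, skip_water=True):
    """Return sorted unique (resName, chainID, resSeq) tuples seen in HETATM records."""
    found = set()
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue  # ATOM and all other record types are ignored
        res_name = line[17:20].strip()   # columns 18-20
        chain_id = line[21:22].strip()   # column 22
        res_seq = int(line[22:26])       # columns 23-26
        if skip_water and res_name == "HOH":
            continue
        found.add((res_name, chain_id, res_seq))
    return sorted(found)


# Illustrative records only; atom names and serial numbers are made up.
sample = [
    make_record("ATOM", 1, " CA", "GLY", "C", 5),
    make_record("HETATM", 2, " C1", "PYJ", "C", 1005),
    make_record("HETATM", 3, " O", "HOH", "C", 2001),
]

print(hetero_residues(sample))
```

Because HETATM records also hold waters, ligands, and ions, a listing like this is only a first pass; deciding which hetero groups are non-standard residues of the polymer still requires comparing them with the list of Standard Residues.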
In addition to the 20 historically "standard" amino acids, two unusual but genetically encoded amino acids, selenocysteine and pyrrolysine, have been treated as "standard" by the PDB since 2014[1][2]. Elsewhere, however, they may still be designated "non-standard".
Examples:
- SEP and TPO in 1bkx
- 1MA, 2MG, 5MC, 5MU, 7MG, H2U, M2G, OMC, OMG, PSU, and YG in 1evv
- PSU (Pseudouridine) and several others in tRNA
- MSE in 2ab5 (see Selenomethionine)
- D-amino acids, present in >700 entries in the PDB, for example 5e5t.
Although phosphoserine and phosphothreonine are given as the single non-standard residues SEP and TPO in 1bkx, an alternative convention is sometimes followed, in which a modified residue is split into a standard residue plus a separate hetero group. For example, a non-standard phenylethane amino acid sidechain in 1b07 is named GLY (GLY 5 on chain C in ATOM records) plus PYJ (sequence number 1005 in chain C in HETATM records).
The former scheme of designating modified standard nucleotides with plus signs (+A, +C, +G, +I, +T, +U) was discontinued in the PDB remediation project, effective August 1, 2007. The unremediated files can still be obtained; see Getting Unremediated PDB Files.
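When working with unremediated files, it can help to check whether the old plus-sign names are present. The sketch below is one possible approach, assuming the legacy names occupy the usual residue-name field (columns 18-20) of ATOM and HETATM records. The names `legacy_plus_residues` and `make_record` are hypothetical, and the sample lines are invented for illustration rather than taken from any PDB entry.

```python
LEGACY_PLUS_NAMES = {"+A", "+C", "+G", "+I", "+T", "+U"}


def make_record(record, serial, atom, res_name, chain, res_seq):
    """Hypothetical helper: lay out the leading fixed-width fields of a PDB line."""
    return f"{record:<6}{serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}"


def legacy_plus_residues(pdb_lines):
    """Count ATOM/HETATM records per legacy plus-sign residue name."""
    counts = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        res_name = line[17:20].strip()  # columns 18-20
        if res_name in LEGACY_PLUS_NAMES:
            counts[res_name] = counts.get(res_name, 0) + 1
    return counts


# Invented records: two atoms of a "+U" residue and one atom of an unmodified G.
sample = [
    make_record("HETATM", 1, " P", "+U", "A", 20),
    make_record("HETATM", 2, " OP1", "+U", "A", 20),
    make_record("ATOM", 3, " P", "G", "A", 21),
]

print(legacy_plus_residues(sample))
```

An empty result suggests that the file has been remediated, or that it contains no modified nucleotides named under the old scheme.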
At RCSB.org, the Advanced Search with query type Chemical ID finds all entries in the database that contain a particular chemical component. For example, in December 2019, five entries contained PYL (explained in Non-Standard Residues).
A complete list of all compounds in the PDB is available, including hydrogens, 3D structures, and bond orders, in the Chemical Components Dictionary of the Worldwide Protein Data Bank. This includes all Standard Residues as well as Non-Standard Residues, carbohydrate adducts, Ligands and Hetero Groups. It is updated weekly for newly released entries.
See Also
References
- Announcement: Standardization of Amino Acid Nomenclature, World Wide Protein Data Bank News, January 8, 2014.
- 1fdo, released 1997, had selenocysteine 140 in chain A coded as HETATM CSE through the WWPDB snapshot of 2014-01-02, but had it coded as ATOM SEC in the 2014-12-03 snapshot. See Getting Unremediated PDB Files.