Protein Data Bank

The Worldwide Protein Data Bank (wwPDB)[1][2] is the internationally recognized sole repository[3] of all published, empirically determined, atomic-resolution three-dimensional (3D) structure data for macromolecules. The Protein Data Bank was founded in 1971 by Drs. Edgar Meyer and Walter Hamilton at Brookhaven National Laboratory[4][5]. Its management was headed by Tom Koetzle until 1994 and then by Joel L. Sussman until 1999, when it was transferred to members of the Research Collaboratory for Structural Bioinformatics (RCSB). The RCSB is managed at Rutgers University and the San Diego Supercomputer Center. It was directed by Helen M. Berman until July 2014, when Stephen K. Burley took over the directorship[6]. As of 2008, the PDB had three official branches: the Research Collaboratory for Structural Bioinformatics (RCSB, USA), the European Bioinformatics Institute (PDBe, UK), and the Protein Data Bank Japan (PDBj, Osaka).

New Releases Cycle

The wwPDB releases new entries once per week. These can be seen by clicking on the most recent release date, shown at the upper right of the main page at pdb.org. In 2007, 7,280 new entries were released (an average of 140/week). In 2011, 8,101 new entries were released (an average of 155/week).[7]
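
The weekly averages quoted above follow from dividing the yearly totals by the number of weekly releases. The short sketch below reproduces that arithmetic. It assumes exactly 52 releases per year, and the function name is a hypothetical one chosen for illustration, so the results are approximate.

```python
# Assumption: one release per week, 52 releases per year.
WEEKS_PER_YEAR = 52


def weekly_average(entries_per_year: int) -> float:
    """Return the mean number of entries released per week."""
    return entries_per_year / WEEKS_PER_YEAR


# Yearly totals quoted in the text above.
yearly_releases = {2007: 7280, 2011: 8101}

for year, count in yearly_releases.items():
    print(f"{year}: about {weekly_average(count):.1f} entries/week")
```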

Traditionally, an entry consisted of an atomic coordinate file (the molecular model). More recently, the experimental data (structure factors in the case of crystallography) have been deposited along with the model. Since February 1, 2008, deposition of experimental data has been required with all new entries.

Many derivative databases copy, derive information from, or add value to the atomic coordinate files available from the wwPDB. Often, these automatically update their databases weekly, shortly after the new releases become available at the PDB. Proteopedia is one example.

PDB Statistics

At pdb.org, click on PDB Statistics in the upper right corner of the main page for a wealth of information, including proteins solved by multiple experimental methods, sequence redundancy in the PDB, the distribution of resolutions, the 100 journals that have published the most new macromolecular structures, and graphs of the growth of the database (under Content Growth).

Some interesting statistics (maxima, minima, means) for the contents of the PDB are summarized at Believe It or Not.

Remediation

Periodically, the PDB remediates its archived data files. Remediation improves consistency and nomenclature, corrects some errors, and involves changes in the PDB data format. Remediations occurred in August 2007 and March 2009. Details can be found at the Worldwide PDB.

Here are some examples of changes that occurred in remediations affecting the PDB format.

  • DNA: Prior to August 2007, both DNA and RNA nucleotides were named A, C, G, T, and U. After August 2007, DNA nucleotides were renamed DA, DC, DG, DT, and DU, while RNA nucleotides continued to use the older one-letter names. (An example of a model that contains both DNA and RNA is 104d.) This change required updates to software packages such as Jmol, and left unmaintained packages such as Protein Explorer unable to deal properly with the remediated nucleic acids.
  • Non-standard residues: Some PDB files represented a non-standard residue as a standard residue (ATOM records) plus an adduct (HETATM records). Some of these were changed to a single name for the non-standard residue, so that all atoms in the residue carry the same residue name (and all are HETATM records). For example, phosphoserine in 1apm was SER plus PHO, and phosphothreonine was THR plus PHO. These were remediated to SEP and TPO. In another example, methylated ribonucleotides in 310d had been named, e.g., +C1 plus CH3. These were remediated to OMC and so forth.
  • Order of atoms: In the March 2009 remediation, the order of chains and atoms changed in some PDB files in a non-systematic manner. This broke some scenes that had been saved in Proteopedia, and required redesign of some portions of Proteopedia (see Proteopedia avoids remediation-related problems).
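
The DNA renaming described in the first item above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the PDB used: the helper names are hypothetical, the residue name is read from columns 18-20 of a PDB-format ATOM or HETATM record, and the function must only be applied to chains already known to be DNA, because legacy files used the same one-letter names for DNA and RNA.

```python
# Legacy one-letter names mapped to the remediated DNA names listed above.
LEGACY_TO_REMEDIATED_DNA = {"A": "DA", "C": "DC", "G": "DG", "T": "DT", "U": "DU"}


def make_atom_line(serial, atom, res_name, chain, res_seq):
    """Build the first 26 columns of an ATOM record (hypothetical test helper)."""
    return f"{'ATOM':<6}{serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}"


def residue_name(line):
    """Return the residue name held in columns 18-20 (0-based slice 17:20)."""
    return line[17:20].strip()


def remediate_dna_line(line):
    """Rename a legacy DNA residue; return every other line unchanged.

    Assumption: the caller has already established that the line belongs
    to a DNA chain, since the legacy name alone cannot tell DNA from RNA.
    """
    if not line.startswith(("ATOM", "HETATM")):
        return line
    new_name = LEGACY_TO_REMEDIATED_DNA.get(residue_name(line))
    if new_name is None:
        return line
    # Residue names are right-justified in their three-column field.
    return line[:17] + new_name.rjust(3) + line[20:]


legacy = make_atom_line(1, "P", "G", "A", 1)
print(repr(legacy))
print(repr(remediate_dna_line(legacy)))
```

Because the replacement keeps the three-column field width, the remaining columns of the record stay aligned. Software that reads both legacy and remediated files needs the reverse check as well, which is why packages such as Jmol had to be updated.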

Obsolete (unremediated) versions of the data files were saved by the PDB before each remediation and can still be obtained; see Getting Unremediated PDB Files.

Sequence Numbering Anomalies

Entries in the PDB often contain anomalies in sequence numbering (see Sequence Numbering Anomalies in Homology modeling servers).

Improving Published Models

There are several free automated servers that can improve most published models. See Improving published models and Quality assessment for molecular models.

References and Notes

  1. Berman H, Henrick K, Nakamura H. Announcing the worldwide Protein Data Bank. Nat Struct Biol. 2003 Dec;10(12):980. PMID:14634627 doi:10.1038/nsb1203-980
  2. Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 2007 Jan;35(Database issue):D301-3. Epub 2006 Nov 16. PMID:17142228 doi:10.1093/nar/gkl971
  3. wwPDB consortium. Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res. 2019 Jan 8;47(D1):D520-D528. PMID:30357364 doi:10.1093/nar/gky949
  4. Bernstein FC, Koetzle TF, Williams GJ, Meyer EF Jr, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol. 1977 May 25;112(3):535-42. PMID:875032
  5. Sussman JL, Lin D, Jiang J, Manning NO, Prilusky J, Ritter O, Abola EE. Protein Data Bank (PDB): database of three-dimensional structural information of biological macromolecules. Acta Crystallogr D Biol Crystallogr. 1998 Nov 1;54(Pt 6 Pt 1):1078-84. PMID:10089483
  6. Leadership Transition, RCSB Newsletter, Fall 2014.
  7. In May 2012, the following numbers were reported by advanced search on release dates at RCSB. 2011: 8,101. 2010: 7,907. 2009: 7,388. 2008: 6,964. 2007: 7,199.
  8. Berman HM. Synergies between the Protein Data Bank and the community. Nat Struct Mol Biol. 2021 May;28(5):400-401. PMID:33963295 doi:10.1038/s41594-021-00586-6

Proteopedia Page Contributors and Editors

Eric Martz, Joel L. Sussman, Wayne Decatur, Jaime Prilusky
