Amino acid composition

From Proteopedia


The amino acid composition of a protein is the percentage of each amino acid in the sequence of that protein. The percentage, sometimes called the mole percentage, is calculated for each of the 20 standard amino acids as the count of that amino acid divided by the total number of amino acids in the protein chain or molecule, multiplied by 100.
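The calculation just described is simple enough to sketch in a few lines of Python. This is a minimal illustration, not the method used by any particular calculator. It assumes the sequence is given as one-letter codes. The function name `composition` is hypothetical. Letters outside the 20 standard amino acids (for example X) are counted in the chain length but are not reported.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes, in alphabetical order.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> dict[str, float]:
    """Return the percentage of each standard amino acid in `sequence`.

    Percentage = 100 * (count of that residue) / (total residues in the chain).
    Whitespace is ignored and lowercase letters are accepted. Non-standard
    letters still count toward the total length but are not reported.
    """
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    total = len(seq)
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AA}
```

For example, `composition("AAGC")` reports 50.0 for Ala, 25.0 for Gly and Cys, and 0.0 for the other 17 amino acids.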



As an example, here is the amino acid composition of acetylcholinesterase of Torpedo californica (the Pacific electric ray), whose structure is PDB entry 2ace. The canonical isoform sequence is 586 residues long. In its mature form, a signal peptide is removed from the amino terminus and a propeptide is removed from the carboxy terminus, leaving a mature chain of 537 residues with this composition:

This composition bar graph was created by the Protein Information Resource's (PIR's) Composition/Molecular Weight Calculator. Protein sequences are easily obtained from UniProt.org or by viewing a PDB entry in FirstGlance in Jmol and clicking on Sequences. You may wish to align the genomic full-length sequence from UniProt with the experimentally crystallized sequence. Here are instructions.
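To compute the composition of the mature chain rather than the precursor, both ends of the UniProt sequence can be trimmed first. The Python sketch below does this. The function name `mature_chain` is hypothetical. The signal-peptide and propeptide lengths must be read from the UniProt entry, because the text above gives only the precursor length (586) and the mature length (537). That means 49 residues are removed in all, but it does not say how they split between the two ends.

```python
def mature_chain(precursor: str, signal_len: int, propeptide_len: int) -> str:
    """Remove an N-terminal signal peptide and a C-terminal propeptide.

    signal_len and propeptide_len are residue counts, for example taken from
    the signal peptide and propeptide features of a UniProt entry.
    """
    if signal_len < 0 or propeptide_len < 0:
        raise ValueError("lengths must be non-negative")
    if signal_len + propeptide_len >= len(precursor):
        raise ValueError("nothing would remain after trimming")
    # Slice with an explicit end index: precursor[:-0] would return an empty
    # string, so a negative index would break when propeptide_len is 0.
    return precursor[signal_len:len(precursor) - propeptide_len]
```

The trimmed string can then be passed to any composition calculator. Checking that its length equals the mature length reported for the entry (537 here) is a quick way to confirm that the right lengths were used.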

Average Compositions

Average compositions have been calculated for large numbers of proteins from diverse taxa. These are tabulated in the downloadable spreadsheet. It is reassuring to see the agreement among tabulations generated in 1993, 1998, and 2008 (citations are in the spreadsheet).

The above percentages were determined for several thousand sequences of diverse proteins of length 200 residues, with sequence identities below 50%[1]. These data are included in the above-linked spreadsheet.
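An average composition over many proteins can be computed in two ways, and they do not give identical numbers. The text does not say which one the cited tabulations used. The Python sketch below is only an illustration of the distinction, and both function names are hypothetical. The first function averages the per-protein percentages, so each protein counts equally. The second pools all residues before dividing, so long proteins count more. Sequences are assumed to be uppercase one-letter codes.

```python
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def mean_of_compositions(sequences: list[str]) -> dict[str, float]:
    """Average the per-protein percentages, so every protein counts equally."""
    if not sequences or not all(sequences):
        raise ValueError("need at least one sequence, and no empty ones")
    totals = dict.fromkeys(STANDARD_AA, 0.0)
    for seq in sequences:
        counts = Counter(seq)
        for aa in STANDARD_AA:
            totals[aa] += 100.0 * counts[aa] / len(seq)
    return {aa: total / len(sequences) for aa, total in totals.items()}


def pooled_composition(sequences: list[str]) -> dict[str, float]:
    """Pool all residues before dividing, so longer proteins weigh more."""
    counts = Counter("".join(sequences))
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        raise ValueError("no residues")
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AA}
```

For the two toy sequences "AAAA" and "GG", averaging per protein gives 50% Ala, while pooling gives about 66.7% Ala. With thousands of proteins of similar length the two methods converge, but they can differ when lengths vary widely.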

Determinants of Amino Acid Composition

The GC content of the organism's genome is the strongest genome-level determinant of amino acid composition[2][3][4].
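GC content is the percentage of guanine plus cytosine among the bases of a DNA sequence. The minimal Python sketch below calculates it. The function name `gc_content` is hypothetical. Excluding ambiguous bases such as N from both the numerator and the denominator is an assumption made here; other tools may treat them differently.

```python
def gc_content(dna: str) -> float:
    """Percentage of G + C among the A, C, G, T bases of a DNA sequence.

    Bases other than A/C/G/T (e.g. N for unknown) are excluded from both
    the numerator and the denominator.
    """
    seq = dna.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / (gc + at)
```

For example, `gc_content("GGCCAATT")` returns 50.0.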

Other, weaker influences are:

  • Growth temperature (mesophily/thermophily/hyperthermophily). Proteins of thermophiles have more glutamic acid (and correspondingly less glutamine), and more lysine and arginine[2]. This likely relates to the larger number of salt bridges in proteins of thermophiles, which are believed to contribute to thermostability[5].
  • Chain length. Proteins of thermophiles are, on average, shorter than those of mesophiles: average lengths are 283 and 340 residues, respectively[2]. A study of ~550,000 proteins with lengths of 50-200 amino acids[1] found these trends in amino acid frequency:
    • Increased with length, reaching a plateau: Ala, Asp, Glu, Gly, Pro, Val; less increase for Gln and Thr.
    • Decreased with length: Cys, Phe, His, Ile, Lys, Met, Asn, Ser.
    • Leu and Tyr are highest in short and long chains, and less frequent in middle-sized proteins.
    • Arg peaks in middle-sized proteins.
    • Trp is constant at about 1.4% for lengths 75-200.
  • Linkers vs. domains: Linkers between domains have more polar residues, while compact domains have more hydrophobic residues[3].
  • Habitat: The environment in which an organism lives has a minor effect on the average composition of its proteins[4].
  • Compositional variability is greatest in archaea, followed by bacteria, then eukaryotes[3].
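The thermophily trend described above (more Glu, Lys, and Arg, less Gln) can be summarized for a single sequence with two numbers. The Python sketch below computes them. The function name `thermophily_indicators` is hypothetical, and the two indicators are simply restatements of the residues named in the text, not an established classifier. No cutoff is implied.

```python
from collections import Counter


def thermophily_indicators(sequence: str) -> dict[str, float]:
    """Composition features the text associates with thermophiles.

    Returns the combined Glu + Lys + Arg percentage and the Glu/Gln count
    ratio. These are descriptive numbers only; no threshold is implied.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    ekr_percent = 100.0 * (counts["E"] + counts["K"] + counts["R"]) / len(seq)
    if counts["Q"]:
        glu_gln = counts["E"] / counts["Q"]
    else:
        # No glutamine: ratio is infinite if Glu is present, undefined otherwise.
        glu_gln = float("inf") if counts["E"] else float("nan")
    return {"EKR_percent": ekr_percent, "Glu_Gln_ratio": glu_gln}
```

When these values are compared for homologous proteins from a thermophile and a mesophile, higher values in the thermophile's protein would be consistent with the trend reported in [2]. Individual proteins can, however, deviate from genome-wide averages.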

Composition Calculators

  • EMBL-EBI's EMBOSS-PepStats generates a table readily imported into a spreadsheet. The table has both 1-letter and 3-letter amino acid abbreviations, sorted by 1-letter codes.
    Importing composition data into Excel: Copy only the data columns, paste them into a plain-text editor, and save them as a plain-text file. In Excel, in an existing (possibly empty) spreadsheet, choose File, Import, Text. Check three delimiter options: Tab, Space, and Treat consecutive delimiters as one. Then proceed with the import.
  • ExPASy's ProtParam generates a table readily imported into a spreadsheet. The table has both 1-letter and 3-letter amino acid abbreviations, sorted by 3-letter codes. It also offers a CSV output, an alternative format understood by spreadsheets.
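If you compute compositions yourself, writing them as CSV avoids the manual import steps above, since spreadsheets open CSV files directly. The Python sketch below produces one row per standard amino acid, sorted by one-letter code like the PepStats table. The function name `composition_csv` and the column headings are hypothetical and are not meant to match the output of either web tool.

```python
import csv
import io
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def composition_csv(sequence: str) -> str:
    """Return CSV text with one row per standard amino acid.

    Columns: one-letter code, residue count, percentage (2 decimals).
    Rows are sorted by one-letter code.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Residue", "Number", "Percent"])
    for aa in STANDARD_AA:
        writer.writerow([aa, counts[aa], f"{100.0 * counts[aa] / len(seq):.2f}"])
    return buf.getvalue()
```

The returned text can be saved with Python's built-in `open()` (opening the file with `newline=""`, as the csv module documentation recommends) under any file name ending in .csv, and then opened in a spreadsheet.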


  1. Carugo O. Amino acid composition and protein dimension. Protein Sci. 2008 Dec;17(12):2187-91. Epub 2008 Sep 9. doi:10.1110/ps.037762.108. PMID:18780815
  2. Tekaia F, Yeramian E, Dujon B. Amino acid composition of genomes, lifestyles of organisms, and evolutionary trends: a global picture with correspondence analysis. Gene. 2002 Sep 4;297(1-2):51-60. doi:10.1016/s0378-1119(02)00871-5. PMID:12384285
  3. Brune D, Andrade-Navarro MA, Mier P. Proteome-wide comparison between the amino acid composition of domains and linkers. BMC Res Notes. 2018 Feb 9;11(1):117. doi:10.1186/s13104-018-3221-0. PMID:29426365
  4. Moura A, Savageau MA, Alves R. Relative amino acid composition signatures of organisms and environments. PLoS One. 2013 Oct 25;8(10):e77319. eCollection 2013. doi:10.1371/journal.pone.0077319. PMID:24204807
  5. Chan CH, Yu TH, Wong KB. Stabilizing salt-bridge enhances protein thermostability by reducing the heat capacity change of unfolding. PLoS One. 2011;6(6):e21624. Epub 2011 Jun 24. doi:10.1371/journal.pone.0021624. PMID:21720566

Proteopedia Page Contributors and Editors

Eric Martz