How To Align Protein Sequences
and Display Multiple Sequence Alignments

A support document for FirstGlance in Jmol.

Why Align Sequences?

You may wish to compare the sequence of the protein used in the experiment (such as crystallization) with the full-length genomic sequence.

Getting The Sequences

While viewing the structure in FirstGlance, in the Molecule Information Tab, under Chain: click Sequences. There you can get
  1. The sequence of each chain in the experimental protein, and separately
  2. The full-length genomic sequence from UniProt.
In UniProt, click the blue Sequences button (orange arrow). Notice the full length (red arrow). Click the FASTA button (magenta arrow) to copy the FASTA format of the sequence, which is used by alignment programs.
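
For readers who want to check what they copied, FASTA text is simple to split into records: each record is a header line starting with ">" followed by one or more sequence lines. The following is a minimal sketch, not part of FirstGlance or UniProt. The function name parse_fasta and the sample record (accession P00000, entry DEMO_HUMAN, and its sequence) are made up for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record in UniProt's FASTA header style (not a real entry).
example = """>sp|P00000|DEMO_HUMAN Demo protein OS=Homo sapiens OX=9606 GN=DEMO PE=1 SV=1
MKTAYIAKQR
QISFVKSHFS
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header.split()[0], len(sequence))
```

The sequence length printed here can be compared with the full length that UniProt reports for the entry.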

Be sure to take a look at the PTM/Processing section (green arrow). Here you will often find a signal peptide, or leading methionine, that is removed from the mature protein -- and it is the mature protein that is crystallized.

Aligning The Sequences

Here are two methods for aligning protein sequences.

Methods for Aligning Protein Sequences

I. UniProt (online)
  Pros:
    ✦ Very easy
  Cons:
    ✦ Loses organism and gene names
    ✦ Clustal Omega algorithm only

II. Jalview
  Pros:
    ✦ Keeps organism and gene names
    ✦ Choice of alignment algorithm (MAFFT, TCOFFEE, MUSCLE, CLUSTAL)
  Cons:
    ✦ A bit fussier to use (still easy)

Alignment Method I. UniProt

Alignment Method II. Jalview

Step-by-step illustrated instructions are in the Help for MSAReveal.Org.

Displaying the Alignment

Personal opinion: Although I love the simplicity of aligning sequences with Jalview, I find its alignment display too complicated, too small, hard to read, and lacking some features that I want. Nor was I able to find another MSA display program that met my needs. These are the reasons why I created MSAReveal.Org. (If you wish to pursue Jalview, there are many tutorials on YouTube.)

Here are two methods for displaying a protein multiple sequence alignment.

Displaying Multiple Sequence Alignments

I. MSAReveal.Org
  Pros:
    ✦ Displays organism and gene names
    ✦ Touching a residue pops up its 3-letter name and sequence number
    ✦ Optionally specify where to start numbering each sequence
    ✦ Search for a sequence fragment (works despite gaps)
    ✦ Sequence fragment search accepts ambiguous residues
    ✦ Sequence fragment search counts hits and links to each occurrence
    ✦ Touching the consensus row gives residue frequencies in that column
    ✦ When identity is >50%, differences can be highlighted
    ✦ Comments per alignment and per sequence are displayed on touch

    ✦ Reports % identity, plus frequency and % of aromatics, charges, Cys, Gly, His, Pro
    ✦ Report sortable on any column
    ✦ Touch report column headings for an explanation
    ✦ Options for compact display for slides or figures
    ✦ All methods explained
    ✦ Numerous errors detected and reported
    ✦ Built-in demos and tests
  Cons:
    ✦ Slightly fussier to use (still very easy)

II. UniProt (online)
  Pros:
    ✦ Display is automatic after the alignment is completed
    ✦ Help is provided
    ✦ Option to color signal peptide
  Cons:
    ✦ Display omits organism and gene names
    ✦ No sequence fragment search
    ✦ No help with 1-letter amino acid codes
    ✦ Sequence numbers hard to find (counting along rows)
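
To illustrate what a fragment search that "works despite gaps" and "accepts ambiguous residues" can mean, here is a minimal sketch. It is not MSAReveal's actual implementation. It assumes gaps are written as "-" and that "X" in the query matches any residue; the function name find_fragment and the sample sequence are hypothetical.

```python
def find_fragment(aligned_seq, fragment, gap="-", wildcard="X"):
    """Return the 1-based residue numbers where fragment starts.

    Gap characters in aligned_seq are ignored, so a fragment is found
    even when the alignment has inserted gaps in the middle of it.
    The wildcard character in fragment matches any residue (an assumption).
    """
    fragment = fragment.upper()
    if not fragment:
        return []
    # Drop gaps so that residue numbers count residues, not columns.
    residues = [c for c in aligned_seq.upper() if c != gap]
    hits = []
    for start in range(len(residues) - len(fragment) + 1):
        window = residues[start:start + len(fragment)]
        if all(q == wildcard or q == r for q, r in zip(fragment, window)):
            hits.append(start + 1)
    return hits


# Made-up aligned sequence: the gap inside "TAY-I" does not block the match.
print(find_fragment("MK--TAY-IAK", "TAYI"))
print(find_fragment("MK--TAY-IAK", "XAX"))
```

The length of the returned list is the hit count, and each entry is the residue number at which a hit begins.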

Display Method I. MSAReveal.Org

An illustrated overview of MSAReveal displays is provided in its help under "What will MSAReveal do for you?" Note the many "pros" in the table above.

You can display your MSA in MSAReveal regardless of whether the alignment was done by Jalview or UniProt.

Jalview saves the FASTA-format alignment in a text file. You simply open that file, then copy the alignment and paste it into the box at MSAReveal.Org.

UniProt offers to display the alignment in FASTA format. You simply copy that and paste it into the box at MSAReveal.Org. But caution: the organism and gene names are stripped out of the FASTA headers.
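
One way to check whether the names survived is to look for the OS= (organism) and GN= (gene) fields that UniProt puts in its standard FASTA headers. The sketch below is only an illustration: the function name header_fields and both sample headers are hypothetical, and the exact form of the stripped headers is an assumption.

```python
import re

# Fields used in UniProt FASTA headers: OS organism, OX taxonomy ID,
# GN gene name, PE protein existence, SV sequence version.
_FIELD = re.compile(r"\b(OS|OX|GN|PE|SV)=(.*?)(?=\s+(?:OS|OX|GN|PE|SV)=|$)")


def header_fields(header):
    """Return a dict of the KEY=value fields found in one FASTA header line."""
    header = header.lstrip(">").strip()
    return {key: value.strip() for key, value in _FIELD.findall(header)}


# Hypothetical full header, and the same header with the names stripped.
full = ">sp|P00000|DEMO_HUMAN Demo protein OS=Homo sapiens OX=9606 GN=DEMO PE=1 SV=1"
stripped = ">sp|P00000|DEMO_HUMAN"

for line in (full, stripped):
    fields = header_fields(line)
    print(fields.get("OS", "(no organism)"), "/", fields.get("GN", "(no gene)"))
```

If the organism or gene is missing, the names can be taken from the FASTA you originally copied from UniProt.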

Display Method II. UniProt.Org

Here is a UniProt sequence alignment display, which UniProt shows automatically when an alignment completes. There are two dozen checkboxes for annotations and amino acid properties (not shown). Checking "signal peptide" has colored that region pink. The percentage identity (here 62.4%) is given in the Result information section (not shown), provided Result info is checked near the upper left (not shown).
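
As a rough illustration of what a percentage identity measures, the sketch below counts identical residues over the columns where both sequences have a residue. That denominator is an assumption: alignment tools differ in how they treat gaps, so this will not necessarily reproduce the value UniProt reports. The function name percent_identity and the two aligned sequences are made up.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Two made-up rows from an alignment: 4 identical out of 5 compared columns.
print(round(percent_identity("MKT-AYI", "MRTLAY-"), 1))
```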
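The symbols below each column are explained in the sequence alignment help as follows:
  * (asterisk): the position has a single, fully conserved residue.
  : (colon): conservation between groups of strongly similar properties.
  . (period): conservation between groups of weakly similar properties.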
Suggestions for improvement? Please contact