Jmol/Visualizing large molecules

From Proteopedia

Half-capsid of human hepatitis B virus displaying only the alpha carbon atoms for the biological assembly of 2g33.

This page was written in 2011 and needs major revisions to take into account (i) the 2022 ability of FirstGlance in Jmol to automatically simplify and display very large biological units; and (ii) the ability of JSmol to generate biological units. Please see:

Eric Martz 22:17, 13 November 2022 (UTC)

Inadequate Memory May Preclude Display

Some molecular models ("molecules") are so large that they will not fit within the default amount of computer memory allocated to Jmol (which is the default amount of memory allocated to java). While it is possible to increase the memory allocated to java, most users will not do this, and hence, will not be able to display, in Proteopedia or Jmol, molecules that exceed a certain size.

Solutions

Below are explained various strategies for reducing the sizes of large PDB files, enabling their main features to be displayed in the default Jmol/java memory. These strategies include displaying only the backbones (alpha carbons for proteins and phosphorus atoms for nucleic acids), and displaying one, or a subset, of the models in multiple-model files. These "reduced" files can be uploaded for use in molecular scenes in Proteopedia. An example is shown in the Jmol at the upper right corner of this article.

Maximum Size Per Model

99,999 Atoms

Strictly speaking, the format of PDB files is limited to 99,999 atoms in a single model, because there are only 5 columns allocated to atom serial numbers. (Files in the mmCIF format can be read by Jmol, and do not suffer from this limitation.) 3cc2 is a model of a large ribosomal subunit containing 99,049 atoms (close to the limit for a single PDB file). Most likely it will display in Jmol when you go to that page. Jmol ignores the atom serial number, columns 7-11 in the PDB file, instead assigning its own atomIndex number, unique for each atom, and not redundant between models. Jmol can handle PDB files containing >100,000 atoms.
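The five-column limit can be illustrated with a short Python sketch. It is not part of Jmol; the function names and the example record are made up for illustration, and the only assumption taken from the PDB format is that the serial number sits in columns 7-11.

```python
# Sketch: the atom serial number occupies fixed columns 7-11 of an ATOM/HETATM record.
SERIAL_WIDTH = 5                      # columns 7 through 11
MAX_SERIAL = 10**SERIAL_WIDTH - 1     # 99999, the largest 5-digit number


def atom_serial(record: str) -> int:
    """Return the serial number from columns 7-11 (1-based) of a PDB atom record."""
    return int(record[6:11])


def fits_serial_field(serial: int) -> bool:
    """True if the serial number can be written in the 5-column field."""
    return len(str(serial)) <= SERIAL_WIDTH


# Hypothetical record, made up for illustration (not copied from a real entry).
example = "ATOM  99999  CA  ALA A   1"

print(atom_serial(example))          # serial read from the fixed columns
print(fits_serial_field(99_999))     # True
print(fits_serial_field(100_000))    # False: needs a sixth column
```

Because Jmol assigns its own atomIndex rather than relying on this field, the overflow matters for the file format, not for Jmol's display.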

This limitation requires that models containing >=100,000 atoms be split into two or more PDB files, or else represented as artificially separated models in a single PDB file. These work-arounds are awkward for visualization. An example is the combination of portions of the two files 1jgo and 1giy for visualization of a complete Ribosome.

Rat Liver Vault, alpha carbon atoms only.

Example with 241,956 Atoms: Rat Liver Vault

The rat liver vault needed to be split into 3 PDB files: 2zuo, 2zv4, and 2zv5. Each file contains 80,652 atoms in 13 chains, for a total in the asymmetric unit of 241,956 atoms in 39 chains (A-Z, a-m). The biological unit contains 2 asymmetric units. Fortunately, the authors provide PDB files containing complete asymmetric units. However, these are 18 megabyte files, and do not fit in Jmol/java default memory. The methods explained below will enable you to visualize an asymmetric unit as alpha carbon atoms only (31,668 atoms). First, run the Jmol application. Now try these commands (the load command will take about a full minute):

load http://www.protein.osaka-u.ac.jp/olabb/tsukihara/mvp/mvp_39mer.pdb filter "*.ca"
color chain

62 Chains

In the final update of the PDB data format specification (Version 3.3) and the current remediation of PDB data, chain IDs (names) must be single alphanumeric characters (A-Z, a-z, 0-9). This permits a maximum of 62 chains. This limit is not much of a problem for asymmetric units. As of January 2011, there was only one PDB entry with 62 chains (2zkr), and 4 more with 55-60 chains.
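The figure of 62 follows directly from the permitted character set. A minimal Python sketch (the name CHAIN_IDS is ours, chosen for illustration):

```python
import string

# Single-character chain IDs allowed by the rule described above:
# 26 upper case letters + 26 lower case letters + 10 digits.
CHAIN_IDS = string.ascii_uppercase + string.ascii_lowercase + string.digits

print(len(CHAIN_IDS))  # 26 + 26 + 10
```

The order of the characters in CHAIN_IDS is arbitrary here; real entries do not follow a fixed order once A-Z are used up.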

Generally, the first 26 chains are given IDs A-Z. Above 26, it is apparently arbitrary whether numerals or lower case letters are used first. For example, for the 28 chains in 3krd or 3hln or 3gpt, those beyond A-Z are 1-2. Alternatively, for the 28 chains in 3lo3, the extra two are identified a-b, and in the 42-chain 3jqo, lower case IDs are present but no numerals. Also, when numerals are used, they may begin with 1, or with 0 (3fic). Occasionally, the letters A-Z are not used up before lower case IDs are employed: 1tzn has 28 chains with IDs A-O and a-o. 7sya has 12 chains a-l, with no chains having upper case names.

Jmol can automatically apply a distinct color to each chain, up to 36 chains (Jmol Colors). However, it can distinguish 62 chains by selection (see set chainCaseSensitive).

Chains Longer Than 10,000 Amino Acids

The number of non-hydrogen atoms in the average amino acid in a protein is about 8. Where did this value come from?

The average molecular weight of an amino acid, weighted by amino acid frequencies in proteins, is 110[1]. Half of the atoms in protein are hydrogen[2], and the other half are mostly carbon (12), with some oxygen (16) and nitrogen (14). So if we take 13 as the average weight of a non-hydrogen atom, and average that with 1 for the other 50% of the atoms (hydrogen), we get (13 + 1)/2 = 7 as the approximate molecular weight of the average atom in protein. 110/7 is about 16 atoms for the average amino acid in protein. But half of those are hydrogen, missing from most PDB files. So the number of non-hydrogen atoms in the average amino acid is about 8.

Since the maximum number of atoms in a single model in a PDB file is 99,999 (see above), dividing by 8 non-hydrogen atoms per amino acid gives a maximum of about 12,500 amino acids in a single model in a single PDB file (containing nothing but protein and no hydrogen atoms). In fact, longer chains can be represented if only the alpha-carbon atoms are present in the PDB file.
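The estimate above can be retraced step by step. The following Python sketch only repeats the arithmetic in the text; the variable names are ours, and the input values (110, 13, 1, 99,999) are the approximations stated above.

```python
# Approximate inputs taken from the text above.
AVG_RESIDUE_MASS = 110      # average amino acid residue weight in proteins
AVG_HEAVY_ATOM_MASS = 13    # rough average of C (12), N (14), O (16)
HYDROGEN_MASS = 1
PDB_MAX_ATOMS = 99_999      # limit of the 5-column serial number field

# Half of the atoms are hydrogen, so average the two weights.
avg_atom_mass = (AVG_HEAVY_ATOM_MASS + HYDROGEN_MASS) / 2     # 7.0

# Atoms (including hydrogen) in the average residue.
atoms_per_residue = AVG_RESIDUE_MASS / avg_atom_mass          # about 15.7

# Half of those are hydrogen, usually absent from PDB files.
heavy_atoms_per_residue = round(atoms_per_residue / 2)        # 8

# Longest all-atom (no hydrogen) protein that fits in one model.
max_residues = PDB_MAX_ATOMS // heavy_atoms_per_residue

print(avg_atom_mass, heavy_atoms_per_residue, max_residues)
```

With only alpha carbons present, each residue contributes a single atom, which is why much longer chains fit under the same limit.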

The PDB files containing the longest chains are listed at Believe It or Not!.

Multiple Model Files

The largest PDB files in the Protein Data Bank are those containing multiple models of large molecules. Since the atom serial numbers start at 1 in each model, these files can get very large (>1,000,000 atoms is possible). An example is 3ezb, which contains 40 models (determined by solution NMR). Each model contains 5,323 atoms (including 2,694 hydrogen atoms); the 40 model file contains 212,920 atoms, and the PDB file is 16.5 megabytes in size. When you visit the page 3ezb, the ensemble will fail to display, producing an "out of memory" error (unless you have allocated more than the default amount of memory to java on your computer).

There are files in the PDB several-fold larger than 3ezb. For example, 2hyn is a 64 megabyte file containing 826,896 atoms in 184 models. As of January 2011, the largest PDB file in the Protein Data Bank was 2ku2, containing nearly one million atoms, with a file size of 100 megabytes. It consists of fifty models (determined by solution NMR), each of which has seven chains and nearly 26,000 atoms.
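The relation between atom counts and file sizes quoted above can be checked with a rough Python sketch. It assumes, as a simplification, that each atom occupies one 80-column line plus a newline (81 bytes) and ignores header, MODEL/ENDMDL and TER records, so it slightly underestimates the real size. The function names are ours.

```python
BYTES_PER_ATOM_LINE = 81  # assumption: 80 columns plus one newline character


def ensemble_atoms(atoms_per_model: int, n_models: int) -> int:
    """Total atoms in a multiple-model file with equally sized models."""
    return atoms_per_model * n_models


def estimated_size_mib(total_atoms: int) -> float:
    """Rough file size in units of 2**20 bytes, counting atom lines only."""
    return total_atoms * BYTES_PER_ATOM_LINE / 2**20


# 3ezb: 40 models of 5,323 atoms each (figures from the text above).
atoms_3ezb = ensemble_atoms(5_323, 40)
print(atoms_3ezb, round(estimated_size_mib(atoms_3ezb), 1))

# 2hyn: 826,896 atoms in 184 models (figures from the text above).
print(round(estimated_size_mib(826_896), 1))
```

Under these assumptions the estimates come out close to the 16.5 and 64 megabyte sizes quoted for 3ezb and 2hyn.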

Displaying Only The First Model

Jmol can be instructed to load only the first model of a multiple-model PDB file. This is best done with the Jmol application (outside of Proteopedia). Later, the single model could be uploaded to Proteopedia for use in a scene.

  • Demonstrate Out Of Memory: Type the following command into the white console window:
load =2hyn
The equal sign tells Jmol to obtain the PDB file from the Protein Data Bank. A red "OutOfMemory" error message should appear in Jmol in less than 30 seconds (depending on the speed of your Internet connection).
  • Load The First Model: Type the following two commands into the white console window:
zap
load models {1 1 1} =2hyn
In less than 30 seconds, the first model from the ensemble in 2hyn should appear in Jmol.
  • Save The First Model: Type this command:
write pdb 2hyn_model1.pdb
Now you should find a new file 2hyn_model1.pdb in your working folder. You can load it with this command:
load 2hyn_model1.pdb
You can also upload it to Proteopedia for use in molecular scenes generated with Proteopedia's SAT.

Displaying Only Alpha Carbon Atoms

With large multiple-chain assemblies, or multiple-model ensembles, typically you want to see only the backbone traces. Backbone traces can be visualized from only the alpha carbon atoms (or for nucleic acids, the phosphorus atoms). There are several methods for discarding all atoms except alpha carbons, listed under Help:Uploading_molecules#Additional_considerations_for_large_files. Below, we will describe the use of the Jmol application to do this.

Jmol can extract ("filter") specified atoms from the PDB file, thereby saving memory. For example, 2hyn contains 4,494 atoms/model (half of which are hydrogen atoms), and 184 models, totaling 826,896 atoms. There are 260 alpha carbon atoms/model, for a total of 47,840 alpha carbon atoms. The alpha carbons represent less than 6% of the original atoms, a roughly 17-fold reduction in the number of atoms that must be held in memory.
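The saving for 2hyn can be verified with a few lines of Python. This is only a restatement of the counts given above; the variable names are ours.

```python
# Counts for 2hyn taken from the text above.
ATOMS_PER_MODEL = 4_494
CA_PER_MODEL = 260
N_MODELS = 184

total_atoms = ATOMS_PER_MODEL * N_MODELS      # all atoms in the ensemble
total_ca = CA_PER_MODEL * N_MODELS            # alpha carbons only

fraction_kept = total_ca / total_atoms        # share of atoms remaining
fold_reduction = total_atoms / total_ca       # how many times fewer atoms

print(total_atoms, total_ca)
print(f"{fraction_kept:.1%} kept, {fold_reduction:.1f}-fold fewer atoms")
```

The reduction in atom count is only a proxy for the memory saved; the actual memory used by Jmol also depends on what is rendered.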

Using the Jmol application from your working folder (see instructions), enter this command:

load =2hyn filter "*.ca"

"*.ca" means "all carbon alpha atoms". After about a full minute (depending on the speed of your Internet connection), a backbone trace of the first model will appear, which means that loading and filtering are complete. These commands will display the backbone traces for all 184 models:

frame all
color chain

If you wish, you can save the alpha-carbon atom models:

write pdb 2hyn_ca_only.pdb

This file could be uploaded to Proteopedia for use in the SAT. Here are instructions.

If your molecule contains nucleic acid, you will also want the nucleic backbone traces. The command (for PDB code 2o5i, 52,717 atoms including protein, DNA and RNA) is

load =2o5i filter "*.ca, *.p"
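For readers who prefer to prepare a reduced file outside Jmol, the same idea can be sketched in Python. This is not how Jmol implements its filter; it is a minimal illustration that assumes standard PDB column positions (atom name in columns 13-16) and keeps only ATOM records named CA or P, together with the MODEL/ENDMDL delimiters. The function name backbone_only and the sample records are hypothetical.

```python
def backbone_only(pdb_lines):
    """Keep alpha carbon (CA) and phosphorus (P) ATOM records, plus model delimiters."""
    kept = []
    for line in pdb_lines:
        record = line[:6]
        if record == "ATOM  ":
            # Atom name field, columns 13-16 (1-based). Alpha carbon is " CA ".
            if line[12:16] in (" CA ", " P  "):
                kept.append(line)
        elif record.startswith(("MODEL", "ENDMDL")):
            # Preserve model boundaries so multiple-model files stay separated.
            kept.append(line)
    return kept


# Hypothetical records, truncated after the residue number for brevity.
sample = [
    "MODEL        1",
    "ATOM      1  N   ALA A   1",
    "ATOM      2  CA  ALA A   1",
    "HETATM    3 CA    CA A 101",   # calcium ion: HETATM, so it is skipped
    "ATOM      4  P     A B   1",
    "ENDMDL",
]

for line in backbone_only(sample):
    print(line)
```

One limitation of this sketch is that modified residues stored as HETATM records (for example selenomethionine) are dropped, which would leave gaps in the backbone trace.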
Alpha carbons for 16 of 184 models from 2hyn.

Displaying Alpha Carbons For A Subset Of Models

Suppose that you want the alpha carbons for a subset of models in the published ensemble. You can get 16 models from the 184 models in 2hyn by taking either the first 16

load models {1 16 1} =2hyn filter "*.ca"

or by taking every 12th model plus the last model

load models {1 184 12} =2hyn filter "*.ca"

Biological Assemblies

The functional forms of molecules, often called biological units or biological assemblies, may contain many copies of the chains present in the published PDB file (the asymmetric unit).

Virus Capsids

An extreme example is a virus capsid. The capsid of the Simian Virus 40 (SV40) contains 360 copies of the VP1 protein chain, present in 6 copies in the published PDB file 1sva. An extremely simplified model of the capsid is displayed at SV40_Capsid_Simplified, but this model is oversimplified for some purposes, and required special techniques to construct. We can get the full capsid model from any of several servers. We recommend getting it from the ViperDB at Scripps, a server specialized in virus capsid structures.

The full SV40 capsid model (360 copies of the VP1 protein chain, minus hydrogen atoms) is a PDB file of 70 megabytes, much too large for default java/Jmol memory. Below are instructions for getting the much smaller alpha carbon atom model. These instructions should work for most virus capsids.

SV40 Capsid Alpha Carbons

  • Get the Address of the Capsid Structure: Go to ViperDB, and submit the PDB code 1sva. Right click on the link full capsid, then copy link location. For 1sva, it is http://viperdb.scripps.edu/OLIGOMERS/1sva_full.vdb.gz.
  • Display The Alpha Carbons: In the Jmol application, enter these commands
load http://viperdb.scripps.edu/OLIGOMERS/1sva_full.vdb.gz filter "*.ca"
color chain

It may take about a minute for the first command to work. After the capsid appears, enter the second command -- it may take half a minute. The resulting display will include 123,420 atoms, which is about the maximum that Jmol can display in the default java memory.

If the SV40 full capsid display fails, try the half-capsid model (also available from a link at ViperDB). You will probably want to look at the half capsid anyway, as that shows the inside better than using Jmol's slab command.

Human Hepatitis B Capsid Alpha Carbons

A smaller virus capsid is human hepatitis B, 2g33. Its half-capsid (17,460 atoms) is displayed near the top of this page.

Non-Capsid Biological Assemblies

This section is incomplete and remains under construction. Eric Martz 00:52, 3 January 2011 (IST)

References

  1. Average molecular weight of an amino acid is about 138. When the average is weighted according to the occurrences of amino acids in proteins, it is about 128. Subtracting 18 for the weight of water removed when a peptide bond is formed, the average is 110. This is explained in Lehninger Principles of Biochemistry.
  2. There are approximately 1.01 hydrogens per non-hydrogen atom in proteins. The source of this value is given in the article Hydrogen in macromolecular models.

Proteopedia Page Contributors and Editors

Eric Martz, Wayne Decatur
