Phillips Academy Computer-Aided Protein Visualization Lab
From Proteopedia
Introduction to Computer-Aided Protein Visualization Lab
Knowing the three-dimensional structure of a protein can be a very powerful tool for biologists. Much can be learned about enzyme function, the interaction of molecules in your immune system, the appearance of the surface of viruses, and the interaction of ligands and receptors. One particularly important area of current research is the design of drugs against specific protein targets. Scientists may look for drugs that block the activity of an enzyme from an invading bacterium, protist, or virus. After first finding an enzyme that is slightly different between humans and the invading species, scientists can use computers to look at the enzyme and try to fit tens of thousands of compounds into it in a way that blocks its activity; most commonly this involves selectively plugging the active site of the invading species' enzyme. Using computers to analyze this problem can speed up the screening of 50,000 potential drugs from many years down to one week! Once the computer has identified a few promising compounds, scientists can chemically modify those compounds to make them even better and then test them against the enzyme in test tubes, and eventually in drug trials in animals and humans.
First, some background (make sure that you understand the bold words):
Proteins are synthesized on ribosomes by linking together many amino acids into a long chain. If you could observe a protein as it is made, it would look like a string of pearls (amino acids) feeding out of the ribosome as it floats in the cytoplasm of the cell (Video of Translation (DNALC)). This linear chain is called the primary structure (or 1° structure) of the protein and refers to its sequence of amino acids. After protein synthesis has started, the chain of amino acids begins to fold into a three-dimensional structure.
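The computer screening step described above can be pictured as a filter followed by a ranking. The minimal sketch below assumes that some docking program has already given every compound a score against both the invader's enzyme and the human enzyme (lower, more negative scores meaning tighter predicted binding). The compound names, the scores, the `select_hits` helper, and the selectivity threshold are all hypothetical values chosen for illustration.

```python
from typing import NamedTuple


class Compound(NamedTuple):
    name: str
    pathogen_score: float  # hypothetical predicted binding to the invader's enzyme
    human_score: float     # hypothetical predicted binding to the human enzyme


def select_hits(compounds, min_selectivity=2.0, top_n=3):
    """Keep compounds predicted to bind the invader's enzyme much more tightly
    than the human enzyme, then return the top_n tightest binders.
    Lower (more negative) scores mean tighter predicted binding."""
    selective = [
        c for c in compounds
        if c.human_score - c.pathogen_score >= min_selectivity
    ]
    return sorted(selective, key=lambda c: c.pathogen_score)[:top_n]


# Made-up scores for five hypothetical compounds
library = [
    Compound("cmpd-001", -9.1, -8.8),   # binds both enzymes: not selective
    Compound("cmpd-002", -10.4, -6.0),  # strong and selective
    Compound("cmpd-003", -7.5, -4.9),   # selective but weaker
    Compound("cmpd-004", -5.0, -4.5),   # weak and not selective
    Compound("cmpd-005", -11.2, -7.9),  # strongest selective binder
]

for hit in select_hits(library):
    print(hit.name, hit.pathogen_score)
```

Filtering on selectivity first mirrors the idea of targeting an enzyme that differs between humans and the invader: a compound that blocks both versions would also harm the patient.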
This process is called protein folding.
Protein folding: The most important rule about protein structure is that it is determined by the primary sequence of the protein. Protein folding is a complicated multi-step process. The first step results in the secondary structure (or 2° structure) of the protein. Secondary structures come in two flavors: alpha helices and beta sheets (or beta-pleated sheets). Alpha helices are spiral-staircase structures (see structure 1 below), and beta-pleated sheets are flat regions where stretches of the chain run back and forth next to each other in long ribbons (see structure 2 below). These two structures form spontaneously based on the shape, hydrophobicity, and charge of the amino acids, and they are held together by hydrogen bonds between atoms of the protein backbone. The protein will now look like a string of pearls with twists or zig-zags at intervals along its length.
For all of the protein structures you will visualize below, once you click on the green link, the structure will appear in the structure window on the right side of the page. In the structure window, click the "Popup" button to open a larger popup window of the structure. You can toggle the spin of the structure on or off by clicking the "Spin" button. Clicking and dragging on the structure lets you rotate it in three dimensions.
On the right side of this window you will see our first example of a protein represented in three dimensions. This is protein G from Streptococcus bacteria, a small and very simple polypeptide that binds to antibodies and interferes with their ability to further activate an immune response.
Secondary Structure: As you can see in this cartoon representation of protein G, there are two main secondary-structure elements in this protein. In red is the alpha helix, while a beta sheet is in gold.
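The hydrogen bonds that hold an alpha helix together follow a regular pattern, which a few lines of code can spell out. This minimal sketch relies on two standard facts not stated in the text above: in an ideal alpha helix the backbone C=O of residue i hydrogen-bonds to the backbone N-H of residue i + 4, and there are about 3.6 residues per turn. The residue numbers and the `helix_hbonds` helper are hypothetical; real helices fray at their ends.

```python
RESIDUES_PER_TURN = 3.6  # ideal alpha helix


def helix_hbonds(first, last):
    """Return (C=O residue, N-H residue) pairs for an ideal alpha helix
    running from residue `first` to residue `last` (inclusive).
    Residue i's backbone C=O bonds to residue i + 4's backbone N-H."""
    return [(i, i + 4) for i in range(first, last - 3)]


pairs = helix_hbonds(1, 12)
print(pairs)
print(f"{12 / RESIDUES_PER_TURN:.1f} turns")
```

Because every C=O points along the helix axis to a partner four residues later, the bonds run roughly parallel to the axis, which is what makes the helix a stiff rod.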
The regions linking the alpha helix and beta sheet together are called turns or linking regions (in white). They are not considered discrete secondary structures because they are not tightly structured and tend to be floppy.
1. Alpha helix: Here you can see the alpha helix of protein G in red, in ball-and-stick representation. The beta sheet is gold, in cartoon representation. Now . Here the alpha helix is completely isolated; the rest of the protein is hidden. The amino acid backbone (the parts of the amino acids that are linked together by peptide bonds to form the primary sequence) is shown in red. The amino acid side chains are shown in tan (each type of amino acid has its own unique side chain, one of 20 different types).
2. Beta sheet: Here you can see the beta sheet of protein G in gold, in ball-and-stick representation. The alpha helix is red, in cartoon representation. Now . Here the beta sheet is completely isolated; the rest of the protein is hidden. The amino acid backbone is in gold, the side chains in light blue.
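The split between backbone and side chain that the viewer colors differently comes down to atom names. In PDB files the backbone heavy atoms of each residue are named N, CA, C, and O (plus OXT on the last residue of a chain); every other heavy atom belongs to the side chain. The sketch below uses a hypothetical helper, `split_backbone`, to sort the heavy atoms of one lysine residue that way; hydrogens are ignored for simplicity.

```python
# Backbone heavy-atom names used in PDB files
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}  # OXT: extra oxygen on the C-terminal residue


def split_backbone(atom_names):
    """Separate a residue's atom names into backbone and side-chain lists."""
    backbone = [a for a in atom_names if a in BACKBONE_ATOMS]
    side_chain = [a for a in atom_names if a not in BACKBONE_ATOMS]
    return backbone, side_chain


# Heavy atoms of a lysine residue, named as in PDB files
lys = ["N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"]
bb, sc = split_backbone(lys)
print("backbone:", bb)
print("side chain:", sc)
```

Every amino acid shares the same four backbone atoms; only the side chain (here the long CB to NZ arm of lysine) differs, which is why the backbone forms a continuous chain in the viewer while side chains stick out from it.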
Tertiary structure: The second step of protein folding results in the tertiary structure (or 3° structure), which gives the protein its overall three-dimensional shape. The tertiary structure of a protein is determined by a combination of factors, including hydrogen bonds, ionic bonds (between positively and negatively charged amino acids), covalent disulfide bonds (between cysteine residues), and van der Waals interactions. Tertiary structure can also be affected by repulsive forces between similarly charged amino acids, as well as by hydrophobic and hydrophilic interactions with a solvent (commonly water). Viewed from a distance, many proteins look like large globs at this point; only on closer inspection can one see the true uniqueness of each shape.
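Because a disulfide bond is a covalent link between the sulfur atoms (named SG in PDB files) of two cysteines, the bonded pairs can be spotted from their coordinates alone: the two sulfurs sit only about 2.05 Å apart. In this minimal sketch the coordinates are made up, the helper `find_disulfides` is hypothetical, and the 2.5 Å cutoff is an assumed value that leaves some slack, not a standard.

```python
import math
from itertools import combinations

# Hypothetical sulfur (SG) atom positions, in angstroms, for four cysteines,
# keyed by (chain, residue number).
cys_sg = {
    ("A", 3): (10.0, 5.0, 2.0),
    ("A", 40): (11.2, 6.1, 3.2),
    ("A", 58): (20.0, 1.0, 7.5),
    ("A", 77): (4.0, 12.0, 9.0),
}

# An S-S covalent bond is about 2.05 Å long; this cutoff is an assumption.
SS_CUTOFF = 2.5


def find_disulfides(sg_coords, cutoff=SS_CUTOFF):
    """Return (residue, residue, distance) for every cysteine pair whose
    sulfur atoms are close enough to be disulfide-bonded."""
    bonds = []
    for (res1, xyz1), (res2, xyz2) in combinations(sg_coords.items(), 2):
        d = math.dist(xyz1, xyz2)
        if d <= cutoff:
            bonds.append((res1, res2, round(d, 2)))
    return bonds


print(find_disulfides(cys_sg))
```

Note that cysteines 3 and 40 are far apart in the primary sequence but close in space: a disulfide bond is a tertiary-structure feature that ties distant parts of the chain together.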
Proteins may contain only alpha helices, only beta sheets, or a combination of the two. The same holds true for the bonds giving a protein its tertiary structure: all, some, or none may be present. These different folding patterns are what give proteins their distinctive shapes and sizes. For example, a protein that is 300 amino acids long would be about 100 nm long as a fully extended chain. If the whole chain formed a single alpha helix, it would be 45 nm long; as a beta sheet, it would measure about 7 × 7 × 0.8 nm; and folded into a small globular form, it would be a sphere only about 4.5 nm in diameter!
Domains: Parts of the secondary and tertiary structures of a protein are usually arranged to form domains, functional units associated with a particular structure. For example, a pair of alpha helices situated side by side might form a binding site, or a particular folding pattern might form the active site of an enzyme (where it binds its substrate) or the site at which it binds a coenzyme such as NAD+. The structure of a domain (though not necessarily its exact amino acid sequence) is frequently preserved in different proteins from the same organism that have a similar function (moving phosphate groups, for instance). Domains are also conserved in proteins from different species that have the same function (such as hemoglobins for oxygen transport or cytochromes in the electron transfer system of mitochondria). Variations in the amino acid sequences of similar domains (or in the nucleotide sequences of the genes that code for them) give important clues about evolutionary relationships between organisms.
Individual domains are sometimes (but not always, which makes this a controversial topic) encoded within single exons of eukaryotic genes. In other words, a single exon might contain all of the protein-coding sequence required to generate a functional domain within the context of the whole protein structure.
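The size comparison for a 300-residue protein is simple arithmetic. The sketch below reproduces it under stated assumptions: an alpha helix rises 0.15 nm (1.5 Å) per residue (a standard value); a fully extended chain is taken as about 0.33 nm per residue (an assumed round number); and for the globular form it assumes a typical average residue mass of 110 g/mol and a typical protein density of 1.35 g/cm³, then solves for the diameter of a sphere of that volume. The results come out near 99 nm, 45 nm, and a little over 4 nm, in line with the figures in the text.

```python
import math

N_RESIDUES = 300

# Approximate length each residue adds to the chain, in nm
EXTENDED_NM_PER_RESIDUE = 0.33  # assumed value for a fully stretched chain
HELIX_NM_PER_RESIDUE = 0.15     # alpha helix rises 0.15 nm (1.5 Å) per residue

extended_nm = N_RESIDUES * EXTENDED_NM_PER_RESIDUE
helix_nm = N_RESIDUES * HELIX_NM_PER_RESIDUE

# Compact globule: volume from mass and density, then the diameter of a sphere
AVG_RESIDUE_MASS = 110   # g/mol per residue (typical average, assumed)
DENSITY = 1.35           # g/cm^3, typical protein density (assumed)
AVOGADRO = 6.022e23      # molecules per mole

mass_g = N_RESIDUES * AVG_RESIDUE_MASS / AVOGADRO   # mass of one molecule
volume_nm3 = mass_g / DENSITY * 1e21                # 1 cm^3 = 1e21 nm^3
diameter_nm = 2 * (3 * volume_nm3 / (4 * math.pi)) ** (1 / 3)

print(f"extended chain: {extended_nm:.0f} nm")
print(f"alpha helix:    {helix_nm:.0f} nm")
print(f"globule:        {diameter_nm:.1f} nm in diameter")
```

The volume grows with the number of residues, but the diameter grows only with its cube root, which is why folding packs such a long chain into such a small ball.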
This finding has implications for the evolution of eukaryotic genes, since it implies that new proteins can be generated by simply duplicating preexisting domain-encoding exons and recombining them into new combinations (a process known as exon shuffling). Thus, a vast variety of proteins with new functions can be generated from preexisting genes, allowing great evolutionary flexibility. Comparisons of the genes of many eukaryotic organisms suggest that this is exactly what has happened.
A good example of all of these principles can be found in immunoglobulins (see Figure 4). Immunoglobulins are protein molecules that form one of the main lines of defense against invasion of the body by foreign organisms, and they are part of the humoral immune response (the branch of the immune system that is activated when you are given a vaccine). They are made up of four subunits: two identical heavy chains and two identical light chains. Each chain is synthesized as an individual protein and then folded and assembled into the complex secondary, tertiary, and quaternary structure you see below. Immunoglobulins are divided into several domains: the two variable domains at the tip of each arm of the “Y” (one from a heavy chain and one from a light chain), which bind specific antigens, and the constant domains, which make up the rest of the molecule. The constant domains determine the class of antibody (IgG, IgM, IgA, etc.) the molecule represents and mediate the response of the immune system to the antibody-tagged antigen. Each of these domains is encoded by its own exon within the immunoglobulin gene structure.
Figure 4: Immunoglobulins. Three views of the immunoglobulin complex IgG.
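The claim that exon shuffling can produce "a vast variety of proteins" from a few building blocks is a counting argument. This toy sketch assumes a hypothetical pool of five domain-encoding exons (labeled D1 to D5, not real genes), uses each exon at most once per protein, and counts every ordered arrangement of one to three domains; the helper `shuffled_proteins` is hypothetical. Allowing repeated domains, as in the immunoglobulins, would give even more combinations.

```python
from itertools import permutations

# Hypothetical pool of exons, each encoding one folded domain
domain_exons = ["D1", "D2", "D3", "D4", "D5"]


def shuffled_proteins(exons, max_domains=3):
    """List every ordered arrangement of 1..max_domains distinct exons."""
    designs = []
    for k in range(1, max_domains + 1):
        designs.extend(permutations(exons, k))
    return designs


designs = shuffled_proteins(domain_exons)
print(len(designs), "possible domain arrangements")
```

Five exons give 5 single-domain, 20 two-domain, and 60 three-domain arrangements (85 in total), and the count climbs quickly as more exons or longer proteins are allowed.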
Determining the 3-Dimensional Structure of a Protein
Scientists can use several techniques to determine how a protein is folded.
(1) The most commonly used technique is called x-ray crystallography. This technique requires the scientist to form crystals of the protein of interest, much as you can form sugar crystals by dangling a string in a supersaturated sucrose solution! The crystal is then bombarded with x-rays, and the diffraction pattern of the x-rays is recorded on film (today, usually on an electronic detector). By analyzing the diffraction pattern, the arrangement and spacing of atoms in the protein can be determined. Rosalind Franklin also used x-ray diffraction, on DNA fibers; her diffraction pictures were in turn used by James Watson and Francis Crick to determine the double-helix shape of DNA.
(2) A second technique is NMR, or Nuclear Magnetic Resonance (the same principle underlies MRI in medicine). In this technique, proteins are placed in a strong magnetic field and exposed to radio waves whose frequency can be varied. Atomic nuclei in different chemical environments absorb maximally at different frequencies. By analyzing a spectrum of absorbance versus frequency, it is possible to identify atoms and determine their locations within the protein. This technique is particularly useful because it can detect movement in molecules as proteins fold and/or as they bind to other molecules.
(3) A third technique is the use of computers to simulate protein folding. Originally, programs were developed to allow scientists to predict the structural effect of a relatively small change in a protein sequence. The computer looks at the three-dimensional structure of a closely related protein (a homologue from another species or a slight variant from the same species), as determined by x-ray crystallography or NMR, and predicts what the effect of the amino acid changes would be.
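The link between an x-ray diffraction pattern and the spacing of atoms is captured by Bragg's law, nλ = 2d sin θ, a standard relation not stated in the text above: x-rays of wavelength λ reflect strongly at angle θ from planes of atoms spaced d apart. This minimal sketch solves it for d, assuming the 1.5418 Å wavelength of a common copper laboratory x-ray source; the helper `bragg_spacing` is hypothetical. Real structure determination goes much further, measuring thousands of spots and computing a map of electron density.

```python
import math

CU_K_ALPHA_ANGSTROM = 1.5418  # wavelength of a common laboratory x-ray source (Å)


def bragg_spacing(theta_deg, wavelength=CU_K_ALPHA_ANGSTROM, order=1):
    """Solve Bragg's law n * wavelength = 2 * d * sin(theta) for the spacing d (Å)."""
    return order * wavelength / (2 * math.sin(math.radians(theta_deg)))


for theta in (5, 10, 20):
    print(f"theta = {theta:2d} deg -> d = {bragg_spacing(theta):.1f} Å")
```

Spots far from the center of the pattern (large θ) correspond to small spacings, so recording them is what lets crystallographers see fine, atom-level detail.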
The computer does this by searching for the "lowest energy configuration" of the protein, or, simply put, the folding that puts the least stress on the molecule. It checks that no two amino acids are pushing into each other, that no two similarly charged amino acids sit close together, that hydrogen bonds and disulfide bonds are formed where they can be, and so on. Newer programs aim to predict the three-dimensional structure of a protein from scratch, even when the structure of no homologue has ever been determined. This approach is powerful because crystallizing many proteins is hard, if not impossible. These programs start at the same point that protein folding starts in the cell: they take the primary sequence of the protein and look for stretches of amino acids likely to form alpha helices and beta-pleated sheets. Once these are in place, the program searches for tertiary structures that obey the "lowest energy configuration" rules.
Viewing the 3D structure of a protein
This brings us back to where we started at the beginning of this lab: once we know the structure, how can we look at it? There are many computer programs for visualizing proteins in three dimensions. For drug design studies, powerful computers and programs are necessary to analyze the energetics of how a drug fits. (Simulating structures and their interactions is a powerful "weeding out" tool for deciding which drugs to test in laboratory studies, making the development of new drugs more efficient and less costly.) For this lab you will be working with a widely used molecular viewer called JSmol to look at "pdb" files of proteins whose structures are known (determined using the methods outlined above). JSmol is an open-source JavaScript viewer for chemical structures in 3D: http://wiki.jmol.org/index.php/JSmol. We are using the website proteopedia.org as a wiki host site to house and access these exercises.
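The "lowest energy configuration" search and the energetics of fit both come down to scoring arrangements of atoms and preferring the lowest score. This toy sketch is an assumption-laden illustration, not any real program's force field: each residue is a single point (in nm) with a charge of +1, 0, or -1, any pair closer than 0.4 nm gets a large clash penalty, and other pairs contribute a Coulomb-like charge term in arbitrary units. The candidate foldings and the helpers `energy` and `lowest_energy` are hypothetical; real programs use detailed energy functions and search vast numbers of conformations.

```python
import math
from itertools import combinations

# Toy model: each residue is one point (x, y, z) in nm with a net charge.
CLASH_DISTANCE = 0.4   # closer than this = residues "pushing into each other"
CLASH_PENALTY = 100.0  # large energy cost for any steric clash


def energy(conformation):
    """Score a conformation (list of ((x, y, z), charge) tuples) in arbitrary units."""
    total = 0.0
    for (p1, q1), (p2, q2) in combinations(conformation, 2):
        r = math.dist(p1, p2)
        if r < CLASH_DISTANCE:
            total += CLASH_PENALTY
        else:
            # Coulomb-like term: like charges raise the energy, opposite charges lower it
            total += q1 * q2 / r
    return total


def lowest_energy(candidates):
    """Return the name of the candidate conformation with the lowest energy."""
    return min(candidates, key=lambda name: energy(candidates[name]))


# Four hypothetical foldings of a short three-residue segment
candidates = {
    "extended":     [((0, 0, 0), +1), ((1, 0, 0), 0), ((2, 0, 0), -1)],
    "salt_bridge":  [((0, 0, 0), +1), ((1, 0, 0), 0), ((0.6, 0.5, 0), -1)],
    "like_charges": [((0, 0, 0), +1), ((1, 0, 0), 0), ((0.6, 0.5, 0), +1)],
    "clash":        [((0, 0, 0), +1), ((1, 0, 0), 0), ((0.2, 0.1, 0), -1)],
}

for name, conf in candidates.items():
    print(f"{name:12s} {energy(conf):8.2f}")
print("lowest energy:", lowest_energy(candidates))
```

The winner brings opposite charges close without overlapping them, while the folding that jams two residues together scores worst; the same "reward good contacts, penalize clashes" logic underlies scoring how well a drug fits an active site.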
Preliminary Questions
Answer the next few questions to review your organic molecule knowledge and to get up to speed quickly.
Using Jmol through proteopedia.org
A VERY BRIEF GUIDE TO USING Jmol
Log on to your computer, then navigate to the following website:
All of the structures hosted on these sites are initially viewable in the small window on the right side of the page. However, if you click on "popup" in the lower left of that window, a larger window containing the structure will become visible. This window can be resized by dragging its edges to the desired dimensions.
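The structures JSmol displays come from "pdb" files, a plain-text format in which each ATOM (or HETATM) record describes one atom at fixed column positions: atom name in columns 13-16, residue name in columns 18-20, chain identifier in column 22, residue number in columns 23-26, x/y/z coordinates in columns 31-54, and element symbol in columns 77-78. This minimal sketch uses two made-up ATOM lines and a hypothetical `parse_atom` helper to pull out the same kind of information (amino acid, position, chain, element) that the viewer reports when you hover over an atom; it is not how JSmol itself works internally.

```python
# Two ATOM records in the fixed-column PDB format (illustrative coordinates)
PDB_LINES = [
    "ATOM    170  CA  ALA A  23      10.512  -3.210   4.875  1.00 12.34           C",
    "ATOM    420  SG  CYS A  55       8.101   0.332  -1.276  1.00 20.11           S",
]


def parse_atom(line):
    """Extract fields from one ATOM/HETATM line using PDB column positions."""
    return {
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20],           # columns 18-20
        "chain": line[21],                 # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "element": line[76:78].strip(),    # columns 77-78
    }


for line in PDB_LINES:
    if line.startswith(("ATOM", "HETATM")):
        a = parse_atom(line)
        print(f"{a['res_name']} {a['res_seq']}, chain {a['chain']}, "
              f"atom {a['atom_name']} ({a['element']})")
```

Because the format uses fixed columns rather than separators, a parser must slice by position; splitting on spaces can fail when adjacent fields run together.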
TO VIEW SELECTED GROUPS (carbon backbones or side chains of amino acids): Make sure the protein is not spinning (click on "toggle spin" in the bottom left corner of the image window to stop spinning). Then simply move your cursor over any portion of the molecule you want to identify. After a second or two, a small window should pop up containing a string of information: a three-letter code (the amino acid), followed by a number (the position of the amino acid in the protein's primary structure), a chain designation (indicating which protein chain/subunit you are pointing at), and then the element you are pointing at (C, H, N, O, P, S, Zn, Mg, Ca, etc.).
Here are the structures you will want to examine:
1. Protein 1: MHC Class I