User:Jeremiah C Hagler/Protein 1
From Proteopedia
Introduction to Computer-Aided Protein Visualization Lab
Computer-Aided Protein Visualization Lab
Knowing the three-dimensional structure of a protein can be a very powerful tool for biologists. Much can be learned about enzyme function, the interaction of molecules in your immune system, the appearance of the surface of viruses, and the interaction of ligands and receptors. One particularly key area of current research is the design of drugs against specific protein targets. Scientists may look for drugs that block the activity of an enzyme of an attacking bacterium, protist or virus. After first finding an enzyme that differs slightly between humans and the invading species, the scientists can use computers to examine the enzyme and try to fit tens of thousands of compounds into it in such a way as to block its activity; most commonly this involves selectively plugging the active site of the invading species' enzyme. Using computers to analyze this problem can speed up the screening of 50,000 potential drugs from many years down to one week! Once the computer has flagged a few promising compounds, the scientist can chemically modify those compounds to make them even better, test them against the enzyme in test tubes, and eventually test them in drug trials in animals and humans.
First some background (make sure that you understand the underlined words):
Proteins are synthesized on ribosomes by linking together many amino acids into a long chain. If you could observe a protein as it is made, it would look like a string of pearls (amino acids) feeding out the end of the ribosome as it floats in the cytoplasm of the cell (Video of Translation (DNALC)). This structure is called the primary structure (or 1° structure) and refers to the sequence of amino acids of the protein.
After protein synthesis has started, two paths are possible: (1) If the protein is destined to be secreted or to reside in an organelle of the secretory pathway, the first twenty or so amino acids will comprise a signal sequence. This sequence directs the ribosome to the endoplasmic reticulum (ER), where the protein is fed through a channel in the membrane into the interior of the ER. Once inside the ER, the protein will fold and receive sugar modifications called glycosylations. (2) If the protein is destined to remain in the cytoplasm or move to the mitochondria or nucleus, the ribosome will remain free in the cytoplasm. The protein is then folded as it emerges from the ribosome.
Protein folding: The most important rule about protein structure is that it is determined by the primary sequence of the protein. Protein folding is a complicated multi-step process. The first step results in the secondary structure (or 2° structure) of the protein. Secondary structures come in two flavors: alpha helices and beta sheets (or beta-pleated sheets). Alpha helices are spiral staircase structures (see structure 1 below), and beta-pleated sheets are flat regions where the amino acids run back and forth next to each other in long ribbons (see structure 2 below). These two structures form spontaneously based on the shape, hydrophobicity and charges of the amino acids and are held together by hydrogen bonds. The protein will now look like a string of pearls with twists or zig-zags at intervals along its length.
Proteins may contain only alpha helices, only beta sheets, or a combination of the two. The same holds true for the bonds giving a protein its tertiary structure: all, some or none may be present. These different folding patterns are what give proteins their distinctive shapes and sizes. A protein that is 300 amino acids long will be about 100 nm long as an extended chain. If the protein is a single alpha helix, it will be 45 nm long; as a beta sheet it will measure 7 x 7 x 0.8 nm; and as a small globular protein it will form a sphere only 4.5 nm in diameter!
Domains: Parts of the secondary and tertiary structures of a protein are usually arranged to form domains, functional units associated with a particular structure. For example, a pair of alpha helices situated side by side might form a binding site, or a particular folding pattern might form the active site of an enzyme, where it binds to its substrate, or the site at which it binds to a coenzyme such as NAD+. The structure of the domain (though not necessarily the exact amino acid sequence) is frequently preserved in different proteins from the same organism that have a similar function (to move phosphate groups, for instance). Domains are also conserved in proteins from different species that have the same function (such as hemoglobins for oxygen transport or cytochromes in the electron transfer system of mitochondria). Variations in the amino acid sequences of similar domains (or in the nucleotide sequences of the genes that code for the proteins) give important clues about evolutionary relationships between organisms. Individual domains are sometimes, but not always, contained within single exons of eukaryotic genes (the exceptions make this a very controversial topic). In other words, a single exon might represent all of the protein coding sequence required to generate a functional domain within the context of the whole protein structure.
This finding has implications for the evolution of eukaryotic genes, since it implies that new proteins can be generated simply by duplicating preexisting domain-encoding exons and recombining them into new combinations (a process known as exon shuffling). Thus, a vast variety of proteins with new functions can be generated from preexisting genes, allowing great evolutionary flexibility. Looking at the genes of many eukaryotic organisms shows that this is exactly what appears to happen.
A good example of all of these principles can be found in immunoglobulins (see figure 4). They are protein molecules that form one of the main lines of defense against invasion of the body by foreign organisms and are part of the humoral immune response (the branch of the immune system that is activated when you are given a vaccine). They are made up of four subunits: two identical heavy chains and two identical light chains. Each chain is synthesized as an individual protein and later assembled into the complex secondary, tertiary and quaternary structure you see below. Immunoglobulins are divided into several domains: the two variable domains, which sit on the tips of the “Y” arms and are involved in binding specific antigens, and the constant domains, which make up the rest of the molecule. The constant domains determine the type of antibody (IgG, IgM, IgA, etc.) the molecule represents and mediate the response of the immune system to the antibody-tagged antigen. Each of these domains is defined by its own exon within the immunoglobulin gene structure.
Figure 4: Immunoglobulins. Three views of the immunoglobulin complex IgG.
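The chain-length figures quoted earlier for a 300-residue protein can be roughly reproduced from per-residue dimensions. The sketch below is an illustration and not part of the original lab. It assumes textbook values that do not appear in the lab text: about 0.15 nm of rise per residue along an alpha helix, about 0.33 nm per residue in an extended chain, an average residue mass of 110 Da, and about 1.21 nm³ of volume per kDa of folded protein. All function and constant names are hypothetical.

```python
import math

# Assumed textbook values (not taken from the lab text itself).
HELIX_RISE_NM = 0.15        # rise per residue along an alpha helix
EXTENDED_RISE_NM = 0.33     # approximate length per residue in an extended chain
RESIDUE_MASS_KDA = 0.110    # average mass of one amino acid residue
VOLUME_NM3_PER_KDA = 1.21   # approximate volume per kDa of folded protein


def helix_length_nm(n_residues):
    """Length of a single continuous alpha helix of n_residues."""
    return n_residues * HELIX_RISE_NM


def extended_length_nm(n_residues):
    """Length of the same chain stretched out as an extended strand."""
    return n_residues * EXTENDED_RISE_NM


def globule_diameter_nm(n_residues):
    """Diameter of a sphere with the estimated volume of the folded protein."""
    volume_nm3 = n_residues * RESIDUE_MASS_KDA * VOLUME_NM3_PER_KDA
    radius_nm = (3.0 * volume_nm3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 2.0 * radius_nm


if __name__ == "__main__":
    n = 300
    print(f"extended chain:  {extended_length_nm(n):.1f} nm")
    print(f"alpha helix:     {helix_length_nm(n):.1f} nm")
    print(f"globular sphere: {globule_diameter_nm(n):.1f} nm in diameter")
```

Under these assumptions the helix estimate matches the 45 nm figure, the extended chain comes out close to 100 nm, and the sphere estimate lands a little below the quoted 4.5 nm, which is reasonable agreement for such a rough model.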
Determining the 3-Dimensional Structure of a Protein
Scientists can use several techniques to observe the folding of a protein.
(1) The most commonly used technique is x-ray crystallography. This technique requires the scientist to form crystals of the protein of interest, much as you can form sugar crystals by dangling a string in a super-saturated sucrose solution! The crystal is then bombarded with x-rays, and the diffraction pattern of the x-rays is recorded on film. By analyzing the diffraction pattern, the spacing of atoms in the protein can be determined. Rosalind Franklin used the same approach on DNA fibers; her diffraction pictures were in turn used by James Watson and Francis Crick to determine the double-helix shape of DNA.
(2) A second technique is NMR, or Nuclear Magnetic Resonance (the same principle underlies MRI in medicine). In this technique proteins are placed in a magnetic field and probed with radio-frequency energy whose frequency can be varied. Different atoms in different chemical environments will absorb maximally at different frequencies. By viewing a spectrum of absorbance vs. frequency, it is possible to identify atoms and their location within the protein. This technique is particularly useful because it can detect movement in molecules as proteins fold and/or bind to other molecules.
(3) A third technique is the use of computers to simulate protein folding. Originally, programs were developed to allow scientists to predict the structural effect of a relatively small change in a protein sequence. The computer looks at the three-dimensional structure, as determined by x-ray crystallography or NMR, of a closely related protein (a homologue from another species or a slight variant from the same species) and predicts what the effect of the amino acid changes would be.
This prediction is made by having the computer determine the "lowest energy configuration" of the protein, or simply put, the folding that puts the least stress on the molecule. It checks that two amino acids will not be pushing into each other, that two similarly charged amino acids will not be opposing each other, that hydrogen bonds and disulfide bonds are formed where they can be, etc. Newer programs aim to predict the three-dimensional structure of proteins from scratch, where no known homologue has ever been studied. This approach is quite powerful because forming crystals of many proteins is hard, if not impossible. Instead, these programs start at the same point that protein folding starts in the cell. They take the primary sequence of the protein and look for the sequences of amino acids that form alpha helices and beta-pleated sheets. Once these are in place, the program searches for tertiary structures that obey the "lowest energy configuration" rules.
Viewing the 3D structure of a protein
This now brings us back to where we started: once we know the structure, how can we look at it? There are many computer programs in existence to visualize proteins in three dimensions. For drug design studies, powerful computers and programs are necessary to analyze the energetics of drug fit. (Simulating structures and their interactions is a powerful “weeding out” tool in deciding which drugs to test in laboratory studies, making the development of new drugs more efficient and less costly.) For this lab you will be working with a very commonly used molecular viewing program known as JSmol to look at "pdb" files of proteins whose structure is known (determined using the methods outlined above). JSmol is an open-source JavaScript viewer for chemical structures in 3D: http://wiki.jmol.org/index.php/JSmol. We are using the website proteopedia.org as a wiki host site to house and access these exercises.
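To make the idea of a "pdb" file more concrete: each atom in such a file is stored as one fixed-width line of text. The sketch below is an illustration and not part of the original lab. It reads the standard column positions of a PDB ATOM record and pulls out the residue name, residue number, chain and element. The sample record is made up for demonstration, and the names `SAMPLE_LINE` and `parse_atom_record` are hypothetical.

```python
# A made-up ATOM record, assembled field by field so that every value lands
# in its fixed column (1-based PDB column numbers are noted on the right).
SAMPLE_LINE = (
    f"{'ATOM':<6}"        # columns 1-6:   record type
    f"{1:>5} "            # columns 7-11:  atom serial number, column 12 blank
    f"{' N  '} "          # columns 13-16: atom name, column 17 blank
    f"{'MET'} "           # columns 18-20: residue name, column 21 blank
    f"{'A'}"              # column 22:     chain identifier
    f"{1:>4}    "         # columns 23-26: residue number, columns 27-30 blank
    f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}"  # columns 31-54: x, y, z
    f"{1.00:6.2f}{9.67:6.2f}"                  # columns 55-66: occupancy, B-factor
    f"{'':10}{'N':>2}"    # columns 67-76 blank, columns 77-78: element
)


def parse_atom_record(line):
    """Extract a few fields from one ATOM or HETATM line of a PDB file."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "atom_name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "element": line[76:78].strip(),
    }


if __name__ == "__main__":
    atom = parse_atom_record(SAMPLE_LINE)
    print(SAMPLE_LINE)
    print(
        f"{atom['res_name']} {atom['res_seq']}, chain {atom['chain']}, "
        f"element {atom['element']}"
    )
```

A viewer such as JSmol reads thousands of lines like this one and draws each atom at its x, y, z coordinates; the residue name, residue number, chain and element are the same kinds of information a viewer can report when you point at an atom.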
Preliminary Questions
Answer the next few questions to review your organic molecule knowledge and to get up to speed quickly.
Using Jmol through proteopedia.org
A VERY BRIEF GUIDE TO USING Jmol
Log on to your computer, then navigate to the following website:
All of the structures hosted on these sites are initially viewable in the small window on the right side of the page. However, if you click on "popup" in the lower left of that window, a larger window containing the structure will appear. This window can be resized by dragging its edges to the desired dimensions.
TO VIEW SELECTED GROUPS (carbon backbones or side chains of amino acids): Make sure the protein is not spinning (click on "toggle spin" in the bottom left corner of the image window to stop spinning). Then simply move your cursor to any portion of the molecule you want to identify. After a second or two, a small window should pop up containing a chain of information: a three-letter code (the amino acid), followed by a number (the position of the amino acid in the protein primary structure), a chain designation (indicating which protein chain/subunit you are pointing at) and then the element you are pointing at (C, H, N, O, P, S, Zn, Mg, Ca, etc.).
Here are the structures you will want to examine:
1. Protein 1: MHC Class I