Structure of the protein homologues: 15.5kD, Snu13, and L7Ae
Introduction
The human protein 15.5kD and its yeast (Snu13p) and archaeal (L7Ae) homologues function in the processing of pre-ribosomal RNA as part of the box C/D and H/ACA small ribonucleoprotein particle (sRNP – archaea) or small nucleolar ribonucleoprotein particle (snoRNP – eukarya) nucleotide modification complexes (s(no)RNPs)[1][2]. In addition, 15.5kD and Snu13p function in U4 small nuclear ribonucleoprotein particle (snRNP) spliceosomal biogenesis[2]. This capacity for dual roles rests on the ability of the proteins to recognize a helix-bulge-helix (kink-turn) RNA motif that is present in the different RNPs[3].
A variation of the kink-turn motif, known as the kink-loop motif, can be found in the C/D and H/ACA RNAs [1]. Interestingly, the eukaryotic proteins and their archaeal homologue do not interact with the two motifs in the same manner, even though their sequences are conserved [3]. For example, while L7Ae exhibits the same binding affinity for both the kink-turn and kink-loop sRNA motifs, its eukaryotic homologues bind specifically to the kink-turn motif and discriminate against the kink-loop motif [3][1].
Solved structures of the proteins include:
Role in pre-ribosomal RNA processing
Ribosomes consist of both RNA and protein, and are designated large ribonucleoprotein (RNP) particles. Each ribosome contains two subunits (60S and 40S), four ribosomal RNAs (5S, 5.8S, 18S, and 25/28S rRNA), and approximately 75 associated proteins [4]. The processing of the pre-rRNAs requires a complex set of posttranscriptional modification steps [4]. One such step involves extensive pseudouridylation and 2’-O-ribose methylation at sites specified by various s(no)RNAs (C/D box s(no)RNAs specify 2’-O-ribose methylation and H/ACA s(no)RNAs specify pseudouridylation), which associate with proteins to form s(no)RNPs [4][5]. Specifically, the 5’ region of U3 s(no)RNA containing C’/D and B/C box pairs interacts with the 5’-ETS and 17S/18S areas of the pre-rRNA[5]. U3 also binds a set of proteins to form the U3 s(no)RNP complex [1].
Snu13p/15.5kD/L7Ae interacts with U3 s(no)RNA through a kink-turn RNA motif [4]. The protein initiates box C/D assembly by binding the kink-turn of the C/D RNAs [1]. Once the s(no)RNP is fully assembled, its RNA regions bind to complementary regions in the target pre-rRNA. The associated proteins then catalyze the methyltransferase reaction [1].
Role in pre-messenger RNA splicing
The processing of pre-mRNA is carried out by a large dynamic machine known as the spliceosome, which removes introns and splices exons together to create a mature mRNA[6][7]. The spliceosome is composed of five snRNA molecules (snRNAs U1, U2, U4, U5, and U6) and over one hundred associated proteins[6][7]. Assembly of the spliceosome is thought to take place in a stepwise manner around the pre-mRNA transcript[6][7]. The first step involves recognition of the 5’ splice site by U1 snRNP, followed by recognition of the branch point sequence by U2 snRNP[6][7]. From this point the U4/U6•U5 tri-snRNP[6][7] joins, and together the five snRNPs form the precatalytic spliceosome, which must undergo a series of changes before it can actively splice[8][7][9][6].
U4 snRNA has a 5’ stem-loop containing a kink-turn that has been shown to interact with 15.5kD [10]. There is evidence to suggest that 15.5kD plays a role in late stage spliceosomal assembly, prior to splicing catalysis [10]. In addition, it may be involved in binding other proteins that have been found to indirectly associate with U4 snRNA such as 61k (Prp31p in yeast), as well as the 20/60/90k complex which interacts with the U4/U6 duplex [10]. The homologues for 60k and 90k in yeast are Prp4p and Prp3p respectively; there is no yeast homologue for 20k.
Structure of 15.5kD in complex with U4 snRNA fragment
Overall structure
15.5kD exhibits a globular domain structure, characterized by an α-β-α fold [2]. The partial U4 snRNA oligonucleotide contacts the protein at a pocket through nucleotide U31, which is located in the internal loop. The portion of the oligonucleotide that does not contact the protein folds into two double helices that meet to form a loop at the junction, where both are capped by a purine. The RNA fold is stabilized through multiple hydrogen bond interactions and base stacking. Overall, there is little interaction between the RNA and the protein, which leaves much of the protein's surface area exposed for interactions with other components of the U4 snRNP [10].
15.5kD structure
15.5kD is 128 amino acids long and folds into a compact α-β-α sandwich, which is the most common family of protein folds, and resembles the L30 ribosomal protein. The β-sheet contains four β-strands, one parallel and three antiparallel, ordered , , , and . To one side of the β-sheet α helices are closely packed, and on the other side are . The residues 63-66 of form a 3₁₀ helix that is important in RNA binding; helix α2 also contains residues important for binding[10].
U4 snRNA fragment structure
The U4 snRNA fragment contains nucleotides of the full length 5’ stem-loop. The oligonucleotide folds into two distorted A-form RNA stems (stem 1 and stem 2) that are connected by an asymmetric 5+2 internal loop ( originate from the 3’ strand and originate from the 5’ strand). The internal loop has a complex fold, in which four nucleotides form sequential G-A base pairs and the other three are left unpaired. Of the unpaired nucleotides, U31 is flipped out, and the other two act as purine caps by stacking onto A44 of stem 2 and the G45-C28 base pair of stem 1, respectively. Between the two G-A base pairs, G32-A44 and G43-A33, the helix is overwound, which causes cross-strand stacking of the two adenines and leaves the guanines displaced. One of the unpaired adenines also participates in the cross-strand stacking. The overall structure represents the kink-turn motif[10].
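The 5+2 bookkeeping of the internal loop can be checked with a short sketch. The nucleotide names and G-A pairs are the ones given in this article; the numeric cut-off used to separate the two strands of the loop is an assumption made only for illustration, and the sketch does not say which strand is 5’ or 3’.

```python
# Nucleotides of the internal loop, as named in this article.
loop = ["A29", "A30", "U31", "G32", "A33", "G43", "A44"]

# The two sequential G-A base pairs described in the text.
ga_pairs = [("G32", "A44"), ("G43", "A33")]

paired = {nt for pair in ga_pairs for nt in pair}
unpaired = [nt for nt in loop if nt not in paired]


def number(nt):
    """Return the sequence number of a nucleotide label such as 'A29'."""
    return int(nt[1:])


# Assumption: a cut-off of 40 separates the two strands of the loop.
side_a = [nt for nt in loop if number(nt) < 40]
side_b = [nt for nt in loop if number(nt) >= 40]

print("unpaired:", unpaired)
print("loop sizes:", len(side_a), "+", len(side_b))
```

Four of the seven loop nucleotides fall into the two G-A pairs, which leaves A29, A30, and U31 unpaired, in agreement with the description above.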
In addition to the base stacking and pairing interactions that help stabilize the RNA structure, a network of hydrogen bonds also contributes. These bonds predominantly involve ribose 2’-OH groups and nitrogen or phosphate atoms. The 2’-OH groups of A44, A29, A33, U31, G32, and G43 hydrogen bond with, or are within hydrogen bonding distance of, A30(N6), A44(N1), G45(N3), A30(P), G43(N2), and A44(P), respectively[10].
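Because the pairing above is given as two parallel lists joined by "respectively", it can help to spell it out explicitly. The following minimal sketch simply zips the two lists in the order in which they appear in the text; the variable names are illustrative.

```python
# 2'-OH donors and their partner atoms, in the order listed in the text.
donors = ["A44", "A29", "A33", "U31", "G32", "G43"]
partners = ["A30(N6)", "A44(N1)", "G45(N3)", "A30(P)", "G43(N2)", "A44(P)"]

# Pair each donor with the partner at the same position ("respectively").
hbonds = dict(zip(donors, partners))

for donor, partner in hbonds.items():
    print(f"{donor} 2'-OH ... {partner}")
```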
Protein-RNA interactions
The residues of 15.5kD that play a main role in RNA binding, through interactions with the 5+2 internal loop, include those located in α2, α4, β1, and loops β1-α2, β2-α3, and α4-β4. There are four major interactions: the first involves the flipped out U31 nucleotide; the second, the sequential G-A base pairs; the third, the unpaired and stacked adenines; and the fourth, the electrostatic interactions[10].
The flipped out U31 nucleotide
As previously mentioned, the nucleotide U31 is flipped out, which makes it an optimal site for RNA-protein interactions. It is located in a pocket formed by the residues Glu61, Ile65, Lys86, and Ile100. Multiple hydrogen bonds and van der Waals interactions are formed between U31 and the four residues. The O4 and 3-imino group of U31 form hydrogen bonds with the main chain amide and main chain oxygen of Glu61, respectively. O4 of U31 hydrogen bonds to the amino group of Lys86, and the U31 phosphate hydrogen bonds to the main chain amide of Ile100. In addition, the base of U31 is in van der Waals contact with the hydrophobic regions of Ile65, Ile100, and Lys86. U31 also forms a hydrogen bond with a residue outside the pocket, from its phosphate to the main chain oxygen of Ala39[10].
The G-A base pairs
Of the G-A base pairs G32-A44 and G43-A33, bases G32 and G43 interact with residues Asn40, Glu41, and Lys44, found in α2 and loop β1-α2, through their exposed atoms in the major groove. Atoms within hydrogen bonding distance include N1, N2, and O6 of G32 to the carboxylate group of Glu41, N7 and O6 of G43 to the ε-amino group of Lys44, O6 of G32 to the main chain amide of Asn40, and N7 of G32 to the ND2 of Asn40. The adenines (A33 and A44) that base pair with the guanines (G32 and G43) do not themselves form close interactions with the protein[10].
The unpaired and stacked adenines
The adenines that do not form base pair interactions (A29 and A30) participate in hydrophobic interactions with 15.5kD. These take place through the sides of the bases that are not in contact with the base pairs of stem 1 (A29) and stem 2 (A30). A29 packs against Arg97 in the α4-β4 loop, and A30 packs against Val95 in loop α4-β4 and Lys37 in loop β1-α2[10].
The electrostatic interactions
The RNA carries a negative charge due to its phosphate backbone. This charge is stabilized by several basic residues in the 15.5kD protein. The negative charges of the phosphates of C42 and A29 are stabilized by residues Lys44 and Arg97, respectively. Residues Arg36, Arg48, and Lys37 are in close proximity to the RNA backbone (7-8 Å) and contribute to the electrostatic state of the RNA-protein complex[10].
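The four classes of protein-RNA interaction described above can be collected into a simple contact table. The sketch below uses only contacts named in this article; the data layout, the interaction labels, and the helper function names are illustrative choices, not part of any published analysis.

```python
# (nucleotide, residue, interaction type), as described in the text.
contacts = [
    ("U31", "Glu61", "hydrogen bond"),
    ("U31", "Lys86", "hydrogen bond"),
    ("U31", "Ile100", "hydrogen bond"),
    ("U31", "Ala39", "hydrogen bond"),
    ("U31", "Ile65", "van der Waals"),
    ("U31", "Ile100", "van der Waals"),
    ("U31", "Lys86", "van der Waals"),
    ("G32", "Glu41", "hydrogen bond"),
    ("G32", "Asn40", "hydrogen bond"),
    ("G43", "Lys44", "hydrogen bond"),
    ("A29", "Arg97", "hydrophobic"),
    ("A30", "Val95", "hydrophobic"),
    ("A30", "Lys37", "hydrophobic"),
    ("C42", "Lys44", "electrostatic"),
    ("A29", "Arg97", "electrostatic"),
]


def residues_for(nucleotide):
    """Return the set of residues that contact the given nucleotide."""
    return {res for nt, res, _ in contacts if nt == nucleotide}


def nucleotides_for(residue):
    """Return the set of nucleotides contacted by the given residue."""
    return {nt for nt, res, _ in contacts if res == residue}


print(sorted(residues_for("U31")))
print(sorted(nucleotides_for("Lys44")))
```

Querying the table by nucleotide shows that U31 accounts for the largest share of the named contacts, which is consistent with its role as the flipped out nucleotide bound in the protein pocket.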
Structure comparison between 15.5kD, Snu13p, and L7Ae homologues
Structurally, Snu13p and 15.5kD are more similar to each other than either is to L7Ae; however, the eukaryotic and archaeal proteins exhibit different binding affinities for cognate RNAs[5][3]. The eukaryotic proteins exhibit very specific binding (i.e., they bind only RNA containing the kink-turn motif), whereas their archaeal homologue does not (i.e., it binds RNA exhibiting either the kink-turn or the kink-loop motif)[5][3]. Differences in structure between the archaeal and eukaryotic proteins lie in the α2-β2 loop and the β4-α6 loop, which do not directly bind RNA but have been shown to contribute to the structural integrity of the RNA binding elements[3]. In the eukaryotic proteins there is an addition of two amino acids that creates further hydrogen bonding between β2 and β4, which in turn may provide further stabilization to α2[3]. The N-terminus is also an area of structural difference: preceding α1 in L7Ae is a random coil, whereas in 15.5kD and Snu13p there is a β-strand, which may participate in further stabilization of the protein[3]. Overall, the structures are very similar, such that the small differences do not seem likely to account for the differential binding.
The structure itself may not be the most important aspect when comparing the homologues; the amino acid composition may matter more. There are five amino acids located at the RNA binding region that are conserved within each of archaea and eukarya but vary between the two. One such amino acid lies towards the N-terminal side of the RNA binding region; in L7Ae it is Lys26 (Methanocaldococcus jannaschii), and in 15.5kD it is Gln34. Towards the C-terminal side of the RNA binding region, located in loop 9, lie the four remaining residues Leu-Glu-Aal-Ala (L7Ae) and (15.5kD). It is the difference between these amino acids that allows L7Ae to bind the kink-loop motif[5].