Cas9 is a large multifunctional protein that plays a central role in the CRISPR-Cas adaptive defense mechanism found in many bacteria and archaea [1]. It accomplishes this through the use of antisense RNAs that serve as signatures of past viral invasions [2]. Adaptive immunity occurs in three stages: insertion of invading DNA into the CRISPR locus; transcription of precursor crRNA from the CRISPR locus, which is processed into crRNA that matches its target sequence over 20 nucleotides; and crRNA-directed cleavage of foreign nucleic acids by Cas9. A PAM (protospacer adjacent motif) sequence must be present adjacent to the crRNA-targeted sequence for it to be cleaved [1]. In addition to the crRNA, Cas9 incorporates a second RNA chain that serves to anchor the crRNA to the protein. This tracrRNA is partially complementary to a portion of the crRNA and interacts with an arginine-rich alpha helix to anchor both RNAs to Cas9 [3]. In the last few years, this defensive mechanism and the Cas9 protein have been used to develop genome engineering applications. The tracrRNA:crRNA pair has been replaced by an engineered single guide RNA (sgRNA) that maintains the two main features of the RNA: the complementary 20-nucleotide sequence at the 5' end and the double-stranded anchor at the 3' end that binds Cas9 [1]. The programmable Cas9 protein is then used to create double-stranded breaks in genomic DNA, at which points the genetic sequence can be altered.
Overall Structure
Cascade (CRISPR-associated complex for antiviral defense) is the surveillance complex of the type I CRISPR-Cas system, and its crystal structure gives insight into the overall structure of crRNA-guided complexes such as Cas9. Cascade has a seahorse-shaped structure composed of 11 Cas subunits. The body is composed of six subunits (Cas7.1-7.6) wrapped around the crRNA in a helical filament, with a dimer of Cse2 in the center [4]. The head of the Cas7 body is capped by Cas6e and the 3' end of the crRNA, while the 5' end and Cas5 cap the tail. The N-terminal end of Cse1 is also at the tail, and its C-terminal end contains a bundle of four helices that contacts Cse2.2. The Cse2 dimer, the Cas7 filament, and the four-helix bundle of Cse1 form a groove immediately next to the guide region of the crRNA. This is where the ssDNA target fits into the complex [4].
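As a quick consistency check on the subunit count given above, the composition can be tallied in a few lines. This is only a sketch: the dictionary name and helper function are illustrative, and the single copies of Cse1, Cas5, and Cas6e are an assumption read from the way the text refers to each of them in the singular.

```python
# Copy number of each Cascade subunit as described in the text.
# Single copies of Cse1, Cas5 and Cas6e are an assumption based on the
# singular wording; the Cse2 dimer and six Cas7 copies are stated explicitly.
CASCADE_SUBUNITS = {
    "Cse1": 1,
    "Cse2": 2,   # described as a dimer
    "Cas7": 6,   # Cas7.1 through Cas7.6
    "Cas5": 1,
    "Cas6e": 1,
}


def total_subunits(composition):
    """Sum the copy numbers of all subunits in a complex."""
    return sum(composition.values())


if __name__ == "__main__":
    print(total_subunits(CASCADE_SUBUNITS))
```

Under these assumptions the tally agrees with the 11 subunits quoted for the complex.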
The groove in which the ssDNA target fits is not formed until Cas9 undergoes a conformational change upon association with a target dsDNA. The arginine-rich alpha helix to which the tracrRNA binds serves as a hinge between the structural lobes of the overall structure. The conformational change is thought to take part in the R-loop formation that unwinds the target dsDNA and allows the crRNA to interact with its complementary section [1].
DNA Interactions with PAM
The PAM sequence has been shown to be critical for inducing DNA binding, as Cas9 is unable to recognize even fully complementary sequences without it [1]. Upon formation of the substrate-protein complex, the nuclease and helical recognition lobes of Cas9 and the target ssDNA form a four-way junction straddling the arginine-rich alpha helix mentioned previously [5]. Nucleotides on either side of the PAM-containing region (-1 to -8 on the target strand and +1 to +8 on the non-target strand) are base paired, and strand separation begins at the first base pair on the target strand (+1). The kink formed by the strand separation places the PAM sequence in a positively charged groove known as the PAM-interacting domain [5].
The 5'-NGG-3' PAM, with the trinucleotide on the non-target strand, is crucial for loading target DNA into Cas9, as the guanine bases participate in base-specific hydrogen bonding with two arginine residues [5]. These residues are extended into the major groove by a beta-hairpin in the C-terminal domain of Cas9. Both arginines have been mutated to alanine to test the necessity of the hydrogen bonds; substitution of either one substantially reduces target DNA binding in vitro [5]. The same NGG trinucleotide is not required in the complementary target strand, because the target-strand nucleotides complementary to the PAM sequence are not recognized by major groove interactions. This explains why some mismatches are tolerated in the PAM sequence, as long as the guanine bases are present in the non-target strand. The Cas9 sequence motif that contains the required arginine residues has been found in various species with type II-A Cas9 [5].
The minor groove of the target DNA at the PAM sequence interacts with Ser 1136 through a water-mediated hydrogen bond [5]. This interaction helps orient the target DNA so that it can base pair with the sgRNA/crRNA. The +1 phosphate in the target DNA strand forms hydrogen bonds with Glu 1108 and Ser 1109 through its oxygen atoms. This helps form the phosphate lock loop, allowing the guide RNA to begin base pairing with the target DNA [5].
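The sequence requirement described in this section, a 20-nucleotide targeted sequence with a 5'-NGG-3' PAM immediately 3' of it on the non-target strand, can be illustrated with a short scan over a DNA string. This is a minimal sketch, not a guide-design tool: the function name and return format are hypothetical, only one strand is searched, and the structural determinants of binding discussed above are not modeled.

```python
import re

GUIDE_LENGTH = 20  # length of the crRNA/sgRNA guide region given in the text


def find_ngg_sites(non_target_strand, guide_length=GUIDE_LENGTH):
    """Return (start, protospacer, pam) for each candidate site on one strand.

    The input is read 5'->3' as the non-target strand, so the protospacer is
    the `guide_length` bases immediately 5' of an NGG trinucleotide.
    """
    seq = non_target_strand.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_length)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    example = "C" * 20 + "AGGG"  # made-up sequence, not a real genomic target
    for start, protospacer, pam in find_ngg_sites(example):
        print(start, protospacer, pam)
```

Searching the opposite strand would require running the same scan on the reverse complement of the input, which is omitted here for brevity.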
DNA Interactions with HNH and RuvC Nuclease Domains
The HNH nuclease domain is responsible for cleaving the DNA strand complementary to the RNA guide. The domain is composed of a beta-beta-alpha metal fold made up of three secondary structures (forming a supersecondary structure) and a magnesium ion [6]. The important residues include Asn 863, Asp 839, His 840, and Asn 854. The 3'-5' phosphodiester bond is cleaved by a water molecule activated by the histidine residue. The oxygen of the water performs a nucleophilic attack on the phosphate, which is made more electrophilic by its coordination to the magnesium ion. The other three active site residues also coordinate the magnesium ion through their side chains [6].
The RuvC nuclease domain is responsible for cleaving the DNA strand not complementary to the RNA guide. This nuclease contains a , which is unsurprising since it is responsible for cleaving single-stranded DNA [6]. The active site residues include His 983, Asp 10, Asp 986, and Glu 762. Mutation of any of these residues results in loss of catalytic function [6]. The mechanism of DNA cleavage is similar to that of the HNH domain, with the His residue activating a water molecule for nucleophilic attack and the side chains of the other three residues coordinating the magnesium ion [6].
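The division of labor between the two nuclease domains can be summarized in a small lookup that reports which strand would still be cut after a given set of active-site mutations. This is an illustrative sketch only: the class and function names are hypothetical, the residue labels are simply those listed above, and the rule that mutating any listed HNH residue abolishes HNH activity is an assumption carried over from what the text states for RuvC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NucleaseDomain:
    name: str
    strand: str          # DNA strand cleaved by this domain
    residues: frozenset  # active-site residues listed in the text


HNH = NucleaseDomain(
    "HNH", "target", frozenset({"Asp839", "His840", "Asn854", "Asn863"})
)
RUVC = NucleaseDomain(
    "RuvC", "non-target", frozenset({"Asp10", "Glu762", "His983", "Asp986"})
)


def cleaved_strands(mutated_residues):
    """List the strands still cleaved when the given residues are mutated.

    Assumption: mutating any listed residue inactivates its domain. The text
    states this for RuvC; applying it to HNH is a simplification.
    """
    mutated = set(mutated_residues)
    return [d.strand for d in (HNH, RUVC) if not (d.residues & mutated)]


if __name__ == "__main__":
    print(cleaved_strands([]))                    # both domains intact
    print(cleaved_strands(["His840"]))            # HNH residue mutated
    print(cleaved_strands(["Asp10", "His840"]))   # one residue in each domain
```

With both domains intact the sketch reports both strands, which corresponds to the double-stranded break described in the introduction.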