SEE CRISPR-Cas
The prototype type V effector Cpf1 (subtype V-A) contains only one nuclease domain (RuvC-like) that is identifiable by sequence analysis. However, analysis of the recently solved structure of Cpf1 (from Acidaminococcus sp. BV3L6, 5b43) has revealed a second nuclease domain, the fold of which is unrelated to HNH or any other known nucleases. In analogy to the HNH domain in Cas9, the second nuclease domain is inserted into the RuvC domain, and it is responsible for cleavage of the target strand.[1][2]
Screening of microbial genomes and metagenomes for undiscovered class 2 systems has resulted in the identification of three novel CRISPR-Cas variants. These include subtypes V-B and V-C, which resemble Cpf1 in that their predicted effector proteins contain a single, RuvC-like nuclease domain. Cleavage of target DNA by the type V-B effector, denoted C2c1, has been experimentally demonstrated.[3]
Subtype V-A (Cpf1)
Crystal Structure of Cpf1 in Complex with Guide RNA and Target DNA[4]
Cpf1 is an RNA-guided endonuclease of a type V CRISPR-Cas system that has been recently harnessed for genome editing. The crystal structure of Acidaminococcus sp. Cpf1 (AsCpf1) in complex with the guide RNA and its target DNA was reported at 2.8 Å resolution. AsCpf1 adopts a bilobed architecture, with the RNA-DNA heteroduplex bound inside the central channel. The structural comparison of AsCpf1 with Cas9, a type II CRISPR-Cas nuclease, reveals both striking similarity and major differences, thereby explaining their distinct functionalities. AsCpf1 contains the RuvC domain and a putative novel nuclease domain, which are responsible for cleaving the non-target and target strands, respectively, and for jointly generating staggered DNA double-strand breaks. AsCpf1 recognizes the 5'-TTTN-3' protospacer adjacent motif (PAM) by base and shape readout mechanisms. These findings provide mechanistic insights into RNA-guided DNA cleavage by Cpf1 and establish a framework for rational engineering of the CRISPR-Cpf1 toolbox.
The microbial adaptive immune system CRISPR-Cas helps bacteria and archaea defend themselves against the invasion of foreign nucleic acids. The CRISPR-Cas systems encompass arrays of direct repeats that are separated by unique spacers derived from foreign DNA. The repeat arrays are transcribed into long transcripts (precursors of CRISPR RNAs), which are then processed to yield small CRISPR RNAs (crRNAs), each consisting of a spacer and a portion of the adjacent direct repeat. The crRNAs form a complex with Cas endonucleases, and in some cases with accessory Cas proteins as well, and serve as guides to target and cleave the cognate foreign nucleic acid, thus achieving interference. DNA recognition by Cas-crRNA complexes requires the presence of a protospacer adjacent motif (PAM) near the target site, which contributes to self versus non-self discrimination.
Recently, a second class 2 (type V) effector protein, Cpf1, has been harnessed for genome editing. Similar to Cas9, Cpf1 can be reprogrammed to target DNA sites of interest through complementarity to a guide RNA. However, Cpf1 possesses several unique features that distinguish it from Cas9 and could provide for a substantial expansion of the genome editing toolbox. First, Cpf1 is guided by a single crRNA, whereas Cas9 uses a crRNA and a second small RNA species, a trans-activating crRNA (tracrRNA). Second, Cpf1 recognizes a T-rich PAM, in contrast to the G-rich PAM favored by Cas9. Third, Cpf1 generates staggered ends in its PAM-distal target site, whereas Cas9 creates blunt ends within the PAM-proximal target site. Fourth, Cpf1 contains the RuvC domain but lacks a detectable second endonuclease domain, whereas Cas9 uses the HNH and RuvC endonuclease domains to cleave the target and non-target DNA strands, respectively. Together, these observations imply major differences in the target DNA recognition and cleavage mechanisms between Cas9 and Cpf1.
To clarify how Cpf1 recognizes and cleaves DNA targets, the crystal structure of Acidaminococcus sp. Cpf1 (AsCpf1) in complex with the crRNA and its double-stranded DNA target containing the 5'-TTTN-3' PAM was determined. AsCpf1 adopts a bilobed architecture that accommodates the crRNA-target DNA heteroduplex in the central channel. AsCpf1 recognizes the crRNA scaffold and the 5'-TTTN-3' PAM in structure- and sequence-dependent manners. AsCpf1 contains a RuvC endonuclease domain and a putative novel nuclease domain, which are located at positions suitable to induce staggered DNA double-strand breaks. The structural comparison of AsCpf1 with Cas9 reveals both striking structural similarity and substantial differences between the two class 2 effector proteins, thus explaining their distinct functionalities and suggesting their functional convergence.
Overall Structure of the AsCpf1-crRNA-Target DNA Complex
The overall structure of the AsCpf1-crRNA-target DNA complex (from Acidaminococcus sp. BV3L6, 5b43) revealed that AsCpf1 adopts a bilobed architecture consisting of an α-helical recognition (REC) lobe and a nuclease (NUC) lobe, with the crRNA-target DNA heteroduplex bound to the central channel. The , whereas the . The , and the . Domains A and B play functional roles similar to those of the WED (Wedge) and PI (PAM-interacting) domains of Cas9 (see CRISPR-Cas9), respectively, although the two domains of AsCpf1 are structurally unrelated to the WED and PI domains of Cas9. Domain C is involved in DNA cleavage (described below). Thus, domains A, B, and C are referred to as the WED, PI, and Nuc domains, respectively. The in the Cpf1 sequence. The comprises seven α-helices and a β-hairpin. The that form the endonuclease active center. A characteristic helix (referred to as the bridge helix) is located between the RuvC-I and RuvC-II motifs and connects the REC and NUC lobes. The Nuc domain is inserted between the RuvC-II and RuvC-III motifs.
Structure of the crRNA and Target DNA
The crRNA consists of the 24-nt guide segment (G1–C24) and the 19-nt scaffold (A(−19)–U(−1)) (referred to as the 5' handle). The nucleotides G1–C20 in the crRNA and dC1–dG20 in the target DNA strand form the 20-bp RNA-DNA heteroduplex. The nucleotide A21 in the crRNA is flipped out and adopts a single-stranded conformation. No electron density was observed for the nucleotides A22–C24 in the crRNA and dT21–dG24 in the target DNA strand, suggesting that these regions are flexible and disordered in the crystal structure. The nucleotides dG(−10)–dT(−1) in the target DNA strand and dC(−10*)–dA(−1*) in the non-target DNA strand form a duplex structure (referred to as the PAM duplex).
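The guide numbering described here can be made concrete with a short sketch. The Python snippet below is illustrative only: it splits a 24-nt guide into the 20 nucleotides that pair with the target strand, the flipped-out nucleotide 21, and the disordered nucleotides 22–24, and derives the complementary target-strand DNA. The function name and the example guide sequence are hypothetical and are not taken from the structure.

```python
# Illustrative sketch; the lengths follow the text (24-nt guide, 20-bp heteroduplex).
GUIDE_LEN = 24
HETERODUPLEX_LEN = 20

# RNA base -> complementary DNA base (Watson-Crick pairing)
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def split_guide(guide_rna: str) -> dict:
    """Partition a 24-nt guide into paired, flipped-out, and disordered parts."""
    guide_rna = guide_rna.upper()
    if len(guide_rna) != GUIDE_LEN:
        raise ValueError(f"expected a {GUIDE_LEN}-nt guide, got {len(guide_rna)}")
    if set(guide_rna) - set(RNA_TO_DNA_COMPLEMENT):
        raise ValueError("guide must contain only A, C, G, U")

    paired = guide_rna[:HETERODUPLEX_LEN]
    # Target-strand bases listed by position: DNA position i pairs with
    # guide position i (e.g. G1 pairs with dC1).
    target_by_position = "".join(RNA_TO_DNA_COMPLEMENT[b] for b in paired)
    return {
        "paired_rna": paired,                             # positions 1-20
        "flipped_out": guide_rna[HETERODUPLEX_LEN],       # position 21
        "disordered": guide_rna[HETERODUPLEX_LEN + 1:],   # positions 22-24
        "target_dna_by_position": target_by_position,
        # The strands are antiparallel, so the 5'->3' reading is reversed.
        "target_dna_5to3": target_by_position[::-1],
    }


# Hypothetical guide chosen so that G1, C20, A21 and C24 match the text.
parts = split_guide("GAUCGUUACGCUAACUAUGCAAGC")
```

With this hypothetical guide, `parts["flipped_out"]` is `"A"` and the first and last target-strand bases by position are `"C"` and `"G"`, matching the G1:dC1 and C20:dG20 pairs named in the text.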
Recognition of the 5' Handle of the crRNA
The . The U(−1)·U(−16) base pair in the 5' handle is recognized by the WED domain in a base-specific manner. The , respectively.
Recognition of the crRNA-Target DNA Heteroduplex
The crRNA-target DNA heteroduplex is accommodated within the positively charged, central channel formed by the and is recognized by the protein in a sequence-independent manner. The PAM-distal and PAM-proximal regions of the heteroduplex are recognized by the , respectively. domain. in the bridge helix, which interact with the sugar-phosphate backbone of the target DNA strand, are conserved among the Cpf1 family members. Notably, the sugar-phosphate backbone of the nucleotides G1–A8 in the crRNA forms multiple contacts with the and domains, and the base pairing within the 5-bp is important for Cpf1-mediated DNA cleavage. The side chain of forms a stacking interaction with the C20:dG20 base pair in the heteroduplex and thus prevents base pairing between A21 and dT21.
Recognition of the 5'-TTTN-3' PAM
The PAM duplex adopts a distorted conformation with a narrow minor groove, as often observed in AT-rich DNA, and is bound to the . The PAM duplex is recognized by the , respectively. Lys607 in the PI domain is inserted into the narrow minor groove and plays a critical role in PAM recognition. The O2 of dT(−2*) forms a hydrogen bond with the side chain of Lys607, whereas the nucleobase and deoxyribose moieties of dA(−2) form van der Waals interactions with the , respectively. Structural observations can explain the requirement of the third T in the 5'-TTTN-3' PAM. The 5-methyl group of dT(−3*) forms a van der Waals interaction with the , respectively. The 5-methyl group of dT(−4*) is surrounded by the side-chain methyl groups of . Notably, the N3 and O4 of dT(−4*) form hydrogen bonds with the N1 of dA(−4) and the N6 of dA(−3), respectively. Together, these results demonstrate that AsCpf1 recognizes the 5'-TTTN-3' PAM via a combination of base and shape readout mechanisms. Thr167 and Lys607 are conserved throughout the Cpf1 family, and Lys548, Pro599, and Met604 are partially conserved. These observations indicate that the Cpf1 homologs from diverse bacteria recognize their T-rich PAMs in similar manners, although the fine details of the interaction could vary.
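As a rough illustration of what the 5'-TTTN-3' requirement means at the sequence level, the following Python sketch scans a DNA sequence for candidate Cpf1 sites. It is a minimal sketch under stated assumptions, not a validated guide-design tool: it assumes that the PAM lies on the non-target strand immediately 5' of the protospacer, it uses a protospacer length of 20 nt (the length of the heteroduplex in the structure), and it ignores every other determinant of cleavage. The function names are hypothetical.

```python
import re

# Cpf1 PAM as written on the non-target strand: three T's followed by any base.
PAM_PATTERN = "TTT[ACGT]"
# Assumption: use the 20-bp heteroduplex length as the protospacer length.
PROTOSPACER_LEN = 20

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cpf1_sites(seq: str, protospacer_len: int = PROTOSPACER_LEN) -> list:
    """Return (strand, pam_start, pam, protospacer) tuples for both strands.

    pam_start is a 0-based index on the strand that was scanned; for the
    "-" strand this is an index into the reverse complement of seq.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping PAMs (e.g. in TTTTT) are all found.
    pattern = re.compile(rf"(?=({PAM_PATTERN})([ACGT]{{{protospacer_len}}}))")
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


# Hypothetical example: one TTTA PAM followed by a 20-nt protospacer.
example = "GG" + "TTTA" + "ACGTACGTACGTACGTACGT" + "CC"
sites = find_cpf1_sites(example)
```

For the hypothetical `example` sequence, the only hit is on the "+" strand at index 2, with PAM `TTTA`. A real design workflow would add further filters, which are outside the scope of this sketch.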
The RuvC-like Endonuclease and a Putative Second Nuclease Domain
The RuvC domain comprises a typical RNase H fold, consisting of a five-stranded mixed β-sheet (β1–β5) flanked by three α-helices (α1–α3), and two additional α-helices and three β-strands. The conserved, negatively charged residues . Notably, the bridge helix is inserted between strand β3 and helix α1 in the RNase H fold and interacts with the REC2 domain. The main-chain carbonyl group of . In addition, Trp958 in the RuvC domain is accommodated in the hydrophobic pocket formed by . These residues, with the exceptions of Leu467 and Ala521, are highly conserved among the Cpf1 family members, and the W958A mutant exhibited reduced activity. These observations highlight the functional importance of the bridge helix-mediated interaction between the REC and NUC lobes. The crystal structure revealed the presence of the Nuc domain, which is inserted between the RuvC-II (strand β5) and RuvC-III (helix α3) motifs in the RuvC domain. The . The Nuc domain comprises five α-helices and nine β-strands and lacks detectable structural or sequence similarity to any known nucleases or proteins. Notably, the conserved polar residues . The S1228A mutant showed DNA cleavage activity comparable to that of wild-type AsCpf1. In contrast, the D1235A mutant exhibited reduced activity, whereas the R1226A mutant showed almost no activity, indicating that Arg1226 is critical for DNA cleavage. Further characterization revealed that the R1226A mutant acts as a nickase that cleaves the non-target DNA strand, but not the target strand, indicating that the Nuc and RuvC domains cleave the target and non-target DNA strands, respectively. The mutations of the catalytic residues in the AsCpf1 RuvC domain abolished the cleavage of both DNA strands, suggesting that the cleavage of the non-target strand by the RuvC domain is a prerequisite for the target strand cleavage by the Nuc domain, presumably via a conformational change in the complex.
Other representatives of Cpf1 complex (Subtype V-A)
- Cpf1 complex from Acidaminococcus sp. BV3L6 (5kk5)[5].
- Cpf1 complex from Lachnospiraceae bacterium ND2006 (5id6)[6].
Subtype V-B (C2c1)
Structural basis of stringent PAM recognition by CRISPR-C2c1 in complex with sgRNA[7]
The class 2 CRISPR effector protein C2c1 (classified as type V-B) cleaves DNA under the guidance of a crRNA:tracrRNA pair, in contrast to the type V-A effector protein Cpf1 (see above), which requires only a single crRNA. Furthermore, C2c1 and Cpf1 recognize different PAM sequences. Like Cpf1, C2c1 contains a conserved RuvC endonuclease domain, though it harbors a second endonuclease domain that is not well defined by sequence. C2c1 has been shown to be endonuclease-active in human cell lysates. The mechanism underlying C2c1-mediated cleavage remains elusive.
The overall structure of the BthC2c1 complex (5wti, from Bacillus thermoamylovorans) is a bilobed architecture composed of an α-helical recognition (REC) lobe and a nuclease (NUC) lobe. The REC lobe contains a PAM-interacting (PI) domain, a REC1 domain, a REC2 domain, and a long α helix referred to as the bridge helix (BH). The NUC lobe contains an OBD domain, a RuvC domain, and a domain with unknown functions (termed “UK” domain).
The sgRNA consists of a guide segment (C1-U19), a (C(−18)-A(−24), and U(−57)-G(−61)). The guide segment and 19 nucleotides of the target DNA strand (dG(1′)-dA(19′)) form the sgRNA:target DNA heteroduplex, whereas the 9 nucleotides of the target DNA strand (dG(−1′)-dA(−9′)) and the non-target DNA strand (dC(−1*)-dT(−9*)) form a PAM duplex.
The in the NUC lobe, composed by , interfaces with the in the REC lobe to form a . The other side of the heteroduplex is recognized by the REC2 domain. The , whereas the . The negatively charged sgRNA:target DNA heteroduplex is accommodated in the . Recognition of the sgRNA:target DNA heteroduplex by BthC2c1 is mainly through interactions between sugar-phosphate backbone and the protein. The , whereas the sugar-phosphate backbone of the target DNA sequence (dT(13′)-dA(19′)) complementary to that of PAM-distal guide segment is extensively recognized by the . The repeat:anti-repeat duplex containing an anticipated base-pairing segment (U(−6):G(−25)-G(−13):C(−18)) and an unanticipated base-pairing segment (C(−1):G(−61)-A(−5):U(−57)), is recognized by domains. The 5′-ATTC-3′ . The OBD domain consists of a β-sheet barrel flanked by four short α-helices, whereas the PI domain is composed of a bundle of four α-helices connected by linkers and loop PL1 (Ser129-Arg143). The loop PL1 deeply inserts into the minor groove of PAM duplex and interacts with the target and non-target DNA strands. from the loop PL1 hydrogen-bonds with the sugar-phosphate backbone. The sugar-phosphate backbone of PAM is recognized by via hydrogen-bonding interactions.
To map the DNA cleavage site of BthC2c1, Sanger sequencing was performed to analyze the DNA ends of the cleaved products of in vitro cleavage reactions. The BthC2c1-cleaved DNA products had a 7-nt 5′ overhang, differing from the blunt DNA cleavage mode of Cas9. This staggered double-stranded cleavage occurred after the 16th nucleotide on the non-target strand and after the 23rd nucleotide on the target strand distal to the PAM sequence. The BthC2c1 cleavage site on the target strand is located outside the guide:target heteroduplex segment. This is distinct from Cas9 and Cpf1 (see above), both of which cleave the target strand within the guide:target heteroduplex segment. Interestingly, the target strand cleavage mode of BthC2c1 resembles that of C2c2 (CRISPR type VI), although C2c2 digests crRNA-guided RNA substrates.
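The relation between the two cut positions and the reported overhang is simple arithmetic, sketched below in Python. This is an illustrative helper with a hypothetical name, not part of the original analysis. It assumes that both cut positions are counted in nucleotides from the PAM-proximal end of the protospacer, that each cut lies after the stated nucleotide, and that the PAM sits 5' of the protospacer on the non-target strand.

```python
from typing import Tuple


def overhang(nontarget_cut: int, target_cut: int) -> Tuple[int, str]:
    """Describe the DNA ends produced by one cut on each strand.

    Positions are counted from the PAM-proximal end of the protospacer,
    and a cut lies after the given nucleotide. Assumes the PAM is 5' of
    the protospacer on the non-target strand.
    """
    if nontarget_cut < 0 or target_cut < 0:
        raise ValueError("cut positions must be non-negative")

    length = abs(target_cut - nontarget_cut)
    if length == 0:
        return 0, "blunt"
    # If the target strand is cut farther from the PAM than the non-target
    # strand, the protruding single strands end in 5' termini.
    kind = "5' overhang" if target_cut > nontarget_cut else "3' overhang"
    return length, kind


# Cut positions reported for BthC2c1 in the text.
bth_c2c1_ends = overhang(nontarget_cut=16, target_cut=23)
```

With the positions given in the text, `overhang(16, 23)` returns `(7, "5' overhang")`, consistent with the 7-nt 5′ overhang observed by sequencing. Two cuts at the same position give `(0, "blunt")`.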
Other representatives of C2c1 complex (Subtype V-B)
- (5wqe)[8].
- (5wqe). The crRNA segment is shown in red and the tracrRNA segment in green.