From Proteopedia
Theoretical Model:
The protein structure described on this page was determined theoretically, and hence should be interpreted with caution.
Overview
Introduction
The DnaC protein in E. coli is part of the DNA polymerase complex of proteins, which includes the replisome and primosome, and which replicates genomic DNA. DnaC is believed to be involved in loading the DnaB helicase, which separates the strands of the DNA so that each can be primed by the DnaG primase and then copied by DNA polymerase.
3D Structure: Homology Model
No empirical (X-ray crystallographic) 3D structure for the E. coli DnaC protein (UniProt P0AEF0) was available as of November 2012, although one or more might become available. In view of this, homology models were constructed using the automated Swiss-Model server[1][2]. In 2008 (when this article was largely written and the molecular scenes were prepared), Swiss-Model deemed the only usable template[3] for the homology model to be the crystal structure of a "putative primosome component" from Streptococcus pyogenes (2qgz), determined by the Northeast Structural Genomics Consortium and listed as "to be published". In 2012, after some changes to the Swiss-Model server, it chose a different template, producing a very similar homology model. This second template was a crystal structure of the DnaC helicase loader of Aquifex aeolicus (3ecc)[3]. The two templates have only 27% sequence identity with each other, so the agreement between the models built upon them gives confidence that the fold and topology of the models are likely to be correct. Furthermore, the two homology models had identical registrations of sequence with structure (data not shown). Nevertheless, because the sequence identity between the templates and the target E. coli DnaC is only ~20%, there may be some error in the registration of the E. coli DnaC sequence with the model structure. Further, the positions of sidechains in homology models are generally unreliable.
We thank the authors of 2qgz for releasing their structure data at the Protein Data Bank prior to full publication.
The molecular scenes below in this article utilize the 2008 model templated upon 2qgz. As mentioned above, this model is very similar to the 2012 model templated upon 3ecc.
Intrinsically Unstructured N-Terminus
The FoldIndex server predicts that the amino terminus of DnaC (residues 1-76) will be intrinsically unstructured, that is, it will not participate in the stable fold of the remainder of the protein.
Viewing and Download
In addition to the interactive scenes below, the homology models can be downloaded from the Proteopedia server:
Conclusions from Homology Model
The following analysis utilizes the homology model templated on 2qgz. The model templated on 3ecc is very similar, with identical sequence-to-structure registration (not shown). When the two homology models are structurally aligned, 115 alpha carbon atoms can be aligned with RMS deviation of 1.4 Å (not shown).
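As a reminder of what an "RMS deviation" over aligned alpha carbons measures, here is a minimal sketch in Python. It assumes the two structures have already been superimposed and their alpha carbons paired one-to-one; the coordinates are hypothetical and are not taken from either homology model, and the superposition step itself (done by the structural alignment software) is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    points that are already superimposed and paired one-to-one."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates (in Å) for three paired alpha carbons.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

print(f"RMSD = {rmsd(model_a, model_b):.2f} Å")
```

Because every pair in this toy example is displaced by exactly 1 Å, the RMSD is 1 Å; in a real comparison the value summarizes displacements that vary from residue to residue.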
The homology model represents 75% of the full length E. coli DnaC sequence, omitting 54 N-terminal residues and 8 C-terminal residues. Note that the N-terminal 76 residues are predicted to be intrinsically unfolded (see above). Several surface loops in the model (shown translucent white) have high uncertainty, since these are missing in the template (see below). Note that the homology model is somewhat unreliable about which residues are actually on the surface vs. largely buried, and hence the conclusions below are tentative.
Color key: Amino Terminus to Carboxy Terminus
Evolutionary Conservation
Two patches of highly conserved residues are apparent.
- Large conserved patch. The larger of the two conserved patches is adjacent to the positively charged patch, and includes Arg237 and His114, which are highly conserved (ConSurf level 9). The larger patch also includes highly conserved surface residues Asn113, Asp169, Glu170, Asn203. (There are also three highly conserved surface glycines in this patch, not listed because surface glycines are typically conserved for reasons of secondary structure rather than function.) These highly conserved residues (ConSurf level 9) are flanked by several conserved residues (ConSurf level 8), including Ile62, Asn73, and Thr110. Conserved (level 8) residues are uncommon elsewhere on the surface (except for the other conserved patch).
- Results of a 2012 ConSurf analysis[4] (not shown) agreed quite well with the above results from the 2008 analysis[5]. All the residues highly conserved in the 2008 analysis were highly conserved (ConSurf level 9) in the 2012 analysis, except Asn113, which dropped to ConSurf level 8. The three ConSurf level 8 residues in 2008 achieved ConSurf level 9 (maximum conservation) in the 2012 analysis. The 2012 ConSurf coloring script is listed below.
- Small conserved patch. This patch consists of highly conserved (ConSurf level 9) residues Arg216, Asp219, Arg220, flanked by conserved (ConSurf level 8) residue Asp189. Results of the 2012 analysis[4] (not shown) were in near perfect agreement, except that Asp189 achieved level 9.
- The same conserved patches[6] are observed on the 2.7 Å crystal structure of DnaC from Aquifex aeolicus (3ecc), where the larger conserved patch is the binding site for ATP/ADP.
Charge Distribution
The 2008 model displays a patch, about 20 x 30 Å, containing six positively charged amino acids and no negative charges. The patch is near the bottom of this scene. Such a patch would be suitable for interaction with e.g. anionic phosphates in ATP/ADP or DNA. The positive charges are Arg55, Arg59, Arg63, Arg126, Lys128, and Arg237. His66 and His114 are near one end of this patch.
Cationic (+) / Anionic (-)
However, this cationic patch is not seen in the 2012 model, which contains an uncharged patch in the same location ringed by negative charges (not shown). Three of the charged-patch residues in the 2008 model are near the N-terminus (Arg55, Arg59, Arg63), but the 2012 model omits these residues, starting at Pro64. Furthermore, the sidechains of Arg126, Lys128 and Arg237 point away from this region in the 2012 model.
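The residue numbering used for the charged patch can be checked directly against the DnaC sequence given under Homology Model Construction. The sketch below, in Python, looks up each patch position and reports which ones precede Pro64 and so fall outside the 2012 model. The variable and function names are illustrative only.

```python
# E. coli DnaC (UniProt P0AEF0), copied from the sequence listing in this article.
DNAC = (
    "MKNVGDLMQR LQKMMPAHIK PAFKTGEELL AWQKEQGAIR SAALERENRA "
    "MKMQRTFNRS GIRPLHQNCS FENYRVECEG QMNALSKARQ YVEEFDGNIA "
    "SFIFSGKPGT GKNHLAAAIC NELLLRGKSV LIITVADIMS AMKDTFRNSG "
    "TSEEQLLNDL SNVDLLVIDE IGVQTESKYE KVIINQIVDR RSSSKRPTGM "
    "LTNSNMEEMT KLLGERVMDR MRLGNSLWVI FNWDSYRSRV TGKEY"
).replace(" ", "")

THREE_LETTER = {"R": "Arg", "K": "Lys", "H": "His", "P": "Pro"}
PATCH_POSITIONS = [55, 59, 63, 126, 128, 237]  # cationic patch, 2008 model
MODEL_2012_START = 64                          # the 2012 model starts at Pro64


def residue(position):
    """One-letter code at a 1-based sequence position."""
    return DNAC[position - 1]


patch_labels = [f"{THREE_LETTER[residue(p)]}{p}" for p in PATCH_POSITIONS]
absent_from_2012 = [p for p in PATCH_POSITIONS if p < MODEL_2012_START]

print("Patch residues:", ", ".join(patch_labels))
print("Positions before the 2012 model start:", absent_from_2012)
```

This only confirms sequence positions; whether a sidechain points toward or away from the patch has to be judged from the 3D models themselves.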
Homology Model Construction
Steve Sandler kindly provided the following sequence for DnaC from E. coli (Uniprot P0AEF0, DNAC_ECOLI):
MKNVGDLMQR LQKMMPAHIK PAFKTGEELL AWQKEQGAIR SAALERENRA
MKMQRTFNRS GIRPLHQNCS FENYRVECEG QMNALSKARQ YVEEFDGNIA
SFIFSGKPGT GKNHLAAAIC NELLLRGKSV LIITVADIMS AMKDTFRNSG
TSEEQLLNDL SNVDLLVIDE IGVQTESKYE KVIINQIVDR RSSSKRPTGM
LTNSNMEEMT KLLGERVMDR MRLGNSLWVI FNWDSYRSRV TGKEY
This sequence (245 amino acids) was submitted to Swiss Model in 2008, which generated the homology model shown here using 2qgz chain A as a template, which has 18.6% sequence identity. Apparently Swiss Model used predicted secondary structure to help in the sequence alignment, but details are not clear to me[7]. The homology model represents residues 55-237 (183 residues representing 75% of DnaC), shown in boldface in the above sequence. Because of the low sequence identity, this model may well contain significant errors, especially in registration[8].
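The coverage figures quoted above follow from simple arithmetic on the sequence. A short Python sketch, using the sequence listing from this article and illustrative variable names:

```python
# E. coli DnaC (UniProt P0AEF0), copied from the sequence listing above.
DNAC = (
    "MKNVGDLMQR LQKMMPAHIK PAFKTGEELL AWQKEQGAIR SAALERENRA "
    "MKMQRTFNRS GIRPLHQNCS FENYRVECEG QMNALSKARQ YVEEFDGNIA "
    "SFIFSGKPGT GKNHLAAAIC NELLLRGKSV LIITVADIMS AMKDTFRNSG "
    "TSEEQLLNDL SNVDLLVIDE IGVQTESKYE KVIINQIVDR RSSSKRPTGM "
    "LTNSNMEEMT KLLGERVMDR MRLGNSLWVI FNWDSYRSRV TGKEY"
).replace(" ", "")

MODEL_FIRST, MODEL_LAST = 55, 237  # residue range covered by the 2008 model

modeled = DNAC[MODEL_FIRST - 1:MODEL_LAST]   # 1-based, inclusive range
n_term_omitted = MODEL_FIRST - 1
c_term_omitted = len(DNAC) - MODEL_LAST
coverage_pct = 100.0 * len(modeled) / len(DNAC)

print(f"Full length: {len(DNAC)} residues")
print(f"Modeled: {len(modeled)} residues ({coverage_pct:.1f}%)")
print(f"Omitted: {n_term_omitted} N-terminal, {c_term_omitted} C-terminal")
```

The modeled segment begins and ends with arginine (Arg55 and Arg237), matching the first and last target residues in the Swiss Model alignment shown later in this article.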
In 2008, Swiss Model apparently used the temperature value field in the PDB file to indicate regions that are highly unreliable, namely the regions that are red when the model is colored by temperature. (This was no longer true in 2012.) These regions are shown as translucent white in the initial scene (using the Jmol command select temperature >50). The uncertainty in three of these regions is explained by gaps in the template model (see below). Although the details of these regions are even more uncertain than other regions, it seems likely that these loops are on the surface, if the homology model turns out to be substantially correct.
As indicated above, in 2008, Swiss-Model found only one usable template for homology modeling, despite the existence of an empirical 3D crystal structure for DnaC with a slightly higher sequence identity.
Gaps in the Template Model
The template was 2QGZ. The portion of the template used was Glu107-Arg300. Only the amino-terminal 6 residues were not used as template (translucent). Note that there are three gaps in this segment of the template that lack coordinates due to disorder in the crystal (marked with spacefilled alpha-carbon atoms).
The missing loops are 202-205 (NGSV), 226-231 (EQATSW), and 268-275 (TIKGSDET). These gaps, which occur between the residues marked /\ below, were apparently ignored in making the model, which has a continuous main chain.
Confirmation of Homology Model By Related Structures
When the PDB is searched with the DnaC sequence, the best match (December, 2008) is 23% sequence identity with 183 amino acids in the DnaC helicase loader of Aquifex aeolicus, 3ec2 and 3ecc. In order to find whether these structures have the same fold as the template (2qgz with 19% sequence identity to E. coli DnaC) used for the homology model, 2qgz was structurally aligned with 3ec2[9]. The similarity of folds lends considerable confidence to the homology model of E. coli DnaC. This was further confirmed by the 2012 Swiss Model run, when 3ecc was selected as the best template (see discussion above).
The second best sequence-identity hit in the PDB is 39% identity with 54 amino acids (positions 9-63 of chain A) of replication factor C (2chg), which align with 72-124 of DnaC. When the above homology model of DnaC (made with template 2QGZ) is structurally aligned with residues 9-63 of 2CHG[10], 43 alpha carbons (out of 54) aligned with RMS deviation 2.3 Å. Residues 21-63 of 2CHG aligned with residues 80-124 of the DnaC homology model. (Non-aligned portions are pastel.) This result adds further confidence to this region of the homology model, since the structural alignment of 2CHG:A21-63 occurred in the same range as the sequence alignment (which was 72-124 in DnaC).
Download the above structural alignments:
Crystal Structure of DnaC Is "In The Pipeline"
A sequence-based search at the international Structural Genomics TargetDB reveals that the closest completed structure is 2qgz, the one chosen by SwissModel as a template. (3ec2 and 3ecc were not determined by a structural genomics project.) A number of crystal and NMR structures have sequence identities up to 37% but over shorter stretches, and with higher E values.
Diffraction data have been obtained (but the solved structure not yet deposited) for a Listeria monocytogenes sequence of 307 residues, pI 5.2, with an E value of 1.6e-05, though only 21% sequence identity. Diffraction-quality crystals (but not yet diffraction data) have not been obtained for any sequence with such a low E value.
E. coli DnaC (245 residues, pI 9.4) has been crystallized by RIKEN Structural Genomics Initiative (Japan), but the crystals may not be of diffraction quality. It has been cloned, expressed as a soluble protein, and purified (but not yet crystallized) by 3 Structural Genomics Groups (RIKEN Structural Genomics Initiative (Japan), Montreal-Kingston Bacterial Structural Genomics Initiative, Midwest Center for Structural Genomics), as have several proteins with >40% sequence identity.
Thus, there is reason for optimism that either a crystal structure, or a more suitable template for homology modeling, might be forthcoming.
DnaC helicase loader 3D structures
DnaC helicase loader
Additional Resources
For additional information, see: DNA Replication, Repair, and Recombination
For additional information, see: Nucleic Acids
Templates for 2008 Homology Modeling of E. coli DnaC (245 amino acids)
Name | PDB Code (Resolution) | Released | Length (amino acids) (a) | Template alignment length (a): range (%) | Target alignment length (a): range (%) | Aligned Sequence Identity | Expectation (E) values | Swiss Model Result
Putative Primosome Component Streptococcus Pyogenes | 2qgz (2.4 Å) | Jul 24 2007 | 183 (308) | 174:107-292 (95%) [sm] | (183): 55-237 (75%) [sm] | 18.6% [sm]; 19.7% [tdb] | 3.4e-28 [sm]; 0.00027 [tdb]; >10 [pdbB]; 0.0028 [pdbF] | DnaC modeled from 2qgz chain A
DnaC helicase loader Aquifex aeolicus | 3ec2 (2.7 Å) | Nov 25 2008 | 175 (180) | 174: 6-179 (95%) [pdbB] | (163): 68-230 (67%) [pdbB] | 23.5% [pdbB] | 0.00059 [pdbB] | "Alignment is not good enough for Modelling"
Sources: Swiss-Model [sm]; targetdb.pdb.org [tdb]; pdb.org using a BLAST search [pdbB], or a FASTA search [pdbF].
(a) Lengths not in parentheses are for crystallographic results, and are counts of amino acids with coordinates; they exclude disordered residues ("gaps" in the model). Lengths in parentheses are for the target sequence of DnaC, or sequences of the crystallized protein (from SEQRES in the PDB file).
Below is the alignment produced by Swiss Model, used in making the 3D model. Vertical bars for identity were inserted by hand (I may have missed some).
| | | | ||
TARGET 55 R TFNRSGIRPL HQNCSFENYR VECEGQMNAL SKARQYVEEF
2qgzA 100 qkqaais--e riqlvslpks yrhihlsdid vnnasrmeaf saildfveqy
TARGET sssss h h hhhhhhh hhhhhhhhh
2qgzA hhh h sss h h hhhhhhh hhhhhhhhh
| | || || | | |
TARGET 96 DGN-IASFIF SGKPGTGKNH LAAAICNELL L-RGKSVLII TVADIMSAMK
2qgzA 148 psaeqkglyl ygdmgigksy llaamahels ekkgvsttll hfpsfaidvk
TARGET ssss ss hhh hhhhhhhhhh h h ssss sshhhhhhh
2qgzA ssss ss hhh hhhhhhhhhh hh ssss sshhhhhhh
|| | | || |
TARGET 144 DTFRNSGTSE EQLLNDLSNV DLLVIDEIGV QTESKYEKVI INQIVDRRSS
2qgzA 198 naiske---- --eidavknv pvlilddiga vrde-----v lqvilqyrml
/\ / \
TARGET hhh ssssss hhhhhhhhhh
2qgzA hh h ssssss hhhhhhhhhh
| | ||| | | |
TARGET 194 SKRPTGMLTN SNMEEMTKLL ---GERVMDR MRLGNSLWVI FNWDSYR
2qgzA 247 eelptfftsn ysfadlerkw awqakrvmer vr-ylarefh leganrr-
/\
TARGET h ssssss hhhhh hhhh hh ssssss s
2qgzA h ssssss hhhh hhhh hh hh ssss s
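Percent identity can be counted from any pair of aligned rows like those above. The Python sketch below does this for the 50-column block that starts at target residue 96. It assumes identity is computed over columns where both sequences have a residue (gap columns are skipped); Swiss Model's own convention may differ, so the figure for this one block is not expected to reproduce the 18.6% reported for the whole alignment.

```python
def percent_identity(row_a, row_b):
    """Count identical residues between two aligned rows.

    Spaces are ignored and case is folded. Columns with a gap ('-') in
    either row are skipped. Returns (identical, aligned, percent).
    """
    a = row_a.replace(" ", "").upper()
    b = row_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same number of columns")
    aligned = identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        aligned += 1
        if x == y:
            identical += 1
    if aligned == 0:
        raise ValueError("no columns with residues in both rows")
    return identical, aligned, 100.0 * identical / aligned


# One block of the Swiss Model alignment, copied from above.
target_row = "DGN-IASFIF SGKPGTGKNH LAAAICNELL L-RGKSVLII TVADIMSAMK"
template_row = "psaeqkglyl ygdmgigksy llaamahels ekkgvsttll hfpsfaidvk"

identical, aligned, pct = percent_identity(target_row, template_row)
print(f"{identical} identical of {aligned} aligned columns ({pct:.1f}%)")
```

Applying the same function to every block and summing the counts before dividing would give an overall figure for the alignment under this convention.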
Below is the sequence with ATOM records (coordinates) from 2QGZ, numbered 100-300, showing the gaps as "...". This sequence listing was used to locate the positions marked /\ above.
1 .......... .......... .......... .......... ..........
51 .......... .......... .......... .......... .........Q
101 KQAAISERIQ LVSLPKSYRH IHLSDIDVNN ASRMEAFSAI LDFVEQYPSA
151 EQKGLYLYGD MGIGKSYLLA AMAHELSEKK GVSTTLLHFP SFAIDVKNAI
201 S....KEEID AVKNVPVLIL DDIGA..... .VRDEVLQVI LQYRMLEELP
251 TFFTSNYSFA DLERKWA... .....WQAKR VMERVRYLAR EFHLEGANRR
(Copied from Protein Explorer's sequence display.)
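The gap positions can be recovered from a listing like the one above by looking for breaks in the numbering of residues that have coordinates. A minimal Python sketch, using the rows from residue 51 onward (the row for residues 1-50 contains only dots and is left out); the function names are illustrative:

```python
# (first residue number, row text) copied from the listing above.
ROWS = [
    (51,  ".......... .......... .......... .......... .........Q"),
    (101, "KQAAISERIQ LVSLPKSYRH IHLSDIDVNN ASRMEAFSAI LDFVEQYPSA"),
    (151, "EQKGLYLYGD MGIGKSYLLA AMAHELSEKK GVSTTLLHFP SFAIDVKNAI"),
    (201, "S....KEEID AVKNVPVLIL DDIGA..... .VRDEVLQVI LQYRMLEELP"),
    (251, "TFFTSNYSFA DLERKWA... .....WQAKR VMERVRYLAR EFHLEGANRR"),
]


def parse_rows(rows):
    """Map residue number -> character ('.' marks missing coordinates)."""
    residues = {}
    for first, text in rows:
        for offset, ch in enumerate(text.replace(" ", "")):
            residues[first + offset] = ch
    return residues


def internal_gaps(residues):
    """Inclusive (start, end) ranges lacking coordinates between ordered residues."""
    ordered = sorted(pos for pos, ch in residues.items() if ch != ".")
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt - prev > 1:
            gaps.append((prev + 1, nxt - 1))
    return gaps


residues = parse_rows(ROWS)
ordered_count = sum(1 for ch in residues.values() if ch != ".")
gaps = internal_gaps(residues)

print("Residues with coordinates:", ordered_count)
for start, end in gaps:
    print(f"Gap {start}-{end} ({end - start + 1} residues)")
```

The ranges found this way are 202-205, 226-231 and 268-275, the same three missing loops named in the section on gaps in the template model, and the count of residues with coordinates (183) agrees with the length given for 2qgz in the templates table.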
Below is the alignment of full-length DnaC with 2QGZ according to TargetDB (see above). Note that the 2QGZ structure begins at residue 100, and so the homology model begins with residue 55 of DnaC, indicated with > below.
ID: DR58 Center: NESGC
E-value: 0.00028 Identity: 19.737%
10 20 30
Query MKNVGDLMQRLQKMMPAHIKPAFKTGEELLAWQKEQGA
Q+ Q P++I +++ + + +
Subjct EVASFISQHHLSQEQINLSLSKFNQFLVERQKYQLKDPSYIAKGYQPILAMNEGYADVSY
40 50 60 70 80 90
40 50 > 60 70 80 90
Query IRSAALERENRAMKMQRTFNRSGIRPLHQNCSFENYRVECEGQMNALSKARQYVEEF-DG
+++ L + ++ +++ ++ ++ +++ + + V+ ++M+A+S ++VE++ ++
Subjct LETKELVEAQKQAAISERIQLVSLPKSYRHIHLSDIDVNNASRMEAFSAILDFVEQYPSA
100 110 120 130 140 150
100 110 120 130 140 150
Query NIASFIFSGKPGTGKNHLAAAICNELLLR-GKSVLIITVADIMSAMKDTFRNSGTSEEQL
+ ++ + G G GK++L AA+ +EL + G S+ ++ ++ +K+++ N++++EE
Subjct EQKGLYLYGDMGIGKSYLLAAMAHELSEKKGVSTTLLHFPSFAIDVKNAISNGSVKEE--
160 170 180 190 200
160 170 180 190 200 210
Query LNDLSNVDLLVIDEIGV-QTESKYEKVIINQIVDRRSSSKRPTGMLTNSNMEEMTK----
++ ++NV +L++D+IG+ Q+ S + +++ I++ R + PT + +N ++ ++ +
Subjct IDAVKNVPVLILDDIGAEQATSWVRDEVLQVILQYRMLEELPTFFTSNYSFADLERKWAT
210 220 230 240 250 260
220 230 240
Query LLG-------ERVMDRMRLGNSLWVIFNWDSYRSRVTGKEY
+ G +RVM+R+R
Subjct IKGSDETWQAKRVMERVRYLAREFHLEGANRR
270 280 290 300
ConSurf Coloring Script
For an explanation of the evolutionary conservation results, see above.
The script below is from the 2012 analysis[4]. It can be run in Jmol to color the amino acids of DnaC by evolutionary conservation. CON10 marks insufficient data. CON9 is the highest level of conservation, and CON1 is the lowest (most variable).
select all
color [200,200,200]
select PHE57
color [255,255,150]
spacefill
define CON10 selected
select ILE62, ASN73, GLY106, GLY109, THR110, GLY111, LYS112, HIS114, LEU115
select selected or ALA116, ALA118, GLU153, LEU165, LEU166, ASP169, GLU170
select selected or GLY172, ASP189, ARG191, ASN203, ARG216, ASP219, ARG220
select selected or TRP233, SER235, ARG237
color [160,37,96]
spacefill
define CON9 selected
select ARG55, SER60, GLY61, LEU65, PHE71, TYR74, ALA84, VAL92, PHE95, ASN113
select selected or ILE119, LEU123, VAL130, THR134, THR145, VAL163, ILE168
select selected or GLN174, SER177, GLU180, ILE187, SER192, PRO197, THR198
select selected or THR202, GLY214, MET221, SER226, PHE231
color [240,125,171]
spacefill
define CON8 selected
select HIS66, GLN81, PHE102, VAL135, SER140, LYS143, SER152, LEU156, ASP164
select selected or VAL167, ILE171, ILE184, ASN185, VAL188, GLY199, LEU213
color [250,201,222]
spacefill
define CON7 selected
select THR56, ARG59, CYS69, SER70, ALA88, TYR91, ILE99, SER101, PHE104, SER105
select selected or ALA117, CYS120, ASN121, LEU124, GLY127, SER129, ILE133
select selected or ALA136, ASP137, ILE138, MET139, PHE146, ILE183, GLN186
select selected or ARG190, SER193, MET200, LEU201, SER204, LEU223, GLY224
select selected or ASN225, VAL229
color [252,237,244]
spacefill
define CON6 selected
select ASN58, ARG63, ASN68, VAL76, GLY80, LEU85, ASN98, ALA100, ILE103, LEU131
select selected or MET142, LEU157, LEU160, SER161, VAL182, SER194, ASN205
select selected or MET209, VAL217, TYR236
color [255,255,255]
spacefill
define CON5 selected
select ARG89, GLU94, PRO108, SER149, GLU154, LYS178, TYR179, LYS181, ARG196
select selected or GLU215, LEU227
color [234,255,255]
spacefill
define CON4 selected
select PRO64, GLN67, GLU72, CYS78, MET82, ILE132, GLU176, GLU208, ASN232
color [215,255,255]
spacefill
define CON3 selected
select GLN90, ARG126, ALA141, VAL173, LYS195
color [140,255,255]
spacefill
define CON2 selected
select ARG75, GLU77, GLU79, ASN83, SER86, LYS87, GLU93, ASP96, GLY97, LYS107
select selected or GLU122, LEU125, LYS128, ASP144, ARG147, ASN148, GLY150
select selected or THR151, GLN155, ASN158, ASP159, ASN162, THR175, MET206
select selected or GLU207, THR210, LYS211, LEU212, MET218, ARG222, TRP228
select selected or ILE230, ASP234
color [16,200,209]
spacefill
define CON1 selected
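For readers who want the conservation grades as data rather than colors, the script above can be read back into a residue-to-grade table. The Python sketch below is a minimal parser that assumes the script keeps the pattern seen here: one or more select lines listing residues, followed by a "define CONn selected" line. It is run on a short excerpt of the script, and the function name is illustrative.

```python
import re

RESIDUE = re.compile(r"\b[A-Z]{3}\d+\b")          # e.g. HIS114
DEFINE = re.compile(r"^define CON(\d+) selected$")


def grades_from_script(script):
    """Return {residue label: ConSurf grade} from a coloring script."""
    grades = {}
    current = []
    for raw in script.splitlines():
        line = raw.strip()
        if line.startswith("select selected or"):
            current.extend(RESIDUE.findall(line))   # continuation of a group
        elif line.startswith("select"):
            current = RESIDUE.findall(line)         # start of a new group
        else:
            match = DEFINE.match(line)
            if match:
                for label in current:
                    grades[label] = int(match.group(1))
    return grades


# Excerpt copied from the script above (CON10, CON9 in part, and CON2).
EXCERPT = """\
select all
color [200,200,200]
select PHE57
color [255,255,150]
spacefill
define CON10 selected
select ILE62, ASN73, GLY106, GLY109, THR110, GLY111, LYS112, HIS114, LEU115
select selected or ALA116, ALA118, GLU153, LEU165, LEU166, ASP169, GLU170
color [160,37,96]
spacefill
define CON9 selected
select GLN90, ARG126, ALA141, VAL173, LYS195
color [140,255,255]
spacefill
define CON2 selected
"""

grades = grades_from_script(EXCERPT)
for label in ("HIS114", "ASP169", "ARG126", "PHE57"):
    print(label, grades[label])
```

As noted above, CON10 marks insufficient data rather than a conservation level, so it should be handled separately in any downstream use of the table.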
Notes & References
- ↑ A model was created in 2008 by Swiss-Model using its totally automated first approach mode with template 2qgz. In 2012, Swiss-Model's automated mode chose a different template, 3ecc, and created a similar model.
- ↑ Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201. Free full text. Server: swissmodel.expasy.org
- ↑ 3.0 3.1 In December, 2008, Swiss-Model deemed the sequence alignment of E. coli DnaC with A. aeolicus DnaC to be too unreliable to permit using the 3ec2 structure of the latter as a template for homology modeling of E. coli DnaC.
- ↑ 4.0 4.1 4.2 In the 2012 analysis, ConSurf found 47 unique sequences in Clean Uniprot. The MSA had an average pairwise distance of 0.98.
- ↑ In 2008, ConSurf found only 10 sequences in SwissProt, with an average pairwise distance (APD), in the multiple sequence alignment, of 1.6. The run shown here used 100 sequences from Uniprot, with an APD of 1.4.
- ↑ ConSurf result using 50 sequences from Uniprot, with an average pairwise distance in the multiple sequence alignment of 1.6.
- ↑ Not clear to User:Eric Martz in December, 2008.
- ↑ Registration refers to the positioning of amino acids along the backbone of the homology model. Amino acids are "in register" when correctly positioned. The sequence of the target protein (DnaC) can be thought of as sliding along the template backbone, as a consequence of the process of sequence alignment (or threading). The correct registration will be known only when an empirical crystallographic structure becomes available for DnaC.
- ↑ The structural alignment of 2qgz with 3ec2 was performed with the Magic Fit function of DeepView version 3.6beta2. 2qgz 115-259 aligned with 3ec2 42-185 (3 gaps in 3ec2's alignment: 128-9, 134-5, 155-9). 135 alpha carbons were aligned with RMS 2.76 Å. The sequence identity between 2qgz and 3ec2 is 28% over the 185 amino acid length of the shorter, 3ec2. Magic Fit is a sequence-alignment-guided structural alignment (see Structural alignment tools).
- ↑ Structural alignment done with DeepView 3.6b3 using Magic Fit of carbon alphas.
Proteopedia Page Contributors and Editors
Eric Martz, Alexander Berchansky, Joel L. Sussman, David Canner, Michal Harel
Citation: Martz E, Canner D, Harel M, Berchansky A, 2013, "Structure of E. coli DnaC helicase loader", Proteopedia, DOI: https://dx.doi.org/10.14576/333957.1802412