SARS-CoV-2 protein N
From Proteopedia
Model Confidence:
Very high (pLDDT > 90)
Confident (90 > pLDDT > 70)
Low (70 > pLDDT > 50)
Very low (pLDDT < 50)
AlphaFold produces a per-residue confidence score (pLDDT) between 0 and 100. Some regions below 50 pLDDT may be unstructured in isolation.
To the right is an AlphaFold2 3D model of SARS-CoV-2 protein N (length = 419 amino acids, UniProt ID: P0DTC9), color-coded by pLDDT score. It is the highest-ranked model in terms of pLDDT confidence scores, i.e., model 5[1].
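The confidence bands in the legend above can be expressed as a small lookup. The following is a minimal sketch, not part of AlphaFold itself: the function names `plddt_band` and `summarize` are hypothetical, the example scores are made up for illustration, and because the legend uses strict inequalities, placing a boundary value (exactly 90, 70 or 50) in the lower band is an assumption.

```python
from collections import Counter

# Band thresholds taken from the legend above. The legend uses strict
# inequalities, so assigning a boundary value (exactly 90, 70 or 50)
# to the lower band is an assumption of this sketch.
BANDS = [
    (90.0, "Very high"),
    (70.0, "Confident"),
    (50.0, "Low"),
]


def plddt_band(score: float) -> str:
    """Return the confidence band for one per-residue pLDDT score."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must lie between 0 and 100, got {score}")
    for threshold, label in BANDS:
        if score > threshold:
            return label
    return "Very low"


def summarize(scores):
    """Count how many residues fall into each confidence band."""
    return Counter(plddt_band(s) for s in scores)


# Made-up scores for illustration only, not taken from the N-protein model.
example_scores = [95.2, 88.0, 71.5, 64.3, 42.0, 30.1]
print(summarize(example_scores))
```

For a real model, the per-residue scores would first have to be read from the model file; AlphaFold writes the pLDDT into the B-factor column of its PDB output.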
Function
The primary function of the nucleocapsid protein (N-protein) is to package the viral genome into a helical ribonucleoprotein (RNP) complex that protects the genomic RNA and binds it tightly during the virus’s journey to a new host[2]. N-proteins are necessary for viral RNA transcription and replication and also regulate these processes. The packaging of the RNA into the RNP complex is a fundamental part of the RNA assembly in infected cells. N-proteins are also involved in the replication cycle and influence the host’s cellular response to viral infection[2].

Structure description
The N-protein consists of 419 amino acids and can be divided into two folded domains and three disordered regions. The three disordered regions are very flexible and dynamically change their conformation (see the morph of the top 5 ranked AlphaFold2 models, to the left, which gives a feel for the flexibility of these regions). This flexibility allows them to rotate and enables binding to macromolecules such as RNA. The two folded domains, on the other hand, are well structured and have thus been characterized by X-ray diffraction[3] and NMR[4][5].

N-terminal flexible arm
The N-terminal flexible arm is one of the disordered regions, but it contains segments of transient helicity. Its conformation is significantly affected by the neighbouring folded RNA binding domain (RBD), which reduces the accessible space of the flexible arm and hence supports an expanded configuration of this region. Fuzzy interactions with the RBD create kinetic traps for certain transient configurations. Furthermore, some attractive and repulsive interactions with the RBD are supported by the arginine-rich region (residues 31-41) and by residue Phe 17 of the N-terminal arm (NTD). The arginine-rich motif was found to form a transient α-helix (H2)[5].

C-terminal flexible tail
The disordered C-terminal flexible tail incorporates two more transient helices (H5 and H6).
Helix H6 is amphipathic, with a hydrophobic face and a positively charged one. It is probably more highly populated than helix H5. The residues of helix H6 also contribute extensively to an intramolecular interaction with the C-terminal dimerization domain. Thus, helix formation is in constant competition with this intramolecular interaction. Experiments revealed “transient but non-negligible interactions [of the CTD] with the dimerization domain”[5].

Central Linker region (LKR)
The central linker region (LKR) is a sequence of polar and charged amino acids with a serine-arginine rich motif and few residues that cause steric effects. The resulting electrostatic repulsion between the positively charged residues and the high flexibility due to low steric effects prevent a well-structured conformation and cause the disorder. Nevertheless, the region still forms two transient helices: a serine-arginine rich transient helix (H3) and a hydrophobic helix (H4). Measurements of the rearrangement time indicate that the linker does not interact with the neighbouring folded domains[5]. The serine-arginine rich motif also provides several putative phosphorylation sites. These sites may regulate protein functions and interactions between membrane proteins and the N-proteins[2].

The remaining two domains are well organized and represent the majority of the protein. They also contribute the most to the protein’s functions[5].

N-terminal RNA binding domain (RBD)
The β-sheet core has five antiparallel β-strands with a β-hairpin between β2 and β5 and a short helix before β2. The β-hairpin is flexible and may undergo conformational changes during RNA binding[2]. Nevertheless, all of the N-protein’s domains and regions are involved in the RNA binding process, which is its main function[5].

C-terminal dimerization domain
The dimerization domain has more short helices than the RBD.
It incorporates eight α-helices and only two β-strands, which are antiparallel and form a β-hairpin. The domain is involved in the dimerization (and oligomerization) process, which makes the N-protein functional[2].

There is no complete structure of the N-protein available. The size of the protein and the flexibility of the disordered regions make it hard to determine a full structure. Without a complete structure of the protein, it is very difficult to obtain further information about the detailed binding mechanism. Several structures of the RBD and the dimerization domain are available in the PDB, but there are no structures of the flexible arm, the flexible tail and the linker, due to their disordered nature.

Disease
The N-protein of SARS-CoV-2 contributes to COVID-19 by protecting the fragile viral RNA genome. It thereby enables functional virions to spread from person to person without the genome being damaged.

Relevance
The N-protein is a multifunctional protein and a possible drug target for the treatment of COVID-19. It is the most conserved structural protein in coronaviruses, with 90.52% sequence identity with SARS-CoV-1, which may indicate that it is less likely to mutate. Additionally, the N-protein may be relevant for early diagnosis of a SARS-CoV-2 infection, because infected cells contain large amounts of N-protein[6].

See also
Coronavirus Disease 2019 (COVID-19)

References