# Research Topics

My main area of research is Computational Chemistry, most particularly the molecular modeling of large biochemical systems like proteins, using a wide range of computational chemistry tools from quantum mechanics to classical molecular mechanics force fields. My different research projects focus both on methodological developments that can help in modeling such large complex biochemical systems and on the use and the applications of such tools to obtain a better understanding of the structure, the dynamics and the reactivity of peptides and proteins.

# Methodological Development

## SemiEmpirical Born-Oppenheimer Molecular Dynamics (SEBOMD)

Modeling large and reactive systems is still a challenging problem in modern computational chemistry. While full ab initio or Car-Parrinello molecular dynamics are demanding very high computational costs and are mostly limited to both "small" systems and relatively short timescales (i.e., around a couple of hundreds of atoms during a few tens of picoseconds) even using supercomputers, QM/MM methods have proven nowadays to be among the most popular methods to tackle such problems (Monard et al, 1999). However, while very attractive in terms of computational costs, these multi-scaled methods suffer from some conceptual drawbacks mainly due to uncertainties on the way the QM and MM parts should interact: how electrostatics should be handled? How van der Waals interactions should be modeled? How frontier covalent bonds should be described? etc (Monard et al., 2003).

In the AmberTools open source version of the Amber biomolecular package we have implemented what could be a third way to handle macromolecular systems for which the modeling of electronic and dynamical properties is required: the SEBOMD approach, standing for SemiEmpirical Born-Oppenheimer Molecular Dynamics. In this method, a whole system (e.g., a solute and its solvent in gas phase or encaged in a periodic box) is modeled at the full semiempirical NDDO quantum chemical level (Monard et al., 2005). Such a fast, albeit approximate, quantum method allows for the molecular dynamics simulations of rather large systems (today up to a thousand of atoms) for long simulations (in the nanosecond timescale) using commodity computers.

One of the most interesting aspects of the SEBOMD approach is that it allows for the description of large systems at the quantum chemical level, therefore accounting for charge transfer and polarization (Ingrosso et al, 2011, Farag et al., 2014).

Fig.: Charge transfer in a box of water molecules: negatively charged; neutral; positively charged

## pKa predictions in proteins

The prediction of the pKa's of ionizable residues in proteins is a matter of high importance in many fields related to biochemistry since the pKa deviation from standard values (i.e., in solution) for any amino acid in a protein can have drastic consequences, ranging from local structural changes to modified catalytic behaviors in enzymes.

In a first step towards the development of an efficient and accurate protocol to estimate aminoacids' pKa's in proteins, we have developed a protocol to reproduce the pKa's of alcohol and thiol based residues (namely tyrosine, serine, and cysteine) in aqueous solution from the knowledge of the experimental pKa's of phenols, alcohols and thiols (Ugur et al., 2014). Our protocol is based on the linear relationship between computed atomic charges of the anionic form of the molecules (being either phenolates, alkoxides, or thiolates) and their respective experimental pKa'svalues.

It has been tested with different environment approaches (gas phase or continuum solvent-based approaches), with five distinct atomic charge models (Mulliken, Löwdin, NPA, Merz-Kollman, and CHelpG), and with nine different DFT functionals combined with sixteen different basis sets. Moreover, the capability of semiempirical methods (AM1, RM1, PM3, and PM6) to also predict pKa's of thiols, phenols and alcohols has been analyzed.

From our benchmarks, the best combination to reproduce experimental pKa's is to compute NPA atomic charge using CPCM model at the B3LYP/3-21G and M062X/6-311G levels for alcohols (R$$^2$$ = 0.995) and thiols (R$$^2$$ = 0.986), respectively.

Fig.: linear regression between calculated NPA atomic charges and experimental pKa's. Calculations were performed using B3LYP/3-21G//CPCM and M062X/6-311G//CPCM for alcohols and thiols, respectively

What is interesting here is that the linear relationship between atomic charge and pKa is nearly independent of the DFT functional that is chosen or the basis set that is used.

Fig.: Mean Absolute Deviation (MAD) and maximum difference between predicted and experimental pKa (MAX-∆pKa) for nine different DFT functionals and sixteen differ ent basis sets.

By combining all the results, some uncertainties on the predicted pKa's can therefore be drawn.

Fig.: Predicted pKa overall the DFT functionals and basis sets versus experimental pKa (solvation model=CPCM, charge model=NPA). Circles show the average pKa while the error bars denote minimum and maximum predicted pKa.

# Molecular Modeling of Peptides and Proteins

Below we present two examples of researches that we have conducted in collaborations with theoretical or experimental groups.

The first example deals with the deamidation of asparaginyl residues in peptides and proteins. This is a long-standing collaboration with the computational chemistry group of Prof. Viktorya Aviyente and Assoc. Prof. Saron Catak from Bogazici University in Istanbul, Turkey.

In a second example, we present a more recent collaboration that we have done with experimentalist Laurent Vial from Université Claude Bernard in Lyon, France to better understand the structure and the dynamics of poly-L-lysine dendrigrafts.

## Deamidation in peptide and proteins

The deamidation reaction is regarded as the most commonly observed chemical degradation which causes time dependent changes in conformation and limits the lifetime of peptides and proteins. The timed processes of protein turnover, aging, and several diseases (such as eye lens cataracts, Alzheimer, and particular types of cancer) have been suggested as possible consequences of deamidation. This reaction is also of significant chemical interest because of its effect on the stability of protein pharmaceuticals. Among the 20 natural amino acids, asparagine (Asn) and glutamine (Gln) residues are known to undergo spontaneous nonenzymatic deamidation to form aspartic acid (Asp) and glutamic acid (Glu) residues under physiological conditions.

For several years now, we have use different molecular modeling tools such as quantum mechanics, molecular mechanics, QM/MM methods, molecular dynamics, free energy samplings, etc to understand the deamidation reaction of asparagine residues in small peptides and in proteins.

#### Quantum chemical models

Quantum chemical calculations on small model systems have been performed to decipher all possible reaction mechanisms yielding aspartic acid from asparagine ( Catak et al., 2006, Catak et al., 2008, Catak et al., 2009 ).

Our main results indicate that water assistance increases the rate of deamidation in all mechanisms. The tautomerization route has the lowest barrier for succinimide formation and cyclization is the rate-determining step for succinimide formation. In agreement with experiment, we have also found that succinimide hydrolysis is the rate-determining step for deamidation. This explains why succinimide intermediate has been experimentally isolated.

#### Molecular dynamics and Free energies of deamidation

It has been proposed that deamidation in the enzyme triosephosphate isomerase (TPI) regulates protein turnover, based on the fact that it is controlled by catalytic activity and that it causes protein degradation. In its active homodimeric form, mammalian TPI contains two deamidation sites per monomer. Experimental evidence shows that the primary deamidation site (Asn71-Gly72) deamidates faster than the secondary deamidation site (Asn15-Gly16). To evaluate the factors controlling the rates of these two deamidation sites in TPI, we have performed graphics processing unit- enabled microsecond long molecular dynamics simulations of rabbit TPI. The kinetics of asparagine dipeptide and two deamidation sites in mammalian TPI are also investigated using quantum mechanical/molecular mechanical tools with the umbrella sampling technique. Analysis of the simulations has been performed using independent global and local descriptors that can influence the deamidation rates: desolvation effects, backbone acidity, and side chain conformations (Ugur et al., 2012, Ugur et al., 2015).

Our findings show that all the descriptors add up to favor the primary deamidation site over the secondary one in mammalian TPI: Asn71 deamidates faster because it is more solvent accessible, the adjacent glycine NH backbone acidity is enhanced, and the Asn side chain has a preferential near attack conformation. The crucial impact of the backbone amide acidity of the adjacent glycine on the deamidation rate is shown by kinetic analysis. Our findings also shed light on the effect of high-order structure on deamidation: the deamidation in a small peptide is favored first because of the higher reactivity of the asparagine residue and then because of the stronger stability of the tetrahedral intermediate.

## Molecular Modeling of Poly‐L‐lysine Dendrigrafts

Dendrigrafts of poly-L-lysine (DGLs) recently complemented the family of polycationic dendritic macromolecules that include the prominent polyamidoamine (PAMAM) and polyethylenimine (PEI) polymers.

Because their iterative synthesis in aqueous conditions is thermodynamically controlled by precipitation and kinetically controlled by steric hindrance, DGLs share features with both PAMAM dendrimers (i.e., generation-based growth and narrow molecular weight distribution), and hyperbranched PEI (i.e.,broad number of regioisomers and rapid increase in molecular weight per generation). In contrast to PAMAM and PEI, DGLs are biodegradable, exhibit low cellular toxicities, and turned out to be non-immunogenic. As a consequence, DGLs have recently gained a huge interest in the biomedical field during the last quinquennium, with numerous applications in drug and gene delivery, biomaterials and tissue engineering, bio-imaging and biosensing.

Fig.: Schematic representation of the first- to fourth-generation DGLs G1-G4 (each dot represents a L-lysine residue, pending free amino groups are not represented).

Despite of a growing use of the poly-L-lysine dendrigrafts in biomedical applications, a deeper understanding of the molecular level properties of these macromolecules was missing. Therefore, we has developed a simple methodology for the construction of three-dimensional structures of poly-L-lysine dendrigrafts, and the subsequent investigation of their structural features using microsecond molecular dynamics simulations (Francoia et al., 2017). This methodology relies on the encoding of the polymers experimental characterizations (i.e., composition, degrees of polymerization, branching ratios, charges) into alphanumeric strings that are readable by the Amber simulation package. Such an original approach now opens avenues toward the in silico exploration of dendrigrafts and hyperbranched polymers.

Fig.: All possible L-Lysine buildings blocks A to h (and their corresponding three-letter codes in brackets) involved in the construction of DGLs G1-G4.

Fig.: The alphanumeric description of a second-generation dendrigraft G2 (Top), and the corresponding arrangement of the residues (Bottom). Residues in blue indicate the mother DGL G1. Peptidic and isopeptidic bonds are depicted as black line and grey lines, respectively.

Fig.: Snapshots of the first- to fourth-generation DGLs G1 to G4 taken after 1 microsecond of molecular dynamics at T = 300 K. Blue dots highlight the positively charged nitrogen atoms at pH 7.8. All images are to the same scale.