
Published: March 10, 2011

© 2011 American Chemical Society 3488 dx.doi.org/10.1021/jp108217z | J. Phys. Chem. B 2011, 115, 3488–3495

ARTICLE

pubs.acs.org/JPCB

Dynamics of the GB3 Loop Regions from MD Simulation: How Much of It Is Real?

Tong Li, Qingqing Jing, and Lishan Yao*

Lab of Biofuels, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, 266061, China

Supporting Information

Received: August 30, 2010
Revised: January 29, 2011

ABSTRACT: A total of 1.1 μs of molecular dynamics (MD) simulations were performed to study the structure and dynamics of protein GB3. The motional amplitude of the loop regions in the simulations is generally overestimated in comparison with the experimental backbone N–H order parameters S². Two-state behavior is observed for several residues in these regions, with minor state populations in the range of 3–13%. Further inspection suggests that the (φ, ψ) dihedral angles of the minor states deviate from the GB3 experimental values, implying the existence of nonnative states. After fitting the MD trajectories of these residues to the NMR RDCs, the minor state populations are significantly reduced, by at least 80%, suggesting that the MD simulations are strongly biased toward the minor states and thus overestimate the dynamics of the loop regions. For these residues, the optimized trajectories yield intraresidue and sequential HN–Hα RDCs and intraresidue ³J(HN,Hα) couplings, none of which were included in the trajectory fitting, that are closer to the experimental data. Unlike GB3, 0.55 μs MD simulations of protein ubiquitin do not show distinctive minor states, and the derived NMR order parameters are better converged. Our findings indicate that the artifacts of the simulations depend on the specific system studied and that one should be cautious in interpreting the enhanced dihedral dynamics from long MD simulations.

INTRODUCTION

Molecular dynamics (MD) simulation has been widely used to study the structure, dynamics, and function of biomolecules. The rapid development of computer power permits one to run microsecond MD simulations of systems with tens of thousands of atoms, making it possible to explore the conformational space of macromolecules on longer time scales. For example, Nederveen and colleagues¹ performed a 0.2 μs MD simulation of ubiquitin, a small protein of 76 residues, and found that its internal flexibility, especially in the loop and turn regions, increases considerably as the trajectory gets longer, as judged by the decrease of the order parameters S² of the backbone N–H bond vectors. This effect was attributed to the presence of nanosecond to microsecond time scale motions. Similar motions were later explored in a longer 1.2 μs MD simulation of the same protein² and in a more recent 200 ns simulation of protein GB1.³ However, the force fields used in simulations are rarely tested on protein systems in long microsecond MD runs, and the convergence of the sampling is another issue. Whether all of the conformational space explored in long MD simulations is physically real, and weighted properly by the true free energy, remains unclear. In fact, Lange et al.⁴ found that splitting a 1 μs MD trajectory of ubiquitin into 20 shorter blocks yields the best agreement between predicted and experimental residual dipolar couplings (RDCs), whereas using longer trajectory segments does not improve the quality. This conclusion holds for the multiple force fields tested in their work, and in several cases, using trajectory fragments longer than 50 ns led to a deterioration of the predicted RDC quality. Note that RDCs probe dynamics up to the millisecond time scale, whereas the simulation length is on the microsecond time scale and the optimal fitted trajectory length on the nanosecond time scale. This raises the possibility that the MD simulations overestimate the dynamics, owing to either sampling error or force field inaccuracy. It is important to distinguish these two problems: the former can be overcome by running longer simulations or using more advanced sampling methods, but the latter cannot; instead, a better-tuned force field is required.

Several recent works have been published with the goal of validating and/or improving force field accuracy using experimental data.⁵⁻⁸ In this work, we study the protein GB3 using the AMBER99sb force field. AMBER99sb was selected because it is one of the most popular force fields used in simulation. Its backbone dihedral terms have recently been retuned,⁶ and several studies have shown that this improves accuracy: it performs well in predicting NMR observables such as RDCs,⁹ NMR scalar couplings,¹⁰ and backbone and methyl side chain order parameters.¹¹,¹² The system we study, the third immunoglobulin binding domain of protein G (GB3), has been characterized extensively by both X-ray crystallography and NMR spectroscopy.¹³⁻¹⁷ Its dynamics have been well studied by NMR relaxation¹⁸ and RDCs,¹⁹,²⁰ allowing a direct comparison with the dynamics extracted from MD simulation data. We first examine motions on picosecond to nanosecond time scales and then extend the analysis to motions slower than molecular tumbling. Instead of running one single simulation, we performed 20 MD simulations (50 ns each) with different starting structures taken from a prior 100 ns MD trajectory. It has been demonstrated²¹,²² that multiple short simulations usually explore conformational space more efficiently than one long simulation of the same total computational time, and therefore provide more insight.

By comparing with experimental Lipari–Szabo order parameters S² derived from NMR relaxation data using model-free analysis,²³,²⁴ which track motions faster than the rotational tumbling time (3.3 ns for GB3), we can tell whether backbone amide motion on these fast time scales is over- or underestimated in the MD simulation. A quantitative comparison between experimental and computed order parameters has been difficult,²⁵ but recent developments in solution NMR have allowed accurate determination of the site-specific ¹⁵N chemical shift anisotropy (CSA) of GB3,²⁶,²⁷ effectively eliminating the artifacts in experimental S² caused by ¹⁵N CSA variation and properly balancing the contributions of the CSA and dipole–dipole relaxation mechanisms. Furthermore, the N–H effective bond length of 1.04 Å determined from RDCs and quantum mechanics (QM) calculations²⁸,²⁹ excludes the contribution to S² from zero-point vibration, which is absent in a classical MD simulation such as the one performed in this work.

Residues with large discrepancies between MD-predicted and experimental S² values are identified and investigated further. The backbone RDCs (N–H, Cα–Hα, C′–Cα, and C′–N) of these residues are calculated and compared with the experimental ones.¹⁷,³⁰ Together, these comparisons show that some of the states sampled by the MD simulations (identified from (φ, ψ) distribution plots) are not native, and that their relative populations are much higher than experiment allows.

METHODS

Starting coordinates for the MD simulation of protein GB3 were taken from the NMR structure¹⁷ (PDB code 2OED). The simulation was performed using the SANDER module of the AMBER 11 program³¹ with the AMBER99sb force field.⁶ The GB3 protein was solvated with ∼4800 TIP3 water molecules, and two Na⁺ ions were added to neutralize the system. Energy minimization was performed to remove bad contacts, and the system was then heated from 0 to 300 K in a 50 ps constant-volume MD simulation, after which a 1 ns NPT simulation was run to equilibrate the water until the density reached 1.0 kg/L. Subsequently, a 100 ns NVT simulation was performed, and the snapshots at times 5N ns (N = 1, 2, ..., 20) were extracted and used as starting structures for 20 further 50 ns NVT MD simulations, with velocities assigned from a Maxwell–Boltzmann distribution using different random number seeds. The system was coupled to a temperature bath with a time constant of 0.5 ps.³² The particle-mesh Ewald³³ method was used to evaluate the long-range electrostatic interactions. A nonbonded pair list cutoff of 12.0 Å was used, and the nonbonded pair list was updated every 25 steps. All bonds to hydrogen atoms were constrained using the SHAKE algorithm,³⁴ allowing a time step of 0.002 ps. Coordinates were saved every 20 ps, and the trajectories were analyzed with the PTRAJ module of the AMBER tools and an in-house program.

The orientational time correlation function is given as

$$C(t) = \langle P_2(\boldsymbol{\mu}(0) \cdot \boldsymbol{\mu}(t)) \rangle \qquad (1)$$

where P₂(x) = (3x² − 1)/2 is the second Legendre polynomial and the unit vector μ describes the orientation of the N–H bond vector in a molecular frame. This correlation function was computed for each N–H vector (from Q2 to E56) from the 20 50-ns trajectories, up to a lag time of 3.3 ns, after superimposing the trajectories onto the starting structure 2OED. The order parameters were computed by averaging C(t) from 2.8 to 3.3 ns and over the 20 trajectories.
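A minimal NumPy sketch of eq 1 and of the plateau average used for S² is given below. It assumes the N–H vectors have already been extracted from snapshots superimposed onto 2OED (the superposition itself is not shown) and that C(t) is averaged over all time origins within a trajectory, a common choice that the text does not spell out. The function names are illustrative, not the authors' in-house program.

```python
import numpy as np


def p2(x):
    """Second Legendre polynomial, P2(x) = (3x^2 - 1)/2."""
    return 0.5 * (3.0 * x * x - 1.0)


def nh_correlation(mu, max_lag):
    """Eq 1: C(t) = <P2(mu(0) . mu(t))> for lags 0..max_lag (in frames).

    mu is an (n_frames, 3) array of N-H vectors from snapshots that were
    already superimposed onto the reference structure, so overall tumbling
    is removed. The average runs over all available time origins (assumption).
    """
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu, axis=1, keepdims=True)  # normalize to unit vectors
    c = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # cosine between each vector and the vector `lag` frames later
        cos = np.sum(mu[: len(mu) - lag] * mu[lag:], axis=1)
        c[lag] = p2(cos).mean()
    return c


def plateau_s2(curves, dt_ps, t_start_ps, t_end_ps):
    """S2 as the mean of C(t) over [t_start, t_end], then over trajectories."""
    lo = int(round(t_start_ps / dt_ps))
    hi = int(round(t_end_ps / dt_ps))
    return float(np.mean([c[lo : hi + 1].mean() for c in curves]))
```

With coordinates saved every 20 ps, the 2.8–3.3 ns window corresponds to lags 140–165, so calling `nh_correlation(mu, 165)` on each trajectory and then `plateau_s2(curves, 20.0, 2800.0, 3300.0)` mirrors the averaging described above.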

The predicted RDCs, including backbone N–H, Cα–Hα, C′–Cα, and C′–N, were calculated using the equation³⁵,³⁶

$$\mathbf{D}_M = D^{IS}_{\max}\,\langle \mathbf{B} \rangle \times \langle \mathbf{A} \rangle \qquad (2)$$

where

$$D^{IS}_{\max} = -\frac{\mu_0 (h/2\pi)\,\gamma_I \gamma_S}{4\pi^2 r_{IS}^3}$$

Here, μ₀ is the magnetic permeability of vacuum, h is Planck's constant, γ_X is the magnetogyric ratio of spin X, and r_IS is the distance between nuclei I and S. B is a bond vector matrix, each row of which is the vector b = {(3z² − 1)/2, (√3/2)(x² − y²), √3xz, √3yz, √3xy}, in which (x, y, z) are the coordinates of the unit bond vector; the number of rows equals the number of bond vectors considered. A is the alignment tensor matrix, each column of which is the vector a = {(3z₀² − 1)/2, (√3/2)(x₀² − y₀²), √3x₀z₀, √3y₀z₀, √3x₀y₀}, where (x₀, y₀, z₀) are the coordinates of the magnetic field in the molecular reference frame; the number of columns equals the number of alignment media in which the RDCs were measured. ⟨...⟩ denotes the ensemble average.
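The sketch below evaluates eq 2 with NumPy for a set of MD snapshots. A useful check of the definitions above is that, for unit vectors, b·a reduces to P₂(cos θ), θ being the angle between the bond and the field, so a bond lying along the field gives D = D_max. The SI constants, array layout, and function names are assumptions chosen for illustration.

```python
import numpy as np

MU0 = 4e-7 * np.pi           # vacuum permeability, T m / A
HBAR = 1.054571817e-34       # h / (2 pi), J s
GAMMA_H = 2.6752218744e8     # 1H magnetogyric ratio, rad s^-1 T^-1
GAMMA_N = -2.7126e7          # 15N magnetogyric ratio (negative), rad s^-1 T^-1


def d_max(gamma_i, gamma_s, r_m):
    """D_max^IS = -mu0 (h/2pi) gamma_I gamma_S / (4 pi^2 r^3), in Hz."""
    return -MU0 * HBAR * gamma_i * gamma_s / (4.0 * np.pi**2 * r_m**3)


def five_vector(u):
    """{(3z^2-1)/2, (sqrt3/2)(x^2-y^2), sqrt3 xz, sqrt3 yz, sqrt3 xy} of unit u.

    Works for a single vector (shape (3,)) or any stack (..., 3).
    """
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u, axis=-1, keepdims=True)
    x, y, z = u[..., 0], u[..., 1], u[..., 2]
    s3 = np.sqrt(3.0)
    return np.stack(
        [(3 * z * z - 1) / 2, s3 / 2 * (x * x - y * y), s3 * x * z, s3 * y * z, s3 * x * y],
        axis=-1,
    )


def predict_rdcs(bond_vectors, a_matrix, dmax):
    """Eq 2: D = D_max <B> A.

    bond_vectors: (n_snapshots, n_bonds, 3) bond vectors from MD snapshots
    a_matrix:     (5, n_alignments), one column per alignment medium
    returns:      (n_bonds, n_alignments) predicted RDCs in Hz
    """
    b_avg = five_vector(bond_vectors).mean(axis=0)  # <B> as a time average
    return dmax * b_avg @ a_matrix
```

With r = 1.04 Å, `d_max(GAMMA_H, GAMMA_N, 1.04e-10)` comes out at about +21.7 kHz; the sign is positive because γ of ¹⁵N is negative.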

Approximately 11 sets of N–H, Cα–Hα, and C′–Cα RDCs and 5 sets of C′–N RDCs are experimentally available¹⁷,³⁰ for each residue. They were fitted separately for the MD trajectories of four fragments: G9–G14, A20–D22, D40–G41, and E56. The fitting procedure is as follows. First, the apparent alignment tensor matrix D^IS_max⟨A⟩ is determined from N–H RDCs of rigid regions using the iterative DIDC method,²⁰ in which the alignment tensors and the bond vector orientations are determined self-consistently. Once determined, the alignment tensors are rotated into the reference frame defined by the 2OED structure. Second, the ⟨B⟩ matrix is calculated for the N–H, Cα–Hα, C′–Cα, and C′–N vectors from the 20 MD trajectories, and the predicted RDCs are evaluated using eq 2, with the ⟨...⟩ average taken as the MD time average. To save computation time, 250 snapshots from each 50 ns trajectory (250 × 20 = 5000 snapshots in total) are used to calculate RDCs, with the alignment tensors fixed to the values from the iterative DIDC method. Singular value decomposition was deliberately not used to fit the experimental RDCs for each snapshot and then back-calculate the predicted RDCs, in order to minimize the effect of structural noise on the alignment tensors. Scaling factors of 1/2.08, 1/0.202, and 1/0.12 are applied to the experimental Cα–Hα, C′–Cα, and C′–N RDCs, respectively,³⁷ before comparison with the predicted ones. The fitting error χ is defined as

$$\chi = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{RDC}^{\mathrm{exp}}_i - \mathrm{RDC}^{\mathrm{pred}}_i\right)^2} \qquad (3)$$

where n is the number of RDCs included. Third, a new χ value is calculated after excluding one snapshot. If this value is smaller than the current one, the snapshot is excluded and the χ value of the system is updated; otherwise, the snapshot is retained and χ is left unchanged. In principle, this procedure should be repeated iteratively until the minimum χ value is reached. In practice, we find that by assessing each snapshot once, approximately 45% of the 5000 snapshots survive and a significant improvement of the fit is already achieved. Thus, each snapshot is assessed no more than once, to avoid overfitting the RDC data. The fitting for each fragment is performed independently, meaning that the snapshots surviving the fit of one fragment (e.g., G9–G14) are not necessarily the same as those surviving the fit of a different fragment (e.g., D40–G41); the motional correlation between fragments is therefore effectively ignored.
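The single-pass selection described above can be sketched as follows. Because ⟨B⟩ is a time average and the alignment tensors are held fixed, the ensemble prediction is simply the mean of the per-snapshot predicted RDCs, so removing a snapshot only requires updating a running sum. The order in which snapshots are visited (here, index order) is an assumption, since the text does not specify it, and the function names are hypothetical.

```python
import numpy as np


def chi(exp, pred):
    """Eq 3: root-mean-square deviation between experimental and predicted RDCs."""
    return float(np.sqrt(np.mean((exp - pred) ** 2)))


def select_snapshots(per_snapshot_rdcs, exp):
    """Single-pass greedy removal of snapshots that worsen the RDC fit.

    per_snapshot_rdcs: (n_snapshots, n_rdcs) RDCs predicted from each snapshot
        alone with fixed alignment tensors; the ensemble prediction is their
        mean over the snapshots that are kept.
    exp: (n_rdcs,) experimental RDCs, already rescaled to a common scale.
    Each snapshot is assessed exactly once and dropped only if removing it
    lowers chi.
    """
    keep = np.ones(len(per_snapshot_rdcs), dtype=bool)
    total = per_snapshot_rdcs.sum(axis=0)  # running sum over kept snapshots
    n_kept = len(per_snapshot_rdcs)
    best = chi(exp, total / n_kept)
    for k, rdc_k in enumerate(per_snapshot_rdcs):
        if n_kept == 1:
            break  # never empty the ensemble
        trial = chi(exp, (total - rdc_k) / (n_kept - 1))
        if trial < best:  # removal improves the fit: drop this snapshot
            keep[k] = False
            total = total - rdc_k
            n_kept -= 1
            best = trial
    return keep, best
```

Visiting every snapshot only once is what limits the selection; a repeated sweep until χ stops decreasing would fit the data more closely, which is exactly the overfitting the text guards against.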

The selection procedure is very simple, and it will certainly fail if the real (φ, ψ) states are not sampled in the MD simulations. However, for a protein fluctuating close to its native structure and only occasionally visiting nonnative states in an MD simulation, it becomes possible to tell whether those nonnative states are real. In such a simple case, if discarding the nonnative states improves the agreement between experimental and predicted quantities such as RDCs, it is very likely that the nonnative states in the MD simulation are not real. Independent validation using other experimental data is nevertheless important. In this work, the intraresidue and sequential HN–Hα RDCs and the intraresidue ³J(HN,Hα) couplings,³⁸ which are not included in the MD snapshot selection procedure, are used for this purpose.
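To illustrate this kind of independent check, the sketch below back-calculates ³J(HN,Hα) from φ angles for both the full and the filtered ensembles and compares each with experiment by rmsd. The Karplus coefficients are those of Vuister and Bax (1993), substituted here as an assumption because this section does not state which parametrization accompanies ref 38; all helper names are hypothetical.

```python
import numpy as np

# Karplus coefficients in Hz (Vuister & Bax, 1993); an assumption, not
# necessarily the parametrization used in this work.
A_KARPLUS, B_KARPLUS, C_KARPLUS = 6.51, -1.76, 1.60


def j_hnha(phi_deg):
    """3J(HN,Ha) = A cos^2(theta) + B cos(theta) + C, with theta = phi - 60 deg."""
    theta = np.radians(np.asarray(phi_deg, dtype=float) - 60.0)
    c = np.cos(theta)
    return A_KARPLUS * c * c + B_KARPLUS * c + C_KARPLUS


def rmsd(exp, pred):
    return float(np.sqrt(np.mean((np.asarray(exp) - np.asarray(pred)) ** 2)))


def compare_ensembles(phi, keep, j_exp):
    """rmsd to experiment of <3J> from all snapshots and from kept snapshots.

    phi:   (n_snapshots, n_residues) backbone phi angles in degrees
    keep:  (n_snapshots,) boolean mask from the RDC-based selection
    j_exp: (n_residues,) experimental couplings in Hz
    The coupling is averaged over snapshots, not evaluated at the mean phi.
    """
    j_all = j_hnha(phi).mean(axis=0)
    j_kept = j_hnha(phi[keep]).mean(axis=0)
    return rmsd(j_exp, j_all), rmsd(j_exp, j_kept)
```

Averaging J over snapshots, rather than computing J from the mean φ, matters because the Karplus curve is nonlinear, so a sparsely populated minor state can shift the ensemble-averaged coupling noticeably.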

A total of 0.55 μs of MD simulations were performed for another protein, ubiquitin, using a protocol similar to that for GB3. A 50 ns simulation was carried out starting from the X-ray structure 1ubq,³⁹ and the snapshots at 5, 10, 15, 20, 25, 30, 35, 40, 45, and 50 ns were each used as the starting point of another 50 ns simulation.

RESULTS AND DISCUSSION

Computational versus Experimental S². The computed average order parameters for the secondary structure regions, β1–β4 and α1, are listed in Table 1. The four regions other than β2 show quite uniform order parameters, averaging 0.88 ± 0.05, close to the experimental value of 0.89 ± 0.03. This good agreement demonstrates that, for well-defined regions, MD simulations produce average order parameters with good accuracy. However, when the site-specific order parameters are compared, rather substantial differences between the experimental and computed values are observed (Figure 1), notably for the regions G9–G14, A20–D22, and N37–G41, where the computed values are significantly lower. Two residues, K13 and G14, considerably decrease the average order parameter of the β2 region (K13–K19). When they are excluded, the average increases to 0.87, which is much closer to the corresponding experimental value of 0.88.

Since there are a total of 20 analyzed trajectories for GB3, plotting the time correlation function C(t) (eq 1) of each residue in every trajectory provides additional information (Figure S1, Supporting Information). The C(t) function of each residue appears to converge, except for Q2, G9–G14, A20, D22, D40–G41, and E56. For example, the C(t) of residue G9 diverges, with S² ranging from 0.54 to 0.86 (Figure 2A), whereas the experimental S² is 0.92. A similar behavior is seen for A20, with order parameters ranging from 0.41 to 0.83 (Figure 2B), all smaller than the experimental value of 0.88. C(t) typically drops at the second time point (20 ps) because of fast motion, after which it either remains at a plateau, yielding an order parameter close to the experimental value, or decays further and underestimates the experimental order parameter, as seen for the residues mentioned above (Figure S1, Supporting Information). This observation suggests that the continuous decay of the time correlation functions from 0 to 3.3 ns in some of the trajectories is due to nonnative states with free energies slightly above that of the native state, separated from it by barriers that can be crossed on the nanosecond time scale.

Table 1. Backbone N–H Order Parameters of Protein GB3 Determined from the MD Simulations for the Secondary Structure Fragments Compared to the Experimental Values

            β1 (Q2–N8)    β2 (K13–K19)   α1 (A23–D36)   β3 (V42–D46)   β4 (T51–T55)
S²_com ᵃ    0.89 ± 0.01   0.80 ± 0.10    0.90 ± 0.02    0.87 ± 0.02    0.90 ± 0.01
S²_exp ᵃ    0.89 ± 0.02   0.86 ± 0.06    0.89 ± 0.02    0.89 ± 0.01    0.90 ± 0.01

ᵃ The order parameters were calculated from N–H time correlation functions (see the Methods section for details) and averaged over the residues in individual secondary structure fragments. Residues without an experimental order parameter were excluded.

Figure 1. (A) Comparison of the backbone N–H site-specific order parameters S² derived from MD simulations (filled circles) and experimental values (empty circles). The order parameter S² of each residue was calculated from the time correlation function (eq 1). (B) Correlation plot between the two sets of order parameters. The solid line is the best-fit line y = 0.40x + 0.54, with correlation coefficient R = 0.80. Good agreement can be seen for rigid regions, but S² is much smaller than the experimental value for residues in the relatively flexible regions.

Since these MD simulations were started from different structures selected from a 100 ns simulation, we take a step back to examine these structures, which may hint at the cause of the divergent order parameters. The Cα root-mean-square deviation (rmsd) of Q2–E56 from the starting 2OED structure is smaller than 1.5 Å for most snapshots during the 100 ns run, suggesting that GB3 is overall a quite rigid protein (Figure 3). All the starting structures for the 20 MD production runs are within 1.4 Å of 2OED (17 of them within 1.0 Å; Figure 3). It is surprising that such small deviations can cause such large differences in the time correlation function of a residue like G9; this is likely a local structural effect, as supported by the different patterns of the S² histograms of G9 and A20 (Figure 2). The same phenomenon has been observed by Koller et al.,⁴⁰ where the second loop (residues 65–75) of hen egg white lysozyme displays large order parameter variations when MD simulations are started from different X-ray structures with distinct loop 2 conformations.

Figure 2. Time correlation functions of the N–H bond vectors of G9 (A) and A20 (B) calculated from the 20 MD trajectories, with the bold solid line corresponding to the average. The insets are histograms of the order parameters.

Figure 3. Calculated Cα rmsds (Q2–E56) from the reference structure 2OED during the 100 ns MD simulation. The snapshots at 5N ns (N = 1, 2, ..., 20) are selected as the starting structures for the 20 50-ns MD simulations. (Inset) Histogram of the starting-structure rmsds from 2OED. Note that none of the selected starting structures deviates by more than 1.4 Å from 2OED.

Observation of Nonnative States. The different dynamical properties of the residues mentioned above (Q2, G9–G14, A20, D22, D40–G41, and E56) in different trajectories imply that distinct regions of conformational space are explored in the MD simulations. To investigate this further, the backbone (φ, ψ) dihedral angles are plotted (Figures 4 and S2, Supporting Information). Instead of (φᵢ, ψᵢ), the pair (ψᵢ₋₁, φᵢ) is used as the coordinate system because it better describes the peptide plane of residue i. Two different dihedral distribution patterns are observed (Figures 4 and S2, Supporting Information). K10, T11, G14, and E56 have a single state (Figure S2, Supporting Information), whereas the other residues, including Q2, G9, L12, K13, A20, D22, and D40, display two states. The population of the minor state(s) ranges from 3% to 13% for these residues (Table 2), and they generally exhibit larger ψ and φ fluctuations than residues with a single state (Table 3). The dihedral distribution of G41 is more complex, with a major state (57% population, centered at (150°, 180°)) and two minor states (40% at (40°, −75°) and 3% at (−50°, 50°)). L12 and G41 exhibit the largest (ψ, φ) fluctuations, consistent with their being the two most flexible residues (Figure 1A).

Fitting to the Experimental RDCs. G9–G14. Unlike relaxation rates, which vary only moderately with protein dynamics and diffusion tensor anisotropy (D∥/D⊥ = 1.4),¹⁸ RDCs are very effective in restraining both structure and dynamics, and thus provide a better validation of the MD data. A total of 200 experimental backbone N–H, Cα–Hα, C′–Cα, and C′–N RDCs are available for the fragment G9–G14. The corresponding predicted values were calculated from the 5000 MD snapshots and correlated with the experimental ones (Figure 5). The calculated χ value of 1.52 Hz is larger than the experimental measurement error,¹⁷ which is less than 1.0 Hz, implying that the MD data contain noise. After fitting to the experimental RDCs, χ is reduced by 37% to 0.96 Hz, with 2162 of the 5000 snapshots retained. This fitting has a profound effect on the structure and dynamics of the fragment. For G9, the average (ψ, φ) changes from (71°, −95°) to (64°, −83°), while the corresponding fluctuation decreases from (31°, 37°) to (14°, 19°), suggesting that the change in dynamics is more significant than the change in average structure. The (ψ, φ) contour plot gives a detailed picture of how the conformer clusters change upon fitting. The minor state centered around (140°, 180°) is no longer present for the selected 2162 snapshots (Figure 4A and 4B). In fact, the total minor state population is reduced from 13.3% to 2.5% (Table 2), which is too small to show in Figure 4B, where the lowest contour level corresponds to a probability of 0.3%. Although the total population of 2.5% is higher than this contour level, the probability at each individual grid point around the minor state is too small to appear in the figure. Similar to G9, the (ψ, φ) fluctuations of L12 and K13 decrease significantly, from (39°, 43°) and (29°, 24°) to (21°, 27°) and (15°, 15°), respectively (Table 3), while the corresponding minor state populations decrease from 4.2% and 3.3% to 0.4% and 0.2%. This result demonstrates that the probability of visiting the minor state is overestimated in the MD simulation for these three residues.
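The contour maps in Figure 4 are histograms of (ψᵢ₋₁, φᵢ) on a 10° × 10° grid in which cells below a probability of 0.003 are not drawn. The sketch below builds such a histogram and shows, on synthetic data, how a 2.5% minor state spread over many cells can disappear from the plot while its total population stays measurable. The rectangular box used to define a state is a hypothetical choice for illustration; this section does not say how the state boundaries were drawn.

```python
import numpy as np


def dihedral_histogram(psi_prev, phi, bin_deg=10.0):
    """Normalized 2D histogram of (psi_{i-1}, phi_i) on a bin_deg x bin_deg grid."""
    edges = np.arange(-180.0, 180.0 + bin_deg, bin_deg)
    h, _, _ = np.histogram2d(psi_prev, phi, bins=[edges, edges])
    return h / h.sum()


def visible_cells(prob, threshold=0.003):
    """Grid cells that would appear in a contour plot whose lowest level is threshold."""
    return prob >= threshold


def box_population(psi_prev, phi, psi_range, phi_range):
    """Fraction of snapshots inside a rectangular (psi, phi) box.

    Hypothetical state definition; periodicity at +-180 deg is ignored here.
    """
    psi_prev = np.asarray(psi_prev, dtype=float)
    phi = np.asarray(phi, dtype=float)
    inside = (
        (psi_prev >= psi_range[0]) & (psi_prev < psi_range[1])
        & (phi >= phi_range[0]) & (phi < phi_range[1])
    )
    return float(inside.mean())
```

For example, 975 synthetic snapshots in one cell plus 25 snapshots scattered over 25 different cells give a minor-state population of 2.5%, yet each of those 25 cells holds only 0.1% and none reaches the 0.3% contour level.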

On the other hand, K10, T11, and G14 have only a single state, and the RDC fitting has a very minor effect on these residues, especially on their dynamics (Table 3). This suggests that the remaining 2162 snapshots only slightly alter the structural ensemble of these residues, which in turn confirms that the MD simulations sample the (ψ, φ) dihedral space well, at least for these three residues. These results indicate that the divergence of the N–H time correlation functions of residues G9–G14 (Figure S1, Supporting Information) is likely caused by visits to the minor states of G9, L12, and K13 in the simulations.

Figure 4. Backbone (ψᵢ₋₁, φᵢ) dihedral angle distribution contour plots for residues G9, L12, A20, D22, D40, and G41 (panels A, C, E, G, I, and K, respectively). A 10° × 10° grid was used, and the distribution of each residue was calculated from the 5000 MD snapshots of the 20 50-ns simulations. Panels B, D, F, H, J, and L are from the optimized MD snapshots after fitting to the experimental RDCs. Each contour line represents an equal probability of 0.003, and grid points with a probability below this threshold are not shown.

Table 2. Population of Minor State(s) from the Original and Optimized MD Snapshots after Fitting to the Experimental RDCs

residue          Q2      G9     L12    K13    A20    D22    D40    G41
Pop_org ᵃ (%)    12.8    13.3   4.2    3.3    8.9    10.3   8.2    42.4 ᵈ
Pop_fit ᵇ (%)    n/a ᶜ   2.5    0.4    0.2    0.2    1.3    0.5    15.9

ᵃ Population computed from the 20 50-ns MD trajectories. ᵇ Population computed after fitting the trajectories to the experimental RDCs. ᶜ The fitting for Q2 was not performed because too many experimental RDCs are missing for this residue. ᵈ The minor states of G41 include (ψ, φ) centered around (40°, −75°) and (−50°, 50°) (Figure 4).

Table 3. Averages and Fluctuations of the Selected Backbone ψᵢ₋₁ and φᵢ Angles in GB3 Derived from the Original and Fitted MD Snapshots

           original MD snapshots        fitted MD snapshots
residue    ψ (deg)      φ (deg)         ψ (deg)      φ (deg)
Q2         142 ± 21     −95 ± 21        n/a          n/a
G9         71 ± 31      −95 ± 37        64 ± 14      −83 ± 19
K10        174 ± 14     −69 ± 14        172 ± 12     −69 ± 13
T11        −28 ± 13     −127 ± 17       −34 ± 12     −120 ± 17
L12        −14 ± 39     −110 ± 43       −19 ± 21     −119 ± 27
K13        122 ± 26     −124 ± 24       126 ± 15     −125 ± 15
G14        144 ± 25     166 ± 31        141 ± 25     171 ± 30
A20        122 ± 31     −144 ± 26       130 ± 12     −150 ± 11
V21        161 ± 11     −88 ± 17        158 ± 10     −78 ± 11
D22        −19 ± 17     −142 ± 21       −23 ± 12     −150 ± 12
D40        123 ± 35     −129 ± 24       134 ± 15     −139 ± 16
G41        94 ± 56      −134 ± 59       112 ± 50     −150 ± 57
E56        129 ± 12     −95 ± 18        121 ± 9      −88 ± 18

A20–D22. A total of 112 experimental RDCs are available for this fragment. After fitting, χ is reduced from 2.64 to 1.90 Hz (Figure 5C and 5D), and 2205 snapshots remain. A20 and D22 show two-state behavior (Figure 4E and 4G), with minor state populations close to 10% for both residues when computed from the original snapshots (Table 2). The fitting reduces these populations to 0.2% and 1.3% for A20 and D22, respectively, and the corresponding (ψ, φ) fluctuations decrease from (31°, 26°) and (17°, 21°) to (12°, 11°) and (12°, 12°). Like K10, T11, and G14, V21 has only one (ψ, φ) state, centered around (160°, −90°) (Figure S2, Supporting Information), and its φ angle average and fluctuation change from −88° ± 17° to −78° ± 11°.

D40–G41. Fitting to the 112 experimentally available RDCs for this fragment decreases χ from 3.36 to 1.41 Hz and yields a better correlation with the remaining 2164 snapshots (Figure 5E and 5F). For D40, the average (ψ, φ) changes from (123°, −129°) to (134°, −139°), and the corresponding fluctuation decreases from (35°, 24°) to (15°, 16°). The total minor state population diminishes from 8.2% to 0.5% (Table 2), and the minor state disappears from the contour plot (Figure 4I and 4J). G41 behaves differently because of its multistate (ψ, φ) distribution: the minor state population around (40°, −75°) decreases from 39.9% to 10.3%, whereas that around (−50°, 50°) increases from 2.5% to 5.6% (Figure 4K and 4L), and the (ψ, φ) fluctuations drop only slightly.

E56. Fitting the 15 RDCs of this residue filters out 2788 MD snapshots, and the RDCs computed from the remaining 2212 snapshots agree much better with the experimental values, as shown by the reduction of χ from 2.58 to 1.04 Hz (Figure 5G and 5H). For E56, the structural change appears more pronounced than the change in dynamics: the average ψ decreases by 8° and the average φ increases by 7°, while the fluctuation drops by 3° for ψ and remains unchanged for φ.

In summary, fitting the backbone RDCs significantly reduces the populations of the minor states of residues displaying two-state behavior in the different fragments. The dynamics of these residues, measured by the (ψ, φ) fluctuations, decrease accordingly, and the major states sampled by the MD simulations overall predict RDCs that agree well with the experimental data. For residues displaying only a single state, the changes in the dihedral (ψ, φ) distribution are generally less dramatic.

As stated in the Methods section, the proposed trajectory filtering process is quite rudimentary. To validate it, quantities not involved in the filtering, namely ³J(HN,Hα), the intraresidue RDC(HNᵢ–Hαᵢ), and the sequential RDC(HNᵢ–Hαᵢ₋₁), are calculated for residues 9–14, 20–22, 40–41, and 56 using the original 5000 MD snapshots and the optimized ∼2200 snapshots separately, and compared with the experimental values (Figure 6). The rmsd between the experimental and predicted couplings is 1.66 Hz for the original MD snapshots and 1.12 Hz for the optimized MD data, suggesting that the filtering process proposed in this work is effective and that the populations of the identified minor states are overestimated in the 1.1 μs MD simulations.

Even though the minor state population of a two-state residue changes substantially after RDC fitting, the effect on the free energy of the minor state relative to the major state is less dramatic. For example, reducing the population of the G9 minor state from 13.3% to 2.5% is equivalent to raising its free energy by 1.1 kcal/mol at 300 K, since ΔΔG = RT ln[(0.133/0.867)/(0.025/0.975)] ≈ 0.596 kcal/mol × 1.79 ≈ 1.1 kcal/mol. This number is comparable to the rmsd of 1.3 kcal/mol between Amber99sb and high-level QM energies for the 51 stable Ala3 conformers that were used for the Amber99sb force field parametrization.⁶ In other words, the (ψ, φ) distribution error seen in this work is within the force field error. This statement holds only if the minor state populations estimated from the RDC fitting are accurate, which is not quite true when the populations approach zero. Unfortunately, it is difficult to estimate the error in the minor state populations because of the approximations used in the

Figure 5. Correlations between the experimental and the predicted RDCs of G9-G14, A20-D22, D40-G41, and E56. The predicted RDCs werecalculated from the 5000 original MD snapshots (panels A, C, E, and G) and ∼2200 optimized snapshots (B, D, F, and H). The improvement of thefitting is due to the filtering of the MD snapshots, as described in detail in the text.

3494 dx.doi.org/10.1021/jp108217z |J. Phys. Chem. B 2011, 115, 3488–3495

The Journal of Physical Chemistry B ARTICLE

RDC fitting procedure, the existence of the different minor statepopulations for different residues and the fact that the fittedsnapshots ensemble does not necessarily encapsulate the en-semble in reality. To simplify this problem, we created a set of200 perfect backbone N-H, CR-HR, C0-CR, and C0-NRDCsfrom the optimum 2162 snapshots for G9-G14 (with 2.5% G9minor state population) using the experimentally determinedalignment tensors. A total of 100 sets of 200 RDCs with randomnoises of 1.0 Hz were fitted by systematically varying the G9minor state population to find theminimum χ value. The averageof the minor state population is 2.5 ( 1.4%, in agreement withthe preset number. The 1.4% standard deviation should betreated as a lower bound of the true population fitting error forG9, considering the simplifications used in the error analysis.Though it is clear that the dramatic drop of the minor statepopulations for almost all the residues listed in Table 2 stronglyproves that MD trajectories are biased toward nonnative states,we cannot tell whether it is force field error or insufficient samplingthat causes the artifact.To see whether the two-state behavior persists in other systems,

a total of 0.55 μs MD simulations were performed for anothersmall protein, ubiquitin. Unlike protein GB3, essentially no dis-tinctiveminor state was observed in the ubiquitinMD trajectoriesfor all the loop residues (Figure S4, Supporting Information).Residue K11 displays the largest (ψ, j) fluctuation (Table 4),with a broad distribution, but no two-state behavior is seen. A fewresidues such as T7-K11, I36, and G53 exhibit fluctuations inthe backbone N-H bond vector time correlation functionssimilar to those in protein GB3, suggesting sufficiently longMD trajectories should be utilized to yield the converged orderparameters for these residues. A recent work byMarkwick et al.41

is closely related to our study. There, an accelerated moleculardynamics (AMD) method was employed to simulate ubiquitin,and the acceleration parameters were adjusted to best matchexperimental and computational quantities, such as RDCs and Jcouplings. The MD trajectory reweighing method employed byMarkwick is very different from the selection procedure usedhere, which is essentially a post-MD snapshot processing method.
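The contrast between trajectory reweighting and post-MD snapshot selection can be made concrete with a small sketch. The code below is not the authors' filtering procedure (the text only calls it rudimentary and defers details to the Methods section); it is a hypothetical greedy backward elimination that repeatedly discards the snapshot whose removal most lowers χ. Here χ is assumed to be the rms deviation between the experimental RDCs and the uniform ensemble average of per-snapshot predicted RDCs, and the per-snapshot RDCs are assumed to be precomputed with fixed alignment tensors. The names `chi` and `greedy_filter` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def chi(pred: Sequence[float], exp: Sequence[float]) -> float:
    """Assumed definition: rms deviation (Hz) between predicted and experimental RDCs."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / len(exp))


def greedy_filter(
    snap_rdcs: Sequence[Sequence[float]],
    exp_rdcs: Sequence[float],
    min_keep: int = 1,
) -> Tuple[List[int], float]:
    """Hypothetical greedy snapshot selection.

    snap_rdcs[k][j] is the RDC j back-calculated from snapshot k.
    Snapshots are removed one at a time, always the one whose removal lowers
    chi of the uniformly weighted ensemble average the most, until no single
    removal improves chi. Returns the kept indices and the final chi.
    """
    m = len(exp_rdcs)
    keep = set(range(len(snap_rdcs)))
    # Running sums over kept snapshots, so each candidate costs O(m)
    totals = [sum(row[j] for row in snap_rdcs) for j in range(m)]
    current = chi([t / len(keep) for t in totals], exp_rdcs)

    while len(keep) > min_keep:
        n = len(keep) - 1
        best_k, best_chi = None, current
        for k in sorted(keep):
            row = snap_rdcs[k]
            trial = [(totals[j] - row[j]) / n for j in range(m)]
            c = chi(trial, exp_rdcs)
            if c < best_chi:
                best_k, best_chi = k, c
        if best_k is None:  # no removal improves the fit
            break
        keep.remove(best_k)
        totals = [totals[j] - snap_rdcs[best_k][j] for j in range(m)]
        current = best_chi
    return sorted(keep), current
```

Each pass costs O(N·M) for N snapshots and M couplings, so the whole procedure is roughly quadratic in N. Because greedy removal can overfit the couplings it is trained on, quantities left out of the fit, such as the ³J(HN,Hα) and HN-Hα RDCs used for Figure 6, remain the appropriate check.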

It has been suggested that an improved sampling method such as AMD increases the agreement between experimental and computed data.42 It will be interesting to see whether a significant percentage of minor states persists when MD simulations of GB3 are run with this improved sampling method.
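The free-energy argument in the discussion (the G9 minor state dropping from 13.3% to 2.5%, equivalent to about 1.1 kcal/mol at 300 K) follows from a two-state Boltzmann relation, ΔG = -RT ln(p_minor/p_major). A minimal sketch of that arithmetic follows. It assumes a strict two-state model in which p_major = 1 - p_minor, and it uses the gas constant in kcal/(mol K); the function names are illustrative.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def two_state_dg(p_minor: float, temperature: float = 300.0) -> float:
    """Free energy (kcal/mol) of the minor state relative to the major state.

    Assumes exactly two states, so p_major = 1 - p_minor, and
    dG = -RT ln(p_minor / p_major).
    """
    if not 0.0 < p_minor < 1.0:
        raise ValueError("p_minor must lie strictly between 0 and 1")
    return -R_KCAL * temperature * math.log(p_minor / (1.0 - p_minor))


def ddg_from_populations(p_before: float, p_after: float,
                         temperature: float = 300.0) -> float:
    """Change in minor-state free energy when its population goes p_before -> p_after."""
    return two_state_dg(p_after, temperature) - two_state_dg(p_before, temperature)


if __name__ == "__main__":
    # G9 minor state: 13.3% in the raw MD ensemble, 2.5% after RDC fitting
    ddg = ddg_from_populations(0.133, 0.025)
    print(f"ddG = {ddg:.2f} kcal/mol")  # prints 1.07, i.e. ~1.1 kcal/mol
```

The logarithm is why a roughly fivefold population change translates into only about 1 kcal/mol, which is the scale of the 1.3 kcal/mol force-field rmsd quoted in the text.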

CONCLUDING REMARKS

A series of MD simulations was performed for protein GB3. The order parameters of the backbone N-H bond vectors were calculated and compared to the experimental values. The dynamics of rigid regions agree well with the experimental data but are overestimated for certain residues, primarily in the flexible regions that exhibit diverging time correlation functions in the MD data analysis. For several of these residues, including Q2, G9, L12, K13, A20, D22, and D40, two-state behavior was observed. After fitting to experimental RDCs, the populations of the minor states are significantly reduced, indicating that the protein conformers sampled in the MD simulations are biased. On the other hand, this two-state behavior was not seen in the ubiquitin MD simulations, suggesting that the artifact depends on the specific molecular system studied.
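The backbone N-H order parameters mentioned above are commonly estimated from an MD trajectory with the long-time plateau of the Lipari-Szabo correlation function (refs 23, 24), S² = (3/2) Σ_ij ⟨u_i u_j⟩² - 1/2, where u is the unit N-H vector and ⟨ ⟩ averages over frames. The sketch below evaluates that expression for a list of vectors. It assumes overall tumbling has already been removed by superimposing every frame on a reference structure, and it is not necessarily how S² was computed in this work; `order_parameter` is an illustrative name.

```python
import math
from typing import Iterable, Sequence


def order_parameter(vectors: Iterable[Sequence[float]]) -> float:
    """Lipari-Szabo S^2 from N-H vectors of a superimposed trajectory.

    S^2 = 3/2 * sum_{i,j} <u_i u_j>^2 - 1/2, with u the unit bond vector
    and <...> the average over frames.
    """
    second_moment = [[0.0] * 3 for _ in range(3)]
    n = 0
    for v in vectors:
        norm = math.sqrt(sum(c * c for c in v))
        u = [c / norm for c in v]
        # Accumulate the outer product u u^T
        for i in range(3):
            for j in range(3):
                second_moment[i][j] += u[i] * u[j]
        n += 1
    if n == 0:
        raise ValueError("need at least one vector")
    total = sum((second_moment[i][j] / n) ** 2
                for i in range(3) for j in range(3))
    return 1.5 * total - 0.5
```

A vector that never moves gives S² = 1, and an isotropic set of orientations gives S² = 0. The plateau formula only equals the order parameter extracted from a fitted correlation function when that function has actually converged, which is the convergence concern raised for the loop residues above.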

Table 4. Averages and Fluctuations of the Selected ψ(i-1) and φ(i) Angles in Ubiquitin Derived from the MD Snapshots and the X-ray Structure (PDB code 1ubq)

           MD snapshots              X-ray structure
residue    ψ (deg)     φ (deg)       ψ (deg)    φ (deg)
T7         135 ± 16    -100 ± 21     128        -99
L8         166 ± 11    -66 ± 10      171        -73
T9         -21 ± 13    -86 ± 18      -7         -101
G10        3 ± 17      84 ± 19       15         77
K11        8 ± 26      -95 ± 27      17         -96
V17        126 ± 11    -142 ± 11     121        -139
E18        160 ± 12    -107 ± 20     171        -120
P19        142 ± 16    -60 ± 9       145        -55
S20        -30 ± 12    -86 ± 14      -25        -80
D21        10 ± 16     -104 ± 25     -8         -71
T22        154 ± 11    -87 ± 14      148        -84
G35        -5 ± 11     76 ± 10       -6         81
I36        -13 ± 18    -74 ± 17      5          -80
A46        140 ± 10    49 ± 8        130        48
L50        144 ± 12    -102 ± 15     130        -80
E51        136 ± 18    -100 ± 22     138        -101
D52        144 ± 12    -68 ± 13      140        -49
G53        -35 ± 14    -79 ± 10      -42        -82
R54        4 ± 17      -113 ± 20     -9         -85
T55        153 ± 10    -85 ± 13      165        -104
L56        156 ± 8     -61 ± 9       165        -61
N60        -6 ± 10     53 ± 9        5          58
I61        43 ± 13     -83 ± 14      45         -89
Q62        131 ± 13    -105 ± 19     116        -103
K63        167 ± 10    -54 ± 10      170        -55
E64        147 ± 10    57 ± 7        143        67
S65        22 ± 12     -77 ± 15      19         -71
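Table 4 reports dihedral averages and fluctuations. Because ψ and φ are periodic, a plain arithmetic mean misbehaves for distributions that straddle ±180° (angles of 170° and -170° average to 0° rather than 180°). One common remedy is circular statistics, sketched below. The text does not state which convention was used for Table 4, so the circular mean and the circular standard deviation sqrt(-2 ln R) used here are an assumption, not the authors' procedure; `circular_mean_std` is an illustrative name.

```python
import math
from typing import Sequence, Tuple


def circular_mean_std(angles_deg: Sequence[float]) -> Tuple[float, float]:
    """Circular mean and circular standard deviation of angles, in degrees.

    The mean is the direction of the average unit vector; the spread is the
    circular standard deviation sqrt(-2 ln R), where R is the length of that
    average vector.
    """
    if not angles_deg:
        raise ValueError("need at least one angle")
    rad = [math.radians(a) for a in angles_deg]
    c = sum(math.cos(a) for a in rad) / len(rad)
    s = sum(math.sin(a) for a in rad) / len(rad)
    mean = math.degrees(math.atan2(s, c))
    # Clamp R to avoid log(0) for uniform samples and R > 1 from round-off
    r = min(max(math.hypot(c, s), 1e-12), 1.0)
    std = math.degrees(math.sqrt(-2.0 * math.log(r)))
    return mean, std
```

For narrow distributions far from the ±180° boundary, this circular standard deviation is close to the ordinary one, so values such as those in Table 4 would change little; the difference matters mainly for ψ or φ populations spread across the boundary.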

Figure 6. Correlation between experimental and computed data, including ³J(HN,Hα), the intraresidue RDC(HN(i)-Hα(i)), and the sequential RDC(HN(i)-Hα(i-1)) for residues 9-14, 20-22, 40-41, and 56. Filled symbols are values predicted from the 5000 MD snapshots (circles for ³J(HN,Hα), triangles for RDC(HN(i)-Hα(i)), and squares for RDC(HN(i)-Hα(i-1))), while empty symbols are the corresponding values calculated from the ∼2200 optimized snapshots. The rmsd is reduced from 1.66 to 1.12 Hz after trajectory optimization, demonstrating the effectiveness of the method.
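The population error analysis in the discussion (100 sets of 200 synthetic RDCs with 1.0 Hz noise, each refitted by scanning the G9 minor-state population for the minimum χ) can be mimicked with a toy model. The sketch below is a simplification for illustration only. It assumes Gaussian noise with σ = 1.0 Hz and assumes the predicted RDCs depend linearly on the minor-state population, D(p) = (1 - p)·D_major + p·D_minor, with invented per-state RDC values drawn at random instead of couplings back-calculated from MD snapshots and alignment tensors. The names `mix`, `fit_population`, and `noise_population_error` are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def mix(d_major: Sequence[float], d_minor: Sequence[float], p: float) -> List[float]:
    """Assumed linear two-state model: population-weighted RDCs."""
    return [(1.0 - p) * a + p * b for a, b in zip(d_major, d_minor)]


def fit_population(target, d_major, d_minor, grid) -> float:
    """Grid scan for the minor-state population that minimises chi."""
    delta = [b - a for a, b in zip(d_major, d_minor)]
    resid = [t - a for t, a in zip(target, d_major)]
    srr = sum(r * r for r in resid)
    srd = sum(r * d for r, d in zip(resid, delta))
    sdd = sum(d * d for d in delta)
    # N * chi^2(p) = srr - 2 p srd + p^2 sdd; minimising it minimises chi
    return min(grid, key=lambda p: srr - 2.0 * p * srd + p * p * sdd)


def noise_population_error(p_true: float = 0.025, n_rdc: int = 200,
                           n_sets: int = 100, noise: float = 1.0,
                           seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of populations refitted from noisy RDCs."""
    rng = random.Random(seed)
    # Invented per-state RDCs (Hz); real values would come from MD snapshots
    d_major = [rng.uniform(-20.0, 20.0) for _ in range(n_rdc)]
    d_minor = [rng.uniform(-20.0, 20.0) for _ in range(n_rdc)]
    perfect = mix(d_major, d_minor, p_true)
    grid = [i / 1000.0 for i in range(301)]  # 0% to 30% in 0.1% steps
    fits = []
    for _ in range(n_sets):
        noisy = [d + rng.gauss(0.0, noise) for d in perfect]
        fits.append(fit_population(noisy, d_major, d_minor, grid))
    return statistics.mean(fits), statistics.stdev(fits)
```

In this toy model every coupling differs between the two states, so it tends to give a narrower spread than the 1.4% quoted in the text, where only couplings sensitive to the G9 conformation carry information about the G9 population.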


ASSOCIATED CONTENT

Supporting Information. Two figures of N-H bond vector time correlation functions; three figures of the (ψ, φ) distribution contour plots. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author
*Phone: 86 532 80662792. Fax: 86 532 80662778. E-mail: [email protected].

ACKNOWLEDGMENT

We are thankful to the Supercomputing Center of the Chinese Academy of Sciences for providing the computer resources and time. This work was supported by the 100 Talent Project (grant no. Y07102110Q), the Director Innovation Foundation of the Qingdao Institute of Biomass Energy and Bioprocess Technology, CAS, and the Foundation for Outstanding Young Scientist in Shandong Province (no. BS2010NJ020).

REFERENCES

(1) Nederveen, A. J.; Bonvin, A. J. Chem. Theory Comput. 2005, 1, 363.
(2) Maragakis, P.; Lindorff-Larsen, K.; Eastwood, M. P.; Dror, R. O.; Klepeis, J. L.; Arkin, I. T.; Jensen, M. O.; Xu, H. F.; Trbovic, N.; Friesner, R. A.; Palmer, A. G.; Shaw, D. E. J. Phys. Chem. B 2008, 112, 6155.
(3) Bui, J. M.; Gsponer, J.; Vendruscolo, M.; Dobson, C. M. Biophys. J. 2009, 97, 2513.
(4) Lange, O. F.; van der Spoel, D.; de Groot, B. L. Biophys. J. 2010, 99, 647.
(5) Project, E.; Nachliel, E.; Gutman, M. J. Comput. Chem. 2010, 31, 1864.
(6) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Proteins Struct. Funct. Bioinf. 2006, 65, 712.
(7) Buck, M.; Bouguet-Bonnet, S.; Pastor, R. W.; MacKerell, A. D. Biophys. J. 2006, 90, L36.
(8) Soares, T. A.; Daura, X.; Oostenbrink, C.; Smith, L. J.; van Gunsteren, W. F. J. Biomol. NMR 2004, 30, 407.
(9) Showalter, S. A.; Bruschweiler, R. J. Am. Chem. Soc. 2007, 129, 4158.
(10) Wickstrom, L.; Okur, A.; Simmerling, C. Biophys. J. 2009, 97, 853.
(11) Showalter, S. A.; Johnson, E.; Rance, M.; Bruschweiler, R. J. Am. Chem. Soc. 2007, 129, 14146.
(12) Showalter, S. A.; Bruschweiler, R. J. Chem. Theory Comput. 2007, 3, 961.
(13) Gronenborn, A. M.; Filpula, D. R.; Essig, N. Z.; Achari, A.; Whitlow, M.; Wingfield, P. T.; Clore, G. M. Science 1991, 253, 657.
(14) Derrick, J. P.; Wigley, D. B. J. Mol. Biol. 1994, 243, 906.
(15) Stone, M. J.; Gupta, S.; Snyder, N.; Regan, L. J. Am. Chem. Soc. 2001, 123, 185.
(16) Meier, S.; Haussinger, D.; Jensen, P.; Rogowski, M.; Grzesiek, S. J. Am. Chem. Soc. 2003, 125, 44.
(17) Ulmer, T. S.; Ramirez, B. E.; Delaglio, F.; Bax, A. J. Am. Chem. Soc. 2003, 125, 9179.
(18) Hall, J. B.; Fushman, D. J. Biomol. NMR 2003, 27, 261.
(19) Bouvignies, G.; Bernado, P.; Meier, S.; Cho, K.; Grzesiek, S.; Bruschweiler, R.; Blackledge, M. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 13885.
(20) Yao, L.; Vogeli, B.; Torchia, D. A.; Bax, A. J. Phys. Chem. B 2008, 112, 6045.
(21) Loccisano, A. E.; Acevedo, O.; DeChancie, J.; Schulze, B. G.; Evanseck, J. D. J. Mol. Graphics Modell. 2004, 22, 369.
(22) Daggett, V. Curr. Opin. Struct. Biol. 2000, 10, 160.
(23) Lipari, G.; Szabo, A. J. Am. Chem. Soc. 1982, 104, 4546.
(24) Lipari, G.; Szabo, A. J. Am. Chem. Soc. 1982, 104, 4559.
(25) Case, D. A. Acc. Chem. Res. 2002, 35, 325.
(26) Yao, L. S.; Grishaev, A.; Cornilescu, G.; Bax, A. J. Am. Chem. Soc. 2010, 132, 4295.
(27) Hall, J. B.; Fushman, D. J. Am. Chem. Soc. 2006, 128, 7855.
(28) Yao, L. S.; Vogeli, B.; Ying, J. F.; Bax, A. J. Am. Chem. Soc. 2008, 130, 16518.
(29) Case, D. A. J. Biomol. NMR 1999, 15, 95.
(30) Yao, L. S.; Bax, A. J. Am. Chem. Soc. 2007, 129, 11326.
(31) Case, D. A.; Darden, T. A.; Cheatham, T. E., III; Simmerling, C. L.; Wang, J.; Duke, R. E.; Luo, R.; Crowley, M.; Walker, R. C.; Zhang, W.; Merz, K. M.; Wang, B.; Hayik, S.; Roitberg, A.; Seabra, G.; Kolossváry, I.; Wong, K. F.; Paesani, F.; Vanicek, J.; Wu, X.; Brozell, S. R.; Steinbrecher, T.; Gohlke, H.; Yang, L.; Tan, C.; Mongan, J.; Hornak, V.; Cui, G.; Mathews, D. H.; Seetin, M. G.; Sagui, C.; Babin, V.; Kollman, P. A. AMBER 10, University of California, San Francisco, 2008.
(32) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola, A.; Haak, J. R. J. Chem. Phys. 1984, 81, 3684.
(33) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. J. Chem. Phys. 1995, 103, 8577.
(34) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. J. Comput. Phys. 1977, 23, 327.
(35) Tolman, J. R. J. Am. Chem. Soc. 2002, 124, 12020.
(36) Tolman, J. R.; Ruan, K. Chem. Rev. 2006, 106, 1720.
(37) Bax, A.; Kontaxis, G.; Tjandra, N. In Nuclear Magnetic Resonance of Biological Macromolecules, Part B; Academic Press Inc.: San Diego, 2001; Vol. 339, p 127.
(38) Vogeli, B.; Yao, L. S.; Bax, A. J. Biomol. NMR 2008, 41, 17.
(39) Vijaykumar, S.; Bugg, C. E.; Cook, W. J. J. Mol. Biol. 1987, 194, 531.
(40) Koller, A. N.; Schwalbe, H.; Gohlke, H. Biophys. J. 2008, 95, L4.
(41) Markwick, P. R. L.; Bouvignies, G.; Salmon, L.; McCammon, J. A.; Nilges, M.; Blackledge, M. J. Am. Chem. Soc. 2009, 131, 16968.
(42) Markwick, P. R. L.; Bouvignies, G.; Blackledge, M. J. Am. Chem. Soc. 2007, 129, 4724.