

Journal of Biomolecular Structure & Dynamics, ISSN 0739-1102 Volume 27, Issue Number 6, (2010) Current Perspectives on Nucleosome Positioning ©Adenine Press (2010)

*Phone: 1-504-862-8391; Fax: 1-504-862-8392; E-mail: [email protected]

Yuriy V. Sereda Thomas C. Bishop*

Center for Computational Science

Tulane University

New Orleans, LA 70118, USA

Evaluation of Elastic Rod Models with Long Range Interactions for Predicting Nucleosome Stability

http://www.jbsdonline.com

Abstract

The ability of a dinucleotide-step based elastic-rod model of DNA to predict nucleosome binding free energies is investigated using four available sets of elastic parameters. We compare the predicted free energies to experimental values derived from nucleosome reconstitution experiments for 84 DNA sequences. Elastic parameters (conformation and stiffnesses) obtained from MD simulations are shown to be the most reliable predictors, as compared to those obtained from analysis of base-pair step melting temperatures, or from analysis of x-ray structures. We have also studied the effect of varying the folded conformation of nucleosomal DNA by means of our Fourier filtering knock-out and knock-in procedure. This study confirmed the above ranking of elastic parameters, and helped to reveal problems inherent in models using only a local elastic energy function. Long-range interactions were added to the elastic-rod model in an effort to improve its predictive ability. For this purpose a Debye-Huckel energy term with a single, homogeneous point charge per base pair was introduced. This term contains only three parameters: its weight relative to the elastic energy, the Debye screening length, and a minimum sequence distance for including pairwise interactions between charges. After optimization of these parameters, our Debye-Huckel term is attractive, and yields the same level of correlation with experiment (R = 0.75) as was achieved merely by varying the nucleosomal shape in the elastic-rod model. We suggest this result indicates a linker DNA - histone attraction or, possibly, entropic effects, that lead to a stabilization of a nucleosome away from the ends of DNA segments longer than 147 bp. Such effects are not accounted for by a localized elastic energy model.

Key words: Nucleosomal DNA; Elastic energy; Debye-Huckel electrostatics; Free energy; Correlation.

Introduction

We study the organization of DNA into nucleosomes, which is the primary stage of compaction for meters of DNA into a 3D chromatin structure that exists in the cell nucleus. The nucleosome structure contains a left-handed super helix of double-stranded DNA wrapped around a core of eight histones. Even 35 years after the initial discovery of the nucleosome core particle (1) and 12 years since its crystallization (2), questions remain about nucleosome energetics and the arrangement of nucleosomes into chromatin.

Recently, whole genome nucleosome positioning maps have become available (3 - 12). Thus, there is high demand for computationally inexpensive procedures to predict and understand nucleosome positioning and stability on genomic scales. Such analysis is complicated by the fact that even the most precise nucleosome positioning data are difficult to interpret, in part due to fuzziness of actual nucleosome positions, but also in part due to unknown biologic effects.

Open Access Article. The authors, the publisher, and the right holders grant the right to use, reproduce, and disseminate the work in digital form to all users.


There are numerous models of nucleosome positioning (13 - 19), but the model of Scipioni et al. (20) is one of the few models of nucleosome stability. This model was developed using 84 DNA sequences for which experimental values of binding free energy, ∆Gb, are known (14, 15, 20 - 32). This model achieves a correlation between predicted and experimental values of ∆Gb of R = 0.62 if only elastic energy is considered and R = 0.92 when an additional long range term is included. We prefer the in vitro data on binding free energy over data obtained for whole genome nucleosome positioning studies in vivo, because the latter include additional effects such as the concentration and identity of DNA-binding proteins and restrictions imposed by higher-order chromatin structure (33). To avoid these unknown biologic effects we focus our attention on nucleosome stability, as opposed to nucleosome positioning.

The first experimental studies of nucleosome stability were performed by Shrader and Crothers (21) using the method of competitive nucleosome reconstitution. Artificial DNA sequences containing segments consisting exclusively of A and T or G and C, separated by 2 bp, were shown to be able to mimic the binding energy of natural nucleosome positioning sequences. A TG pentamer containing CGG in-phase (separated by the DNA helical repeat distance, approximately 10 bp) with its complement GCC and TTA in-phase with TAA was shown to bind the histone octamer even more strongly than natural DNA. However, the modest binding strengths of natural sequences compared to those of artificial sequences (21, 22, 29, 30) are a biologic necessity that allows nucleosomes to move or be removed as required by various mechanisms. In (22), nucleosomes stabilized by short CTG repeat segments were associated with several human genetic diseases and are considered direct evidence of the important role of nucleosome stability for biologic mechanics. Sequence patterns with a 10 bp interval between preferred dinucleotide steps were first identified in bulk nucleosomal DNA by Trifonov (34). Statistical analysis of the nucleosomal DNA fragments from C. elegans has further established the general importance of 10 bp repeats of pyrimidines and purines, formulated qualitatively as a consensus nucleosomal DNA sequence pattern (35) and quantitatively as a table of probabilities for each of the 16 dinucleotide steps at each of the 10 locations on a given DNA helix repeat (36). The stability and positioning activity of different base pairs was tested by Fernandez and Anderson (37) using single base pair substitution of the permanganate sensitive TA step. They demonstrated that TA pairs at 10 bp intervals also have a strong positioning signal; see also (38, 39).

Simple models have been proposed to capture the structural and conformational properties that relate to the above observations. One of the first elastic models of DNA appeared in (40, 41) without twist and with the assumption of the independence of local bends on neighboring base-pairs. Simple formulae for the persistence length and the characteristic ratio were obtained and applied in (40). Potential energy calculations with empirical parameters for the bending of DNA were proposed to analyze DNA bending anisotropies without kinks (42), and with kinks (43).

Here we analyze the ability of an elastic-rod model of DNA to predict nucleosome stability using various sets of available elastic parameters. A schematic representation of the free energy cycle to be considered is presented in Figure 1. The figure illustrates how we derive the difference in binding free energies

$$\Delta\Delta G_b = \Delta G_{b2} - \Delta G_{b1} \equiv \Delta G_n - \Delta G_f \qquad [1]$$

Figure 1: Free Energy Cycle. The free energy cycle for two different sequences of DNA binding to a histone octamer. Different sequences of DNA (top and bottom) have different equilibrium conformations when free in solution. They also have an internal energy difference, ∆Gf. We assume the histone octamer (middle) is the same for all nucleosomes. The binding energy of DNA with histones is indicated by ∆Gb, and the free energy difference of the two nucleosomes by ∆Gn.


for two DNA sequences with the same histone octamer core (H2A-H2B)(H3-H4)₂(H2A-H2B). ∆Gn is the difference in free energy between the two nucleosomes, and ∆Gf is the difference in free energy between the two DNA molecules in solution. The free energies must account for the self energies of the DNA and the histones, the interaction energy between DNA and histones, and entropy. Assuming that the cores are equivalent, i.e., histones are not modified and the solvation of the nucleosome is independent of differences in solvation energy for different DNA basepairs, two components of the free energy will dominate: the histone-DNA interactions and the changes in DNA self energy associated with deforming DNA from its equilibrium conformation to its conformation in the nucleosome. The histone-DNA interactions and gross conformation of nucleosomal DNA are determined mostly by interactions between the DNA backbone and positively charged amino acids (44). Both are largely independent of DNA sequence. This sequence independence is suggested by the crystal structures, and is important for the biologic necessity of folding an arbitrary sequence into a nucleosome (45). However, differences in DNA conformation exist even in high resolution x-ray structures with nearly identical sequences (1). Sequence dependent variations in the detailed but not the gross structure of the nucleosome super helix were demonstrated to enhance the predictive ability of a recent model of nucleosome positioning (19).

Thus only as a first order approximation can we expect the conformation of nucleosomal DNA and its interactions with the histone core to be conserved. Even with such simplifying assumptions the conformational dependence of the DNA self-energy is difficult to predict because of the complex material properties of DNA and the fact that DNA is highly charged. Nonetheless, if the intrinsic conformational properties of a given DNA sequence match those of the nucleosome (i.e., the DNA is pre-bent or pre-sheared) or if the sequence exhibits the appropriate flexibilities, then the energy penalty will be lower than for a sequence which does not possess such characteristics.

Here, we investigate the applicability of an elastic-rod DNA model for the description of the free energy of nucleosome reconstitution for various DNA sequences. This model allows a high-throughput study of nucleosomal positioning and stability.

Methods

Two representations of DNA are used in our energy calculations, an all atom representation and a coarse-grained representation. In all cases the energies depend on the conformation of the DNA and the parameterization of the energy functions. The all atom model energy, Emm, utilizes a standard molecular mechanics force field. The coarse-grain model includes an elastic, Ee, and an electrostatic, Edh, component. The latter accounts for the charged nature of DNA with a Debye-Huckel term. Below are details of how DNA conformation is represented in a local coordinate reference system using the so-called DNA helical parameters. This representation is used in the elastic energy function. The electrostatic energy and molecular mechanics energies require a Cartesian coordinate representation of DNA. The conversion between local and Cartesian representations is reversible and is extremely fast using the algorithm in (46). A number of software tools are readily available for this purpose (47 - 49). Below we describe these two representations of DNA, our Fourier filtering, and the energy functions employed. We conclude this section by describing how to determine the correlation between these energies and experimental values of ∆∆Gb.

DNA Conformation

For the coarse-grain model of DNA the base pairs are treated as rigid planar bodies. Their relative positions are expressed using the six inter-basepair helical parameters: three translational (shift, slide, and rise) and three rotational (tilt, roll, and twist) (47, 50). We disregard variations of intra-basepair helical parameters as these contribute little to


the gross shape of nucleosomal DNA (51). The equilibrium conformation of DNA free in solution is described by a list of helix parameters denoted HP0. The size of the list is 6 × nbps, where nbps denotes the number of basepair steps (one less than the number of basepairs). Assuming only nearest neighbor effects, there are only 16 values found in any HP0, one for each type of step: a = {1,2,...,16} corresponding to AA through TT. If directional symmetry is enforced, i.e., listing HP0 from 5′ to 3′ is equivalent to a 3′ to 5′ listing, then there are only 10 unique values in any HP0, corresponding to the 10 unique steps. Similarly, the conformation of nucleosomal DNA is denoted by the list HP, has size 6 × 146, and can be extracted from any of the x-ray structures. This list is regarded to be sequence independent since the conformation of nucleosomal DNA is similar in all available nucleosome structures in the protein databank (www.rcsb.org). We utilize an HP list extracted from the highest resolution structure, 1kx5 (52), with the analysis package 3DNA (47). However, it is important to realize that for the nearly 30 crystal structures available in the protein databank the DNA sequence is almost identical. For 1kx5 the sequence is NCP147. Yet differences in the HP lists have been identified by our Fourier filtering procedure (51). In the current study we use this filtering strategy to systematically vary the conformation of the nucleosome.
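As a minimal sketch of how an HP0 list might be assembled under the nearest-neighbor assumption, the Python fragment below maps each basepair step of a sequence to one of the 16 step types and looks up six helical parameters per step. The function name `build_hp0`, the alphabetical ordering of step types, and the placeholder table are illustrative assumptions, not code or values from this study; a real table would come from one of the elastic parameter sets.

```python
import itertools
import numpy as np

BASES = "ACGT"
# The 16 dinucleotide step types, ordered AA through TT (assumed ordering).
STEPS = ["".join(pair) for pair in itertools.product(BASES, repeat=2)]
STEP_INDEX = {step: k for k, step in enumerate(STEPS)}

def build_hp0(sequence, step_table):
    """Return an (nbps, 6) array of equilibrium helical parameters.

    step_table has shape (16, 6): one row per step type with columns
    (shift, slide, rise, tilt, roll, twist).
    """
    seq = sequence.upper()
    # A sequence of N basepairs has N - 1 basepair steps.
    step_types = [STEP_INDEX[seq[i:i + 2]] for i in range(len(seq) - 1)]
    return np.asarray(step_table)[step_types]

# Placeholder table, NOT a real parameter set: every step gets the same
# idealized values (rise in Angstrom, twist in degrees).
placeholder_table = np.tile([0.0, 0.0, 3.4, 0.0, 0.0, 36.0], (16, 1))

hp0 = build_hp0("ACGTTGCA", placeholder_table)
print(hp0.shape)  # 8 basepairs give 7 steps, each with 6 parameters
```

With a sequence-dependent table, rows of `hp0` would differ from step to step; the nucleosomal list HP has the same layout but is taken from a structure rather than from the sequence.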

Given a list of helical parameters, e.g., HP, we can represent it with a finite Fourier series:

$$\mathrm{HP}(s) = \sum_{k=0}^{nbps/2} A(k)\, e^{-2\pi i k s / nbps} \qquad [2]$$

where A is an array of six complex amplitudes, one for each of the helix parameters; s is the position along the list, ranging from 0 to nbps; and k is the wavenumber. For our nucleosomes there are 74 possible wavenumbers: 73 represent wavelengths, nbps/k, ranging from nbps to 2 bp, and k = 0 corresponds to the average value of the list.

Given such a representation we define a Fourier knock-out, KOj, as the list of helical parameters from which one Fourier component has been eliminated and a Fourier knock-in, KIl, as the list of helical parameters obtained using a subset, denoted l, of all possible Fourier components:

$$\mathrm{KO}_j(s) = \sum_{k \neq j}^{nbps/2} A(k)\, e^{-2\pi i k s / nbps} = \mathrm{HP}(s) - A(j)\, e^{-2\pi i j s / nbps}, \qquad \mathrm{KI}_l(s) = \sum_{k \in l} A(k)\, e^{-2\pi i k s / nbps}. \qquad [3]$$

As indicated in Eq. [3], KOj has all variations associated with wavenumber j in the helical parameter data removed. The wavenumbers included in KIl can be any subset of the possible wavenumbers. If all wavenumbers are included in the list then KI74 = HP and a Cartesian model of DNA built from KI74 will be nearly equivalent to that of the initial nucleosomal DNA.
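The knock-out and knock-in operations can be sketched with a discrete Fourier transform. The fragment below is an illustration only: it uses NumPy's real FFT, whose sign and normalization conventions may differ from those of Eq. [2], and it applies the filter to a synthetic random array standing in for a real HP list. The function names `knock_out` and `knock_in` are hypothetical.

```python
import numpy as np

def knock_out(hp, j):
    """Remove wavenumber j from every helical-parameter column of hp."""
    amps = np.fft.rfft(hp, axis=0)      # one row of amplitudes per wavenumber
    amps[j] = 0.0
    return np.fft.irfft(amps, n=hp.shape[0], axis=0)

def knock_in(hp, wavenumbers):
    """Rebuild hp from only the listed wavenumbers."""
    amps = np.fft.rfft(hp, axis=0)
    kept = np.zeros_like(amps)
    idx = list(wavenumbers)
    kept[idx] = amps[idx]
    return np.fft.irfft(kept, n=hp.shape[0], axis=0)

rng = np.random.default_rng(0)
hp = rng.normal(size=(146, 6))          # synthetic stand-in for a 6 x 146 HP list

n_waves = np.fft.rfft(hp, axis=0).shape[0]
print(n_waves)                          # 146 // 2 + 1 wavenumbers, k = 0 included

full = knock_in(hp, range(n_waves))     # all wavenumbers: recovers the input
print(np.allclose(full, hp))
```

Because the transform is linear, knocking out wavenumber j is the same as subtracting the single-component knock-in for j, which mirrors the second equality in Eq. [3].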

In a previous study (51) we devised a two stage filtering strategy that allowed us to establish the order of importance of each of the Fourier components in describing the nucleosome superhelix. We thus obtained a series of KIl containing from 1 to 74 wavenumbers, which correspond to a series of DNA models that monotonically converge from a straight DNA to that observed in the nucleosome. We demonstrated that only 12 Fourier components are necessary and sufficient to achieve a high resolution representation of the nucleosome super helix. We label this model KI12 and assign it special attention since it represents a model of the nucleosome superhelix that is within 3 Å RMSD of the highest resolution x-ray structure and for which there are no unnecessary distortions of the DNA. KI12 is thus a smoothed representation of the nucleosome super helix that is within thermal motion (3 Å) of the x-ray structure.

We expect that the same 12 wavenumbers will be crucial for a correct description of energy and that the elastic energy will rapidly converge with an increasing number


of knock-ins since the RMSD of the reconstructed DNA rapidly converges as more knock-ins are added.

To generate the HP list corresponding to the ideal super helix employed by Anselmi et al. (53) we use a torsion-helix representation parameterized to match the stated pitch, radius and torsion, as opposed to the shear-helix; see (51) for further details. In order to convert a list of helix parameters to a one-point per basepair model we use the algorithm in (46), and for conversion to all atom models we utilize 3DNA (47). In the one-point model each point represents the centroid of a basepair. This point is used in our determination of the electrostatic energy for the coarse-grained model. Visualization and additional analysis are performed using VMD software (54) and the Virtual DNA Viewer plug-in (49).

Molecular Mechanics Energy Model, Emm

All molecular mechanics (MM) energy calculations utilized the molecular dynamics program package NAMD 2.6 (55) and the Amber force field parm99 (56) with Barcelona corrections parmbsc0 (57). All systems were modelled in vacuum and without histones; thus, by molecular mechanics standards, these are to be considered crude calculations. In all cases significant external restraints were imposed that were designed to maintain the gross structure of the system, and only a short minimization procedure was employed. Two types of systems were created: small systems containing only individual dinucleotide steps, and large systems containing 147 bp corresponding to the sequence of DNA found in the x-ray structure 1kx5. The molecular mechanics energies can be grouped into non-bonded long range energies (van der Waals and electrostatics) and bonded local energies (bonds, angles and torsions).

The small systems were designed to enable us to characterize the elastic parameters associated with each of the 16 possible dinucleotide steps at each of the 146 step positions within a nucleosome. For each base pair position we created all atom models of each of the 16 possible dinucleotide steps using the helical parameters obtained from PDB entry 1kx5 and also from its smoothed variant KI12. In total, 146 × 16 × 2 = 4672 systems were created. This study was conducted to determine whether such crude, yet cheap, molecular mechanics simulations of DNA in vacuum might correlate with the experimental free energies and to determine which, if any, of the molecular mechanics energies correlate with the elastic energies.

The large systems were designed to enable us to investigate the energy landscape associated with the folding of DNA as determined using our Fourier knock-out and knock-in models. A total of 74 × 2 = 148 systems, all with the same 147 bp long sequence, were created. This study was conducted to determine whether the elastic and molecular mechanics energies exhibit similar convergence properties as the knock-in conformations converge to the crystal structure. Since the geometry converges with only 12 Fourier modes our expectation was that both the elastic and molecular energies would also converge to their respective crystal-structure values after inclusion of approximately the same number of Fourier knock-ins. By comparing individual terms in the molecular mechanics energies to the elastic rod model we expect to gain insights as to the suitability of a linear elastic approximation and the role of long range versus local interactions.

Elastic Energy Model, Ee

Our elastic energy model of DNA is a nearest-neighbor approximation that employs the linear elastic approximation:

$$E_e(s) = \frac{1}{2} \sum_{i=s}^{s+nbps-1} \left[\mathrm{HP}(p) - \mathrm{HP}_0(a_i)\right]^{T} K(a_i) \left[\mathrm{HP}(p) - \mathrm{HP}_0(a_i)\right] \qquad [4]$$


Here s denotes the starting position of the nucleosome footprint in a sequence of length L, p = i - s + 1 is the position within the nucleosome, a_i corresponds to the type of basepair step at position i, and K is a 6 × 6 matrix representing sequence specific stiffnesses. The lists HP and HP0 represent the conformation of nucleosomal and free DNA, respectively, in terms of the helical parameters as described earlier. Given this expression the energy of DNA free in solution is by definition zero.
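A minimal sketch of how the quadratic form in Eq. [4] could be evaluated for one footprint position is given below. The array layout (a 16 × 6 table for HP0 and a 16 × 6 × 6 table for K, indexed by step type), the 0-based start index, and the function name `elastic_energy` are assumptions made for illustration; the toy inputs are not real elastic parameters.

```python
import numpy as np

def elastic_energy(hp, hp0_table, k_table, step_types, s):
    """Elastic energy of a footprint that starts at step index s (0-based).

    hp         : (nbps, 6) nucleosomal helical parameters, HP(p)
    hp0_table  : (16, 6) equilibrium parameters per step type, HP0(a)
    k_table    : (16, 6, 6) stiffness matrix per step type, K(a)
    step_types : integer step type a_i for every step of the full sequence
    """
    nbps = hp.shape[0]
    types = np.asarray(step_types)[s:s + nbps]
    if len(types) < nbps:
        raise ValueError("footprint runs past the end of the sequence")
    dev = hp - hp0_table[types]          # deviation from equilibrium, per step
    stiff = k_table[types]               # one 6 x 6 matrix per step
    # Sum over steps of 1/2 * dev^T K dev.
    return 0.5 * np.einsum("pi,pij,pj->", dev, stiff, dev)

# Toy inputs: zero equilibrium values and identity stiffness for all step types.
hp0_table = np.zeros((16, 6))
k_table = np.tile(np.eye(6), (16, 1, 1))
hp = np.ones((3, 6))                     # a 3-step "footprint"
energy = elastic_energy(hp, hp0_table, k_table, [0, 1, 2, 3, 4], s=1)
print(energy)
```

If `hp` equals the equilibrium values for every step, the function returns zero, consistent with the statement that the energy of free DNA is zero by definition.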

The stiffness, K, describes the energetic couplings between the helical parameters. By definition, each stiffness matrix is symmetric. Additional symmetries are related to the equivalence of the elastic energy Eq. [4] for the six pairs of complementary basepair steps and four self-complementary basepair steps CG, GC, AT, and TA. The latter must have shift = tilt = 0, and force constants K_ij = 0 for the coupled terms which change sign upon strand reversal. For additional symmetry restrictions see (58).

In general, Ee is a function of: HP, the nucleosome conformation; HP0, the free DNA conformation; and K, the stiffnesses. HP0 and K are usually derived simultaneously from an experiment or theory while HP is independent. We thus refer to HP0 and K as a set of elastic parameters. A primary goal of this study is to evaluate different sets of elastic parameters, labeled ABC, OP, OB, and SA. The ABC set is derived from molecular mechanics calculations by the Ascona B-DNA consortium (59). This set does not satisfy the symmetry conditions required for a proper description of DNA but in all cases the symmetry violations are small quantities. The OP and OB sets are derived from analysis of x-ray structures (60). OB utilizes only x-ray structures of B-form DNA while OP includes structures of protein-DNA complexes. Both of these sets satisfy the required symmetries.

The elastic model in (53) employs a different energy functional than the one described above. This model is based only on curvature, i.e., it is a shear-free model with curvature calculated according to the algorithm in (61). Nonetheless, a parameter set that contains only the roll, tilt, and twist components in Eq. [4] can be obtained from their model. This parameter set is labelled SA. The stiffnesses in this set are derived from observations of DNA melting and therefore represent an entirely different physical basis for determining stiffnesses than employed in the other sets.

We investigate all three components of the elastic model by systematically varying HP using our Fourier filtering strategy, by combining the elastic parameters HP0 and K from the different sources, and by considering the symmetrized values of K.

Electrostatic Energy Model, Edh

Since DNA is highly charged, we assign homogenous point charges, qi, to the centroid of each basepair and utilize a Debye-Huckel electrostatic energy model to account for interactions that our nearest neighbor elastic rod model does not capture:

$$E_{dh} = \frac{1}{4\pi\varepsilon_0\varepsilon} \sum_{i} \sum_{j \ge i+n} \frac{q_i q_j}{r_{ij}} \exp\!\left(-\frac{r_{ij}}{\lambda}\right) \qquad [5]$$

We create two different implementations by changing the range of base pairs included in the summations: a simple model and a complete model. In the simple model the summations in Eq. [5] only extend over the footprint of the nucleosome. In the complete model the summations extend over the entire length of the DNA under consideration (see additional details below). The model has three free parameters: the Debye screening length λ, the magnitude of qi, and n. Since qi is constant it serves as a scaling factor that adjusts the weight of Edh relative to Ee. It also contains the dielectric constant ε and a dimensional factor e²/(4πε₀) = 331.841 kcal/mol


when the energy is measured in kcal/mol and distances in Å. The Debye screening length, λ, formally accounts for the concentration of ion species in solution. The number of neighboring basepairs to exclude from the energy calculation is determined by n. The reason for excluding some number of neighboring basepairs from the charge-charge interaction is that the elastic energy term already accounts for the local charge-charge interactions when DNA is near its equilibrium conformation. However, when DNA is significantly distorted from equilibrium or achieves self-contact between basepairs that are far separated in sequence space, the elastic rod model does not capture these non-local interactions. It is unclear even for a dinucleotide model what n should be since the elastic parameters account for local electrostatic effects.
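The pairwise sum in Eq. [5] can be sketched as follows for a one-point-per-basepair model with a single homogeneous charge. This is an illustrative fragment, not the authors' program: the function name `debye_huckel_energy`, the default dielectric constant, and the straight-rod test coordinates are assumptions, and only the prefactor 331.841 is taken from the text.

```python
import numpy as np

COULOMB = 331.841  # e^2 / (4 pi eps0), for energies in kcal/mol and distances in Angstrom

def debye_huckel_energy(xyz, q, lam, n, eps=80.0):
    """Screened Coulomb energy over all pairs (i, j) with j >= i + n.

    xyz : (N, 3) basepair centroids in Angstrom
    q   : homogeneous point charge per basepair (units of e)
    lam : Debye screening length in Angstrom
    n   : minimum sequence separation of interacting pairs (n >= 1)
    eps : dielectric constant (the default of 80 is an assumed, water-like value)
    """
    xyz = np.asarray(xyz, dtype=float)
    i, j = np.triu_indices(len(xyz), k=n)        # index pairs with j - i >= n
    r = np.linalg.norm(xyz[i] - xyz[j], axis=1)  # pair distances r_ij
    return COULOMB / eps * np.sum(q * q / r * np.exp(-r / lam))

# Straight rod of 20 points spaced 3.4 Angstrom apart (idealized geometry).
rod = np.column_stack([np.zeros(20), np.zeros(20), 3.4 * np.arange(20)])
print(debye_huckel_energy(rod, q=1.0, lam=10.0, n=3))
```

Raising n drops the closest pairs from the sum, which is how the exclusion of neighbors already described by the elastic term enters; the sign of the overall weight is left to the optimization described later.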

In the simple model only 147 bp segments are included in Edh regardless of the length of the DNA sequence. Thus for a sequence of DNA of length L there will be L – nbps possible values of Edh associated with the Cartesian coordinate representation of a given HP0, one for each nbps long subsequence. Edh for the nucleosome is constant for any given HP and cancels out in all free energy calculations for a given choice of HP. In the simple model the sequence dependence of Edh is entirely dependent on the conformation of free DNA, HP0. This model is designed to account for length scales beyond nearest-neighbor but it does not account for long-range interactions arising from DNA self-interaction or between regions of linker DNA as it enters and exits the nucleosome. The simple model was motivated by the averaging procedures utilized in (53). In this model we varied only two parameters: the relative weight a and the screening length λ.

In the complete model the summations in Edh extend over the entire length of DNA, L. The complete model thus includes long range sequence-dependent conformational effects associated with free DNA, as well as contributions between linker DNA entering and exiting the nucleosome. In the complete model both the free DNA conformation and the nucleosomal DNA conformation contribute to ∆G. The value of Edh for free DNA has a single value for any given sequence and depends on the choice of HP0. The value of Edh for the nucleosome conformation includes linker-linker interactions and assumes L–nbps values. It is strongly dependent upon the location of the nucleosome along L, the conformation of the free linkers and the conformation of the nucleosome itself. If the nucleosome is located at one end of L there is only a single entry or exit linker segment, not both, and the linker-linker interactions become zero by definition. We assume the linker conformation to be the same as the free DNA conformation; thus linker self-interaction contributions to Edh cancel in the free energy analysis. If the nucleosome is centrally located on L each linker is (L–nbps)/2 basepairs long. The linker-linker interactions are then maximal and depend on the conformation of the nucleosome and the free DNA. If the interactions are attractive the result is a bias of nucleosome positioning towards the central region. If the interactions are repulsive, as we expect for linker-linker interactions, the bias is away from the central region.

Since the complete model is the most computationally expensive, we utilized only the HP obtained from 1kx5 and the ABC set of elastic parameters because this choice produced the highest correlation between elastic-rod and experimental values of ∆∆G. For this choice of elastic parameters we optimized the unknown λ, n, and qi. We systematically vary the values of each of the free parameters and observe the correlation between the model predictions and the experimental free energy values. The computational cost of the elastic plus complete electrostatic model is as follows. We consider 84 DNA sequences, (157 to 459)²/2 pairs of electrostatic interactions per sequence, and 2 states (initial and final), with 15 to 313 possible nucleosome positions per sequence. This yields approximately 2.46 × 10^8 pairwise electrostatic energy calculations for each fixed value of the three free parameters a, λ, n, in Eq. [5]. Optimization of a requires around 200 samples for a given n and λ. To draw reliable conclusions, we considered 30 representative values of λ and 20 values


of n. Our programs for computing the correlation pre-compute and store the independent energy terms to achieve a fast optimization of the weight parameters in their linear combinations.

Comparison to Experiment

To compare the above energy values to ∆∆Gb we first combine the elastic and electrostatic energies:

$$E_i = E_e(i) + E_{dh}(i), \qquad [6]$$

where i represents the L–nbps potential locations of the nucleosome containing nbps basepair steps on a length of DNA, L. We then calculate the Boltzmann average of the differences between free and nucleosome conformations, ∆Ei, over all possible locations i:

$$\Delta G = -RT \ln(Z), \qquad Z = \sum_{i=1}^{L-nbps} \exp\!\left(-\frac{\Delta E_i}{RT}\right) \qquad [7]$$

The correlation between the resulting values of ∆G and experimental values is determined by means of the Pearson product-moment correlation coefficient (see Table I). For these calculations it is convenient to convert the energy according to 1 kcal/mol = 503.219 R [K], where R is the gas constant, so that at T = 300 K we use 1 kcal/mol = 1.677 RT. The collection of experimental ∆∆Gb values was taken from Ref. (20) in order to rank the elastic model SA together with the other three elastic models ABC, OB, and OP in a consistent way. The range of ∆∆Gb is [–2.2, 3.8] kcal/mol relative to the ∆∆Gb of the TG pentamer sequence (21, 24). Thus the maximal difference in free energy between various sequences is 6 kcal/mol ≈ 10.06 RT, and at T = 300 K the corresponding relative histone protein binding affinity is P = exp(–10.06) ≈ 4.3 × 10⁻⁵.
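The unit conversion and the resulting affinity ratio can be checked with a few lines. This is only a sketch; the helper names `kcal_to_rt` and `relative_affinity` are hypothetical, and the constant is the value quoted in the text.

```python
import math

KELVIN_PER_KCAL_MOL = 503.219  # (1 kcal/mol) / R in kelvin, as quoted in the text

def kcal_to_rt(energy_kcal, temperature=300.0):
    """Convert an energy in kcal/mol to units of RT at the given temperature."""
    return energy_kcal * KELVIN_PER_KCAL_MOL / temperature

def relative_affinity(delta_g_kcal, temperature=300.0):
    """Boltzmann factor exp(-dG/RT) for a free energy difference in kcal/mol."""
    return math.exp(-kcal_to_rt(delta_g_kcal, temperature))
```

For example, `kcal_to_rt(6.0)` evaluates the 6 kcal/mol spread of the experimental data in RT units, and `relative_affinity(6.0)` gives the corresponding ratio of binding affinities.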

The energy required to fold DNA into a nucleosome at a particular location i is given by the difference between the final energy of the folded DNA, En, and the initial free DNA energy, Ef. Nucleosome energies ∆Ei are weighted by the relative probabilities Pi for different nucleosome locations,

$$\Delta E_i = E_{n,i} - E_{f,i}, \qquad P_i = \frac{1}{Z} \exp\left(-\frac{\Delta E_i}{RT}\right) \qquad [8]$$

The product Pi ∆Ei gives the statistical nucleosome positioning score. It quantifies the significance of the deviation of the energy score at a given position i of the nucleosomal template of nbps+1 bp on the DNA sequence from the mean score, i.e., the partition function Z, which accounts for all possible states of a nucleosome on a mononucleosomal DNA fragment.
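A minimal sketch of the position weights and the positioning score is given below. It assumes the folding energies ∆Ei are already expressed in RT units; the function names are hypothetical. The minimum energy is subtracted before exponentiation purely for numerical stability, which leaves the probabilities unchanged.

```python
import math

def position_probabilities(delta_e_rt):
    """Boltzmann probabilities P_i = exp(-dE_i) / Z for energies in RT units."""
    e_min = min(delta_e_rt)
    weights = [math.exp(-(e - e_min)) for e in delta_e_rt]
    z = sum(weights)  # partition function of the shifted energies
    return [w / z for w in weights]

def positioning_scores(delta_e_rt):
    """Per-position score P_i * dE_i."""
    probs = position_probabilities(delta_e_rt)
    return [p * e for p, e in zip(probs, delta_e_rt)]
```

The position with the lowest folding energy receives the largest probability, and adding a constant to all ∆Ei does not change the Pi.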

Table I. Correlation of theoretical and experimental free energies.

Energy      Model                 1kx5   KO14   KI12   SA     Optimal
Emm         MM                    .47    -      .26    -      -
Ee          ABC                   .65    .74    .59    .48    .74
Ee          OB                    .55    .53    .60    .55    .60
Ee          OP                    .33    .29    .27    .34    .40
Ee          SA                    -      -      -      .62    -
Ee + Edh    ABC + DH simple       .69    .76    .65    .56    .76
Ee + Edh    OB + DH simple        .55    .53    .60    .56    .60
Ee + Edh    OP + DH simple        .53    .47    .41    .34    .53
Ee + Edh    DH complete           .43    -      -      -      .60
Ee + Edh    ABC + DH complete     .75    -      -      -      -


Results

Comparison of Molecular Mechanics and Elastic Rod Energies

In this section we directly compare our all-atom molecular mechanics, Emm, and elastic rod, Ee, based models to each other and to experimentally determined free energies ∆∆Gb. Though the in vacuo ∆Gmm free energies have poor correlation (R = 0.47) with experimental values for the 84 sequences, we consider this a surprisingly good result given the expected difficulties with our cheap Emm calculations. This observation also provides a metric for how much we can learn from comparing Emm and Ee. Naïve application of the elastic rod model using the ABC set of elastic parameters and nucleosome geometry 1kx5 provides a correlation of 0.65 with the experimental values. The total elastic energy obtained for sequence NCP147, the ABC parameters, and the 1kx5 nucleosome geometry is 417 kcal/(mol·nuc) or 4.8 kbT/bps. If KI12 is used for the nucleosome geometry, Ee is 151 kcal/(mol·nuc) or 1.7 kbT/bps. These results agree well with physical expectations. For a simple homogenous model of DNA approximately 2 kbT/bps is needed to wrap DNA around the histone octamer. Thermal motion adds an additional 3 kbT/bps since we have 6 degrees of freedom in our model. The OB and OP sets provide correlations of 0.55 and 0.33, respectively, and the energies do not allow for the same physical interpretation. The results for ABC are particularly interesting as there are no free parameters and the correlation is as good as that of the SA elastic model (correlation 0.62), which was specifically optimized for this set of experimental data.
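The per-step figures quoted here can be reproduced with a short calculation. The sketch below assumes that the total energy is divided over 146 base-pair steps and uses the conversion 1 kcal/mol = 1.677 RT at 300 K given in the Methods; the thermal term is the equipartition value of kbT/2 per quadratic degree of freedom.

```latex
\begin{align*}
\frac{417\ \text{kcal/mol}}{146\ \text{steps}} \times 1.677\ \frac{k_B T}{\text{kcal/mol}}
  &\approx 4.8\ k_B T/\text{bps} && \text{(1kx5)} \\
\frac{151\ \text{kcal/mol}}{146\ \text{steps}} \times 1.677\ \frac{k_B T}{\text{kcal/mol}}
  &\approx 1.7\ k_B T/\text{bps} && \text{(KI12)} \\
6 \times \tfrac{1}{2}\, k_B T &= 3\ k_B T/\text{bps} && \text{(six helical parameters per step)}
\end{align*}
```

On this reading, the 1kx5 value is close to the sum of the roughly 2 kbT/bps wrapping cost and the 3 kbT/bps thermal contribution, while the smoothed KI12 geometry is close to the wrapping cost alone.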

Emm and Ee Versus Position in the Nucleosome

In Figure 2, we consider the correlation between Ee and Emm for all possible step types at each of the 146 positions in the nucleosome. Three sets of the elastic parameters, ABC, OB, and OP, and the nucleosomal DNA geometry 1kx5 are considered here. In Supplemental Data we also consider the six combinations of elastic parameters obtained by combining HP0 and K from the different sources, the effects of our knock-ins and knock-outs, and a basepair step specific selection procedure for the combined elastic sets.

We find that Ee correlates with the total Emm better than with any single bonded or long-range component. The lowest correlations are obtained between Ee and Emm for the CA, GT, and TG steps (R = 0.38) and the CT step (R = 0.39). The best correlations are for the AA (R = 0.57) and TT (R = 0.52) steps. In most cases, the deviations between Ee and Emm have a ~10 bp periodicity. Some steps, namely GC and GT, exhibit a ~5 bp periodicity. These results provide some indication of which steps are modelled well by the elastic approximation and which are not. However, the different periodicities observed indicate that we cannot simply consider the correlation across all possible steps. There are position and orientation specific effects that also affect the correlations above. (In the Supplemental Data we further consider sequence specific effects by systematically eliminating sequences based on their sequence content and then determining the correlation between Ee predictions of ∆∆Gb and experimental values.)

By considering the individual components of Emm (data not shown) we find that the lowest correlations are obtained between the electrostatic term in Emm and Ee. These correlations are all negative and range from –0.65 for AT steps to –0.18 for CG steps. We therefore expect that addition of an electrostatic energy component to the elastic model is needed to improve its description of the energy distribution across a nucleosome.

Emm and Ee as a function of KI

In Figure 3 we consider the correlation between Ee and Emm as a function of nucleosomal geometry using our knock-in models of the nucleosome (51) and the NCP147 sequence. Data for each of the elastic parameter sets ABC, OB and OP are as indicated by solid lines. We consider the other data in Figure 3 later. The knock-ins are constructed so as to monotonically converge to the geometry of the nucleosome as determined by RMSD. Thus each successive knock-in in Figure 3 more closely resembles the 3D conformation of 1kx5. As reported in (51), the RMSD between the 1kx5 conformation and KI12 is less than 3 Å. The first few knock-ins have significant steric overlap; see for example the molecular image labelled KI1 in the bottom of Figure 4. Thus Emm will exhibit large variations for the first few knock-ins, while Ee, which lacks any long range or self-contact interactions, will not exhibit such large variations. By KI12 the molecular mechanics energy (solid black line in Figure 3) has largely converged to the value associated with 1kx5. This agrees well with the idea that knock-ins with more than 12 Fourier components are within the allowed range of thermal motion appropriate for 1kx5 (i.e., there is little energy difference between the knock-ins with more than 12 terms and 1kx5).

The elastic rod model exhibits a very different behaviour as a function of knock-in number. The ABC, OB and OP models exhibit a large initial variation as the first few knock-ins are introduced, but then only gradually converge to the value associated with 1kx5, i.e., KI74. This is in part a scaling issue arising from the unfavorable steric interactions present in Emm that are not present in Ee. Nonetheless, by KI12 we find that the elastic models have accounted for only 40% (in the case of ABC) to 70% (in the case of OB) of the total elastic energy. For the ABC and OP sets the convergence is nearly monotonic, indicating that each successive knock-in has an equivalent percent contribution to the total elastic energy. The elastic energy thus appears to be equally partitioned into the different Fourier components. This is not the case for the OB parameter set, where the contributions are strongly biased by particular knock-ins.

Figure 2: Elastic and Molecular Mechanics Energies vs. Position in Nucleosome. The elastic (solid lines) and molecular mechanics (dashed) energies for each of the 16 base pair step types are plotted as functions of position in the nucleosome. Numeric values in the legend indicate: average energy for the indicated sequence (kcal/mol), range of energy across all 146 positions, and correlation between the elastic and molecular mechanics energies. Each data set is shifted and scaled such that the relative scale is the same for all data. One vertical grid corresponds to 50.0 kcal/mol = 83.9 RT for elastic energies and 26.9 kcal/mol = 45.1 RT for the molecular mechanics energies. Legends are sorted according to the indicated range of energies.

Correlation of Ee with Experiment

In Figure 4 we consider the effect of varying the nucleosomal geometry on the predictive ability of Ee as determined by our correlation analysis. Again we initially focus only on the solid lines. Varying the nucleosome geometry adds one degree of freedom to Ee for each of the four sets of the elastic parameters. The best correlation is achieved by using KO14 for the nucleosome conformation and the ABC parameter set. KO14 lacks structural deformations with a periodicity of 10.4 bp. These deformations are known to be the primary determinant of the 3D conformation of the superhelix. As indicated by the molecular image in Figure 4, the DNA conformation is comparatively straight without these deformations. However, Ee depends only on the local helical parameters and not explicitly on the 3D conformation. Thus while KO14 does not resemble the nucleosome superhelix, it does have all other variations in base pair stacking that are characteristic of the nucleosome. Numerically the amplitude of the 14th Fourier component is the largest; thus this particular component is the one most likely to introduce deviations that exceed the limits of the linear elastic model. The OB and OP parameter sets are also strongly influenced by KO14; however, for these elastic parameters the correlation decreases rather than increases for this knock-out. The correlations are also significantly influenced by knock-outs 1, 27, and 60. KO1 is missing variations with a period of 146 bp, i.e., variations that span the entire length of the nucleosome. Such long length scale variations were identified in (51) as both necessary and sufficient for the geometry of the nucleosome. KO13 has significant steric overlap: the superhelical pitch is near zero (see molecular graphic in Figure 4). KO27 corresponds to structural variations with a period of 5.4 bp and KO60 to variations with a period of 2.5 bp. The short length variations may simply be harmonic effects, since they are close to multiples of the fundamental wavenumber 14 (10.4 bp). Interestingly, the 5.4 bp variations also appeared in our comparison of Ee and Emm, where no filtering was employed.

Figure 3: Elastic and Molecular Mechanics Energy vs. Nucleosome Conformation. Elastic (colored lines) and in vacuo molecular mechanics (black) nucleosome energies versus nucleosome conformation as defined by the Fourier knock-ins (see (51) and text for more details). All models are for the sequence NCP147. The energy is reported as the error relative to the completely folded state KI74. The legend indicates which elastic parameter set was used. The letter “q” indicates a non-screened electrostatic energy, and “λ” a screened electrostatic energy determined by our simple Edh model. The first number in the legend indicates correlation of model with experimental ∆G, and the second number indicates correlation of model with molecular mechanics energies. Collapsed conformation KO0 (DNA without rise) and KO1 give unreasonably high Emm and were discarded from our correlation analysis.

Figure 4: Correlation of Models with Experiment as a function of Nucleosome Geometry. Top Plot: Knock-outs. Vertical axis: correlation of theoretical and experimental ∆G values for the elastic parameters as indicated in legend. Horizontal axis: wavenumbers for the knock-outs obtained from 1kx5. Also indicated are 1kx5 (squares) and KO14 (spheres). Electrostatic model is simple Edh model. Top Images: DNA conformations associated with the knock-outs are as labelled. Bottom Plot: Knock-ins. Vertical axis: correlation of theoretical and experimental ∆G values for the elastic parameters as indicated in legend. Horizontal axis: KI number. Crystal-structure conformation 1kx5 is indicated by squares, and KI12 by spheres. Bottom Images: DNA conformations associated with knock-ins as labelled. Other details are the same as for KO.
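To make the knock-out and knock-in terminology concrete, the sketch below filters a single helical-parameter profile with a plain discrete Fourier transform. It is an assumption-laden illustration rather than the procedure of (51): the function names are hypothetical, a single real series of 146 step values stands in for the six helical parameters, and a knock-in is taken to mean keeping all wavenumbers up to a cutoff.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform of a real series (direct O(N^2) sum)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(c):
    """Inverse transform; returns the real part of the reconstructed series."""
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def knock_out(x, k):
    """Remove wavenumber k (and its mirror n - k) from the series."""
    n = len(x)
    c = dft(x)
    c[k] = 0
    c[(n - k) % n] = 0
    return idft(c)

def knock_in(x, kmax):
    """Keep only wavenumbers 0..kmax (and their mirrors)."""
    n = len(x)
    c = dft(x)
    kept = [c[k] if min(k, n - k) <= kmax else 0 for k in range(n)]
    return idft(kept)

def period_bp(n_steps, k):
    """Spatial period, in base-pair steps, of wavenumber k."""
    return n_steps / k
```

With 146 steps, `period_bp(146, 14)` and `period_bp(146, 27)` give approximately 10.4 bp and 5.4 bp, matching the periods quoted for KO14 and KO27.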

Addition of Long Range Interactions

In this section we consider the effects of adding a long range interaction to the elastic rod energy. Before considering the functional form of Edh in Eq. [5], we attempted to introduce a long range term similar to those introduced by De Santis and coworkers. All are based on their average free-DNA curvature Af (62). We employed the power-law function a Afⁿ as in (20), the power-exponential function c + a Af exp (b < Af >), and a simple quadratic function as in (53) (data not shown). We find that the power-law model with exponent n = 1.5 yields the best enhancement of our elastic models. The best correlations we can achieve with such a long range term are OP (0.77), ABC (0.71), and OB (0.62). These are all less than the 0.92 correlation reported by De Santis and coworkers.

By comparing the free energies obtained from Ee for individual sequences to the experimental values as a function of average curvature (62) (data not shown), we find that Ee with the ABC, OB, and OP parameters describes highly curved (>1.5 rad/nuc) nucleosomes better than the SA elastic model. However, our elastic models do not model the less curved DNA sequences as well as SA. Since we were unable to achieve a correlation approaching 0.92 using curvature based long-range energy functions, we investigated the functional forms in Eq. [5].

Simple Model of Edh

Our simple Edh model only considers 147 bp subsegments regardless of the length of the DNA sequence. Therefore only the free DNA conformation and the parameterization of Edh affect the values of ∆Gb predicted with this model. The simple model contains two free parameters, a and λ. No significant improvement in the correlation between Ee + Edh based predictions of ∆Gb and the experimental values was achieved compared to the best elastic set, namely ABC, even upon optimization of these parameters for various nucleosome geometries; see the dashed and dotted lines in Figure 4. Compared to the elastic rod model, this model did achieve some improvements for specific Fourier filtered geometries, and especially for the combined elastic sets (data not shown).

For the nucleosomal DNA conformation 1kx5 and ABC elastic parameters the correlation improved from 0.65 to 0.69, see Table I. The OP based model improved significantly, from 0.33 to 0.53, but it is still consistently below the ABC model’s results (see Figure 4). The OB based model could not be improved by introduction of the simple Edh model. The best performance was obtained by combining OP’s HP0 values with ABC’s K values (R = 0.74). But even this correlation is no better than the ABC elastic model with KO14. The main problem, which invalidates this implementation of Edh, is that the optimized Debye screening length parameters become non-physical, λ 1Å.

Complete Model of Edh

In our complete implementation of Edh the summation extends over the entire length of DNA for both the nucleosome and the free DNA conformations. Thus the ∆Gb predicted with this model depends on both the conformation of free DNA and nucleosomal DNA, as well as on the particular values of λ, n and a in Eq. [5]. In all cases the correlation between model predictions and experimental ∆∆Gb remains at the same level as in the simple electrostatic model. However, the optimized parameter values have acquired reasonable magnitudes. Our typical electrostatic energy of DNA folding is ∆Edh ≈ 50–60 kcal/mol for the parameter values: minimal separation nopt = 25, Debye screening length λopt = 15 Å, and electrostatic weight aopt = –0.018.

We consistently obtain a negative electrostatic value (aopt < 0), indicating an attraction rather than a repulsion. The optimal values of the electrostatic weight are in the range aopt = [–0.027, –0.003], and the screening parameter is in the range λ = [10–40] Å. The above value of a differs only by a factor of 3 from the estimate in (63),

$$a = \frac{q^2}{\varepsilon} = \frac{(-2.41)^2}{80} = 0.0726 .$$

Our screening parameter is consistent with the estimate

$$\lambda = \frac{1}{0.736 \sqrt{20\, C_s \cdot 298 / T}} = 7.8\ \text{Å}$$

in (63) for oligonucleosomes at physiological salt concentration, 0.1 M. The optimal values of separation are in the range n = [25–70], but for n > 10 the correlations are only weakly affected by n.

Our values of Edh are similar in magnitude to the total of the linker-linker and linker-nucleosome electrostatic energy, –53 ± 9 kcal/mol, obtained by Sun et al. (64) using their DiSCO model for the nucleosome electrostatic potential at physiological salt concentration, 0.1 M. For comparison purposes, the electrostatic energy contribution to the nucleosome formation constitutes approximately 25% of the elastic energy contribution. The electrostatic energy by itself gives a poor correlation, R = 0.43 for λopt = 10 Å. In the limit of no screening, i.e., λ → ∞, Edh reduces to a simple Coulomb model. For this model we obtain a correlation of –0.15 using the 1kx5 nucleosomal conformation and no charge separation, n = 1. For other nucleosome models the correlations range from –0.71 for KI10 through KI13 to 0.35 for KO14.

In Figure 5, ∆Edh values for five sequences are plotted. Analysis of the energy components was performed for five sequences spanning the entire range of intrinsic free-DNA curvatures and sequence lengths. DNA curvature was calculated according to the algorithm in (61) and spans the range from 0.04 to 4.1 rad/nuc (20). The most curved sequence, CRITBamH1, with high nucleosomal ∆Gb, is the most difficult to describe by means of an elastic theory. In all cases the central region exhibits lower energy than either end because aopt < 0. The sequences chosen indicate that curvature alone is insufficient to rank the effects of this long range energy. In particular the sequence labelled telo-homo458, which has the lowest curvature score and an obvious sequence repeat, is strongly biased in this energy landscape towards the central region but in a delocalized manner. CRITBamH1, which has the largest curvature, exhibits a very strong and very localized signal in this energy landscape.

When curvature grows, the magnitude of the attractive electrostatic energy of DNA folding |∆Edh| first decreases, especially for the internal positions of the nucleosome. This reflects an entropic effect, which is expected to be the most pronounced for long DNA chains. Indeed, the longer the sequence, the larger is its attractive energy |∆Edh|. For the shorter or more curved sequences, by contrast, the counter-ion interaction of DNA with proteins and solvent cations starts playing an increasing role, as was pointed out in (20).

Note the absence of periodicity in Edh for the short periodic sequences telhuman222 and TG pentamer (SC_TG). Each has two aperiodic flanking sequences. Since we used a charge exclusion of n = 25 bp, the charges interact strongly at sequence distances close to one superhelix turn, ~147 bp / 1.7 turns ≈ 86 bp, when in the nucleosome. The only way to not feel an aperiodic sequence is when its electrostatically visible part is entirely in a straightened linker conformation.

In Figure 6, the elastic and electrostatic values are combined. In this energy landscape the tendency to centrally locate is maintained in all cases, but the character of the landscape has changed in some cases. The physical energies depend in part on the actual length of DNA and its interaction with counter-ions, as noted above. These effects are not accounted for in our modeling. Nonetheless, CRITBamH1 exhibits multiple minima instead of a single minimum, while the periodic patterns in telo-homo458 are amplified from several kcal/(mol·nuc) to over 10 kcal/(mol·nuc). It is not the patterns in Edh or Ee alone that determine nucleosome positioning in these energy landscapes, but the relative weighting of these two terms. Conditions which favor one term over the other can significantly change the combined energy landscape. These observations also appear to be sequence dependent, such that different sequences can respond differently to changes in the relative weighting of Edh and Ee.

Figure 5: Electrostatic Energy of DNA Folding. Electrostatic energy (vertical axis) of folding DNA into 1kx5 conformation for ABC parameters versus relative position of nucleosome on sequence (horizontal axis). Legend indicates sequence name and curvature values according to Ref. (20). Molecular images: CRITBamH1 fragment with nucleosome at various locations.

Comparison of Ee + Edh to Emm

In this section we consider the correlation between our molecular mechanics energy function Emm and our optimized coarse-grained energy function, Ee + Edh. These data are represented by the dashed and dotted lines in Figure 3. The correlation between Emm and Ee + Edh tends to 1 in some cases, indicating that the steric overlap that arises in Emm is captured by addition of a long-range term to our elastic rod model. The ABC parameters yield the highest correlation (R = 0.97) between Ee + Edh and Emm. OP ranks second (R = 0.96) and OB ranks third (R = 0.92). We emphasize that for the reported correlations Edh was optimized to match experiment, not to optimize the correlation with Emm. The high correlation of our Ee + Edh and Emm energies is thus an unintentional result of fitting the parameters in Ee + Edh to achieve the highest correlation with experiment. Figure 3 also demonstrates that the Ee + Edh models converge to within a few percent of their respective total energy values by KI14. This is as expected based on the Emm results and on the RMSD methodology that was employed to construct the knock-in series.

Addition of Edh consistently improved the correlation of Ee + Edh with the bonded and electrostatic components of Emm for each of the elastic sets ABC, OB, and OP across all KI. The correlation between the van der Waals component of Emm and Ee + Edh becomes consistently worse (see Supplemental Material for additional analysis).

From the point of view of configurational energies it is necessary to include the complete Edh model with a screening parameter. However, inclusion of a long range term, either our Edh or the curvature based models, did not yield improvements in the predictive abilities of our energy function that could not be realized by use of only an elastic term, Ee. The long range term is crucial for obtaining the proper conformational energy; however, it does not affect the free energy of binding. In fact the long range energy associated with the DNA contained within the nucleosome footprint should cancel in our models.

Discussion

In the present work we study sequence dependences in the free energy of nucleosome formation. In a separate effort, we have developed a suite of computational tools for interactively folding and visualizing chromatin that enables us to perform a genome-wide analysis of chromatin structure, provided we have a reliable means of predicting nucleosome stabilities (see http://dna.ccs.tulane.edu/icm). Here we test the suitability of a nearest-neighbor based elastic rod model combined with a Debye-Huckel long range model as a means of creating a coarse-grained model of nucleosome stability.

For this reason we investigated the predictive ability of two different elastic rod models using four available sets of elastic parameters labelled ABC, OB, OP and SA in the text. Our quality assessment is based on direct comparisons of our elastic rod energies to molecular mechanics energies and analysis of the correlation between predicted values of ∆Gb and experimental values. For these studies we varied the elastic parameters, the nucleosome geometry, and the functional form of the long range interactions.

Figure 6: Total Energy of DNA Folding. Total energy (vertical axis) of folding DNA into 1kx5 conformation with ABC elastic parameters and aopt = –0.018, nopt = 25, λopt = 15 Å versus relative position of nucleosome on sequence (horizontal axis). Sequence names and curvatures are indicated as in Figure 5.

Our in vacuo molecular mechanics provide little insight into the experimental values of ∆Gb because the correlation is only 0.47. Nonetheless, comparison of the molecular mechanics and elastic rod based models provides some indication of the behavior of the elastic rod model. As shown in Supplemental Data, it is highly curved DNA sequences whose energy cannot be satisfactorily described by the elastic model of DNA, rather than the dimer step contents of the DNA sequences. This is demonstrated by elimination of DNA sequences from the pool of 84 based on sequence content.

From our comparison of elastic energies to molecular mechanics energies as a function of base pair step type (presented in the main and supplemental text), we conclude that four dinucleotide step types, CA, CG, TA and TG, do not work well in the elastic-rod model for the elastic parameter sets used here. These are all pyrimidine-purine (YR) steps. Notably, in an updated study of well-resolved protein-DNA x-ray structures, the deformability rank of CG and CA·TG dimers has been changed considerably compared to the elastic parameters used in our OB and OP data sets (65).

There is no simple relation between the content and location of YR steps in the 84 DNA sequences employed in our study. However, it is well known that YR steps are essential for achieving high nucleosome stability, as was shown in (21) for our TG reference sequence containing synthetic 10-bp TGTAACTCGG repeats. The role of TA or AT repeats every 10 bp and out of phase with GC steps was reported in (15, 66). In (37) the consensus positioning sequences YYYTA and TARRR were proposed.

Using correlation analysis of the predicted ∆Ge for 84 mononucleosomal DNA fragments, we show that the most reliable set of force constants K and equilibrium free DNA conformations HP0 belongs to the ABC group (59). Without varied parameters, the ABC model yields a correlation of 0.65, which is even better than the elastic model SA of P. De Santis, C. Anselmi and coworkers. The latter model contains parameters specifically optimized for correlation with the ∆Gb values and is also used here. This result suggests the importance of including shear deformations (18, 51, 67), which were absent in the SA model. The SA model is based only on bend and twist. The rank of the other elastic parameters in descending quality is the following: SA (53), OB, and OP (60) (see Table I).

The effect of the nucleosomal geometry, HP, was studied using our Fourier filtering procedure (51). Since this procedure represents a smoothing of the helical parameters, it provides a systematic means of testing the limits of the linear elastic model. As implemented, we knock-out and knock-in all six helical parameters associated with a chosen wavenumber at once, leaving aside the question of which Fourier components are required to properly form the nucleosome superhelix as studied in (51). The best correlations are achieved for the elastic parameter set ABC and the nucleosome geometry KO14, a significantly straightened conformation of the nucleosome. We conclude that this particular Fourier mode, with period 10.4 bp, exceeds the elastic limits for this set of parameters. Stated differently, highly distorted DNA is not modeled well by the ABC parameters. This makes sense because they were obtained from equilibrium molecular dynamics simulations of DNA. The performance of the OB and OP parameters is also strongly affected by KO14. However, these sets perform better with this particular Fourier mode included, suggesting that the linear elastic model is still within range of applicability for these parameter sets. This interpretation is consistent with the fact that the OP and OB sets were derived from x-ray structures, which exhibit greater deviations from equilibrium geometries than the ABC parameter set. What all of the parameter sets demonstrate is that the elastic energy appears to be fairly evenly partitioned into the different Fourier modes (with noted exceptions for the OB set in the Results). All knock-ins contribute almost equally to the total elastic energy, so it is difficult to rank the significance of the individual knock-ins. This is very different from the RMSD results, in which after 12 KI’s there was little change in the geometry.

Smoothing of the nucleosome geometry considerably changes the nucleosome energy and typically leads to a decrease of the elastic model’s predictive ability, see Figure 4. However, there are some exceptions, and the performance of the ABC set actually degrades beyond KI50. This result suggests a study of which Fourier components of HP are important for the nucleosome energetics and which are important for predicting nucleosome stability. As we have demonstrated, the conformational energy and nucleosome stability are not necessarily the same. Furthermore, the knock-in procedure used here does not identify which Fourier modes are associated with the various knock-ins. In any case, given the impact of HP on the performance of our elastic rod model, we suggest the conformation of the nucleosome should be considered a free parameter in developing elastic rod models, at least to some extent.

We now consider the results from inclusion of the long range energy function. Unlike the molecular mechanics based energies, the elastic potential [4] has no long range component, e.g., the interaction between linker segments. Given the charged nature of DNA, we utilized a homogeneous point charge distribution assigned to DNA base-pair centers. Inclusion of the electrostatic energy modulates the total energy as a function of nucleosome position on DNA via the position, sequence, and nucleosome conformation dependent geometry of the linker segments of DNA (see inserts in Figure 5), whereas the elastic energy, Ee, depends only on the nucleosomal part of any DNA sequence.

Our electrostatic energy term contains three free parameters: the relative weight a, the Debye screening length λ, and the minimal charge separation n. Inclusion of this term allowed us to improve the correlation with experimental free energies from 0.65 to 0.75. Our observation that the correlations do not significantly improve for values of n > 10 suggests an upper limit of the charge exclusion length for which the elastic parameters have already accounted for DNA self interactions. Our long range term favors nucleosome locations toward the center rather than the ends of DNA, i.e., aopt < 0. This centralizing effect is also present in our total free energies (Figure 6) but appears to depend on the relative scaling of the elastic and the long-range term. This is a step towards the observation by (68) that DNA sequence “tunes” the energy landscape by either working with or against other effects.

In regard to our negative electrostatic value aopt < 0, i.e., a long-range attraction rather than repulsion between linkers, we propose two possible interpretations, both of which may apply simultaneously. First, we have ignored the substantial contribution of the nucleosome-linker electrostatic attraction. In particular the highly charged histone tails extend from the nucleosome core, so that the linker-linker DNA repulsion may instead be dominated by histone attraction. This effect would be strongest when the nucleosome resides in the middle of a short DNA segment, where each linker DNA segment would be capable of interacting with the histone tails (see molecular graphics in Figure 5). Second, there may be entropic reasons for the nucleosome to prefer positions away from the ends of DNA, as Figure 6 indicates. In a biologic context there are many other external influences.


Conclusion

The free energy differences that we seek to model here are very complex. Conceptually, even the proposed free energy cycle in Figure 1 fails to draw attention to the fact that the histone core can itself change conformations. The level of approximation inherent in our energy models must be acknowledged from the beginning. We are ignoring solvation effects, the histone tails, sequence specific variations in the histone-DNA interactions and many other physical interactions. From this point of view the 0.65 correlation with experimental values achieved by the elastic rod model with elastic parameters obtained from analysis of molecular dynamics simulations, i.e., the ABC parameters, should be considered quite good. The fact that without any free parameters this model performs as well as the elastic model by De Santis’s group is surprising. The primary difference between these two modeling approaches is that here we have employed detailed geometries and energy functionals (i.e., we include shear and stretching terms) rather than simplifying assumptions. The fact that neither elastic model achieved a correlation greater than 0.65 with the chosen experimental data suggests that this is the limit of what a linear elastic model can achieve. Our observation that removal of the largest amplitude deformation in the nucleosome, KO14, had a significant effect on all elastic rod models suggests that a simple linear approximation does not properly capture the material properties of DNA. We also know from recent ABC simulations that the distributions of helix parameters for various base pair step types are not a single Gaussian distribution but rather some steps exhibit a bimodal distribution, most notably for slide and twist (69). This is an indication that the harmonic approximation is not suitable for these particular helix parameters. These observations alone demonstrate some of the problems inherent in a linear elastic model.
Within these limitations, our modeling efforts have also drawn attention to structural variations associated with wave numbers (wavelengths): 1 (146 bp), 27 (5.4 bp) and 60 (2.5 bp). We again find a long length scale variation to be an important structural feature of the nucleosome. We first identified this feature in (51). It is an essential component of our Roll-Slide-Twist description of the nucleosome superhelix. Since this variation spans the entire 147 bp in a nucleosome it may be considered unique to the nucleosome. Many other proteins can introduce variations over 10-20 bp intervals.
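The relation between wave number and wavelength, and the idea of a Fourier knock-out, can be sketched as follows. The sketch assumes that a knock-out means zeroing a single wave number (and its conjugate partner) in the discrete Fourier transform of one helical parameter profile along the nucleosomal DNA, and it reads KO14 as the knock-out of wave number 14; both are assumptions made for illustration. A plain discrete Fourier transform is used instead of a library FFT so that the example is self-contained, and all function names are hypothetical.

```python
import cmath
import math


def dft(values):
    """Plain discrete Fourier transform of a real or complex sequence."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse transform; returns the real part of each reconstructed value."""
    n = len(coeffs)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(coeffs)) / n).real
        for t in range(n)
    ]


def knock_out(values, wave_number):
    """Remove one wave number (and its conjugate partner) from a profile."""
    n = len(values)
    coeffs = dft(values)
    coeffs[wave_number % n] = 0
    coeffs[-wave_number % n] = 0
    return inverse_dft(coeffs)


def wavelength_bp(n_steps, wave_number):
    """Wavelength, in base-pair steps, of a given wave number over n_steps."""
    return n_steps / wave_number
```

Under these assumptions, wavelength_bp(146, 27) is about 5.4 bp, in line with the value quoted for wave number 27, and knock_out applied to a profile of 146 step values returns the same profile with the chosen periodic component removed.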

In addition to consideration of DNA’s local material properties, a long-range term is warranted based on consideration of all factors that must be accounted for in determining ∆G. Given the highly charged nature of DNA, it is reasonable to assume that electrostatic repulsion will dominate. However, we find that the supposed electrostatic contribution Edh is not repulsive but attractive. Note, however, that a simple elastic rod description works almost as well as our optimized Debye-Huckel energy term. The caveat in interpreting our long-range results is that we know the elastic rod model does not accurately represent the complex details of DNA as a material (e.g., the above noted bimodal distributions), nor does it account for solvent effects or sequence specific histone-DNA interactions (e.g., electrostatic focusing reported by Honig et al., in this issue). Thus any long-range term that is added may turn out to be just a correction to the approximations inherent in our elastic model assumptions rather than an actual long-range contribution. In this regard further studies of the material properties of DNA as a function of sequence are needed before we can clearly assign nucleosome stability effects to the material properties of DNA.

This research was reported in part by Y. S. at the Biophysical Society’s 53rd Annual Meeting and by T. B. at Albany 2009: The 16th Conversation (70). The authors acknowledge support from NIH grant R01GM76356 “Molecular Dynamics Study of Nucleosome Stability and Receptor Binding”. We are grateful to Dr. Anita Scipioni for sending us 83 sequences and their experimental nucleosome free energies and Prof. Claudio Anselmi for providing their program which calculates DNA curvatures and free energies according to their model. Thanks to Prof. Osamu Gotoh who sent us the dinucleotide melting temperatures.

Supplementary Material

Supplementary material for this paper deals with the following: (1) Choice of the Best Combination of Elastic Parameters HP0 and K; (2) Attempt to Find Problematic Sequences. It can be obtained free of charge from the author (http://dna.ccs.tulane.edu/Supplements/Sereda-JBSD-2010.pdf) or can be purchased from Adenine Press for US $50.00.

References and Footnotes

1. R. D. Kornberg. Science 184, 868-871 (1974).
2. K. Luger, A. W. Mader, R. K. Richmond, D. F. Sargent, and T. J. Richmond. Nature 389, 251-260 (1997).
3. G.-C. Yuan, Y.-J. Liu, M. F. Dion, M. D. Slack, L. F. Wu, S. J. Altschuler, and O. J. Rando. Science 309, 626-630 (2005).
4. W. Lee, D. Tillo, N. Bray, R. H. Morse, R. W. Davis, T. R. Hughes, and C. Nislow. Nat Genet 39, 1235-1244 (2007).
5. I. Whitehouse, O. J. Rando, and T. Tsukiyama. Nature 450, 1031-1035 (2007).
6. I. Albert, T. N. Mavrich, L. P. Tomsho, J. Qi, S. J. Zanton, S. C. Schuster, and B. F. Pugh. Nature 446, 572-576 (2007).
7. T. N. Mavrich, I. P. Ioshikhes, B. J. Venters, C. Jiang, L. P. Tomsho, J. Qi, S. C. Schuster, I. Albert, and B. F. Pugh. Genome Res 18, 1073-1083 (2008).
8. Y. Field, N. Kaplan, Y. Fondufe-Mittendorf, I. K. Moore, E. Sharon, Y. Lubling, J. Widom, and E. Segal. PLoS Comput Biol 4, e1000216 (2008).
9. S. Shivaswamy, A. Bhinge, Y. Zhao, S. Jones, M. Hirst, and V. R. Iyer. PLoS Biol 6, e65 (2008).
10. T. N. Mavrich, C. Jiang, I. P. Ioshikhes, X. Li, B. J. Venters, S. J. Zanton, L. P. Tomsho, J. Qi, R. L. Glaser, S. C. Schuster, D. S. Gilmour, I. Albert, and B. F. Pugh. Nature 453, 358-362 (2008).
11. A. Valouev, J. Ichikawa, T. Tonthat, J. Stuart, S. Ranade, H. Peckham, K. Zeng, J. A. Malek, G. Costa, K. McKernan, A. Sidow, A. Fire, and S. M. Johnson. Genome Res 18, 1051-1063 (2008).
12. D. E. Schones, K. Cui, S. Cuddapah, T.-Y. Roh, A. Barski, Z. Wang, G. Wei, and K. Zhao. Cell 132, 887-898 (2008).
13. S. Cacchione, M. A. Cerone, P. De Santis, and M. Savino. Biophys Chem 53, 267-281 (1995).
14. M. Del Corno, P. De Santis, B. Sampaolese, and M. Savino. FEBS Lett 431, 66-70 (1998).
15. P. T. Lowary and J. Widom. J Mol Biol 276, 19-42 (1998).
16. E. Segal, Y. Fondufe-Mittendorf, L. Chen, A. C. Thastrom, Y. Field, I. K. Moore, J.-P. Z. Wang, and J. Widom. Nature 442, 772-778 (2006).
17. S. Fujii, H. Kono, S. Takenaka, N. Go, and A. Sarai. Nucleic Acids Res 35, 6063-6074 (2007).
18. M. Y. Tolstorukov, A. V. Colasanti, D. M. McCandlish, W. K. Olson, and V. B. Zhurkin. J Mol Biol 371, 725-738 (2007).
19. A. V. Morozov, K. Fortney, D. A. Gaykalova, V. M. Studitsky, J. Widom, and E. D. Siggia. Nucleic Acids Res 37, 4707-4722 (2009).
20. A. Scipioni, S. Pisano, C. Anselmi, M. Savino, and P. De Santis. Biophys Chem 107, 7-17 (2004).
21. T. E. Shrader and D. M. Crothers. Proc Natl Acad Sci USA 86, 7418-7422 (1989).
22. J. S. Godde and A. P. Wolffe. J Biol Chem 271, 15222-15229 (1996).
23. S. Cacchione, J. L. Rodriguez, R. Mechelli, L. Franco, and M. Savino. Biophys Chem 104, 381-392 (2003).
24. T. E. Shrader and D. M. Crothers. J Mol Biol 216, 69-84 (1990).
25. J. S. Godde, S. U. Kass, M. C. Hirst, and A. P. Wolffe. J Biol Chem 271, 24325-24328 (1996).
26. H. R. Widlund, H. Cao, S. Simonsson, E. Magnusson, T. Simonsson, P. E. Nielsen, J. D. Kahn, D. M. Crothers, and M. Kubista. J Mol Biol 267, 807-817 (1997).
27. L. Rossetti, S. Cacchione, M. Fua, and M. Savino. Biochemistry 37, 6727-6737 (1998).
28. H. Cao, H. R. Widlund, T. Simonsson, and M. Kubista. J Mol Biol 281, 253-260 (1998).
29. D. J. Fitzgerald and J. N. Anderson. Nucleic Acids Res 26, 2526-2535 (1998).
30. D. J. Fitzgerald and J. N. Anderson. J Mol Biol 293, 477-491 (1999).
31. I. Filesi, S. Cacchione, P. De Santis, L. Rossetti, and M. Savino. Biophys Chem 83, 223-237 (2000).
32. S. Mattei, B. Sampaolese, P. De Santis, and M. Savino. Biophys Chem 97, 173-187 (2002).


33. S. C. R. Elgin and J. L. Workman. Chromatin structure and gene expression. Oxford University Press (2000).
34. E. N. Trifonov. Nucleic Acids Res 8, 4041-4053 (1980).
35. F. Salih, B. Salih, and E. N. Trifonov. J Biomol Struct Dyn 26, 273-281 (2008).
36. I. Gabdank, D. Barash, and E. N. Trifonov. J Biomol Struct Dyn 26, 403-412 (2009).
37. A. G. Fernandez and J. N. Anderson. J Mol Biol 371, 649-668 (2007).
38. A. Thastrom, P. T. Lowary, H. R. Widlund, H. Cao, M. Kubista, and J. Widom. J Mol Biol 288, 213-229 (1999).
39. T. E. Cloutier and J. Widom. Proc Natl Acad Sci USA 102, 3645-3650 (2005).
40. J. A. Schellman. Biopolymers 13, 217-226 (1974).
41. J. A. Schellman. Biophys Chem 11, 321-328 (1980).
42. W. K. Olson. Biopolymers 18, 1213-1233 (1979).
43. V. B. Zhurkin, Y. P. Lysov, and V. I. Ivanov. Nucleic Acids Res 6, 1081-1096 (1979).
44. G. Arents, R. W. Burlingame, B. C. Wang, W. E. Love, and E. N. Moudrianakis. Proc Natl Acad Sci USA 88, 10148-10152 (1991).
45. P. T. Lowary and J. Widom. Proc Natl Acad Sci USA 94, 1183-1188 (1997).
46. M. A. el Hassan and C. R. Calladine. J Mol Biol 251, 648-664 (1995).
47. X.-J. Lu and W. K. Olson. Nucleic Acids Res 31, 5108-5121 (2003).
48. R. Lavery, M. Moakher, J. H. Maddocks, D. Petkeviciute, and K. Zakrzewska. Nucleic Acids Res 37, 5917-5929 (2009).
49. T. C. Bishop. Bioinformatics 25, 3187-3188 (2009).
50. R. E. Dickerson. Nucleic Acids Res 17, 1797-1803 (1989).
51. T. C. Bishop. Biophys J 95, 1007-1017 (2008).
52. C. A. Davey, D. F. Sargent, K. Luger, A. W. Maeder, and T. J. Richmond. J Mol Biol 319, 1097-1113 (2002).
53. C. Anselmi, G. Bocchinfuso, P. De Santis, M. Savino, and A. Scipioni. Biophys J 79, 601-613 (2000).
54. W. Humphrey, A. Dalke, and K. Schulten. J Mol Graph 14, 33-38 (1996).
55. J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R. D. Skeel, L. Kale, and K. Schulten. J Comput Chem 26, 1781-1802 (2005).
56. T. E. Cheatham III and M. A. Young. Biopolymers 56, 232-256 (2001).
57. A. Perez, I. Marchan, D. Svozil, J. Sponer, T. E. Cheatham III, C. A. Laughton, and M. Orozco. Biophys J 92, 3817-3829 (2007).
58. T. J. Healey. Math Mech Solids 7, 405-420 (2002).
59. F. Lankas, J. Sponer, J. Langowski, and T. E. Cheatham. Biophys J 85, 2872-2883 (2003).
60. W. K. Olson, A. A. Gorin, X. J. Lu, L. M. Hock, and V. B. Zhurkin. Proc Natl Acad Sci USA 95, 11163-11168 (1998).
61. C. Anselmi, G. Bocchinfuso, P. De Santis, M. Savino, and A. Scipioni. J Mol Biol 286, 1293-1301 (1999).
62. P. De Santis, M. Fua, A. Palleschi, and M. Savino. Biophys Chem 55, 261-271 (1995).
63. G. Arya and T. Schlick. J Phys Chem A 113, 4045-4059 (2009).
64. J. Sun, Q. Zhang, and T. Schlick. Proc Natl Acad Sci USA 102, 8180-8185 (2005).
65. S. Balasubramanian, F. Xu, and W. Olson. Biophys J 96, 2245-2260 (2009).
66. J. Widom. Q Rev Biophys 34, 269-324 (2001).
67. T. J. Richmond and C. A. Davey. Nature 423, 145-150 (2003).
68. T. A. Blank and P. B. Becker. J Mol Biol 260, 1-8 (1996).
69. R. Lavery, K. Zakrzewska, D. Beveridge, T. C. Bishop, D. A. Case, T. Cheatham III, Surjit Dixit, B. Jayaram, F. Lankas, C. Laughton, J. H. Maddocks, A. Michon, R. Osman, M. Orozco, A. Perez, T. Singh, N. Spackova and J. Sponer. Nucleic Acids Res 38, 299-313 (2010).
70. Abstracts of Albany 2009: 16th Conversation. June 16-20, Albany, New York, USA; T. C. Bishop. Molecular Dynamics Studies of Nucleosome Positioning, Abstract #208. J Biomol Struct Dyn 26, 787-927 (2009).

Date Received: November 15, 2009

Communicated by the Editor Ramaswamy H. Sarma