topology, structures, and energy landscapes of human ... · topology, structures, and energy...

6
Topology, structures, and energy landscapes of human chromosomes Bin Zhang a,b and Peter G. Wolynes a,b,c,1 a Department of Chemistry, b Center for Theoretical Biological Physics, and c Department of Physics and Astronomy, Rice University, Houston, TX 77005 Contributed by Peter G. Wolynes, March 31, 2015 (sent for review March 16, 2015; reviewed by Zaida A. Luthey-Schulten and Masaki Sasai) Chromosome conformation capture experiments provide a rich set of data concerning the spatial organization of the genome. We use these data along with a maximum entropy approach to derive a least-biased effective energy landscape for the chromosome. Sim- ulations of the ensemble of chromosome conformations based on the resulting information theoretic landscape not only accurately reproduce experimental contact probabilities, but also provide a picture of chromosome dynamics and topology. The topology of the simulated chromosomes is probed by computing the distribution of their knot invariants. The simulated chromosome structures are largely free of knots. Topologically associating domains are shown to be crucial for establishing these knotless structures. The simu- lated chromosome conformations exhibit a tendency to form fibril- like structures like those observed via light microscopy. The topo- logically associating domains of the interphase chromosome exhibit multistability with varying liquid crystalline ordering that may allow discrete unfolding events and the landscape is locally funneled to- ward idealchromosome structures that represent hierarchical fibrils of fibrils. chromosome conformation capture | maximum entropy | topologically associating domains | liquid crystal T he genomes 3D organization is thought to play a crucial role in many biological processes, including gene regulation, DNA replication, and cell differentiation (1). A theoretical frame- work for picturing the structure and dynamics of chromosomes thus promises significantly to advance our understanding of cell biology. It is a challenge to develop from first principles an en- ergy landscape theory for chromosomes analogous to that for proteins, which has proved rather successful in providing quanti- tative insights into protein folding thermodynamics, kinetics, and evolution as well as providing protein structure prediction schemes (24). The difficulty arises from several factors, the first being the complexity of the team of molecular players that contribute to chromosome organization. Although one chromosome is made of a single DNA molecule, an array of different proteins is involved in its 3D organization (5); of these only a few have been fully characterized. Even at the shortest length scales, the double- stranded DNA is wrapped by core histone proteins into nucleo- somes, which can go on to form higher-order structures such as the proposed 30-nm fibers upon stabilization by histone proteins H1 (6). Long-range contacts between genomic loci can also form in the presence of CCCTC-binding factor (CTCF) (7). The second difficulty is that the chromosome is large in molecular terms. It is so large, in fact, that its dynamics can be so slow that the chro- mosome may be a far from equilibrium structure, unlike most fol- ded proteins (8, 9). To complicate matters further, chromosome organization has been reported to be variable and to depend both on cell type and phase in the cell cycle (10, 11). There are ap- parently thus many global attractors on any realistic chromosome landscape. An energy landscape theory built at atomistic resolu- tion using physicochemical interactions of known components for chromosome organization seems thus currently out of reach. Here we explore a way to skirt some of the difficulties by using a maximum entropy approach to derive an effective equivalent equilibrium energy landscape for chromosomes by using physical contact frequencies measured in chromosome conformation capture experiments. Results Maximum Entropy Inferred Chromosome Models. Hi-C experiments provide a good measure of the frequency of spatial contacts between genomic loci (12, 13). These data have been incorporated into computer simulations to build structural models in a variety of ways. Many approaches start by mapping the contact frequencies onto preferred spatial distances between the corresponding geno- mic loci. The resulting distances can be directly used to construct a potential much like the associative memory models of protein folding (14). They can also be used as constraints to derive a unique chromosome structure using algorithms resembling NMR protein structure determination (15, 16). Because the experimental contact frequencies are ensemble averages from a population of cells that potentially vary in chromosome conformation (1720), the explicit form of this mapping to specific spatial distances is not known and may not be unique. In this study, we propose a method that confronts the issue of the averaging involved and directly re- produces the observed contact frequencies using an ensemble of chromosome conformations. We use the maximum entropy principle to derive what might be considered a minimally biased information theoretic energy landscape. Using an approximate statistical mechanical treat- ment of the ensemble and computer simulations, we develop an efficient iterative algorithm to find an ensemble of chromosome conformations consistent with a pseudo-Boltzmann distribu- tion for an effective energy landscape that reproduces the ex- perimentally measured pairwise contact frequencies (see SI Text, Inverse Statistical Mechanics and the Maximum Entropy Ensem- ble). We emphasize that the inferred energy landscape must be regarded as at best an effective one. It is an effective landscape because not only are the crucial protein partners left out, but also Significance Various cell types emerge from their nearly identical genetic content for a multicellular organism, in part, via the regulation of the function of the genomethe DNA molecules inside the cell. Similar to the way catalytic activity is lost when proteins denature, the function of the genome is tightly coupled to its 3D organization. A theoretical framework for picturing the structure and dynamics of the genome will therefore greatly advance our understanding of cell biology. We present an en- ergy landscape model of the chromosome that reproduces a diverse set of experimental measurements. The model enables quantitative predictions of chromosome structure and topol- ogy and provides mechanistic insight into the role of 3D ge- nome organization in gene regulation and cell differentiation. Author contributions: B.Z. and P.G.W. designed research, performed research, contrib- uted new reagents/analytic tools, analyzed data, and wrote the paper. Reviewers: Z.A.L.-S., University of Illinois at UrbanaChampaign; and M.S., Nagoya University. The authors declare no conflict of interest. 1 To whom correspondence should be addressed. Email: [email protected]. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1506257112/-/DCSupplemental. 60626067 | PNAS | May 12, 2015 | vol. 112 | no. 19 www.pnas.org/cgi/doi/10.1073/pnas.1506257112 Downloaded by guest on September 7, 2020

Upload: others

Post on 18-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Topology, structures, and energy landscapes of human ... · Topology, structures, and energy landscapes of human chromosomes Bin Zhanga,b and Peter G. Wolynesa,b,c,1 aDepartment of

Topology, structures, and energy landscapes ofhuman chromosomesBin Zhanga,b and Peter G. Wolynesa,b,c,1

aDepartment of Chemistry, bCenter for Theoretical Biological Physics, and cDepartment of Physics and Astronomy, Rice University, Houston, TX 77005

Contributed by Peter G. Wolynes, March 31, 2015 (sent for review March 16, 2015; reviewed by Zaida A. Luthey-Schulten and Masaki Sasai)

Chromosome conformation capture experiments provide a rich setof data concerning the spatial organization of the genome. We usethese data along with a maximum entropy approach to derive aleast-biased effective energy landscape for the chromosome. Sim-ulations of the ensemble of chromosome conformations based onthe resulting information theoretic landscape not only accuratelyreproduce experimental contact probabilities, but also provide apicture of chromosome dynamics and topology. The topology of thesimulated chromosomes is probed by computing the distribution oftheir knot invariants. The simulated chromosome structures arelargely free of knots. Topologically associating domains are shownto be crucial for establishing these knotless structures. The simu-lated chromosome conformations exhibit a tendency to form fibril-like structures like those observed via light microscopy. The topo-logically associating domains of the interphase chromosome exhibitmultistability with varying liquid crystalline ordering that may allowdiscrete unfolding events and the landscape is locally funneled to-ward “ideal” chromosome structures that represent hierarchicalfibrils of fibrils.

chromosome conformation capture | maximum entropy | topologicallyassociating domains | liquid crystal

The genome’s 3D organization is thought to play a crucial rolein many biological processes, including gene regulation,

DNA replication, and cell differentiation (1). A theoretical frame-work for picturing the structure and dynamics of chromosomesthus promises significantly to advance our understanding of cellbiology. It is a challenge to develop from first principles an en-ergy landscape theory for chromosomes analogous to that forproteins, which has proved rather successful in providing quanti-tative insights into protein folding thermodynamics, kinetics, andevolution as well as providing protein structure prediction schemes(2–4). The difficulty arises from several factors, the first being thecomplexity of the team of molecular players that contribute tochromosome organization. Although one chromosome is made ofa single DNA molecule, an array of different proteins is involvedin its 3D organization (5); of these only a few have been fullycharacterized. Even at the shortest length scales, the double-stranded DNA is wrapped by core histone proteins into nucleo-somes, which can go on to form higher-order structures such as theproposed 30-nm fibers upon stabilization by histone proteins H1(6). Long-range contacts between genomic loci can also form inthe presence of CCCTC-binding factor (CTCF) (7). The seconddifficulty is that the chromosome is large in molecular terms. It isso large, in fact, that its dynamics can be so slow that the chro-mosome may be a far from equilibrium structure, unlike most fol-ded proteins (8, 9). To complicate matters further, chromosomeorganization has been reported to be variable and to depend bothon cell type and phase in the cell cycle (10, 11). There are ap-parently thus many global attractors on any realistic chromosomelandscape. An energy landscape theory built at atomistic resolu-tion using physicochemical interactions of known components forchromosome organization seems thus currently out of reach. Herewe explore a way to skirt some of the difficulties by using amaximum entropy approach to derive an effective equivalentequilibrium energy landscape for chromosomes by using physical

contact frequencies measured in chromosome conformationcapture experiments.

ResultsMaximum Entropy Inferred Chromosome Models. Hi-C experimentsprovide a good measure of the frequency of spatial contactsbetween genomic loci (12, 13). These data have been incorporatedinto computer simulations to build structural models in a variety ofways. Many approaches start by mapping the contact frequenciesonto preferred spatial distances between the corresponding geno-mic loci. The resulting distances can be directly used to constructa potential much like the associative memory models of proteinfolding (14). They can also be used as constraints to derive aunique chromosome structure using algorithms resembling NMRprotein structure determination (15, 16). Because the experimentalcontact frequencies are ensemble averages from a population ofcells that potentially vary in chromosome conformation (17–20),the explicit form of this mapping to specific spatial distances is notknown and may not be unique. In this study, we propose a methodthat confronts the issue of the averaging involved and directly re-produces the observed contact frequencies using an ensemble ofchromosome conformations.We use the maximum entropy principle to derive what might

be considered a minimally biased information theoretic energylandscape. Using an approximate statistical mechanical treat-ment of the ensemble and computer simulations, we develop anefficient iterative algorithm to find an ensemble of chromosomeconformations consistent with a pseudo-Boltzmann distribu-tion for an effective energy landscape that reproduces the ex-perimentally measured pairwise contact frequencies (see SI Text,Inverse Statistical Mechanics and the Maximum Entropy Ensem-ble). We emphasize that the inferred energy landscape must beregarded as at best an effective one. It is an effective landscapebecause not only are the crucial protein partners left out, but also

Significance

Various cell types emerge from their nearly identical geneticcontent for a multicellular organism, in part, via the regulationof the function of the genome—the DNA molecules inside thecell. Similar to the way catalytic activity is lost when proteinsdenature, the function of the genome is tightly coupled to its3D organization. A theoretical framework for picturing thestructure and dynamics of the genome will therefore greatlyadvance our understanding of cell biology. We present an en-ergy landscape model of the chromosome that reproduces adiverse set of experimental measurements. The model enablesquantitative predictions of chromosome structure and topol-ogy and provides mechanistic insight into the role of 3D ge-nome organization in gene regulation and cell differentiation.

Author contributions: B.Z. and P.G.W. designed research, performed research, contrib-uted new reagents/analytic tools, analyzed data, and wrote the paper.

Reviewers: Z.A.L.-S., University of Illinois at Urbana–Champaign; and M.S., NagoyaUniversity.

The authors declare no conflict of interest.1To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1506257112/-/DCSupplemental.

6062–6067 | PNAS | May 12, 2015 | vol. 112 | no. 19 www.pnas.org/cgi/doi/10.1073/pnas.1506257112

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

7, 2

020

Page 2: Topology, structures, and energy landscapes of human ... · Topology, structures, and energy landscapes of human chromosomes Bin Zhanga,b and Peter G. Wolynesa,b,c,1 aDepartment of

the model employs a very coarse-grained description of thechromosome itself, being only at the resolution of 40,000 bp. Atthe same time, the validity of the pseudo-Boltzmann distributioncan be questioned because doubtless there are nonequilibriumeffects that allow the structures found in the cell nucleus to beformed under kinetic, not thermodynamic, control. Nevertheless, asshown by Wang and Wolynes (21, 22), effective equilibrium land-scapes can provide accurate descriptions of even highly non-equilibrium systems in favorable (but not universal) circumstances.The tabula rasa starting point for our modeling describes thechromosome as a generic homopolymer, a chain of beads on astring that is confined in a spherical wall mimicking the effect of thenuclear envelope to yield a volume fraction of 0.1 (9). Owing to theexcluded volume, one might expect strong topological constraintsdue to the possible knotting of such a confined chain. However, invivo the cell nucleus typically contains a large quantity of top-oisomerases that might relieve such constraints (23). Therefore, tomimic the effect of topoisomerases that can enable chain crossing, asoft-core repulsive potential is applied between pairs of beads ratherthan a strong hardcore excluded volume (10). A detailed mathe-matical definition of the model is provided in SI Text, SimulationDetails of Chromosome Models. We denote the potential energyfunction for this homopolymeric backbone model as UðrÞ. Tofit the experimental data, we also need to encode a geometricalsignal for contact formation. A rigorous description of thisencoding would involve a quantitative understanding of the phys-ical chemistry of the cross-linking reactions as well as the readoutprocesses. For simplicity, thus, we assume the measured contactprobability between genomic loci is a scalar function of the pairwisedistance r between those loci, f ðrÞ, where f is roughly a square-wellfunction. A biased energy landscape that can fit the observationsthat would be consistent with the maximum entropy principle canthus be written as a sum of the tabula rasa homopolymer energyand a pair interaction term with coefficients to be determined:UMEðrÞ=UðrÞ+P

iαifiðrabÞ (24–26), where the index i loops overthe pairs of contacts and a, b indicate the two genomic loci involvedin forming the i-th contact pair. The coefficients αi are Lagrangianmultipliers that may be determined using an iterative optimizationprocedure to ensure that contact frequencies predicted from thesimulation ultimately match the experimental measurements, thatis, hfiðrabÞi= fi,exp, where the bracket h · i indicates the ensembleaverage. As a proof of principle, in Fig. S1 we show that the iterativeinverse statistical mechanics algorithm we have developed indeed isable accurately to reproduce the structure of a small protein [withroot-mean-squared displacement (RMSD) <2Å] from pair contactinformation alone. At the same time such an information theoreticenergy landscape provides a picture of the free energy landscapenear the basin of minima. This encourages us to apply the approachto finding effective landscapes for the human chromosome 12 inboth embryonic stem and mature fibroblast cells starting from thecontact maps measured by Ren and coworkers (27).Fig. 1 A and B compare the experimental mature cell chro-

mosome contact maps (upper triangles) with the simulated onesobtained via the iterative algorithm (lower triangles). The corre-sponding plots for the stem cell chromosome are shown in Fig. S2.Fig. 1B is the zoomed-in version of Fig. 1A showing only the ge-nomic region 80–100 Mb, as an illustration of the finer structure.For both cell types, the derived landscapes accurately reproducethe contact map. This is quite remarkable given that the two celltypes exhibit strikingly different contact maps, with the mature cellchromosome forming stronger and denser long-range contacts.Results for the human chromosome 11 obtained in the same wayalso agree well with experiment and are shown in Fig. S2.Often, rather than examining the whole contact matrix, only

the contact probability as a function of genomic distance be-tween loci has been used to compare theories and models withmeasured data. In Fig. S2C, log-log plots of this quantity for theexperimentally determined contact probability versus genomicdistance are shown for the stem cell (light blue) and the maturecell (light red) chromosome. The corresponding lines from thelandscape simulations are drawn in darker colors. Two dashed

black lines with critical exponents of −1.0 and −1.5 are drawn asguides for the eye. These two exponent values have previously beenpredicted by two different homopolymer pictures of the chromo-some and have been used to test these pictures. The exponent −1.5corresponds to a fully equilibrated globule whereas the exponent−1.0 is predicted for a homopolymeric fractal globule, whose con-formations are kinetically limited owing to the rapid nonequilibriumcollapse of an originally unknotted chain (28). The present simu-lation models that assume a kind of effective equilibrium but withheterogeneous interactions closely reproduce the experimentsacross the entire range over 100 Mb for all cases.

Structural Characterization of Human Chromosomes. Examining theensemble of chromosome conformations obtained with the opti-mized information theoretic energy landscape that reproduces theHi-C contact map suggests some deeper ways of characterizing theensemble. In particular, one can use many body correlation func-tions and statistical landscape characterization through inherentstructure analysis to find the themes of local organization. To thatend, we first determine the mean squared fluctuation of the dis-tances between loci separated at various genomic distances. Thisquantity characterizing a polymeric trajectory would grow linearlyfor an ideal random coil. In contrast, as shown in Fig. S3A, con-sistent with the 3D FISH experiments (29), the mean squared dis-placements plateau at long genomic separations for both the stemcell (blue) and the mature cell (red) chromosome. Quantitativecomparison with the experiment also allows the estimation of aneffective persistence length scale of 150 nm per 40,000 bp (see Fig.S4A and SI Text, Physical Units of Chromosome Models). We notethis length scale would be consistent with forming a 30-nm fiber ata nucleosome line density of 1.3, a density that lies in the range ofexperimental observations (30). A comparison of the probabilitydistributions for the radius of gyration of the entire chain is shownin Fig. S3B. The stem cell chromosome has a size similar to that foran uncollapsed homopolymeric backbone model of the measuredpersistence length, but the mature cell chromosome adopts smallerand more collapsed conformations. The increase in compactnessfrom the stem cell to the mature cell chromosome is consistent withthe chromatin condensation and gene silencing that are observedexperimentally during cell differentiation (31).To characterize the heterogeneity of the sampled chromo-

some structures, we perform quality threshold clustering using theRMSD of atom-pair distances as a metric over the ensemble ofsimulated conformations. For neither of the cell types is there adominant cluster that would signal a well-defined unique chromo-some structure. Instead, each ensemble exhibits several differentclusters of structures of the chromosome that have rather com-parable statistical weights. The central structures from the top twoclusters are shown in Fig. 2 for the stem cell (SC1 and SC2) andthe mature cell (MC1 and MC2) chromosome. Representative

Fig. 1. Comparison between contact probabilities obtained from experi-ments and simulations. (A) Contact probability maps for the mature cellchromosome as determined in Hi-C experiments by Ren and coworkers (27)(upper triangles) and as sampled in simulations based on the informationtheoretic energy landscape (lower triangles). (B) Zoomed-in version of A inthe genomic region 80–100 Mb.

Zhang and Wolynes PNAS | May 12, 2015 | vol. 112 | no. 19 | 6063

BIOPH

YSICSAND

COMPU

TATIONALBIOLO

GY

CHEM

ISTR

Y

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

7, 2

020

Page 3: Topology, structures, and energy landscapes of human ... · Topology, structures, and energy landscapes of human chromosomes Bin Zhanga,b and Peter G. Wolynesa,b,c,1 aDepartment of

structures for other clusters are shown in the Fig. S3. For com-parison, an example configuration of the reference homopolymericbackbone model is also shown in Fig. S3C. Both the stem cellchromosome and the backbone-only model adopt overall spheri-cal structures, which arise from the confinement wall that is in-cluded to mimic the nuclear envelope. Consistent with its smallerradius of gyration, the mature cell chromosome seems more col-lapsed. Despite the structural heterogeneity of the simulated en-sembles, the chromosome conformations for both cell types can bereadily distinguished from those of the starting homopolymericbackbone model because they appear segregated by sequence. Forexample, one sees that the red colors at the 5′ end cluster to-gether, as also do the yellow colors in the other end of the chro-mosome. However, the red colors are spread all over the place forthe homogeneous backbone model. These still frames from thesimulation already indicate there is a difference in the topology ofchromosomes from the homogeneous backbone model, a pointthat will be made clearer in the following section.Light microscopy and in situ hybridization experiments show

that chromosomes occupy mutually exclusive domains that formterritories inside the nucleus (32). The underlying mechanism forthis segregation is not yet known. Our optimized chromosomemodels also support such territory formation (see SI Text, Chro-mosome Territories). As shown in Fig. S2K, for the optimizedlandscapes, both in the stem and mature cell models, the chro-mosomes 11 and 12 do indeed segregate and restrict their mo-tions to more limited regions.

Topological Landscape for Human Chromosomes. Characterizing thetopology of human chromosomes has been of great interest. Theissue of chain knotting is clearly biologically relevant, as wit-nessed by the fact that topoisomerases have evolved in even thesimplest cellular organisms (33). Entanglements in complexknotted conformations would pose potential kinetic barriers forDNA replication and transcription in the absence of such top-oisomerases. Thus, it has been argued that the chromosomeshould be unknotted (8). The so-called fractal globule modelsuggests the reason for a lack of knots is a kinetic effect comingfrom rapid collapse. If topoisomerases are sufficiently active,however, the chain could topologically equilibrate, so the reason forunknottedness (if indeed that is the case) would have to be soughtelsewhere. The Hi-C experiments are indeed consistent with thepredictions for the fractal globule model in many aspects. For ex-ample, the power-law scaling of contact probability as a function of

genomic distance for human chromosomes does show a slope ofapproximately −1, which is consistent with the prediction from thefractal globule model (12). Nevertheless, as pointed out by manyauthors, the slope for the power-law scaling does vary dramaticallyamong different cell types (34) (see also Fig. S2C). In addition,other models can also give rise to a slope of −1 (35, 36). Whattopologies are found for the optimized chromosome landscapesinferred using statistical mechanical information theory?We characterize the topological state of an individual chro-

mosome configuration by using the minimal length/diameter ratioas an easily computable topological invariant (see SI Text, Knots andthe Topological Characterization of Chromosomal Configurations).Fig. 3A presents the probability distributions of this knot invariantfor the stem cell (blue) and the mature cell (red) chromosomeconformational ensembles. As a comparison, the probability dis-tribution for the input homogeneous backbone model is also shownin yellow. The homopolymeric backbone model is highly knotted, asexpected from entropic considerations for such long, locally flexiblechains (37). In contrast to the homogeneous backbone model,however, both the stem cell and the mature cell chromosome en-sembles have topological distributions peaked near the trivial andthe simple trefoil knot. The mature cell chromosome does exhibitconformations composed by compounding a few simple knots. Asdiscussed in SI Text, these knots are tight, being formed from twolocally intertwined loop configurations. These knots can be effec-tively eliminated via a simple coarse-graining procedure (dottedline). We find that the key to unknotting lies precisely in the for-mation of topologically associating domains (see SI Text, Topolog-ically Associating Domains in Chromosome), which locally rigidifythe chain. Forming these domains effectively renormalizes thepersistence length of the chromosome. Because these domains areapproximately 1 Mb in size, an entire chromosome of 0.1 billion bpcan be pictured as being a coarse-grained polymer that has onlyaround 100 beads. Computer simulations show that more than 100beads are needed to sample with high likelihood even a single knot(38). For a polymer of such a small equivalent size, therefore, heavyknotting is not expected to occur. To support this argument, weexamined another energy landscape for the chromosome using amodel that only includes the local topologically associating domain(TAD) signals from the mature cell chromosome by excluding themore genomically distant interactions in determining the energyfunction. The corresponding knot invariant distribution shown inFig. S5I indicates that this model also adopts knotless conforma-tions. Of course, because the derived energy landscape is only aneffective information theoretic construct, it is still possible thatnonequilibrium effects are the actual source of the effective shortrange in sequence interactions that prevent knotting. Rapid kineticcollapse constraints may well be the origin of forming the topo-logically associating domains, but certainly one must entertain otherpossibilities involving specific protein actors.Not only is the structural distribution of chromosome con-

formations interesting, but so also is the dynamics of chromosomereorganization (39). It has been argued that owing to the ex-ceedingly long timescale of topological relaxation, the interphasechromosome may never equilibrate and thus closely resembles intopology the metaphase chromosome that transforms to this state(9). These kinetic arguments, however, have been based on ho-mopolymeric models for the chromosome. Here we examine thekinetics of chromosome unknotting using both the optimized en-ergy landscape having the small excluded volume force mimickingthe presence of strong topoisomerase activity and a model withlarger excluded volume forces that would not allow knots to un-ravel, as would happen without sufficient topoisomerases.Starting from equilibrium configurations of the homogeneous

backbone model with complex knots, we turn on the optimizedcontact potentials at time 0. Fig. 3B presents the relaxation of theaverage minimal length knot invariant for the stem cell (blue) andfor the mature cell chromosome (red). The simulations shown inthe main figure have the smaller excluded volume forces thatmimic the presence of topoisomerase, through a soft-core poten-tial that allows chain crossing. Results for the model with a full

MC1 MC2

SC1 SC2

0 132 Mb

Fig. 2. Structural characterization of the chromosome ensembles. Centralstructures of the top two most populous clusters for the stem cell (SC1 andSC2) and the mature cell (MC1 and MC2) chromosome.

6064 | www.pnas.org/cgi/doi/10.1073/pnas.1506257112 Zhang and Wolynes

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

7, 2

020

Page 4: Topology, structures, and energy landscapes of human ... · Topology, structures, and energy landscapes of human chromosomes Bin Zhanga,b and Peter G. Wolynesa,b,c,1 aDepartment of

excluded volume effect modeling what happens when there isinsufficient topoisomerase are plotted in the inset. With sufficienttopoisomerase, equilibrium knotless conformations are reachedwithin the simulation time equivalent to 3 h in the laboratory,much less than the cell cycle of human cells. However, whentopoisomerase is absent, the inferred relaxation timescale vastlyexceeds the cell life cycle. For the stem cell chromosome, theknotted conformations are significantly relaxed after about 300 hon the laboratory timescale, whereas for the mature cell chro-mosome the relaxation is too slow to be observed on our simu-lation timescale. Apparently, the higher density of the morecollapsed chromosome conformations of the mature cell causesjamming that substantially increases the timescale.

In Search of an Ideal Chromosome Structure.As shown in Fig. 2 andin Fig. S3, instead of one single structure, an ensemble of chro-mosome conformations is needed to reproduce the contact map.How tight is the structural ensemble? We quantify the similaritiesamong the sampled chromosome structures on various length scales.We know that even for a protein in the denatured thermodynamicstate, local secondary and even tertiary structures resembling thefinal folded structure persist (40, 41). Likewise, the chromosomeconformation ensemble inferred from Hi-C experiments may wellshare local structural motifs characterizing a more energeticallyimportant and organized chromosome structure. Many experimentsindeed already report finding chromosome motifs (42). One way tolook for these motifs is to study chromosome conformations at a lowinformation theoretic temperature. As for protein folding, althoughthe unfolded ensemble is favored by entropy and thus dominates athigh information theoretic temperature, a tighter ensemble reflect-ing a nearly unique most probable folded structure could emergebelow a folding temperature.To investigate chromosome structures at low effective tem-

peratures, we performed simulated annealing runs in which theequivalent temperature T is linearly decreased from 2 to 0.2 over18 million steps. The temperature T = 1.0 corresponds with thephysical ensemble as sampled in the interphase state. To char-acterize the relation of these energy minima or “inherent struc-tures” to the ensemble (43), we use a collective variable Q, ameasure of the fraction of common contacts formed, to quantifythe similarity between quenched inherent structures (the mathe-matical definition of Q is provided in SI Text). When a singlestructure dominates as in protein folding, Q to the unique nativestructure can serve as an excellent reaction coordinate for con-formational changes, as it does for describing protein foldingwhere the energy landscape is known to be funneled. Q rangesfrom 0 to 1 with higher values corresponding to higher structuralsimilarity. Fig. S6A shows the ensemble average hQi over all of thepairs of structures for the stem cell (blue) and the mature cell

(red) ensembles obtained at various temperatures. For neither ofthe cell types does the chromosome relax toward a unique struc-ture. The “uniqueness” of the folded structure for a protein isusually visually apparent when the average hQi exceeds 0.5. Whenthe hQi at different length scales is determined, as shown in Fig.S6B for the mature cell chromosome, it becomes clear that thewide breadth of distribution of structures mainly manifests itself atlarge length scales. In contrast, the chromosomes do exhibit highstructural similarity in their inherent structure ensembles whenone probes at short genomic regions of approximately 1 Mb insize, the length scale on which the formation of topologically as-sociating domains has been observed.The relatively low hQi at large length scales suggests the

presence of glassy behavior that is typical of random hetero-polymers where the interactions fluctuate strongly as a functionof sequence. To study the role of the heterogeneity of the in-teractions between genomic loci, we examined a smoothed av-erage contact potential that varies only as a function of genomicdistance V ði− jÞ, but otherwise is homogeneous (see SI Text, TheIdeal Homogeneous Chromosome). Fig. S6C shows this sequenceaveraged ideal homogeneous potential V ði− jÞ of the contactenergies for both the stem cell (blue) and the mature cell (red)chromosome. Consistent with its more compact structure, themature cell chromosome has a larger contact energy at longgenomic distances. Nevertheless, the contact potentials for bothcell types exhibit large fluctuations, which lead to the observedheterogeneity and to the small hQi in the average even for thequenched structural ensemble containing the local energy min-ima, which are individually the most probable structures. Owing topossible experimental noise in the contact map that may arisefrom intrinsic low resolution and the mixture of cell cycles, how-ever, the significance of the observed heterogeneity in the con-tact potential is presently difficult to evaluate and may well beoverestimated.To simplify our understanding of the origins of chromosome

structural themes, it is therefore interesting to examine theconformational ensemble that would be predicted for the homog-enized polymer model having only the averaged contact po-tential V ði− jÞ acting uniformly throughout the sequence. Whenquenched using the same simulation protocol as was used for theheterogeneous model, this idealized chromosome model exhibitsquite clearly some extremely long-range correlated fibril-likestructural features (see Fig. 4C for an example conformation).To quantitatively characterize the fibril structure, we calculatethe correlation of a unit vector between beads separated by 4 insequence. The strong oscillatory behavior of the correlationfunction, shown as the yellow line in Fig. 4A, is a quantitativemeasure of the presence of fibril structures. Fourier transformingthis correlation function, as shown in Fig. 4B, further reveals that

A B

Fig. 3. Topological characterization of the chromosome ensembles. (A) The probability distributions of the minimal length/diameter ratio that is a topological in-variant. These are shown for the mature cell (MC, red), the stem cell (SC, blue) chromosome, and for the tabula rasa backbone model (BB, yellow). The probabilitydistribution of the mature cell chromosome computed using coarse-grained representations is shown as a green dotted line. Example polymer configurations withminimal length/diameter ratio for the trivial knot, the trefoil knot, and a complex knot are also shown, and arrows indicate their corresponding values of the to-pological invariant. (B) Relaxation of the mature cell (red) and the stem cell (blue) chromosome topology measured with the average knot invariant as a function oftime when topoisomerases are sufficiently active. (Inset) The corresponding plot without the presence of topoisomerase. The shaded regions represent the SDs.

Zhang and Wolynes PNAS | May 12, 2015 | vol. 112 | no. 19 | 6065

BIOPH

YSICSAND

COMPU

TATIONALBIOLO

GY

CHEM

ISTR

Y

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

7, 2

020

Page 5: Topology, structures, and energy landscapes of human ... · Topology, structures, and energy landscapes of human chromosomes Bin Zhanga,b and Peter G. Wolynesa,b,c,1 aDepartment of

there are actually two layers of fibrils in the ideal chromosome.The first layer has a periodicity of ∼0.25 Mb, and the secondlayer has a periodicity corresponding to around 5 Mb. The twolayers of fibrils can also be seen from the density map shown inFig. S6D, with the genomic and spatial distances forming the twoaxes, respectively. Fig. 4C further shows that the 0.25-Mb peakcorresponds to a 300-nm fibril and the 5-Mb peak corresponds toa 600-nm fibril superstructure. We note that the hierarchical layersof fibril structures are reminiscent of the liquid crystal-like confor-mations that have been observed for dinoflagellate chromosomes(44) and are consistent with the hierarchical metaphase chromo-some model supported by light microscopy experiments (42).As shown by the correlation functions and their power spectra

in Fig. 4 A, B, and D, signatures of fibril structure also actuallyappear in the optimized interphase chromosome having heteroge-neous interactions (red lines), although they are much weaker. Thefibril structures are more evident at the low information theoretictemperature T = 0.2  as shown as blue lines, although the hetero-geneity by itself leads to broadening of the peak around 0.25 Mb.For human interphase chromosomes, signals of local fibril-likestructures have indeed been picked up in light microscopy experi-ments and these have led to specific suggestions of a hierarchicalfiber of fibers model for chromosomes (42). The scattering intensityprofile computed from the present landscape shown in Fig. S6Eindicates that these fibril features would be hard to detect in small-angle X-ray scattering experiments of the type whose results havebeen used to argue against such models (45).

Folding Free-Energy Profiles of Topologically Associating Domains.It has been suggested that the topologically associating domains,within which long-range contacts between DNA sequences areformed, provide structural units that could be a dynamical basisfor coordinating gene regulation (46). For example, folding of to-pologically associating domains can bring enhancers and promotersthat are separated by large genomic distances close to each other.Such a correlation between structure and function argues for theimportance of specific interactions in guiding the 3D organizationof topologically associating domains, and thus justifies the use oflandscape theory to study chromosome folding. The present model,in addition to allowing an examination of global chromosome dy-namics, allows us to explore the energy landscape of individualdomains and their coupling to neighboring ones while still re-maining agnostic regarding detailed biochemical and biophysicalmechanisms that actually give rise to the effective landscape.We first search for generic structural features of the set of

topologically associating domains identified in the mature cellchromosome 12. To measure the structure similarity within each of

these collapsed globules, we again determine the pairwise Q in theensemble of each defined topologically associating domain atT = 1.0. As shown in Fig. S6G, most of the topologically associatingdomains have an average hQi around 0.5. These high hQi valuesindicate that topologically associating domains exhibit strongstructural regularity, which is consistent with the orientationalordering characterized in Fig. 4. The liquid crystal-like orienta-tional ordering shown in Fig. 4 would indeed even be predicted forcollapsed globules of locally ordered chain with excluded volume,just as it is for proteins (47).Landscapes can be found for all of the local domains and various

behaviors are found as described in SI Text. To highlight what canbe learned, we discuss the analysis of the thermodynamics of oneparticular topologically associating domain between genomic re-gion 40.76–42.6 Mb, as highlighted with a star in Fig. S6G. Similaranalyses for other regions are provided in the Figs. S7 and S8. Herewe study the free-energy profile over the reaction coordinate Qrelative to a reference structure shown in Fig. 5, Inset. The freeenergy FðQÞ, which is shown in Fig. 5 in blue, exhibits two basinsaround Q= 0.4 and Q= 0.8 respectively. The average energy EðQÞ,however, decreases to a minimum at the rather high value ofQ= 0.8. The double-well feature of the free-energy profile in-dicates a possible cooperative transition between the two ensem-bles of conformations and supports the idea that topologicallyassociating domains may undergo two state transitions as in recentmodels of stochastic gene regulation (48–50). We also can in-vestigate the coupled landscapes for neighboring topologically

A

325 nm

670 nm

1340

nm

1170

nm

412 nm

DC

B

Fig. 4. Chromosome structures at low informationtheoretic temperature. (A and B) The orientationalorder parameter along the genome (A) and its Fou-rier transform (B) for the mature cell chromosome attemperature T = 1.0 (red) corresponding to the ex-perimental data and at the lower information theo-retic temperature T = 0.2 (blue) as well as for theideal homogenized chromosome model (IC). (C and D)Example conformations of the ideal (C) and the ma-ture cell (D) chromosome at T = 0.2.

Fig. 5. Landscape characterizations for a topologically associating domainbetween genomic region 40.76–42.6 Mb. Average energy E (red) and freeenergy F (blue) at T = 1.0 as a function of Q. (Inset) The reference structureused for the calculation of Q.

6066 | www.pnas.org/cgi/doi/10.1073/pnas.1506257112 Zhang and Wolynes

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

7, 2

020

Page 6: Topology, structures, and energy landscapes of human ... · Topology, structures, and energy landscapes of human chromosomes Bin Zhanga,b and Peter G. Wolynesa,b,c,1 aDepartment of

associating domains. As seen in Fig. S9, there are some pairsshowing significant cooperative coupling.

DiscussionAs witnessed by the blossoming of the protein-folding field, en-ergy landscape theory provides powerful statistical mechanicaltools for the quantitative study of heteropolymers. For a complexmacromolecule such as the chromosome, an effective energylandscape inferred using an information theoretic approachprovides an interesting alternative to landscapes derived directlyfrom physicochemical interactions of its constituents given byfirst principles. The power of the inferred energy landscape hasbeen partially revealed by the prediction of an “ideal chromo-some” model that nicely bridges the seemingly disordered in-terphase chromosome that provides the input data here with themore condensed metaphase chromosome structure suggested bymicroscopy leading to a proposed hierarchical model (42). With

the coming availability of contact probability maps for meta-phase chromosomes (10), it will be of great interest quantita-tively to investigate the phase diagram of chromosome structuresat different stages of the cell cycle and further to evaluate thesignificance of the suggested ideal chromosome model as a startingpoint for dynamical analysis. Equally interesting will be the studyof the structure–dynamics–function relationships that are crucialto understanding protein biology, but now in the much morecomplex context of gene regulation, and seek possible structuralimprints of epigenetic memory in chromosome conformation (11).

ACKNOWLEDGMENTS. We thank Drs. José Onuchic and Garegin A. Papoianfor critical reading of the manuscript. This work was supported by theCenter for Theoretical Biological Physics sponsored by National ScienceFoundation Grants PHY-1308264 and PHY-1427654. Additional supportwas provided by D. R. Bullard-Welch Chair at Rice University GrantC-0016.

1. Misteli T (2007) Beyond the sequence: Cellular organization of genome function. Cell128(4):787–800.

2. Bryngelson JD, Onuchic JN, Socci ND, Wolynes PG (1995) Funnels, pathways, and theenergy landscape of protein folding: A synthesis. Proteins 21(3):167–195.

3. Frauenfelder H, Sligar SG, Wolynes PG (1991) The energy landscapes and motions ofproteins. Science 254(5038):1598–1603.

4. Wolynes PG (December 18, 2014) Evolution, energy landscapes and the paradoxes ofprotein folding. Biochimie, 10.1016/j.biochi.2014.12.007.

5. Marko JF (2008) Micromechanical studies of mitotic chromosomes. Chromosome Res16(3):469–497.

6. Finch JT, Klug A (1976) Solenoidal model for superstructure in chromatin. Proc NatlAcad Sci USA 73(6):1897–1901.

7. Ong C-T, Corces VG (2014) CTCF: An architectural protein bridging genome topologyand function. Nat Rev Genet 15(4):234–246.

8. Grosberg A, Rabin Y, Havlin S, Neer A (1993) Crumpled globule model of the three-dimensional structure of DNA. Europhys Lett 23(5):373.

9. Rosa A, Everaers R (2008) Structure and dynamics of interphase chromosomes. PLOSComput Biol 4(8):e1000153.

10. Naumova N, et al. (2013) Organization of the mitotic chromosome. Science 342(6161):948–953.

11. Phillips-Cremins JE, et al. (2013) Architectural protein subclasses shape 3D organiza-tion of genomes during lineage commitment. Cell 153(6):1281–1295.

12. Lieberman-Aiden E, et al. (2009) Comprehensive mapping of long-range interactionsreveals folding principles of the human genome. Science 326(5950):289–293.

13. van Steensel B, Dekker J (2010) Genomics tools for unraveling chromosome archi-tecture. Nat Biotechnol 28(10):1089–1095.

14. Tokuda N, Terada TP, Sasai M (2012) Dynamical modeling of three-dimensional ge-nome organization in interphase budding yeast. Biophys J 102(2):296–304.

15. Umbarger MA, et al. (2011) The three-dimensional architecture of a bacterial genomeand its alteration by genetic perturbation. Mol Cell 44(2):252–264.

16. Duan Z, et al. (2010) A three-dimensional model of the yeast genome. Nature465(7296):363–367.

17. O’Sullivan JM, Hendy MD, Pichugina T, Wake GC, Langowski J (2013) The statistical-mechanics of chromosome conformation capture. Nucleus 4(5):390–398.

18. Kalhor R, Tjong H, Jayathilaka N, Alber F, Chen L (2012) Genome architectures re-vealed by tethered chromosome conformation capture and population-based mod-eling. Nat Biotechnol 30(1):90–98.

19. Giorgetti L, et al. (2014) Predictive polymer modeling reveals coupled fluctuations inchromosome conformation and transcription. Cell 157(4):950–963.

20. Nagano T, et al. (2013) Single-cell Hi-C reveals cell-to-cell variability in chromosomestructure. Nature 502(7469):59–64.

21. Wang S, Wolynes PG (2011) Communication: Effective temperature and glassy dy-namics of active matter. J Chem Phys 135(5):051101.

22. Wang S, Wolynes PG (2012) Tensegrity and motor-driven effective interactions in amodel cytoskeleton. J Chem Phys 136(14):145102.

23. Swedlow JR, Sedat JW, Agard DA (1993) Multiple chromosomal populations of top-oisomerase II detected in vivo by time-lapse, three-dimensional wide-field micros-copy. Cell 73(1):97–108.

24. Pitera JW, Chodera JD (2012) On the use of experimental observations to bias simu-lated ensembles. J Chem Theory Comput 8(10):3445–3451.

25. Roux B, Weare J (2013) On the statistical equivalence of restrained-ensemble simu-lations with the maximum entropy method. J Chem Phys 138(8):084107.

26. Savelyev A, Papoian GA (2009) Molecular renormalization group coarse-graining ofelectrolyte solutions: Application to aqueous NaCl and KCl. J Phys Chem B 113(22):7785–7793.

27. Dixon JR, et al. (2012) Topological domains in mammalian genomes identified byanalysis of chromatin interactions. Nature 485(7398):376–380.

28. Mirny LA (2011) The fractal globule as a model of chromatin architecture in the cell.Chromosome Res 19(1):37–51.

29. Mateos-Langerak J, et al. (2009) Spatially confined folding of chromatin in the in-

terphase nucleus. Proc Natl Acad Sci USA 106(10):3812–3817.30. Robinson PJJ, Fairall L, Huynh VAT, Rhodes D (2006) EM measurements define the

dimensions of the “30-nm” chromatin fiber: Evidence for a compact, interdigitatedstructure. Proc Natl Acad Sci USA 103(17):6506–6511.

31. Fisher CL, Fisher AG (2011) Chromatin states in pluripotent, differentiated, and re-

programmed cells. Curr Opin Genet Dev 21(2):140–146.32. Cremer T, Cremer M (2010) Chromosome territories. Cold Spring Harb Perspect Biol

2(3):a003889.33. Champoux JJ (2001) DNA topoisomerases: Structure, function, and mechanism. Annu

Rev Biochem 70:369–413.34. Barbieri M, et al. (2012) Complexity of chromatin folding is captured by the strings

and binders switch model. Proc Natl Acad Sci USA 109(40):16173–16178.35. Bohn M, Heermann DW (2010) Diffusion-driven looping provides a consistent

framework for chromatin organization. PLoS ONE 5(8):e12218.36. Gürsoy G, Xu Y, Kenter AL, Liang J (2014) Spatial confinement is a major determinant

of the folding landscape of human chromosomes. Nucleic Acids Res 42(13):8223–8230.37. Grosberg A, Khokhlov A (1994) Statistical Physics of Macromolecules (AIP, Woodbury,

NY).38. Frank-Kamenetskiı̆ MD, Vologodskiı̆ AV (1981) Topological aspects of the physics of

polymers: The theory and its biophysical applications. Sov Phys Usp 24(8):679.39. Sikorav JL, Jannink G (1994) Kinetics of chromosome condensation in the presence of

topoisomerases: A phantom chain model. Biophys J 66(3 Pt 1):827–837.40. Weinkam P, Pletneva EV, Gray HB, Winkler JR, Wolynes PG (2009) Electrostatic effects

on funneled landscapes and structural diversity in denatured protein ensembles. ProcNatl Acad Sci USA 106(6):1796–1801.

41. Shoemaker BA, Wolynes PG (1999) Exploring structures in protein folding funnels

with free energy functionals: The denatured ensemble. J Mol Biol 287(3):657–674.42. Kireeva N, Lakonishok M, Kireev I, Hirano T, Belmont AS (2004) Visualization of early

chromosome condensation: A hierarchical folding, axial glue model of chromosomestructure. J Cell Biol 166(6):775–785.

43. Stillinger FH, Weber TA (1982) Hidden structure in liquids. Phys Rev A 25:978–989.44. Gautier A, Michel-Salamin L, Tosi-Couture E, McDowall AW, Dubochet J (1986) Elec-

tron microscopy of the chromosomes of dinoflagellates in situ: Confirmation of

Bouligand’s liquid crystal hypothesis. J Ultrastruct Mol Struct Res 97:10–30.45. Nishino Y, et al. (2012) Human mitotic chromosomes consist predominantly of ir-

regularly folded nucleosome fibres without a 30-nm chromatin structure. EMBO J

31(7):1644–1653.46. Nora EP, Dekker J, Heard E (2013) Segmental folding of chromosomes: A basis for

structural and regulatory chromosomal neighborhoods? BioEssays 35(9):818–828.47. Luthey-Schulten Z, Ramirez BE, Wolynes PG (1995) Helix-coil, liquid crystal, and spin

glass transitions of a collapsed heteropolymer. J Phys Chem 99(7):2177–2185.48. Sasai M, Wolynes PG (2003) Stochastic gene expression as a many-body problem. Proc

Natl Acad Sci USA 100(5):2374–2379.49. Zhang B, Wolynes PG (2014) Stem cell differentiation as a many-body problem. Proc

Natl Acad Sci USA 111(28):10185–10190.50. Sasai M, Kawabata Y, Makishi K, Itoh K, Terada TP (2013) Time scales in epigenetic

dynamics and phenotypic heterogeneity of embryonic stem cells. PLOS Comput Biol9(12):e1003380.

Zhang and Wolynes PNAS | May 12, 2015 | vol. 112 | no. 19 | 6067

BIOPH

YSICSAND

COMPU

TATIONALBIOLO

GY

CHEM

ISTR

Y

Dow

nloa

ded

by g

uest

on

Sep

tem

ber

7, 2

020