regulatory gene networks in embryonic stem cell di erentiation -...

Regulatory Gene Networks in Embryonic Stem Cell Di↵erentiation

Steven ZwickApplied Physics

Harvard University

(Dated: May 15, 2015)

Motivated by the questions of how cells choose when to di↵erentiate and what type of cell to become, wereview a model of gene expression for large regulatory gene networks. By treating these networks analogouslyto magnetic systems, we rigorously define the epigenetic landscape that governs the dynamics of thesecells. Under this view, cell types become attractors in a multidimensional gene expression phase space withdi↵erentiation resulting from transitions between these attractors. We also discuss how the predictions ofthis model fits with data and the implications of this approach for future work in cellular decision-making.

I. INTRODUCTION

During the development of multicellular organisms,progenitor cells must choose their fate and di↵erenti-ate accordingly into one of the many di↵erent types ofadult cells. These decisions are governed by complex,stochastic networks of DNA-binding proteins (transcrip-tion factors) that reorganize the cell’s gene expressionduring di↵erentiation. This transition has been classi-cally viewed in terms of Waddington’s “epigenetic land-scape,” in which the transition of cells through the mul-tidimensional gene expression space is akin to marblesrolling to the bottom of a rugged valley [1]. More rig-orously, we can view di↵erent cell states as robust at-tractors in a high dimension potential landscape, withdi↵erentiation manifesting as transitions between theseattractors. Therefore, knowledge of the underlying net-work structure enables one, using analytical approachesfrom statistical mechanics, to construct a phase space ofgene expression with di↵erent attractors correspondingto the di↵erent states of the cell.

A standard and intuitive method to simulate suchmolecular networks is to directly model the chemical re-actions involved using a Monte Carlo algorithm [2]. How-ever, this type of simulation can become computationallyimpractical for realistically large networks, in which casea higher level approach may be more suitable for studyof these networks. Here, we review a model proposed byZhang and Wolynes [3] that treats gene networks analo-gously to magnetic systems and uses methods from sta-tistical mechanics to characterize the attractors in thesystem. They explicitly model the synthesis, degrada-tion, and DNA binding of proteins in a gene network asa birth-death process on a lattice. They then apply thismodel to the regulatory transcriptional network of em-bryonic stem cells (ESCs), which early in developmentmust choose when to leave a pluripotent state and whichcell type to di↵erentiate towards.

II. MODEL

Following Zhang and Wolynes [3], we start by describ-ing the master equation for stochastic gene expression in

FIG. 1. Gene regulation model for network of two mutually re-pressing genes Oct4 and Cdx2, from Zhang and Wolynes [3].

the regulatory network. Suppose the network in ques-tion has M genes that transcribe M di↵erent transcrip-tion factors. A schematic for our model given a simpletwo gene network is given in Figure 1. We assume thateach gene’s expression is determined by the binding ofthese transcription factors to adjacent regulatory regionsof DNA. Assuming that these transcription factors bindindependently of each other to unique regulatory DNAelements, from trivial combinatorics we have N = 2M

possible DNA occupation states, in which each transcrip-tion factor is either bound or unbound. If we label eachoccupancy state with an integer from [1, N ], we can writethe joint probability of a single gene having DNA occu-pancy state j with n proteins in the system at time t asP

j

(n, t). More conveniently, we can write these proba-bilities as a N-component probability vector for all thepossible occupancy states of a single gene,

P(n, t) =

0

B@

P1(n, t)P2(n, t)

...

P

N

(n, t)

1

CA . (1)

As a first order approximation for the expression of thenetwork of M genes, we can assume that the probabil-ities for each gene’s DNA occupancy are independent,referred to as the self-consistent proteomic field approxi-mation[3]. While this approximation is crude, it will en-able us to solve the master equation exactly and shouldnot a↵ect the broken symmetries in the network. Underthis assumption, if we denote the occupancy configura-tion probabilities for gene m as P

m

(nm

, t), the DNA oc-cupancy probability for the entire gene network is just

2

the tensor product of single gene probabilities,

P(n1, n2, ..., nM

, t) = P1(n1, t)⌦P2(n2, t)...⌦PM(nM

, t).(2)

The dynamics of the gene expression in our networkcan therefore be described by the following master equa-tion for the DNA occupancy probabilities,

@

@t

P(n, t) = G{P(n� 1, t)�P(n, t)}+K{(n+ 1)P(n+ 1, t)� nP(n, t)}+WP(n, t).

(3)

The first term describes the change in DNA occupancyprobability due to the synthesis of a new transcriptionfactor, with transition matrix G which is a diagonal ma-trix of the protein translation rates g

j

, which are deter-mined by the DNA occupancy configuration j and thestructure of the gene network. A simple rule is that theprotein translation is “on” (g

j

= g

on

) if the occupancyconfiguration j includes an activator and no inhibitorsfor that gene, and translation is “o↵” (g

j

= g

off

) if thereare no bound activators or if there are bound inhibitors.The second term accounts for exponential degradationof the transcription factors, such that K = kI wherek is the protein degradation rate. The final term has anon-diagonal transition matrixW that describes how theDNA occupancy states change given the network struc-ture, and is defined as follows. We assume that the tran-scription factors dimerize before binding to DNA, as isthe case for most transcription factors [citation]. If atransition from occupancy state j to state i requires thebinding of a homodimer of transcription factor m, thenW

ij

= hn

m

(nm

� 1)/2 where n

m

is the number of tran-scription factors m and h is the DNA binding rate. Ifthe transition requires the binding of a heterodimer oftranscription factors l and m, then W

ij

= hn

l

n

m

. Ifthe transition requires a transcription factor m to un-bind from the DNA, then W

ij

= j where j is the DNAunbinding rate. All other entries of the transition matrixW are 0.

It is worth discussing a key feature of this model beforesolving the master equation. Models of protein transla-tion often assume that the timescales of transcriptionfactors binding to DNA or unbinding from DNA aremuch faster than the protein translation timescale. Un-der this assumption, it is valid to assume that the en-semble of DNA occupancy configurations reaches a quasi-equilibrium, with the probabilities of each configurationreaching a steady state value and the translation rate de-pending solely on some function of the transcription fac-tor concentrations n

m

. However, experimental evidencehas demonstrated that this assumption is not valid forgene expression in eukaryotes, in which the chromatin ar-chitecture and and the wrapping of DNA around histonescan slow the process of DNA binding [5,6]. In this model,the DNA binding and unbinding rate parameters h and j

can give insight into how these epigenetic modificationscan alter gene expression and cellular decision-making.

A. Steady States

We can now solve this master equation by treat-ing the problem as a birth-death process on a latticeand using a path integral approach. Following Doi [7]and Pelilti [8], for each gene m we can define creationand annihilation operators a

†m

|nm

i = |nm

+ 1i anda

m

|ni = n

m

|nm

� 1i respectively and a state vector|

m

i =P1

n=0 Pm

(nm

, t)|nm

i. The master equation fora single gene m can then be written in the following”second-quantized” form,

@

t

| m

(t)i = ⌦

m

| m

(t)i, (4)

with non-Hermitian, ”Hamiltonian”-like operator

⌦

m

⌘ G(a†m

� 1) +K(am

� a

†m

a

m

) +W

†. (5)

Here, G and K are the same as in equation (3) and W

† isobtained by rewriting terms hn

m

(nm

� 1)/2 and hn

l

n

m

in operator form as h(a†m

a

m

)2/2 and h(a†l

a

l

)(a†m

a

m

).We can easily generalize this approach for the whole

network of m genes under the self-consistent proteomicfield approximation using equation (2), through which weobtain

| m

(t)i =MY

m=1

| m

(t)i, (6)

@

t

| (t)i = ⌦| (t)i, (7)

⌦ =MX

m=1

⌦

m

. (8)

Using equation (7), we can calculate the transitionprobability P (nf , ⌧ |ni, 0) of finding protein concentra-tions nf = (n1

f

, ..., n

M

f

)> at time t = ⌧ given initial pro-

tein concentrations ni = (n1i

, ..., n

M

i

)> at t = 0, wheren

m

i

and n

m

f

are the initial and final concentrations oftranscription factor m:

P (nf , ⌧ |ni, 0) = hnf

| exp(⌦⌧)|nii. (9)

Following Zhang et al. [9], we can derive a path inte-gral representation for the transition probability using aresolution of identity . For the sake of brevity, we referthe reader to the Supplementary Information of Zhangand Wolynes [3] for the precise definitions and deriva-tions of quantities. Following this approach, we writethe transition probability as a path integral

P (nf ,⌧ |ni, 0) /Z Y

m

Dx

Y

m

Dc

⇥ exp

�Z

dt

X

m

[pm

q

M

m=1 �H(qm

,p

m

)]

!,

(10)

3

withq

m

= (c1x1, ..., cNx

N

, (cN

� c1)/2, ..., (cN � c

N�1)/2)>,and p

m

= (px1 , ..., px

N

, p

c

1, ..., pc

N�1)>. Here, coordinates c

j

and x

j

are the probability and average protein numberof DNA occupation configuration j for gene m, and p

c

j

and p

x

j

are the corresponding conjugate momenta. TheHamiltonian H is defined directly from our master equa-tion [3] and can be seen via equation (10) to be separableover the set of genes in the network.

Following this result, the steady state solutions of themaster equation are attractors of the deterministic dy-namics that extremize the action of our path integral:

dq

m

dt

=@H@p

m

��pm=0

. (11)

Finally, from the steady state solutions for q

m

and p

m

,we can calculate the steady state probability distributionP

m

(n) and mean x

m

of the concentration of transcriptionfactor m by simply averaging over all DNA occupancyconfigurations,

P

m

(n) =NX

j=1

c

j

(xj

)ne�xj

n!,

x

m

=NX

j=1

c

j

x

j

.

(12)

B. Transition Paths

Once the steady states of the cell network have beenfound, the next question is how the cell switches betweenthese states. Now that our master equation is in Hamilto-nian form in equation (10), the most probable transitionpaths between two steady states is simply determined bythe standard Hamiltonian equations for evolution of asystem

dq

m

dt

=@H@p

m

,

dp

m

dt

= � @H@q

m

, (13)

with the two steady states of interest as boundary con-ditions. Once we have determined the most probabletransition path we can estimate the transition rate k be-tween these states as k / e

S where S is the transitionaction over the path:

S =

Zdt

X

m

p

m

q

m

. (14)

C. Adiabatic Limit

The “adiabatic limit” of the model refers to the regimein which DNA binding and unbinding occurs on a muchfaster time-scale than translation. In terms of our model

parameters, this is the regime h, f � g

j

for DNA occu-pancy configuration j. In this limit, we can easily cal-culate the relevant quantities by assuming that the oc-cupancy configurations reach equilibrium, in which casewe can define an e↵ective protein translation rate overall possible configurations,

g =NX

j=1

c

j

g

j

. (15)

This reduces the Hamiltonian to

H = g[ep � 1] + kx[e�p � 1], (16)

from which we can calculate the new steady state solu-tions from the deterministic dynamics of equation (11),which is now

dx

dt

= g � kx. (17)

Lastly, the Hamiltonian equations from (13) governingthe most probable transition path between steady statesare now

dx

dt

=@H@p

= ge

p � kxe

�p

,

dp

dt

= �@H@x

= �@g@x

[ep � 1]� k[e�p � 1].

(18)

FIG. 2. Gene regulatory network for embryonic stem cell modelin Zhang and Wolynes [3]. Each node represents a transcriptionfactor except for Oct4-Sox2, a heterodimer of two transcriptionfactors. Arrows indicate that a transcription factor binds to a reg-ulatory DNA element for a gene encoding another transcriptionfactor. Green arrows show that the transcription factor increasesexpression of a gene, whereas red arrows indicate that the tran-scription factor represses gene expression. The color of the tran-scription factor corresponds to the state of the cell that the proteinis experimentally found in.

III. EMBRYONIC STEM CELLDIFFERENTIATION

By connecting steady states of the model to states ofan ESC cell, we can make predictions about di↵erenti-ation that can be tested experimentally. Figure 2 de-picts the simplified regulatory network for mouse embry-onic stem cells (mESCs) used by Zhang and Wolynes

4

[3]. This network governs the decision of the mESCto leave the pluripotent state and di↵erentiate into dif-ferent cell types. Simulation results of the Zhang andWolynes model for this network [3] are shown in Figure3, with the steady state solutions given in 3(A) and themost probable transition path between two of the statesdescribed in 3(B). The model has five steady state so-lutions, corresponding with the gene expression of fourdi↵erent cell types: di↵erentiated cells (DC), trophec-toderm (TE), primitive endoderm (PE), and stem cells(SC1 and SC2). Interestingly, the model predicts twodi↵erent steady states for the stem cell state, charac-terized by a di↵erence in Nanog and Pbx2 expression,agreeing with experimentally measured heterogeneity inNanog expression in stem cell populations [10]. In figure3(B), we can see that the SC2 state is an intermediarystate in the most probable transition path from the SC1to PE states. This agrees with experimental observationsthat Nanog downregulation is necessary for di↵erentia-tion of mESCs [11].

FIG. 3. Simulation results of model from Zhang and Wolynes [3].(A) Steady state solutions (five) for the mESC network, linked toexperimentally observed cell types. Each steady state is defined bythe concentrations of the transcription factors along the x-axis. (B)Most probable transition pathway between SC1 (stem cell) stateand PE (primitive endoderm) steady states. The x-axis representsthe progression along the path and each row corresponds with adi↵erent transcription factor.

Past experimental results suggest that pluripotencygenes Oct4 and Sox2 are di↵erentially regulated in re-sponse to di↵erentiation signals in mESCs in vitro [11],with high Oct4 and low Sox2 in cells adopting themesendoderm fate and the opposite in cells di↵erenti-ating into neural ectodermal cells. Preliminary experi-mental results in human ESCs, which have a very simi-lar regulatory gene network to mESCs, also support this

conclusion (Figure 4). It is no surprise that Zhang andWolynes’s results do not predict this result since theirnetwork (Figure 2) does not include any di↵erential reg-ulation of Oct4 and Sox2, leading to no steady states inwhich Oct4 and Sox2 expression are significantly di↵er-ent (Figure 3A). However, this model o↵ers a promisingapproach to identify the transcription factors that arecritical for the di↵erentiation, such as the downregula-tion of Nanog is necessary in Figure 3B. Further workimplementing this model to other gene networks may giveinsight into which genes are the ”master” regulatory fac-tors for a particular cell fate choice and which are merelyresponsive.

FIG. 4. Time lapse microscopy images of human ESCs with fluo-rescent reporters tagged to copies of Oct4 (red) and Sox2 (green)after receiving mesoendoderm di↵erentiation signals for 0, 12, and24 hours respectively from left to right (Unpublished).

IV. CONCLUSION

In this paper, we discussed the view of cell di↵erentia-tion as transitions through an epigenetic phase space, inwhich cell types are steady state attractors. The modelproposed by Zhang and Wolynes [3] allowed us to quanti-tatively characterize these attractors and the transitionsbetween them using the formalism for magnetic systemsfrom statistical mechanics. Even with broad assump-tions, this approach produces results that agree with ex-periments and o↵er insight into how di↵erentiation arisesfrom the structure of the network. However, there re-main many open questions to be addressed through suchmodels. Why does the di↵erentiation of cells appear tobe an irreversible process experimentally? What roledoes DNA-binding kinetics (in adiabatic or non-adiabaticregime) have in the di↵erentiation between two di↵erentcell fates? Such general questions should not depend onthe specific microscopic details of the problem and in-vite the possibility of discovery through approaches fromstatistical mechanics like those discussed here.

[1] Waddington CH (1957) The Strategy of the Genes (Allen& Unwin, London).

[2] Gillespie DT (1977) Exact stochastic simulation of cou-pled chemical reactions. J Phys Chem 81:2340-2361.

[3] Zhang B, Wolynes PG (2014) Stem cell di↵erentiation asa many-body problem. PNAS 111:10185-10190.

[4] Gillespie DT (1977) Exact stochastic simulation of cou-pled chemical reactions. J Phys Chem 81:2340-2361.

5

[5] Feng H, Wang J (2012) A new mechanism of stem celldi↵erentiation through slow binding/unbinding of regu-lators to genes. Sci Rep 2:550.

[6] Li C, Wang J (2013) Quantifying Waddington landscapesand paths of non-adiabatic cell fate decisions for di↵eren-tiation, reprogramming and transdi↵erentiation. J R SocInterface 10(89):20130787.

[7] Doi M (1976) Second quantization representation forclassical many-particle system. J Phys A: Math Gen9:1465-1477.

[8] Peliti L (1985) Path integral approach to birth-death pro-cesses on a lattice. J Phys 46: 1469-1483.

[9] Zhang K, Sasai M, Wang J (2013) Eddy current andcoupled landscapes for non- adiabatic and nonequilib-rium complex system dynamics. Proc Natl Acad Sci USA110(37):14930-14935.

[10] Chambers I, et al. (2007) Nanog safeguards pluripo-tency and mediates germline development. Nature450(7173):1230-1234.

[11] Thomson M, et al. (2011) Pluripotency factors in embry-onic stem cells regulate di↵erentiation into germ layers.Cell 145(6):875-89.

regulatory gene networks in embryonic stem cell di erentiation -...

Documents