
Exactly solvable models of stochastic gene expression
Lucy Ham,1,a) David Schnoerr,2,b) Rowan D. Brackston,2,c) and Michael P. H. Stumpf1,2,d)
1) School of BioSciences and School of Mathematics and Statistics, University of Melbourne, Parkville VIC 3010, Australia
2) Dept. Life Sciences, Imperial College London, SW7 2AZ, UK

(Dated: 6 January 2020)

Stochastic models are key to understanding the intricate dynamics of gene expression. But the simplest models which only account for e.g. active and inactive states of a gene fail to capture common observations in both prokaryotic and eukaryotic organisms. Here we consider multistate models of gene expression which generalise the canonical Telegraph process, and are capable of capturing the joint effects of e.g. transcription factors, heterochromatin state and DNA accessibility (or, in prokaryotes, Sigma-factor activity) on transcript abundance. We propose two approaches for solving classes of these generalised systems. The first approach offers a fresh perspective on a general class of multistate models, and allows us to “decompose" more complicated systems into simpler processes, each of which can be solved analytically. This enables us to obtain a solution of any model from this class. We further show that these models cannot have a heavy-tailed distribution in the absence of extrinsic noise. Next, we develop an approximation method based on a power series expansion of the stationary distribution for an even broader class of multistate models of gene transcription. The combination of analytical and computational solutions for these realistic gene expression models also holds the potential to design synthetic systems, and control the behaviour of naturally evolved gene expression systems, e.g. in guiding cell-fate decisions.

I. INTRODUCTION

The need for models in cellular biology is ever-increasing. Technological and experimental advances have led not only to progress in our understanding of fundamental processes, but also to the provision of a great wealth of data. This presents enormous opportunities, but also requires further development of the mathematical and computational tools for the extraction and analysis of the underlying biological information.

The development of mathematical models for the complex dynamics of gene expression is of particular interest. The stochastic nature of the transcriptional process generates intracellular noise, and results in significant heterogeneity between cells; a phenomenon that has been widely observed in mRNA copy number distributions, even for otherwise identical cells subject to homogeneous conditions1–6. Modelling is key to understanding and predicting the dynamics of the process, and subsequently, to quantifying this observed variability. The Telegraph model, as introduced by Ko et al.7 in 1991, is the canonical model for gene expression and explains some of the variability observed in mRNA copy number distributions. This mathematical model treats gene promoter activity as a two-state system and has the advantage of a tractable stationary mRNA distribution4,8,9, enabling insights into limiting cases of the system10 and the behaviour of the distribution in general.

With the advancement of single-cell experiments, the deficiencies of the Telegraph model are increasingly apparent. The model does not account for more complex control mechanisms, nor more complex gene regulatory networks involving feedback. Recent work has shown that, while the Telegraph model is able to capture some degree of the variability in gene expression levels, it fails to explain the large variability and over-dispersion in the tails seen in many experimental datasets11. Furthermore, it has become evident that gene promoters can be in multiple states and may involve interactions between a number of regulatory factors, leading to a different mRNA synthesis rate for each state; such states may be associated with the presence of multiple copies of a gene in a genome, or with the key steps involved in chromatin remodelling12–15, DNA looping16 or supercoiling17. Mathematical models—and accompanying analytic or approximate solutions—that capture the kind of data arising from these more complex multistate dynamics are now essential.

a) Electronic mail: [email protected]
b) Electronic mail: [email protected]
c) Electronic mail: [email protected]
d) Electronic mail: [email protected]

Despite its deficiencies, the Telegraph model remains a useful framework around which more complicated models can be constructed and, as we will show here, provides a foundation from which to develop further analytical results. A number of extensions and modifications of the Telegraph model have emerged, the simplest of which is perhaps the extension to allow for a second basal rate of transcription—the “leaky" Telegraph model11,18,19. A further extension incorporating the additional effects of extrinsic noise is given in Refs. 11, 20, and 21. Several recent studies of gene expression have shown that the waiting time distribution in the inactive state for the gene promoter is non-exponential22–24, leading to the proposal of a number of “refractory” models25–28. A distinguishing feature of these models is that the gene promoter can transcribe only in the active state and, after each active period, has to progress through a number of inactive states before returning to the active state. An exact solution for the stationary distribution of a general refractory model is derived in Ref. 26, building upon the existing work of Refs. 25 and 27. In addition to direct extensions of the Telegraph model, many variants of the model have been proposed, some of which have been additionally solved analytically. Examples include a two-state model incorporating a gene regulatory feedback loop29,30; a three-stage model accounting for the discrepancy in abundances at the mRNA and protein level31; and more recently, a multi-scale Telegraph model incorporating the details of polymerase dynamics is solved in Ref. 32. Some approximation methods have been developed to solve more general multistate models19,33.

Obtaining analytical solutions to models of gene transcription is a challenging problem, and there currently exist only a few classes of systems and a handful of special cases for which an analytic solution is known. A number of these have just been listed; see Ref. 34 for more information.

Existing methods for solving such models adopt a chemical master equation approach and typically employ a generating function method, in which the original master equation is transformed into a finite system of differential equations35. When an exact solution to the differential equations defining the generating function can be found, the analytical solution to the model can in principle be recovered by way of successive derivatives. A major complication in finding analytical solutions in this way is that typically even slight changes to the original model lead to considerably more complicated systems of differential equations. An additional challenge is that analytic solutions to models usually involve hypergeometric functions, which can lead to numerical difficulties in further computational and statistical analysis. Due to the lack of success in the search for analytic solutions, a significant amount of effort has been devoted to stochastic simulation36 and approximation methods34. However, some of these approaches can be computationally expensive, making them unviable for larger systems, while most existing approximation methods are ad hoc and do not allow one to control the approximation error. To address these obstacles, we present two alternative methods for solving certain classes of models of gene transcription.

The first approach offers a novel conceptualisation of a class of multistate models for gene transcription and allows us to extend a number of known solutions for relatively basic processes to more complicated systems. For a certain class of models, we are able to abstractly decompose the more complicated system as the independent sum of simpler processes, each of which is amenable to analytic solution. The distribution of the overall system can then be captured by way of a convolution of the simpler distributions.

The method applies to a wide range of models of stochastic gene expression. Examples include systems with multiple copies of a single gene in a genome, or systems with multiple promoters or enhancers for a single gene, each independent, and contributing additively to the transcription rate. Here we will provide analytical expressions for a multistate process consisting of a finite number of activity states (exponential in the number of enhancers), each with a particular mRNA transcription rate. Similar theoretical models have been employed to predictively model multistate processes37,38; however, the analyses in these are done numerically. Thus, our results enable exact solutions to some natural examples of multistate gene action, with an unlimited (though always exponential, 2^m) number of discrete states.

The second method draws inspiration from the original derivation of the solution to the Telegraph model, given by Peccoud and Ycart8 in 1995, and provides an alternative approximative approach to solving master equations. The method is reminiscent of Taylor’s solution to differential equations39, using the differential equations to construct a recurrence relation for derivatives of the solution. We show how our approach can be used to recover the full solution to the previously mentioned multistate model to arbitrary numerical precision. However, the method is also applicable to more complex systems for which no analytic solutions exist in the literature. When an analytical solution is known, we are able to benchmark the computational speed of the two approaches. For some systems, the approximation method is in fact faster than the analytical solution, as the latter typically requires multiple numerical calculations of the confluent hypergeometric function.

This article is structured as follows. We begin in Section II by presenting a general class of multistate models that cover many examples of gene transcription models used in the analysis of experimental data. We explain how these multistate processes can be decomposed into simple processes acting independently, and how these can be solved analytically. We complete the section by giving some examples of previously-unsolved models. In Section III, we consider the effects of extrinsic noise, and show that the main results of Ref. 11 can be extended to these models. Next, in Section IV, we outline the strategy for turning differential equations into recurrence relations that can be computationally iterated to produce solutions of arbitrary accuracy. We then utilise the approach to solve some examples of gene transcription models for which no analytic solutions are currently known. Finally, we assess the accuracy and computational efficiency of our approach. Comparisons are made between our approximative approach, analytical solutions, and the stochastic simulation algorithm40.

II. DECOMPOSING MULTISTATE PROCESSES

The standard method of solving models of gene transcription is to transform the chemical master equation into a system of ordinary differential equations by way of generating functions. We will observe that there are certain situations in which the need to solve these differential equations can be avoided, by decomposing into sums of independent simpler processes, each of which can be treated analytically. These can then be recombined to solve the original system.

We now briefly outline the content of this section. After first defining the leaky Telegraph model, we present a generalisation of this model that allows for a finite number of discrete activity states. As motivation, we will apply our decomposition method to the leaky Telegraph model, which has a known exact solution for the steady-state mRNA copy number distribution. We will demonstrate how efficiently and easily this solution can be derived using the new approach. A more formal exposition of the method will then be given. Novel results are presented in the remaining subsections, where we describe in detail how this approach can be used to solve more complex, previously unsolved systems.


FIG. 1. A schematic of the leaky Telegraph model. The nodes represent the state of the gene, labelled I and A for inactive and active, respectively. The mRNA are represented as blue wiggly lines. The parameters λ and µ are the rates at which the gene transitions between the two states I and A; the parameters KI and KA are the transcription rate parameters for I and A, respectively; the degradation rate is denoted by δ.

A. The leaky Telegraph model

The leaky Telegraph model has been considered in a number of previous studies such as Refs. 11, 18, 19, and 25, and incorporates the well known phenomenon of promoter leakage and basal gene expression41–43. In this model, the gene is assumed to transition between two states, active (A) and inactive (I), with rates λ and µ, respectively. Transcription of mRNA is modelled as a Poisson process, occurring at a basal background rate of KI when inactive, and at a rate KA > KI when active. Degradation occurs as a first-order Poisson process with rate δ, and is assumed to be independent of the gene promoter state. A schematic of the system is given in Fig. 1. Observe that the standard Telegraph model corresponds to the special case KI = 0.
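Written out as a reaction scheme (in the same style as the schemes (5) and (13) used later in the text, and with λ the I → A and µ the A → I switching rate, consistent with the form of (2) below), the leaky Telegraph model reads:
\[
I \xrightarrow{\;\lambda\;} A,\qquad
A \xrightarrow{\;\mu\;} I,\qquad
I \xrightarrow{\;K_I\;} I + M,\qquad
A \xrightarrow{\;K_A\;} A + M,\qquad
M \xrightarrow{\;\delta\;} \varnothing,
\]
where M denotes an mRNA molecule.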

We recall here that the mRNA copy number from the leaky Telegraph process, as shown in Ref. 11, has the following steady-state distribution:

\[
p_L(n) = \frac{1}{n!} \sum_{r=0}^{n} \left[ \binom{n}{r} K_A^{\,n-r}\, e^{-K_A}\, (K_I-K_A)^r\, \frac{\mu^{(r)}}{(\mu+\lambda)^{(r)}}\; {}_1F_1\big(\mu+r,\ \mu+\lambda+r,\ -(K_I-K_A)\big) \right], \tag{1}
\]

where pL(n) is the probability of observing n mRNA molecules in the system, 1F1 is the confluent hypergeometric function44, and, for real number x and positive integer n, the notation x^{(n)} abbreviates the rising factorial of x. Note also that the rates are scaled so that δ = 1. When KI = 0, we recover the following well-known expression for the Telegraph process in the steady-state:

\[
p_T(n) = \frac{K_A^{\,n}\, \lambda^{(n)}}{n!\, (\mu+\lambda)^{(n)}}\; {}_1F_1\big(\lambda+n,\ \mu+\lambda+n,\ -K_A\big). \tag{2}
\]

Throughout this work, we will refer to the probability mass functions pL(n) and pT(n) as the “leaky Telegraph distribution” and “Telegraph distribution”, respectively.
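Both distributions are straightforward to evaluate numerically. The following minimal sketch (ours, not part of the original work) uses SciPy's confluent hypergeometric function and Pochhammer symbol; rates are scaled so that δ = 1, and the function names are our own.

import numpy as np
from scipy.special import hyp1f1, poch, comb, factorial

def telegraph_pmf(n, lam, mu, KA):
    # Telegraph distribution p_T(n) of Eq. (2), with delta = 1.
    return (KA ** n * poch(lam, n) / (factorial(n) * poch(mu + lam, n))
            * hyp1f1(lam + n, mu + lam + n, -KA))

def leaky_telegraph_pmf(n, lam, mu, KI, KA):
    # Leaky Telegraph distribution p_L(n) of Eq. (1), with delta = 1.
    terms = [comb(n, r) * KA ** (n - r) * np.exp(-KA) * (KI - KA) ** r
             * poch(mu, r) / poch(mu + lam, r)
             * hyp1f1(mu + r, mu + lam + r, -(KI - KA))
             for r in range(n + 1)]
    return sum(terms) / factorial(n)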

Another way to arrive at the same overall behaviour as the leaky Telegraph process is to consider a system consisting of two independent copies of the same gene, one lacking the control features of the other so that it transcribes constitutively at a constant rate of K0 := KI. The other copy of the gene is governed by a Telegraph process with transcription rate K1 := KA − KI when active (and 0 when inactive). Note that when the second gene is active, the combined transcription rate is K0 + K1 = KA, so this model is indistinguishable from the leaky Telegraph model; see Fig. 2 for a pictorial representation of this situation. The new interpretation is more easily modelled, as it is simply two independent systems, both of which are simpler than the original.

FIG. 2. Decomposing the leaky Telegraph process (left) into the independent sum of a constitutive and Telegraph process (right).

We are able to make this observation more precise by way of analysis of the corresponding probability generating functions, defined as g(z) = ∑_{n=0}^{∞} z^n p(n), where p(n) is the stationary probability distribution. Let P be a random variable that counts the copy number of a constitutive process with constant rate K0, which is well known to be Pois(K0) distributed at stationarity45–47, and let T be an independent random variable distributed by a Telegraph distribution with rates λ, µ, K1 = KA − KI. Then the total copy number X for the binary system is simply the independent sum of P and T. A standard result in probability theory says that the probability generating function of a sum of two independent random variables is the product of the two component probability generating functions48 (Section 3.6(5)). The probability generating function for the random variable P is e^{K0(z−1)}, while the probability generating function for the Telegraph-distributed T is 1F1(λ, µ+λ, K1(z−1))8. So the probability generating function for the overall system is:

\[
e^{K_0(z-1)}\; {}_1F_1\big(\lambda,\ \mu+\lambda,\ K_1(z-1)\big). \tag{3}
\]

This solution, (3), should match the probability generating function yielding (1). From Ref. 11, the probability generating function for the leaky Telegraph model is given by:

\[
g(z) = e^{K_A(z-1)}\; {}_1F_1\big(\mu,\ \mu+\lambda,\ (K_I-K_A)(z-1)\big). \tag{4}
\]

Multiplying (4) through by e^{KI − KI} = 1, and applying Kummer's transformation for the confluent hypergeometric function, we obtain (3), noting that K0 = KI and K1 = KA − KI.
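Spelling this step out (a short verification, using Kummer's transformation 1F1(a, b, x) = e^x 1F1(b − a, b, −x)):
\[
\begin{aligned}
g(z) &= e^{K_A(z-1)}\,{}_1F_1\big(\mu,\ \mu+\lambda,\ (K_I-K_A)(z-1)\big)\\
&= e^{K_A(z-1)}\, e^{(K_I-K_A)(z-1)}\,{}_1F_1\big(\lambda,\ \mu+\lambda,\ (K_A-K_I)(z-1)\big)\\
&= e^{K_I(z-1)}\,{}_1F_1\big(\lambda,\ \mu+\lambda,\ (K_A-K_I)(z-1)\big),
\end{aligned}
\]
which is exactly (3) with K0 = KI and K1 = KA − KI.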

B. The 2^m-multistate model

We now extend the leaky Telegraph process of the previous section to allow for a finite number of discrete activity states. For a non-negative integer m, consider a gene with m distinct activating enhancers a0, . . . , am−1, where for each i ∈ {0, . . . , m−1}, the enhancer ai, independently, is either bound or unbound to activators with rates λi and µi (resp.) and contributes, additively, with rate ki to the overall transcription rate of the gene. So each ai is either bound or unbound, leading to 2^m possible states for the activity of the gene, which we denote by s0 = (0,0, . . . ,0), s1 = (0,0, . . . ,0,1), . . . , s_{2^m−1} = (1,1, . . . ,1), where for notational convenience si is the m-bit binary representation of the number i as a tuple. Tuples differing in just one position (that is, of Hamming distance d(si, sj) = 1) will be said to be adjacent. Changes in state result from a binding or unbinding of an enhancer, from which it follows that state transitions will occur only between adjacent states.

FIG. 3. A diagrammatic representation of the 2^m-multistate model, for m ∈ N ∪ {0}. There are m enhancers, a0, . . . , am−1, where each enhancer is either unbound (0) or bound (1): so ai ∈ {0,1}. This leads to 2^m possible activity states (am−1, . . . , a0) for the gene; the state s0 = (0,0, . . . ,0) corresponds to when all of the ai are unbound, the state s1 = (0,0, . . . ,1) corresponds to when a0 is bound and all other ai are unbound, while the state s_{2^m−1} = (1,1, . . . ,1) corresponds to when all of the ai are bound. The gene transitions between these 2^m states, where switching events between adjacent states si and sj occur at rate νij (this is, however, not depicted in the above). Each state si has an associated synthesis rate, Ki, equal to KB + k · si, where k = (km−1, km−2, . . . , k0) is the vector of mRNA transcription rates for the enhancers. Degradation occurs at rate δ, independently of the activity state.

We allow for a basal (leaky) rate of transcription KB, which is independent of the state of the enhancers. Then in each state si, the mRNA are transcribed according to a Poisson process with a constant rate Ki = KB + k · si, for i ∈ {0, . . . , 2^m − 1}, where k = (km−1, km−2, . . . , k0) is the vector of mRNA transcription rates for the enhancers and · is the usual dot product. Note that for i = 0 (all enhancers unbound) this gives K0 = KB, while for j ∈ {0, . . . , m−1}, we have K_{2^j} = KB + kj.

The enhancers switch between bound and unbound with rates λi and µi, respectively. This means that the system transitions between adjacent states s_{j1} and s_{j2} only: states differing in exactly one bit, the ith say. The transition from s_{j1} to s_{j2} occurs stochastically at rate ν_{j1 j2}, equal to either λi (if j1 < j2, meaning the ith bit of j1 is 0) or µi (if j2 < j1). Degradation of mRNA occurs as a first-order Poisson process with rate δ, and is assumed to be independent of the promoter state. By a 2^m-multistate model, we will mean a multistate model with 2^m states arising from the independent and additive transcription activity of m enhancers. We denote this model by Mm. Figure 3 gives a depiction of the 2^m-multistate process.

If we let M represent the mRNA molecules, the reactions defining the 2^m-multistate model can be summarised as:
\[
\mathcal{M}_m :=
\begin{cases}
s_i \xrightarrow{\;\nu_{ij}\;} s_j, & \text{for } i, j \in \{0,\ldots,2^m-1\} \text{ and } d_{ij} = 1,\\[2pt]
s_i \xrightarrow{\;K_i\;} s_i + M, & \text{for } i \in \{0,\ldots,2^m-1\},\\[2pt]
M \xrightarrow{\;\delta\;} \varnothing,
\end{cases}
\tag{5}
\]
where dij is the Hamming distance between si and sj. Note also that νij, ki ≥ 0, and δ > 0. The transition digraphs arising from the reactions defined in (5) can be visualised as hypercubes (m-dimensional cubes); see Fig. 4.

Observe that the leaky Telegraph model coincides with the 2^1-multistate model, M1, by setting KB = KI and k0 = KA − KI. As a second example, the 2^2-multistate model, M2, consists of two distinct activating enhancers a0 and a1, each with associated transcription rates k0 and k1, respectively. This leads to four possible activity states s0 = (0,0), s1 = (0,1), s2 = (1,0) and s3 = (1,1). The state s0 = (0,0) corresponds to when both a0 and a1 are unbound. The state s1 = (0,1) corresponds to when a0 is bound and a1 is unbound. Similarly, the state s2 = (1,0) corresponds to when a0 is unbound and a1 is bound, and state s3 = (1,1) corresponds to when both a0 and a1 are bound. The enhancers contribute additively to the overall transcription rate, so the rates for s0, s1, s2, s3 are K0 = KB, K1 = K0 + k0, K2 = K0 + k1, K3 = K0 + k0 + k1, respectively. Refer to Fig. 5(a) for a representation of the model M2.

C. Equivalent systems for Mm

FIG. 4. Transition digraphs arising from the 2^m-multistate process. For m = 1, the transition digraph is a line segment, for m = 2 the transition digraph is a square, for m = 3 the digraph is a cube, and for m = 4 the digraph is a tesseract, and so on. Note that here we use Tm to denote the underlying transition digraph of the model Mm.

As with our motivating example, the 2^m-multistate model has a coarser-grained formulation obtained by disregarding the finer level information about the states of the activating enhancers. In this view, the 2^m-multistate model can be equivalently thought of as a model for the transcriptional activity of:

a. multiple enhancers of a single gene, where each enhancer has a particular additive effect on the mRNA transcription rate;

b. multiple independent copies of a single gene, where each copy of the gene is governed by a (possibly leaky) two-state system, and with possibly different transcription and switching rates;

c. a gene promoter with a finite (exponentially large) number of states, where each promoter state has an associated mRNA transcription rate, and transitions occur only between adjacent states.

The systems given in items (a), (b), and (c) have equivalent transition state diagrams, and therefore equivalent master equations. As an example, see Fig. 5(c) for a visualisation of the decomposition of the 2^2-multistate model into simpler independent systems. It is easily seen that a random variable from the steady-state copy number distribution of the system given in (b) (which we know exists as the process has a finite transition digraph) is an independent sum of random variables from simpler distributions, namely Poisson and Telegraph distributions, each of which has an exact solution. It follows that a random variable from the steady-state mRNA copy number of the system given in (a) must also be amenable to the same decomposition, as we now show explicitly.

D. Exact solution for the 2^m-multistate model

Here we provide analytical solutions for any model of the class M = {Mm | m ∈ N0}. Let m ∈ N0 and let Y be a random variable that counts the mRNA copy number of the 2^m-multistate system in the steady-state. Recall that in this system the gene has m distinct activating enhancers a0, . . . , am−1, each switching between the states bound or unbound, independently, with rates λi and µi, and contributing additively with rate ki, for i ∈ {0, . . . , m−1}, to the overall transcription rate of the gene. There is also a background rate of transcription K0 = KB. As before, we let P be a random variable for a constitutive process with constant rate K0 and, for each i ∈ {0, . . . , m−1}, let Ti be a random variable distributed by a Telegraph distribution, pT(n), with rates λi, µi, ki. We have already seen that the leaky Telegraph system (which can be thought of as arising from a 2^1-multistate process) has the associated random variable decomposition X = P + T, where T is Telegraph distributed with rate k0. It follows from the equivalence of (a) and (b) above that this can be generalised to:

\[
Y = P + \sum_{0 \le i < m} T_i. \tag{6}
\]

We can then directly write down the probability generating function for the 2^m-multistate model as:

\[
g_m(z) = \exp\big(K_B(z-1)\big) \prod_{0 \le i < m} {}_1F_1\big(\lambda_i,\ \mu_i+\lambda_i,\ k_i(z-1)\big), \tag{7}
\]

which is simply the product of a Poisson probability generating function and m Telegraph probability generating functions. The probability mass function is then recovered as pm(n) = (1/n!) g_m^{(n)}(0), which by the extension of the general Leibniz rule to more than two factors gives:

\[
p_m(n) = \frac{1}{n!} \sum_{r_c + r_0 + \cdots + r_{m-1} = n} \left[ \binom{n}{r_c, r_0, \ldots, r_{m-1}} K_B^{\,r_c} \exp(-K_B) \prod_{0 \le i < m} \frac{k_i^{\,r_i}\, \lambda_i^{(r_i)}}{(\lambda_i + \mu_i)^{(r_i)}}\; {}_1F_1\big(\lambda_i + r_i,\ \mu_i + \lambda_i + r_i,\ -k_i\big) \right]. \tag{8}
\]

As an example of (8), the 2^2-multistate model has probability mass function:

\[
p_2(n) = \frac{1}{n!} \sum_{r_c + r_0 + r_1 = n} \left[ \binom{n}{r_c, r_0, r_1} K_0^{\,r_c} \exp(-K_0)\; \frac{k_0^{\,r_0}\, \lambda_0^{(r_0)}}{(\lambda_0 + \mu_0)^{(r_0)}}\; {}_1F_1\big(\lambda_0 + r_0,\ \mu_0 + \lambda_0 + r_0,\ -k_0\big)\; \frac{k_1^{\,r_1}\, \lambda_1^{(r_1)}}{(\lambda_1 + \mu_1)^{(r_1)}}\; {}_1F_1\big(\lambda_1 + r_1,\ \mu_1 + \lambda_1 + r_1,\ -k_1\big) \right]. \tag{9}
\]
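Numerically, it is often easier to evaluate pm(n) through the decomposition (6) itself rather than through the multinomial sum in (8): one simply convolves a Poisson(KB) probability mass function with the m Telegraph probability mass functions. A minimal sketch follows (ours, not from the original work); it reuses the telegraph_pmf helper defined above and assumes δ = 1.

import numpy as np
from scipy.stats import poisson

def multistate_pmf(KB, lams, mus, ks, n_max=300):
    # Stationary mRNA pmf of the 2^m-multistate model via decomposition (6):
    # convolve the Poisson(KB) pmf with one Telegraph pmf per enhancer.
    n = np.arange(n_max)
    p = poisson.pmf(n, KB)                              # constitutive part P
    for lam, mu, k in zip(lams, mus, ks):
        t = np.array([telegraph_pmf(j, lam, mu, k) for j in n])
        p = np.convolve(p, t)[:n_max]                   # add an independent T_i
    return p

With KB = 5, ks = (21, 33) and λi = µi = 0.01, for example, this should reproduce the analytical curve shown in Fig. 5(b).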

Figure 5(b) gives a comparison of p2(n) and the probability mass function from simulated data. A second example of (8) is a solution to the 2^3-multistate model, depicted in Fig. 6. In both cases we observe excellent agreement of our method with stochastic simulations.

FIG. 5. (a): A schematic of the 2^2-multistate model. (b): A comparison of the analytical solution for the steady-state mRNA distribution for the 2^2-multistate model and the probability mass function from simulated data. The parameter values for both curves are K0 = 5, k0 = 21, k1 = 33 and λi = µi = 0.01, for i ∈ {0,1}. The degradation rate δ is set to 1. (c): Decomposing the 2^2-multistate model (left) into constitutive and Telegraph parts (right).

While the 2^1-multistate model (corresponding to the leaky Telegraph model) has been solved before, to the best of our knowledge this is not the case for 2^n-multistate models with n ≥ 2.

III. EFFECTS OF EXTRINSIC NOISE

In a similar way to Ref. 11, we now jointly consider the effects of intrinsic and extrinsic noise on the probability distributions arising from the class of multistate models M = {Mm | m ∈ N0}. The results given in Ref. 11 are in the context of the leaky and standard Telegraph processes. We will show that the decomposition established in (6) allows us to straightforwardly extend the main analysis and results derived in Ref. 11 to M. More specifically, we show that

(a) intrinsic noise alone, as arising from the inherent stochasticity of the 2^m-multistate process, never results in a heavy-tailed mRNA copy number distribution;

(b) certain forms of extrinsic noise on at least one of the transcription rate parameters ki (for i ∈ {0, . . . , m−1}) or KB are a sufficient condition for a heavy-tailed mRNA distribution;

(c) the forms of this extrinsic noise are not limited to a specific distribution, but include any distribution that satisfies certain properties.

Our argument relies on moment generating functions: for a random variable X with distribution f, the moment generating function is defined as M_f(t) := E(e^{tX}) for t ∈ R. We here take heavy-tailed to mean that the moment generating function is undefined for positive t. As in Ref. 11, the effects of extrinsic variability on the multistate systems are captured via a compound distribution; see Eq. 9 there. Items (a), (b), (c) follow directly from the following inequality (Ref. 11, Eq. 16) for the moment generating function of the Telegraph distribution, pT(n): for all positive t,

\[
M_{\mathrm{Pois}\left(\frac{\lambda}{\mu+\lambda}K_A\right)}(t) \;\le\; M_{p_T}(t) \;\le\; M_{\mathrm{Pois}(K_A)}(t), \tag{10}
\]

where Mg denotes the moment generating function for distribution g. We establish an analogous result for our multistate systems. First let m ∈ N0. Now, since the moment generating function for Mm is simply the product of a Poisson moment generating function and m Telegraph moment generating functions, it follows immediately from (10) that

\[
M_{\mathrm{Pois}(K_B)}(t) \prod_{0 \le i < m} M_{\mathrm{Pois}\left(\frac{\lambda_i}{\mu_i+\lambda_i}k_i\right)}(t) \;\le\; M_{p_m}(t) \;\le\; M_{\mathrm{Pois}(K_B)}(t) \prod_{0 \le i < m} M_{\mathrm{Pois}(k_i)}(t). \tag{11}
\]

Now using the well-known result that the sum of k independent random variables X_i ∼ Pois(γ_i) is a Pois(∑_{1≤i≤k} γ_i) random variable, it follows that
\[
M_{\mathrm{Pois}(\eta_1)}(t) \;\le\; M_{p_m}(t) \;\le\; M_{\mathrm{Pois}(\eta_2)}(t), \tag{12}
\]


where η1 = KB + ∑_{0≤i<m} λi ki/(µi + λi) and η2 = KB + ∑_{0≤i<m} ki. Now the arguments in Ref. 11 can be followed through identically: the moment generating function M_{p_m}(t) is bounded above by a Poissonian moment generating function that does not depend on any of the λi or µi. Thus pm(n) itself is not heavy-tailed (showing (a)), and we require compounding of at least one of the ki or KB to make it so. Thus, (b) holds. On the other hand, any extrinsic noise on KB or ki that renders the moment generating function for Pois(η1) undefined will also result in M_{p_m}(t) being undefined, and the resulting compound distribution will be heavy-tailed. A particular example is log-normal noise on at least one of the ki or KB.

FIG. 6. A comparison of the analytic solution to the 2^3-multistate model and the probability mass function from simulated data. The parameter values for (a) are KB = 5, k0 = 30, k1 = 50, k2 = 100 and λi = µi = 0.01, for i ∈ {0,1,2}. For (b) the parameters are KB = 5, k0 = 20, k1 = 65, k2 = 115 and λi = µi = 0.01, for i ∈ {0,1,2}.
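To see why such compounding produces heavy tails, consider, as a sketch of the argument, extrinsic noise acting on KB alone, with KB drawn from a distribution f with moment generating function M_f. Using the lower bound in (12) together with M_{Pois(η)}(t) = e^{η(e^t − 1)}, the moment generating function M(t) of the compound distribution satisfies
\[
M(t) \;=\; \mathbb{E}_{K_B\sim f}\!\left[M_{p_m(\cdot;K_B)}(t)\right]
\;\ge\; \mathbb{E}_{K_B}\!\left[e^{(e^t-1)(K_B+c)}\right]
\;=\; e^{(e^t-1)\,c}\,M_f\!\big(e^t-1\big),
\qquad c=\sum_{0\le i<m}\frac{\lambda_i}{\mu_i+\lambda_i}\,k_i,
\]
where p_m(·; KB) denotes the distribution (8) for a fixed value of KB. If f is log-normal, M_f(s) is infinite for every s > 0, so M(t) is undefined for all positive t and the compound distribution is heavy-tailed in the sense used here.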

IV. THE RECURRENCE METHOD

We now introduce a recurrence method that can be used for approximating solutions to master equation models of gene transcription. We show that the method applies to a class of general multistate models in which any finite number of discrete states, along with arbitrary transitions between states, is allowed. In comparison to the 2^m-multistate model, the number of states is no longer restricted to powers of two, and transitions between states are no longer restricted to adjacent states. In fact, the 2^m-multistate model is a special case of this more general model. In addition, we show that the recurrence method is applicable to systems that are non-linear. We illustrate this by applying the method to the gene regulatory feedback model given in Ref. 29.

We begin this section by introducing the master equation for the class of more general multistate models. Following this, we present a step-by-step breakdown of our recurrence method that can be used to provide a solution of desired accuracy to any multistate model from this class. We then illustrate the method through an example; see Subsection IV C. When an analytical solution can be obtained using the results of the previous section, we assess the computational efficiency and accuracy of the two solutions. We also make a comparison of our approach with the stochastic simulation algorithm. These results are given in Subsection IV F. Finally, we discuss the applicability and usefulness of our recurrence approach to other systems of gene transcription, including those involving feedback, as well as multiple stages, such as where protein copy number is modelled and the production of mRNA and protein are treated as separate stages.

A. The ℓ-switch model

In this section, we consider a system with ℓ distinct activity states s1, . . . , sℓ, where in each state sj, the mRNA are transcribed according to a Poisson process with a constant transcription rate Kj. We assume switching events between states s_{j1} and s_{j2} occur at rate ν_{j1 j2}. As before, the degradation of mRNA occurs as a first-order Poisson process with rate δ, and is assumed to be independent of the activity state. The reactions defining this ℓ-state model can be written as:

\[
\mathcal{M}_\ell :=
\begin{cases}
s_i \xrightarrow{\;\nu_{ij}\;} s_j, & \text{for } i, j \in \{1,\ldots,\ell\},\\[2pt]
s_i \xrightarrow{\;K_i\;} s_i + M, & \text{for } i \in \{1,\ldots,\ell\},\\[2pt]
M \xrightarrow{\;\delta\;} \varnothing,
\end{cases}
\tag{13}
\]

where again M represents the mRNA molecules, and νij, Ki ≥ 0, and δ > 0. We let Mℓ denote the model defined by (13), and refer to Mℓ as the ℓ-switch model.

Comparing with (5), we find that the 2^m-multistate model agrees with the ℓ-switch model by setting ℓ = 2^m, setting νij = 0 whenever d(si, sj) > 1, and provided that the transcription rates adhere to the necessary additivity condition.

We are interested in analysing stationary distributions of the systems described in (13). Let pi(n) denote the probability at stationarity that the gene is in state i with n mRNA molecules, for i ∈ {1, . . . , ℓ}. The generalised chemical master equation for the probability mass function, pi(n), is given by:

\[
(\forall\, n \ge 1)\quad
\partial_t\, p_i(n,t) = -\left( \sum_{j=1}^{\ell} \nu_{ij} + K_i + \delta n \right) p_i(n,t)
+ \sum_{j=1}^{\ell} \nu_{ji}\, p_j(n,t) + \delta (n+1)\, p_i(n+1,t) + K_i\, p_i(n-1,t), \tag{14}
\]
for i ∈ {1, . . . , ℓ}.

B. The recurrence method

In the following, we give a step-by-step breakdown of the recurrence method applied to the system described in (14). After transforming the master equation (Step 1), the idea is to transform the resulting coupled system of differential equations into a closed system of recurrence relations in g_i^{(n)}(1) (Step 2), where gi(z) is the generating function of pi(n). Once the initial conditions gi(1) have been found, we can iteratively generate the g_i^{(n)}(1) from the obtained recurrence relations, which in turn can be used to solve for g(z). This is detailed in Step 3. Finally, in Step 4, we can recover the solution for p(n) by way of p(n) = (1/n!) g^{(n)}(0).

A similar approach for transforming differential equations into recurrence relations is employed in Ref. 33 to find approximate solutions to ℓ-switch models. It can be demonstrated that the final recurrence relations obtained from the transformation given here agree with those obtained in Ref. 33. While our method presents an alternative pathway to obtaining recurrence relations, the primary focus here will be on efficiency comparisons between our method, the previously obtained analytical results, and stochastic simulations, as well as applicability to non-linear models that do not belong to the ℓ-switch class.

Step 1. Transform the master equation

As usual, we can follow the generating function approach35,49 by defining, for each i ∈ {1, . . . , ℓ}, a stationary generating function:

\[
g_i(z) = \sum_{n=0}^{\infty} z^n\, p_i(n). \tag{15}
\]

We transform the master equation (14) by multiplying through by z^n and summing over n from 0 to ∞, obtaining a system of ℓ coupled differential equations:

\[
\delta (z-1)\, g_i'(z) = -\left( \sum_{j=1}^{\ell} \nu_{ij} + K_i(1-z) \right) g_i(z) + \sum_{j=1}^{\ell} \nu_{ji}\, g_j(z). \tag{16}
\]

Step 2. Transform the differential equations into a system of first-order recurrence relations

Noting that in general the kth derivative of a function of the form (z−1)h(z) is (z−1)h^{(k)}(z) + k h^{(k−1)}(z), by differentiating (n times) the equations in (16), we obtain for each i ∈ {1, . . . , ℓ}:

\[
\delta (z-1)\, g_i^{(n+1)}(z) = -\left( \delta n + \sum_{j=1}^{\ell} \nu_{ij} + K_i(1-z) \right) g_i^{(n)}(z) + K_i n\, g_i^{(n-1)}(z) + \sum_{j=1}^{\ell} \nu_{ji}\, g_j^{(n)}(z). \tag{17}
\]

Evaluating each of the equations in (17) at z = 1, we immediately obtain the system of recurrence relations:
\[
\left( \delta n + \sum_{j=1}^{\ell} \nu_{ij} \right) g_i^{(n)}(1) = K_i n\, g_i^{(n-1)}(1) + \sum_{j=1}^{\ell} \nu_{ji}\, g_j^{(n)}(1). \tag{18}
\]

Rearranging each of these so that the LHS is equal to 0, the resulting system of equations can be represented in matrix form as:
\[
R\,X = 0, \tag{19}
\]

where R is an ℓ × 2ℓ matrix defined as:
\[
R = [\,A\;\;B\,], \tag{20}
\]

where A is an ℓ × ℓ matrix defined by:
\[
A_{ij} :=
\begin{cases}
\delta n + \sum_{k=1,\,k\neq i}^{\ell} \nu_{ik}, & \text{for } i = j,\\[4pt]
-\nu_{ji}, & \text{for } i \neq j,
\end{cases}
\tag{21}
\]

and B is an ℓ × ℓ matrix defined by:
\[
B_{ij} :=
\begin{cases}
K_i\, n, & \text{for } i = j,\\[2pt]
0, & \text{for } i \neq j.
\end{cases}
\tag{22}
\]

We also have
\[
X = \big[ g_1^{(n)}(1), \ldots, g_\ell^{(n)}(1),\; g_1^{(n-1)}(1), \ldots, g_\ell^{(n-1)}(1) \big]^{T}.
\]

Eq. (18) constitutes a system of first-order linear recurrence relations and can be decoupled by applying Gaussian elimination to (19). This enables us to write each of the g_i^{(n)}(1) in terms of the g_j^{(n−1)}(1). We let R denote the recurrence relations obtained from the resulting matrix; as usual with Gaussian elimination, it is not practical to present a general form for the final simplified matrix.

Step 3. Iteratively solve the recurrence relations

We require first the initial conditions gi(1), for i ∈ {1, . . . , ℓ}. These can be found by evaluating equation (16) at z = 1, and solving the resulting system of linear equations. This system can be represented in matrix form as:
\[
C\,X = 0, \tag{23}
\]

where the matrix C is defined as:
\[
C_{ij} :=
\begin{cases}
\nu_{ji}, & \text{for } i \neq j,\\[4pt]
-\sum_{k=1,\,k\neq i}^{\ell} \nu_{ik}, & \text{for } i = j,
\end{cases}
\tag{24}
\]


and we have
\[
X = \big[ g_1(1), \ldots, g_\ell(1) \big]^{T}. \tag{25}
\]

Now, as ∑_{1≤i≤ℓ} gi(1) = 1, the gi(1) are found as a normalised version of Null(C), which is necessarily one-dimensional for these multistate systems.

Now, if gi(z) is considered as a power series in (z−1), then the coefficient of (z−1)^n is (1/n!) g_i^{(n)}(1). Thus, the coefficients hi(n) := (1/n!) g_i^{(n)}(1) may be generated by iteration using the first-order recurrence relations, R, obtained from (20). One way to do this would be to simply solve the recurrence relations for g_i^{(n)}(1) and to subsequently divide by n!. However, this involves computing a large number (g_i^{(n)}(1)) and dividing it by another large number (n!), which can lead to numerical problems. Instead, we transform the recurrence relations for g_i^{(n)}(1) into recurrence relations for the hi(n).

Computationally, we may store the values for hi(n) in a system of ℓ lists from which we can compute the coefficients h(n) = ∑_i hi(n) of (z−1)^n in the expansion of g(z) as a power series in (z−1). So,

\[
g(z) = \sum_{n=0}^{\infty} h(n)\,(z-1)^n. \tag{26}
\]

Step 4. Recover the stationary probability distribution.

As p(n) is the coefficient of z^n in the expansion of g(z) as a power series in z, we are able to recover p(n) by way of p(n) = (1/n!) g^{(n)}(0). It then follows from (26) that

\[
p(n) = \sum_{i=0}^{\infty} \frac{(i+1)^{(n)}}{n!}\, h(n+i)\,(-1)^i, \tag{27}
\]

where again x^{(n)} denotes the rising factorial. For practical implementations, we truncate the sum in (27) after a certain number of terms. The distribution p(n) typically becomes negligibly small for n larger than a certain value, say 100. Computing h(n) up to n = 500, for example, would thus leave at least 400 terms for approximating p(n) for n ≤ 100. For the studied example systems we found that 100–350 terms were sufficient to achieve accurate results.
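To make Steps 1–4 concrete, the following is a minimal numerical sketch for a general ℓ-switch model (our own illustration in Python, not the Mathematica code referred to in Section IV E and the supplementary information). Rather than decoupling (18) symbolically by Gaussian elimination, it solves the ℓ × ℓ linear system for the coefficients h(n) directly at each n; all function and variable names are ours, and standard double precision is assumed (see the remarks on numerical stability in Section IV E).

import numpy as np
from scipy.linalg import null_space
from scipy.special import comb

def l_switch_pmf(nu, K, delta=1.0, n_terms=500, n_max=150):
    # Recurrence-method approximation to the stationary mRNA pmf of an
    # l-switch model. nu[i, j] is the switching rate from state i to state j
    # (zero diagonal); K[i] is the transcription rate in state i.
    nu = np.asarray(nu, dtype=float)
    K = np.asarray(K, dtype=float)
    out_rates = nu.sum(axis=1)

    # Step 3, initial conditions: g_i(1) is the stationary occupancy of
    # state i, i.e. the normalised null vector of the matrix C in (23)-(24).
    C = nu.T - np.diag(out_rates)
    g1 = null_space(C)[:, 0]
    g1 = g1 / g1.sum()

    # Recurrence for h_i(n) = g_i^{(n)}(1)/n!, obtained by dividing (18) by n!:
    # (delta*n + sum_k nu_ik) h_i(n) - sum_{j!=i} nu_ji h_j(n) = K_i h_i(n-1).
    h = np.zeros((n_terms + 1, len(K)))
    h[0] = g1
    for n in range(1, n_terms + 1):
        A = delta * n * np.eye(len(K)) + np.diag(out_rates) - nu.T
        h[n] = np.linalg.solve(A, K * h[n - 1])
    h_tot = h.sum(axis=1)                     # h(n) = sum_i h_i(n)

    # Step 4: reconstruct p(n) from the truncated alternating sum (27),
    # using (i+1)^{(n)}/n! = binomial(n+i, n).
    p = np.zeros(n_max)
    for n in range(n_max):
        i = np.arange(n_terms + 1 - n)
        p[n] = np.sum(comb(n + i, n) * (-1.0) ** i * h_tot[n + i])
    return p

For instance, nu = [[0, 0.045, 0.045], [0.035, 0, 0.015], [0.015, 0.035, 0]] and K = [5, 30, 60] corresponds to the parameter set used in Fig. 7(b) below (assuming, as elsewhere, δ = 1).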

C. Example 1: the 3-switch model

We illustrate the recurrence method for the case ℓ = 3.

Steps 1 and 2. From (20), the matrix R := [A B] is comprised of:
\[
A = \begin{pmatrix}
\delta n + (\nu_{12}+\nu_{13}) & -\nu_{21} & -\nu_{31}\\
-\nu_{12} & \delta n + (\nu_{21}+\nu_{23}) & -\nu_{32}\\
-\nu_{13} & -\nu_{23} & \delta n + (\nu_{31}+\nu_{32})
\end{pmatrix}
\]
and
\[
B = \begin{pmatrix}
K_1 n & 0 & 0\\
0 & K_2 n & 0\\
0 & 0 & K_3 n
\end{pmatrix}.
\]
We also have that
\[
X = \big[ g_1^{(n)}(1), g_2^{(n)}(1), g_3^{(n)}(1),\; g_1^{(n-1)}(1), g_2^{(n-1)}(1), g_3^{(n-1)}(1) \big]^{T}.
\]

After applying Gaussian elimination to R, we obtain the following system of first-order linear recurrence relations, for all i ∈ {1,2,3} and j < k from {1,2,3}\{i}:
\[
g_i^{(n)}(1) = \frac{1}{D(n)} \Big[
K_i \big[ \delta^2 n^2 + \delta n (\nu_{ji} + \nu_{jk} + \nu_{ki} + \nu_{kj}) + \nu_{kj}\nu_{ji} + \nu_{jk}\nu_{ki} + \nu_{ji}\nu_{ki} \big]\, g_i^{(n-1)}(1)
+ K_j \big[ \delta n\, \nu_{ji} + \nu_{jk}\nu_{ki} + \nu_{ji}(\nu_{kj} + \nu_{ki}) \big]\, g_j^{(n-1)}(1)
+ K_k \big[ \delta n\, \nu_{ki} + \nu_{jk}\nu_{ki} + \nu_{ji}(\nu_{kj} + \nu_{ki}) \big]\, g_k^{(n-1)}(1)
\Big], \tag{28}
\]
where
\[
D(n) = \delta \big[ \delta^2 n^2 + \delta n (\nu_{12} + \nu_{13} + \nu_{21} + \nu_{23} + \nu_{31} + \nu_{32}) + \nu_{12}(\nu_{23} + \nu_{31} + \nu_{32}) + \nu_{13}(\nu_{21} + \nu_{23} + \nu_{32}) + \nu_{21}(\nu_{31} + \nu_{32}) + \nu_{23}\nu_{31} \big]. \tag{29}
\]

Step 3. We first find the initial conditions. From (24), the matrix C is given by:
\[
C = \begin{pmatrix}
-(\nu_{12}+\nu_{13}) & \nu_{21} & \nu_{31}\\
\nu_{12} & -(\nu_{21}+\nu_{23}) & \nu_{32}\\
\nu_{13} & \nu_{23} & -(\nu_{31}+\nu_{32})
\end{pmatrix}.
\]
It can be shown that the null space, Null(C), is spanned by:
\[
\begin{pmatrix}
\nu_{21}\nu_{31} + \nu_{23}\nu_{31} + \nu_{21}\nu_{32}\\
\nu_{12}\nu_{31} + \nu_{12}\nu_{32} + \nu_{13}\nu_{32}\\
\nu_{13}\nu_{21} + \nu_{12}\nu_{23} + \nu_{13}\nu_{23}
\end{pmatrix}.
\]
Thus, the initial conditions are:
\[
g_1(1) = \frac{1}{N}\,(\nu_{21}\nu_{31} + \nu_{23}\nu_{31} + \nu_{21}\nu_{32}), \tag{30}
\]
\[
g_2(1) = \frac{1}{N}\,(\nu_{12}\nu_{31} + \nu_{12}\nu_{32} + \nu_{13}\nu_{32}), \tag{31}
\]
\[
g_3(1) = \frac{1}{N}\,(\nu_{13}\nu_{21} + \nu_{12}\nu_{23} + \nu_{13}\nu_{23}). \tag{32}
\]
Here N is the normalisation constant and is equal to ∑_{r=1}^{3} a_{r,1}, where each a_{r,1} is an element of the null vector of C given above.

Next, we need to compute the coefficients hi(n) := (1/n!) g_i^{(n)}(1) of gi(z) considered as a power series in (z−1). As mentioned in Step 3 above, we transform the recurrence relations for g_i^{(n)}(1) into recurrence relations for the hi(n). These can be obtained from (28) by dividing the RHS of each equation by n: for each {i, j, k} = {1,2,3},
\[
h_i(n) = \frac{1}{n\,D(n)}\Big[
K_i\big[\delta^2 n^2 + \delta n(\nu_{ji}+\nu_{jk}+\nu_{ki}+\nu_{kj}) + \nu_{kj}\nu_{ji}+\nu_{jk}\nu_{ki}+\nu_{ji}\nu_{ki}\big]\,h_i(n-1)
+ K_j\big[\delta n\,\nu_{ji}+\nu_{jk}\nu_{ki}+\nu_{ji}(\nu_{kj}+\nu_{ki})\big]\,h_j(n-1)
+ K_k\big[\delta n\,\nu_{ki}+\nu_{jk}\nu_{ki}+\nu_{ji}(\nu_{kj}+\nu_{ki})\big]\,h_k(n-1)
\Big]. \tag{33}
\]

Step 4. Finally, we recover the stationary distribution, p(n). For a given set of parameter values, we generated a list of values for h(n) of length 500, and used this to approximate the probability mass function (27) from approximately 400 terms. The results for two different sets of parameter combinations are given in Fig. 7.

D. Discussion of wider applicability

So far, we have seen how the recurrence method can be applied to linear ℓ-switch models. In this section, we discuss the wider applicability of the method, and demonstrate that it can in principle also be applied to certain non-linear systems. As an example, we show that the method can be used to approximate the solution of the gene regulatory feedback model given in Ref. 29.

We would like to point out that the applicability to this system is limited, however. Application of the recurrence method to the feedback model relies on the initial conditions being supplied, and these are currently derived from the known analytic solution. This apparent circularity does not undermine the overall approach. The problem of solving chemical master equation systems can essentially be broken into two separate tasks: solving for the initial conditions for the generating function and solving for the probability mass function. Indeed, there are situations where the initial conditions of a given system can be easily found, but the probability mass function remains unknown; the ℓ-switch model is an example. On the other hand, there are situations where a general form for the probability mass function can be stated for a given set of initial conditions, but the initial conditions for the system are not yet known; this is the situation presented in Ref. 29. For the feedback model considered here, we are able to illustrate applicability to one half of the problem: for a given set of initial conditions for the generating function we can provide an approximation to the probability mass function.

We now apply the recurrence approximation to the system of differential equations arising from the feedback loop in Ref. 29. In this model of a single gene, the protein produced can bind to the promoter of this same gene, regulating its own expression. The process is modelled by recording only the number of proteins and the state of the gene promoter, which can be either bound or unbound. Here the rate of production of proteins depends on the state of the promoter region of the gene. Following Ref. 29, we let ru and rb denote the protein production rates for the unbound and bound states, respectively. We let kf be the degradation rate of proteins, and let kb be the degradation rate of the bound protein. Also letting Du, Db denote the unbound and bound states of the gene promoter, respectively, and P denote the protein, the reaction scheme for the process can be written as:
\[
\begin{aligned}
D_u &\xrightarrow{\;r_u\;} D_u + P, \\
D_b &\xrightarrow{\;r_b\;} D_b + P, \\
P &\xrightarrow{\;k_f\;} \varnothing, \\
D_b &\xrightarrow{\;k_b\;} D_u, \\
D_u + P &\;\underset{s_u}{\overset{s_b}{\rightleftharpoons}}\; D_b.
\end{aligned}
\tag{34}
\]

We refer the reader to Ref. 29 for further details of the model, including the associated master equation (Ref. 29, Eq. 8). Applying the recurrence method to the feedback model (34), we obtain an approximation to the probability mass function. For two sets of parameter values, we compare this with the known analytic solution and present the results in Fig. 8. For details of the application of the method, including the derivation of the recurrence relations used for the approximation, see Section A 2 of the appendix. The analytic solution for this model is given as equation (A23) in the appendix.

To this point, our discussion of the applicability of the recurrence method has concerned only chemical master equation models with a two-dimensional state space (tuples consisting of the state of the gene and the mRNA or protein copy number). Preliminary investigations suggest the method can be extended to models of gene transcription giving rise to higher dimensional state spaces. The three-stage model of Shahrezaei and Swain31 is an example. In this model, the production of mRNA and protein are treated as separate processes, resulting in a three-dimensional state space. In such cases, the recurrence method gives rise to equations containing partial derivatives. This makes the task of finding the initial conditions for the recurrence approach even more challenging. A characterisation of when the recurrence method is applicable to chemical master equation models will be the focus of future work.

E. Notes on numerical implementation

We found that the recurrence method can become numerically unstable when solved with standard numerical precision for some of the studied example systems and parameter values. However, for sufficiently large numerical precision we are able to obtain numerically stable and accurate results for all systems. We used Mathematica for the numerically difficult cases and provide example code in the supplementary information.
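In Python, one possible workaround (an alternative to the Mathematica route used here, and only a sketch) is to carry out the alternating sum in (27) in arbitrary-precision arithmetic with mpmath; the function name p_of_n and the argument h_values are our own and are not used elsewhere in this paper.

from mpmath import mp, mpf, binomial

def p_of_n(n, h_values, dps=60):
    # Evaluate the alternating sum in Eq. (27) for a single copy number n,
    # using mpmath arbitrary-precision floats to reduce cancellation errors.
    # h_values is the list of coefficients h(0), h(1), ... from the recurrence.
    mp.dps = dps
    h = [mpf(x) for x in h_values]
    return sum(binomial(n + i, n) * (-1) ** i * h[n + i]
               for i in range(len(h) - n))

If the instability originates in the recurrence iteration itself, the linear solves can likewise be performed in high precision, e.g. with mpmath.matrix and mpmath.lu_solve.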

F. Assessment of computational efficiency

We now assess the computational efficiency of the re-currence method by examining how the computational timescales with system size. We do this for the leaky Telegraphmodel, using the analytical solution to verify the accuracy ofthe result. As displayed by the example in Fig. 9(a), we findthe two methods to be in complete agreement provided that

.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted January 6, 2020. . https://doi.org/10.1101/2020.01.05.895359doi: bioRxiv preprint

Page 11: Exactly solvable models of stochastic gene expression...2020/01/05  · Exactly solvable models of stochastic gene expression Lucy Ham,1, a) David Schnoerr,2, b) Rowan D. Brackston,2,

11

FIG. 7. Stationary distributions of the 3-switch model in Section IV C. The figure shows the results obtained from stochastic simulations and from the recurrence method described in Section IV; both panels plot the probability $p(n)$ against the copy number $n$. Figure (a) has parameters $v_{12} = v_{21} = v_{31} = 0.1$, $v_{13} = 0.4$, $v_{32} = 0.9$, and $K_1 = 2$, $K_2 = 16$, $K_3 = 5$. Figure (b) displays a trimodal distribution with parameters $v_{12} = v_{13} = 0.045$, $v_{21} = v_{32} = 0.035$, $v_{31} = v_{23} = 0.015$, $K_1 = 5$, $K_2 = 30$, $K_3 = 60$. We used 300 (a) and 305 (b) terms, respectively, for the recurrence method to approximate the sum in (27).

FIG. 8. Stationary distributions of the feedback model in Section IV D. We compare the analytic solution with the results obtained from the recurrence approximation; both panels plot the probability $p(n)$ against the copy number $n$. Figure (a) has parameters $\rho_u = 0.125$, $\rho_b = 20$, $\theta = 0.5$, $\sigma_u = 0.2$, $\sigma_b = 0.35$, and (b) has parameters $\rho_u = 60$, $\rho_b = 6$, $\theta = 0$, $\sigma_u = 0.5$, $\sigma_b = 0.004$. We used 110 (a) and 300 (b) terms, respectively, for the recurrence method to approximate the probability mass function.

For all of the models considered in this paper, the maximum copy number that must be considered, $n_{\max}$, is directly related to the largest rate, in this case $K_1$. We therefore apply the method for a number of values of $K_1$, some examples of which are displayed in Fig. 9(b). We find that for each case a different number of terms in the expansion, and consequently a different computational cost, are required (Figs. 9(c), (d)). Scaling of the number of terms is observed to be approximately linear, while scaling of the run-time is polynomial, of order $O(n_{\max}^3)$.

Interestingly, we observe in the numerical examples that the main computational cost of the recurrence method does not arise from solving the recurrence relation itself, but instead from the subsequent reconstruction of the probability mass function according to (27). This suggests that the computational cost of our method should be similar to or less than the computational cost when the analytic solution to the generating functions is known. For the feedback model studied above we observe that this is indeed the case.
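For orientation: writing the stationary generating function as $G(z) = \sum_{m\ge 0} h(m)\,(z-1)^m$, with $h(m)$ the coefficients produced by the recurrence, the probability mass function is recovered by the standard inversion of this expansion, which is how we read Eq. (27):

p(n) = \sum_{m\ge n} (-1)^{m-n}\binom{m}{n}\,h(m),

truncated at the number of terms quoted in the figure captions. It is the evaluation of this truncated, alternating sum for every $n$ that constitutes the reconstruction step referred to above.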

For systems where no analytic solution is known, one typically utilises the SSA to simulate sample trajectories from the stochastic process. Use of the SSA to obtain distributions is notoriously expensive (Ref. 34), and requires a sufficiently large number of statistically independent samples to be obtained.

For the results shown in Fig. 7 for the 3-switch model, for example, we found that simulating 300,000 samples using our implementation of the SSA was about two orders of magnitude slower than the recurrence method ($\sim 57$ (a) and $\sim 155$ (b) seconds using the SSA, respectively, versus $\sim 0.1$ (a) and $\sim 2$ (b) seconds using the recurrence method).
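For reference, a minimal SSA implementation for a multistate switch model is only a few lines. The sketch below (in Python; ours, not the implementation timed above) draws a single approximately stationary sample per run, so that building a histogram requires many statistically independent runs, which is precisely the cost referred to here. Our conventions, which are assumptions rather than statements from the text, are: $v[i][j]$ is the switching rate from state $i$ to state $j$, the degradation rate is set to 1, and $v_{23}$, which is not quoted in the caption of Fig. 7(a), is taken to be 0.

# Minimal Gillespie SSA for a multistate switch model (illustrative sketch).
import random

def ssa_sample(v, K, delta=1.0, t_end=200.0, seed=None):
    """Return the transcript copy number at time t_end (one stationary sample)."""
    rng = random.Random(seed)
    n_states = len(K)
    s, n, t = 0, 0, 0.0
    while True:
        # Propensities: switching out of state s, then transcription, then degradation.
        props = [v[s][j] if j != s else 0.0 for j in range(n_states)]
        props += [K[s], delta * n]
        total = sum(props)
        t += rng.expovariate(total)
        if t > t_end:
            return n
        r = rng.random() * total
        for idx, a in enumerate(props):
            r -= a
            if r < 0.0:
                break
        if idx < n_states:
            s = idx          # promoter switches to state idx
        elif idx == n_states:
            n += 1           # transcription
        else:
            n -= 1           # degradation of one transcript

# Switching rates of Fig. 7(a); K holds the transcription rates K_1, K_2, K_3.
v = [[0.0, 0.1, 0.4],
     [0.1, 0.0, 0.0],
     [0.1, 0.9, 0.0]]
K = [2.0, 16.0, 5.0]
samples = [ssa_sample(v, K, seed=i) for i in range(1000)]

A histogram of these samples can then be compared with the recurrence approximation shown in Fig. 7(a); the run count here is kept small, whereas the timing comparison above used 300,000 samples.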

It is worth noting that even for systems for which analytic solutions exist, the solutions are typically expressed in terms of so-called "special functions" such as Bessel or hypergeometric functions, which still need to be evaluated numerically, often by recurrence relations or series expansions (Ref. 50). As an example, see the probability mass function of the feedback model (A23), which involves a sum over confluent hypergeometric functions. Such solutions can hence still be numerically expensive if one needs to evaluate over many points in state space. Indeed, we find that our recurrence method is considerably faster for approximating the probability mass function of the feedback model than evaluating the analytic expression.


FIG. 9. Scaling of the computational cost with increasing system size for the two-state model. In Fig. (a), we compare the analytic probability mass function for the leaky Telegraph model with the approximation obtained by the recurrence method. The parameters are $\lambda = 0.1$, $\mu = 0.1$, $K_0 = 5$ and $K_1 = 50$, and we used 270 terms to approximate the sum in (27) for the recurrence method. In (b), we plot the different leaky Telegraph distributions obtained by varying the parameter $K_1$ ($K_1 = 10, 30, 50, 70, 90$). Figure (c) shows the number of terms required in the recurrence expansion as a function of the maximum copy number $n_{\max}$ with non-negligible probability in the system; we observe a linear relationship. Figure (d) shows (on log-log axes) that the runtime in seconds of the recurrence method scales roughly as the cube of the maximum copy number.

V. CONCLUSION

Despite the importance of stochastic gene expression models in the analysis of intracellular behaviour, their mathematical analysis still poses considerable challenges. Analytic solutions are only known for special cases, while few systematic approximation methods exist and stochastic simulations are computationally expensive. In this work, we provide two partial solutions to these challenges.

In the first part, we developed an analytic solution method for chemical master equations of a certain class of multistate gene expression models, which we call $2^m$-multistate models. The method is based on a decomposition of a process into independent sub-processes. Convolution of the solutions of the latter, which are easily obtained, gives rise to the solution of the full model. The solutions can be derived straightforwardly and are computationally efficient to evaluate. While most existing solutions only apply to specific models, our decomposition approach applies to the entire class of $2^m$-multistate models. This class covers a wide range of natural examples of multistate gene systems, including the leaky Telegraph model.

In the second part of this work, we derived a recurrence method for directly approximating steady-state solutions to chemical master equation models. The method was employed to approximate solutions to a broad class of linear multistate gene expression models, which we term $\ell$-switch models, most of which currently have no known analytic solution. In particular, the class of $\ell$-switch models to which the recurrence method applies is more general than (and contains) the class of $2^m$-multistate models to which the analytic method applies. The recurrence method is not limited to linear $\ell$-switch models, however, but can also be applied to certain non-linear systems. Specifically, we have shown that, given some known initial conditions, the method can be used to approximate the solution to a gene regulatory network with a feedback loop. For all studied systems, we found excellent agreement of the recurrence method with analytic solutions or stochastic simulations.

In cases where no analytic solution is known, the recurrence method was found to significantly outperform stochastic simulations in terms of computational efficiency. Even in cases where an analytic solution exists, we found the method to be computationally more efficient than the evaluation of the analytic solution. While a characterisation of precisely which models the method is applicable to is not yet known, the results presented here suggest the recurrence method is an accurate and flexible tool for analysing stochastic models of gene expression. We believe that together with the presented analytical method it will contribute to a deeper understanding of the underlying genetic processes in biological systems.

SUPPLEMENTARY MATERIAL

See the supplementary material for sample Mathematica code for each application of the recurrence method considered in the present paper.

ACKNOWLEDGMENTS

LH and MPHS gratefully acknowledge support from the University of Melbourne DVCR fund; RDB and MPHS were funded by the BBSRC through Grant BB/N003608/1; DS was funded by the BBSRC through Grant BB/P028306/1.

Appendix A

1. Recurrence relations for the leaky Telegraph model

The following recurrence relations and initial conditions, obtained by applying the recurrence method to the leaky Telegraph model, can be used to give an approximation to the probability mass function (1).

h_0(1) = \mu/(\lambda + \mu) \quad\text{and}\quad h_1(1) = \lambda/(\lambda + \mu);    (A1)

h_0(n) = \frac{\mu}{(n\delta + \mu)(n\delta + \lambda) - \lambda\mu}\left[\frac{n\delta + \mu}{\mu}\,K_I\, h_0(n-1) + K_A\, h_1(n-1)\right],    (A2)

h_1(n) = \frac{\lambda}{(n\delta + \mu)(n\delta + \lambda) - \lambda\mu}\left[\frac{n\delta + \lambda}{\lambda}\,K_A\, h_1(n-1) + K_I\, h_0(n-1)\right].    (A3)
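The iteration of (A1)-(A3) and the subsequent reconstruction of $p(n)$ are straightforward to implement. The following is a minimal Python sketch (the paper's own code, provided in the supplementary material, is in Mathematica). Three conventions in it are our assumptions rather than statements from the text: rates are expressed in units of the mRNA degradation rate (so $\delta = 1$); the coefficients are initialised at $n = 0$ with the stationary promoter probabilities quoted in (A1), i.e. $h_0(0) = \mu/(\lambda+\mu)$ and $h_1(0) = \lambda/(\lambda+\mu)$, consistent with $h_i(n) = g_i^{(n)}(1)/n!$; and the sum in (27) is taken to be the standard inversion $p(n) = \sum_{m \ge n} (-1)^{m-n}\binom{m}{n}\,h(m)$ of the expansion about $z = 1$, truncated after a finite number of terms. The parameters are those of Fig. 9(a), with $K_I = K_0 = 5$ and $K_A = K_1 = 50$.

# Minimal sketch of the recurrence method for the leaky Telegraph model.
from math import comb
from mpmath import mp, mpf

mp.dps = 60   # extended precision; the inversion below is an alternating sum (cf. Section IV E)

def leaky_telegraph_pmf(lam, mu, K_I, K_A, delta=1.0, n_terms=270, n_max=80):
    lam, mu, K_I, K_A, delta = map(mpf, (lam, mu, K_I, K_A, delta))
    # Scaled Taylor coefficients h_i(n) = g_i^(n)(1)/n!, initialised at n = 0.
    h0 = [mu / (lam + mu)]
    h1 = [lam / (lam + mu)]
    for n in range(1, n_terms):   # recurrences (A2)-(A3), in unfactored form
        det = (n * delta + mu) * (n * delta + lam) - lam * mu
        h0.append(((n * delta + mu) * K_I * h0[n - 1] + mu * K_A * h1[n - 1]) / det)
        h1.append(((n * delta + lam) * K_A * h1[n - 1] + lam * K_I * h0[n - 1]) / det)
    h = [a + b for a, b in zip(h0, h1)]
    # Reconstruct p(n) by inverting the (truncated) expansion about z = 1.
    return [sum((-1) ** (m - n) * comb(m, n) * h[m] for m in range(n, n_terms))
            for n in range(n_max + 1)]

p = leaky_telegraph_pmf(lam=0.1, mu=0.1, K_I=5, K_A=50)
print(float(sum(p)))   # close to 1 when enough terms are retained; raise mp.dps if not

The number of retained terms (270) matches the value quoted in the caption of Fig. 9.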

2. Recurrence relations for the feedback model

We provide details of the recurrence approximation applied to the system of differential equations arising from the feedback loop in Ref. 29. For convenience we list here the dimensionless variables that appear there:

\rho_u = r_u/k_f, \quad \rho_b = r_b/k_f, \quad \theta = k_b/k_f, \quad \sigma_u = s_u/k_f, \quad \sigma_b = s_b/k_f,    (A4)

and the parameters:

\Sigma_b = 1 + \sigma_b, \qquad R = \rho_u - \rho_b\,\Sigma_b.    (A5)

Step 1. The conveniently transformed equations from Ref. 29 (Equations (12) and (13)) are:

\rho_u(z-1)\,g_0(z) - (z-1)\,g_0'(z) + (\theta + \sigma_u z)\,g_1(z) - \sigma_b z\, g_0'(z) = 0,    (A6)

\rho_b(z-1)\,g_1(z) - (z-1)\,g_1'(z) - (\theta + \sigma_u)\,g_1(z) + \sigma_b\, g_0'(z) = 0.    (A7)

Step 2. Differentiating the equations (A6) and (A7) $n$ times gives:

\big((z-1) + \sigma_b z\big)\,g_0^{(n+1)}(z) + (n + \sigma_b n)\,g_0^{(n)}(z) = (\theta + \sigma_u z)\,g_1^{(n)}(z) + \sigma_u n\, g_1^{(n-1)}(z) + \rho_u(z-1)\,g_0^{(n)}(z) + \rho_u n\, g_0^{(n-1)}(z)    (A8)

and

(z-1)\,g_1^{(n+1)}(z) + n\,g_1^{(n)}(z) = \big(\rho_b(z-1) - (\theta + \sigma_u)\big)\,g_1^{(n)}(z) + n\rho_b\, g_1^{(n-1)}(z) + \sigma_b\, g_0^{(n+1)}(z).    (A9)

Evaluated at $z = 1$, we immediately obtain the recurrence relations

\sigma_b\, g_0^{(n+1)}(1) = (\theta + \sigma_u)\,g_1^{(n)}(1) + \sigma_u n\, g_1^{(n-1)}(1) - (n + \sigma_b n)\,g_0^{(n)}(1) + \rho_u n\, g_0^{(n-1)}(1),    (A10)

(n + \theta + \sigma_u)\,g_1^{(n)}(1) = \rho_b n\, g_1^{(n-1)}(1) + \sigma_b\, g_0^{(n+1)}(1).    (A11)

Substituting (A10) into (A11) leads to the following system of second-order linear recurrence relations: for $n \ge 1$,

g_0^{(n+1)}(1) = \frac{1}{\sigma_b}\Big[(\theta + \sigma_u)\,g_1^{(n)}(1) + \sigma_u n\, g_1^{(n-1)}(1) - (\sigma_b + 1)n\, g_0^{(n)}(1) + \rho_u n\, g_0^{(n-1)}(1)\Big],    (A12)

g_1^{(n)}(1) = (\rho_b + \sigma_u)\,g_1^{(n-1)}(1) - (\sigma_b + 1)\,g_0^{(n)}(1) + \rho_u\, g_0^{(n-1)}(1).    (A13)

Step 3. Provided that the initial conditions $g_0(1)$, $g_1(1)$ and $g_0'(1)$ are known (here $g_0'(1) = \frac{\theta + \sigma_u}{\sigma_b}\,g_1(1)$), the values of $g_0^{(n)}(1)$ and $g_1^{(n)}(1)$ can be obtained by iteration. In this example, we use the initial conditions calculated from the known analytic solution (see Section A 3 for details of the initial conditions and the analytic solution).

Step 4. As before, we use a simplification that avoids the computational issue of calculating one huge number $g_i^{(n)}(1)$, and then dividing by another huge number $n!$. The actual coefficients $h_i^{(n)}(1) := \frac{1}{n!}\,g_i^{(n)}(1)$ may be generated by iteration using the following recurrence relations obtained from (A12) and (A13):

h_0^{(n+1)}(1) = \frac{1}{\sigma_b(n+1)}\Big[(\theta + \sigma_u)\,h_1^{(n)}(1) + \sigma_u\, h_1^{(n-1)}(1) - (\sigma_b + 1)n\, h_0^{(n)}(1) + \rho_u\, h_0^{(n-1)}(1)\Big],    (A14)

h_1^{(n)}(1) = \frac{\rho_b + \sigma_u}{n}\,h_1^{(n-1)}(1) - (\sigma_b + 1)\,h_0^{(n)}(1) + \frac{\rho_u}{n}\,h_0^{(n-1)}(1).    (A15)


Note that here Equation (A14) is obtained from (A12) by dividing each term $g_i^{(n+1-j)}(1)$ by $(n+1)^{(j)}$, and similarly, Equation (A15) is obtained from (A13) by dividing each term $g_i^{(n-j)}(1)$ by $n^{(j)}$, where $x^{(j)}$ denotes the falling factorial of $x$ with $j$ factors.

Step 5. We are now able to recover the stationary probability mass function. Again, we generate a list of values for $h(n)$ and use this to approximate the probability mass function (27). A comparison of the approximate solution and the known analytic solution is given in Figure 8.
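A corresponding Python sketch of Steps 3-5 is given below (again ours; the Mathematica version is in the supplementary material). It takes $g_0(1)$ and $g_1(1)$ as inputs, which can be computed from (A24)-(A25) (see the sketch following (A25)), sets $h_0^{(1)}(1) = g_0'(1) = \frac{\theta+\sigma_u}{\sigma_b}\,g_1(1)$ as in Step 3, and, as before, assumes that (27) is the standard inversion of the expansion about $z = 1$.

# Sketch of Steps 3-5 for the feedback model (illustrative Python version).
from math import comb
from mpmath import mp, mpf

mp.dps = 60   # raise if the reconstructed pmf fails to sum to approximately 1

def feedback_pmf(rho_u, rho_b, theta, sigma_u, sigma_b, g0_1, g1_1,
                 n_terms=300, n_max=80):
    rho_u, rho_b, theta, sigma_u, sigma_b = map(mpf, (rho_u, rho_b, theta, sigma_u, sigma_b))
    # h_i^(n)(1) = g_i^(n)(1)/n!; pass g0_1, g1_1 at full working precision.
    h0 = [mpf(g0_1), (theta + sigma_u) / sigma_b * mpf(g1_1)]   # n = 0 and n = 1
    h1 = [mpf(g1_1)]
    for n in range(1, n_terms):
        # (A15): h_1^(n) from h_1^(n-1), h_0^(n), h_0^(n-1)
        h1.append((rho_b + sigma_u) / n * h1[n - 1]
                  - (sigma_b + 1) * h0[n]
                  + rho_u / n * h0[n - 1])
        # (A14): h_0^(n+1) from h_1^(n), h_1^(n-1), h_0^(n), h_0^(n-1)
        h0.append(((theta + sigma_u) * h1[n] + sigma_u * h1[n - 1]
                   - (sigma_b + 1) * n * h0[n] + rho_u * h0[n - 1])
                  / (sigma_b * (n + 1)))
    h = [h0[n] + h1[n] for n in range(n_terms)]
    return [sum((-1) ** (m - n) * comb(m, n) * h[m] for m in range(n, n_terms))
            for n in range(n_max + 1)]

With initial conditions for the parameter set of Fig. 8(a), the output of this sketch should match the analytic probability mass function evaluated in Section A 3.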

3. Explicit solution to the feedback model

Here we provide an explicit solution for the stationary probability mass function of the regulatory feedback loop considered in Section IV D. From Refs. 29 and 30, the exact solutions for the generating functions are given by:

g_0(z) = A\,e^{\rho_b(z-1)}\left[\frac{\Sigma_b}{\sigma_b}\frac{\alpha}{\rho_u}\,{}_1F_1(\alpha+1,\beta,w) + \frac{\theta-\alpha}{\rho_u-\rho_b}\,{}_1F_1(\alpha,\beta,w)\right],    (A16)

and

g_1(z) = A\,e^{\rho_b(z-1)}\,{}_1F_1(\alpha,\beta,w),    (A17)

where

\alpha = \theta + \frac{\sigma_u(\rho_u - \rho_b)}{R},    (A18)

\beta = 1 + \theta + \frac{1}{\Sigma_b}\left[\sigma_u + \rho_u\left(\frac{\Sigma_b - 1}{\Sigma_b}\right)\right],    (A19)

w = R\,\frac{\Sigma_b z - 1}{\Sigma_b^2},    (A20)

and

A^{-1} = \frac{\Sigma_b}{\sigma_b}\frac{\alpha}{\rho_u}\,{}_1F_1(\alpha+1,\beta,w_1) + \left(1 + \frac{\theta-\alpha}{\rho_u-\rho_b}\right){}_1F_1(\alpha,\beta,w_1).    (A21)

Here $w_1$ denotes $w$ evaluated at $z = 1$. The generating function is then obtained by $g(z) = g_0(z) + g_1(z)$. So,

g(z) = A\,e^{\rho_b(z-1)}\left[\frac{\Sigma_b}{\sigma_b}\frac{\alpha}{\rho_u}\,{}_1F_1(\alpha+1,\beta,w) + \left(1 + \frac{\theta-\alpha}{\rho_u-\rho_b}\right){}_1F_1(\alpha,\beta,w)\right].    (A22)

It follows that the stationary probability mass function is:

p(n) = \frac{A}{n!}\sum_{m=0}^{n}\binom{n}{m}\,\rho_b^{\,n-m}e^{-\rho_b}\,\frac{R^m\,\alpha^{(m)}}{\Sigma_b^m\,\beta^{(m)}}\left[\frac{\Sigma_b(\alpha+m)}{\sigma_b\rho_u}\,{}_1F_1(\alpha+m+1,\beta+m,w_0) + \left(1 + \frac{\theta-\alpha}{\rho_u-\rho_b}\right){}_1F_1(\alpha+m,\beta+m,w_0)\right],    (A23)

where $w_0$ denotes $w$ evaluated at $z = 0$.

Again, from (A16) and (A17), the initial conditions $g_0(1)$ and $g_1(1)$ are:

g_0(1) = A\left[\frac{\Sigma_b}{\sigma_b}\frac{\alpha}{\rho_u}\,{}_1F_1(\alpha+1,\beta,w_1) + \frac{\theta-\alpha}{\rho_u-\rho_b}\,{}_1F_1(\alpha,\beta,w_1)\right]    (A24)

and

g_1(1) = A\,{}_1F_1(\alpha,\beta,w_1).    (A25)
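The expressions (A16)-(A25) can be evaluated with any library providing the confluent hypergeometric function. The Python sketch below uses mpmath, working in extended precision because the sum in (A23) alternates in sign whenever $R < 0$. It is a direct transcription of the reconstructed formulas above, with $\alpha^{(m)}$ and $\beta^{(m)}$ read as rising factorials (Pochhammer symbols), the reading consistent with differentiating (A22) term by term; treat it as illustrative rather than as the authors' code.

# Evaluation of the analytic solution (A16)-(A25) of the feedback model.
from mpmath import mp, mpf, exp, binomial, factorial, rf, hyp1f1

mp.dps = 60

def analytic_feedback(rho_u, rho_b, theta, sigma_u, sigma_b):
    rho_u, rho_b, theta, sigma_u, sigma_b = map(mpf, (rho_u, rho_b, theta, sigma_u, sigma_b))
    Sigma_b = 1 + sigma_b                                    # (A5)
    R = rho_u - rho_b * Sigma_b
    alpha = theta + sigma_u * (rho_u - rho_b) / R            # (A18)
    beta = 1 + theta + (sigma_u + rho_u * (Sigma_b - 1) / Sigma_b) / Sigma_b   # (A19)
    w = lambda z: R * (Sigma_b * z - 1) / Sigma_b**2         # (A20)
    w0, w1 = w(0), w(1)
    c = 1 + (theta - alpha) / (rho_u - rho_b)
    A = 1 / (Sigma_b * alpha / (sigma_b * rho_u) * hyp1f1(alpha + 1, beta, w1)
             + c * hyp1f1(alpha, beta, w1))                  # (A21)

    def p(n):                                                # (A23)
        total = mpf(0)
        for m in range(n + 1):
            pref = (binomial(n, m) * rho_b**(n - m) * exp(-rho_b)
                    * R**m * rf(alpha, m) / (Sigma_b**m * rf(beta, m)))
            bracket = (Sigma_b * (alpha + m) / (sigma_b * rho_u)
                       * hyp1f1(alpha + m + 1, beta + m, w0)
                       + c * hyp1f1(alpha + m, beta + m, w0))
            total += pref * bracket
        return A / factorial(n) * total

    # Initial conditions (A24)-(A25), as used by the recurrence sketch in Section A 2.
    g0_1 = A * (Sigma_b * alpha / (sigma_b * rho_u) * hyp1f1(alpha + 1, beta, w1)
                + (theta - alpha) / (rho_u - rho_b) * hyp1f1(alpha, beta, w1))
    g1_1 = A * hyp1f1(alpha, beta, w1)
    return p, g0_1, g1_1

# Parameters of Fig. 8(a).
p, g0_1, g1_1 = analytic_feedback(0.125, 20, 0.5, 0.2, 0.35)
print(g0_1 + g1_1)                     # equals g(1) = 1 by normalisation
print(sum(p(n) for n in range(120)))   # the pmf should likewise sum to approximately 1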

1. M. B. Elowitz and S. Leibler, "A synthetic oscillatory network of transcriptional regulators," Nature 403, 335-339 (2000).
2. M. B. Elowitz, A. J. Levine, E. D. Siggia, and P. S. Swain, "Stochastic gene expression in a single cell," Science 297, 1183-1186 (2002).
3. A. P. Gasch, F. B. Yu, J. Hose, L. E. Escalante, M. Place, R. Bacher, J. Kanbar, D. Ciobanu, L. Sandor, I. V. Grigoriev, C. Kendziorski, S. R. Quake, and M. N. McClean, "Single-cell RNA sequencing reveals intrinsic and extrinsic regulatory heterogeneity in yeast responding to stress," PLOS Biol. 15, e2004050 (2017).
4. A. Raj, C. S. Peskin, D. Tranchina, D. Y. Vargas, and S. Tyagi, "Stochastic mRNA synthesis in mammalian cells," PLoS Biol. 4, e309 (2006).
5. J. M. Raser and E. K. O'Shea, "Noise in gene expression: origins, consequences, and control," Science 304, 1811-1814 (2004).
6. P. S. Swain, M. B. Elowitz, and E. D. Siggia, "Intrinsic and extrinsic contributions to stochasticity in gene expression," Proc. Natl. Acad. Sci. 99, 12795-12800 (2002).
7. M. S. H. Ko, "A stochastic model for gene induction," J. Theor. Biol. 153, 181-194 (1991).
8. J. Peccoud and B. Ycart, "Markovian modeling of gene-product synthesis," Theor. Popul. Biol. 48, 222-234 (1995).
9. I.-B. Srividya, F. Hayot, and C. Jayaprakash, "Transcriptional pulsing and consequent stochasticity in gene expression," Phys. Rev. E 79, 031911 (2009).
10. J. Paulsson and M. Ehrenberg, "Random signal fluctuations can reduce random fluctuations in regulated components of chemical regulatory networks," Phys. Rev. Lett. 84, 5447-5450 (2000).
11. L. Ham, R. D. Brackston, and M. P. Stumpf, "Extrinsic noise and heavy-tailed laws in gene expression," bioRxiv:10.1101/623371v1 (2019), 10.1101/623371, https://www.biorxiv.org/content/early/2019/04/30/623371.full.pdf.
12. A. Coulon, O. Gandrillon, and G. Beslon, "On the spontaneous stochastic dynamics of a single gene: complexity of the molecular interplay at the promoter," BMC Syst. Biol. 4 (2010), 10.1186/1752-0509-4-2.
13. A. Sánchez and J. Kondev, "Transcriptional control of noise in gene expression," PNAS 105, 5081-5086 (2008), https://www.pnas.org/content/105/13/5081.full.pdf.
14. T. C. Voss and G. L. Hager, "Dynamic regulation of transcriptional states by chromatin and transcription factors," Nat. Rev. Genet. 15, 69-81 (2014).
15. T. C. Voss, R. L. Schiltz, M.-H. Sung, T. A. Johnson, S. John, and G. L. Hager, "Combinatorial probabilistic chromatin interactions produce transcriptional heterogeneity," J. Cell Sci. 122, 345-356 (2009), https://jcs.biologists.org/content/122/3/345.full.pdf.
16. A. Sánchez, H. G. Garcia, J. D., R. Phillips, and J. Kondev, "Effect of promoter architecture on the cell-to-cell variability in gene expression," PLoS Comput. Biol. 7 (2011), 10.1371/journal.pcbi.1001100.
17. S. A. Sevier, D. A. Kessler, and H. Levine, "Mechanical bounds to transcriptional noise," Proc. Natl. Acad. Sci. 113, 13983-13988 (2016).
18. T. B. Kepler and T. C. Elston, "Stochasticity in transcriptional regulation: origins, consequences, and mathematical representations," Biophys. J. 81, 3116-3136 (2001).
19. P. Thomas, N. Popovic, and R. Grima, "Phenotypic switching in gene regulatory networks," Proceedings of the National Academy of Sciences of the United States of America 111, 6994-6999 (2014).
20. M. S. Sherman, K. Lorenz, M. H. Lanier, and B. A. Cohen, "Cell-to-cell variability in the propensity to transcribe explains correlated fluctuations in gene expression," Cell Systems 1, 315-325 (2015).
21. O. Lenive, P. D. W. Kirk, and M. P. H. Stumpf, "Inferring extrinsic noise from single-cell gene expression data using approximate Bayesian computation," BMC Syst. Biol. 10, 81 (2016).


22. C. V. Harper, B. Finkenstadt, D. J. Woodcock, S. Friedrichsen, S. Semprini, L. Ashall, D. G. Spiller, J. J. Mullins, D. A. Rand, J. R. E. Davis, and M. R. H. White, "Dynamic analysis of stochastic transcription cycles," PLoS Biology 9 (2011), 10.1371/journal.pbio.1000607.
23. D. M. Suter, N. Molina, D. Gatfield, K. Schneider, U. Schibler, and F. Naef, "Mammalian genes are transcribed with widely different bursting kinetics," Science 332, 472-474 (2011), https://science.sciencemag.org/content/332/6028/472.full.pdf.
24. B. Zoller, D. Nicolas, N. Molina, and F. Naef, "Structure of silent transcription intervals and noise characteristics of mammalian genes," Mol. Syst. Biol. 11 (2015).
25. J. Dattani, Exact solutions of master equations for the analysis of gene transcription models, Ph.D. thesis, Imperial College London (2016).
26. U. Herbach, "Stochastic gene expression with a multistate promoter: Breaking down exact distributions," SIAM J. Appl. Math. 79, 1007-1029 (2019).
27. T. Zhou and J. Zhang, "Analytical results for a multistate gene model," SIAM J. Appl. Math. 72, 789-818 (2012).
28. J. Zhang, L. Chen, and T. Zhou, "Analytical distribution and tunability of noise in a model of promoter progress," Biophys. J. 102, 1247-1257 (2012).
29. R. Grima, D. R. Schmidt, and T. J. Newman, "Steady-state fluctuations of a genetic feedback loop: an exact solution," J. Chem. Phys. 137, 035104 (2012).
30. G. C. P. Innocentini, A. F. Ramos, and J. E. M. Hornos, "A comment on 'Steady-state fluctuations of a genetic feedback loop: an exact solution'," J. Chem. Phys. 142 (2015).
31. V. Shahrezaei and P. S. Swain, "Analytical distributions for stochastic gene expression," Proc. Natl. Acad. Sci. 105, 17256-17261 (2008).
32. C. Zhixing, T. Filatova, D. A. Oyarzún, and R. Grima, "Multi-scale bursting in stochastic gene expression," bioRxiv:10.1101/717199v1 (2019), 10.1101/717199, https://www.biorxiv.org/content/early/2019/07/28/717199.full.pdf.
33. G. C. P. Innocentini, M. Forger, A. F. Ramos, O. Radulescu, and J. E. M. Hornos, "Multimodality and flexibility of stochastic gene expression," Bulletin of Mathematical Biology 75, 2600-2630 (2013).
34. D. Schnoerr, G. Sanguinetti, and R. Grima, "Approximation and inference methods for stochastic biochemical kinetics - a tutorial review," J. Phys. A: Math. Theor. 50, 093001 (2017).
35. C. Gardiner, Stochastic Methods: A Handbook for the Natural and Social Sciences (Springer, 2009).
36. D. T. Gillespie, A. Hellander, and L. R. Petzold, "Perspective: Stochastic algorithms for chemical kinetics," J. Chem. Phys. 138, 05B201_1 (2013).
37. G. Neuert, B. Munsky, R. Z. Tan, L. Teytelman, M. Khammash, and A. van Oudenaarden, "Systematic Identification of Signal-Activated Stochastic Gene Regulation," Science 339, 584-587 (2013).
38. L. A. Sepúlveda, H. Xu, J. Zhang, M. Wang, and I. Golding, "Measurement of gene regulation in individual cells reveals rapid switching between promoter states," Science 351, 1218-1222 (2016).
39. G. Corliss and Y. F. Chang, "Solving Ordinary Differential Equations Using Taylor Series," ACM Trans. Math. Softw. 8, 114-144 (1982).
40. D. T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," J. Phys. Chem. 81, 2340-2361 (1977).
41. L. Huang, Z. Yuan, P. Liu, and T. Zhou, "Effects of promoter leakage on dynamics of gene expression," BMC Syst. Biol. 9, 16 (2015).
42. A. Sanchez and I. Golding, "Genetic Determinants and Cellular Constraints in Noisy Gene Expression," Science 342, 1188-1193 (2013).
43. C. R. Clapier and B. R. Cairns, "The Biology of Chromatin Remodeling Complexes," Annu. Rev. Biochem. 78, 273-304 (2009).
44. M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables (U.S. Govt. Print. Off., 1965).
45. M. Thattai and A. van Oudenaarden, "Intrinsic noise in gene regulatory networks," Proceedings of the National Academy of Sciences 98, 8614-8619 (2001).
46. B. B. Kaufmann and A. van Oudenaarden, "Stochastic gene expression: From single molecules to the proteome," Current Opinion in Genetics & Development 17, 107-112 (2007).
47. B. Munsky, G. Neuert, and A. van Oudenaarden, "Using gene expression noise to understand gene regulation," Science 336, 183-187 (2012).
48. D. A. Berry and B. W. Lindgren, Statistics: Theory and Methods (Brooks/Cole Publishing Company, 1990).
49. M. F. Weber and E. Frey, Rep. Prog. Phys. 80, 046601 (2017).
50. J. W. Pearson, S. Olver, and M. A. Porter, "Numerical methods for the computation of the confluent and Gauss hypergeometric functions," Numer. Algorithms 74, 821-866 (2017).
