Brewer, D., Barenco, M., Callard, R., Hubank, M. & Stark, J. 2008 Fitting ordinary differential equations to short time course data. Phil. Trans. R. Soc. A 366, 519-544. (doi: 10.1098/rsta.2007.2108)


Fitting ordinary differential equations to short time course data

BY DANIEL BREWER1,2, MARTINO BARENCO1,2, ROBIN CALLARD1,2, MICHAEL HUBANK1,2 AND JAROSLAV STARK3,4,*

1Institute of Child Health, University College London, 30 Guilford Street, London WC1N 1EH, UK

2CoMPLEX, University College London, 4 Stephenson Way, London NW1 2HE, UK

3Department of Mathematics, and 4Centre for Integrative Systems Biology at Imperial College, Imperial College London, London SW7 2AZ, UK

Ordinary differential equations (ODEs) are widely used to model many systems in physics, chemistry, engineering and biology. Often one wants to compare such equations with observed time course data, and use this to estimate parameters. Surprisingly, practical algorithms for doing this are relatively poorly developed, particularly in comparison with the sophistication of numerical methods for solving both initial and boundary value problems for differential equations, and for locating and analysing bifurcations. A lack of good numerical fitting methods is particularly problematic in the context of systems biology where only a handful of time points may be available. In this paper, we present a survey of existing algorithms and describe the main approaches. We also introduce and evaluate a new efficient technique for estimating ODEs linear in parameters particularly suited to situations where noise levels are high and the number of data points is low. It employs a spline-based collocation scheme and alternates linear least squares minimization steps with repeated estimates of the noise-free values of the variables. This is reminiscent of expectation–maximization methods widely used for problems with nuisance parameters or missing data.

Keywords: parameter estimation; ordinary differential equation; time series; splines; collocation; systems biology


1. Introduction

Ordinary differential equations (ODEs) are one of the most popular frameworks for describing the temporal evolution of a wide variety of systems in physics, chemistry, engineering and biology (e.g. Gershenfeld 1999). Such models take the form

$$\frac{dx}{dt} = f(x, t, \alpha), \tag{1.1}$$

where $x$ is a vector of variables evolving with time; $f$ is a vector field; and $\alpha$ denotes an (optional) set of parameters. Once an ODE model has been built, a vast array of powerful analytical and numerical methods exists for exploring its properties. Any such model will usually involve one or more parameters, $\alpha$. In some cases, particularly in physics where models are based on well-understood physical mechanisms, such parameters may be derived from first principles, or measured directly. Increasingly, however, ODEs are being applied in disciplines such as cell and molecular biology where many parameters cannot be determined by either of these approaches.

Phil. Trans. R. Soc. A (2008) 366, 519–544

doi:10.1098/rsta.2007.2108

Published online 13 August 2007

One contribution of 14 to a Theme Issue ‘Experimental chaos II’.

*Author and address for correspondence: Department of Mathematics, Imperial College London, London SW7 2AZ, UK ([email protected]).

This journal is © 2007 The Royal Society

In such situations, one can attempt to estimate the unknown parameters from experimentally measured data. In most cases, these data consist of time series, or time courses, of repeated measurements of one or more experimental variables. In cell and molecular biology, these might, for instance, be mRNA or protein concentrations measured at several time points throughout an experiment. One can fit such data by systematically varying the parameters to determine a set of parameters which minimize the difference between a solution of the differential equation and the data (e.g. Gershenfeld 1999). One can also apply the same methodology to build a ‘black-box’ model by starting with a general set of basis functions (such as polynomials or radial basis functions) and then estimating their coefficients by minimizing the discrepancy between model and data. In principle, such an approach can be used to deduce the network of interactions between the variables (Stark et al. 2003a,b) which in turn can lead to more refined mechanistic models.

A variety of approaches exists to fit data to ODE models. Unfortunately, many of these are poorly documented in the literature, and may only be described in the context of specific applications in specialist publications. We therefore present an overview of the key concepts below. We show that such methods can be classified by whether the ODE is solved using a conventional iterative numerical integrator such as fourth-order Runge–Kutta, or whether a global solution is approximated using splines or related methods. This distinction is similar to that between shooting and collocation methods for boundary value problems (Golub & Ortega 1992). Although global methods are now generally preferred for solving boundary value problems, they are poorly developed in the context of fitting ODEs to data. In such cases, shooting is generally more popular and well known.

Most experiments in cell and molecular biology produce very short time courses, often with very noisy measurements. Shooting-type methods tend to perform particularly poorly in such situations; we show an example of this below. As ODEs become more widely applied in this area, particularly with the current rapid growth in systems biology, there is therefore an urgent need to develop alternative methods more suited to this type of data. The use of collocation-type algorithms appears to be particularly attractive. We explain the main ideas behind this approach below, describe existing algorithms and present a new two-stage algorithm for ODEs that are linear in parameters. To illustrate these ideas, we use a model of the p53 tumour-suppressor network, which is described briefly in appendix A.

2. Estimating parameters in ODEs

(a) General principles of model fitting

Any method for estimating model parameters from data requires two main ingredients. We need to construct an error function $E_{\mathcal{D}}(\alpha)$ that quantifies the difference between a model with parameters $\alpha$ and the data, and we need an optimization method that finds the value of $\alpha$ that minimizes $E_{\mathcal{D}}(\alpha)$. Except for error functions that have particular features (such as those occurring in linear least squares problems), the minimization stage requires an iterative approach. Typically, the minimization method can be chosen independently of the construction of the error function. In some cases, however, an integrated approach can have advantages. This can occur, for instance, if it is sufficient to compute only a rough estimate of $E_{\mathcal{D}}(\alpha)$ at each iteration of the minimization scheme, rather than calculating $E_{\mathcal{D}}(\alpha)$ exactly.

A wide variety of standard minimization algorithms exist (e.g. Gershenfeld 1999). Where $E_{\mathcal{D}}(\alpha)$ has no, or only a few, local minima apart from the global minimum, then methods that iteratively step downhill, such as the Nelder–Mead simplex method (Nelder & Mead 1965 or see Press et al. 2002) or the Levenberg–Marquardt method (Marquardt 1963 or see Press et al. 2002) work well. If, on the other hand, $E_{\mathcal{D}}(\alpha)$ has a more complex landscape, stochastic search algorithms such as simulated annealing (Gershenfeld 1999 or Press et al. 2002), Markov Chain Monte Carlo (e.g. Gilks et al. 1996) or genetic algorithms (e.g. Gershenfeld 1999) are often necessary. These are usually computationally very demanding.

Attempting to estimate the parameters $\alpha$ of the ODE in equation (1.1) from observed data presents additional difficulties. These are mainly centred on the construction and efficient computation of a suitable error function. This is because we cannot directly determine how well a given set of data points

$$\mathcal{D} = \{\hat{x}(t_i) : i = 1, \ldots, n\}$$

fits the ODE in equation (1.1). We shall describe two common strategies for the construction and minimization of appropriate error functions.

(b) Solution-based approaches

By far the better known, and arguably statistically more valid, approach is to solve equation (1.1) numerically to obtain an approximate solution $u(t)$ such that

$$\frac{du}{dt} \approx f(u, t, \alpha). \tag{2.1}$$

Since this is meant to provide a model for the data, the points $\hat{x}(t_i)$ should be close to the values $u(t_i)$. It is thus natural to base the error function $E_{\mathcal{D}}(\alpha)$ on the difference between $u(t_i)$ and $\hat{x}(t_i)$. The most common choice is to use the least squares error

$$E_{\mathcal{D}}(\alpha) = \sum_{i=1}^{n} \left\| u(t_i) - \hat{x}(t_i) \right\|^2, \tag{2.2}$$

possibly weighted by the reciprocal of the noise level at each data point. The subscript $\mathcal{D}$ here highlights that this is an error between the data and the solution $u$, in order to distinguish this quantity from other error functions defined below. When measurement errors are independently normally distributed, $E_{\mathcal{D}}(\alpha)$ will be the logarithm of the likelihood of the data (or more strictly, $-\log(\text{likelihood})$, up to scaling and an additive constant), and minimizing equation (2.2) is equivalent to maximum likelihood estimation of the parameters. It is also possible to use this approach even if we cannot measure all of the components of the state vector $x(t)$. In such a case, the norm in equation (2.2) is just taken over the measured components. As long as we have a sufficient number of data points to ensure that $E_{\mathcal{D}}(\alpha)$ has a non-degenerate minimum, it may still be possible to estimate parameters successfully.
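As an illustration of equations (2.1) and (2.2), the following is a minimal Python sketch of a solution-based error function. It assumes a hypothetical scalar toy model $dx/dt = -\alpha x$, a known initial condition and synthetic noise-free data; the function names `rk4_solution`, `data_error` and `decay` are illustrative and do not come from the paper.

```python
import math

def rk4_solution(f, x0, times, alpha, substeps=20):
    """Integrate dx/dt = f(x, t, alpha) with classical fourth-order
    Runge-Kutta and return u(t_i) for every t_i in times."""
    x, values = x0, [x0]
    for t_start, t_end in zip(times[:-1], times[1:]):
        h = (t_end - t_start) / substeps
        t = t_start
        for _ in range(substeps):
            k1 = f(x, t, alpha)
            k2 = f(x + 0.5 * h * k1, t + 0.5 * h, alpha)
            k3 = f(x + 0.5 * h * k2, t + 0.5 * h, alpha)
            k4 = f(x + h * k3, t + h, alpha)
            x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
            t += h
        values.append(x)
    return values

def data_error(f, alpha, x0, times, observed):
    """Least squares error of equation (2.2) for a scalar state."""
    u = rk4_solution(f, x0, times, alpha)
    return sum((u_i - x_i) ** 2 for u_i, x_i in zip(u, observed))

def decay(x, t, alpha):
    """Hypothetical toy vector field, linear in its single parameter."""
    return -alpha * x

# Synthetic, noise-free observations generated with alpha = 0.8 (an assumption).
times = [0.0, 0.5, 1.0, 2.0, 4.0]
observed = [math.exp(-0.8 * t) for t in times]

error_at_truth = data_error(decay, 0.8, 1.0, times, observed)
error_elsewhere = data_error(decay, 1.5, 1.0, times, observed)
```

The error is essentially zero at the generating parameter value and grows away from it; any of the minimization methods mentioned above could then be applied to `data_error`.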

Observe that when $f$ has linear dependence on its parameters, the error function given by equation (2.2) is a quadratic form. The minimum in such a case can be obtained efficiently in one step using algebraic methods such as QR decomposition (e.g. Lawson & Hanson 1974). This is much faster and more robust than the iterative optimization algorithms mentioned above. For ODEs this possibility seems irrelevant, however, since even if the vector field $f$ depends linearly on the parameters $\alpha$, the solution $u(t)$ will depend nonlinearly on $\alpha$. Nevertheless, we shall present below a new approach that makes use of linearity in parameters.

The most popular method of solving the differential equation is to use a standard numerical integration scheme such as fourth-order Runge–Kutta, possibly with adaptive step-size control (e.g. Press et al. 2002). Such an approach immediately encounters a major potential obstacle: we very rarely know the correct initial condition $x(t_0)$ for all of the variables. Even if we are able to measure all of the components of $x(t_0)$, such measurements will inevitably be subject to some error. Thus, in practice, we only have $\hat{x}(t_0) = x(t_0) + \epsilon(t_0)$ where $\epsilon(t_0)$ denotes the experimental error. If the differential equation is globally stable, and $\epsilon(t_0)$ is small, the difference between $\hat{x}(t_0)$ and $x(t_0)$ may not be too significant. However, most models in the real world are nonlinear. In such a case, the dynamics can dramatically amplify small differences in the initial conditions. Attempting to estimate parameters for such systems using $\hat{x}(t_0)$ as an initial condition leads to poor results.

(c) Shooting

A better approach is to regard the initial condition $x(t_0)$ as an additional set of unknown parameters which are incorporated in the minimization scheme. We thus regard the error function $E(\alpha, x(t_0))$ as depending on both $\alpha$ and $x(t_0)$. This type of approach appears to have first been tried by Bellman et al. (1967) and similar ideas appear in Swartz & Bremermann (1975). It is closely related to shooting methods for boundary value problems, including methods used for finding periodic orbits and other special solutions (e.g. Golub & Ortega 1992; Kuznetsov 1995). It can work well if data are plentiful and noise levels are low. However, if we only have a few time points, as is typical of cell and molecular biology datasets, $E(\alpha, x(t_0))$ can have a large number of local minima separated by steep peaks and ridges. As we shall see below, it is difficult to find the global minimum in such cases.

One possible extension is to use multiple shooting methods where the solution is broken down into a number of successive segments, with appropriate matching at the joints (e.g. Kuznetsov 1995; Timmer et al. 2000). This can improve results, but our experience with even moderately complex models is that it can suffer from similar problems to simple shooting.
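The simple shooting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy model $dx/dt = -\alpha x$, the synthetic noise-free data and the coarse-to-fine grid search (`shoot_fit`) are all hypothetical choices made to keep the example self-contained; they are not the optimizers discussed in the text, and a grid search would not scale to realistic models.

```python
import math

def integrate(alpha, x0, times, substeps=10):
    """RK4 solution of the toy model dx/dt = -alpha * x at the given times."""
    f = lambda x: -alpha * x
    x, out = x0, [x0]
    for t_start, t_end in zip(times[:-1], times[1:]):
        h = (t_end - t_start) / substeps
        for _ in range(substeps):
            k1 = f(x)
            k2 = f(x + 0.5 * h * k1)
            k3 = f(x + 0.5 * h * k2)
            k4 = f(x + h * k3)
            x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        out.append(x)
    return out

def shooting_error(alpha, x0, times, observed):
    """E(alpha, x(t0)): the initial condition is treated as a free parameter."""
    u = integrate(alpha, x0, times)
    return sum((u_i - x_i) ** 2 for u_i, x_i in zip(u, observed))

def shoot_fit(times, observed, alpha_range, x0_range, levels=4, n=21):
    """Coarse-to-fine grid search over (alpha, x0); a stand-in optimizer."""
    a_lo, a_hi = alpha_range
    x_lo, x_hi = x0_range
    best = None
    for _ in range(levels):
        da = (a_hi - a_lo) / (n - 1)
        dx = (x_hi - x_lo) / (n - 1)
        grid = [(a_lo + i * da, x_lo + j * dx) for i in range(n) for j in range(n)]
        best = min(grid, key=lambda c: shooting_error(c[0], c[1], times, observed))
        # Shrink the search box to one grid spacing around the current best point.
        a_lo, a_hi = best[0] - da, best[0] + da
        x_lo, x_hi = best[1] - dx, best[1] + dx
    return best

# Synthetic data generated with alpha = 0.8 and x(t0) = 2.0 (assumed values).
times = [0.0, 0.5, 1.0, 2.0, 4.0]
observed = [2.0 * math.exp(-0.8 * t) for t in times]
alpha_fit, x0_fit = shoot_fit(times, observed, (0.1, 2.0), (0.5, 4.0))
```

With plentiful, clean data this error surface has a single minimum. The difficulties described above arise when the data are few and noisy and the model is nonlinear, in which case a local search of this kind can easily be trapped.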

(d) Collocation methods

An alternative to using iterative numerical integration is to represent the solution globally using a set of convenient basis functions $B_j : j = 0, \ldots, p$. A suitable choice is usually given by piecewise polynomials, usually of low order. If we match values and derivatives at the joins between adjacent polynomials, globally smooth functions can be obtained, called splines. We can represent these as linear combinations of the form

$$u(t) = \sum_{j=0}^{p} \beta_j B_j(t). \tag{2.3}$$

A variety of different types of spline are available, but often the most convenient are B-splines (De Boor 1978; Gershenfeld 1999). Such splines have minimal support with respect to degree, smoothness and partition, and any other spline function of a given degree, smoothness and domain partition can be represented as a linear combination of B-splines of that same degree and smoothness over the same partition. B-splines have the great advantage that each $B_j$ has compact support so that evaluating $u(t)$ requires only summing over a small number of $B_j$s. More precisely, if we partition the domain using the knots $\{\sigma_j : j = 0, \ldots, p\}$, we can recursively define basis functions of increasing order by

$$\phi_j^{(0)}(t) = \begin{cases} 0, & t \le \sigma_j \\ 1, & \sigma_j < t \le \sigma_{j+1} \\ 0, & \sigma_{j+1} < t \end{cases}$$

and

$$\phi_j^{(i)}(t) = \frac{t - \sigma_j}{\sigma_{j+i} - \sigma_j}\, \phi_j^{(i-1)}(t) + \frac{\sigma_{j+i+1} - t}{\sigma_{j+i+1} - \sigma_{j+1}}\, \phi_{j+1}^{(i-1)}(t).$$

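Note that the basis function at a given degree is obtained by interpolating appropriate combinations of basis functions of one degree less. It is easy to see that $\phi_j^{(i)}(t)$ is 0 outside the interval $[\sigma_j, \sigma_{j+i+1}]$. In applications, typically cubic B-splines are used (with $i = 3$). We shall restrict ourselves to the case of uniform spacing of the knots and define $B_j = \phi_{j-2}^{(3)}$ in order to give a more symmetric form. An explicit formula is given by

$$B_j(t) = \begin{cases}
0, & t \le \sigma_{j-2} \\[4pt]
\dfrac{1}{6h^3}(t - \sigma_{j-2})^3, & \sigma_{j-2} \le t \le \sigma_{j-1} \\[8pt]
\dfrac{1}{6} + \dfrac{1}{2h}(t - \sigma_{j-1}) + \dfrac{1}{2h^2}(t - \sigma_{j-1})^2 - \dfrac{1}{2h^3}(t - \sigma_{j-1})^3, & \sigma_{j-1} \le t \le \sigma_j \\[8pt]
\dfrac{1}{6} + \dfrac{1}{2h}(\sigma_{j+1} - t) + \dfrac{1}{2h^2}(\sigma_{j+1} - t)^2 - \dfrac{1}{2h^3}(\sigma_{j+1} - t)^3, & \sigma_j \le t \le \sigma_{j+1} \\[8pt]
\dfrac{1}{6h^3}(\sigma_{j+2} - t)^3, & \sigma_{j+1} \le t \le \sigma_{j+2} \\[4pt]
0, & t \ge \sigma_{j+2}
\end{cases}$$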

where $h$ is the spacing between the knots $\sigma_j$. Observe that for $\sigma_j \le t \le \sigma_{j+1}$ the sum in equation (2.3) reduces to

$$u(t) = \beta_{j-1}B_{j-1}(t) + \beta_j B_j(t) + \beta_{j+1}B_{j+1}(t) + \beta_{j+2}B_{j+2}(t).$$

Furthermore, evaluating at the knot point $\sigma_j$, we have

$$u(\sigma_j) = \beta_{j-1}B_{j-1}(\sigma_j) + \beta_j B_j(\sigma_j) + \beta_{j+1}B_{j+1}(\sigma_j) = \frac{1}{6}\beta_{j-1} + \frac{2}{3}\beta_j + \frac{1}{6}\beta_{j+1}.$$
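The recursive definition and the knot values $1/6$, $2/3$, $1/6$ can be checked numerically. The following Python sketch implements the recursion directly for a uniform knot grid; the grid spacing, the grid size and the function names `phi` and `B` are assumptions made for illustration.

```python
def phi(i, j, t, knots):
    """Recursive degree-i B-spline basis function phi_j^(i) on the given knots."""
    if i == 0:
        return 1.0 if knots[j] < t <= knots[j + 1] else 0.0
    left = (t - knots[j]) / (knots[j + i] - knots[j]) * phi(i - 1, j, t, knots)
    right = ((knots[j + i + 1] - t) / (knots[j + i + 1] - knots[j + 1])
             * phi(i - 1, j + 1, t, knots))
    return left + right

h = 0.5                                 # assumed uniform knot spacing
knots = [k * h for k in range(9)]       # sigma_0 .. sigma_8

def B(j, t):
    """Cubic B-spline centred on knots[j], i.e. B_j = phi^(3)_{j-2}."""
    return phi(3, j - 2, t, knots)

j = 4
centre = B(j, knots[j])                 # value at sigma_j
left = B(j, knots[j - 1])               # value at sigma_{j-1}
right = B(j, knots[j + 1])              # value at sigma_{j+1}

# Only B_3 .. B_6 are non-zero on [sigma_4, sigma_5]; they sum to one there.
partition = sum(B(k, 2.2) for k in range(3, 7))
```

The plain recursion is adequate for a sketch like this; production spline libraries use more efficient evaluation schemes.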

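This can be written in matrix form as

$$\begin{pmatrix} u(\sigma_{-1}) \\ u(\sigma_0) \\ u(\sigma_1) \\ \vdots \\ u(\sigma_p) \\ u(\sigma_{p+1}) \end{pmatrix}
= \frac{1}{6}
\begin{pmatrix}
4 & 1 & 0 & \cdots & 0 \\
1 & 4 & 1 & \cdots & 0 \\
0 & 1 & 4 & \cdots & 0 \\
 & & \vdots & & \\
0 & \cdots & 1 & 4 & 1 \\
0 & \cdots & 0 & 1 & 4
\end{pmatrix}
\begin{pmatrix} \beta_{-1} \\ \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \\ \beta_{p+1} \end{pmatrix}, \tag{2.4}$$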

where $u(\sigma_{-1})$ and $u(\sigma_{p+1})$ are dummy values required to give a well-determined system. These determine the behaviour of $u$ at the boundaries and can, for instance, be chosen to set the second derivatives of $u$ at the boundaries to 0 (a so-called natural cubic B-spline). The matrix in (2.4) is explicitly invertible (though in practice one would solve equation (2.4) using a specialized LR (LU) decomposition). This shows that we can either parametrize the spline $u(t)$ using the coefficients $\beta_j$, or using its value at the knot points $u(\sigma_0), u(\sigma_1), \ldots, u(\sigma_p)$. This makes it straightforward to obtain a spline that interpolates any particular set of points. It is also possible to generalize this derivation to irregularly spaced knot points, which can have significant advantages in some applications.
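Because the matrix in equation (2.4) is tridiagonal, the conversion from knot values to coefficients can be done in linear time. The sketch below uses the Thomas algorithm (a tridiagonal form of LU elimination) as one possible choice; the function names and the sample knot values, which include the two dummy boundary entries, are hypothetical.

```python
def knot_values_to_coeffs(u):
    """Solve (1/6) * tridiag(1, 4, 1) * beta = u for beta (Thomas algorithm).

    u holds u(sigma_-1), u(sigma_0), ..., u(sigma_{p+1}), i.e. it includes
    the two dummy boundary values of equation (2.4)."""
    n = len(u)
    diag = [4.0] * n
    rhs = [6.0 * v for v in u]
    for k in range(1, n):                      # forward elimination
        m = 1.0 / diag[k - 1]                  # sub-diagonal entry is 1
        diag[k] -= m                           # super-diagonal entry is 1
        rhs[k] -= m * rhs[k - 1]
    beta = [0.0] * n
    beta[-1] = rhs[-1] / diag[-1]
    for k in range(n - 2, -1, -1):             # back substitution
        beta[k] = (rhs[k] - beta[k + 1]) / diag[k]
    return beta

def coeffs_to_knot_values(beta):
    """Apply the matrix of equation (2.4): u_k = (b_{k-1} + 4 b_k + b_{k+1}) / 6."""
    n = len(beta)
    values = []
    for k in range(n):
        below = beta[k - 1] if k > 0 else 0.0
        above = beta[k + 1] if k < n - 1 else 0.0
        values.append((below + 4.0 * beta[k] + above) / 6.0)
    return values

# Arbitrary illustrative knot values (first and last are the dummy entries).
knot_values = [0.0, 1.0, 0.5, 2.0, -1.0, 0.25]
beta = knot_values_to_coeffs(knot_values)
recovered = coeffs_to_knot_values(beta)
```

The round trip recovers the original knot values, which is the sense in which the two parametrizations of the spline are interchangeable.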

The $B_j$ are polynomials and hence using equation (2.3), we can easily differentiate $u$ and substitute the derivative into equation (1.1). In general, since we only take a finite number of basis functions, we cannot expect this to be satisfied exactly everywhere, i.e. for every $t$. Instead, we choose a finite number of so-called collocation points $\rho_k : k = 1, \ldots, q$, and require (1.1) to hold at these points, so that

$$\sum_{j=-1}^{p+1} \beta_j \frac{dB_j}{dt}(\rho_k) = f\!\left( \sum_{j=-1}^{p+1} \beta_j B_j(\rho_k),\; \rho_k,\; \alpha \right), \tag{2.5}$$

for all $k = 1, \ldots, q$. Note that the derivatives $dB_j/dt$ can be easily pre-computed for any given set of basis functions and collocation points, so that equation (2.5) represents a system of algebraic equations for $\beta = (\beta_{-1}, \beta_0, \ldots, \beta_{p+1})$, or for the values $u(\sigma_0), u(\sigma_1), \ldots, u(\sigma_p)$ via equation (2.4). Note that equation (2.5) is independent of the particular choice of spline, or indeed other basis functions.

Given an appropriate choice of splines and collocation points, the system (2.5) is well determined and can be solved to yield the coefficients of the approximate solution of the differential equation. It turns out that this is equivalent to an appropriate implicit Runge–Kutta integration scheme. This approach is particularly useful for boundary value problems (Villadsen & Stewart 1967) and today forms the basis of standard packages such as AUTO for finding periodic orbits and bifurcation points. The basic principle is to use the Newton method or a similar root finder to solve equation (2.5), together with appropriate side conditions. These ensure that $u$ is periodic (or homo-/heteroclinic for certain global bifurcations) and in the case of finding local bifurcations has particular eigenvalues. In the case of bifurcations, it is necessary for the root finder to simultaneously vary both the coefficients $\beta_j$ and the parameters $\alpha$. This bears close resemblance to parameter estimation from data and suggests that collocation-based methods may also be useful in the latter problem.

However, there is an important difference between the two problems. When finding bifurcations, we simply want to solve a set of (nonlinear) equations made up of equation (2.5) and whatever side conditions we need to specify the bifurcation. In the case of parameter fitting, the side conditions are replaced by the minimization of $E_{\mathcal{D}}(\alpha)$ which has to be carried out simultaneously with the solution of equation (2.5). There are a number of possible approaches to this. Observe that if $u$ is given by equation (2.3) then $E_{\mathcal{D}}$ has no direct dependence on $\alpha$ but rather is a function of $\beta$. From now on, we shall thus denote it by

$$E_{\mathcal{D}}(\beta) = \sum_{i=1}^{n} \left\| \sum_{j=-1}^{p+1} \beta_j B_j(t_i) - \hat{x}(t_i) \right\|^2.$$

(i) Nested minimization and collocation

This is closely related to shooting, except that instead of integrating the differential equation using a method such as fourth-order Runge–Kutta, we use the Newton method to solve equation (2.5) at each candidate value of the parameters $\alpha$. This Newton solver is then nested inside an optimization method that iteratively minimizes $E_{\mathcal{D}}(\beta, \alpha)$. This is potentially very slow, and we are not aware of this method appearing in the published literature.

(ii) Simultaneous minimization and collocation

It is possible to construct combinations of gradient-based minimization and Newton root solving which essentially simultaneously linearize both equation (2.5) and $E_{\mathcal{D}}(\beta)$. Given a reasonable initial guess, such an algorithm will rapidly converge to a minimum of $E_{\mathcal{D}}(\beta)$ that satisfies equation (2.5). The first such method appears to have been published by Baden & Villadsen (1982). Biegler (1984) independently initiated the application of collocation-based methods to dynamic optimization, which includes parameter estimation as a special case. His method uses sequential quadratic programming, a common algorithm for minimizing an objective function subject to equality constraints without requiring satisfaction of the constraints at each iteration. This scheme solves the exact constraint such as equation (2.5) once, and then at subsequent iterations, it uses only the linearization of the constraint. This is combined with a quadratic approximation to the objective function $E_{\mathcal{D}}(\beta)$ which is easily minimized subject to the linearized constraint. It can be shown that such an iteration converges quadratically to the desired minimum. Biegler’s algorithm has stimulated a range of variations and extensions (e.g. Tjoa & Biegler 1991; Esposito & Floudas 2000; Wang 2000; Li et al. 2005) and is widely used in the chemical engineering community.


(iii) Dual minimization

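Observe that instead of using a root finder to solve equation (2.5), we could minimize the difference between the r.h.s. and l.h.s.

$$E_M(\beta, \alpha) = \sum_{k=1}^{q} \left\| \sum_{j=-1}^{p+1} \beta_j \frac{dB_j}{dt}(\rho_k) - f\!\left( \sum_{j=-1}^{p+1} \beta_j B_j(\rho_k),\; \rho_k,\; \alpha \right) \right\|^2 \tag{2.6}$$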

with respect to $\beta = (\beta_{-1}, \beta_0, \ldots, \beta_{p+1})$. We could also give a different weight to each component of the ODE (e.g. Ramsay et al. 2007). We can think of $E_M(\beta, \alpha)$ as a measure of how well the spline $u(t)$, defined by $\beta$, satisfies the differential equation (1.1). This approach has the advantage that we can take a larger number $q$ of collocation points. In that case, equation (2.5) will be over-determined, and no longer have a solution, but it is still possible to find a minimum of $E_M(\beta, \alpha)$. Indeed, in the limit $q \to \infty$, we have

$$E_M(\beta, \alpha) = \int \left\| \sum_{j=-1}^{p+1} \beta_j \frac{dB_j}{dt}(\rho) - f\!\left( \sum_{j=-1}^{p+1} \beta_j B_j(\rho),\; \rho,\; \alpha \right) \right\|^2 d\rho. \tag{2.7}$$

The minimization of such a function is often used in the derivation of various collocation schemes (e.g. Golub & Ortega 1992). Given either version of $E_M(\beta, \alpha)$, now our problem is to simultaneously minimize both $E_{\mathcal{D}}(\beta)$ and $E_M(\beta, \alpha)$ with respect to both $\beta$ and $\alpha$. This is most easily achieved by minimizing

$$\tilde{E}(\beta, \alpha) = E_{\mathcal{D}}(\beta) + \lambda E_M(\beta, \alpha).$$

Here $\lambda$ is a weighting factor that determines the relative emphasis we place on the model and on the data. This is a particular attraction of this approach, since if we have high confidence in the model but the data are very noisy, we can take $\lambda$ large, and conversely if the measurement error for the data is low but the model is suspect, we can use a low $\lambda$.

Brewer (2006) compares the Nelder–Mead simplex algorithm and simulated annealing in the minimization of $\tilde{E}(\beta, \alpha)$, with $\lambda = 1$ and $E_M(\beta, \alpha)$ defined by equation (2.6). A more sophisticated optimization scheme has recently been presented by Ramsay et al. (2007) for the case of equation (2.7). Ramsay et al. (2007) also analyse the behaviour of their method in the limit $\lambda \to \infty$.

An alternative algorithm, applicable when $f$ is linear in parameters, was introduced in Brewer (2006) and is presented in §3. This is motivated by the observation that if the data are observed without error, so that $\hat{x}(t_i) = x(t_i)$, then we know that the rate of change of a solution at $t_i$ is precisely $f(\hat{x}(t_i), t_i, \alpha)$. In such a case, it would be reasonable to replace $E_M(\beta, \alpha)$ by

$$\hat{E}_M(\beta, \alpha) = \sum_{i=1}^{n} \left\| \sum_{j=-1}^{p+1} \beta_j \frac{dB_j}{dt}(t_i) - f(\hat{x}(t_i), t_i, \alpha) \right\|^2.$$

The advantage of this is that $\hat{E}_M(\beta, \alpha)$ is a quadratic form (i.e. a linear least squares problem) in $(\beta, \alpha)$. Since $E_{\mathcal{D}}(\beta)$ is also a quadratic form, the overall error $E_{\mathcal{D}}(\beta) + \lambda \hat{E}_M(\beta, \alpha)$ is a linear least squares function that can be minimized using a single QR decomposition.


Of course, in general, the data are subject to measurement error. We can, however, still use a modified error function of the above type, with our best estimate $u(t_i)$ of the noise-free value replacing $\hat{x}(t_i)$ in the above formula. Motivated by the expectation–maximization methods used in the case of nuisance parameters or missing data (Dempster et al. 1977; Moon 1996), we alternately minimize $E_{\mathcal{D}}(\beta) + \lambda \hat{E}_M(\beta, \alpha)$ and generate a new estimate of the data. Full details are given below.

(e) Derivative approximation methods

All of the methods presented so far use the least squares error function in equation (2.2) which determines the discrepancy between the observed data and a numerically computed solution of the differential equation. An alternative is to minimize the discrepancy between the r.h.s. and l.h.s. of the differential equation at a selected number of data points. This gives a derivative-based error function like

$$E_D(\alpha) = \sum_{k=1}^{q} \left\| \frac{du}{dt}(\rho_k) - f(u(\rho_k), \rho_k, \alpha) \right\|^2. \tag{2.8}$$

The relationship between minimizing the data error $E_{\mathcal{D}}(\alpha)$ and minimizing the derivative-based error $E_D(\alpha)$ under suitable conditions is discussed by Baden & Villadsen (1982). Observe the close similarity between $E_D(\alpha)$ and $E_M(\beta, \alpha)$ in equation (2.6). In particular, if we choose an approximate solution $u$ given by equation (2.3), then $E_D(\alpha) = E_M(\beta, \alpha)$. In such circumstances, minimizing $E_D(\alpha)$ will be equivalent to minimizing $\tilde{E}(\beta, \alpha)$ in the limit $\lambda \to \infty$.

Note, however, that the derivative-based error $E_D(\alpha)$ has no direct dependence on the observed data $\hat{x}(t_i)$. In the case of $\tilde{E}(\beta, \alpha)$, for a finite $\lambda$, such dependence is obtained through $E_{\mathcal{D}}(\beta)$. Alternatively, one can incorporate the data directly into $E_D(\alpha)$ by choosing the points $u(\rho_k)$ in equation (2.8) to be exactly or approximately the data points $\hat{x}(t_i)$. The advantage of this is that it can be done without the need to solve the differential equation (1.1). Instead, we simply need to estimate the derivative $du/dt$ at the points of interest. This can be done by smoothing the data $\hat{x}(t_i)$ using an appropriate local polynomial (or globally using a spline), and then differentiating the result.
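A minimal Python sketch of a derivative approximation fit follows. It assumes a hypothetical model $dx/dt = a - bx$ that is linear in its two parameters, uniformly spaced synthetic noise-free data, and central differences (the derivative of a local quadratic interpolant) as the derivative estimate; none of these specific choices is taken from the cited papers.

```python
import math

def fit_linear_rate(times, values):
    """Estimate (a, b) in dx/dt = a - b*x by linear least squares on
    central-difference derivative estimates (uniform spacing assumed)."""
    h = times[1] - times[0]
    xs = values[1:-1]                                    # interior points only
    ds = [(values[i + 1] - values[i - 1]) / (2.0 * h)    # derivative estimates
          for i in range(1, len(values) - 1)]
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    t0 = sum(ds)
    t1 = sum(x * d for x, d in zip(xs, ds))
    # Normal equations for the design matrix with columns [1, -x].
    det = n * s2 - s1 * s1
    a = (t0 * s2 - s1 * t1) / det
    b = (s1 * t0 - n * t1) / det
    return a, b

# Synthetic data from a = 2.0, b = 0.5, x(0) = 0 (assumed values):
# x(t) = (a / b) * (1 - exp(-b * t)).
times = [0.5 * k for k in range(11)]
values = [4.0 * (1.0 - math.exp(-0.5 * t)) for t in times]
a_hat, b_hat = fit_linear_rate(times, values)
```

No differential equation is solved and the fit is obtained in a single non-iterative step. The estimates are slightly biased by the finite-difference approximation, and with noisy data the derivative estimates, and hence the parameters, degrade quickly unless the data are smoothed first.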

This approach appears to have first been employed by van den Bosch & Hellinckx (1974), possibly inspired by collocation-based methods for parameter estimation in partial differential equations (Seinfeld 1969). Baden & Villadsen (1982) suggested an improvement and compared both schemes with the simultaneous minimization of $E_{\mathcal{D}}(\beta)$ and solution of equation (2.5), as described above. Swartz & Bremermann (1975) independently published an algorithm of this type and compared it with a shooting method. In contrast to van den Bosch & Hellinckx (1974) and Baden & Villadsen (1982) who used a global spline to smooth the data, Swartz & Bremermann (1975) employed local polynomials to either approximate or interpolate several successive data points, without any attempt to match such local fits at their intersections. Varah (1982) presented and evaluated an algorithm based on their ideas but again using a global spline smoothing method.

Finally, in a modern systems biology context, where observations are only available for a very restricted number of time points, Barenco et al. (2006) used Lagrangian interpolation between nearby data points to estimate the required derivative. This allowed them to estimate both parameters in a simple model of gene expression and to estimate the activity of an unobserved transcription factor (p53) driving the system. This, in turn, allowed the prediction of previously unknown targets of p53.

3. A new efficient method for ODEs linear in parameters

(a) Linearity in parameters

Our experience is that direct minimization of $\tilde{E}(\beta, \alpha)$ can be time-consuming (Brewer 2006). One would thus like to use specific features of a particular class of models to develop a more efficient algorithm. In particular, many models in physics, chemistry, engineering and biology are linear in their parameters. This is true for both models derived from underlying principles such as mass–action models in chemistry and cell biology, and for data-driven ‘black-box’ models built with a general set of basis functions (e.g. radial basis functions). Recall that if a model is linear in its parameters, then the objective function is a quadratic form and a least squares estimate can be obtained in one step (i.e. non-iteratively) using standard linear algebra techniques such as QR decomposition (e.g. Lawson & Hanson 1974). This is much faster than the iterative minimization routines required for models that are nonlinear in parameters.

It seems difficult, however, to make use of this for ODE models, since even if the vector field $f(x, t, a)$ depends linearly on the parameters $a$, the solution $u(t)$ will typically exhibit nonlinear dependence. This makes it difficult to benefit from the linearity of $f$ for solution-based approaches employing $E_D$ or $\tilde{E}$.

On the other hand, if $f(x, t, a)$ is linear in $a$ then for fixed $u(r_k)$ in equation (2.8) the error function $E_D(a)$ is a quadratic form that can be minimized in the usual way using QR decomposition. In other words, equation (2.8) is a linear least squares problem in $a$. This makes methods based on $E_D(a)$ particularly attractive for ODEs that are linear in parameters. However, as Baden & Villadsen (1982) point out, $E_D(a)$ is the wrong objective function from a likelihood point of view. Our aim here, therefore, is to use the similarity between $E_D(a)$ and $E_M(b, a)$ highlighted above to derive an algorithm that approximately minimizes $\tilde{E}(b, a)$, making full use of the linearity of $f(x, t, a)$ with respect to $a$.
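The one-step nature of such a fit can be illustrated with a minimal sketch in Python (not the C++ implementation used in this paper). It assumes a hypothetical scalar model $dx/dt = p - Dx$, noise-free values of $u(r_k)$ and $du/dt(r_k)$, and, for brevity, normal equations in place of QR decomposition; the function name is illustrative only.

```python
import math

def fit_production_degradation(u_vals, du_vals):
    """Least squares estimate of (p, D) in dx/dt = p - D*x.

    For fixed u(r_k) the residual du/dt(r_k) - (p - D*u(r_k)) is linear in
    (p, D), so the minimiser solves a 2x2 system of normal equations.
    (Normal equations are an assumption made for brevity; QR is more stable.)
    """
    q = len(u_vals)
    # Regressor columns are [1, -u_k]; accumulate the normal equations
    #   [  q    -s_u ] [p]   [  s_d ]
    #   [ -s_u  s_uu ] [D] = [ -s_ud ]
    s_u = sum(u_vals)
    s_uu = sum(u * u for u in u_vals)
    s_d = sum(du_vals)
    s_ud = sum(u * d for u, d in zip(u_vals, du_vals))
    det = q * s_uu - s_u * s_u
    p = (s_uu * s_d - s_u * s_ud) / det
    D = (s_u * s_d - q * s_ud) / det
    return p, D

# Hypothetical test case: exact solution with x(0) = 0.
p_true, D_true = 0.52, 0.041
r = [0.5 * k for k in range(41)]                       # collocation points
u = [(p_true / D_true) * (1.0 - math.exp(-D_true * t)) for t in r]
du = [p_true * math.exp(-D_true * t) for t in r]
p_hat, D_hat = fit_production_degradation(u, du)
```

No starting guess and no iteration are needed: the estimate follows from a single linear solve, which is the property exploited in the remainder of this section.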

(b) Overview of the new algorithm

As already indicated above, if the data are known precisely, we can replace $E_M(b, a)$ by its data-based counterpart, in which $f$ is evaluated at the observed values $x(t_i)$, and which is a linear least squares objective function. We can employ the same principle even with noisy data if we replace $x(t_i)$ in this counterpart by our best available estimate of the noise-free data values. More generally, we do not need to restrict ourselves to just the observed time points $t_i$. In particular, if we have estimates $u(r_k)$ of the variables at the collocation points $r_k$, we obtain the modified model error function

$$E_M(b, a; u) = \sum_{k=1}^{q} \left\| \sum_{j=-1}^{p+1} b_j \frac{dB_j}{dt}(r_k) - f(u(r_k), r_k, a) \right\|^2, \qquad (3.1)$$

which yields the overall objective

$$E(b, a; u) = E_D(b) + \lambda E_M(b, a; u). \qquad (3.2)$$

Motivated by expectation–maximization methods used in the case of nuisance parameters or missing data (Dempster et al. 1977; Moon 1996), we can attempt to fit the model by generating a sequence of better and better estimates $u^{(m)}$. At each stage, this estimate is substituted into $E$, which is minimized with respect to $(b, a)$. The resulting $b$ is used to generate a new estimate $u^{(m+1)}$ using equation (2.3). We thus alternate estimating the expected value of the nuisance parameter with a minimization over the parameter of interest. Note, however, that our algorithm differs from a conventional expectation–maximization method, where we would minimize only over $a$, whereas here we minimize over both $a$ and $b$ simultaneously.

(c) Description of the algorithm

We iteratively generate a sequence of estimates $(b^{(m)}, a^{(m)})$ with $u^{(m)}$ given by

$$u^{(m)}(t) = \sum_{j=-1}^{p+1} b_j^{(m)} B_j(t). \qquad (3.3)$$

To do this, we proceed as follows.

(i) The iteration is initialized with a spline $u^{(0)}$ obtained by smoothing the data. More precisely, $u^{(0)}$ is given by equation (3.3) with $b^{(0)}$ the minimum of $E_D(b)$.

(ii) For each $m = 1, 2, \ldots$, we obtain $(b^{(m)}, a^{(m)})$ from $u^{(m-1)}$ by minimizing $E(b, a; u^{(m-1)})$ with respect to $(b, a)$. This is a linear least squares problem which is solved in one step using QR decomposition.

(iii) We define $u^{(m)}$ from $b^{(m)}$ using equation (3.3).

(iv) If for some preset tolerance $\delta$ we have $\|b^{(m)} - b^{(m-1)}\| < \delta$ then terminate, otherwise return to step (ii).
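Steps (i) to (iv) can be sketched in a few lines of Python. This is an illustration under several simplifying assumptions rather than the implementation used here: the model is a hypothetical scalar decay $dx/dt = -ax$, a low-degree polynomial basis stands in for the B-splines, the collocation points coincide with the data times, and the linear least squares problems are solved through the normal equations instead of QR decomposition. All function names are illustrative.

```python
import math

def solve_linear(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

def lstsq(rows, rhs):
    """Linear least squares via normal equations (an assumption; not QR)."""
    m = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    Aty = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(m)]
    return solve_linear(AtA, Aty)

def fit_decay(times, data, deg=4, lam=1.0, tol=1e-8, max_iter=200):
    """Alternating scheme for dx/dt = -a*x with a polynomial stand-in basis."""
    T = times[-1]
    B = [[(t / T) ** j for j in range(deg + 1)] for t in times]
    dB = [[0.0] + [j * (t / T) ** (j - 1) / T for j in range(1, deg + 1)]
          for t in times]
    w = math.sqrt(lam)
    b = lstsq(B, data)                       # step (i): smooth the data
    a = 0.0
    for m in range(1, max_iter + 1):
        u = [sum(bj * Bj for bj, Bj in zip(b, row)) for row in B]
        rows = [row + [0.0] for row in B]    # data rows: u(t_i) ~ x_i
        rhs = list(data)
        for drow, uk in zip(dB, u):          # model rows: du/dt + a*u_prev ~ 0
            rows.append([w * d for d in drow] + [w * uk])
            rhs.append(0.0)
        sol = lstsq(rows, rhs)               # step (ii): one solve for (b, a)
        b_new, a = sol[:-1], sol[-1]
        change = math.sqrt(sum((x - y) ** 2 for x, y in zip(b_new, b)))
        b = b_new                            # step (iii): updated coefficients
        if change < tol:                     # step (iv): stopping test
            break
    return a, b, m

times = [0.1 * i for i in range(21)]
data = [math.exp(-0.5 * t) for t in times]   # noise-free data, true a = 0.5
a_hat, b_hat, n_iter = fit_decay(times, data)
```

Each pass through the loop solves a single linear problem in the stacked unknowns $(b, a)$; only the previous estimate of $u$ enters nonlinearly, and it is held fixed during the solve.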

(d) Properties of the solution

When the iteration terminates, we have $b^{(m)} = b^{(m-1)}$ to some pre-specified numerical tolerance. Thus

$$E_M(b^{(m)}, a; u^{(m-1)}) = E_M(b^{(m)}, a; u^{(m)}) = E_M(b^{(m)}, a)$$

and hence

$$E(b^{(m)}, a; u^{(m-1)}) = \tilde{E}(b^{(m)}, a).$$

Since $(b^{(m)}, a^{(m)})$ minimizes $E(b, a; u^{(m-1)})$ with respect to $(b, a)$, we see that $a^{(m)}$ is a minimum of $\tilde{E}(b^{(m)}, a)$ with respect to $a$. In general, it does not appear to be possible to ensure that $\tilde{E}(b, a)$ is also minimized with respect to $b$, but in practice the distinction between $E_M(b, a)$ and $E_M(b, a; u)$ appears to have negligible effect.

(e) Implementation

The algorithm was implemented in C++, using the TNT library (Pozo 2004). A weight of $\lambda = 1$ and a stopping tolerance of $\delta = 10^{-8}$ were used, unless otherwise stated. In evaluating how accurately the final spline satisfies the model, we used the integral form of the model error, $E_{\int} = E_M(b, a)$, as defined in equation (2.7). This was evaluated using Simpson’s rule with 10 000 steps (Press et al. 2002).
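The quadrature step can be sketched as follows in Python. The integrand is assumed here to be the squared model residual $|du/dt - f(u, t)|^2$, which is one reading of the integral form of the model error; the function names and the decay example are hypothetical.

```python
import math

def simpson(g, t0, t1, steps):
    """Composite Simpson's rule on [t0, t1] with an even number of steps."""
    if steps % 2:
        raise ValueError("steps must be even")
    h = (t1 - t0) / steps
    total = g(t0) + g(t1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(t0 + i * h)
    return total * h / 3.0

def integral_model_error(u, du, f, t0, t1, steps=10000):
    """Integral over [t0, t1] of (du/dt - f(u(t), t))**2 (assumed integrand)."""
    return simpson(lambda t: (du(t) - f(u(t), t)) ** 2, t0, t1, steps)

# Hypothetical check: u(t) = exp(-0.5 t) against the model dx/dt = -0.6 x.
u = lambda t: math.exp(-0.5 * t)
du = lambda t: -0.5 * math.exp(-0.5 * t)
err = integral_model_error(u, du, lambda x, t: -0.6 * x, 0.0, 2.0)
exact = 0.01 * (1.0 - math.exp(-2.0))   # closed form of the same integral
```

In this example the residual is $0.1\,e^{-0.5t}$, so the integral has a closed form against which the quadrature can be compared.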

4. Numerical results

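(a) A test model

As a specific example, we consider a simple four-component model of the core of the p53 gene regulatory network; see appendix A and Brewer (2006). This ODE describes the behaviour of active ATM ($a$), p53 ($z$), active p53 ($x$) and MDM2 ($y$) after a cell experiences DNA damage:

$$\begin{aligned}
\frac{da}{dt} &= -D_{\mathrm{ATM}}\, a,\\
\frac{dz}{dt} &= p_{\mathrm{p53}} - D_{\mathrm{p53}}\, z - k_1 y z - k_2 a z,\\
\frac{dx}{dt} &= -D_{\mathrm{p53}}\, x - k_1 y x + k_2 a z \quad\text{and}\\
\frac{dy}{dt} &= p_{\mathrm{MDM2}} + k_3 x - D_{\mathrm{MDM2}}\, y - k_4 a y.
\end{aligned}$$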

In total, the model depends linearly on its nine parameters $D_{\mathrm{ATM}}$, $D_{\mathrm{MDM2}}$, $D_{\mathrm{p53}}$, $p_{\mathrm{MDM2}}$, $p_{\mathrm{p53}}$, $k_1$, $k_2$, $k_3$ and $k_4$. We generated a simulated dataset for the parameter values given in table 4a using a fourth-order Runge–Kutta scheme with step size of $10^{-8}$ implemented in C++ (Press et al. 2002). Datasets of different sizes were created by sampling from this set at fixed intervals.

To simulate measurement noise, we added an independent Gaussian error to each data point. This had mean 0 and variance $\sigma^2$, with $\sigma$ a constant value for all data points. A number of different noise levels were used, ranging from $\sigma = 0$ to $\sigma = 0.06$. For each noise level, we report results averaged over 1000 independent realizations, with the model fitted to each realization separately.
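The data-generation procedure can be sketched in Python as below. The parameter values are those of table 4a, but the initial state, the step size, the sampling interval and the random seed are all assumptions chosen for illustration, and the code is not the C++ program used to produce the results.

```python
import math
import random

# Parameter values from table 4(a); everything else below is hypothetical.
PARAMS = dict(D_ATM=0.05, D_MDM2=0.18, D_p53=0.041, p_MDM2=0.01, p_p53=0.52,
              k1=1.42, k2=0.39, k3=2.50, k4=0.75)

def p53_rhs(state, th):
    """Right-hand side of the four-component model; state = (a, z, x, y)."""
    a, z, x, y = state
    return (
        -th["D_ATM"] * a,
        th["p_p53"] - th["D_p53"] * z - th["k1"] * y * z - th["k2"] * a * z,
        -th["D_p53"] * x - th["k1"] * y * x + th["k2"] * a * z,
        th["p_MDM2"] + th["k3"] * x - th["D_MDM2"] * y - th["k4"] * a * y,
    )

def rk4_step(state, h, th):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shift(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    s1 = p53_rhs(state, th)
    s2 = p53_rhs(shift(state, s1, h / 2), th)
    s3 = p53_rhs(shift(state, s2, h / 2), th)
    s4 = p53_rhs(shift(state, s3, h), th)
    return tuple(v + h / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
                 for v, c1, c2, c3, c4 in zip(state, s1, s2, s3, s4))

def simulate(state0, t_end, h, th, sample_every):
    """Integrate from t = 0 and keep every sample_every-th state."""
    n_steps = round(t_end / h)
    state, out = tuple(state0), [(0.0, tuple(state0))]
    for i in range(1, n_steps + 1):
        state = rk4_step(state, h, th)
        if i % sample_every == 0:
            out.append((i * h, state))
    return out

def add_noise(samples, sigma, rng):
    """Add independent N(0, sigma^2) errors to every data point."""
    return [(t, tuple(v + rng.gauss(0.0, sigma) for v in s)) for t, s in samples]

clean = simulate((1.0, 0.5, 0.05, 0.1), t_end=20.0, h=0.01,
                 th=PARAMS, sample_every=100)
noisy = add_noise(clean, sigma=0.06, rng=random.Random(0))
```

Because the ATM equation decouples from the rest, its numerical solution can be checked against $a(t) = a(0)\,e^{-D_{\mathrm{ATM}} t}$, which gives a quick sanity test of the integrator.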

(b) Results: shooting method using Nelder–Mead optimization

We implemented a shooting algorithm in C++ using a Runge–Kutta algorithm with adaptive step size control as the integrator (Fehlberg 1968; Cash & Karp 1990) and a Nelder–Mead simplex method as the optimization routine (Nelder & Mead 1965; Press et al. 2002). A starting simplex was constructed around an initial parameter estimate $P_0$ by

$$P_i = P_0 + \zeta e_i,$$

where $e_i$ is the unit vector in the $i$th coordinate direction. Four different choices of initial guess were tried, as shown in table 1.

We used a stopping criterion based on the fractional range of the simplex

$$\frac{|l_h - l_l|}{|l_h| + |l_l|} < \tfrac{1}{2}\eta,$$

where $\eta$ is the required accuracy; $l_h$ is the highest value of the objective function among the vertices of the simplex; and $l_l$ is the lowest value.

Table 1. The points in parameter space used as the initial starting point $P_0$ for the Nelder–Mead optimization. (The first of these (A) consists of the true parameter values used to generate the data and was used as a stability check for the algorithm.)

| label | D_ATM | D_MDM2 | D_p53 | p_MDM2 | p_p53 | k1 | k2 | k3 | k4 |
|---|---|---|---|---|---|---|---|---|---|
| A | 0.05 | 0.18 | 0.041 | 0.01 | 0.52 | 1.42 | 0.39 | 2.50 | 0.75 |
| B | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| C | 9 | 13 | 9 | 23 | 60 | 33 | 2 | 53 | 10 |
| D | 5 | 10 | 1 | 8.1 | 0.02 | 6.7 | 2.1 | 0.3 | 7.7 |

Table 2. Parameter estimates obtained using a Nelder–Mead-based shooting method. The first column indicates the starting point $P_0$ for the simplex as in table 1. (The dataset consisted of all four variables sampled at 1052 time points with no noise added (so that $\sigma = 0$). The notation ‘it’ indicates the number of iterations before convergence and ‘LSQ’ the final least squares error $E_D$. Implementation constants were $\zeta = 10$ and $\eta = 10^{-10}$.)

| point | D_ATM | D_MDM2 | D_p53 | p_MDM2 | p_p53 | k1 | k2 | k3 | k4 | it | LSQ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 0.05 | 0.18 | 0.041 | 0.01 | 0.52 | 1.42 | 0.39 | 2.50 | 0.75 | 1379 | 1.16×10^−8 |
| B | 0.0502 | 0.844 | −0.637 | 0.353 | 0.374 | 2.27 | 0.282 | 3.02 | 0.995 | 3697 | 0.363 |
| C | 0.0509 | 26.5 | 21.0 | 13.3 | 27.6 | 43.5 | 21.9 | 48.2 | 21.6 | 604 | 7.16 |
| D | 0.0498 | −14.8 | 42.1 | 4.26 | 42.6 | 2.67 | −61.1 | 0.3 | −21.3 | 1690 | 5.27 |
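The construction of the starting simplex and the fractional-range stopping test can be written compactly. The following Python sketch uses the illustrative names `zeta` for the simplex offset and `eta` for the required accuracy; it is not taken from the implementation used here, and the guard for a zero denominator is an added assumption.

```python
def initial_simplex(p0, zeta):
    """Vertices P_0 and P_i = P_0 + zeta * e_i for each coordinate direction."""
    simplex = [list(p0)]
    for i in range(len(p0)):
        vertex = list(p0)
        vertex[i] += zeta
        simplex.append(vertex)
    return simplex

def has_converged(values, eta):
    """Fractional-range test on the objective values at the simplex vertices."""
    l_h, l_l = max(values), min(values)
    denom = abs(l_h) + abs(l_l)
    if denom == 0.0:
        return True  # assumed convention: every vertex has objective zero
    return abs(l_h - l_l) / denom < 0.5 * eta

# Starting point B of table 1 (all nine parameters equal to 1), offset 10.
simplex_b = initial_simplex([1.0] * 9, 10.0)
```

For nine parameters this gives ten vertices, each differing from the starting point in exactly one coordinate.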

Table 2 shows the results of this method for the four different initial starting points from table 1. This demonstrates that the parameter space contains many local minima, and unless the algorithm is started close to the true value, it is difficult to recover the correct parameter estimates. Using the Powell minimization method (Acton 1990), instead of Nelder–Mead, produced similar results (data not shown). Multiple local minima are common when applying parameter estimation to ODE models (Esposito & Floudas 2000), especially when the model is nonlinear and there are a large number of parameters. In this case, using any kind of local method, including other approaches such as gradient- or Hessian-based methods, will not be sufficient unless the starting parameter estimates are close to their true values.

(c) Results: shooting method using simulated annealing

An alternative is to use a global minimization method such as simulated annealing (Metropolis et al. 1953; Kirkpatrick et al. 1983; Kirkpatrick 1984; Gershenfeld 1999; Press et al. 2002), Markov Chain Monte Carlo (e.g. Gilks et al. 1996) or genetic algorithms (e.g. Gershenfeld 1999). Such methods all have some possibility of moving to a worse solution during a systematic search of the parameter space and hence are able to escape out of local minima. Simulated annealing is probably the oldest and best established of these methods.

Table 3. Parameter estimation using shooting with simplex simulated annealing. (Initial temperatures of 10, 100 and 1000 correspond to initial acceptance percentages of 52, 68 and 71% (based on 1000 proposed steps after an initial transient of 100 steps). The total number of steps was chosen for its computational feasibility: $10^6$ steps can require several days of computer time.)

| initial point | starting temperature (T0) | estimated count (K) | no. of iterations | least squares score | approx. duration (h) |
|---|---|---|---|---|---|
| B | 10 | 10^6 | 1 000 000 | 6.36×10^−4 | 22 |
| B | 10 | 10^6 | 1 000 000 | 6.36×10^−4 | 49 |
| B | 10 | 10^6 | 1 000 000 | 6.36×10^−4 | 28 |
| B | 10 | 10^6 | 1 000 000 | 6.36×10^−4 | 31 |
| B | 10 | 10^6 | 1 000 000 | 0.288 | 25 |
| B | 10 | 10^7 | 10 000 000 | 6.36×10^−4 | 259 |
| B | 10 | 10^7 | 10 000 000 | 0.237 | 276 |
| B | 100 | 10^6 | 1 000 000 | 6.36×10^−4 | 52 |
| B | 100 | 10^6 | 1 000 000 | 6.11 | 276 |
| B | 100 | 10^6 | 1 000 002 | 16.4 | 369 |
| B | 100 | 10^7 | 10 000 000 | 11.4 | 573 |
| B | 1000 | 10^6 | 1 000 000 | 110 | 315 |
| C | 10 | 10^6 | 1 000 001 | 5.22 | — |
| C | 10 | 10^6 | 1 000 001 | 5.39 | 104 |
| C | 10 | 10^6 | 1 000 000 | 5.48 | 60 |
| C | 10 | 10^6 | 1 000 000 | 5.77 | 124 |
| C | 10 | 10^6 | 1 000 000 | 5.98 | 62 |
| C | 10 | 10^7 | 10 000 000 | 5.37 | 444 |
| D | 10 | 10^6 | 1 000 000 | 4.28 | 56 |
| D | 10 | 10^6 | 1 000 000 | 5.50 | 253 |
| D | 10 | 10^6 | 1 000 001 | 5.67 | 218 |
| D | 10 | 10^6 | 1 000 002 | 5.71 | 217 |
| D | 10 | 10^6 | 1 000 000 | 12.4 | 428 |

This global minimization method slowly ‘cools’ the system, where the ‘temperature’ determines the probability that a step in parameter space that worsens the error is accepted. Generally, the step is determined by taking a random point from a Gaussian distribution with the mean set at the current point, but here a refinement is implemented that uses the Nelder–Mead algorithm to propose the step (Cardoso et al. 1996; Kvasnicka & Pospichal 1997; Torres et al. 1997; Press et al. 2002). The cooling scheme is an important factor in the effectiveness of the algorithm. Here we employ the one recommended by Press et al. (2002), taking $T = T_0(1 - k/K)^4$, where $T_0$ is the initial temperature, $k$ is the total number of moves so far and $K$ is the estimated number of moves required.
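The cooling schedule is simple to state in code. The Python sketch below implements the quartic schedule with the temperature clipped at zero once the move count exceeds the estimated count (an added assumption), together with a plain Metropolis acceptance rule shown only for illustration; the simplex-based variant used here proposes and accepts moves differently.

```python
import math
import random

def temperature(k, K, T0):
    """Cooling schedule T = T0 * (1 - k/K)**4, clipped at zero for k > K."""
    return T0 * max(0.0, 1.0 - k / K) ** 4

def accept(delta_e, T, rng):
    """Plain Metropolis rule: accept improvements, else with prob. exp(-dE/T)."""
    if delta_e <= 0.0:
        return True
    if T <= 0.0:
        return False
    return rng.random() < math.exp(-delta_e / T)

rng = random.Random(0)
# Temperature halfway through a run with T0 = 10 and K = 10**6.
T_half = temperature(500000, 10**6, 10.0)
```

Halfway through a run the temperature has already fallen to one-sixteenth of its starting value, so most of the exploration happens early in the schedule.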

The above scheme was implemented in C++. A number of different choices of cooling parameters (starting temperature and rate of cooling) and initial conditions were applied to the 1052 time point dataset with no measurement error ($\sigma = 0$) using a number of different G5-based computers with processor speeds between 1.6 and 2.0 GHz (tables 3 and 5). Convergence to the global minimum only occurred in approximately one-quarter of the runs and these successful runs all started with the same initial parameters. The duration of the runs was consistently high, almost always taking more than a day to converge. In order to provide the best chance of success, the test was performed with a large dataset; a reduction in the dataset would render the parameter space more complex and make convergence to the optimal solution harder.

Table 4. Evaluation of the algorithm presented in §3. (a) True parameter values. (b) Percentage error for parameter estimates from a 1052 time point dataset with $p = 19$, $q = 1052$, $\lambda = 1$ and $\omega = 0.464$. (c) Parameter estimates from a six time point dataset when $p = 19$, $q = 29$ and $\omega = 0.464$, where $\omega = \lambda q/n$.

| | D_ATM | D_MDM2 | D_p53 | p_MDM2 | p_p53 | k1 | k2 | k3 | k4 |
|---|---|---|---|---|---|---|---|---|---|
| (a) | 0.05 | 0.18 | 0.041 | 0.01 | 0.52 | 1.42 | 0.39 | 2.50 | 0.75 |
| (b) | 0% | 0.0335% | 0.4221% | 0.1684% | 0.0562% | 0.0835% | 0.0604% | 0.0172% | 0.00160% |
| (c) | 0.0500 | 0.176 | 0.0391 | 0.0954 | 0.525 | 1.44 | 0.394 | 2.47 | 0.741 |

We conclude that simulated annealing can give good parameter estimates, but requires a lot of time tuning the minimization algorithm for success. Even when it does work, it is slow and is at the limit of practicality on current popular hardware. These conclusions will apply to any global minimization method that relies on single shooting to determine the error function. Multiple shooting (e.g. Kuznetsov 1995; Timmer et al. 2000) can help to regularize the parameter space and generally provides a more robust algorithm, but in our experience still retains many of the same problems as ordinary shooting.

(d) Results: novel collocation scheme for ODEs linear in parameters

Finally, we applied the algorithm presented in §3 for a variety of combinations of noise levels and sizes of dataset. When the dataset is relatively large (1052 time points), we obtain accurate parameter estimates using 22 splines ($p = 19$). In this case, except for $D_{\mathrm{p53}}$, all estimated parameters are within 0.1% of their true values (table 4b). The solution splines are virtually indistinguishable from the true time course (data not shown) and satisfy the model to a high degree of accuracy ($E_{\int} = 2.38 \times 10^{-5}$). The algorithm took approximately 2 min on a 2.0 GHz G5 Power Macintosh when $p = 19$, $n = 1052$, $q = n$, $\lambda = 1$ and $\sigma = 0.06$. This is significantly faster than the shooting methods above. Furthermore, the speed improves considerably as the amount of data is reduced (12 s required when $n = 106$).

As the size of the dataset is decreased, the accuracy of the estimates also declines (figure 1). This occurs because the larger the dataset, the more constrained the spline is and hence the closer it will be to the ‘true’ solution and the more accurate the estimates will be. However, the loss of accuracy is minimal down to approximately $n = 150$. Even with very small amounts of data, the estimates are still reasonably accurate; when $n = 14$ the error is less than 7%, which is perfectly usable in many systems biology contexts. This behaviour was consistent for each of the parameter estimates, but there were orders of magnitude differences in the error, ranging from less than $10^{-5}$ to 15% when $n = 14$ (see figure 7 in appendix B). This may reflect the relative contribution each parameter makes to the model solution. We used $q = n$ throughout as preliminary experiments showed that when there was no error in the data, this consistently produced good results in the minimum amount of processing time.

Figure 1. The relationship between the amount of data and the accuracy of the parameter estimates. This plot shows the accuracy when applied to datasets with between 14 (the minimum possible in this situation) and 1052 time points, with $p = 19$, $q = n$, $\lambda = 1$. Plots for individual parameters are given in figure 7 in appendix B. (Horizontal axis: no. of time points in dataset; vertical axis, logarithmic: average relative difference between estimated and actual parameter values.)

We next added varying levels of noise to the data, and found that the algorithm continues to perform well (figure 2). The mean estimate generally occurs close to the true value and is always within one standard error. As $\sigma$ increases, the estimates shift away from the true parameter value and this shift is significantly greater in some parameters than others. As the error in the data increases, the positional constraints of the spline (i.e. the data points) less resemble the true solution and so the optimal spline is likely to deviate, which in turn produces less accurate estimates.
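The summary statistics reported over noise realizations can be computed as in the following Python sketch. To keep the example short, a least squares slope through the origin stands in for the full model fit; the slope value, sample points and random seed are hypothetical.

```python
import math
import random

def summarize(estimates):
    """Mean, sample standard deviation and standard error of the mean."""
    n = len(estimates)
    mean = sum(estimates) / n
    var = sum((e - mean) ** 2 for e in estimates) / (n - 1)
    sd = math.sqrt(var)
    return mean, sd, sd / math.sqrt(n)

def slope_estimate(xs, ys):
    """Least squares slope through the origin (stand-in for a model fit)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

rng = random.Random(1)
xs = [0.1 * i for i in range(1, 21)]
true_slope, sigma = 0.39, 0.06
estimates = []
for _ in range(1000):                        # 1000 independent realizations
    ys = [true_slope * x + rng.gauss(0.0, sigma) for x in xs]
    estimates.append(slope_estimate(xs, ys))
mean, sd, se = summarize(estimates)
```

The standard deviation describes the spread of individual estimates, whereas the standard error describes the uncertainty of their mean and is smaller by a factor of $\sqrt{1000}$ here.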

Figure 2. (a,b) The effect of increasing measurement error on the accuracy of parameter estimates, shown for (a) $D_{\mathrm{ATM}}$ and (b) $p_{\mathrm{p53}}$. Results are based on 1000 independent noise realizations with $p = 19$, $q = n = 106$, $\lambda = 1$. The error bars indicate the standard error of the parameter estimates. A $\sigma$ of 0.06 corresponds to approximately 75% relative error for active p53 and 12% for the other components. For results of the remaining parameters see figure 6 in appendix B. (Horizontal axis: $\sigma$; vertical axis: parameter value; legend: true value, average estimate, s.d. of estimates.)

The performance of the algorithm depends on a number of adjustable factors, including $p$, $q$ and $\lambda$. It is beyond the scope of this paper to look at these factors in detail (see Brewer 2006), but here we will briefly examine two situations where their optimization is beneficial. When the amount of error in the data is large, it is no longer appropriate to give equal weight to the spline being close to the data and the spline satisfying the model. This is because the solution spline is unable to represent the model solution accurately and so poor parameter estimates will result. More weight can be placed on the spline satisfying the model by increasing the number of collocation points and/or increasing $\lambda$. This gives an improvement in the spline quality (figure 3) and hence the parameter estimates. Increasing the number of collocation points has the additional benefit of spreading the positions where the model needs to be satisfied, which is important where the amount of data is small. There is a limit to how much additional weight can be used: if it is too large, the procedure fails to converge. In this case, the solution spline moves away from the data points, becoming a worse estimate of the model solution at each iteration. This occurs because the data no longer have a strong enough constraining influence on the spline. When there is a small amount of data, it is difficult to get reasonable parameter estimates, but with the appropriate optimization of the factors, it is possible to get good results (table 4c and figure 4). With so little data, convergence is extremely sensitive to the factor values.
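The balance between the two terms of the objective can be summarized by the ratio of the weight to the number of collocation and data points, as used in table 4 and figures 3 and 4. The short Python sketch below computes this ratio and inverts it; the function names are illustrative, and the interpretation of the ratio as total model weight per data point is one reading rather than a definition taken from the text.

```python
def weight_ratio(lam, q, n):
    """Ratio lam * q / n: weight lam on each of q collocation terms,
    compared with n data terms of unit weight."""
    return lam * q / n

def lam_for_ratio(omega, q, n):
    """Weight lam that gives a requested ratio omega for given q and n."""
    return omega * n / q

# Settings quoted for the six time point dataset (table 4c and figure 4).
lam_22_splines = lam_for_ratio(0.464, q=29, n=6)   # p = 19
lam_14_splines = lam_for_ratio(1.09, q=14, n=6)    # p = 11
```

Under this reading, the six time point fits use a per-point weight well below 1, which compensates for having more collocation points than data points.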

5. Discussion

Estimation of parameters in ODE models is of considerable importance in many modelling fields. Increasingly in systems biology, this needs to be done for very short time courses with high levels of noise on the data. Traditional algorithms are poorly suited to such problems.

Figure 3. The relationship between model error $E_{\int}$ and (a) the number of collocation points $q$; and (b) the ratio $\omega = \lambda q/n$. Results are based on 1000 independent noise realizations with $p = 19$, $\sigma = 0.06$ and $n = 106$. In (a) we have $\lambda = 1$ and in (b) $q = 500$. Error bars indicate standard deviations. (Vertical axis in both panels: average $E_{\int}$ of convergent runs.)

We have presented a summary of the various approaches to parameter fitting in ODEs. We have then introduced a new algorithm for the case of models linear in parameters. It employs a spline-based collocation scheme and alternates linear least squares minimization steps with repeated estimates of the noise-free values of the variables. This has the advantage that fast, established linear algebra solvers can be used to provide optimal parameter values. The proposed procedure also avoids the problems of the model becoming stiff, which can hamper shooting-based methods, in particular, when applied to models that are linear in their parameters. Additionally, the proposed procedure is effective at dealing with large-scale models and does not require the estimation of initial conditions.

Figure 4. The solution spline for active p53 produced by the algorithm on a small amount of data with $n = 6$ and either $p = 19$, $q = 29$ and $\omega = 0.464$ or $p = 11$, $q = 14$ and $\omega = 1.09$. (Horizontal axis: time (h); vertical axis: amount of active p53; legend: 22 B-spline solution, 14 B-spline solution, true solution, data.)

We have shown that the proposed procedure produces reasonable estimates, even at low amounts of data and quite high amounts of error. The accuracy of the proposed procedure depends on how close the intermediary spline is to the true model solution. At low amounts of data and high error, there is less valid information to constrain the spline and so the spline accuracy and hence the estimates suffer. By optimizing the algorithm factors, in particular the weight placed on the spline satisfying the model, it is possible to significantly improve the accuracy of the spline and hence the estimates.

In comparison with single shooting methods, there are many advantages to our new algorithm. Firstly, it converges more reliably to reasonable parameter estimates across a wide range of dataset sizes and amounts of error. Many single shooting global optimization methods rely on probabilistic steps, and this can often lead to inconsistent results. Secondly, our procedure is considerably faster. This arises because the problem is linearized so that efficient linear algebra solvers can be used, and because by using splines the model has been effectively discretized, removing the need for costly integration. Also, our procedure is simple, with only three key factors that can be varied. In comparison, simulated annealing requires numerous algorithm parameters to be optimized. Finally, this technique requires neither initial parameter estimates to seed the optimization nor the initial conditions of the dynamic system. This simplifies the requirements for obtaining reasonable parameter estimates, which is of particular concern at low amounts of data or when not much is known about the system in advance.

Despite our procedure being limited to models that are linear in their parameters, it is still a fast and useful tool that is applicable in many areas of modelling. Furthermore, there is potential for this method to be applied to simplifications of more complex problems to provide good initial estimates that can then be refined using a more complex optimization method applied to the full model.

Figure 5. A schematic of a simple model of the core of the p53 gene regulatory network (equation (A 1)). $k_i$ are interaction rate constants that indicate the strength of the interaction between the two components joined by the arrow. (Nodes: DNA damage, active ATM ($a$), p53 ($z$), active p53 ($x$), MDM2 ($y$); arrow types: positive action, functional change, production, degradation.)

D.M.B. held a CHRAT studentship by the ICH and M.B. was supported by UK Biotechnology and Biological Sciences Research Council (BBSRC) Exploiting Genomics Initiative grant (39/EGM16102). J.S. is supported by the BBSRC via the Centre for Integrative Systems Biology at Imperial College (CISBIC), BB/C519670/1.

Appendix A. p53 model

p53 is a tumour-suppressor and has been described as the ‘guardian of the genome’ (Lane 1992). It is part of a complex and extensive gene regulatory network that integrates a variety of stress signals to produce a range of effects including apoptosis, growth arrest and DNA damage repair. Of particular importance is p53’s role in the decision to commence apoptosis, which is not well understood. p53 is known to play a vital role in preventing cancer; p53 is dysfunctional in the majority of cancer types (Soussi et al. 2000) and more than 18 000 different p53 mutations have been found in cancers (Bode & Dong 2004).

A simple model is proposed that includes the main protein regulatory interactions of the p53 network (figure 5). The following interactions are modelled: through phosphorylation, active ATM enables the stabilization and hence activation of p53 ($k_2$); ATM phosphorylates MDM2, compromising MDM2’s ability to ubiquitinate and bind p53, hence ATM increases the rate at which MDM2 is inactivated/degraded ($k_4$); active p53 transcribes MDM2 ($k_3$); and MDM2 encourages the degradation of both forms of p53 through ubiquitination and also prevents p53 acting as a transcription factor by binding ($k_1$). Active ATM drives the system and it is assumed that at time $t = 0$ the active ATM level has been ‘kicked’ to a value away from equilibrium and this decays exponentially according to the rate constant, $D_{\mathrm{ATM}}$. The model ODEs are

$$\begin{aligned}
\frac{da}{dt} &= -D_{\mathrm{ATM}}\, a,\\
\frac{dz}{dt} &= p_{\mathrm{p53}} - D_{\mathrm{p53}}\, z - k_1 y z - k_2 a z,\\
\frac{dx}{dt} &= -D_{\mathrm{p53}}\, x - k_1 y x + k_2 a z,\\
\frac{dy}{dt} &= p_{\mathrm{MDM2}} + k_3 x - D_{\mathrm{MDM2}}\, y - k_4 a y,
\end{aligned} \qquad (\mathrm{A}\,1)$$

where

$a$ = active ATM concentration,

$z$ = inactive p53 concentration,

$x$ = active p53 concentration,

$y$ = MDM2 concentration,

$k_i$ = interaction rate constant $i$,

$p_q$ = basal production rate of $q$ and

$D_q$ = basal degradation rate of $q$.

The following simplifications have been made.

— The path containing CHK2 was removed. This is because ATM → CHK2 → p53 duplicates the behaviour of the more direct ATM → p53 and so the effect of the CHK2 pathway can be included in the interaction between active ATM and p53.

— Once MDM2 has bound to ARF or become inactivated through phosphorylation, it is removed from the system (in effect degraded). This means that only one component of MDM2 (the active form) is required.

— Active ATM is the only protein that can convert p53 into its active state and there is no ‘basal’ rate of activation.

— If MDM2 interacts with both inactive p53 and active p53, then it degrades them at an equal rate.

— p53 forms a tetramer when activated before it can perform its function as a transcription factor. The details of this mechanism are ignored.
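The linear dependence of this model on its nine parameters can be made explicit by writing the right-hand side as a state-dependent matrix multiplied by the parameter vector. The Python sketch below does this for the four model equations of §4a; the parameter ordering, the function names and the example state are assumptions made for illustration.

```python
# Assumed parameter order:
# D_ATM, D_MDM2, D_p53, p_MDM2, p_p53, k1, k2, k3, k4
THETA_TRUE = [0.05, 0.18, 0.041, 0.01, 0.52, 1.42, 0.39, 2.50, 0.75]

def regressors(state):
    """4x9 matrix G(state) with d(state)/dt = G(state) * theta."""
    a, z, x, y = state
    return [
        [-a, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],        # da/dt
        [0.0, 0.0, -z, 0.0, 1.0, -y * z, -a * z, 0.0, 0.0],  # dz/dt
        [0.0, 0.0, -x, 0.0, 0.0, -y * x, a * z, 0.0, 0.0],   # dx/dt
        [0.0, -y, 0.0, 1.0, 0.0, 0.0, 0.0, x, -a * y],       # dy/dt
    ]

def rhs(state, theta):
    """Model right-hand side evaluated as a matrix-vector product."""
    return [sum(g * t for g, t in zip(row, theta)) for row in regressors(state)]

state = (1.0, 0.5, 0.05, 0.1)            # hypothetical values of (a, z, x, y)
derivs = rhs(state, THETA_TRUE)
doubled = rhs(state, [2.0 * t for t in THETA_TRUE])
```

With the state held fixed, doubling every parameter doubles every derivative, which is the property that turns the collocation fit into a linear least squares problem.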

Appendix B. Additional plots and tables

Table 5. The results from simulated annealing with downhill simplex parameter estimation. A range of initial temperatures, estimated counts and initial points were used. (True parameter values: $D_{\mathrm{ATM}} = 0.05$, $D_{\mathrm{MDM2}} = 0.18$, $D_{\mathrm{p53}} = 0.041$, $p_{\mathrm{MDM2}} = 0.01$, $p_{\mathrm{p53}} = 0.52$, $k_1 = 1.42$, $k_2 = 0.39$, $k_3 = 2.50$, $k_4 = 0.75$.)

| initial point | initial temp | estimated count (K) | no. of iterations | least squares value | D_ATM | D_MDM2 | D_p53 | p_MDM2 | p_p53 | k1 | k2 | k3 | k4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B | 10 | 10^6 | 1 000 000 | 6.36×10^−4 | 0.0500 | 0.200 | 0.0445 | 0.0187 | 0.521 | 1.42 | 0.391 | 2.53 | 0.755 |
| B | 10 | 10^6 | 1 000 000 | 6.36×10^−4 | 0.0500 | 0.200 | 0.0445 | 0.0187 | 0.521 | 1.42 | 0.391 | 2.53 | 0.755 |
| B | 10 | 10^6 | 1 000 000 | 6.36×10^−4 | 0.0500 | 0.200 | 0.0445 | 0.0187 | 0.521 | 1.42 | 0.391 | 2.53 | 0.755 |
| B | 10 | 10^6 | 1 000 000 | 6.36×10^−4 | 0.0500 | 0.200 | 0.0445 | 0.0187 | 0.521 | 1.42 | 0.391 | 2.53 | 0.755 |
| B | 10 | 10^6 | 1 000 000 | 0.288 | 0.0446 | 0.215 | 0.0617 | 0.0119 | 0.522 | 1.39 | 0.376 | 2.66 | 0.698 |
| B | 10 | 10^7 | 10 000 000 | 6.36×10^−4 | 0.0500 | 0.200 | 0.0445 | 0.0187 | 0.521 | 1.42 | 0.391 | 2.53 | 0.755 |
| B | 10 | 10^7 | 10 000 000 | 0.237 | 0.0553 | 0.191 | 0.0260 | 0.0257 | 0.519 | 1.45 | 0.406 | 2.42 | 0.800 |
| B | 100 | 10^6 | 1 000 000 | 6.36×10^−4 | 0.0500 | 0.200 | 0.0445 | 0.0187 | 0.521 | 1.42 | 0.391 | 2.53 | 0.755 |
| B | 100 | 10^6 | 1 000 000 | 6.11 | 0.0505 | −39.6 | 835 | −289 | 589 | 107 | 417 | 10 600 | 2780 |
| B | 100 | 10^6 | 1 000 002 | 16.4 | 1810 | 1110 | 1100 | 469 | 359 | −1300 | −208 | −340 | −561 |
| B | 100 | 10^7 | 10 000 000 | 11.4 | 1870 | 652 | 1390 | 157 | 758 | 232 | −141 | 1520 | 1870 |
| B | 1000 | 10^6 | 1 000 000 | 110 | 1640 | 517 | 1520 | 115 | 565 | −852 | 436 | −8010 | 34 800 |
| C | 10 | 10^6 | 1 000 001 | 5.22 | 0.0534 | 626 | 1160 | 313 | 307 | −1300 | 235 | 1820 | 854 |
| C | 10 | 10^6 | 1 000 001 | 5.39 | 0.0495 | 1540 | 1590 | 666 | 806 | −690 | 559 | 0.0208 | 13.7 |
| C | 10 | 10^6 | 1 000 000 | 5.48 | 0.0484 | 945 | 2500 | 597 | 604 | −2970 | 418 | 3.28 | 696 |
| C | 10 | 10^6 | 1 000 000 | 5.77 | 0.0455 | 1510 | −1080 | 520 | 267 | 2990 | 187 | 4200 | 176 |
| C | 10 | 10^6 | 1 000 000 | 5.98 | 0.0502 | 1490 | 1240 | 619 | 567 | −666 | 526 | −1340 | −197 |
| C | 10 | 10^7 | 10 000 000 | 5.37 | 0.0501 | 2160 | 112 | 3500 | 56.1 | −28.7 | 43.8 | −29 200 | −2160 |
| D | 10 | 10^6 | 1 000 000 | 4.28 | 0.0519 | 1870 | −333 | 726 | 143 | 1650 | 192 | −2100 | −1100 |
| D | 10 | 10^6 | 1 000 000 | 5.50 | 0.0497 | 1650 | 1490 | 360 | 960 | −177 | 579 | 191 | 145 |
| D | 10 | 10^6 | 1 000 001 | 5.67 | 0.0505 | 970 | 1830 | 595 | 526 | −2000 | 403 | −413 | 547 |
| D | 10 | 10^6 | 1 000 002 | 5.71 | 0.0484 | 1720 | 1750 | 1200 | 296 | −2311 | 247 | −4660 | −360 |
| D | 10 | 10^6 | 1 000 000 | 12.4 | 1840 | −44 | −1973 | −23.9 | 136 | 4660 | −848 | −143 | 314 |

[Figure 6: nine panels (a)-(i), each plotting parameter value against σ (0 to 0.06); legend: true value, average estimate, s.d. of estimates.]

Figure 6. The effect of increasing measurement error on the accuracy of parameter estimates. Results are based on 1000 independent noise realizations with p = 19, q = n = 106, l = 1. The error bars indicate the standard error of the parameter estimates. A σ of 0.06 corresponds to approximately 75% relative error for active p53 and 12% for the other components. (a) DATM, (b) DMDM2, (c) Dp53, (d) pMDM2, (e) pp53, (f) k1, (g) k2, (h) k3 and (i) k4.


[Figure 7: nine panels (a)-(i); y-axis (logarithmic): relative difference between estimated and actual parameter values; x-axis: no. of time points in dataset (0 to 1000).]

Figure 7. The relationship between the parameter estimates produced by the method in §3 and the amount of data (p = 19, q = n and l = 1). (a) DATM, (b) DMDM2, (c) Dp53, (d) pMDM2, (e) pp53, (f) k1, (g) k2, (h) k3 and (i) k4.


References

Acton, F. S. 1990 Numerical methods that work. Washington, DC: Mathematical Association of

America.

Baden, N. & Villadsen, J. 1982 A family of collocation based methods for parameter estimation in

differential equations. Chem. Eng. J. 23, 1–13. (doi:10.1016/0300-9467(82)85001-6)

Barenco, M., Tomescu, D., Brewer, D., Callard, R., Stark, J. & Hubank, M. 2006 Ranked

prediction of p53 targets using hidden variable dynamic modelling (HVDM). Genome Biol. 7,

R25. (doi:10.1186/gb-2006-7-3-r25)

Bellman, R., Jacquez, J., Kalaba, R. & Schwimmer, S. 1967 Quasilinearization and the

estimation of chemical rate constants from raw kinetic data. Math. Biosci. 1, 71–76. (doi:10.

1016/0025-5564(67)90027-2)Biegler, L. T. 1984 Solution of dynamic optimization problems by successive quadratic

programming and orthogonal collocation. Comput. Chem. Eng. 8, 243–247. (doi:10.1016/

0098-1354(84)87012-X)

Bode, A. M. & Dong, Z. 2004 Post-translational modification of p53 in tumorigenesis. Nat. Rev.

Cancer 4, 793–805. (doi:10.1038/nrc1455)Brewer, D. S. 2006 Modelling the p53 gene regulatory network. PhD thesis, University of

London.

Cardoso, M. F., Salcedo, R. L. & DeAzevedo, S. F. 1996 The simplex-simulated annealing

approach to continuous non-linear optimization. Comput. Chem. Eng. 20, 1065–1080. (doi:10.

1016/0098-1354(95)00221-9)Cash, J. R. & Karp, A. H. 1990 A variable order Runge–Kutta method for initial-value problems with

rapidly varying right-hand sides.ACMTrans.Math. Softw. 16, 201–222. (doi:10.1145/79505.79507)

De Boor, C. 1978 A practical guide to splines. Applied mathematical sciences, vol. 27. New York,

NY: Springer.

Dempster, A., Laird, N. & Rubin, D. 1977 Maximum likelihood from incomplete data via the EM

algorithm. J. R. Stat. Soc. B 39, 138.

Esposito, W. R. & Floudas, C. A. 2000 Global optimization for the parameter estimation of

differential-algebraic systems. Ind. Eng. Chem. Res. 39, 1291–1310. (doi:10.1021/ie990486w)

Fehlberg, E. 1968 Classical fifth, sixth, seventh, and eighth order Runge–Kutta formulas with

stepsize control. Technical report TR R-287, NASA.Gershenfeld, N. A. 1999 The nature of mathematical modeling. Cambridge, UK: Cambridge

University Press.

Gilks, W. R., Richardson, S. & Spiegelhalter, D. J. 1996 Markov chain Monte Carlo in practice.

London, UK: Chapman & Hall.

Golub, G. H. & Ortega, J. M. 1992 Scientific computing and differential equations: an

introduction to numerical methods. Boston, MA; London, UK: Academic Press.

Kirkpatrick, S. 1984 Optimization by simulated annealing—quantitative studies. J. Stat. Phys.

34, 975–986. (doi:10.1007/BF01009452)

Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. 1983 Optimization by simulated annealing.

Science 220, 671–680. (doi:10.1126/science.220.4598.671)Kuznetsov, Y. A. 1995 Elements of applied bifurcation theory. Berlin, Germany: Springer.

Kvasnicka, V. & Pospichal, J. 1997 A hybrid of simplex method and simulated annealing.

Chemomet. Intell. Lab. Syst. 39, 161–173. (doi:10.1016/S0169-7439(97)00071-3)

Lane, D. P. 1992 Cancer. p53, guardian of the genome. Nature 358, 15–16. (doi:10.1038/

358015a0)

Lawson, C. L. & Hanson, R. J. 1974 Solving least squares problems. Upper Saddle River, NJ:

Prentice-Hall.

Li, Z., Osborne, M. R. & Prvan, T. 2005 Parameter estimation of ordinary differential equations.

IMA J. Numer. Anal. 25, 264–285. (doi:10.1093/imanum/drh016)Marquardt, D. W. 1963 An algorithm for least-squares estimation of nonlinear parameters. SIAM

J. Appl. Math. 11, 431–441. (doi:10.1137/0111030)

Phil. Trans. R. Soc. A (2008)

Page 27: Fitting ordinary differential equations to short time course data · Fitting ordinary differential equations to short time course data BY DANIEL BREWER 1,2,MARTINO BARENCO 1,2,ROBIN

D. Brewer et al.544

on October 16, 2012rsta.royalsocietypublishing.orgDownloaded from

Metropolis,N.,Rosenbluth,A.W.,Rosenbluth,M.N.,Teller,A.H.&Teller, E. 1953Equations of statecalculations by fast computing machine. J. Chem. Phys. 21, 1087–1091. (doi:10.1063/1.1699114)

Moon, T. 1996 The expectation–maximization algorithm. Signal Process. 13, 4760.Nelder, J. A. & Mead, R. 1965 A simplex method for function minimization. Comput. J. 7,

308–313.Pozo, R. 2004 Template numerical toolkit. See http://math.nist.gov/tnt.Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. 2002 Numerical recipes in

CCC: the art of scientific computing, 2nd edn. Cambridge, UK: Cambridge University Press.Ramsay, J. O., Hooker, G., Campbell, D. & Cao, J. 2007 Parameter estimation for differential

equations: a generalized smoothing approach. J. R. Stat. Soc. B 69, 741–796. (doi:10.1111/j.1467-9868.2007.00610.x)

Seinfeld, J. H. 1969 Identification of parameters in partial differential equations. Chem. Eng. J24, 65–74. (doi:10.1016/0009-2509(69)80009-6)

Soussi, T., Dehouche, K. & Beroud, C. 2000 p53 website and analysis of p53 gene mutations inhuman cancer: forging a link between epidemiology and carcinogenesis. Hum. Mutat. 15,105–13. (doi:10.1002/(SICI)1098-1004(200001)15:1!105::AID-HUMU19O3.0.CO;2-G)

Stark, J., Callard, R. & Hubank, M. 2003a From the top down: towards a predictive biology ofgene networks. Trends Biotechnol. 21, 290–293. (doi:10.1016/S0167-7799(03)00140-9)

Stark, J., Brewer, D., Barenco, M., Tomescu, D., Callard, R. & Hubank, M. 2003bReconstructing gene networks: what are the limits? Biochem. Soc. Trans. 31, 1519–1525.

Swartz, J. & Bremermann, H. 1975 Discussion of parameter estimation in biological modelling:Algorithms for estimation and evaluation of the estimates. J. Math. Biol. 1, 241–257. (doi:10.1007/BF01273746)

Timmer, J., Rust, H., Horbelt, W. & Voss, H. U. 2000 Parametric, nonparametric andparametric modelling of a chaotic circuit time series. Phys. Lett. A 274, 123–134. (doi:10.1016/S0375-9601(00)00548-X)

Tjoa, I. B. & Biegler, L. T. 1991 Simultaneous solution and optimization strategies forparameter-estimation of differential-algebraic equation systems. Ind. Eng. Chem. Res. 30,376–385. (doi:10.1021/ie00050a015)

Torres, F. M., Agichtein, E., Grinberg, L., Yu, G. W. & Topper, R. Q. 1997 A note on theapplication of the “Boltzmann simplex”—simulated annealing algorithm to globaloptimizations of argon and water clusters. Theochem.: J. Mol. Struct. 419, 85–95. (doi:10.1016/S0166-1280(97)00195-4)

van den Bosch, B. & Hellinckx, L. J. 1974 A new method for estimation of parameters indifferential equations. AIChE J. 20, 250–256. (doi:10.1002/aic.690200207)

Varah, J. M. 1982 A spline least squares method for numerical parameter estimation indifferential equations. SIAM J. Sci. Comput. 3, 28–46. (doi:10.1137/0903003)

Villadsen, J. V. & Stewart, W. E. 1967 Solution of boundary-value problems by orthogonalcollocation. Chem. Eng. Sci. 22, 1483–1501. (doi:10.1016/0009-2509(67)80074-5)

Wang, F. S. 2000 A modified collocation method for solving differential-algebraic equations.Appl. Math. Comput. 116, 257–278. (doi:10.1016/S0096-3003(99)00138-1)

Phil. Trans. R. Soc. A (2008)