
Advances in Computational Mathematics 1 (1993) 357-366

Least squares and Chebyshev fitting for parameter estimation in ODEs*

Jack Williams and Z. Kalogiratou

Department of Mathematics, University of Manchester, Manchester M13 9PL, UK

Received 6 July 1992; revised 7 May 1993

Abstract

We discuss the problem of determining parameters in mathematical models described by ordinary differential equations. This problem is normally treated by least squares fitting. Here some results from nonlinear mean square approximation theory are outlined which highlight the problems associated with nonuniqueness of global and local minima in this fitting procedure. Alternatively, for Chebyshev fitting and for the case of a single differential equation, we extend and apply the theory of [17,18] which ensures a unique global best approximation. The theory is applied to two numerical examples which show how typical difficulties associated with mean square fitting can be avoided in Chebyshev fitting.

Keywords: Parameter estimation, least squares approximation, Chebyshev approximation, differential equations, initial value problems.

AMS(MOS) subject classifications: primary 65L05, 41A50.

1. Introduction

Parameter estimation arises in the fitting of mathematical models which contain unknown parameters. In recent years the problem has attracted considerable interest, particularly in areas such as chemical engineering, biological systems and chemical kinetics [2]. A system of ordinary differential equations is specified which contains unknown parameters, and in practice the standard procedure is to estimate the parameters by carrying out a nonlinear least squares fitting procedure; this process is outlined in section 2.

In section 3 some known results from nonlinear L2 approximation theory are considered which help to expose the fundamental difficulty of there possibly being

*This paper is presented as an outcome of the LMS Durham Symposium convened by Professor C.T.H. Baker on 4th-14th July 1992 with support from the SERC under Grant reference number GR/H03964.

J.C. Baltzer AG, Science Publishers


several global minima; several local minima are also possible. Since all available methods for minimising a sum of squares compute local minima, there is the serious possibility of obtaining estimates of the parameters which are in some sense incorrect.

In attempting to confront this problem, in section 4 we switch attention to nonlinear L∞ or Chebyshev fitting and extend the work of [17,18]. Here, at least in the case of a parameter fitting problem for a single differential equation, classes of models are identified which have unique global fits. Finally, in section 5 some numerical examples are presented which illustrate the possible advantages of l∞ fitting in helping to avoid convergence to a stationary point which corresponds to some form of degeneracy in the model. This may be unavoidable in best l2 fitting.

2. The parameter estimation problem

We consider mathematical models of the following form:

dy/dt = f(t, y, p), t ∈ [0, T], y(0) = g(p), y ∈ R^m, p ∈ P ⊂ R^n. (2.1)

Here P consists of the set of feasible parameters. The idea is to estimate the values of the unknown parameters from the results {t_i, Y_i}, i = 1, 2, …, N, of experiments at times t_i, mN > n. In some cases the estimation may be carried out on the basis of fitting only m' < m components. The standard fitting criterion is by maximum likelihood analysis, Bard [1], and leads to the minimisation of an objective function whose form depends on the assumptions regarding the structure of the errors in the data. A widely used assumption is that the experimental errors at different times t_i are uncorrelated and are normally distributed with mean zero and covariance matrix V_i. With the error e(p, t_i) = Y_i − y(p, t_i), the maximum likelihood parameters p^M satisfy:

S(p^M) ≤ S(p) = Σ_{i=1}^{N} e(p, t_i)^T V_i^{-1} e(p, t_i), ∀ p ∈ P. (2.2)

It is worth noting that in practice the fitting problem (2.2) may be formulated irrespective of any statistical considerations whatever. In this case the data is regarded as being equally reliable, so that V_i = I, i = 1, 2, …, N. In such cases the least squares criterion simply provides the most intuitively appealing approach which is manageable computationally. Clearly in this situation other fitting criteria may be considered, for example, best l∞ fitting.
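In the common case V_i = I, the whole procedure of this section fits in a few lines of modern code. The sketch below is illustrative only: the toy model y' = −p2 y, y(0) = p1, the data, and the SciPy routines are our choices, not those used in the paper.

```python
# Sketch of the least squares fitting procedure (2.1)-(2.2) with V_i = I.
# The model y' = -p2*y, y(0) = p1 and all data are illustrative.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

def y_model(p, t_data):
    """Solve the initial value problem for parameters p = (p1, p2)."""
    sol = solve_ivp(lambda t, y: -p[1] * y, (0.0, t_data[-1]), [p[0]],
                    t_eval=t_data, rtol=1e-10, atol=1e-12)
    return sol.y[0]

def residuals(p, t_data, Y_data):
    # e(p, t_i) = Y_i - y(p, t_i); least_squares minimises sum_i e_i^2
    return Y_data - y_model(p, t_data)

t_data = np.linspace(0.0, 2.0, 21)
p_true = np.array([2.0, 1.5])
Y_data = p_true[0] * np.exp(-p_true[1] * t_data)   # noise-free data

fit = least_squares(residuals, x0=[1.0, 1.0], args=(t_data, Y_data))
print(fit.x)   # recovers p_true for this noise-free problem
```

On noise-free toy data the minimiser recovers the generating parameters; with real data and correlated errors the weighted form (2.2) applies.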

3. Nonlinear L2 approximation

An obvious difficulty associated with a nonlinear approximation problem is the possibility of a computational scheme converging to a stationary point which is a nonunique global minimum, or a local minimum which is not a global minimum.


In the context of parameter estimation in ODEs, this is only of real practical importance if the actual parameter values are required, since it would then be possible to compute the wrong parameters. Otherwise, for model discrimination purposes, it is only necessary to compute some feasible parameters for which the fit is deemed acceptable.

Some insights into the various possibilities can be gained by considering some results from nonlinear L2 approximation theory. We begin with an artificial parameter estimation problem which results in three stationary points.

y' = −p2 y/(1 + p2 t), t ∈ [−1, 1], y(−1) = p1. (3.1)

Parameter values are required which yield a best L2 fit to Y(t) = t on the range [−1, 1]. Let the solution of (3.1) be y(p, t); then the problem is min_p ∫_{−1}^{1} (t − y(p, t))² dt, giving two global minima p¹, p²:

p¹ = (−1.6509, 0.9322), p² = (0.0580, −0.9322), ‖t − y(p¹, t)‖_2 = 0.694,

and a saddle point p* = (0, 0) with ‖t − y(p*, t)‖_2 = √(2/3) ≈ 0.816.

In general nonlinear L2 approximation, it only appears possible to characterize a local best approximation. For example, for ordinary rational approximation to Y ∈ C[a, b] from R := {r = P/Q : P ∈ 𝒫_n, Q ∈ 𝒫_m; Q(t) > 0, t ∈ [a, b]}, where 𝒫_n denotes the space of polynomials of degree at most n, a best approximation r* satisfies ‖Y − r*‖_2 ≤ ‖Y − r‖_2 for all r ∈ R. Existence of a best approximation r* is assured but no complete characterization is known. Cheney and Goldstein [6] give a local characterization theorem. Examples of nonuniqueness of the best approximation and cases of several local minima are given by Dunham [8], Wolfe [19], Diener [10] and Braess [5]. For approximation from some more general nonlinear families it is known that if Y is sufficiently close to the approximating family, the best approximation is unique, Cheney and Goldstein [7]. Similar types of results are known for approximation from a Hilbert space, Spies [15] and Wolfe [19]. The general problem is

min_{p∈P} S(p) := ‖Y − A(p)‖_H, P ⊂ R^n, A : P → H,

where A denotes a smooth mapping from the parameter space P to a Hilbert space H with inner product norm ‖·‖_H. This setting could correspond to the ODE parameter estimation problem now defined on a continuum. If a local best approximation A(p*) satisfies a certain bound on the error, then this approximation is the unique global best approximation to Y [15]. In practice this may at best be interpreted as saying that if the ODE model can represent the data sufficiently accurately, the corresponding fit is a unique global fit.
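Since (3.1) has the closed-form solution y(p, t) = p1(1 − p2)/(1 + p2 t) (valid for |p2| < 1 on [−1, 1]), the stationary points quoted above can be checked by direct quadrature. The sketch below, with an off-the-shelf quadrature routine as an illustrative choice, confirms that the two global minima have essentially equal error norms, smaller than that of the saddle point:

```python
# Numerical check of the stationary points of problem (3.1), using the
# closed-form solution y(p, t) = p1*(1 - p2)/(1 + p2*t), |p2| < 1.
import numpy as np
from scipy.integrate import quad

def l2_error(p):
    c = p[0] * (1.0 - p[1])                       # y(p, t) = c/(1 + p2*t)
    integrand = lambda t: (t - c / (1.0 + p[1] * t)) ** 2
    val, _ = quad(integrand, -1.0, 1.0)
    return np.sqrt(val)

e1 = l2_error([-1.6509, 0.9322])     # first global minimum
e2 = l2_error([0.0580, -0.9322])     # second global minimum
e_saddle = l2_error([0.0, 0.0])      # saddle point: y is identically 0

print(e1, e2, e_saddle)   # e1 ~ e2 ~ 0.69, e_saddle = sqrt(2/3) ~ 0.816
```

The two minima agree by the symmetry t → −t of the data Y(t) = t; small discrepancies reflect the rounding of the tabulated parameter values.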


4. Nonlinear L∞ approximation

In view of the above theoretical results it is natural to consider other fitting criteria which may increase the possibility of unique global fits. Some progress is possible for the case of single differential equation models and the L∞ norm. In [17], the authors consider Chebyshev approximation to continuous functions Y ∈ C[a, b] by solutions y(p, t) of parameter dependent differential equations. This work may be applied to the parameter estimation problem. The model is

y' = f(t, y, p), t ∈ [a, b], y(a) = p1, p = (p1, p2, …, pn) ∈ P. (4.1)

In [17] conditions on the model (4.1) are established which guarantee that y(p*, t) is the unique global best approximation to given Y(t). Furthermore, the best approximation is characterized by the familiar Chebyshev equi-oscillation property of the error curve. The approximation can therefore be computed by the Exchange algorithm [13,3]. Approximation on a finite discrete subset X = {a ≤ t_1 < t_2 < ⋯ < t_N ≤ b} of [a, b] is treated in [18].

Let the set of solutions of (4.1) be V := {y(p, t) : p = (p1, p2, …, pn) ∈ P, t ∈ [a, b]}. Also now let X = [a, b] or let X be the above discrete subset of [a, b]. Then for Y ∈ C[X], a best Chebyshev approximation y(p*, t) satisfies

‖Y − y(p*, t)‖_X ≤ ‖Y − y(p, t)‖_X ∀ y(p, t) ∈ V,

where ‖g‖_X := max_{t∈X} |g(t)|. The results of [17] and [18] may be summarised as follows (in which condition 4 has been expanded for clarity).

THEOREM 4.1

Let the model (4.1) satisfy the following conditions.

(1) P is an open subset of R^n and for p ∈ P a unique solution y(p, t) exists which is sufficiently differentiable with respect to t.

(2) For p ∈ P, t ∈ [a, b], the variational equations for i = 1, 2, …, n,

d/dt (∂y(p,t)/∂p_i) = (∂f(t,y,p)/∂y) (∂y(p,t)/∂p_i) + ∂f(t,y,p)/∂p_i,

with initial conditions

∂y(p,a)/∂p_1 = 1, ∂y(p,a)/∂p_i = 0, i ≠ 1,

have a unique solution which is sufficiently differentiable on [a, b].

(3) For p ∈ P, t ∈ [a, b], ∂f/∂p_1 ≡ 0 and the space spanned by the functions ∂f(t, y(p,t), p)/∂p_i, i = 2, …, n is a Haar space of dimension d(p) − 1.


(4) For p, q ∈ P, f(t, y(p,t), q) ∈ C[a, b]. For p ∈ P, either (case 1) there exist at most d(p) − 2 distinct points t_j ∈ [a, b] such that f(t_j, y(p,t_j), p) − f(t_j, y(p,t_j), q) = 0 for q ∈ P, q ≠ p (where double interior zeros are counted twice), or (case 2) f(t, y(p,t), p) − f(t, y(p,t), q) ≡ 0, owing to (p2, p3, …, pn) = (q2, q3, …, qn) or to y(p,t), y(q,t) being trivial (constant) solutions.

Then for Y ∈ C[X], the best approximation y(p*, t) is unique and is characterized by the equi-oscillation property; that is, y(p*, t) ∈ V is the best approximation if and only if there exists an alternant of s = d(p*) + 1 points ξ_i in X given by a ≤ ξ_1 < ξ_2 < ⋯ < ξ_s ≤ b, where e(p*, t) = Y(t) − y(p*, t) satisfies

e(p*, ξ_i) = −e(p*, ξ_{i+1}), i = 1, …, s − 1, |e(p*, ξ_i)| = ‖e(p*, t)‖_X.

Condition 4 is required in order to establish that V satisfies the global Haar condition: for each p ∈ P, q ∈ P, p ≠ q implies that y(p, x) − y(q, x) has at most d(p) − 1 zeros in [a, b]. The possibility that f(t, y(p,t), p) − f(t, y(p,t), q) ≡ 0 was not explicitly treated in [17]. If this occurs under case 1 it follows that p1 ≠ q1, and by uniqueness (condition 1), y(p, x) − y(q, x) has no zeros in [a, b]. Case 2 is trivial.

4.1. EXAMPLES OF SINGLE EQUATION MODELS

Three classes A, B and C of models (4.1) which satisfy the conditions of theorem 4.1 are identified in [17]. Each of these classes can be extended significantly by allowing the forcing term Z(t) in the right hand side function f(t, y, p) to depend also on y. This can be useful in some models occurring in mathematical biology. In practice, identifying the parameter set P may require numerical investigation. The forms of the right hand sides are now as follows.

A: Z(t, y) + Σ_{i=2}^{n} p_i φ_i(y).

B: Z(t, y) + ω(y) (p2 + p3 y + ⋯ + p_{r+2} y^r) / (1 + p_{r+3} y + ⋯ + p_{r+s+2} y^s).

C: Z(t, y) + ω(y) Σ_{i=1}^{r} p_{i+1} exp(p_{i+r+1} y).

The functions Z(t, y), ω(y) are specified in each case. The introduction here of Z(t, y) does not affect any of the essential details of the proofs in [17]. The essential requirement for each model is that for p ∈ P, t ∈ [a, b], the solutions y(p, t) are one-to-one. Furthermore, in model A, for z ∈ R, the range of y(p, t), the functions φ2(z), φ3(z), …, φn(z) are continuously differentiable and satisfy the Haar condition; the best approximation has an alternant of length d(p*) + 1 = n + 1. For model C, for


z ∈ R, ω(y) ≠ 0 and is continuously differentiable with respect to y. The best approximation has d(p*) = (n + 1)/2 + l(p), where l(p) denotes the number of nonzero parameters p_{i+1} for i = 1, 2, …, r. For a unique representation of the exponential sum the convention p_{r+2} > p_{r+3} > ⋯ > p_{2r+1} is followed. Model A can be used to model two known growth processes in mathematical biology [14], namely, the Logistic growth model and the Gompertz growth model. The introduction of Z(t, y) allows, for example, the modelling of the Spruce Budworm population (with original parameters r and q) [14, p.5, p.33], given by

y' = −y²/(1 + y²) + r y (1 − y/q) = Z(t, y) + p2 φ2(y) + p3 φ3(y),

with Z(t, y) = −y²/(1 + y²), φ2(y) = y, φ3(y) = y², p2 = r, p3 = −r/q.

The above model classes can be further extended by the simple device of carrying out a transformation of the dependent variable y(p, t) which preserves the one-to-one property.

LEMMA 4.1

Let the model (4.1) satisfy the conditions of theorem 4.1. Let h : V → Z, where the continuously differentiable mapping h is one-to-one and satisfies dh/dy ≠ 0 for all solutions y ∈ V. Then best approximations to continuous functions on [a, b] by solutions z(p, t) = h(y(p, t)) of the transformed model are unique and are characterized by the Chebyshev equi-oscillation property of theorem 4.1.

Proof

The proof requires showing that the family of solutions z(p, t) satisfies the local and global Haar conditions [17], given that the family V does, and follows easily from the conditions on h. □

EXAMPLE

In model C let Z(t, y) = 0 and ω(y) ≡ 1. Then with h(y) = z = exp(y), the transformed model is

z' = Σ_{i=1}^{r} p_{i+1} z^{1 + p_{i+r+1}}, t ∈ [a, b], z(a) = exp(p1).

The model parameters now appear as powers of the dependent variable, which is a useful model in biology, for example the Von Bertalanffy model [4], y' = p2 y^{p4} + p3 y^{p5}.
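The change of variable used in this example is easy to verify symbolically. The following sketch treats the simplest case r = 1 (a hypothetical instance chosen for brevity), checking that y' = p2 e^{p3 y} transforms into z' = p2 z^{1+p3}:

```python
# Symbolic check of the transformation z = exp(y) for model C with
# Z = 0, omega = 1, r = 1: y' = p2*exp(p3*y) becomes z' = p2*z**(1+p3).
import sympy as sp

t, p2, p3 = sp.symbols('t p2 p3')
y = sp.Function('y')(t)

ydot = p2 * sp.exp(p3 * y)                 # model C right-hand side
z = sp.exp(y)
zdot = sp.diff(z, t).subs(sp.Derivative(y, t), ydot)  # chain rule

# Since z**(1 + p3) = exp((1 + p3)*y), the transformed right-hand side is:
target = p2 * sp.exp((1 + p3) * y)
print(sp.simplify(zdot - target))   # 0
```

The same computation goes through term by term for general r, since differentiation and the substitution act on each exponential separately.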

5. Numerical examples

Two numerical examples of fitting by single equation models are presented. The Chebyshev approximations were computed by the multiple Exchange algorithm


(the key assumption being that the best approximation is characterized by an alternant of n + 1 points, that is, d(p*) = n) and by the general purpose nonlinear Chebyshev fitting procedure of Madsen [12] (available in the Harwell FORTRAN code VG02A [16]). The trust region method of Madsen computes a stationary point of max_i |e(p, t_i)| and no assumptions are made about the length of an alternant; the method serves as a means of checking the Exchange algorithm results. The least squares approximations were computed using the Levenberg-Marquardt trust region method, available as the routine LMDIF from MINPACK (via electronic mail from NETLIB [11]). The differential equations were solved with the aid of LSODE from ODEPACK, also from NETLIB.
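None of the cited FORTRAN codes is reproduced here, but the shape of a discrete minimax computation can be sketched. The stand-in below simply minimises max_i |e(p, t_i)| with a derivative-free method for a deliberately crude two-parameter model y' = p2 y; everything in it (model, grid, solver settings) is an illustrative choice, far simpler than Madsen's method:

```python
# Toy discrete minimax fit: minimise max_i |e(p, t_i)| directly with a
# derivative-free method, for the crude model y' = p2*y, y(0) = p1.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

t_data = np.linspace(0.0, 1.0, 21)
Y_data = 4.0 / (2.0 * t_data + 1.0) ** 2       # data as in example 1

def max_error(p):
    sol = solve_ivp(lambda t, y: p[1] * y, (0.0, 1.0), [p[0]],
                    t_eval=t_data, rtol=1e-10, atol=1e-12)
    return np.max(np.abs(Y_data - sol.y[0]))

res = minimize(max_error, x0=[4.0, -2.0], method='Nelder-Mead',
               options={'xatol': 1e-8, 'fatol': 1e-10})
print(res.x, res.fun)   # the exponential model cannot fit Y exactly
```

Because max_i |e| is nonsmooth, specialised methods such as Madsen's (which exploit the structure of the linearised subproblems) converge far more reliably than this direct approach.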

EXAMPLE 1

This example (a case of model C) illustrates how a trust region method for Chebyshev fitting can converge to a stationary point which from the equi-oscillation property of theorem 4.1 is recognised as not being the global minimum.

y' = y(p2 e^{p3 y} + p4 e^{p5 y}), y(0) = p1, t ∈ [0, 1],

X := {t_i = i/100 : i = 0, 1, …, 100}, {Y(t_i)} = {4/(2t_i + 1)²}.

The global best approximation has error 4.71525 × 10⁻⁴ and is given by

p* = (4.000471525, -2.594343083, 0.115452726, 1.917275434, -0.739413975),

with 6 points of equi-oscillation. However, for the starting parameters p⁰ = (4, −1, −1, 0.5, 1) the Madsen method converges very slowly (242 iterations) to

p̄ = (4.034251154, −0.836735616, −0.692162605, 0.275765222, 0.276805769)

with error norm 3.425115 × 10⁻² and with only 4 points of equi-oscillation (correct to 8 decimal digits). It is clear that the approximant has degenerated into that corresponding to the model y' = y p2 e^{p3 y}, y(0) = p1, t ∈ [0, 1], for which the best approximation is p* = (4.034251098, −1.528897438, 0.276236880) with error norm 3.4251098 × 10⁻². Comparing the two approximations reveals that p̄2 + p̄3 ≈ p*2 and p̄4 ≈ p̄5 ≈ p*3. Other starting values p⁰ can yield convergence to different p̄ but still satisfy these relationships. The excessively slow convergence of Madsen's method is brought about by the near rank deficient Jacobians occurring in the solution of the linear subproblems.

EXAMPLE 2

Here for the deceptively simple model


y' = p2 y²/(p3 + t)², y(0) = p1, t ∈ [0, 3],

X := {t_i} = {0, 1, 2, 3}, {Y(t_i)} := {2, 1, 1, 0},

we illustrate the not uncommon situation in which a least squares procedure can very slowly drift in parameter space along a descent direction to an apparent local best approximation. In fact, the method is converging to an approximation which corresponds to a degenerate form of the model and thereby results in incorrect parameter values. In a general practical situation the form of this degeneracy (or parameter dependency) may not be at all obvious; this example allows a simple analysis of the situation. For appropriate parameter sets P, p2 ≠ 0, for which the solutions are one-to-one and y(p, t) ≠ 0, t ∈ [0, 3], it is easily shown that the model satisfies theorem 4.1 with d(p) = 3. The parameters for the computed global l∞ fit are p** = (1.75, −2.0, −3.5), with ‖e(p**, t)‖_∞ = 0.25, and when used as starting values for the l2 fit they result in the computed values p² = (1.9, −10/6, −19/6), with ‖e(p², t)‖_2 = √0.2. As is expected, the l∞ fit has an error curve with 4 points of equi-oscillation; the l2 fit has 4 points of oscillation (a necessary condition for a local minimum subject to the local Haar condition, Dunham [9]). In attempting to confirm the l2 fit as a global best approximation by using a variety of starting values p⁰, the method apparently converges to widely different values, all with approximately the same (but greater than 0.2) sums of squares of errors. For example, with p⁰ = (2, −3000, 75) there results p̂ = (2.011673, −940939, 1324.92) with sum of squares 0.358621 and an error curve with only 3 points of oscillation.

In the model let p2 → −∞, p3 → +∞ with p2/p3² → −k, for k > 0 a constant. The model then reduces to the two parameter form y' = −k y², y(0) = p1. Fitting this model to the above data produces the l2 parameters p1 = P = 2.01674, k = K = 0.535338 with sum of squares of errors 0.358337. Hence for p1 = P and p2, p3 sufficiently large, the paths in (p2, p3) space defined by p2/p3² = −K are paths of descent for the l2 fitting by the original model. Once on such a path the sum of squares ‖e(p, t)‖²_2 reduces very slowly; this is shown in fig. 1 and a corresponding surface is shown in fig. 2. Most numerical methods, once in the vicinity of this path, would be expected to follow it until, according to the accuracy requirement in the method, the change in the sum of squares is regarded as negligible. The above apparent numerical convergence to p̂ is associated with the value p̂2/p̂3² = −0.5360.
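Separating variables in this model gives the closed form 1/y(t) = 1/p1 − p2/p3 + p2/(p3 + t), so both fits quoted above can be checked by hand or with a few lines of code:

```python
# Closed-form check of the two fits in Example 2. The ODE
# y' = p2*y**2/(p3 + t)**2, y(0) = p1 separates to
#   1/y(t) = 1/p1 - p2/p3 + p2/(p3 + t).
import numpy as np

def y_model(p, t):
    p1, p2, p3 = p
    return 1.0 / (1.0 / p1 - p2 / p3 + p2 / (p3 + t))

t = np.array([0.0, 1.0, 2.0, 3.0])
Y = np.array([2.0, 1.0, 1.0, 0.0])

e_inf = Y - y_model((1.75, -2.0, -3.5), t)             # Chebyshev fit p**
e_2 = Y - y_model((1.9, -10.0 / 6.0, -19.0 / 6.0), t)  # l2 fit

print(e_inf)            # equi-oscillates: 0.25, -0.25, 0.25, -0.25
print(np.sum(e_2**2))   # sum of squares 0.2, i.e. ||e||_2 = sqrt(0.2)
```

For p** the constant term 1/p1 − p2/p3 vanishes and the solution is simply y(t) = (3.5 − t)/2, so the l∞ errors alternate as ±0.25 on the four data points; for the l2 fit, y(t) = (19 − 6t)/10 gives residuals (0.1, −0.3, 0.3, −0.1).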

6. Conclusions

(1) For certain classes of single equation parameter estimation problems it is possible to compute unique global best Chebyshev approximations whose parameters can provide good starting values for l2 fitting.

(2) The Exchange algorithm may be used, exploiting the equi-oscillation property on n + 1 points. More general trust region methods may converge to local


[Fig. 1. The sum of squares along a degenerate path for example 2: ‖e(p, t)‖²_2 (≈ 0.358-0.362) plotted against −p2 on a logarithmic scale from 10⁴ to 10¹⁰, along the path p2/p3² = −K.]

[Fig. 2. The path to degeneracy for example 2.]


minima, or some form of stationary point which corresponds to a degenerate model. Fitting a global Chebyshev approximation via the Exchange algorithm helps to avoid problems of degeneracy (with its associated reduction in the number of equi-oscillations of the error curve), since an approximation is sought with n + 1 points of equi-oscillation.

(3) For least squares fitting, numerical methods can converge to non-global local minima. Furthermore, the current methods are not able to avoid the possible convergence to a stationary point which corresponds to some form of degeneracy in a model, since they do not appear to take direct account of the sign changing properties of the error curve.

References

[1] Y. Bard, Nonlinear Parameter Estimation (Academic Press, New York, 1974).
[2] L.T. Biegler, J.J. Damiano and G.E. Blau, Nonlinear parameter estimation: a case study, AIChE J. 32(1986)29-45.
[3] R.B. Barrar and H.L. Loeb, On the Remez algorithm for nonlinear families, Numer. Math. 15(1970)382-391.
[4] L.V. Bertalanffy, Quantitative laws in metabolism and growth, Quarterly Rev. Biol. 32(1957)217-231.
[5] D. Braess, On nonuniqueness in rational Lp approximation theory, J. Approx. Theory 51(1987)68-70.
[6] E.W. Cheney and A.A. Goldstein, Mean-square approximation by generalised rational functions, Math. Zeitschr. 95(1967)232-241.
[7] E.W. Cheney and A.A. Goldstein, A note on nonlinear approximation theory, in: Numerische Mathematik, Differentialgleichungen, Approximationstheorie (1968) 251-255.
[8] C.B. Dunham, Best mean square approximation, Computing 9(1972)87-93.
[9] C.B. Dunham, Nonlinear mean-square approximation on finite sets, SIAM J. Numer. Anal. 12(1975)105-110.
[10] I. Diener, On nonuniqueness in nonlinear L2 approximation, J. Approx. Theory 51(1987)54-67.
[11] J. Dongarra and E. Grosse, Distribution of mathematical software via electronic mail, CACM 30(1987)403-407.
[12] K. Madsen, An algorithm for minimax solution of overdetermined systems of nonlinear equations, J. Inst. Math. Appl. 16(1975)321-328.
[13] G. Meinardus, Approximation of Functions: Theory and Numerical Methods (Springer, New York, 1967).
[14] J.D. Murray, Mathematical Biology (Springer, Berlin/Heidelberg, 1989).
[15] J. Spies, Uniqueness theorems for nonlinear L2 approximation problems, Computing 11(1973)327-355.
[16] VG02A, Harwell subroutine library, United Kingdom Atomic Energy Authority, Harwell Laboratory, Oxfordshire, England (1988).
[17] J. Williams and Z. Kalogiratou, Best Chebyshev approximation from families of ordinary differential equations, IMA J. Numer. Anal. 13(1993)383-395.
[18] J. Williams and Z. Kalogiratou, Nonlinear Chebyshev fitting from the solution of ordinary differential equations, Numer. Algor. 5(1993), to appear.
[19] J.M. Wolfe, On the unicity of nonlinear approximation in smooth spaces, J. Approx. Theory 12(1974)165-181.