V. Nonlinear Regression By Modified Gauss-Newton Method: Theory

Method to calculate model parameter estimates that result in the best fit, in a least squares sense, to the calibration data.

Implemented as an iterative form of linear regression

Chapter 5 and Appendix A (Hill and Tiedeman, 2007)

Outline

Linear regression and normal equations
Nonlinear regression and normal equations
Modifications that make the nonlinear regression normal equations work in practice
Stopping criteria
Limits on estimated parameters
Log transformed parameters
Exercises 5.2a and 5.2b
Addition of prior information
Exercise 5.2c

Simple Linear Regression

Suppose we collect some data and want to determine the relation between the observed values, y, and the independent variable, x:

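[Figure: scatter plot of the response y (dependent variable) against the predictor x (independent variable).]

If we think the relation between y and x is linear, we can model the data using a linear model:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

where $y_i$ is the observed response, $\beta_0$ is the true, unknown intercept, $\beta_1$ is the true, unknown slope, and $\varepsilon_i$ is the true, unknown random error.

$\beta_0$ and $\beta_1$ are the true parameters of this linear model.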


Don’t know the true values of the parameters. Estimate them using the assumed model and the observations, by expressing the linear model in terms of estimated parameter values:

$$y_i = b_0 + b_1 x_i + e_i$$

where $y_i$ is the observed response, $b_0$ is the estimate of $\beta_0$, $b_1$ is the estimate of $\beta_1$, and $e_i$ is the residual.

Simulated values: $y'_i = b_0 + b_1 x_i$

Residuals: $e_i = y_i - y'_i$

[Figure: observations plotted as y against x, with the line of simulated values.]

Estimate b0 and b1 to obtain the best fit of the simulated values to the observations.

One method: Minimize sum of squared errors, or residuals.


Sum of squared residuals:

$$S(b_0, b_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - y'_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

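To minimize: Set $\partial S / \partial b_0 = 0$ and $\partial S / \partial b_1 = 0$.

This results in the normal equations:

$$b_0 n + b_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$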

Solve these equations to obtain expressions for b0 and b1, the parameter estimates that give the best fit of the simulated and observed values.
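As an illustration of solving the two normal equations, here is a minimal Python sketch. The function names and the data values are made up for this example and are not part of the course material.

```python
def fit_line(x, y):
    """Solve the two normal equations for the intercept b0 and slope b1."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Normal equations:
    #   b0*n  + b1*sx  = sy
    #   b0*sx + b1*sxx = sxy
    det = n * sxx - sx * sx  # zero only if all x values are identical
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    return b0, b1


def sum_sq_residuals(x, y, b0, b1):
    """S(b0, b1): sum of squared residuals."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, made up for illustration.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 7.1, 8.8]
b0, b1 = fit_line(x, y)
print("b0 =", b0, "b1 =", b1, "S =", sum_sq_residuals(x, y, b0, b1))
```

Moving either estimate away from the returned values increases S, which is what "best fit in a least squares sense" means here.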

If have more than 2 parameters, need to replace summations with matrix notation. First, look at using 2 parameters and matrix notation.

Linear Regression in Matrix Form

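Linear regression model: $y_i = b_0 + b_1 x_i + e_i$, $i = 1, \ldots, n$

Using matrix notation:

$$y = X b + e$$

$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad
b = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}, \quad
e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
$$

where y is the vector of observed values, X is the matrix of coefficients, b is the vector of parameters, and e is the vector of residuals.

In general, X is of dimension ND x NP, and b is of dimension NP, where ND is the number of observations and NP is the number of parameters.

The normal equations (b’ is the vector of least-squares estimates of b):

Using summations:

$$b_0 n + b_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$

Using matrix notation:

$$X^T X b' = X^T y \quad \Rightarrow \quad b' = (X^T X)^{-1} X^T y$$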

Linear Regression with Weighting

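When using weights, we minimize the sum of squared weighted residuals:

$$S(b) = \sum_{i=1}^{ND} \left(\omega_i^{1/2} e_i\right)^2 = e^T \omega e = (y - Xb)^T \omega (y - Xb)$$

where $\omega_i$ is the weight on the $i$th observation and $\omega$ is the weight matrix.

We again set the derivative of the objective function to zero:

$$\frac{\partial S(b)}{\partial b} = 0$$

This leads to the normal equations:

$$(X^T \omega X) b' = X^T \omega y$$

Solving for b’:

$$b' = (X^T \omega X)^{-1} X^T \omega y$$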
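A minimal Python sketch of the weighted normal equations for a straight-line model follows. It assumes a diagonal weight matrix (one weight per observation); the function name and the data are hypothetical.

```python
def weighted_fit(x, y, w):
    """Solve (X^T w X) b = X^T w y for a line, with one weight per observation."""
    # Entries of X^T w X and X^T w y when each row of X is [1, x_i].
    a00 = sum(w)
    a01 = sum(wi * xi for wi, xi in zip(w, x))
    a11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    r0 = sum(wi * yi for wi, yi in zip(w, y))
    r1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    # Solve the 2x2 system by Cramer's rule.
    det = a00 * a11 - a01 * a01
    b0 = (a11 * r0 - a01 * r1) / det
    b1 = (a00 * r1 - a01 * r0) / det
    return b0, b1


# Hypothetical data: three points on y = 1 + 2x and one poor observation.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 20.0]

equal = weighted_fit(x, y, [1.0, 1.0, 1.0, 1.0])
downweighted = weighted_fit(x, y, [1.0, 1.0, 1.0, 0.0])
print("equal weights:", equal)
print("poor observation given zero weight:", downweighted)
```

With equal weights the poor observation pulls the slope far from 2; giving it zero weight recovers the line through the other three points.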

Linear versus Nonlinear Models

Linear models: Sensitivities of matrix X are not a function of the model parameters:

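$$\frac{\partial y'_i}{\partial b_0} = 1 \quad \text{and} \quad \frac{\partial y'_i}{\partial b_1} = x_i; \quad \text{recall } y'_i = b_0 + b_1 x_i$$

$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$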

Linear models have elliptical objective function surfaces. With two parameters:

[Figure: elliptical contours of S(b) plotted against b1 and b2.]

One step to get to the minimum.

Linear versus Nonlinear Models

Nonlinear models: Sensitivities of matrix X are a function of the model parameters.

Ground-water models are commonly nonlinear. This nonlinearity comes from Darcy’s Law:

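[Figure: flow Q through a column with head h1 at x = 0 and head h2 at the other end.]

$$Q = -KA \frac{\partial h}{\partial x}$$

$$h = h_1 - x \frac{Q}{KA}$$

Derivatives:

$\partial h / \partial x = -Q / (KA)$. Not a function of x. Linear in x.

$\partial h / \partial Q = -x / (KA)$. Function of K. Linear in Q, but nonlinear in K.

$\partial h / \partial K = xQ / (K^2 A)$. Function of K and Q. Nonlinear in both K and Q.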
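The dependence of the sensitivities on the parameters can be checked numerically. The sketch below uses the Darcy's Law expressions above with hypothetical values for x, A, and h1; the function names are introduced only for this example.

```python
def head(x, Q, K, A, h1):
    """Head at distance x: h = h1 - x*Q/(K*A)."""
    return h1 - x * Q / (K * A)


def dh_dQ(x, K, A):
    """Sensitivity of head to Q. It does not depend on Q."""
    return -x / (K * A)


def dh_dK(x, Q, K, A):
    """Sensitivity of head to K. It depends on both K and Q."""
    return x * Q / (K ** 2 * A)


# Hypothetical values chosen only for illustration.
x, A, h1 = 100.0, 10.0, 50.0

for K in (1.0, 2.0):
    for Q in (1.0, 5.0):
        print("K =", K, "Q =", Q,
              "dh/dQ =", dh_dQ(x, K, A),
              "dh/dK =", dh_dK(x, Q, K, A))
```

The printed dh/dQ values change only when K changes, while dh/dK changes with either parameter, so a regression that estimates K works with a sensitivity matrix that shifts at every iteration.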

Nonlinear Regression

An iterative form of linear regression, with some modifications to the normal equations to make them work in practice. General procedure:

1. Linearize the model around the current parameter values. This results in a linearized objective-function surface.

2. Using the normal equations, calculate new parameter values that are closer to the minimum of the linearized objective-function surface, and therefore, hopefully closer to the minimum of the nonlinear objective-function surface.

3. Repeat from step 1.


Nonlinear model:

$$y = y'(b) + e$$

y’(b) is nonlinear function of b.

To linearize y’(b), we use a Taylor Series expansion about the current values of b.

For a one-parameter model:

$$y'_i(b) \approx y'_i(b_0) + \left. \frac{\partial y'_i(b)}{\partial b} \right|_{b = b_0} (b - b_0)$$


One-parameter model example: The Thiem equation (pg 70).

drawdown = [Q/(2πT)] ln(r/r0)

[Figure: Models, Nonlinear and Linearized. Drawdown (s) plotted against transmissivity (T), showing the nonlinear Thiem model and the model linearized about T = 0.005.]

[Figure: Objective Functions, Nonlinear and Linearized. Sum of squared, weighted residuals plotted against transmissivity (T), showing the nonlinear objective function and the linearized objective function.]
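The linearization of the Thiem model can be sketched in a few lines of Python. The expansion point T = 0.005 comes from the figure; the values of Q, r, and r0 are hypothetical, as are the function names.

```python
import math

# Hypothetical pumping rate and radii, chosen only for illustration.
Q, R, R0 = 1.0, 10.0, 1000.0
T0 = 0.005  # transmissivity about which the model is linearized


def drawdown(T):
    """Nonlinear Thiem model: s = [Q/(2*pi*T)] ln(r/r0)."""
    return Q / (2.0 * math.pi * T) * math.log(R / R0)


def sensitivity(T):
    """Derivative of drawdown with respect to T."""
    return -Q / (2.0 * math.pi * T ** 2) * math.log(R / R0)


def drawdown_linearized(T):
    """First-order Taylor series expansion about T0."""
    return drawdown(T0) + sensitivity(T0) * (T - T0)


for T in (0.004, 0.005, 0.006, 0.01, 0.02):
    print(T, drawdown(T), drawdown_linearized(T))
```

The two models agree at T0 and drift apart as T moves away from it, which is why the linearized objective function is only a local approximation of the nonlinear one.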


Two-parameter model example: The Theis equation (pg 75).

(A)

Time, in seconds    Drawdown, in feet
 480                1.71
1020                2.23
1500                2.54
2040                2.77
2700                3.04
3720                3.25
4920                3.56

Pumpage = 1.16 ft3/s

Distance from pumping to observation well = 175 ft

[Figure panels: Nonlinear objective-function surface. Objective-function surface linearized about a point far from minimum of nonlinear objective-function surface. Objective-function surface linearized about a point close to minimum of nonlinear objective-function surface. Markers show the true minimum and the points about which the surface is linearized.]


Objective function for nonlinear model:

$$S(b) = \sum_{i=1}^{n} \omega_i \left(y_i - y'_i(b)\right)^2 = (y - y'(b))^T \omega (y - y'(b))$$

To minimize S, we first substitute the linearized approximation of y’(b) into S. The linear approximation written in terms of sensitivity matrix X:

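$$y'(b) \approx y'(b_0) + X \big|_{b = b_0} (b - b_0)$$

Substituting this into S, and setting $\partial S(b) / \partial b = 0$ leads once again to

The NORMAL EQUATIONS!!!

$$(X^T \omega X) \, d = X^T \omega \left(y - y'(b_0)\right), \quad d = b - b_0$$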


The NORMAL EQUATIONS:

$$(X^T \omega X) \, d = X^T \omega \left(y - y'(b_r)\right)$$

Same form as the normal equations for linear regression, except:

d replaces b

$y - y'(b_r)$ replaces y

These differences are because of the iterative process used to solve the nonlinear regression.

d is the parameter change vector: $b_{r+1} = b_r + d$.

Solve normal equations for d, then calculate $b_{r+1}$, which are the parameter values at the next iteration, r+1.

X is the matrix of sensitivities calculated at $b_r$.

Named for Gauss and Newton; the normal equations date from Gauss's work on least squares in the early 1800’s.

In this unmodified form, performed poorly; unstable.
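The unmodified iteration can be sketched for a one-parameter problem. This example estimates T in the Thiem equation from synthetic drawdowns; the pumping rate, radii, true T, and starting values are all hypothetical, and no scaling, Marquardt parameter, or damping is applied.

```python
import math

Q, R0 = 1.0, 1000.0           # hypothetical pumping rate and reference radius
RADII = [10.0, 50.0, 200.0]   # hypothetical observation radii
T_TRUE = 0.01                 # hypothetical value used to generate observations


def simulate(T):
    """Simulated drawdowns y'(T) at each radius."""
    return [Q / (2.0 * math.pi * T) * math.log(r / R0) for r in RADII]


def sensitivities(T):
    """Column of X: derivative of each simulated drawdown with respect to T."""
    return [-Q / (2.0 * math.pi * T ** 2) * math.log(r / R0) for r in RADII]


def gauss_newton(y_obs, weights, T_start, n_iter):
    """Repeat: linearize at T, solve (X^T w X) d = X^T w (y - y'), update T."""
    T = T_start
    history = [T]
    for _ in range(n_iter):
        resid = [yo - ys for yo, ys in zip(y_obs, simulate(T))]
        X = sensitivities(T)
        xtwx = sum(w * xi * xi for w, xi in zip(weights, X))
        xtwr = sum(w * xi * ri for w, xi, ri in zip(weights, X, resid))
        d = xtwr / xtwx          # parameter change (a scalar here)
        T = T + d                # b_{r+1} = b_r + d
        history.append(T)
    return history


y_obs = simulate(T_TRUE)
weights = [1.0, 1.0, 1.0]
good = gauss_newton(y_obs, weights, 0.005, 8)
bad = gauss_newton(y_obs, weights, 0.025, 1)
print("start 0.005:", good)
print("start 0.025:", bad)
```

From the closer starting value the iterations settle on the true T. From the farther one, the first step overshoots to a negative transmissivity, the kind of behavior the modifications that follow are meant to control.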

Geometry of the Normal Equations

These normal equations were developed in the early 1800’s. However, in the form presented above, they typically have convergence problems. Modifications are needed to make them work well.

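$$(X^T \omega X) \, d = X^T \omega \left(y - y'(b_r)\right)$$

$X^T \omega X$: Alters d to be different than straight down gradient.

d: Parameter change vector.

$X^T \omega (y - y'(b_r))$: Proportional to the gradient of the objective function, $\partial S / \partial b$. Points downhill. Steepest descent direction.

[Figure: contours of S(b) plotted against b1 and b2.]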

Making the Normal Equations Work: Scaling

Scaling is needed when the parameter values are very different in magnitude, so that sensitivities are also very different. Improves accuracy of the calculated change vector, d.

Scaling does NOT change the magnitude or direction of the parameter change vector, d.

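$$(C^T X^T \omega X C) \, C^{-1} d = C^T X^T \omega \left(y - y'(b_r)\right)$$

Define C so the diagonal terms of the scaled matrix $C^T X^T \omega X C$ equal 1.0.

The $C^{-1}$ term on the left and the $C^T$ term on the right are added to maintain equality in the equation.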
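A small two-parameter sketch of scaling follows. It assumes C is diagonal with entries equal to one over the square root of the corresponding diagonal term of the unscaled matrix, which is one way to make the scaled diagonal equal 1.0. The matrix and right-hand side values are hypothetical.

```python
import math


def solve2(a, rhs):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
            (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det]


# Hypothetical X^T w X and X^T w (y - y') for two parameters whose
# sensitivities differ by orders of magnitude.
A = [[4.0e8, 2.0e3], [2.0e3, 1.0e-1]]
g = [3.0e4, 2.0e-1]

# Diagonal of C (assumed form): c_jj = 1 / sqrt(A_jj).
c = [1.0 / math.sqrt(A[0][0]), 1.0 / math.sqrt(A[1][1])]

# Scaled matrix C^T A C and scaled right-hand side C^T g.
A_scaled = [[c[i] * A[i][j] * c[j] for j in range(2)] for i in range(2)]
g_scaled = [c[i] * g[i] for i in range(2)]

d_unscaled = solve2(A, g)
d_scaled = solve2(A_scaled, g_scaled)             # this is C^-1 d
d_recovered = [c[i] * d_scaled[i] for i in range(2)]

print("scaled matrix:", A_scaled)
print("d without scaling:", d_unscaled)
print("d recovered from scaled system:", d_recovered)
```

The recovered vector matches the unscaled solution, consistent with the statement that scaling changes neither the magnitude nor the direction of d; the benefit is numerical accuracy when the matrix entries span many orders of magnitude.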

Making the Normal Equations Work: The Marquardt Parameter (1960’s)

Direction change is needed when the scaled parameter change vector, d, is in a direction that is not likely to help.

Addition of the Marquardt term results in a change vector, d, that points in a direction that is closer to the steepest descent direction.

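$$(C^T X^T \omega X C + I m_r) \, C^{-1} d = C^T X^T \omega \left(y - y'(b_r)\right)$$

$I m_r$ is the Marquardt term.

[Figure: the scaled d vector and the steepest descent direction plotted against b1 and b2.]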

Calculating the Marquardt Parameter

Marquardt parameter used to improve regression performance for ill-posed problems. Initially $m_r = 0$ for all iterations.

If d is too close to being orthogonal to steepest descent direction, then Marquardt parameter is used:

[Figure: angle between d and the steepest descent direction, plotted on b1 and b2 axes.]

Criteria for implementing Marquardt parameter suggested by Cooley and Naff (1990): If cosine of angle is < 0.08 (angle > ~85°), then $m_r^{new} = a \times m_r^{old} + b$

Cooley and Naff (1990) suggest a=1.5 and b=0.001. In MODFLOW-2000 (PES file) and UCODE_2005, user can specify cosine, a, and b.
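The rule can be sketched for a scaled two-parameter system. This is an assumption-laden illustration, not the logic of MODFLOW-2000 or UCODE_2005: it takes the scaled right-hand side as the steepest descent direction, repeats the update within a single iteration until the cosine test passes, and uses a hypothetical, nearly singular scaled matrix.

```python
import math


def solve2(a, rhs):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
            (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det]


def cosine(u, v):
    """Cosine of the angle between two 2-vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))


def solve_with_marquardt(a_scaled, g_scaled, cos_min=0.08, a=1.5, b=0.001,
                         max_tries=100):
    """Increase m by m_new = a*m_old + b until d is not nearly orthogonal to g."""
    m = 0.0
    d = solve2(a_scaled, g_scaled)
    for _ in range(max_tries):
        if cosine(d, g_scaled) >= cos_min:
            break
        m = a * m + b
        a_m = [[a_scaled[0][0] + m, a_scaled[0][1]],
               [a_scaled[1][0], a_scaled[1][1] + m]]
        d = solve2(a_m, g_scaled)
    return d, m


# Hypothetical scaled system with two highly correlated parameters.
A_SCALED = [[1.0, 0.9999], [0.9999, 1.0]]
G_SCALED = [1.007, 0.993]

d0 = solve2(A_SCALED, G_SCALED)
cos0 = cosine(d0, G_SCALED)
d_final, m_final = solve_with_marquardt(A_SCALED, G_SCALED)
print("cosine with m = 0:", cos0)
print("final m:", m_final, "final cosine:", cosine(d_final, G_SCALED))
```

With m = 0 the change vector is almost at right angles to the steepest descent direction; adding the Marquardt term rotates it back toward that direction and shortens it.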

Making the Normal Equations Work: Damping

Damping – allow the parameter values to change less than indicated by d. Damping helps remedy overshoot problems.

Implemented by inclusion of damping factor $\rho_r$ in calculation of $b_{r+1}$:

$$b_{r+1} = \rho_r d + b_r$$

Changes the magnitude but not the direction of d.

The value of $\rho_r$ is calculated internally by MF-2000 or UCODE_2005.

The value calculated equals the damping required so that no parameter changes by more than user-specified factor MaxChange. A common value is 2.0.

Making the Normal Equations Work: Damping

The factor by which the regression ‘wants’ to change the jth parameter is:

$$DMX_j = \left| \frac{b_j^{r+1} - b_j^r}{b_j^r} \right| = \left| \frac{d_j^r}{b_j^r} \right|$$

where $b_j^{r+1}$ is calculated with $\rho_r = 1.0$.

If DMX is greater than user-specified MaxChange, then $\rho_r$ is calculated as:

$$\rho_r = \frac{MaxChange}{DMX} = \frac{MaxChange}{\left| d_j^r / b_j^r \right|}$$

Example: Suppose MaxChange = 2.0, and DMX = 10.0:

$\rho_r$ = 2.0 / 10.0 = 0.2

In this example, each model parameter will actually be changed by only 0.2 times the fractional change calculated by the regression for that parameter.

UCODE_2005 allows a different MaxChange value to be defined for each parameter.
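The damping calculation can be sketched as below. The sketch assumes a single MaxChange for all parameters, takes DMX as the largest fractional change over the parameters, and assumes no parameter value is zero; the function name and numbers are hypothetical.

```python
def damped_update(b, d, max_change=2.0):
    """Return updated parameters, the damping factor rho, and DMX."""
    # Largest fractional change the regression requests with rho = 1.0.
    dmx = max(abs(dj / bj) for dj, bj in zip(d, b))
    # Damp only when the requested change exceeds MaxChange.
    rho = 1.0 if dmx <= max_change else max_change / dmx
    b_new = [bj + rho * dj for bj, dj in zip(b, d)]
    return b_new, rho, dmx


# Hypothetical current parameter values and change vector.
b = [1.0, 10.0]
d = [10.0, 5.0]
b_new, rho, dmx = damped_update(b, d)
print("DMX =", dmx, "rho =", rho, "new parameters =", b_new)
```

Every entry of d is multiplied by the same rho, so the step keeps its direction and only its length shrinks.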

Stopping Criteria: TolPar and TolSOSC

Gauss-Newton process is iterative – so, when do you stop iterating?

Need convergence criteria.

Two convergence criteria in MODFLOW-2000 and UCODE_2005: TolPar and TolSOSC.

TolPar (Tolerance for Parameters): The largest fractional change in a parameter value. For regression to converge:

$$\left| \frac{d_j^r}{b_j^r} \right| < TolPar, \quad \text{for all parameters } j$$

TolPar should ideally be 0.01; larger values may be needed in initial regression runs or for insensitive parameters.

TolSOSC: Convergence criterion for the sum of squared weighted residuals objective function. This criterion stops regression when the model fit isn’t changing much.

In the final regression runs, the TolPar convergence criterion should be met.
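The two checks might be coded as follows. The TolPar test follows the inequality above. The TolSOSC test is an assumption: it compares the fractional change in the objective function between two consecutive iterations, and the exact rule used by MODFLOW-2000 and UCODE_2005 is not given here. Function names and numbers are hypothetical.

```python
def tolpar_met(b, d, tol_par=0.01):
    """True when every parameter's fractional change is below TolPar."""
    return all(abs(dj / bj) < tol_par for dj, bj in zip(d, b))


def tolsosc_met(s_old, s_new, tol_sosc):
    """Assumed form: fractional change in the objective function is small."""
    return abs(s_old - s_new) / s_old < tol_sosc


# Hypothetical parameter values and change vectors.
b = [10.0, 0.5]
print(tolpar_met(b, [0.05, 0.001]))     # fractional changes 0.005 and 0.002
print(tolpar_met(b, [0.05, 0.01]))      # second fractional change is 0.02
print(tolsosc_met(100.0, 99.5, 0.01))   # objective function changed by 0.5%
```

A run that stops on the objective-function test alone may still have parameters that are moving, which is why the final runs should meet TolPar.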

UCODE_2005 Flow Chart

Flowchart showing the major steps of UCODE_2005. (Figure 1 of Poeter, Hill, Banta, Mehl, and Christensen, 2005).

1. START
2. Initialize problem
3. Start parameter-estimation iterations, iteration# = 1
4. Create input files for the process model(s) using current parameter values
5. Execute process model(s)
6. Extract values from process-model output files. Calculate simulated equivalents for observations
7. Start sensitivity loop, parameter# = 1
8. Perturb this parameter and recreate the input files for the process model(s)
9. Execute process model(s)
10. Unperturb this parameter
11. Extract values from process-model output files. Calculate forward-difference perturbation sensitivities for this parameter
12. Last parameter? NO: parameter# = parameter# + 1 and repeat the sensitivity loop for the next parameter. YES: continue
13. Update parameter values using modified Gauss-Newton method [Reg_GN_Controls]
14. Converged or maximum number of iterations? NO: iteration# = iteration# + 1 and begin the next parameter-estimation iteration. YES: continue
15. Calculate central-difference sensitivities. Calculate and print statistics and generate data_exchange files
16. STOP

UCODE Flow Chart

Flowchart showing the major steps of UCODE_2005. (Figure 1 of Poeter, Hill, Banta, Mehl, and Christensen, 2005). Lists the input that controls each major step.

1. START
2. Initialize problem [Options, UCODE_Control_Data]
3. Start parameter-estimation iterations, iteration# = 1
4. Create input files for the process model(s) using current parameter values [Model_Input_Files, Template files, Parameter blocks]
5. Execute process model(s) [Model_Command_Lines]
6. Extract from process-model output files [Model_Output_Files, Instruction files]. Calculate simulated equivalents for observations [Observation blocks]
7. Start perturbation-sensitivity loop, parameter# = 1
8. Perturb this parameter. Recreate input files for the process model(s) [Parameter blocks, Model_Input_Files, Template files]
9. Execute process model(s) [Model_Command_Lines]
10. Unperturb this parameter
11. Extract from process-model output files [Model_Output_Files, Instruction files]. Calculate forward-difference perturbation sensitivities for this parameter [Parameter blocks]
12. Last parameter? NO: parameter# = parameter# + 1 and repeat the sensitivity loop for the next parameter. YES: continue
13. Update parameter values using modified Gauss-Newton method [Reg_GN_Controls]
14. Converged or maximum number of iterations? [Reg_GN_Controls] NO: iteration# = iteration# + 1 and begin the next parameter-estimation iteration. YES: continue
15. Calculate central-difference perturbation sensitivities [Parameter Blocks]. Use them to calculate and print statistics and generate data_exchange files
16. STOP

Use of Limits on Estimated Parameter Values

Often used to constrain estimated parameter values to avoid unrealistic values. But unrealistic values can be a valuable indicator of model error!!

But what about insensitive parameters?

Applying limits results in the estimated parameter being at the edge of reasonable range of values.

Using prior information instead results in parameter values that are in the middle of the range of reasonable parameter values.

Applying limits results in difficulties in propagating uncertainty in limited parameters to uncertainty in predictions.

Using prior information provides a clear framework for propagating uncertainty in the parameters to uncertainty in the predictions.

Limits are allowed in UCODE_2005, because it is applicable to ANY model, and limits may be needed to maintain parameter values for which solutions can be obtained. See the Parameter_Data input block, documentation p. 70.

In MODFLOW-2000, this is achieved internally by, for example, not letting hydraulic conductivity parameters be negative unless the user says otherwise.

Log-Transformed Parameters

Log-transforming parameters can sometimes make a nonlinear regression problem behave more linearly:

[Figure from Carrera and Neuman, WRR, 1986, 22(2), p. 211-227.]

Log-transforming also prevents the regression from calculating negative values for the parameters in their native space.

[Figure: log-normal distribution and normal distribution.]

In MODFLOW-2000 and UCODE, user generally sees values in native space, even when log-transforming is used. Parameter estimates and statistics are printed in the output files as native and log10 transformed values. (Codes do calculations in natural log space).
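A short sketch of why the transform keeps native values positive, and of the conversion between natural-log and log10 values, is given below. The parameter value and the two change-vector entries are hypothetical.

```python
import math

K_current = 1.0e-4        # hypothetical hydraulic conductivity, native space
change_native = -3.0e-4   # hypothetical change calculated in native space
change_log = -3.0         # hypothetical change calculated in natural-log space

# Update in native space: nothing stops the value from going negative.
K_native_update = K_current + change_native

# Update in log space, then transform back: the result is always positive.
K_log_update = math.exp(math.log(K_current) + change_log)


def log10_from_ln(ln_value):
    """Convert a natural-log value to log10 for reporting."""
    return ln_value / math.log(10.0)


print("native-space update:", K_native_update)
print("log-space update:", K_log_update)
print("log10 of current K:", log10_from_ln(math.log(K_current)))
```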

Exercises 5.2a and 5.2b

DO EXERCISE 5.2a: Define range of reasonable parameter values.

DO EXERCISE 5.2b: First attempt at regression.

First, use native parameter values

Second, try log transforming the K-related parameters

Exercise 5.2b – Looking at Results

Can use GW Chart to look at some results.

Choose UCODE_2005.

File > Open File, then choose ex5.2._b. This file contains the parameter estimates for all parameter estimation iterations. Click ‘yes’ in the following dialogue boxes.

Before proceeding with the exercise, close GW_Chart. MODFLOW-2000 will not run if a needed file is open in GW_Chart.

Exercise 5.2b – No log transform

Exercise 5.2b – K’s log transformed

Analysis of non convergence

Several K parameters being changed to orders of magnitude smaller than start values.

What might we do now? Regression did not work with or without log transforming.

Recall CSS – indicated might not be able to estimate all parameters.

Should we fix some parameters? Another option: Add prior. If so, which parameters?

Prior Information

Allows direct, independent measurements of model input values to be included in the regression:

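$$S(b) = \sum_{i=1}^{n} \omega_i \left(y_i - y'_i(b)\right)^2 + \sum_{p=1}^{npr} \omega_p \left(P_p - P'_p(b)\right)^2$$

The first sum is the contribution from the observations; the second is the contribution from the prior.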

Use prior carefully. Before adding prior, first try performing regression without prior, and assess how much information the dependent observations alone provide towards estimating the model parameters.

Using prior instead of fixing the value of a parameter allows uncertainty in the parameter to be included in the calculated measures of parameter and prediction uncertainty.

When model execution times are long, can fix parameter value initially, then use prior information for final regression runs and(or) to evaluate uncertainty

Prior Information

In MODFLOW-2000 and UCODE, the simulated values related to the prior information are of the form:

$$P'_p(b) = \sum_{j=1}^{NP} a_{pj} b_j$$

where $b_j$ is a parameter value and $a_{pj}$ is a coefficient.

Commonly, the prior information relates to a single parameter value.

Example: prior information is the field estimate of K from a large-scale aquifer test. $P_p$ is the field estimate, and $P'_p$ is the regression estimate of the K parameter.

Prior information can also relate to more than one parameter.

Example: prior information is an annual recharge estimate, but we estimate seasonal recharge in the model. $P_p$ is the annual recharge estimate, and $P'_p$ is the sum of the regression estimates of seasonal recharges.
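The objective function with prior information can be sketched as below. The sketch assumes diagonal weights. The prior equation (0.5*Rch1 + 0.5*Rch2 with value 37.0 and variance 4.0) is borrowed from the UCODE input example in this section, with the weight taken as one over the variance. The observation values, parameter values, and function name are hypothetical.

```python
def objective_with_prior(y_obs, y_sim, w_obs, prior_values, prior_coeffs,
                         w_prior, b):
    """Weighted sum of squares for observations plus prior equations."""
    s_obs = sum(w * (yo - ys) ** 2 for w, yo, ys in zip(w_obs, y_obs, y_sim))
    s_prior = 0.0
    for P, a_row, w in zip(prior_values, prior_coeffs, w_prior):
        # Simulated prior value: P' = sum over j of a_pj * b_j.
        P_sim = sum(a * bj for a, bj in zip(a_row, b))
        s_prior += w * (P - P_sim) ** 2
    return s_obs, s_prior, s_obs + s_prior


# Hypothetical observations, simulated equivalents, and weights.
y_obs = [10.0, 12.0]
y_sim = [9.0, 12.5]
w_obs = [1.0, 4.0]

# One prior equation on two recharge parameters, b = [Rch1, Rch2].
b = [30.0, 40.0]
prior_values = [37.0]
prior_coeffs = [[0.5, 0.5]]
w_prior = [1.0 / 4.0]     # weight = 1 / variance

s_obs, s_prior, s_total = objective_with_prior(
    y_obs, y_sim, w_obs, prior_values, prior_coeffs, w_prior, b)
print("observations:", s_obs, "prior:", s_prior, "total:", s_total)
```

The prior term enters the sum in the same way as an observation, so the regression trades off fit to the observations against departure from the prior estimate according to the weights.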

Weighting Prior Information

If the weight reflects the actual uncertainty in the value specified, that is,

$$\omega_p = \frac{1}{\sigma_p^2}$$

then this is truly prior information, as used in Bayesian theory.

If the weight is larger than is consistent with the actual uncertainty, that is,

$$\omega_p > \frac{1}{\sigma_p^2}$$

so that the value of STATISTIC is less than that which would accurately reflect the uncertainty, then what is called prior information needs to be classified instead as regularization.

Weighting Prior Information

It is sometimes necessary to use regularization to make the problem tractable, but the result is that measures of uncertainty produced by the model will not be correct.

One approach to this problem is to calibrate the model with the regularization needed to produce a tractable problem, and then change the weighting when calculating prediction uncertainty.

Entering Prior Equations in UCODE

Prior equation in UCODE_2005: Linear_prior_information using the format table:

In UCODE, if the parameter is log-transformed,

the value is in native space

the STATISTIC must relate to the log-transformed value of the prior estimate. STATFLAG identifies what STAT is (variance, standard deviation, or coefficient of variation).

The equation needs to include the log10 of the parameter.

BEGIN LINEAR_PRIOR_INFORMATION TABLE

nrow=2 ncol=5 columnlabels groupname=prior

priorname equation PriorInfoValue Statistic StatFlag

PRIOR_VK_CB VK_CB 1.0E-7 0.3 CV

Eqn_Rch_Ann 0.5*Rch1+0.5*Rch2 37.0 4.0 VAR

END LINEAR_PRIOR_INFORMATION

Exercise 5.2c

DO EXERCISE 5.2c: Assign prior information on parameters, and re-run the regression.

Do not log transform

Add prior

Which parameters should we add prior to?

Use the starting values as prior value with a coefficient of variation of 0.3

PROBLEM: Compare ex5.2c.#uout (from 5.2c) to ex5.2.#uout (from 5.2b)

Prior Information Summary

Prior information:

Can help stabilize and improve the inversion procedure. But use realistic weights when analyzing uncertainties.

Can be thought of as: additional observations, or a penalty function.

Is a way for the modeler to incorporate their own judgment about the parameter values.

Use carefully! (see Weiss & Smith, GW 1998)

Apply to insensitive parameters, not sensitive parameters.
