v. nonlinear regression by modified gauss-newton method: theory
DESCRIPTION
V. Nonlinear Regression By Modified Gauss-Newton Method: Theory. Method to calculate model parameter estimates that result in the best fit, in a least squares sense, to the calibration data. Implemented as an iterative form of linear regression Chapter 5 and Appendix A (Hill and Tiedeman, 2007). - PowerPoint PPT PresentationTRANSCRIPT
V. Nonlinear Regression By Modified Gauss-Newton Method: Theory
Method to calculate model parameter estimates that result in the best fit, in a least squares sense, to the calibration data.
Implemented as an iterative form of linear regression
Chapter 5 and Appendix A (Hill and Tiedeman, 2007)
Outline
Linear regression and normal equations Nonlinear regression and normal equations Modifications that make the nonlinear regression
normal equations work in practice Stopping criteria Limits on estimated parameters Log transformed parameters Exercises 5.2a and 5.2b Addition of prior information Exercise 5.2c
Simple Linear Regression
Suppose we collect some data and want to determine the relation between the observed values, y, and the independent variable, x:
iii xββy 10
observedresponse
true,unknownintercept
true,unknown
slope
true,unknown
random error
y
x
response (dependent variable)
predictor (independent variable)
If we think the relation between y and x is linear, we can model the data using a linear model:
0 and 1 are the true parameters of this linear model.
Simple Linear Regression
Don’t know the true values of the parameters. Estimate them using the assumed model and the observations, by
expressing the linear model in terms of estimated parameter values:
iii exbby 10
observedresponse
y
x
xbby 10
iii yye
Estimate b0 and b1 to obtain the best fit of the simulated values to the observations.
One method: Minimize sum of squared errors, or residuals.
estimate of 0
estimate of 1
residual
simulated values, yi
Simple Linear Regression
Sum of squared residuals:
n
ii
n
iii
n
ii x)bb(y)y(ye),bS(b
1
10
11
10
222
To minimize:
This results in the normal equations:
b
S0
0
Set and b
S0
1
n
iii
n
ii
n
ii
n
ii
n
ii
yxxbxb
yxbnb
11
21
10
1110
Solve these equations to obtain expressions for b0 and b1, the parameter estimates that give the best fit of the simulated and observed values.
If have more than 2 parameters, need to replace summations with matrix notation. First, look at using 2 parameters and matrix notation.
Linear Regression in Matrix Form
iii exbby 10 Linear regression model: , i=1.n
In general, X is of dimension ND x NP, and b is of dimension NP, where ND is the number of observations and NP is the number of parameters.
The normal equations (b’ is the vector of least-squares estimates of b):
e bXy
ny
y
y
y2
1
nx
x
x
X
1
1
1
2
1
1
0
b
bb
ne
e
e
e2
1
vector ofobserved
values
matrix ofcoefficients
vector ofresiduals
vector ofparameters
yXbXX TT yXXXb TT 1)(
n
iii
n
ii
n
ii
n
ii
n
ii
yxxbxb
yxbnb
11
21
10
1110Using
summa-tions
Using matrix notation:
Linear Regression with Weighting
When using weights, we minimize the sum of squared weighted residuals:
)bXy()bXy(ee)e()bS( TTND
iii
/
1
221
We again set the derivative of the objective function to zero:
b
)bS(0
This leads to the normal equations:
yX)bXX( TT
Solving for b’:
yX)XX(b TT 1
Linear versus Nonlinear Models
Linear models: Sensitivities of matrix X are not a function of the model parameters:
10
b
yii
i xb
y
1
nx
x
x
X
1
1
1
2
1
and ; recall
ii xbby 10
Linear models have elliptical objective function surfaces. With two parameters:
b1
b2
S(b) One step to get to the minimum.
Linear versus Nonlinear Models
Nonlinear models: Sensitivities of matrix X are a function of the model parameters.
Ground-water models are commonly nonlinear. This nonlinearity comes from Darcy’s Law:
Q Q
h1
h2
x=0
Q = -KA (h/x)
h = h1 – x (Q/KA)
Derivatives:
h/x = -Q/KA Not a function of x. Linear in x.
h/Q = -x/KA Function of K. Linear in Q, but nonlinear in K.
h/K = xQ/K2AFunction of K and Q. Nonlinear in both K and Q.
Nonlinear Regression
An iterative form of linear regression, with some modifications to
the normal equations to make them work in practice. General
procedure:
1. Linearize the model around the current parameter values. This
results in a linearized objective-function surface.
2. Using the normal equations, calculate new parameter values that
are closer to the minimum of the linearized objective-function
surface, and therefore, hopefully closer to the minimum of the
nonlinear objective-function surface.
3. Repeat from step 1.
Nonlinear Regression
Nonlinear model:
e) b(yy
y’(b) is nonlinear function of b.
To linearize y’(b), we use a Taylor Series expansion about the current values of b.
For a one-parameter model:
)b(bb
(b)y')(by'(b)y'
bb
iii 00
0
Nonlinear Regression
One-parameter model example: The Theim equation (pg 70).
drawdown=[Q/(2T)] ln(r/r0)
-1600
-1200
-800
-400
0
0 0.005 0.01 0.015 0.02 0.025Transmissivity (T)
Dra
wd
ow
n (
s)
Nonlinear Theim Model
Model Linearized about T=0.005
0
50000
100000
150000
200000
0 0.005 0.01 0.015 0.02 0.025
Transmissivity (T)
Su
m o
f s
qre
d, w
ted
re
sid
s
Nonlinear Objective Function
Linearized Objective Function
ModelsNonlinear and
Linearized
Objective Functions Nonlinear and
Linearized
Nonlinear Regression
Two-parameter model example: The Theis equation (pg 75).
(A)Time, in seconds Drawdown, in feet
480 1.711020 2.231500 2.542040 2.772700 3.043720 3.25
4920 3.56 Pumpage = 1.16 ft3/s
Distance from pumping to observation well = 175 ft
Nonlinear objective-function surface
Objective-function surface linearized about a point far from minimum of nonlinear objective-function surface
Objective-function surface linearized about a point close
to minimum of nonlinear objective-function surface
True minimum
Points about which the surface is linearized
Nonlinear Regression
Objective function for nonlinear model:
))b'(yy())b'(yy())b(y'(y)bS( Tii
n
ii
2
1
To minimize S, we first substitute the linearized approximation of y’(b) into S. The linear approximation written in terms of sensitivity matrix X:
)bb(X)b'(y)b('y bb 000
Substituting this into S, and setting 0)(
b
bSleads once again to
The NORMAL EQUATIONS!!!
(XT X) d = XT (y - y )
Nonlinear Regression
(XT X) d = XT (y - y )
The NORMAL EQUATIONS:
Same form as the normal equations for linear regression, except:
d replaces b
y - y replaces y
These differences are because of the iterative process used to solve the nonlinear regression.
d is the parameter change vector: br+1 = br + d.
Solve normal equations for d, then calculate br+1, which are the parameter
values at the next iteration, r+1.
X is the matrix of sensitivities calculated at br.
Derived independently by Gauss and Newton in the early 1800’s.
Performed poorly; unstable
Geometry of the Normal Equations
These normal equations were developed in the early 1800’s. However, in the form presented above, they typically have convergence problems. Modifications are needed to make them work well.
(XT X) d = XT (y-y)Alters d to be different than straight down gradient
Parameter change vector
Proportional to the gradient of the objective function, S/b. Points downhill. Steepest descent direction.
b2
b1
Contours of S(b)
Making the Normal Equations Work: Scaling
Scaling is needed when the parameter values are very different in magnitude, so that sensitivities are also very different. Improves accuracy of the calculated change vector, d.
Scaling does NOT change the magnitude or direction of the parameter change vector, d.
(CT XT X C) C-1 d = CT XT (y-y)
Define C so the diagonal terms of this scaled matrix equal 1.0
C terms added here to maintain equality in the equation.
Making the Normal Equations Work:The Marquardt Parameter (1950’s)
Direction change is needed when the scaled parameter change vector, d, is in a direction that is not likely to help.
Addition of the Marquardt term results in a change vector, d, that points in a direction that is closer to the steepest descent direction.
(CT XT X C + I mr) C-1 d = CT XT (y-y)
Marquardtterm.
Scaled d vector
Steepest descent direction
b2
b1
Calculating the Marquardt Parameter
Marquardt parameter used to improve regression performance for ill posed problems. Initially mr =0 for all iterations.
If d is too close to being orthogonal to steepest descent direction, then Marquardt parameter is used:
b2
b1 Criteria for implementing Marquardt parameter suggested by Cooley and Naff
(1990): If cosine of angle is <0.08 (angle > ~85), then mr,new= a x mr,old + b
Cooley and Naff (1990) suggest a=1.5 and b=0.001. In MODFLOW-2000 (PES file) and UCODE_2005, user can specify cosine, a, and b.
Making the Normal Equations Work: Damping
Damping – allow the parameter values to change less than indicated by d. Damping helps remedy overshoot problems.
Implemented by inclusion of damping factor r in calculation of br+1:
br+1 = r d + br.
Changes the magnitude but not the direction of d.
The value of r is calculated internally by MF-2000 or UCODE_2005.
The value calculated equals the damping required so that no parameter changes by more than user-specified factor MaxChange. A common value is 2.0.
Making the Normal Equations Work: Damping
The factor by which the regression ‘wants’ to change the jth parameter is:
r
r
r
rr
j
j
j
jj
b
d
b
bbDMX
1where 1r
jb Is calculated with r = 1.0.
If DMX is greater than user-specified MaxChange, then r is calculated as:
rr
jj bdDMX
MaxChangeMaxChanger
Example: Suppose MaxChange = 2.0, and DMX = 10.0:
r = 2.0 / 10.0 = 0.2
In this example, each model parameter will be actually changed by only 0.2 times the value of DMX calculated by the regression for that parameter.
UCODE_2005 allows a different MaxChange values to be defined for each parameter.
Stopping Criteria: TolPar and TolSOSC
Gauss-Newton process is iterative – so, when do you stop iterating?
Need a convergence criteria.
Two convergence criteria in MODFLOW-2000 and UCODE_2005: TolPar and TolSOSC.
TolPar (Tolerance fro Parameters): The largest fractional change in a parameter value. For regression to converge:
parameters allfor ,TolPar jr
r
j
jb
d
TolPar should ideally be 0.01; larger values may be needed in initial regression runs or for insensitive parameters
TolSOSC: Convergence criterion for the sum of squared weighted residuals objective function. This criterion stops regression when the model fit isn’t changing much.
In the final regression runs, the TolPar convergence criterion should be met.
UCODE_2005 Flow Chart
Flowchart showing the major
steps of UCODE_2005.
(Figure 1 of Poeter, Hill,
Banta, Mehl, and
Christensen, 2005) .
Calculate central-difference sensitivities Calculate and print statistics and generate data_exchange files
YES
Initialize problem
Create input files for the process model(s) using current parameter values
Execute process model(s)
Extract values from process-model output files Calculate simulated equivalents for observations
Start sensitivity loop, parameter# = 1
para
met
er#
= p
aram
eter
# +
1
i ite
rati
on#
= i
tera
tion
# +
1
Perturb this parameter and recreate the input files for the process model(s)
Execute process model(s)
Unperturb this parameter
Extract values from process-model output files Calculate forward-difference perturbation sensitivities for this parameter
Update parameter values using modified Gauss-Newton method [Reg_GN_Controls]
Converged or maximum number of iterations?
YES
STOP
Last parameter? NO
Start parameter-estimation iterations, iteration# = 1
NO
START
UCODE Flow Chart
Flowchart showing the major
steps of UCODE_2005.
(Figure 1 of Poeter, Hill,
Banta, Mehl, and
Christensen, 2005)
Lists the input that control
each major step.
Calculate central-difference perturbation sensitivities [Parameter Blocks]. Use them to calculate and print statistics and generate data_exchange files.
YES
Initialize problem [Options, UCODE_Control_Data]
Create input files for the process model(s) using current parameter values [Model_Input_Files, Template files, Parameter blocks]
Execute process model(s) [Model_Command_Lines]
Extract from process-model output files [Model_Output_Files, Instruction files]. Calculate simulated equivalents for observations [Observation blocks].
Start perturbation-sensitivity loop, parameter# = 1
para
met
er#
= p
aram
eter
# +
1
i ite
rati
on#
= i
tera
tion
# +
1
Perturb this parameter. Recreate input files for the process model(s) [Parameter blocks, Model_Input_Files, Template files]
Execute process model(s) [Model_Command_Lines]
Unperturb this parameter
Extract from process-model output files [Model_Output_Files, Instruction files]. Calculate forward-difference perturbationsensitivities
for this parameter [Parameter blocks]
Update parameter values using modified Gauss-Newton method [Reg_GN_Controls]
Converged or maximum number of iterations? [Reg_GN_Controls]
YES
STOP
Last parameter? NO
Start parameter-estimation iterations, iteration# = 1
NO
START
Use of Limits on Estimated Parameter Values
Often used to constrain estimated parameter values to avoid unrealistic values. But unrealistic values can be a valuable indicator of model error!!
But what about insensitive parameters?
Applying limits results in the estimated parameter being at the edge of reasonable range of values.
Using prior information instead results in parameter values that are in the middle of the range of reasonable parameter values.
Applying limits results in difficulties in propagating uncertainty in limited parameters to uncertainty in predictions.
Using prior information provides a clear framework for propagating uncertainty in the parameters to uncertainty in the predictions.
Limits are allowed in UCODE_2005, because it is applicable to ANY model, and limits may be needed to maintain parameter values for which solutions can be obtained. See the Parameter_Data input block, documentation p. 70.
In MODFLOW-2000, this is achieved internally by, for example, not letting hydraulic conductivity parameters be negative unless the user says otherwise.
Log-Transformed Parameters
Log-transforming parameters can sometimes make a nonlinear regression problem behave more linearly:
From Carrera and Neuman, WRR, 1986, 22(2), p. 211-227
Log-transforming also prevents the regression from calculating negative values for the parameters in their native space: Log-normal distributionNormal distribution
In MODFLOW-2000 and UCODE, user generally sees values in native space, even when log-transforming is used. Parameter estimates and statistics are printed in the output files as native and log10 transformed values. (Codes do calculations in natural log space).
Exercises 5.2a and 5.2b
DO EXERCISE 5.2a: Define range of reasonable parameter values.
DO EXERCISE 5.2b: First attempt at regression.
First, use native parameter values
Second, try log transforming the K-related parameters
Exercise 5.2b – Looking at Results
Can use GW Chart to look at some results.
Choose UCODE_2005.
File Open File, then choose ex5.2._b. This file contains the parameter estimates for all parameter estimation iterations. Click ‘yes’ in the following dialogue boxes.
Before proceeding with the exercise, close GW_Chart. MODFLOW-2000 will not run if a needed file is open in GW_Chart.
Exercise 5.2b – No log transform
Exercise 5.2b – K’s log transformed
Analysis of non convergence
Several K parameters being changed to orders of magnitude smaller than start values.
What might we do now? Regression did not work with or without log transforming.
Recall CSS – indicated might not be able to estimate all parameters.
Should we fix some parameters? Another option: Add prior. If so, which parameters?
Prior Information
Allows direct, independent measurements of model input values to be included in the regression:
2
1
2
1))('())('()( bPPbyybS ii
npr
iiii
n
ii
Observations Prior
Use prior carefully. Before adding prior, first try performing regression without prior, and assess how much information the dependent observations alone provide towards estimating the model parameters.
Using prior instead of fixing the value of a parameter allows uncertainty in the parameter to be included in the calculated measures of parameter and prediction uncertainty.
When model execution times are long, can fix parameter value initially, then use prior information for final regression runs and(or) to evaluate uncertainty
Prior Information
In MODFLOW-2000 and UCODE, the simulated values related to the prior information are of the form:
where bj is a parameter value and apj is a coefficient
Commonly, the prior information relates to a single parameter value.
Example: prior information is the field estimate of K from a large-scale aquifer test. Pp is the field estimate, and P’p is the regression estimate of the K parameter.
Prior information can also relate to more than one parameter.
Example: prior information is an annual recharge estimate, but we estimate seasonal recharge in the model. Pp is the annual recharge estimate, and P’p is the sum of the regression estimates of seasonal recharges.
jNP
jpjp babP
1)(
Weighting Prior Information
If the weight reflects the actual uncertainty in the value specified, that is,
then this is truly prior information, as used in Bayesian theory.
21
i
If the weight is larger than is consistent with the actual uncertainty, that is,
so that the value of STATISTIC is less than that which would accurately reflect the uncertainty, then what is called prior information needs to be classified instead as regularization.
21
i
Weighting Prior Information
It is sometimes necessary to use regularization to make the problem tractable,
but the result is that measures of uncertainty produced by the model will not
be correct.
One approach to this problem is to calibrate the model with the regularization
needed to produce a tractable problem, and then change the weighting when
calculating prediction uncertainty.
Entering Prior Equations in UCODE
Prior equation in UCODE_2005: Linear_prior_information using the format table:
In UCODE, if the parameter is log-transformed,
the value is in native space
the STATISTIC must relate to the log-transformed value of the prior estimate. STATFLAG identifies what STAT is (variance, standard deviation, or coefficient of variation).
The equation needs to include the log10 of the parameter.
BEGIN LINEAR_PRIOR_INFORMATION TABLE
nrow=2 ncol=5 columnlabels groupname=prior
priorname equation PriorInfoValue Statistic StatFlag
PRIOR_VK_CB VK_CB 1.0E-7 0.3 CV
Eqn_Rch_Ann 0.5*Rch1+0.5*Rch2 37.0 4.0 VAR
END LINEAR_PRIOR_INFORMATION
Exercise 5.2c
DO EXERCISE 5.2c: Assign prior information on parameters, and re-run the regression.
Do not log transform
Add prior
Which parameters should we add prior to?
Use the starting values as prior value with a coefficient of variation of 0.3
PROBLEM: Compare ex5.2c.#uout (from 5.2c) and to ex5.2.#uout (from 5.2b)
Prior Information Summary
Prior information Can help stabilize and improve the inversion procedure.
But use realistic weights when analyzing uncertainties Can be thought of as:
Additional observations, or A penalty function
Is a way for the modeler to incorporate their own judgment about the parameter values.
Use carefully! (see Weiss & Smith, GW 1998) Apply to insensitive parameters, not sensitive parameters.