TRANSCRIPT
Statistical Modelling of Physical Systems: An Introduction to Grey Box Modelling
Henrik Madsen
ESI 101 Course, NREL, Colorado
Henrik Madsen 1
Outline
1. Introduction
2. Selection of modelling framework
   - Nonlinear versus linear modelling
   - Why Stochastic Differential Equations?
   - Grey Box Modelling
3. The Stochastic State Space Model
4. Model Identification, Estimation, Selection and Validation
5. Identification of model order
6. Identification of functional relations
7. Estimation - the maximum likelihood principle
   - The information matrix
   - Distribution of the ML estimator
   - Tests for individual parameters
   - Likelihood ratio tests
8. Model Validation
9. Software - CTSM-R
10. Summary
Introduction
• Various methods of advanced modelling are needed for an increasing number of complex physical, chemical and biological systems.
• For a model to describe the future evolution of the system, it must
  1. capture the inherently non-linear behaviour of the system, and
  2. provide means to accommodate noise due to approximation and measurement errors.
• This calls for methods that are capable of bridging the gap between physical and statistical modelling.
Which type of model to use?
• Simple / Complex
• Lumped / Distributed
• Linear / Non-linear
• Time-invariant / Time-varying
• Discrete / Continuous time
• Deterministic / Stochastic
• Black box / Grey box / White box
• Parametric / Non-parametric

Base the decision on:
• Purpose
• Prior knowledge
• Available data
• Tools
Nonlinear versus linear modelling
• The aim of the modelling effort may be expressed generally as follows: find a nonlinear function h such that {E_t} defined by

    h(X_t, X_{t-1}, ...) = E_t                                    (1)

  is a sequence of independent random variables.
• Suppose also that the model is causally invertible, i.e. the equation above may be 'solved' such that we may write

    X_t = h'(E_t, E_{t-1}, ...).                                  (2)
Nonlinear vs. linear model building (cont.)
• Suppose that h' is sufficiently well-behaved to be expanded in a Taylor series:

    X_t = μ + Σ_{k=0}^∞ g_k E_{t-k} + Σ_{k=0}^∞ Σ_{l=0}^∞ g_{kl} E_{t-k} E_{t-l}
          + Σ_{k=0}^∞ Σ_{l=0}^∞ Σ_{m=0}^∞ g_{klm} E_{t-k} E_{t-l} E_{t-m} + ...   (3)

• The functions

    μ = h'(0),   g_k = ∂h'/∂E_{t-k},   g_{kl} = ∂²h'/(∂E_{t-k} ∂E_{t-l}),   etc.   (4)

  define the Volterra series for the process {X_t}. The sequences g_k, g_{kl}, ... are called the kernels of the Volterra series.
Non-linear vs. linear model building (cont.)
• For linear systems

    g_{kl} = g_{klm} = g_{klmn} = ... = 0                          (5)

• Hence the system is completely characterized by either

    {g_k} : the impulse response function

  or

    H(ω) : the frequency response function
Non-linear vs. linear model building (cont.)
• In general there is no such thing as a transfer function for non-linear systems.
• However, an infinite sequence of generalized transfer functions may be defined as

    H1(ω1) = Σ_{k=0}^∞ g_k e^{-i ω1 k}

    H2(ω1, ω2) = Σ_{k=0}^∞ Σ_{l=0}^∞ g_{kl} e^{-i(ω1 k + ω2 l)}

    H3(ω1, ω2, ω3) = Σ_{k=0}^∞ Σ_{l=0}^∞ Σ_{m=0}^∞ g_{klm} e^{-i(ω1 k + ω2 l + ω3 m)}

    ...
Non-linear vs. linear model building (cont.)
Let U_t and X_t denote the input and the output of a given system.
• For linear systems it is well known that:
  L1 If the input is a single harmonic U_t = A_0 e^{i ω0 t}, then the output is a single harmonic of the same frequency, but with the amplitude scaled by |H(ω0)| and the phase shifted by arg H(ω0).
  L2 Due to the linearity, the principle of superposition is valid, and the total output is the sum of the outputs corresponding to the individual frequency components of the input. (Hence the system is completely described by knowing the response to all frequencies - that is what the transfer function supplies.)
• For non-linear systems, however, neither of the properties (L1) or (L2) holds.
  NL1 For an input with frequency ω0, the output will in general also contain components at the frequencies 2ω0, 3ω0, ... (frequency multiplication).
  NL2 For two inputs with frequencies ω0 and ω1, the output will contain components at frequencies ω0, ω1, (ω0 + ω1) and all harmonics of these frequencies (intermodulation distortion).
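The frequency-multiplication property (NL1) is easy to verify numerically. The sketch below, with an assumed static quadratic nonlinearity, passes a single harmonic through a linear and a non-linear system and reads off the DFT magnitude at ω0 and 2ω0:

```python
import cmath
import math

def dft_magnitude(x, freq, n):
    """Magnitude of the normalized DFT of x at a cyclic frequency (cycles/sample)."""
    return abs(sum(x[t] * cmath.exp(-2j * math.pi * freq * t) for t in range(n)) / n)

n = 1024
f0 = 1 / 64                                       # input frequency omega_0
u = [math.cos(2 * math.pi * f0 * t) for t in range(n)]

y_lin = [2.0 * ut for ut in u]                    # linear system: pure gain
y_nl = [ut + 0.5 * ut ** 2 for ut in u]           # nonlinear: quadratic distortion

for name, y in [("linear", y_lin), ("nonlinear", y_nl)]:
    a1 = dft_magnitude(y, f0, n)                  # component at omega_0
    a2 = dft_magnitude(y, 2 * f0, n)              # component at 2*omega_0
    print(name, round(a1, 3), round(a2, 3))       # linear: 1.0 0.0, nonlinear: 0.5 0.125
```

The linear system has no energy at 2ω0, while the quadratic term creates a component at exactly twice the input frequency, as stated in NL1.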
Why Stochastic Differential Equations? Problem Scenario
Ordinary differential equation:

    dA = -K A dt
    Y = A + E
ODE
[Figure: data and the fitted ODE solution; concentration (log scale) versus time, 0-25.]
• Autocorrelated residuals!
Problem Scenario
Stochastic differential equation:

    dA = -K A dt + dw
    Y = A + e
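The contrast between the two formulations can be reproduced with a short Euler-Maruyama simulation. This is a sketch; all numeric values (decay rate K, noise levels, step size) are assumptions for illustration, not taken from the slides:

```python
import math
import random

random.seed(1)

K, dt, n = 0.2, 0.1, 250           # assumed decay rate, step size, number of steps
sigma_w, sigma_e = 0.05, 0.02      # assumed system and measurement noise levels

A_ode, A_sde = 1.0, 1.0
y_ode, y_sde = [], []
for _ in range(n):
    # ODE: deterministic Euler step; SDE: Euler-Maruyama adds sqrt(dt)-scaled noise
    A_ode += -K * A_ode * dt
    A_sde += -K * A_sde * dt + sigma_w * math.sqrt(dt) * random.gauss(0, 1)
    y_ode.append(A_ode + sigma_e * random.gauss(0, 1))
    y_sde.append(A_sde + sigma_e * random.gauss(0, 1))

# Residuals of the deterministic ODE trajectory against the SDE-generated data:
# the ODE cannot absorb the system noise, so these residuals are autocorrelated.
res = [ys - yo for ys, yo in zip(y_sde, y_ode)]

def lag1_autocorr(r):
    m = sum(r) / len(r)
    num = sum((r[t] - m) * (r[t - 1] - m) for t in range(1, len(r)))
    den = sum((x - m) ** 2 for x in r)
    return num / den

print(round(lag1_autocorr(res), 3))
```

The lag-one autocorrelation of the ODE residuals comes out strongly positive, which is exactly the symptom shown in the figure on the next slide.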
ODE vs SDE
[Figure: data with the ODE and SDE fits; concentration (log scale) versus time, 0-25.]
• Uncorrelated residuals
• System noise
• Measurement noise
Grey box modelling of oxygen concentration - A sketch of the physical system
Grey box modelling of oxygen concentration - A white box model
Model found in the literature:

    dC/dt = (K/√h)(Cm(T) - C) + P(I)/h - R(T)/h

    P(I) = Pm E0 I / (Pm + E0 I)   (≈ βI)
    R(T) = R15 θ^(T-15)   [mg/l]
    Cm(T) = 14.54 - 0.39 T + 0.01 T²   [mg/l]

• Simple - however, a non-linear model.
• The uncertainty of the prediction does not depend on the horizon.
Model validation
The autocorrelation and partial autocorrelation functions for the residuals from the first order model.
Grey box model of oxygen concentration
The following nonlinear state space (Hidden Markov) model has been found.

The system equation:

    [dC]   [-(K/√h + Kb)  -Kc] [C]        [β/h  0] [I ]        [(K/√h) Cm(T) - R(T)/h]        [dW1(t)]
    [dL] = [ K3           -Kl] [L] dt  +  [0    γ] [Pr] dt  +  [0                    ] dt  +  [dW2(t)]

The observation equation:

    Cr = [1  0] [C  L]' + e
Model types
• White box models - the model structure is known and deterministic; uncertainty is discarded and the model tends to be overspecified.
• Black box models - data based (input/output) models; the model and its parameters have little physical significance.
• Grey box models - between white and black box models.
The grey box modelling concept
• Combines prior physical knowledge with information in data.
• The model is not completely described by physical equations, but the equations and the parameters are physically interpretable.
Why use grey box modelling?
• Prior physical knowledge can be used.
• Non-linear and non-stationary models are easily formulated.
• Missing data are easily accommodated.
• It is possible to estimate environmental variables that are not measured.
• Available physical knowledge and statistical modelling tools are combined to estimate the parameters of a rather complex dynamic system.
• The parameters contain information from the data that can be directly interpreted by the scientists.
• Fewer parameters → more power in the statistical tests.
• The physical expert and the statistician can collaborate in the model formulation.
Stochastic Differential Equations (SDEs)
• Ordinary Differential Equations (ODEs) provide a deterministic description of a system:

    dX_t = f(X_t, U_t, t) dt,   t ≥ 0,

  where f is a known function of the time t, the state X and the input U.
• To describe the deviation between the ODE and the true variation of the state, an additive noise term is introduced.
• Physical arguments for including the noise term:
  1. Modelling approximations.
  2. Unrecognized inputs.
  3. Measurements of the input are noise corrupted.
The continuous-discrete time non-linear stochastic state space model
The system equation (a set of Itô stochastic differential equations):

    dX_t = f(X_t, U_t, θ) dt + G(X_t, U_t, θ) dW_t,   X_{t0} = X_0

Notation:
    X_t ∈ R^n : state vector
    U_t ∈ R^r : known input vector
    f : drift term
    G : diffusion term
    W_t : Wiener process of dimension d with incremental covariance Q_t
    θ ∈ Θ ⊆ R^p : unknown parameter vector
The observation equation
The observations are in discrete time, functions of state, input and parameters, and are subject to noise:

    Y_{ti} = h(X_{ti}, U_{ti}, θ) + e_{ti}

Notation:
    Y_{ti} ∈ R^m : observation vector
    h : observation function
    e_{ti} : Gaussian white noise with covariance Σ_{ti}

Observations are available at the time points t_i: t_1 < ... < t_i < ... < t_N.
X_0, W_t and e_{ti} are assumed independent for all (t, t_i).
Advantages of HMMs with the dynamics described by SDEs
• Provide a decomposition of the total error into process error and measurement error.
• Facilitate the use of statistical tools for model validation.
• Provide a systematic framework for pinpointing model deficiencies (demonstrated later on).
• Covariances of the system error and the measurement error are estimated.
• SDE based estimation gives more accurate and reliable parameter estimates than ODE based estimation.
• SDEs give more correct (more accurate and realistic) predictions and simulations.
Methods for Identification, Estimation and Model Validation
• Model Identification: see the next slides.
• Parameter Estimation:
  - Maximum likelihood methods
• Model testing/selection:
  - Tests for significant parameters (typically t-tests)
  - Tests for model reductions (typically likelihood ratio tests)
  - Alternatively: information criteria
• Model Validation:
  - Test whether the estimated model describes the data.
  - Autocorrelation functions - or lag dependent functions.
  - Other classical methods ...
Identification of model order (here: the number of states)
Use the autocorrelation and partial autocorrelation functions.
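As a sketch of the idea, the sample ACF can be computed directly. The AR(1) example series below is an assumption for illustration, not from the slides; its geometrically decaying ACF is the signature of a single-state (first order) model:

```python
import random

def acf(x, max_lag):
    """Sample autocorrelation function of the series x up to max_lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    out = []
    for k in range(1, max_lag + 1):
        ck = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
        out.append(ck / c0)
    return out

random.seed(2)
# Simulate an AR(1) series x_t = 0.8 x_{t-1} + e_t: one state should suffice.
x = [0.0]
for _ in range(2000):
    x.append(0.8 * x[-1] + random.gauss(0, 1))

r = acf(x, 5)
print([round(v, 2) for v in r])   # roughly 0.8, 0.64, 0.51, ... (geometric decay)
```

A geometric decay in the ACF combined with a single significant PACF spike points to a first order model; slower or oscillating decay suggests more states.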
Identification of input variables and e.g. time delays
Use the pre-whitening procedure (pp. 223-226 in the Time Series Analysis book) or ridge regression (pp. 227-228 in TSA).
Identification of functional relations
Use non-parametric methods (kernels, smoothing splines, etc.) to estimate the conditional mean and the conditional variance.
• The conditional mean enters the drift term.
• The conditional variance enters the diffusion term.
Identification of Model Structure
• The diffusion term gives information for pinpointing model deficiencies.
• Assume that, based on 'large' values of the relevant diffusion term(s), we suspect a parameter r ∈ θ to be a function of the states, inputs or time.
• Then consider the extended state space model:

    dX_t = f(X_t, U_t, θ) dt + G(X_t, U_t, θ) dW_t,   X_{t0} = X_0
    dr_t = dW_t*
    Y_{ti} = h(X_{ti}, U_{ti}, θ) + e_{ti}                         (6)

  which corresponds to a random walk description of r_t.
Identification of Model Structure
• Do we observe a significant reduction of the relevant diffusion term(s)?
• In that case, calculate the smoothed state estimate r̂_{t|N} (using for instance the software tool CTSM-R - see the slides by Niamh O'Connell).
• Plot r̂_{t|N} versus the states, inputs and time.
• Identify a possible functional relationship.
• Build that functional relationship into the stochastic state space model.
• Estimate the model parameters and evaluate the improvement, e.g. using likelihood ratio tests.
The information matrix
Definition (Observed information)
The matrix

    j(θ; y) = - ∂² l(θ; y) / (∂θ ∂θᵀ)

with the elements

    j(θ; y)_{ij} = - ∂² l(θ; y) / (∂θ_i ∂θ_j)

is called the observed information corresponding to the observation y, evaluated at θ̂.

The observed information is thus equal to the Hessian (with opposite sign) of the log-likelihood function. The Hessian matrix is simply (with opposite sign) the curvature of the log-likelihood function.
Invariance property
Theorem (Invariance property)
Assume that θ̂ is a maximum likelihood estimator for θ, and let ψ = ψ(θ) denote a one-to-one mapping of Ω ⊂ R^k onto Ψ ⊂ R^k. Then the estimator ψ(θ̂) is a maximum likelihood estimator for the parameter ψ(θ).

The principle is easily generalized to the case where the mapping is not one-to-one.
Distribution of the ML estimator
Theorem (Distribution of the ML estimator)
Assume that θ̂ is consistent. Then, under some regularity conditions,

    θ̂ - θ → N(0, i(θ)⁻¹)

where i(θ) is the expected information (the information matrix).

The result can be used for inference under very general conditions. As the price for this generality, the result is only asymptotically valid.
• Asymptotically, the variance of the estimator is seen to be equal to the Cramer-Rao lower bound for any unbiased estimator.
• The practical significance of this result is that the MLE makes efficient use of the available data for large data sets.
Distribution of the ML estimator
In practice, we would use

    θ̂ ∼ N(θ, j⁻¹(θ̂))

where j(θ̂) is the observed (Fisher) information.

This means that asymptotically

    i)  E[θ̂] = θ
    ii) D[θ̂] = j⁻¹(θ̂)
Distribution of the ML estimator
• The standard error of θ̂_i is given by

    σ_θ̂i = sqrt(Var_ii[θ̂])

  where Var_ii[θ̂] is the i'th diagonal term of j⁻¹(θ̂).
• Hence an estimate of the dispersion (variance-covariance matrix) of the estimator is

    D[θ̂] = j⁻¹(θ̂)

• An estimate of the uncertainty of the individual parameter estimates is obtained by decomposing the dispersion matrix as

    D[θ̂] = σ̂_θ R σ̂_θ

  into σ̂_θ, a diagonal matrix of the standard deviations of the individual parameter estimates, and R, the corresponding correlation matrix. The value R_ij is thus the estimated correlation between θ̂_i and θ̂_j.
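The decomposition above can be sketched in a few lines. The 2x2 observed-information matrix below is an assumed illustrative value, not from the slides:

```python
import math

# Assumed 2x2 observed-information matrix j(theta-hat), for illustration only.
j = [[50.0, 10.0],
     [10.0, 40.0]]

# Dispersion matrix D = j^{-1} (2x2 inverse in closed form).
det = j[0][0] * j[1][1] - j[0][1] * j[1][0]
D = [[ j[1][1] / det, -j[0][1] / det],
     [-j[1][0] / det,  j[0][0] / det]]

# Decompose D = S R S into standard deviations S and correlation matrix R.
s = [math.sqrt(D[0][0]), math.sqrt(D[1][1])]
R = [[D[i][k] / (s[i] * s[k]) for k in range(2)] for i in range(2)]

print([round(v, 4) for v in s])   # standard errors of the two estimates
print(round(R[0][1], 4))          # estimated correlation between the estimates
```

The diagonal of R is 1 by construction, and R_12 summarizes how strongly the two parameter estimates are entangled.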
The Wald Statistic
A test of an individual parameter,

    H0: θ_i = θ_{i,0},

is given by the Wald statistic

    Z_i = (θ̂_i - θ_{i,0}) / σ̂_θ̂i

which under H0 is approximately N(0, 1)-distributed.
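A minimal sketch of the Wald test; the estimate and standard error below are hypothetical values, not from the slides:

```python
import math

def wald_z(theta_hat, theta0, se):
    """Wald statistic Z_i = (theta_hat_i - theta_i0) / se_i."""
    return (theta_hat - theta0) / se

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical test of H0: theta_i = 0 with theta_hat = 0.42 and se = 0.15.
z = wald_z(0.42, 0.0, 0.15)
p = 2.0 * (1.0 - norm_cdf(abs(z)))   # two-sided p-value under N(0, 1)
print(round(z, 2), round(p, 4))      # 2.8 0.0051
```

With |Z| = 2.8 the parameter would be judged significantly different from zero at the usual 5% level.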
Example: Quadratic approximation of the log-likelihood
Consider again the situation where the log-likelihood function is

    l(θ) = y log θ + (n - y) log(1 - θ) + const.

The score function is

    l'(θ) = y/θ - (n - y)/(1 - θ),

and the observed information is

    j(θ) = y/θ² + (n - y)/(1 - θ)².

For n = 10, y = 3 and θ̂ = 0.3 we obtain

    j(θ̂) = 47.6

The log-likelihood function and the corresponding quadratic approximation are shown in panel (a) of the figure; the approximation is poor. By increasing the sample size to n = 100 (still with θ̂ = 0.3, i.e. y = 30), the approximation is much better, as seen in panel (b).

At the point θ̂ = y/n we have

    j(θ̂) = n / (θ̂(1 - θ̂))

and we find the variance of the estimate

    Var[θ̂] = j⁻¹(θ̂) = θ̂(1 - θ̂)/n.

Figure: Quadratic approximation of the log-likelihood function for (a) n = 10, y = 3 and (b) n = 100, y = 30.
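The numbers in the binomial example can be reproduced directly:

```python
import math

def loglik(theta, n, y):
    """Binomial log-likelihood l(theta) up to an additive constant."""
    return y * math.log(theta) + (n - y) * math.log(1 - theta)

def observed_info(theta, n, y):
    """Observed information j(theta) = y/theta^2 + (n - y)/(1 - theta)^2."""
    return y / theta ** 2 + (n - y) / (1 - theta) ** 2

n, y = 10, 3
theta_hat = y / n
print(round(observed_info(theta_hat, n, y), 1))   # 47.6 as in the example

# Variance of the estimate: Var[theta_hat] = theta_hat (1 - theta_hat) / n
print(round(theta_hat * (1 - theta_hat) / n, 3))  # 0.021
```

Scaling n and y by ten multiplies j(θ̂) by ten, which is why the quadratic approximation tightens with sample size.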
Likelihood ratio tests
• We want to know the distribution of D under H0 (model B).
• It is sometimes possible to calculate the exact distribution. This is for instance the case for the General Linear Model for Gaussian data.
• In most cases, however, we must use the following important result regarding the asymptotic behaviour.

Theorem (Wilks' likelihood ratio test)
The random variable

    D = 2(l_A(θ̂_A; Y) - l_B(θ̂_B; Y))

converges in law to a χ² random variable with f = dim(Ω_A) - dim(Ω_B) degrees of freedom, i.e.

    D → χ²(f)

under H0.
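A sketch of the test in practice; the two log-likelihood values are hypothetical, and the χ² tail probability is computed with the closed form that holds for even degrees of freedom:

```python
import math

def chi2_sf(x, f):
    """Chi-square survival function P(X > x); closed form for even f:
    P(X > x) = exp(-x/2) * sum_{k=0}^{f/2 - 1} (x/2)^k / k!"""
    assert f % 2 == 0
    return math.exp(-x / 2) * sum((x / 2) ** k / math.factorial(k)
                                  for k in range(f // 2))

# Hypothetical maximized log-likelihoods of the full model A and the reduced
# model B, which differ by two parameters.
llA, llB = -120.3, -124.1
D = 2 * (llA - llB)              # Wilks' statistic
p = chi2_sf(D, f=2)              # compare with chi^2(2)
print(round(D, 2), round(p, 4))  # 7.6 0.0224
```

With p ≈ 0.02 the reduction from model A to model B would be rejected at the 5% level, so the extra parameters are kept.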
Flexhouse layout
RC-diagram often used for illustrating linear models
[Figure: RC-network with thermal resistances Rie and Rea, temperatures Ti, Ta and Te, heat input Φh, and heat capacities Ci and Ce.]
Model A
[Figure: RC-network with thermal resistance Ria, temperatures Ti and Ta, heat input Φh, heat capacity Ci, and solar input Aw Φs.]
Aw is the effective window area.
Model Validation for Model A
Autocorrelation function and periodogram for the residuals.
[Figure: ACF of the residuals versus lag (0-35), and periodogram versus frequency (0-0.5).]
The model is seen not to be adequate.
Model E
After some steps. (Notice that e.g. the electric heating system is now included.)
Model Validation for Model E
[Figure: ACF of the residuals versus lag (0-35), and periodogram versus frequency (0-0.5).]
It is concluded that the model is adequate.
Continuous Time Stochastic Modelling (CTSM-R)
• The parameter estimation is performed using the software CTSM-R.
• The software has been developed at DTU Compute.
• Download from www.ctsm.info (see also the slides by Niamh O'Connell).
• The program also returns the uncertainty of the parameter estimates.
The estimation procedure (CTSM-R)
CTSM-R is based on
• The Extended Kalman Filter
• Approximate likelihood estimation

and provides e.g.
• Likelihood testing for nested models
• Calculation of smoothed states E[X_t | Y_T]
• Calculation of k-step predictions E[X_t | Y_{t-k}]
• Calculation of noise-free simulations E[X_t | Y_{t0}]
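For the linear first order example dA = -K A dt + dw observed as Y = A + e, the Extended Kalman Filter reduces to the ordinary Kalman filter, and the likelihood is built from the one-step prediction errors. The sketch below illustrates that mechanism; all numeric values are assumptions, and it is a toy version of what CTSM-R does, not its actual implementation:

```python
import math
import random

def neg_log_lik(K, q, r, y, dt):
    """Negative log-likelihood of observations y under the discretized model
    x_{k+1} = exp(-K dt) x_k + system noise,  y_k = x_k + e_k."""
    phi = math.exp(-K * dt)                    # exact discretization of the drift
    qd = q * (1 - phi ** 2) / (2 * K)          # discretized system noise variance
    x, P = y[0], r                             # crude initialization
    nll = 0.0
    for obs in y[1:]:
        x, P = phi * x, phi ** 2 * P + qd      # prediction step
        S = P + r                              # innovation variance
        innov = obs - x                        # one-step prediction error
        nll += 0.5 * (math.log(2 * math.pi * S) + innov ** 2 / S)
        gain = P / S                           # update step
        x += gain * innov
        P *= (1 - gain)
    return nll

# Simulate data from the true model (assumed values for illustration).
random.seed(3)
dt, K_true = 0.5, 0.4
phi = math.exp(-K_true * dt)
qd = 0.01 * (1 - phi ** 2) / (2 * K_true)
a, y = 1.0, [1.0]
for _ in range(400):
    a = phi * a + math.sqrt(qd) * random.gauss(0, 1)
    y.append(a + 0.1 * random.gauss(0, 1))

# The likelihood prefers the true decay rate over a badly wrong one.
print(neg_log_lik(0.4, 0.01, 0.01, y, dt) < neg_log_lik(4.0, 0.01, 0.01, y, dt))
```

Maximizing this likelihood over (K, q, r) is exactly the estimation step; in CTSM-R the filter is extended to nonlinear drift and observation functions.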
The continuous-discrete time stochastic state space formulation
General formulation:

    dx_t = f(x_t, u_t, θ, t) dt + σ(x_t, u_t, θ, t) dw_t
    y_k = h(x_{tk}, u_{tk}, θ, e_k, t_k),

where
• x_t is the continuous time state and y_k ∈ R^l is the discrete time observation,
• u_t ∈ R^r is the input vector,
• θ ∈ R^p is a parameter vector,
• e_k ∈ R^l is a random observation error.
The estimation procedure (CTSM) - Limitations
The most general set-up in CTSM:

    dx_t = f(x_t, u_t, θ, t) dt + σ(u_t, θ, t) dw_t
    y_k = h(x_{tk}, u_{tk}, θ, t_k) + e_k,

where
• σ ∈ R^{n×n} is a square matrix, independent of the state,
• e_k ∼ N(0, S_k(θ, u_k)) is a Gaussian random variable.
Transformation of the State Space 1
Consider the system equation

    dx_t = f(x_t, u_t, θ, t) dt + σ(x_t, u_t, θ, t) R(u_t, θ, t) dw_t,

where R(u_t, θ, t) ∈ R^{n×n} is any matrix function and σ ∈ R^{n×n} is a diagonal matrix with

    σ_ii(x_t, u_t, θ, t) = σ_i(x_{i,t}, u_t, θ, t)
Transformation of the State Space 2
Choose the transformation

    z_{i,t} = ψ_i(x_{i,t}, u_t, θ, t) = ∫ dξ / σ_i(ξ, u_t, θ, t)   evaluated at ξ = x_i;

then by Itô's lemma z_i is also an Itô process, given by

    dz_{i,t} = [ ∂ψ_i(·, t)/∂t + f_i(·)/σ_i(·) - (1/2)(∂σ_i(·)/∂x_i) Σ_{j=1}^n [R(·)]_{ij}² ] dt
               + Σ_{j=1}^n [R(·)]_{ij} dw_j,

where the diffusion term is now independent of the state z_i.
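As a worked instance of this transformation, assume a scalar state-proportional diffusion σ_i(x) = σx (an illustrative choice, not a case treated on the slide) with scalar R = 1:

```latex
z_t = \psi(x_t) = \int^{x_t} \frac{d\xi}{\sigma\,\xi} = \frac{\ln x_t}{\sigma},
\qquad
dz_t = \left( \frac{f(x_t)}{\sigma x_t} - \frac{\sigma}{2} \right) dt + dw_t .
```

Applying Itô's lemma with ψ_x = 1/(σx) and ψ_xx = -1/(σx²) reproduces the drift correction -σ/2, and the diffusion coefficient of z_t is the constant 1, independent of the state, as the general formula states.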
Summary
By applying the grey box modelling approach
• physical/prior knowledge and information in data are combined, i.e. we have bridged the gap between physical and statistical modelling.
• many statistical, mathematical and physical methods for model validation and structure modification become available.
• parameter estimates have physical significance - seldom the case for black box models.
• we obtain more accurate predictions and more realistic prediction intervals.
• we obtain realistic simulations (ODE based models do not provide a reasonable framework for simulations).
Some References
• H. Madsen: Time Series Analysis, Chapman and Hall, 392 pp., 2008.
• J.M. Morales, A.J. Conejo, H. Madsen, P. Pinson, M. Zugno: Integrating Renewables in Electricity Markets, Springer, 430 pp., 2013.
• H.Aa. Nielsen, H. Madsen: A generalization of some classical time series tools, Computational Statistics and Data Analysis, Vol. 37, pp. 13-31, 2001.
• H. Madsen, J. Holst: Estimation of Continuous-Time Models for the Heat Dynamics of a Building, Energy and Buildings, Vol. 22, pp. 67-79, 1995.
• J.N. Nielsen, H. Madsen, P.C. Young: Parameter Estimation in Stochastic Differential Equations: An Overview, Annual Reviews in Control, Vol. 24, pp. 83-94, 2000.
• N.R. Kristensen, H. Madsen, S.B. Jørgensen: A method for systematic improvement of stochastic grey-box models, Computers and Chemical Engineering, Vol. 28, pp. 1431-1449, 2004.
• N.R. Kristensen, H. Madsen, S.B. Jørgensen: Parameter estimation in stochastic grey-box models, Automatica, Vol. 40, pp. 225-237, 2004.
• J.B. Jørgensen, M.R. Kristensen, P.G. Thomsen, H. Madsen: Efficient numerical implementation of the continuous-discrete extended Kalman filter, Computers and Chemical Engineering, 2007.
• K.R. Philipsen, L.E. Christiansen, H. Hasman, H. Madsen: Modelling conjugation with stochastic differential equations, Journal of Theoretical Biology, Vol. 263, pp. 134-142, 2010.
• P. Bacher, H. Madsen: Identifying suitable models for the heat dynamics of buildings, Energy and Buildings, Vol. 43, pp. 1511-1522, 2011.