
    State Space Models, Kalman Filter and Smoothing

The idea that any dynamic system can be expressed in a particular representation, called the state space representation, was proposed by Kalman, who presented an algorithm, a set of rules, to sequentially forecast and update a set of projections of the unknown state vector.

State space representation of a dynamic system: the general case

State space models were originally developed by control engineers to represent a dynamic system, or dynamic linear model. Interest normally centres on an $(m \times 1)$ vector of variables, called the state vector, $\alpha_t$, which may comprise signals from a satellite or the actual position of a missile or a rocket. The state vector represents the dynamics of the process; more precisely, it retains all the memory in the process: all dependence between past and future must funnel through the state vector. The elements of the state vector may not have any specific economic meaning, but the state space approach is popular in economic applications involving the modelling of unobserved or latent variables, such as permanent income, the NAIRU (Non-Accelerating Inflation Rate of Unemployment), expected inflation, or the state of the economy in business cycle analysis. In most cases such signals are not observable directly, but the state vector is related to an $(n \times 1)$ vector $z_t$ of variables that are actually observed, through an equation called the measurement equation or the observation equation, given by

$$z_t = A_t x_t + Y_t \alpha_t + N_t \qquad (1)$$

where $Y_t$ and $A_t$ are parameter matrices of order $(n \times m)$ and $(n \times k)$ respectively, $x_t$ is a $(k \times 1)$ vector of exogenous or pre-determined variables, and $N_t$ is an $(n \times 1)$ vector of disturbances with zero mean and covariance matrix $H_t$.

Although the state vector $\alpha_t$ is not directly observable, its movements are assumed to be governed by a well-defined process, called the transition equation or state equation, given by

$$\alpha_t = T_t \alpha_{t-1} + R_t \eta_t, \qquad t = 1, \ldots, T, \qquad (2)$$

where $T_t$ and $R_t$ are matrices of order $(m \times m)$ and $(m \times g)$ respectively, and $\eta_t$ is a $(g \times 1)$ vector of disturbances with mean zero and covariance matrix $Q_t$.

    Remarks:

1. Note that in the measurement equation we have an added disturbance term $N_t$. We need it only if we assume that what we observe is contaminated by an additional noise; otherwise we simply have

$$z_t = A_t x_t + Y_t \alpha_t. \qquad (3)$$


…large enough so that the dynamics of the system can be captured by the simple first-order Markov structure of the state equation. From a technical point of view, the aim of the state space form is to set up $\alpha_t$ so that it has as small a number of elements as possible. Such a state space set-up is called a minimal realization, and it is a basic criterion for a good state space form.

6. In many cases of interest only one observation is available in each time period; that is, $z_t$ is now a scalar in the measurement equation. Also, the transition matrix is much simpler than given before, in the sense that the parameters, in most cases including the variance, are assumed to be time invariant. Thus the transition equation now becomes

$$\alpha_t = T \alpha_{t-1} + R \eta_t, \qquad t = 1, \ldots, T, \qquad (12)$$

and

$$\eta_t \sim WN(0, \sigma^2 Q). \qquad (13)$$

7. For many applications using the Kalman filter, the vector of exogenous variables is simply not necessary. One may also assume that the variance of the noise term is time invariant, so that the general system boils down to:

$$z_t = y_t' \alpha_t + N_t, \qquad t = 1, \ldots, T \qquad (14)$$

$$\alpha_t = T \alpha_{t-1} + R \eta_t, \qquad t = 1, \ldots, T. \qquad (15)$$

$z_t$ is now a scalar, $N_t \sim (0, \sigma^2 h)$, and $y_t'$ is a $(1 \times m)$ vector. In some state space applications, especially those that use ARMA models, the measurement error in the observation equation, i.e. $N_t$, is assumed to be zero; in such applications $N_t$ will simply be absent.

8. There are many ways to write a given system in state space form. But written in any way, if our primary interest is forecasting, we would get identical forecasts no matter which form we use. Note also that we can write any state space form as an ARMA model, so there is an equivalence between the two forms.


    Examples of state space representation:

Example 1: First let us consider the general ARMA(p, q) model and see how it can be cast in state space form. Defining $m = \max(p, q+1)$, an ARMA(p, q) model can be written in the form

$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_m z_{t-m} + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_{m-1} e_{t-m+1} + e_t$$

where we interpret $\phi_j = 0$ for $j > p$ and $\theta_j = 0$ for $j > q$. Then we can write the state and observation equations as follows:

State equation:

$$\alpha_t = \begin{bmatrix} \phi_1 & & \\ \phi_2 & & I_{m-1} \\ \vdots & & \\ \phi_m & & 0' \end{bmatrix} \alpha_{t-1} + \begin{bmatrix} 1 \\ \theta_1 \\ \vdots \\ \theta_{m-1} \end{bmatrix} e_t$$

Observation equation:

$$z_t = \begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix} \alpha_t.$$

The original model can be easily recovered by repeated substitution, starting at the bottom row of the state equation. We can easily note that the first element of the state vector is identically equal to the given model for $z_t$.
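For concreteness, here is a minimal Python/NumPy sketch that assembles these matrices; the helper name `arma_state_space` and the return convention are our own choices, not part of the notes.

```python
import numpy as np

def arma_state_space(phi, theta):
    """Build the matrices of Example 1 for an ARMA(p, q) model.

    phi: AR coefficients [phi_1, ..., phi_p]
    theta: MA coefficients [theta_1, ..., theta_q]
    Returns (T, R, y) for the state equation a_t = T a_{t-1} + R e_t
    and observation equation z_t = y' a_t.
    """
    p, q = len(phi), len(theta)
    m = max(p, q + 1)
    phi_full = np.r_[phi, np.zeros(m - p)]          # phi_j = 0 for j > p
    theta_full = np.r_[theta, np.zeros(m - 1 - q)]  # theta_j = 0 for j > q
    T = np.zeros((m, m))
    T[:, 0] = phi_full            # first column holds the AR coefficients
    T[:-1, 1:] = np.eye(m - 1)    # identity block shifts the state down
    R = np.r_[1.0, theta_full].reshape(m, 1)
    y = np.zeros(m)
    y[0] = 1.0                    # observation picks the first state element
    return T, R, y
```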

Example 2: Let us consider next a univariate AR(p) process:

$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + e_t$$

where $\phi(B) = (1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)$ is the AR operator and $e_t$ is white noise. This can be put in state space form by writing the $(m \times 1)$ state vector $\alpha_t$, where $m = p$ for the present case, as follows:

State equation:

$$\alpha_t = \begin{bmatrix} \phi_1 & & \\ \phi_2 & & I_{m-1} \\ \vdots & & \\ \phi_m & & 0' \end{bmatrix} \alpha_{t-1} + \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} e_t$$

Observation equation:

$$z_t = \begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix} \alpha_t.$$

Defining $\alpha_t = (\alpha_{1t}\ \alpha_{2t}\ \ldots\ \alpha_{mt})'$ and substituting from the bottom row, we get the original AR model.

Example 3: Let us consider the following ARMA(1, 1) model, for which $m = 2$:

$$z_t = \phi_1 z_{t-1} + \theta_1 e_{t-1} + e_t.$$

For this model the state and the measurement equations are given below:

State equation: $\alpha_t = \begin{bmatrix} \phi_1 & 1 \\ 0 & 0 \end{bmatrix} \alpha_{t-1} + \begin{bmatrix} 1 \\ \theta_1 \end{bmatrix} e_t$, and observation equation: $z_t = \begin{bmatrix} 1 & 0 \end{bmatrix} \alpha_t$.

If we define $\alpha_t = (\alpha_{1t}\ \alpha_{2t})'$, then

$$\alpha_{2t} = \theta_1 e_t$$

$$\alpha_{1t} = \phi_1 \alpha_{1,t-1} + \alpha_{2,t-1} + e_t = \phi_1 z_{t-1} + \theta_1 e_{t-1} + e_t,$$

and this is precisely the original model.

Example 4: As a final example, we shall consider the first order moving average model, assuming that the model has zero mean:

$$z_t = e_t + \theta_1 e_{t-1}.$$

Here $m = 2$, so the state and the measurement equations are given as follows:

State equation: $\alpha_t = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \alpha_{t-1} + \begin{bmatrix} 1 \\ \theta_1 \end{bmatrix} e_t$, and
observation equation: $z_t = \begin{bmatrix} 1 & 0 \end{bmatrix} \alpha_t$.

If we define $\alpha_t = (\alpha_{1t}\ \alpha_{2t})'$, then $\alpha_{2t} = \theta_1 e_t$ and $\alpha_{1t} = \alpha_{2,t-1} + e_t = e_t + \theta_1 e_{t-1}$, and this is precisely the original model.
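As a quick sanity check (a sketch of our own, not part of the original notes), one can simulate the MA(1) through this state space form and verify that it reproduces a direct simulation driven by the same shocks:

```python
import numpy as np

rng = np.random.default_rng(0)
theta1 = 0.5                       # illustrative value
T = np.array([[0.0, 1.0], [0.0, 0.0]])
R = np.array([[1.0], [theta1]])
y = np.array([1.0, 0.0])

e = rng.standard_normal(100)
alpha = np.zeros(2)                # start the state at zero (e_0 = 0)
z_ss = np.empty(100)
for t in range(100):
    alpha = T @ alpha + R.ravel() * e[t]    # state equation
    z_ss[t] = y @ alpha                     # observation equation

z_direct = e + theta1 * np.r_[0.0, e[:-1]]  # z_t = e_t + theta1 e_{t-1}
assert np.allclose(z_ss, z_direct)
```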


We have seen before that there are many ways of writing a given system in state space form. We shall here give an example of writing the AR(p) process in a different way.

Example 5: As before, let $m = p$. The state equation is given as:

State equation:

$$\underbrace{\begin{bmatrix} z_t \\ z_{t-1} \\ \vdots \\ z_{t-p+1} \end{bmatrix}}_{\alpha_t} = \underbrace{\begin{bmatrix} \phi_1 & \phi_2 & \ldots & \phi_{p-1} & \phi_p \\ 1 & 0 & \ldots & 0 & 0 \\ & & \ddots & & \vdots \\ 0 & \ldots & \ldots & 1 & 0 \end{bmatrix}}_{T} \underbrace{\begin{bmatrix} z_{t-1} \\ z_{t-2} \\ \vdots \\ z_{t-p} \end{bmatrix}}_{\alpha_{t-1}} + \underbrace{\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{R} e_t$$

Observation equation:

$$(z_t) = \underbrace{\begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix}}_{y_t'} \underbrace{\begin{bmatrix} z_t \\ z_{t-1} \\ \vdots \\ z_{t-p+1} \end{bmatrix}}_{\alpha_t}$$

In this case, by carrying out the matrix multiplication on the RHS of the state equation, we can notice that the first row gives the original AR model and the rest are trivial identities, including the observation equation.

Example 6: Let us take the ARMA(p, q) that we have seen before:

$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_m z_{t-m} + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_{m-1} e_{t-m+1} + e_t$$

where we interpret $\phi_j = 0$ for $j > p$ and $\theta_j = 0$ for $j > q$. We shall re-write it in a way different from what we saw in Example 1. Let $m = \max(p, q+1)$. Then we can write the state and observation equations as follows:

State equation:

$$\alpha_{t+1} = \begin{bmatrix} \phi_1 & \phi_2 & \ldots & \phi_{m-1} & \phi_m \\ 1 & 0 & \ldots & 0 & 0 \\ & & \ddots & & \vdots \\ 0 & 0 & \ldots & 1 & 0 \end{bmatrix} \alpha_t + \begin{bmatrix} e_{t+1} \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

Observation equation:

$$z_t = \mu + \begin{bmatrix} 1 & \theta_1 & \ldots & \theta_{m-1} \end{bmatrix} \alpha_t.$$

We shall take the ARMA(1, 1) model and see how to write the state space form given in Example 6 and retrieve the original model. For ARMA(1, 1), $m = 2$, so the state and observation equations are:

State equation:

$$\alpha_{t+1} = \begin{bmatrix} \phi_1 & 0 \\ 1 & 0 \end{bmatrix} \alpha_t + \begin{bmatrix} 1 \\ 0 \end{bmatrix} e_{t+1},$$

Observation equation:

$$z_t = \mu + \begin{bmatrix} 1 & \theta_1 \end{bmatrix} \alpha_t.$$

Starting from the second row of the state equation, we have

$$\alpha_{2,t+1} = \alpha_{1,t}.$$

The first row of the state equation implies that

$$\alpha_{1,t+1} = \phi_1 \alpha_{1,t} + e_{t+1}$$

or

$$(1 - \phi_1 B)\,\alpha_{1,t+1} = e_{t+1}. \qquad (i)$$

The observation equation states that

$$z_t = \mu + \alpha_{1,t} + \theta_1 \alpha_{2,t} = \mu + \alpha_{1,t} + \theta_1 \alpha_{1,t-1} = \mu + (1 + \theta_1 B)\,\alpha_{1,t}. \qquad (ii)$$

Multiplying (ii) by $(1 - \phi_1 B)$ gives:

$$(1 - \phi_1 B)(z_t - \mu) = (1 - \phi_1 B)(1 + \theta_1 B)\,\alpha_{1,t} = (1 + \theta_1 B)\,e_t \quad [\text{from } (i)],$$

which is the given model.

Example 7: Let us take an example of a state space formulation for an economic problem. Fama and Gibbons (Journal of Monetary Economics, 1982, 9, pp. 297-323) use the state space idea to study the behaviour of the ex-ante real interest rate (defined as the nominal interest rate, $i_t$, minus the expected inflation rate, $\pi^e_t$). This is unobservable because we do not have data on the anticipated rate of inflation. Thus, the state variable is

$$\alpha_t = i_t - \pi^e_t - \mu,$$

where $\mu$ is the average ex-ante real interest rate. Fama and Gibbons assume that the ex-ante real interest rate follows an AR(1) process:

$$\alpha_{t+1} = \phi \alpha_t + e_{t+1}.$$

But an econometrician has data on the ex-post real interest rate (that is, the nominal interest rate, $i_t$, minus the actual rate of inflation, $\pi_t$). That is,

$$i_t - \pi_t = (i_t - \pi^e_t) + (\pi^e_t - \pi_t) = \mu + \alpha_t + \eta_t,$$

where $\eta_t = \pi^e_t - \pi_t$ is the error agents made in forecasting inflation. If people forecast optimally, then $\eta_t$ should be free of autocorrelation and should be uncorrelated with the ex-ante real interest rate.

Kalman Filter: An Overview

Consider the system given by the following equations:

$$z_t = y_t' \alpha_t + N_t, \qquad t = 1, \ldots, T$$

$$\alpha_t = T \alpha_{t-1} + R \eta_t, \qquad t = 1, \ldots, T.$$

Given this, our objectives could be either to obtain the values of unknown parameters or, given the parameter vector, to obtain the linear least squares forecasts of the state vector on the basis of observed data. The Kalman filter (KF hereafter) has many uses; we are utilising it as an algorithm to evaluate the components of the likelihood function. Kalman filtering follows a two-step procedure. In the first step, the optimal predictor for the next observation is formed, based on all the information currently available; this is done by the prediction equation. In the second step, the moment a new observation becomes available, it is incorporated into the estimator of the state vector using the updating equation. These two equations collectively form the Kalman filter equations. Applied recursively, the KF provides an optimal solution to the twin problems of prediction and updating. Assuming that the observations are normally distributed and that the current estimator of the state vector is the best available, the prediction and the updating estimators are the best; by best, we mean the estimators have the minimum mean squared error (MMSE). The process of predicting the next observation and updating the estimate as soon as the actual value becomes available has an interesting by-product: the prediction error. We have seen, in the chapter on estimation, how a set of dependent observations can be decomposed in terms of prediction errors; the KF gives us a natural mechanism to carry out this decomposition.

Kalman filter recursions: main equations

We shall use $a_t$ to denote the MMSE estimator of $\alpha_t$ based on all information up to and including the current observation $z_t$. Similarly, $a_{t|t-1}$ is the MMSE estimator of $\alpha_t$ at time $t-1$; that is, $a_{t|t-1} = E(\alpha_t | I_{t-1})$.

Prediction:

At time $t-1$, all available information, including $z_{t-1}$, is incorporated in $a_{t-1}$, which is the MMSE estimator of $\alpha_{t-1}$. The estimation error has covariance matrix $\sigma^2 P_{t-1}$; more precisely,

$$\sigma^2 P_{t-1} = E\left[(\alpha_{t-1} - a_{t-1})(\alpha_{t-1} - a_{t-1})'\right].$$

From

$$\alpha_t = T \alpha_{t-1} + R \eta_t,$$

we get that at time $t-1$ the MMSE estimator of $\alpha_t$ is given by

$$a_{t|t-1} = T a_{t-1},$$

so that the estimation error, or the sampling error, is given by

$$\alpha_t - a_{t|t-1} = T(\alpha_{t-1} - a_{t-1}) + R \eta_t.$$

The right hand side of this estimation error has zero expectation. We note here that an estimator is unconditionally unbiased (u-unbiased) if its estimation error has zero expectation, and when an estimator is u-unbiased its MSE matrix, $E[(\alpha_t - a_{t|t-1})(\alpha_t - a_{t|t-1})']$, is identical to the covariance matrix of the estimation error. Hence we can write the covariance of the estimation error as:

$$E\left[(\alpha_t - a_{t|t-1})(\alpha_t - a_{t|t-1})'\right] = E\left\{[T(\alpha_{t-1} - a_{t-1}) + R\eta_t][T(\alpha_{t-1} - a_{t-1}) + R\eta_t]'\right\}$$

$$= T\,E\left[(\alpha_{t-1} - a_{t-1})(\alpha_{t-1} - a_{t-1})'\right]T' + T\,E\left[(\alpha_{t-1} - a_{t-1})\eta_t'\right]R' + R\,E\left[\eta_t(\alpha_{t-1} - a_{t-1})'\right]T' + R\,E\left[\eta_t\eta_t'\right]R'$$

$$= \sigma^2\, T P_{t-1} T' + \sigma^2\, R Q R'.$$


Thus,

$$\left(\alpha_t - a_{t|t-1}\right) \sim WS\left(0, \sigma^2 P_{t|t-1}\right)$$

where

$$P_{t|t-1} = T P_{t-1} T' + R Q R'$$

and WS stands for wide sense (weak stationarity is sometimes referred to as wide sense stationarity).

Now, given that $a_{t|t-1}$ is the MMSE of $\alpha_t$ at time $t-1$, the MMSE of $z_t$ at time $t-1$ clearly is

$$z_{t|t-1} = y_t' a_{t|t-1}.$$

The associated prediction error is

$$z_t - z_{t|t-1} = \nu_t = y_t'\left(\alpha_t - a_{t|t-1}\right) + N_t,$$

the expectation of which is zero. Hence,

$$\text{var}(\nu_t) = E(\nu_t^2) = E\left[y_t'(\alpha_t - a_{t|t-1})(\alpha_t - a_{t|t-1})' y_t\right] + E(N_t^2)$$

[since the cross product terms have zero expectations]

$$= \sigma^2 y_t' P_{t|t-1} y_t + \sigma^2 h = \sigma^2 f_t.$$

Deriving the state updating equations is involved, and hence the important steps are relegated to the appendix; we state only the main equations below (a short code sketch of the full recursion follows the remarks).

Updating equation:

$$a_t = a_{t|t-1} + P_{t|t-1} y_t \left(z_t - y_t' a_{t|t-1}\right)/f_t,$$

and the estimation error satisfies

$$(\alpha_t - a_t) \sim WS(0, \sigma^2 P_t)$$

where

$$P_t = P_{t|t-1} - P_{t|t-1} y_t y_t' P_{t|t-1}/f_t, \qquad f_t = y_t' P_{t|t-1} y_t + h.$$

    We have to highlight the following points.


1. Note the role played by the prediction error, $\nu_t = z_t - y_t' a_{t|t-1}$, and the variance associated with it, $\sigma^2 f_t$.

2. Note also the $(m \times 1)$ vector $P_{t|t-1} y_t / f_t$, which is called the Kalman gain.

3. In the discussion so far, we have assumed the presence of an additional noise in the measurement equation; that is, $h > 0$. But note that in our examples of state space representation of ARMA models we assumed that the measurement equation has no additional error; that is, $N_t$ is assumed to be zero, implying that $h$, the variance of the measurement error term, will be zero. This should not matter, however, since through these adjustments $h$ has been isolated as an additive scalar, which, when it becomes zero, does not affect our calculations (note the expression for $f_t$).
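Collecting the prediction and updating equations, a minimal NumPy sketch of one pass of the recursions for the time-invariant system (14)-(15) is given below; the function name, argument layout and storage choices are ours, not part of the notes.

```python
import numpy as np

def kalman_filter(z, T, R, y, Q, h, a0, P0):
    """KF recursions for z_t = y' a_t + N_t,  a_t = T a_{t-1} + R eta_t.

    a0, P0 are the initial state mean and (scaled) error covariance.
    Returns the prediction errors nu_t and variance factors f_t,
    which are the ingredients of the likelihood."""
    a = np.asarray(a0, dtype=float)
    P = np.asarray(P0, dtype=float)
    nu = np.empty(len(z))
    f = np.empty(len(z))
    for t, zt in enumerate(z):
        # prediction: a_{t|t-1} = T a_{t-1}, P_{t|t-1} = T P T' + R Q R'
        a_pred = T @ a
        P_pred = T @ P @ T.T + R @ Q @ R.T
        # prediction error and its scaled variance f_t = y' P_{t|t-1} y + h
        f[t] = y @ P_pred @ y + h
        nu[t] = zt - y @ a_pred
        # updating; K = P_{t|t-1} y / f_t is the Kalman gain
        K = P_pred @ y / f[t]
        a = a_pred + K * nu[t]
        P = P_pred - np.outer(K, y @ P_pred)
    return nu, f
```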

ML Estimation of ARMA Models

The literature has many algorithms aimed at simplifying the computation of the components of the likelihood. One approach is to use the Kalman filter recursions. Other useful algorithms are by Newbold (Biometrika, 1974, Vol. 61, 423-26) and the innovations algorithm suggested by Ansley (Biometrika, 1979, Vol. 66, 59-65).

KF recursions are useful for a number of purposes, but our emphasis will be on understanding how these recursions (1) can be used to construct linear least squares forecasts of the state vector on the basis of data observed through time $t$, and (2) use the resulting prediction error and its variance to build the components of the likelihood function. In our derivation so far, we have motivated the discussion of the Kalman filter in terms of linear projections of the state vector $\alpha_t$ and the observed time series $z_t$. These are linear forecasts, and they are optimal among all functions of the data if we assume that the state vector and the disturbances are multivariate Gaussian. Our main aim is to see how the KF recursions calculate these forecasts recursively, generating $a_{1|0}, a_{2|1}, \ldots, a_{T|T-1}$ and $P_{1|0}, P_{2|1}, \ldots, P_{T|T-1}$ in succession.

How do we start the recursions?

To start the recursions, we need $a_{1|0}$. This means we should get the first period forecast of the state based on an information set. Since we don't have information on the zeroth period, we take the unconditional expectation,

$$a_{1|0} = E(\alpha_1),$$

where the associated estimation error has zero mean and covariance matrix $\sigma^2 P_{1|0}$.


Let us explain this with the help of an example.

Example 8: Let us take the simplest MA(1) model,

$$z_t = e_t + \theta_1 e_{t-1}.$$

We have shown before that the state vector is simply

$$\alpha_t = \begin{bmatrix} z_t \\ \theta_1 e_t \end{bmatrix}$$

and hence

$$a_{1|0} = E\begin{bmatrix} z_1 \\ \theta_1 e_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

And the associated variance matrix of the estimation error, $\sigma^2 P_0$ or $\sigma^2 P_{1|0}$, is simply $E(\alpha_1 \alpha_1')$, so that we have

$$P_{1|0} = \sigma^{-2} E(\alpha_1 \alpha_1') = \sigma^{-2} E\left\{\begin{bmatrix} z_1 \\ \theta_1 e_1 \end{bmatrix}\begin{bmatrix} z_1 & \theta_1 e_1 \end{bmatrix}\right\} = \begin{bmatrix} 1 + \theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2 \end{bmatrix}.$$

While one can work out by hand the covariance matrix for the initial state vector for pure MA models, this turns out to be too tedious for higher order mixed models. So we need a closed form solution to calculate this matrix, which we get by generalising this approach. Generalisation is easy if we can make prior assumptions about the distribution of the state vector.

Two categories of state vector can be distinguished, depending on whether or not the state vector is covariance stationary. If it is, then the distribution of the state vector is readily available, and with that the problem of starting values can be easily resolved. With the assumption that the state vector is covariance stationary, one can easily check from the state equation that the unconditional mean of the state vector is zero; that is,

$$E(\alpha_t) = 0,$$

and the unconditional variance of $\alpha_t$ is easily seen to be

$$E(\alpha_t \alpha_t') = E\left[(T\alpha_{t-1} + R\eta_t)(T\alpha_{t-1} + R\eta_t)'\right].$$

Let us denote the LHS of the above expression by $\Sigma$. Noting that the state vector depends on shocks only up to $t-1$, we get

$$\Sigma = T \Sigma T' + R Q R'.$$

Though this can be solved in many ways, a direct closed form solution is given by the following matrix lemma, using the vec operator.

Proposition: Let $A$, $B$ and $C$ be matrices such that the product $ABC$ exists. Then

$$\text{vec}(ABC) = (C' \otimes A)\,\text{vec}(B).$$

Thus, we vectorize both sides of the expression for $\Sigma$ and rearrange to get a closed form solution,

$$\text{vec}(\Sigma) = \left[I_{m^2} - (T \otimes T)\right]^{-1} \text{vec}(RQR').$$

What this implies is that, provided the process is covariance stationary, the Kalman filter recursions can be started with $a_{1|0} = 0$, and the $(m \times m)$ matrix $P_{1|0}$, whose elements can be expressed as a column vector, is obtained from:

$$\text{vec}(P_{1|0}) = \left[I_{m^2} - (T \otimes T)\right]^{-1} \text{vec}(RQR').$$
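This closed form is a one-liner with the Kronecker product. Below is a small sketch (the helper name is ours), using the column-stacking vec convention:

```python
import numpy as np

def initial_state_cov(T, R, Q):
    """Stationary initialisation: solve vec(P) = [I - (T kron T)]^{-1} vec(RQR').

    vec stacks columns, hence the Fortran ('F') ordering below."""
    m = T.shape[0]
    RQR = R @ Q @ R.T
    vecP = np.linalg.solve(np.eye(m * m) - np.kron(T, T),
                           RQR.ravel(order="F"))
    return vecP.reshape(m, m, order="F")

# For the MA(1) of Example 8 this reproduces P_{1|0}:
th = 0.5
P = initial_state_cov(np.array([[0.0, 1.0], [0.0, 0.0]]),
                      np.array([[1.0], [th]]),
                      np.array([[1.0]]))
# P == [[1 + th**2, th], [th, th**2]]
```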

The best way to get a grasp of the Kalman recursions is to try them out on a simple model. Let us try them on the simple MA(1) model.

Example 9: Assume for convenience that the process has zero mean. So the MA(1) model can be written as

$$z_t = e_t + \theta_1 e_{t-1}.$$

Here $m = 2$. So, from Example 3, we have the state and measurement equations given as follows:

State equation: $\alpha_t = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \alpha_{t-1} + \begin{bmatrix} 1 \\ \theta_1 \end{bmatrix} e_t$, and
observation equation: $z_t = \begin{bmatrix} 1 & 0 \end{bmatrix} \alpha_t$.

Note that the observation equation has no error. How do we start the recursions? Recall from the prediction equation that we have to first get $a_{t|t-1}$; that is, for the first period, we need $a_{1|0}$, the initial state vector. From our discussion of the covariance stationarity properties of the state vector, it is clear that

$$a_{1|0} = T a_0 = 0.$$

Next we have to calculate the covariance matrix of the estimation error, i.e. $\sigma^2 P_{1|0}$ or $\sigma^2 P_0$. Though we have a formula to calculate such matrices, for the present problem one can find it directly:

$$P_{1|0} = P_0 = \sigma^{-2} E(\alpha_1 \alpha_1') = \sigma^{-2} E\left\{\begin{bmatrix} z_1 \\ \theta_1 e_1 \end{bmatrix}\begin{bmatrix} z_1 & \theta_1 e_1 \end{bmatrix}\right\} = \begin{bmatrix} 1 + \theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2 \end{bmatrix}.$$

Let us calculate the prediction error for $z_1$. One can easily see that $z_{1|0} = 0$, and hence the associated prediction error is $\nu_1 = z_1$ itself, and the prediction error variance is given as:

$$\text{var}(\nu_1) = \sigma^2 \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} 1 + \theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \sigma^2 (1 + \theta_1^2), \quad \text{with } f_1 = 1 + \theta_1^2.$$

Application of the updating formula:

$$a_1 = a_{1|0} + P_{1|0} y_1 (z_1 - y_1' a_{1|0})/f_1 = \begin{bmatrix} 1 + \theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix}\frac{z_1}{1 + \theta_1^2} = \begin{bmatrix} (1 + \theta_1^2) z_1 \\ \theta_1 z_1 \end{bmatrix}\frac{1}{1 + \theta_1^2} = \begin{bmatrix} z_1 \\ \dfrac{\theta_1 z_1}{1 + \theta_1^2} \end{bmatrix}.$$


Similarly,

$$P_1 = P_{1|0} - P_{1|0} y_1 y_1' P_{1|0}/f_1 = \begin{bmatrix} 1 + \theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2 \end{bmatrix} - \begin{bmatrix} 1 + \theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 1 + \theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2 \end{bmatrix}\frac{1}{1 + \theta_1^2} = \begin{bmatrix} 0 & 0 \\ 0 & \dfrac{\theta_1^4}{1 + \theta_1^2} \end{bmatrix}.$$

Prediction equation for $\alpha_2$:

$$a_{2|1} = T a_1 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} z_1 \\ \dfrac{\theta_1 z_1}{1 + \theta_1^2} \end{bmatrix} = \begin{bmatrix} \dfrac{\theta_1 z_1}{1 + \theta_1^2} \\ 0 \end{bmatrix}.$$

And,

$$P_{2|1} = T P_1 T' + R Q R' = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 0 & \dfrac{\theta_1^4}{1 + \theta_1^2} \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + \begin{bmatrix} 1 & \theta_1 \\ \theta_1 & \theta_1^2 \end{bmatrix} = \begin{bmatrix} \dfrac{\theta_1^4}{1 + \theta_1^2} & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 1 & \theta_1 \\ \theta_1 & \theta_1^2 \end{bmatrix} = \begin{bmatrix} \dfrac{1 + \theta_1^2 + \theta_1^4}{1 + \theta_1^2} & \theta_1 \\ \theta_1 & \theta_1^2 \end{bmatrix}.$$

Predicting $z_2$:

$$z_{2|1} = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} \dfrac{\theta_1 z_1}{1 + \theta_1^2} \\ 0 \end{bmatrix} = \frac{\theta_1 z_1}{1 + \theta_1^2}.$$

Prediction error $\nu_2$:

$$\nu_2 = z_2 - \frac{\theta_1 z_1}{1 + \theta_1^2},$$


and

$$f_2 = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} \dfrac{1 + \theta_1^2 + \theta_1^4}{1 + \theta_1^2} & \theta_1 \\ \theta_1 & \theta_1^2 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1 + \theta_1^2 + \theta_1^4}{1 + \theta_1^2}.$$

These steps show that, for the MA(1) model, one can calculate the prediction error and its variance using the following recursions:

$$\nu_t = z_t - \frac{\theta_1 \nu_{t-1}}{f_{t-1}}, \qquad t = 1, 2, \ldots, T, \quad \text{where } \nu_0 = 0, \text{ and}$$

$$f_t = \frac{1 + \theta_1^2 + \cdots + \theta_1^{2t}}{1 + \theta_1^2 + \cdots + \theta_1^{2(t-1)}}.$$

Note here that the expressions for the prediction error $\nu_t$ and the prediction error variance $f_t$ are exactly the same as those obtained using triangular factorization for the MA(1) model.
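These recursions translate directly into code; a minimal sketch (names ours):

```python
import numpy as np

def ma1_prediction_errors(z, theta1):
    """Prediction errors nu_t and variance factors f_t for an MA(1),
    following the recursions above (nu_0 = 0, so f_0 is immaterial)."""
    nu = np.zeros(len(z) + 1)   # nu[0] is the starting value nu_0 = 0
    f = np.ones(len(z) + 1)     # f[0] unused since nu_0 = 0
    s_prev = 1.0                # partial sum 1 + th^2 + ... + th^(2(t-1))
    for t in range(1, len(z) + 1):
        s = s_prev + theta1 ** (2 * t)
        f[t] = s / s_prev
        nu[t] = z[t - 1] - theta1 * nu[t - 1] / f[t - 1]
        s_prev = s
    return nu[1:], f[1:]
```

For $h = 0$ and the Example 3 matrices, this should agree with the general `kalman_filter` sketch given earlier.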

    -

As a final step towards finalising the likelihood function, we shall note the following further simplification. Recall that we had decomposed the likelihood for a set of dependent observations into a likelihood for the independent errors, using the concept of prediction error decomposition, as:

$$\log L(z) = -\frac{T}{2}\log 2\pi - \frac{T}{2}\log \sigma^2 - \frac{1}{2}\sum_{t=1}^{T}\log f_t - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\nu_t^2/f_t.$$

From our derivation, we can see that the $\nu_t$ and $f_t$ do not depend on $\sigma^2$, and hence we can concentrate $\sigma^2$ out. This means we differentiate the log-likelihood with respect to $\sigma^2$ and get an estimator for $\sigma^2$, say $\tilde{\sigma}^2$. So we get

$$\tilde{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T}\frac{\nu_t^2}{f_t}.$$

Evaluating the log-likelihood at $\sigma^2 = \tilde{\sigma}^2$ and simplifying, we get

$$\log L(z)_c = -\frac{T}{2}\left(\log 2\pi + 1\right) - \frac{1}{2}\sum_{t=1}^{T}\log f_t - \frac{T}{2}\log \tilde{\sigma}^2.$$

We either maximize this log likelihood or, equivalently, minimize

$$\sum_{t=1}^{T}\log f_t + T\log \tilde{\sigma}^2.$$
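Putting the pieces together, the criterion can be evaluated as below (a sketch building on the `ma1_prediction_errors` helper above; the grid search is only illustrative):

```python
import numpy as np

def neg_concentrated_loglik(theta1, z):
    """The criterion sum(log f_t) + T log(sigma2_tilde) for the MA(1)."""
    nu, f = ma1_prediction_errors(z, theta1)
    sigma2 = np.mean(nu ** 2 / f)   # concentrated-out scale estimate
    return np.sum(np.log(f)) + len(z) * np.log(sigma2)

# Illustrative grid search over the invertibility region |theta1| < 1:
# grid = np.linspace(-0.95, 0.95, 191)
# theta_hat = grid[np.argmin([neg_concentrated_loglik(t, z) for t in grid])]
```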


One can make an initial guess about the underlying parameters and either apply numerical estimation procedures to calculate the derivatives or calculate them analytically by differentiating the Kalman recursions. In either case one has to keep in mind the restrictions to be imposed on the parameters, especially the MA parameters, to take care of the identification problem. Also, it has been shown in the literature that using the Kalman recursions to estimate pure AR models is really not necessary.

    -

    Kalman Smoothing

We have motivated the discussion of the Kalman filter so far as an algorithm for predicting the state vector, obtaining exact finite sample forecasts as a linear function of past observations. We have also shown how the resulting prediction error and the prediction error variance can be used to evaluate the log-likelihood.

This is sub-optimal if we are interested in estimating the sequence of states. In many cases, the Kalman filter is used to obtain an estimate of the state vector itself. For example, in their model of the business cycle, Stock and Watson show how one may be interested in knowing the "state of the economy", or the phase of the business cycle the economy is in, which is unobservable at any given historical point. Stock and Watson suggest that comovements in many macro aggregates have a common element, which may be called the state of the economy, and this is unobservable. They motivate the use of the Kalman filter to obtain an estimate of this unobserved state of the economy.

Sometimes elements of the state vector are even interpreted as estimates of missing observations, which could be higher frequency data points extracted from an observable lower frequency series, or simply an estimate of a missing data point. For example, if we have data on a macro aggregate from 1955 through 2014, we may be interested in obtaining an estimate for 1970, which may be missing. Or we may be interested in extracting monthly data from quarterly data.

Such estimates of the unobserved state of the economy or of missing observations can be obtained from smoothed estimates of the state vector, $\alpha_t$.

Each step of the Kalman recursions gives an estimate of the state vector, $\alpha_t$, given all current and past observations. But an econometrician should use all available information to estimate the sequence of states. The Kalman smoother provides these estimates. The smoothed estimator which utilises all the sample observations is given by


$$a_{t|T} = E(\alpha_t | I_T)$$

and the MSE of this smoothed estimate is denoted

$$P_{t|T} = E\left[(\alpha_t - a_{t|T})(\alpha_t - a_{t|T})'\right].$$

The smoothing equations start from $a_{T|T}$ and $P_{T|T}$ and work backwards. The expressions for $a_{t|T}$ and $P_{t|T}$, which may be called the smoothing algorithm, are given below without proof:

$$a_{t|T} = a_t + P_t^{*}\left(a_{t+1|T} - T_{t+1} a_t\right)$$

$$P_{t|T} = P_t + P_t^{*}\left(P_{t+1|T} - P_{t+1|t}\right)P_t^{*\prime}$$

where

$$P_t^{*} = P_t T_{t+1}' P_{t+1|t}^{-1}, \qquad t = T-1, \ldots, 1,$$

with $a_{T|T} = a_T$ and $P_{T|T} = P_T$.

A set of direct residuals can also be obtained from the smoothed estimators:

$$e_t = z_t - y_t' a_{t|T}, \qquad t = 1, \ldots, T.$$

These are not to be confused with the prediction residuals, $\nu_t$, defined earlier.
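The backward pass translates into a short routine; a sketch assuming the filtered and predicted moments have been stored during the forward pass (the function name and storage layout are our choices):

```python
import numpy as np

def kalman_smoother(T, a_filt, P_filt, P_pred):
    """Fixed-interval smoothing via the backward recursions above.

    a_filt (n, m), P_filt (n, m, m): updated a_t, P_t, t = 1..n;
    P_pred (n, m, m): predicted P_{t|t-1}. Returns a_{t|T}, P_{t|T}."""
    n = a_filt.shape[0]
    a_sm, P_sm = a_filt.copy(), P_filt.copy()  # a_{T|T} = a_T, P_{T|T} = P_T
    for t in range(n - 2, -1, -1):
        Pstar = P_filt[t] @ T.T @ np.linalg.inv(P_pred[t + 1])
        a_sm[t] = a_filt[t] + Pstar @ (a_sm[t + 1] - T @ a_filt[t])
        P_sm[t] = P_filt[t] + Pstar @ (P_sm[t + 1] - P_pred[t + 1]) @ Pstar.T
    return a_sm, P_sm
```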

    -

We shall explain the smoothing algorithm with an example. Consider the simple model

$$z_t = \mu_t + \epsilon_t, \qquad \epsilon_t \sim WN(0, \sigma^2)$$

$$\mu_t = \mu_{t-1} + \eta_t, \qquad \eta_t \sim WN(0, \sigma^2 q)$$

where the state, $\mu_t$, and the observation, $z_t$, are scalars. The state, which follows a random walk process, cannot be observed directly as it is contaminated by noise. This is the simple signal plus noise model. We assume that $q$ is known. Also note that in this example we have allowed the observation $z_t$ to be measured with error, $\epsilon_t$. For this example, note that $T = 1$, $R = 1$ and $y_t = 1$.


The prediction equations for this example are

$$a_{t|t-1} = a_{t-1}, \qquad P_{t|t-1} = P_{t-1} + q,$$

and the updating equations are

$$a_t = a_{t|t-1} + P_{t|t-1}\left(z_t - a_{t|t-1}\right)/(P_{t|t-1} + 1)$$

and

$$P_t = P_{t|t-1} - P_{t|t-1}^2/(P_{t|t-1} + 1).$$

We shall demonstrate how to predict, update and smooth with 4 observations: $z_1 = 4.4$, $z_2 = 4.0$, $z_3 = 3.5$ and $z_4 = 4.6$. The initial state has the property $\mu_0 \sim N(a_0, \sigma^2 P_0)$, and we are given $a_0 = 4$, $P_0 = 12$ and $q = 4$, so that $RQR' = 4$ and $h = 1$.

From the prediction equations we have $a_{1|0} = 4$ and $P_{1|0} = 16$, so that from the updating equations we have

$$a_1 = 4 + (12 + 4)(4.4 - 4)/(12 + 4 + 1) = 4.376$$

and

$$P_1 = 16 - 16^2/17 = 0.941.$$

Since $y_t = 1$ in the measurement equation for all $t$, the MMSE of $z_t$ is always $a_{t|t-1}$. So $z_{2|1} = a_{2|1} = a_1 = 4.376$.

Repeating the calculations for $t = 2, 3$ and $4$, we get the following results:

Smoothed estimators and residuals

t          1       2       3       4
z_t        4.4     4.0     3.5     4.6
a_t        4.376   4.063   3.597   4.428
P_t        0.941   0.832   0.829   0.828
nu_t       0.400  -0.376  -0.563   1.003
a_{t|T}    4.306   4.007   3.739   4.428
P_{t|T}    0.785   0.710   0.711   0.828
e_t        0.094  -0.007  -0.239   0.172


From the above table we also have: $a_{2|1} = 4.376$, $P_{2|1} = 4.941$, $a_{3|2} = 4.063$, $P_{3|2} = 4.832$, $a_{4|3} = 3.597$ and $P_{4|3} = 4.829$. From the table, the final estimates are seen to be $a_4 = 4.428$ and $P_4 = 0.828$.

These values can now be used in the smoothing algorithm, which for the current example reduces to

$$a_{t|T} = a_t + (P_t/P_{t+1|t})\left(a_{t+1|T} - a_t\right)$$

$$P_{t|T} = P_t + (P_t/P_{t+1|t})^2\left(P_{t+1|T} - P_{t+1|t}\right), \qquad t = T-1, \ldots, 1.$$

Since $a_{4|4} = a_4$ and $P_{4|4} = P_4$, we can apply the smoothing algorithm to obtain the smoothed estimates $a_{3|4}$ and $P_{3|4}$ and work backwards. So we have

$$a_{3|4} = 3.597 + (0.829/4.829)(4.428 - 3.597) = 3.739$$

$$P_{3|4} = 0.829 + (0.829/4.829)^2(0.828 - 4.829) = 0.711.$$

The rest of the smoothed estimates are displayed in the table above; the smoothed estimates of the unobserved state are given by the row $a_{t|T}$. Both the direct and the prediction error residuals have been calculated using the formulae $e_t = z_t - a_{t|T}$ and $\nu_t = z_t - a_{t-1}$ respectively.
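The whole worked example can be replicated in a few lines; a sketch using the constants given above (the variable names are our own):

```python
# Signal-plus-noise example: q = 4, h = 1, a_0 = 4, P_0 = 12,
# scalar state with T = R = y = 1, observations as in the text.
z = [4.4, 4.0, 3.5, 4.6]
a, P = 4.0, 12.0
a_filt, P_filt, P_pred, nu = [], [], [], []
for zt in z:
    ap, Pp = a, P + 4.0                    # prediction equations
    nu.append(zt - ap)                     # prediction error
    a = ap + Pp * (zt - ap) / (Pp + 1.0)   # updating equations
    P = Pp - Pp ** 2 / (Pp + 1.0)
    a_filt.append(a); P_filt.append(P); P_pred.append(Pp)

# backward smoothing pass: a_{t|T} = a_t + (P_t/P_{t+1|t})(a_{t+1|T} - a_t)
a_sm, P_sm = a_filt[:], P_filt[:]
for t in (2, 1, 0):
    Pstar = P_filt[t] / P_pred[t + 1]
    a_sm[t] = a_filt[t] + Pstar * (a_sm[t + 1] - a_filt[t])
    P_sm[t] = P_filt[t] + Pstar ** 2 * (P_sm[t + 1] - P_pred[t + 1])

print([round(x, 3) for x in a_filt])  # [4.376, 4.063, 3.597, 4.428]
print([round(x, 3) for x in a_sm])    # [4.306, 4.008, 3.739, 4.428]
```

The output matches the table up to rounding of the intermediate values.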


    Appendix

    Derivation of updating equations

In this appendix we shall derive the important steps leading to the updating equation and the associated variance matrix of the estimation error. Before discussing the steps involved, we digress briefly into the following important material.

1. Consider the model:

$$\underset{(T\times 1)}{Z} = \underset{(T\times m)}{Y}\;\underset{(m\times 1)}{\beta} + \underset{(T\times 1)}{N}, \qquad N \sim (0, \sigma^2 \Omega).$$

We shall call this model the sample information.

(a) Case 1: If $\beta$ is fixed in the above model, we have the usual GLS estimator, given as $(Y'\Omega^{-1}Y)^{-1}Y'\Omega^{-1}Z$, and this would be BLUE.

(b) Case 2: Suppose the vector $\beta$ is either partially or fully random (stochastic). The question now is: is the GLS estimator still BLUE? The answer is that it still is, according to the extended Gauss-Markov theorem enunciated by Duncan and Horn (JASA, 1972, pp. 815-21). They proved that the GLS estimator now satisfies the condition of being the best linear unconditionally unbiased (u-unbiased) estimator. [An estimator is u-unbiased if its estimation error has expectation zero.]

(c) Case 3: Suppose that $\beta$ is still fully or partially random, and additionally suppose that we have some prior information about it. How can we use it to update the estimator of $\beta$ already obtained? This becomes a special case of the mixed estimation procedure developed by Theil and Goldberger (see Theil, Principles of Econometrics, pp. 347-52), where we incorporate such prior information with the sample information. Suppose in our case the prior information is given in the form below:

$$(\beta_0 - \beta) \sim (0, \sigma^2 P_0),$$

where $\beta_0$ is a known vector and $P_0$ is a known positive definite matrix. Then, to get an updated estimator that combines this prior information with the sample information, we first construct the augmented model:

$$\begin{bmatrix} \beta_0 \\ Z \end{bmatrix} = \begin{bmatrix} I \\ Y \end{bmatrix}\beta + \begin{bmatrix} \beta_0 - \beta \\ N \end{bmatrix}$$


More concisely,

$$Z^{*} = Y^{*}\beta + N^{*}, \qquad E(N^{*}) = 0, \qquad E(N^{*}N^{*\prime}) = \sigma^2 V = \sigma^2\begin{bmatrix} P_0 & 0 \\ 0 & \Omega \end{bmatrix}.$$

Using the extended Gauss-Markov theorem, we have the estimator of $\beta$ given as:

$$\hat{\beta} = \left(Y^{*\prime}V^{-1}Y^{*}\right)^{-1}Y^{*\prime}V^{-1}Z^{*}.$$

Using the original notation, this can be re-written as:

$$\hat{\beta} = P^{*}\left(P_0^{-1}\beta_0 + Y'\Omega^{-1}Z\right), \qquad \text{where } P^{*} = \left(P_0^{-1} + Y'\Omega^{-1}Y\right)^{-1}.$$

$\hat{\beta}$ is now the updated MMSE of $\beta$, with

$$(\hat{\beta} - \beta) \sim (0, \sigma^2 P^{*}).$$

We are going to use this principle of combining sample information and prior information in deriving the updating equation of the KF recursion.

Updating the state vector

The role of the updating equation is to incorporate the new information in $z_t$, the moment we are at time $t$, with the information already available in the estimator $a_{t|t-1}$. This problem is directly analogous to the one we discussed under the extended Gauss-Markov theorem and Theil's mixed estimation procedure, where prior information was combined with the sample information. For our case, the prior information is in

$$\left(a_{t|t-1} - \alpha_t\right) \sim \left(0, \sigma^2 P_{t|t-1}\right),$$

while the sample information is derived from the measurement equation. Thus the augmented model is:

$$a_{t|t-1} = \alpha_t + \left(a_{t|t-1} - \alpha_t\right)$$

$$z_t = y_t'\alpha_t + N_t.$$

In matrix notation,

$$\begin{bmatrix} a_{t|t-1} \\ z_t \end{bmatrix} = \begin{bmatrix} I \\ y_t' \end{bmatrix}\alpha_t + \begin{bmatrix} a_{t|t-1} - \alpha_t \\ N_t \end{bmatrix}.$$

The disturbance term has zero expectation and covariance matrix

$$E\left\{\begin{bmatrix} a_{t|t-1} - \alpha_t \\ N_t \end{bmatrix}\begin{bmatrix} a_{t|t-1} - \alpha_t \\ N_t \end{bmatrix}'\right\} = \sigma^2\begin{bmatrix} P_{t|t-1} & 0 \\ 0 & h \end{bmatrix}.$$


More precisely,

$$Z_t^{*} = Y_t^{*}\alpha_t + e_t^{*},$$

where $E(e_t^{*}) = 0$ and $E(e_t^{*}e_t^{*\prime}) = \sigma^2 V$, with

$$V = \begin{bmatrix} P_{t|t-1} & 0 \\ 0 & h \end{bmatrix}.$$

Now, using the extended Gauss-Markov theorem, we can write

$$a_t = \left(Y_t^{*\prime}V^{-1}Y_t^{*}\right)^{-1}Y_t^{*\prime}V^{-1}Z_t^{*}.$$

Using the original notation, we can re-write the expression for $a_t$ as follows:

$$a_t = P_t\left(P_{t|t-1}^{-1}a_{t|t-1} + y_t z_t/h\right)$$

where

$$P_t = \left(P_{t|t-1}^{-1} + y_t y_t'/h\right)^{-1}.$$

Thus

$$(a_t - \alpha_t) \sim \left(0, \sigma^2 P_t\right).$$

The updating formula can be put in a different way using a matrix inversion lemma. The advantage of such an adjustment is that we don't have to invert any matrix in the updating equations.

Lemma: For any $(n \times n)$ matrix $D$ defined by

$$D = \left(A + BCB'\right)^{-1},$$

where $A$ and $C$ are non-singular matrices of order $n$ and $m$ respectively and $B$ is $(n \times m)$, we have:

$$D = A^{-1} - A^{-1}B\left(C^{-1} + B'A^{-1}B\right)^{-1}B'A^{-1}.$$

We can use this lemma on the expression for $P_t$ by noting that $P_t = D$, $P_{t|t-1}^{-1} = A$, $y_t = B$ and $C = h^{-1}$, and it follows that

$$P_t = P_{t|t-1} - P_{t|t-1}y_t y_t'P_{t|t-1}/f_t, \qquad \text{where } f_t = y_t'P_{t|t-1}y_t + h.$$
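A quick numerical sanity check of the lemma (our own illustration; the random symmetric positive definite matrices and dimensions are arbitrary choices):

```python
import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(1)
n, m = 4, 2
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)  # non-singular
G = rng.standard_normal((m, m)); C = G @ G.T + m * np.eye(m)
B = rng.standard_normal((n, m))

lhs = inv(A + B @ C @ B.T)
rhs = inv(A) - inv(A) @ B @ inv(inv(C) + B.T @ inv(A) @ B) @ B.T @ inv(A)
assert np.allclose(lhs, rhs)
```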


One can make it even more compact by writing

$$a_t = \left(P_{t|t-1} - P_{t|t-1}y_t y_t'P_{t|t-1}/f_t\right)\left(P_{t|t-1}^{-1}a_{t|t-1} + y_t z_t/h\right)$$

$$= a_{t|t-1} + P_{t|t-1}y_t\left(z_t/h - y_t'a_{t|t-1}/f_t - y_t'P_{t|t-1}y_t z_t/(f_t h)\right)$$

$$= a_{t|t-1} + f_t^{-1}P_{t|t-1}y_t\left(z_t f_t/h - y_t'a_{t|t-1} - y_t'P_{t|t-1}y_t z_t/h\right).$$

Substituting $f_t - h = y_t'P_{t|t-1}y_t$ in the above term and re-arranging, we get

$$a_t = a_{t|t-1} + P_{t|t-1}y_t\left(z_t - y_t'a_{t|t-1}\right)/f_t.$$

Note that the expressions for $a_t$ and $P_t$ in this appendix are exactly the ones we used as the updating equation and the variance matrix of the estimation error, respectively, in the main text.

Note also that in the discussion so far we have assumed the presence of an additional noise in the measurement equation; that is, $h > 0$. If we don't, then note that $V$ would become singular. But we also have to note that, in our examples of state space representation of ARMA models, we have assumed that the measurement equation has no additional error. This should not matter, however, since through these adjustments we have isolated the variance component as an additive scalar, which, when it becomes zero, does not affect our calculations.