time series models with a common stochastic variance for

Time series models with a common stochastic variance

for analysing economic time series

Siem Jan Koopman∗ and Charles S. Bos †

Department of Econometrics, Free University Amsterdam

and Tinbergen Institute Amsterdam

October 21, 2002

Abstract

The linear Gaussian state space model for which the common variance is treated as a

stochastic time-varying variable is considered for the modelling of economic time series. The

focus of this paper is on the simultaneous estimation of parameters related to the stochastic

processes of the mean part and the variance part of the model. The estimation method

is based on maximum likelihood and it requires the subsequent uses of the Kalman filter

to treat the mean part and sampling techniques to treat the variance part. This approach

leads to the evaluation of the exact likelihood function of the model subject to simulation

error. The standard asymptotic properties of maximum likelihood estimators apply as a

result. A Monte Carlo study is carried out to investigate the small-sample properties of the

estimation procedure. We present two illustrations which are concerned with the modelling

and forecasting of two U.S. macroeconomic time series: inflation and industrial production.

Keywords: Autoregressive integrated moving average; Importance sampling;

Kalman filter; Monte Carlo simulation; Simulation smoothing;

State space; Stochastic volatility; Unobserved components time

series.

JEL classification: C15, C32, C51, E23, E31

∗Department of Econometrics, De Boelelaan 1105, NL-1081 HV Amsterdam. Email: [email protected].†visiting Nuffield College, Oxford OX1 1NF, UK. Email: [email protected].

1 Introduction

The linear Gaussian state space model has become one of the standard modelling frameworks for

the empirical analysis of economic time series. It is also used for forecasting and signal extraction

in other fields such as engineering (from where it originates), statistics and empirical finance. The

role of the state space framework for modelling macroeconomic time series and analysing business

cycles is discussed in the textbooks of Harvey (1989), Hamilton (1994) and Kim and Nelson (1999).

Other discussions of state space approaches to time series analysis can be found in the books of

Brockwell and Davis (1987) and Shumway and Stoffer (2000). A recent account of the statistical

analysis of the linear Gaussian state space model together with several non-Gaussian and nonlinear

extensions is given by Durbin and Koopman (2001).

We propose a moderate generalisation of the standard linear Gaussian model: the common

variance of the state space model is allowed to be stochastic and time-varying. Let us consider the

stochastic local level model which is the simplest example of a state space model and is given by

yt = αt + εt, αt+1 = αt + ξt, t = 1, . . . , n, (1)

with observation yt and unobservable level αt which is modelled as a so-called random walk process.

The disturbances are normally distributed with

εt ∼ N (0, σ2), ξt ∼ N (0, σ2q2), t = 1, . . . , n,

where variance σ2 > 0 and signal-to-noise ratio q ≥ 0 are fixed and unknown. Further they are

serially and mutually uncorrelated for t = 1, . . . , n. For simplicity we assume that the random

walk process is initialised by a fixed and known value for α1. The forecast function of the local

level model is the exponentially weighted moving average recursion where the discount coefficient

only depends on the signal-to-noise ratio q. The common variance is represented by σ2. The

novelty is introduced by relaxing the assumption of σ2 being fixed and instead allowing it to be

stochastic and time-varying. Specifically, we replace σ2 by σ2t for which a time series process will

be formulated. For example, an autoregressive model for log σ2t can be taken with the specification

σ2t = exp(ht), ht+1 = (1− φ)d+ φht + σηηt, t = 1, . . . , n, (2)

with constant d = E(ht), autoregressive parameter |φ| < 1 and standard deviation ση > 0. The

disturbances ηt ∼ N (0, 1) are assumed to be serially uncorrelated and mutually uncorrelated with

the local level model disturbances εt and ξt at all time points. The initial value of the autoregressive

process is distributed as h1 ∼ N{d, σ2η/(1− φ2)}. A special case of the local level model (1) with

the common stochastic variance specification (2) is obtained by imposing the restriction q = 0 (so

αt+1 = αt = α) which reduces (1) and (2) to the basic stochastic volatility (SV) model which is

considered by, for example, Taylor (1994), Harvey, Ruiz, and Shephard (1994), Danielson (1994),

2

Jacquier, Polson, and Rossi (1994), Shephard and Pitt (1997) and Sandmann and Koopman

(1998). In this paper, however, we show that the stochastic level αt of the model with q > 0 can

be treated simultaneously with the common stochastic variance σ2t in a statistical analysis based

on maximum likelihood estimation.

Related models for this class of linear dynamic models with a common stochastic variance have

been proposed in the literature. An example is the contribution of Shephard (1994) that discusses

simultaneous inference of parameters related to stochastic mean and variance equations. The so-

called local scale model of Shephard does not consider the common variance as stochastic but it

treats the measurement variance as a stochastic variable. Further, the stochastic specification of

the variance is based on the gamma-beta transition model and the estimation method is different

from the maximum likelihood method presented in this paper. Other related contributions are

presented by Nabeya and Tanaka (1988) who consider a linear regression model with the constant

replaced by a random walk process and with the variances replaced by a deterministically time-

varying common variance for both the regression disturbance and the random walk innovation,

by Engle and Smith (1999) who present a model where the Stochastic Permanent Break (STOP-

BREAK) process mingles transitory shocks and permanent shifts randomly in a local level model

with a single GARCH error process, and by Bos, Mahieu, and van Dijk (2000) who carry out a

Bayesian analysis of a local level model with different stochastic and deterministic specifications

of variance processes for exchange rates.

Our proposed generalisation of the linear Gaussian state space model is partly motivated by

the fact that economic time series such as inflation, interest rates, production indices and other

monetary and financial series can be subject to time-varying heteroskedasticity that is potentially

difficult to model explicitly. A standard solution for dealing with heteroskedasticity in econometrics

is to specify a deterministic function of an exogenous variable, for example,

σ2t = σ2 exp(xt), t = 1, . . . , n,

where xt is an exogenous variable. More generally, a linear equation of explanatory variables can

be used to obtain

σ2t = exp(γ0 + γ1x1,t + . . .+ γkxk,t), t = 1, . . . , n,

where γ0, . . . , γk are fixed and unknown coefficients and x1,t, . . . , xk,t are exogenous explanatory

variables. The coefficients can be estimated by numerically maximising the loglikelihood function

of the underlying model. In practice it is hard to select appropriate exogenous variables that can

give an adequate description of the heteroskedasticity. Furthermore this solution is clearly not

practical for forecasting when no future values of explanatory variables are available.

The inclusion of a common stochastic variance can be regarded as a limited extension of the

local level model since the stochastic model for the common variance applies to all variances

3

associated with the disturbances of the local level model. However, the proposed generalisation

may be interesting for the following reasons. Firstly, many models used in practice including

the autoregressive integrated moving average model and the dynamic regression model have state

space representations with a single disturbance variable. In such cases the supposed restriction

does not exist. This also applies to the local level model (1) with a single disturbance (that is,

ξt = qεt for t = 1, . . . , n) as proposed by Ord, Koehler, and Snyder (1997). Secondly, when

models with multiple disturbances are considered such as the local level model (1), it may not be

straightforward to attribute the heteroskedasticity to a particular disturbance in practice. In such

cases it is reasonable to let the common variance of the model to be stochastic and time-varying.

The state space model with a common stochastic variance poses an estimation problem since

the loglikelihood function is not tractable by linear methods such as the Kalman filter due to

the nonlinearities caused by the stochastic variance equation. Similar considerations apply to

stochastic volatility models. Advanced but practical simulation methods have been developed

to compute the loglikelihood function for such models; see, for example, the references to the

stochastic volatility literature given earlier in this section. However, in the case of stochastic

volatility models, estimation is only considered for parameters related to the stochastic variance

and the deterministic mean equations. In this paper we consider the simultaneous estimation of

parameters related to both the stochastic mean and common stochastic variance equations by the

method of maximum likelihood. The key to the development in this paper is the notion that we

can isolate the common variance in the statistical treatment of the mean equation. Subsequently,

the common variance can then be treated as a stochastic variable and modelled separately as a

result. In the case of the local level model with a common stochastic variance, the new results

imply that we can consider the case of q > 0, rather than q = 0, and we are able to estimate q

simultaneously with parameters φ and σξ associated with the model of σt.

The extension of this model lead to a nonlinear ”weighted” Kalman filter in which the weights

associated with the observations yt are determined by the model for the common stochastic vari-

ance. This becomes clear when we write the local level model (1) with a common stochastic

variance as

yt = σt(α∗t + ε∗t ), α∗

t+1 = α∗t + ξ∗t ,

where α∗t = αt/σt, ε∗t = εt/σt and ξ∗t = ξt/σt such that ε∗t ∼ N (0, 1) and ξ∗t ∼ N (0, q2). Thus the

local level model for the weighted observations yt/σt has the constant q as the signal-to-noise ratio

and therefore the same forecast function as for model (1) with constant variances. However, the

loglikelihood function for observation y1, . . . , yn generated by model (1) with common stochastic

variance (2) depends on σt for t = 1, . . . , n. The estimation of q is therefore subject to the model

specification for σt. Further this leads to observation weights of forecast functions that change

stochastically over time as a function of σt. The same conclusion applies to observation weights

4

of estimated components such as for the level component αt

The remaining paper is organised as follows. The next section introduces the general model and

discusses some examples of models of interest in economics, finance and other fields of empirical

research. The estimation methodology is discussed in section 3. A Monte Carlo study is carried

out to investigate the small sample properties of the estimation method and the encouraging

results of this study are reported in section 4. In an empirical illustration in section 5 we show

that a joint stochastic model for the mean and the variance equations may provide a solid basis

for analysing monthly time series of U.S. inflation and of U.S. industrial production. The final

section discusses the contribution of this paper and it discusses further extensions which we regard

as future research.

2 Time series model with a common stochastic variance

2.1 General specification

The linear Gaussian state space model can be formulated with the inclusion of a common fixed

variance σ2, that is

yt = ct + Ztαt +Gtεt, εt ∼ N (0, σ2Irε), σ2 > 0,

αt+1 = Ttαt +Htεt, t = 1, . . . , n,(3)

where yt is a p× 1 vector of observations, ct is a p× 1 vector of fixed effects, αt is a mα × 1 vector

of unobserved states and εt is a rε × 1 vector of disturbances. The system matrices Zt, Tt, Gt

and Ht are assumed to be fixed for all time points t = 1, . . . , n. Unknown elements of the system

matrices will be treated as parameters to be estimated by the method of maximum likelihood.

The fixed effects vector ct is given by ct = c+Xc,tδc where c is a p×1 vector of constants while the

regression coefficients associated with the p×kc matrix Xc,t of explanatory variables are collected

in the kc × 1 vector δc. Initially we assume that α1 ∼ N (a1, σ2P1) where a1 and P1 are known;

later we will consider cases where elements of α1 are generated by a diffuse density. Similar state

space formulations are discussed in De Jong (1989).

The first equation is the observation equation in which the p×mα matrix Zt selects (or weights)

the appropriate elements of the state vector αt relevant for elements in yt. Similarly, the p × rε

matrix Gt selects the appropriate elements of the disturbance vector εt. The second equation is

the state equation with the mα ×mα transition matrix Tt and the mα × rε disturbance selection

matrix Ht. For many practical time series models the system matrices are time-invariant.

The time series state space model (3) can be extended by replacing σ2 with the time-varying

stochastic variable σ2t so that

εt ∼ N (0, σ2t Ir), σ2

t = exp (ht) , t = 1, . . . , n, (4)

5

where ht is a scalar and treated as the common log-variance of the model. The time series process

of the stochastic variable ht is formulated by the state space representation

ht = dt +Atβt + Ctηt, ηt ∼ N (0, Irη),

βt+1 = Btβt +Dtηt, t = 1, . . . , n,(5)

where scalar dt is a fixed effect, βt is an mβ × 1 vector of unobserved states for the log-variance

equation and ηt is the rη × 1 vector of disturbances. The log-variance system vectors At and Ct

and system matrices Bt and Dt are assumed to be fixed for t = 1, . . . , n. In section 3 we will show

how unknown elements in these system matrices can be estimated by the method of maximum

likelihood. The fixed effect dt can be modelled in the same way as ct but with specification

dt = d+Xd,tδd where d is a constant, Xd,t is a 1×kd vector of explanatory variables and δd is the

kd×1 regression coefficient vector. The initial log-variance state vector is given by β1 ∼ N (b1, Q1)

where initially b1 and Q1 are assumed known.

2.2 Gaussian linear model with fixed time-varying variances

Let us first consider the time series model (3) where the common variance is deterministic, that is

σ2t = exp(d+Xd,tδd) for t = 1, . . . , n. A special case of the general model is the regression model

that is considered by Nabeya and Tanaka (1988) and is given by

yt = Xc,tδc + σt(µt + εt), εt ∼ N (0, σ2ε),

µt+1 = µt + ξt, ξt ∼ N (0, σ2ξ ),

(6)

with stochastically time-varying constant µt and deterministically time-varying common standard

deviation σt for t = 1, . . . , n. They used this model for developing asymptotic tests for parameter

constancy over time. This special case of the linear Gaussian state space model can be analysed

using the Kalman filter and the associated methods of maximum likelihood estimation, diagnostic

checking, signal extraction and forecasting; see, for example, Harvey (1989), Shumway and Stoffer

(2000) and Durbin and Koopman (2001, Part I). These treatments also allow for cases where other

elements of the system matrices vary over time deterministically.

Model (6) for which the common variance σ2 is allowed to vary over time is considered in this

paper and maximum likelihood estimation of its parameters is treated generally in section 3.

2.3 Stochastic volatility model

Another special case of the general model (3), (4) and (5) is obtained by taking αt = 0 such that

the mean specification is fixed and that only the variance specification of the model is stochastic.

The restriction leads to the specification of a stochastic volatility (SV) model of the form

yt = ct + σεεt, εt ∼ N (0, σ2t ), σ2

t = exp(ht), t = 1, . . . , n, (7)

6

where σε replaces Gt in equation (3). The log-volatility ht can be modelled as the stationary

autoregressive process (2) that we have as a special case of (5) with dt = (1−φ)d, At = 1, Bt = φ,

Ct = 0, Dt = ση > 0, b1 = d, Q1 = σ2η/(1 − φ2) and with |φ| < 1. Various contributions have

appeared in the literature for estimating SV models by maximum likelihood using importance

sampling techniques; see Danielson (1994), Shephard and Pitt (1997), Durbin and Koopman

(1997), Sandmann and Koopman (1998) and Durham and Gallant (2002). Other methods of

inference have also been considered in the context of SV models including Bayesian methods,

e.g. Kim, Shephard, and Chib (1998), and efficient method of moments procedures, e.g. Gallant,

Hsieh, and Tauchen (1997).

2.4 ARMA models with stochastic variances

It is well-known that the autoregressive moving average (ARMA) model can be represented in

state space. The general model allows the ARMA(p∗, q∗) model to have a stochastic variance

specification, that is

yt = ϕ1yt−1 + . . .+ ϕp∗yt−p∗ + εt + θ1εt−1 + . . .+ θq∗εt−q∗ , εt ∼ N (0, σ2t ), (8)

for t = 1, . . . , n and with orders p∗ ≥ 0 and q∗ ≥ 0. The log-variance ht = log σ2t can be

modelled by an autoregressive process such as in (2) or by any other time series process that can

be represented in the general state space form (5).

2.5 Unobserved components models with common stochastic variances

The local level model (1) is a special case of the unobserved components time series model that

consists of the level αt and the irregular εt as the two unobservables. More elaborate specifications

within this class of time series models are discussed in Harvey (1989). The so-called basic structural

time series model consists of trend, seasonal and irregular components. This model with a common

stochastic variance is given by

yt = µt + γt + q1ε1t,

µt+1 = µt + βt + q2ε2t,

βt+1 = βt + q3ε3t,

S(L)γt+1 = q4ε4t,

(9)

for t = 1, . . . , n and where S(L) is the seasonal sum operator 1+L+ . . .+Ls−1 for seasonal length

s and lag operator L such that Lyt = yt−1. Variable εit is the i-th element of the disturbance

vector εt which is distributed as in (4) and with the common log-variance given by (5), for i =

1, . . . , 4. The unknown fixed coefficients q1, . . . , q4 are unknown and can be estimated by maximum

7

likelihood. The variances are uniquely identified in this model provided that the common log-

variance ht follows a stationary process with zero mean and some constant variance σ2h. In this

case we note that the variance of the ith disturbance is q2i σ2t with unconditional expectation

q2i exp{2E(ht) + var(ht)} = q2i expσ2h,

for i = 1, . . . , 4. When ht is modelled as a nonstationary process, the restriction of the zero mean

for this process is replaced by the restriction that h0 = 0 is fixed.

The state space representation of the mean equation of the model (9) for quarterly data is

based on the 5× 1 state vector

αt =(µt βt γt γ2t γ3t

)′,

where γ2t and γ3t are auxiliary variables required for the formulation of the seasonal component

in state space. Other specifications for the components, in particular the seasonal component, can

be considered and they are discussed in detail by Harvey (1989). We finally note that the initial

state vector is diffuse and requires specific modifications for the analysis which are discussed in

section 3.5.

2.6 Single source of error models

Ord, Koehler, and Snyder (1997) consider unobserved components models such as the ones dis-

cussed in the previous section but with ε1t = ε2t = ε3t = ε4t; they refer to this class of models

as single source of error models. Nonlinear extensions of this class of models are also considered.

They argue that the single source of error models cover a larger extent of the parameter space

in the stationary representation of the model and that it therefore will produce more accurate

model-based forecasts. In a general treatment of time series models with correlated disturbances,

Harvey and Koopman (2000) argue that single source of error models (or models with perfectly

correlated disturbances) are not appropriate for signal extraction. It is noted that the common

stochastic variance discussed in this paper apply naturally to the single source of error models.

A specific example of a nonlinear single source of error model is discussed by Engle and Smith

(1999) and is known as the stochastic permanent break (STOPBREAK) model. In its simplest

form the model is given by

yt = µt + qtεt, µt+1 = µt + εt, t = 1, . . . , n, (10)

where qt is a function of past realisations of εt. The STOPBREAK model can be casted into

the general specification given earlier in section 2.1 when qt is deterministic and not a function of

lagged εt’s and when σ2 is replaced by the common stochastic variance σ2t as modelled by (4) and

(5).

8

3 Estimation by maximum likelihood

In this section we develop a method for computing the Monte Carlo estimate of the loglikelihood

function for state space models with common stochastic variance. In the standard situation, with

deterministic variances, the density p(y|ψ) of the data can be computed through a prediction-error

decomposition (see section 3.1). The loglikelihood is then the sum of the individual contributions

of the prediction errors vt. With stochastic variance however, the Kalman equations cannot be

used for extracting the vt’s as the variances are not available in the filtering equations. Only

after conditioning on the common stochastic variance can the loglikelihood be computed from the

output of the Kalman filter. Importance sampling is used to adjust for the bias introduced by

conditioning, as explained in subsequent sections.

3.1 Kalman filter

Consider model (3), (4) and (5) and define vector ψ as the collection of unknown elements

of the system matrices for the mean and variance equations. Given a realised sequence for

σ∗ = (σ1, . . . , σn)′, the joint density of the observations y = (y′1, . . . y′n)′ can be obtained via

the prediction error decomposition. In particular, the conditional logdensity of y is given by

log p(y|σ∗;ψ) = −nN2

log 2π − 12

n∑t=1

(log |σ2

tFt|+ σ−2t v′tF

−1t vt

), (11)

where vt is the vector of one-step ahead prediction errors and σ2tFt is its variance matrix; see

Schweppe (1965) and Harvey (1989). Both vt and Ft are produced by the Kalman filter as given

by

vt = yt − ct − Ztat, Ft = ZtPtZ′t +GtG

′t,

Kt = TtPtZ′tF

−1t , (12)

at+1 = Ttat +Ktvt, Pt+1 = TtPt(Tt −KtZt)′ +HtH′t,

for t = 1, . . . , n and where a1 and P1 are known values and represent the unconditional mean

and variance of the initial state vector, respectively. The vector at is the linear estimator or

predictor of the state αt conditional on {y1, . . . , yt−1} with variance matrix Pt, for t = 1, . . . , n.

The intermediate matrix Kt is known as the Kalman gain. Proofs of the Kalman filter are given by

Anderson and Moore (1979) and Durbin and Koopman (2001, §4.2.1). We note that the Kalman

filter equations as given by (12) do not depend on σ∗. This implies that vt and Ft do not change

when σ∗ vary for t = 1, . . . , n.

The innovations vt can be regarded as linear transformations of y for a given ψ; see Harvey

(1989). The consequence of the well-known Gaussian properties is that the innovation vectors are

9

also normally distributed and given by

vt ∼ N (0, σ2tFt), t = 1, . . . , n. (13)

It is further noticed that the Jacobian of the transformation from y to v = (v′1, . . . v′n)′ is unity.

Therefore the density of y is the same as the density of v, that is

p(y|σ∗;ψ) = p(v|σ∗;ψ),

where we can take v as being generated by the Gaussian model (13) for given σ∗.

The key to the simultaneous estimation of parameters related to the stochastic mean and

variance equations of model (3)–(5) is the result that the statistical properties of the innovations

can be summarised by (13). This shows that σ2t is the common denominator of the variance matrix

of vt. When σ2t is the constant σ2 in model (3), it is a well-known fact that σ2 can be concentrated

out of the loglikelihood function and that equation (13) holds for σ2t = σ2; see Harvey (1989). The

maximum likelihood estimate of σ2 is then simply given by

σ2 =1n

n∑t=1

v′tF−1t vt.

3.2 Importance sampling

When σt follows a stochastic process as given by (4) and (5) we can not express the Gaussian

density p(y;ψ) in analytical terms. This also applies to density p(v;ψ) associated with model

(13). The difficulty is well recognised in the econometric and statistical literature on stochastic

volatility models; see, for example, the overview articles by Ghysels, Harvey, and Renault (1996)

and Shephard (1996). We note that model (13) with σt modelled as (2) can be regarded as the SV

model for the innovations vt. A standard tool in evaluating non-tractable densities is importance

sampling and is applied to SV models by Danielson (1994), Shephard and Pitt (1997), Sandmann

and Koopman (1998) and Durham and Gallant (2002). The same approach will be taken in this

paper. The technique involves approximating the solution of the density via averages of simulations

from an approximating model. Importance sampling was first used in econometrics by Kloek and

Van Dijk (1978) in their work on computing posterior densities. Geweke (1989) proposes to use

the Lindeberg-Levy central limit theory in order to assess the accuracy of the importance sampler.

Koopman and Shephard (2002) discuss practical procedures based on extreme value theory for

empirically testing the asymptotic normality of the importance sampling estimator.

Given the innovations and their scaled variance matrices (both can be computed without

observing σ∗), we express the density of the observations as

p(y;ψ) =∫p(y|σ∗;ψ)p(σ∗;ψ)dσ∗

=∫p(v|σ∗;ψ)p(σ∗;ψ)dσ∗, (14)

10

where p(v|σ∗;ψ) is the density of model (13) as given by (11) for a realised value of σ∗. Given a

realised value of y, the evaluation of the resulting likelihood function (14) via importance sampling

is based on an importance sampling density g(σ∗|v;ψ) from which it is relatively easy to simulate

from. For considerations of feasibility, the importance density g(·) is chosen to be close to the

density p(·). The observation density associated with the true model can then be represented by

p(y;ψ) =∫p(v|σ∗;ψ)p(σ∗;ψ)

g(σ∗|v;ψ)g(σ∗|v;ψ)dσ∗, (15)

which we approximate via importance sampling. This requires generating Monte Carlo simulations

from the importance density g(σ∗|v;ψ) to obtain the estimator

p(y;ψ) =1M

M∑i=1

p(v|σ∗(i);ψ)p(σ∗(i);ψ)g(σ∗(i)|v;ψ)

, (16)

where σ∗(i) is a realisation from the importance density g(σ∗|v;ψ). The Monte Carlo approxima-

tion (16) of the true density (15) will be based on a Gaussian importance density g(σ∗|y;ψ) in

this paper. We note that the estimator (16) is only subject to simulation error.

The importance density g(σ∗|y;ψ) = g(σ∗|v;ψ) for model (13) can be based on the approxi-

mating linear Gaussian state space model

vt = ht + ut, ut ∼ NID(rt, st), t = 1, . . . , n, (17)

where vt is obtained from the time series y1, . . . , yn by the Kalman filter as described in §3.1

and where ht is given by the model specification for ht = log σ2t as in (5). The time-varying

mean rt and variance st are identified via a recursive procedure for which the details are given

by Shephard and Pitt (1997) and Durbin and Koopman (2001). Generating simulations from the

importance density g(σ∗|v;ψ) are in effect based on the approximating model (17) and is referred

to as simulation smoothing. Various simulation smoothing methods exist to compute conditional

draws from the linear Gaussian state space model. A recent account of simulation smoothing

together with the development of a new, simple and efficient device for this is given by Durbin

and Koopman (2002).

In practice, focus is on the logdensity and in the case of a Gaussian importance density it can

be shown that

log p(y;ψ) = log g(v;ψ) + log

[1M

M∑i=1

p(v|σ∗(i);ψ)g(v|σ∗(i);ψ)

]+O(M−3/2), (18)

where log g(v;ψ) is the logdensity of the approximating model that can be evaluated by the Kalman

filter; the details of this result are given by Durbin and Koopman (1997).

3.3 Estimation procedure

It has been shown that we can evaluate the loglikelihood function for a linear Gaussian model

with a common stochastic time-varying variance using importance sampling methods. The full

11

procedure is relatively easy to implement and can be summarised by the following five steps:

1. Apply the Kalman filter to model (3) without any consideration of σ21 , . . . , σ

2n; store vt and

Ft for t = 1, . . . , n.

2. Consider model (13) and obtain an approximating model; compute the loglikelihood function

of the approximating model using the Kalman filter and generate M samples from the

resulting Gaussian importance density g(σ∗|y;ψ) using the device of Durbin and Koopman

(2002).

3. Compute the Monte Carlo estimate of the loglikelihood function given by

log `(y;ψ) = log g(v;ψ) + log

[1M

M∑i=1


].

4. Maximise this loglikelihood with respect to ψ; the evaluations of the Monte Carlo estimator

log `(y;ψ) for different values of ψ is based on the same set of random numbers for the

sampling of σ∗ from the Gaussian density g(σ∗|y;ψ) to ensure a smooth loglikelihood function

in ψ.

This procedure is numerically stable and is implemented for the object-oriented matrix program-

ming language Ox of Doornik (1999) using the state space functions in the Ox library SsfPack as

documented by Koopman, Shephard, and Doornik (1999). The programs written for this paper

can be obtained from www.ssfpack.com.

3.4 Signal extraction and forecasting

The state vector αt and disturbance vector εt of model (3) with known σ2t can be estimated by

standard state space methods; see Durbin and Koopman (2001). For example, state smoothing

recursions exist for the linear Gaussian state space model that compute αt = E(αt|y) and Vt =

var(αt|y) for t = n, . . . , 1. In the case of a stochastic time-varying process for σt we need to take

account of the variation of σt because a realisation of αt depends partially on σt. In a similar way

as for the construction of the likelihood function, we can develop an importance estimator for αt

which is given by

αt = ω−1M∑i=1

ωiα(i)t , ωi =


, ω =M∑i=1

ωi, (19)

where α(i)t is the smoothed estimate E(αt|y) for αt in model (3) with σ replaced by time-varying

standard deviations in the n× 1 vector σ∗ = σ∗(i). The smoothed state variance is given by

var(αt|y) = ω−1M∑i=1

ωiV(i)t ,

12

where V(i)t is the smoothed state variance Vt for model (3) with σ replaced by time-varying

standard deviations in σ∗ = σ∗(i).

In a similar fashion we can compute the filtered estimate of the state vector E(αt|y1, . . . , yt).

However, the importance sampling weights ωi are ratios of densities of v which is a linear trans-

formation of y. This estimate can therefore not be properly defined as a filtered estimate. Other

simulation techniques such as Monte Carlo filtering (or particle filtering) that is reviewed in Doucet,

deFreitas, and Gordon (2000) should be employed for filtering. These techniques shall not be con-

sidered in this paper. Finally, forecasts can be computed by importance estimation techniques

since forecasts are conditional on y.

Algorithms for computing observation weights for the construction of filtered and smoothed

estimates of the state vector are reviewed in Harvey and Koopman (2000) for the linear Gaussian

unobserved components model. These weights can also be computed for models with a common

stochastic variance and they can be constructed in a similar way as in (19). In particular, the

observation weights of the estimated state vector αt are given by

wt,j = ω−1M∑i=1

ωiw(i)t,j , j = 1, . . . , n,

where w(i)t,j is the jth observation weight of the smoothed state αt for the model (3) with σ replaced

by time-varying standard deviations in σ∗ = σ∗(i).

3.5 Diffuse initialisation

Time series models with nonstationary dynamic specifications require diffuse initial conditions of

the state vector when such models are casted in state space form. The consequences of diffuse initial

conditions for Kalman filtering and smoothing have been discussed elsewhere; see, for example,

the treatment given by Durbin and Koopman (2001, Chapter 5) and the references therein. In

the case of simulation smoothing, diffuse initialisations can be accounted for in a straightforward

fashion for which the details are given by Durbin and Koopman (2002).

4 Small-sample properties: A Monte Carlo study

Since the unknown parameters are estimated by exact maximum likelihood, subject to simulation

error, the asymptotic properties of the estimators apply as usual. Given the numerical involvement

of the estimation procedure, it is interesting to investigate the finite sample properties of the

estimators. We therefore carry out a Monte Carlo study.

13

4.1 Design of the Monte Carlo study

We consider the following three different model specifications for the mean equation: the stationary

autoregressive moving average (ARMA) model (8), the local level model (1) and the unobserved

components model (9). Further, the common stochastic variance (CSV) is based on the log-

variance ht = log σt which is either fixed at zero, ht = 0, or is modelled by the autoregressive model

(2). The different data generation processes (DGPs) and their parameter values are presented in

Table 1.

Table 1: Data generation processes of the Monte Carlo study

No CSV CSV

ARMA yt as in (8) with p∗ = q∗ = 1, yt as in (8) with p∗ = q∗ = 1,

ϕ1 = 0.8, θ1 = −0.6, ϕ1 = 0.8, θ1 = −0.6,

ht = 0. ht as in (2) with d = 0, φ = 0.9, ση = 0.2.

LL yt as in (1) with q = 0.5, yt as in (1) with q = 0.5,


BSM yt as in (9) with s = 4, q3 = 0, β1 = 0, yt as in (9) with s = 4, q3 = 0, β1 = 0,

q1 = 1, q2 = 0.5, q4 = 0.2, q1 = 1, q2 = 0.5, q4 = 0.2,


For each DGP, M = 500 series are generated with three different number of observations:

n = 100, n = 500 and n = 1000. Each generated series is estimated twice using two different model

specifications. Both models adopt the “true” mean equation of the DGP while the common log-

variance is either fixed at zero (see column No CSV) or stochastic and modelled as an autoregressive

process (see column CSV). In the remaining part of this section we discuss the simulation results

that are based on this simulation design and that are presented in the Tables 2–8.

4.2 ARMA simulation results

The simulation results for the ARMA model are presented in Table 2 which consists of two panels:

the first panel considers the DGP without a common stochastic variance (No CSV) and the second

panel considers the DGP with CSV. Each panel has four vertical blocks of which the last three

are associated with the three different sample sizes (100, 500 and 1000). In each of these blocks a

summary of the estimation results are presented for the M = 500 simulated series without CSV

and with CSV. The sample mean of the estimated parameters for the M series is given together

with its sample standard deviation. Further, averages of two diagnostic statistics are reported

together with the percentages of M diagnostics that have not passed the corresponding critical

value. The first diagnostic test statistic is the normality test of Doornik and Hansen (1994),

14

which is an adapted version of the test for normality of Bowman and Shenton (1975), and is

χ2 distributed with two degrees of freedom. The second diagnostic is the heteroskedasticity test

H(h) =∑T

t=T−h+1 v2t /

∑d+1+ht=d+1 v

2t statistic and is Fh,h distributed with h ≈ n/3. Finally the

average value of the M maximised log-likelihood values is reported for both estimated models in

each block.

Table 2: Small sample results for the ARMA modelGenerating no SV

n = 100 n = 500 n = 1000Estimating No SV SV No SV SV No SV SV

p p s.d. p s.d. p s.d. p s.d. p s.d. p s.d.

ϕ 0.8 0.72 0.24 0.70 0.27 0.79 0.08 0.78 0.08 0.79 0.05 0.79 0.05ϑ −0.6 −0.53 0.28 −0.51 0.30 −0.59 0.10 −0.59 0.11 −0.60 0.07 −0.59 0.07σARMA 1 0.98 0.07 0.96 0.08 1.00 0.03 0.99 0.04 1.00 0.02 1.00 0.02σSV 0.16 0.23 0.11 0.13 0.09 0.11φSV 0.36 0.33 0.37 0.31 0.33 0.27

Stat fRej Stat fRej Stat fRej Stat fRej Stat fRej Stat fRej

Normality-DH 2.09 0.07 2.03 0.07 2.00 0.05 1.90 0.05 2.04 0.06 1.88 0.05H(n/3) 1.10 0.35 1.10 0.34 1.02 0.14 1.02 0.13 1.01 0.04 1.01 0.06LL -139.70 -139.06 -707.35 -708.98 -1416.60 -1419.20

Generating SV



ϕ 0.8 0.70 0.30 0.69 0.29 0.78 0.09 0.78 0.09 0.79 0.05 0.79 0.05ϑ −0.6 −0.50 0.33 −0.49 0.33 −0.59 0.12 −0.58 0.11 −0.59 0.07 −0.59 0.07σARMA 1 1.03 0.12 0.97 0.12 1.05 0.06 1.00 0.06 1.05 0.04 1.00 0.04ση 0.2 0.28 0.25 0.22 0.10 0.21 0.08φ 0.9 0.64 0.34 0.84 0.16 0.87 0.08



Reported are the average value of the estimated parameters (p) together with the standard devia-tions (s.d.) of the estimates. The average value of the Normality (DH) and heteroskedasticity (H)test statistics are reported, together with the fraction of rejections (fRej), using a significance levelof 5%. Furthermore, the average loglikelihood value (LL) is given.

The true DGP model that forms the basis of the results in Table 2 is the ARMA model (8)

with p∗ = q∗ = 1 and with coefficients ϕ = 0.8, ϑ = −0.6 and σ2t = 1 for t = 1, . . . , n for the

case without CSV. Estimating the coefficients using a realised time series from the true model

may be difficult due to the existence of strong (negative) correlation between estimates of ϕ and ϑ

caused by possible root cancellations. For the DGP with CSV we consider model (2) for σ2t with

coefficients d = 0, φ = 0.9 and ση = 0.2.

For the smaller sample sizes, estimates of ϕ and ϑ have relatively large standard errors but

they become smaller for larger samples. The estimated coefficients for the mean equation of the

model are not biased irrespective of the specification of the common variance part of the model.

The estimated coefficients for the variance equation of the model tend to be insignificant when

15

the DGP is without CSV.

When the DGP has a fixed variance (No CSV), the normality test is rejected in about 5% of

the cases, so the size of the test is correct. When the DGP includes the CSV specification but it

is not estimated, normality is rejected in 23% of the cases for n = 100 and in 84% of the cases for

n = 1000. The heteroskedasticity test appears to be a less reliable test statistic for detecting of

model misspecification due to the omission of a common stochastic variance. When CSV is both

generated and estimated, the size of the normality test is also not correct for all sample sizes; the

rejection rate of the normality test can be 10%.

The estimation of state space models with a common stochastic variance is more demanding

compared to the estimation of standard state space models. However, the extra computational

burden due to the use of simulation methods is not excessive. As an overall conclusion we empha-

size that estimation results can be obtained quickly and the precision is good.

4.3 Other simulation results

A Monte Carlo study has also been carried out for the class of unobserved components models

with and without a common stochastic variance equation. Tables 7 and 8 (included at the end of

this paper) report the results for DGPs based on model (9). Table 7 concerns the local level model

(1) with σ2 = 1 and q = 0.5 or, equivalently, the basic structural model (9) with q1 = 1, q2 = 0.5

and βt = γt = 0 for t = 1, . . . , n. The DGP with CSV has the same specification for the common

stochastic variance σ2t as for the ARMA model of the previous section. Table 8 concerns the local

level model with a stochastic seasonal component, that is model (9) with q3 = 0 and β1 = 0 such

that βt = 0 for t = 1, . . . , n. The parameter values for this DGP are selected as q1 = 1, q2 = 0.5

and q4 = 0.2. The same specification for the CSV is taken as for the other models.

The Monte Carlo results for the two models produce similar results as for the ARMA model

reported in the previous section. It is surprising to see that when the DGP is taken with no CSV

(that is, ht = 0) and estimation is based on the model with CSV, the estimated coefficients of

the mean equation have no bias while the estimated coefficients of the variance equation appear

not to be significant. The reverse case (CSV is part of DGP but it is not estimated) does also

produce unbiased estimates for the mean equation but in this case the normality test statistics

are unsatisfactory and on these grounds most of the estimated models would have correctly been

rejected when sample sizes are sufficiently large. The heteroskedasticity tests are also unreliable

in this context.

16

5 Analysing macro-economic time series: two illustrations

5.1 U.S. monthly inflation rate

We consider the first difference of the logarithm of the monthly U.S. CPI index1 between 1957:1

and 2001:9 and is taken as the indicator of U.S. inflation. The data is presented in figure 1. In

terms of an unobserved components model, the time series plot of inflation suggests the inclusion

of level and seasonal components. Therefore we base our analysis on model (9) with s = 12 and

βt = 0 for t = 1, . . . , n and n = 537. The common variance specification for σ2t is based on the

log-variance specification (2) with d = 0. The parameter vector ψ for this model is given by

ψ = (q1, q2, q4, φ, ση)′,

where the coefficients q1, q2 and q4 refer to equation (9) and φ and ση refer to (2). The mean

equation requires a diffuse initial condition for the state vector.

20406080

100120140160180200

1960 1970 1980 1990 2000

(i)

-0.6-0.4-0.2

00.20.40.60.8

11.21.4

1960 1970 1980 1990 2000

(ii)

Figure 1: (i) The U.S. monthly consumer price index (CPI) and (ii) the difference of log CPI

(inflation)

The estimation results reported in table 3 have been obtained by numerically maximising the

loglikelihood function which is computed either by the Kalman filter only (for the fixed variance

case) or by the Kalman filter and importance sampling methods described in section 3 (for the

general model with a common stochastic variance). The numerical optimisation is carried out by

the BFGS method as implemented in the the Ox functions library of Doornik (1999). In all cases

convergence was obtained rapidly and required 10 to 50 loglikelihood evaluations depending on

the specification of the model and the starting value of the parameter vector.

The parameter estimates of three model specifications are reported in table 3 together with 95%

(asymmetric) confidence intervals of the estimates and some diagnostic test statistics. The final

rows report the normality test of Doornik and Hansen (1994), the Box-Ljung Q test of Ljung and

Box (1978) for residual correlation (with 21 degrees of freedom for the LL-Seas model, 14 degrees1Source: Bureau of Labor Statistics, series CUUR0000SA0L1E, with the city average CPI index, all items less food

and energy, not seasonally adjusted.

17

Table 3: Estimation results of models for the U.S. monthly inflation rate

Parameter LL-Seas LL-Seas-CSV LL-Seas-CSV-δ

q1 (s.d. irr) 0.1579 [0.144, 0.173] 0.2493 [0.197, 0.316] 0.2892 [0.235, 0.356]q2 (s.d. level) 0.0474 [0.035, 0.063] 0.0554 [0.038, 0.080] 0.0538 [0.037, 0.078]q4 (s.d. seas) 0.0249 [0.018, 0.035] 0.0703 [0.052, 0.096] 0.0853 [0.064, 0.115]ϕ (CSV) 0.9935 [0.961, 0.999] 0.9994 [0.988, 1.000]ση (CSV) 0.2222 [0.159, 0.311] 0.0809 [0.040, 0.164]δL, 1974:2 0.5605 [0.372, 0.749]δL, 1974:11 −0.4744 [−0.667, −0.282]δI, 1980:7 −1.2057 [−1.627, −0.785]δL, 1981:9 −0.5260 [−0.794, −0.258]δL, 1982:8 −0.3683 [−0.643, −0.094]

LL 85.52 146.33 179.18Test statistic

Normality-DH 102.085 (0.000) 0.354 (0.838) 0.930 (0.628)Q 54.474 (0.000) 26.204 (0.125) 29.553 (0.009)ARCH 4.904 (0.270) 7.097 (0.229) 8.121 (0.215)H(n/3) 0.586 (1.000) 0.693 (0.992) 0.719 (0.985)Time 0.97 1:03.72 3:53.88

Parameter estimates with 95% confidence intervals of three models for U.S. inflation 1957-2001. The three models consist of a combination of level, seasonal plus irregular components(LL-Seas), common stochastic variance components (LL-Seas-CSV) and interventions, (LL-Seas-CSV-δ), respectively. The loglikelihood (LL) value and various diagnostic test statistics(with p-values) are also reported. The last row reports the estimation time.

of freedom for the LL-Seas-CSV-δ model), the ARCH-test of Engle (1982) for correlation in the

squared residuals and the heteroskedasticity test statistics H(n/3) together with the corresponding

p-values between square brackets. More details of the test statistics have been given in section 4.

In the second column, labelled as LL-Seas, the estimation results for the local level model

plus a stochastic seasonal component and with a fixed common variance are presented. The test

statistics based on the standardised residuals reveal that the model does not capture all dynamic

properties of the time series accurately. The residuals have relatively large values in the year 1974

and in the earlier years of the 1980s than in the other years. We therefore consider the inclusion

of a common stochastic variance such that the overall variance of the model can be different for

different time periods in the sample.

The third column with label LL-Seas-CSV in table 3 reports the estimated parameters for the

model with a common stochastic variance. This specification produces satisfactory results and,

based on the diagnostic test statistics, we conclude that a more appropriate model for the U.S.

inflation is found compared to model LL-Seas. Some concern is raised by the relative large value

for the Box-Ljung test but we emphasise that we only consider a simple parsimonious model for

inflation. In fact, it is surprising that the inclusion of the common stochastic variance can improve

the fit dramatically for such a simple model. Similar gains may be obtained when richer dynamic

mean specifications for the U.S. inflation series are considered.

18

The autoregressive parameter φ of the common stochastic variance is estimated at a value of

0.9935 which is close to the unit root. This type of near-random walk behaviour does not cause a

problem for the estimation method employed and may be caused by neglecting outliers and shifts

in the inflation series. To investigate the robustness of the new model and to ensure that our

results do not rely on a few a-typical observations we have included some dummy variables that

take account of an outlier in July 1980 and some possible level shifts in the inflation series. The

estimation results can be found in the column labelled as LL-Seas-CSV-δ and we observe that a

further increase of the loglikelihood value is obtained. The estimated value of φ is even closer to

unity. Further, even though all breaks are significant, the diagnostics do not indicate that the

model with dummy variables for outliers and breaks is superior compared to the model without

the dummy variables. We do not continue the analysis by considering other possible specifications

for the common variance such as a random walk model or a stochastic spline model.

-4

-3

-2

-1

0

1

2

3

60 70 80 90 00

(i)

-0.2-0.15

-0.1-0.05

00.05

0.10.15

0.20.25

0 10 20 30 40 50 60

(ii)

0

0.2

0.4

0.6

0.8

1

60 70 80 90 00

(iii)

Figure 2: Prediction residuals for the estimated LL-Seas-CSV-δ model: (i) standardised residuals;

(ii) autocorrelogram of residuals (continuous lines) and squared residuals; (iii) scaled cumulative

sum of squared residuals.

Figure 2 presents graphs of the standardised prediction residuals together with the autocorrela-

tion functions of the residuals and their squares and the cumulative sum (CUSUM) of the squared

residuals. These graphical diagnostic tests are based on one-step ahead prediction residuals and

they raise no concern with respect to the appropriateness of the LL-Seas-CSV-δ model for the

inflation series. From these results, we may conclude that the common stochastic variance is an

important characteristic of U.S. inflation rates. However, a more elaborate model with a richer

dynamic specification for the mean equation may well give a better fit and may provide a more

economic meaningful interpretation.

Figure 3 displays the estimated time-varying unobserved components trend µt and seasonal γt

together with the estimated time-varying standard deviation of the observation equation q1σt =

q1 exp(ht/2) for the final model reported in table 3. The estimation of the components are discussed

in section 3.4. In the first graph of figure 3 the jumps in the estimated level component are

associated with the level shifts in the model LL-Seas-CSV-δ. We note that when the common

19

00.10.20.30.40.50.60.70.80.9

11.1

60 70 80 90 00

(i)

-0.4-0.3-0.2-0.1

00.10.20.30.4

60 70 80 90 00

(ii)

0

0.05

0.1

0.15

0.2

0.25

60 70 80 90 00

(iii)

Figure 3: Estimation results for the LL-Seas-CSV-δ: (i) estimated level (including level breaks),

(ii) estimated seasonal and (iii) estimated standard deviation of the irregular component (scaled

by the estimated time-varying common stochastic standard deviation)

stochastic variance component is included, the model is flexible and can account for those shifts

and the breaks in the model are not absolutely necessary. The seasonality in the second graph

is smooth but clearly time-varying. The final graph of figure 3 presents the estimated series q1σt

which corresponds to the estimate of q1 in the LL-Seas model (that is approximately 0.16). The

common stochastic variance component displays a higher level in the early 1980s corresponding to

a higher uncertainty in the early period od the 1980s which is sometimes referred to as the “second

oil crisis”. Further it shows a gradual reduction in the level in the early 1990s corresponding to

the less volatile period associated with the Volcker-Greenspan regime at the U.S. Federal Reserve.

We therefore conclude that some interesting features of the U.S. inflation series are effectively and

realistically represented by the common stochastic variance in this basic model.

Figure 4 plots the observation weights for the trend plus seasonal signal µt + γt which are

computed as discussed in section 3.4. The weights are presented for the estimated models LL-

Seas (upper panels) and LL-Seas-CSV-δ (bottom panels). The panels on the left hand side are

associated with time point 1981:1 and we observe that the distribution of the weights have heavier

tails for the LL-Seas model with a constant variance compared to the model with a common

stochastic variance. The latter model allows for the increased variability at the beginning of

the 1980s and therefore the observations surrounding time point 1981:1 are more relevant than

observations in more tranquile periods and are given more weight. Also the weighting pattern is

asymmetric whereas for the model with a constant variance the weighting pattern is symmetric.

The weights associated with time point 1995:8 are the same as for time point 1981:1 in the upper

panel (apart from end-point effects) because the model has constant variances. In the bottom

panel it is shown that the less weight is given to observations in the 1980s and the beginning of

the 1990s whereas relative more weight is given to observations after 1995:8. This reflects that

observations from 1993 onwards can be relied upon with more confidence.

20

-0.1

0

0.1

0.2

0.3

65 70 75 80 85 90 95

LLS, 1981:1

-0.1

0

0.1

0.2

0.3

65 70 75 80 85 90 95

LLS-CSV-δ, 1981:1

-0.1

0

0.1

0.2

0.3

80 85 90 95 00

LLS, 1995:8

-0.1

0

0.1

0.2

0.3

80 85 90 95 00

LLS-CSV-δ, 1995:8

Figure 4: Observation weights for the signal of the LL-Seas and LL-Seas-CSV-δ models at periods

1981:1 and 1995:8

Table 4: Forecasting results for the inflation data

LL-Seas 1 3 6 12

MFE −0.004 −0.007 −0.005 −0.011

MAPE 0.123 0.131 0.120 0.104

RMSE 0.154 0.163 0.149 0.133

LL-Seas-CSV 1 3 6 12

MFE −0.004 −0.006 −0.005 −0.010

MAPE 0.107 0.109 0.103 0.098

RMSE 0.137 0.138 0.131 0.126

LL-Seas-CSV-δ 1 3 6 12

MFE −0.005 −0.008 −0.007 −0.011

MAPE 0.108 0.109 0.104 0.099

RMSE 0.137 0.138 0.133 0.127

The table reports the mean forecast error (MFE), the

mean absolute prediction error (MAPE) and the root mean

squared forecast error (RMSE), at horizons of 1, 3, 6 and 12

months ahead. Forecasting starts at 1985:1.

21

To investigate whether the common stochastic variance component improves the forecasting

performance, we have carried out the following forecasting exercise. Starting with the sample

1957:1-1985:1, the model is estimated and h-step forecasts are computed for h = 1, 3, 6, 12; for

the details, see section 3. The four forecast errors em,h = ym+h − ym+h|m are computed where

m refers to the final observation in the estimation sample and yt|m is the forecast of yt given

the observations of the estimation sample for t > m. The estimation and forecasting steps are

repeated for samples that are increased by one observation at a time until sample 1957:1-2000:9

is reached. Finally, some statistics are computed for the collection of forecast errors.

Table 4 reports the mean forecast error (MFE), the mean absolute prediction error (MAPE)

and the root mean squared error (RMSE) which are defined by

MFE =∑

et+h/k, MAPE =∑

|et+h|/k, RMSE =√∑

e2t+h/k.

where k is the number of forecasts errors for horizons h = 1, 3, 6, 12 months. We compare results

for the three inflation models as considered in table 3. The models with the common stochastic

variance outperform the forecast statistics of the model with constant variance for all forecasting

horizons. This is mainly due to the gradual change of the variance after 1985 which seems a

salient feature of US inflation data. By allowing for these changes in the common variance of the

model, more accurate forecasts can be produced. The last panel of table 4 gives the results for the

LL-Seas-CSV model with the pre-1985 level breaks. These statistics are similar to the ones based

on same the model without the breaks. This is probably due to the fact that no breaks occur in

the forecasting periods. Finally we note that for all models the MAPE and RMSE statistics are

lower at horizons 6 and 12 than at horizons 1 and 3 although this is less true for the two models

with a common stochastic variance. It appears that the seasonal effect is strong and therefore

good forecasts can be produced at yearly horizons in particular.

5.2 US industrial production

Another macroeconomic time series of interest is U.S. industrial production2. The monthly time

series from 1960 to 2001 is presented in levels, in logarithms and as percentages of month-to-month

growth rates in figure 5. The time series plot of growth reveals that the variability of the series is

lower after the early 1980s.

The graph of production growth shows further that seasonal variation is the most prominent

feature. We therefore start to estimate an unobserved components time series model with seasonal

and irregular components, that is model (9) with µt = βt = 0 (and q2 = q3 = 0). The estimation

results of this simple model are reported in the column of table 5 indicated by Seas. The large2Industrial Production – Market Group – Total Index – Not Seasonally Adjusted. Source: Economagic, series

b500001 ipnsa, Federal Reserve, Board of Governors. Available is 1919:1-2001:12 but we use only 1960:1-2001:12.

22

20

40

60

80

100

120

140

160

60 70 80 90 00

(i)

3.43.63.8

44.24.44.64.8

55.2

60 70 80 90 00

(ii)

-10-8-6-4-20246

60 70 80 90 00

(iii)

Figure 5: The U.S. monthly industrial production: (i) in levels, (ii) in logs and (iii) in percentage

growth

Table 5: Estimation results for U.S. industrial growth ∆ log(IP)

Parameter Seas AR-Seas AR-Seas-CSV

q1 (s.d. irr) 0.9499 [ 0.885, 1.019] 0.7248 [ 0.648, 0.811] 0.7463 [0.594, 0.938]q4 (s.d. seas) 0.0311 [ 0.024, 0.040] 0.0365 [ 0.029, 0.046] 0.0344 [0.025, 0.046]q5 (s.d. ar1) 0.5434 [ 0.426, 0.693] 0.4980 [0.352, 0.704]ϕ (ar1) 0.8756 [ 0.704, 0.935] 0.9471 [0.884, 0.970]φ (CSV) 0.9432 [0.810, 1.077]ση (CSV) 0.2072 [0.051, 0.848]LL -753.89 -715.56 -699.42Test statistics

Normality-DH 32.260 (0.000) 17.397 (0.000) 5.559 (0.062)Box-Ljung Q 82.733 (0.000) 24.271 (0.186) 29.261 (0.032)ARCH 35.894 (0.105) 17.540 (0.149) 1.829 (0.405)H(n/3) 0.477 (1.000) 0.480 (1.000) 0.785 (0.939)Time 0.58 1.48 1:19.34

Parameter estimates with 95% confidence intervals of three models for U.S. monthly in-dustrial growth 1960-2001. The three models consist of a combination of seasonal plusirregular components (Seas), autoregressive (AR-Seas) component and common stochas-tic variance (AR-Seas-CSV) component respectively. The loglikelihood (LL) value andvarious diagnostic test statistics (with p-values) are also reported. The last row reportsthe estimation time.

23

Box-Ljung Q portmanteau test3 for serial correlation indicates that the model is not appropriate

for the growth series. A closer investigation shows that the serial correlation coefficient of the

prediction residuals at the first lag is relatively large. Therefore an autoregressive component is

included in the model that is given by

yt = ρt + γt + ε1t, t = 1, . . . , n,

where γt and ε1t are specified in (9) and ρt is the autoregressive component given by

ρt+1 = ϕρt + q5ε5t, t = 1, . . . , n,

with ρ1 ∼ N{0, q25/(1−ϕ2)}. The estimation results of this model for the production growth series

is reported in the column of table 5 indicated by AR-Seas. The increase of the loglikelihood value

and the decrease of the Box-Ljung Q test-statistic for serial correlation are substantial compared to

the results of the seasonal plus irregular model. The estimate of the autoregressive ϕ is close to 0.9

and provides the expected persistency in U.S. growth whereas the time-variation of the seasonal

component remains unchanged. The normality test statistic however raises some concern about

the specification of the current model for U.S. growth in production. Some graphical diagnostics

are presented in figure 6 and they are reasonable although the prediction residuals are subject to

some unusual large values during the oil crises in the middle of the 1970s and the early 1980s.

They appear even more clearly in the final plot of figure 6, that is the cumulative sum of squared

prediction residuals.

It is observed earlier that the overall variation of growth is locally varying over time and

therefore we introduce the common stochastic variance. The estimation of this model does not

give numerical problems and the loglikelihood value increases by a value of almost 16 at the expense

of two additional unknown parameters and more computing time. The loglikelihood ratio clearly

indicates that this more elaborate model is preferable. The reported estimated parameter values

show that U.S. growth in the last, say, forty years is very persistent (the estimated autoregressive

coefficient for ρt is close to 0.95) and the persistency of the uncertainty in growth, as indicated

by the autoregressive coefficient φ of the CSV, is also substantial given the estimated value 0.94.

It is surprising to observe that the estimated confidence intervals for the seasonal component and

the irregular component remain close to their estimated values for the model without CSV. This

may indicate that the time-varying heteroskedasticity is most applicable to the autoregressive

component.

It is shown that some improvements in the modelling of macroeconomic time series can be

obtained by allowing the common variance to change stochastically over time. This is confirmed

by the graphical diagnostic plots for the prediction residuals of the final estimated model presented3The Q-statistic has 21 degrees of freedom for the smallest model, 17 degrees of freedom for the largest.

24

-4-3-2-101234

60 69 80 89 00

(i)

-0.2-0.15

-0.1-0.05

00.05

0.10.15

0.20.25

0 10 20 30 40 50 60

(ii)

00.10.20.30.40.50.60.70.80.9

1

60 69 80 89 00

(iii)

Figure 6: Prediction residuals of the estimated model “AR-Seas” with AR(1), seasonal and irreg-

ular components: (i) plot of standardised residuals; (ii) autocorrelogram of residuals (continuous

lines) and squared residuals; (iii) scaled cumulative sum of squared residuals.

in figure 7. The residuals are well-behaved and by considering a richer dynamic specification for

the mean equation we may well reduce the remaining serial correlation in the residuals further.

-3

-2

-1

0

1

2

3

60 69 80 89 00

(i)

-0.2-0.15

-0.1-0.05

00.05

0.10.15

0.20.25

0 10 20 30 40 50 60

(ii)

00.10.20.30.40.50.60.70.80.9

1

60 69 80 89 00

(iii)

Figure 7: Residuals of the model with AR(1), seasonal, irregular and SV components: (i) plot of

standardised residuals; (ii) autocorrelogram of residuals (continuous lines) and squared residuals;

(iii) scaled cumulative sum of squared residuals.

-1.5

-1

-0.5

0

0.5

1

60 69 80 89 00

(i)

-8

-6

-4

-2

0

2

4

6

60 69 80 89 00

(ii)

00.20.40.60.8

11.21.41.6

60 69 80 89 00

(iii)

Figure 8: Estimation results for the production growth model: (i) estimated autoregression, (ii)

estimated seasonal and (iii) estimated standard deviation of the irregular component (scaled by

the estimated time-varying common stochastic standard deviation)

25

Finally, the plots of the estimated components of interest in figure 8 are informative and they

display the main characteristics of the dynamics in U.S. growth. Further analyses may consider an

autoregressive component ρt that is based on a second order autoregressive process with complex

roots or on a cyclical component based on time-varying trigonometric terms. The introduction of

(lagged) explanatory variables may lead to an empirical model that is of more interest from an

economic point of view.

-0.1

0

0.1

0.2

0.3

60 65 70 75 80 85

AR-Seas, 1974:4

-0.1

0

0.1

0.2

0.3

80 85 90 95 00

AR-Seas, 1993:6

-0.1

0

0.1

0.2

0.3

60 65 70 75 80 85

AR-Seas-CSV, 1974:4

-0.1

0

0.1

0.2

0.3

80 85 90 95 00

AR-Seas-CSV, 1993:6

Figure 9: Weights for the AR-Seas and AR-Seas-CSV models, at the high-variability period 1974:4

and the low uncertainty period 1993:6

The observation weights of the estimated signal γt +ρt are presented for the time points 1974:4

and 1993:6 in figure 9. The top panels display the weights for the AR-Seas model with a fixed

variance and are therefore equivalent. The bottom panels present the weight patterns implied by

the model with a common stochastic variance and they are different from each other but also from

the associating weights of the top panels. The main difference is that the patterns of the bottom

panels are not symmetric and therefore different weights are assigned to observations with an

equal distance to the time index of the signal. This is caused by the (common) heteroskedasticity

modelled as a stochastic function of time.

The same forecasting exercise as for US inflation is carried out for US industrial growth where

forecasts are made based on the data until 1980:1 upto 2000:12. For each subsample the model with

a fixed variance and with a common stochastic variance is estimated and forecasts are computed

for horizons h = 1, 3, 6, 12. The results are reported in table 6. The mean of the forecasts errors

reduces considerably for the model with a common stochastic variance leading to less bias in the

26

Table 6: Forecasting results for US industrial growth

AR-Seas 1 3 6 12

MFE 0.103 0.136 0.191 0.161

MAPE 0.697 0.707 0.696 0.677

RMSE 0.921 0.925 0.886 0.859

AR-Seas-CSV 1 3 6 12

MFE 0.073 0.088 0.134 0.115

MAPE 0.704 0.705 0.693 0.688

RMSE 0.929 0.929 0.892 0.880

The table reports the mean forecast error (MFE), the

mean absolute prediction error (MAPE) and the root

mean squared forecast error (RMSE), at horizons of

1, 3, 6 and 12 months ahead. Forecasting starts at

1980:1.

forecasts. Note that the bias is still higher than in the case of US inflation. It is surprising that the

variation of the forecast errors is equivalent for both models. This is probably due to the fact that

the main feature of US growth is described only by the seasonal component. The autoregressive

component and the heteroskedastic nature of the time series seem less important.

6 Conclusions

In this paper, we have considered the possibility of simultaneously modelling both the mean

equation and the common stochastic variance equation within a general class of linear time series

models. The state space framework is flexible and useful in, for example, dissecting time series

into unobserved components so that level, slope and seasonal effects in a time series can be

analysed separately. With the inclusion of a stochastic volatility component into the model,

further flexibility is attained that can be useful for the modelling of macroeconomic time series.

After an introduction of the general model together with a discussion of some special cases in

section 2, a short theoretical exposition is given in section 3. A Monte Carlo experiment is carried

out to show the small sample properties of the estimates produced by the importance sampling

estimation method. These simulation based maximum likelihood estimates are obtained subject

to simulation error which can be made arbitrarily small by enlarging the number of simulations.

The methodology is illustrated by modelling a monthly time series of U.S. inflation between

1957–2001 (537 observations). It is found that a simple local level model with a seasonal component

and a common stochastic variance is appropriate and that it can incorporate some important

27

dynamic features of inflation that is difficult to model since the series has been subject to some

shocks in the last forty years. The autoregressive common log-variance has been able to capture

(i) the volatile period of the 1970s and early 1980s associated with oil crises and (ii) the less

volatile period after the end of the 1980s associated with the Volcker-Greenspan regime at the U.S.

Federal Reserve. The role that the common stochastic variance play in the analysis is illustrated

by presenting the observations weights associated with the estimated signal of the time series and

by presenting forecasting results. We show that US inflation forecasts that are based on a simple

descriptive model can be improved by incorporating a common stochastic variance component.

A similar analysis is carried out for the monthly percentage growth of US industrial production

in the years 1960 to 2001. The in-sample fit of the model with a common stochastic variance

component improves significantly although the forecast results are less affected by the extension.

This may be due to the highly seasonal nature of the time series.

The framework of a common stochastic variance can be exploited into other directions. In

this paper we have focused on stochastic heteroskedasticity as a function of time although it can

also be related to other characteristics of the observation yt. For example, the actual value of the

observation or the value of another variable associated with yt may be a source of heteroskedas-

ticity. Let us assume that heteroskedasticity is a stochastic function of the variable xt associated

with yt. The treatment of a common stochastic variance as a function of xt is similar to the

method described in section 3. The Kalman filter is used to obtain innovations based on the state

space model (3) conditional on the observations and on a given value of σ∗. It is noticed that

the innovations are independently distributed. An intermediate step is to re-order the innovations

vt and their scaled variances Ft subject to the magnitude of, say, xt. Denote these re-ordered

innovations by voj and their variances by F o

j . It is still valid to treat the re-ordered innovations as

being generated by the model

voj ∼ N (0, σo2

j ), j = 1, . . . , n.

We then can use model (5) for the re-ordered quantities (so using index j instead of time index t)

and obtain a Monte Carlo estimate of the loglikelihood function since the equality p(y|ψ) = p(vo|ψ)

where vo is the stack of the re-ordered innovations still applies. Further research is required to

investigate the effectiveness of this development but this short exposition reveals the flexibility of

the proposed methodology for modelling stochastic heteroskedasticity.

References

Anderson, B. D. O. and J. B. Moore (1979). Optimal Filtering. Englewood Cliffs: Prentice-Hall.

Bos, C. S., R. J. Mahieu, and H. K. van Dijk (2000). Daily exchange rate behaviour and hedging

28

of currency risk. J. Applied Econometrics 15 (6), 671–696.

Bowman, K. O. and L. R. Shenton (1975). Omnibus test contours for departures from normality

based on√b1 and b2. Biometrika 62, 243–250.

Brockwell, P. J. and R. A. Davis (1987). Time Series: Theory and Methods. New York: Springer-

Verlag.

Danielson, J. (1994). Stochastic volatility in asset prices: Estimation with simulated maximum

likelihood. Journal of Econometrics 61, 375–400.

De Jong, P. (1989, December). Smoothing and interpolation with the state-space model. J.

American Statistical Association 84 (408), 1082–1088.

Doornik, J. A. (1999). Object-Oriented Matrix Programming using Ox (3rd ed.). London: Tim-

berlake Consultants Ltd. See http://www.nuff.ox.ac.uk/Users/Doornik.

Doornik, J. A. and H. Hansen (1994). An omnibus test for univariate and multivariate normality.

Technical report, Nuffield College, Oxford, UK.

Doucet, A., J. F. G. deFreitas, and N. J. Gordon (Eds.) (2000). Sequential Monte Carlo methods

in practice. New York: Springer-Verlag.

Durbin, J. and S. J. Koopman (1997). Monte Carlo maximum likelihood estimation of non-

Gaussian state space model. Biometrika 84, 669–84.

Durbin, J. and S. J. Koopman (2001). Time Series Analysis by State Space Methods. Oxford:

Oxford University Press.

Durbin, J. and S. J. Koopman (2002). A simple and efficient simulation smoother for state space

time series analysis. Biometrika 89, 603–616.

Durham, G. B. and A. Gallant (2002). Numerical techniques for maximum likelihood estimation

of continuous-time diffusion. J. Business and Economic Statist. 20 (3), 297–316.

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance

of United Kingdom inflations. Econometrica 50, 987–1008.

Engle, R. F. and A. D. Smith (1999). Stochastic permanent breaks. Review of Economics and

Statistics 81 (4), 553–574.

Gallant, A. R., D. Hsieh, and G. Tauchen (1997). Estimation of stochastic volatility models

with diagnostics. Journal of Econometrics 81 (1), 159–192.

Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration.

Econometrica 57 (6).

29

Ghysels, E., A. C. Harvey, and E. Renault (1996). Stochastic volatility. In G. Maddala and

C. Rao (Eds.), Handbook of Statistics, Volume 14, Statistical Methods in Finance, pp. 119–

191. North-Holland, Amsterdam.

Hamilton, J. (1994). Time Series Analysis. Princeton: Princeton University Press.

Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cam-

bridge: Cambridge University Press.

Harvey, A. C. and S. J. Koopman (2000). Signal extraction and the formulation of unobserved

components models. Econometrics Journal 3, 84–107.

Harvey, A. C., E. Ruiz, and N. Shephard (1994). Multivariate stochastic variance models. Rev.

Economic Studies 61, 247–264.

Jacquier, E., N. G. Polson, and P. E. Rossi (1994). Bayesian analysis of stochastic volatility

models. J. Business and Economic Statist. 12, 371–417.

Kim, C. J. and C. R. Nelson (1999). State Space Models with Regime Switching. Cambridge,

Massachusetts: MIT Press.

Kim, S., N. Shephard, and S. Chib (1998). Stochastic volatility: Likelihood inference and com-

parison with ARCH models. Rev. Economic Studies 64, 361–393.

Kloek, T. and H. K. Van Dijk (1978). Bayesian estimates of equation system parameters: An

application of integration by Monte Carlo. Econometrica 46, 1–20.

Koopman, S. J. and N. Shephard (2002). Testing the assumptions behind the use of importance

sampling. Discussion paper, Nuffield College, Oxford.

Koopman, S. J., N. Shephard, and J. A. Doornik (1999). Statistical algorithms for models in

state space using SsfPack 2.2. Econometrics Journal 2, 107–160.

Ljung, G. M. and G. E. P. Box (1978). On a measure of lack of fit in time series models.

Biometrika 66, 67–72.

Nabeya, S. and K. Tanaka (1988). Asymptotic theory of a test for the constancy of regression

coefficients against the random walk alternative. Annals of Statistics 16, 218–235.

Ord, J. K., A. B. Koehler, and R. D. Snyder (1997). Estimation and prediction for a class of

dynamic nonlinear statistical models. J. American Statistical Association 92, 1621–1629.

Sandmann, G. and S. J. Koopman (1998). Estimation of stochastic volatility models via Monte

Carlo maximum likelihood. Journal of Econometrics 87, 271–301.

Schweppe, F. (1965). Evaluation of likelihood functions for Gaussian signals. IEEE Transactions

on Information Theory 11, 61–70.

Shephard, N. (1994). Partial non-Gaussian state space. Biometrika 81 (1), 115–131.

30

Shephard, N. (1996). Statistical aspects of ARCH and stochastic volatility. In D. Cox, D. Hink-

ley, and O. Barndorff-Nielsen (Eds.), Time Series Models in Econometrics, Finance and

Other Fields, Number 65 in Monographs on Statistics and Applied Probability, pp. 1–67.

Chapman and Hall, London.

Shephard, N. and M. K. Pitt (1997). Likelihood analysis of non-Gaussian measurement time

series. Biometrika 84, 653–667.

Shumway, R. H. and D. S. Stoffer (2000). Time Series Analysis and its Applications. Springer

Texts in Statistics. New York: Springer.

Taylor, S. J. (1994). Modeling stochastic volatility: A review and comparative study. Mathe-

matical Finance 4 (2), 183–204.

Table 7: Small sample results for the Local Level modelGenerating no SV



q1 1.0 1.00 0.10 0.98 0.11 1.00 0.04 0.99 0.04 1.00 0.03 0.99 0.03q2 0.5 0.48 0.11 0.48 0.11 0.50 0.05 0.49 0.05 0.50 0.04 0.50 0.03ση 0.15 0.23 0.11 0.13 0.08 0.11φ 0.38 0.31 0.34 0.30 0.35 0.28



Generating SV



q1 1.0 1.05 0.15 0.98 0.14 1.05 0.07 1.00 0.06 1.05 0.05 1.00 0.05q2 0.5 0.51 0.13 0.49 0.13 0.52 0.06 0.50 0.06 0.52 0.04 0.50 0.04ση 0.2 0.23 0.22 0.22 0.10 0.20 0.07φ 0.9 0.68 0.33 0.85 0.13 0.88 0.08



See table 2 for a description of the entries in the table.

31

Table 8: Small sample results for the Local Level plus Seasonal modelGenerating no SV



q1 1.0 0.99 0.15 0.98 0.18 0.99 0.06 0.99 0.06 1.00 0.04 0.99 0.04q2 0.5 0.48 0.12 0.48 0.13 0.50 0.05 0.49 0.05 0.50 0.03 0.50 0.03q4 0.2 0.20 0.05 0.19 0.06 0.20 0.02 0.20 0.02 0.20 0.01 0.20 0.01ση 0.18 0.24 0.11 0.13 0.09 0.11φ 0.39 0.34 0.33 0.31 0.29 0.28



Generating SV



q1 1.0 1.03 0.21 1.12 0.52 1.04 0.08 1.01 0.08 1.05 0.06 1.00 0.06q2 0.5 0.51 0.13 0.56 0.31 0.53 0.06 0.51 0.06 0.53 0.04 0.51 0.04q4 0.2 0.20 0.06 0.22 0.12 0.21 0.02 0.20 0.03 0.21 0.02 0.20 0.02ση 0.2 0.24 0.23 0.19 0.09 0.17 0.06φ 0.9 0.70 0.32 0.87 0.13 0.90 0.07



See table 2 for a description of the entries in the table.

32

time series models with a common stochastic variance for

Documents