Time series models with a common stochastic variance
for analysing economic time series
Siem Jan Koopman∗ and Charles S. Bos †
Department of Econometrics, Free University Amsterdam
and Tinbergen Institute Amsterdam
October 21, 2002
Abstract
The linear Gaussian state space model for which the common variance is treated as a
stochastic time-varying variable is considered for the modelling of economic time series. The
focus of this paper is on the simultaneous estimation of parameters related to the stochastic
processes of the mean part and the variance part of the model. The estimation method
is based on maximum likelihood and it requires the subsequent uses of the Kalman filter
to treat the mean part and sampling techniques to treat the variance part. This approach
leads to the evaluation of the exact likelihood function of the model subject to simulation
error. The standard asymptotic properties of maximum likelihood estimators apply as a
result. A Monte Carlo study is carried out to investigate the small-sample properties of the
estimation procedure. We present two illustrations which are concerned with the modelling
and forecasting of two U.S. macroeconomic time series: inflation and industrial production.
Keywords: Autoregressive integrated moving average; Importance sampling;
Kalman filter; Monte Carlo simulation; Simulation smoothing;
State space; Stochastic volatility; Unobserved components time
series.
JEL classification: C15, C32, C51, E23, E31
∗Department of Econometrics, De Boelelaan 1105, NL-1081 HV Amsterdam. Email: [email protected].
†Visiting Nuffield College, Oxford OX1 1NF, UK. Email: [email protected].
1 Introduction
The linear Gaussian state space model has become one of the standard modelling frameworks for
the empirical analysis of economic time series. It is also used for forecasting and signal extraction
in other fields such as engineering (from where it originates), statistics and empirical finance. The
role of the state space framework for modelling macroeconomic time series and analysing business
cycles is discussed in the textbooks of Harvey (1989), Hamilton (1994) and Kim and Nelson (1999).
Other discussions of state space approaches to time series analysis can be found in the books of
Brockwell and Davis (1987) and Shumway and Stoffer (2000). A recent account of the statistical
analysis of the linear Gaussian state space model together with several non-Gaussian and nonlinear
extensions is given by Durbin and Koopman (2001).
We propose a moderate generalisation of the standard linear Gaussian model: the common
variance of the state space model is allowed to be stochastic and time-varying. Let us consider the
stochastic local level model which is the simplest example of a state space model and is given by
\[ y_t = \alpha_t + \varepsilon_t, \qquad \alpha_{t+1} = \alpha_t + \xi_t, \qquad t = 1, \ldots, n, \tag{1} \]
with observation yt and unobservable level αt which is modelled as a so-called random walk process.
The disturbances are normally distributed with
\[ \varepsilon_t \sim N(0, \sigma^2), \qquad \xi_t \sim N(0, \sigma^2 q^2), \qquad t = 1, \ldots, n, \]
where variance σ2 > 0 and signal-to-noise ratio q ≥ 0 are fixed and unknown. Further they are
serially and mutually uncorrelated for t = 1, . . . , n. For simplicity we assume that the random
walk process is initialised by a fixed and known value for α1. The forecast function of the local
level model is the exponentially weighted moving average recursion where the discount coefficient
only depends on the signal-to-noise ratio q. The common variance is represented by σ2. The
novelty is introduced by relaxing the assumption of σ2 being fixed and instead allowing it to be
stochastic and time-varying. Specifically, we replace σ2 by σ2t for which a time series process will
be formulated. For example, an autoregressive model for log σ2t can be taken with the specification
\[ \sigma_t^2 = \exp(h_t), \qquad h_{t+1} = (1 - \phi) d + \phi h_t + \sigma_\eta \eta_t, \qquad t = 1, \ldots, n, \tag{2} \]
with constant d = E(ht), autoregressive parameter |φ| < 1 and standard deviation ση > 0. The
disturbances ηt ∼ N (0, 1) are assumed to be serially uncorrelated and mutually uncorrelated with
the local level model disturbances εt and ξt at all time points. The initial value of the autoregressive
process is distributed as h1 ∼ N{d, σ2η/(1− φ2)}. A special case of the local level model (1) with
the common stochastic variance specification (2) is obtained by imposing the restriction q = 0 (so
αt+1 = αt = α) which reduces (1) and (2) to the basic stochastic volatility (SV) model which is
considered by, for example, Taylor (1994), Harvey, Ruiz, and Shephard (1994), Danielson (1994),
Jacquier, Polson, and Rossi (1994), Shephard and Pitt (1997) and Sandmann and Koopman
(1998). In this paper, however, we show that the stochastic level αt of the model with q > 0 can
be treated simultaneously with the common stochastic variance σ2t in a statistical analysis based
on maximum likelihood estimation.
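To make the specification concrete, the following Python sketch simulates a series from the local level model (1) with the log-variance process (2). It is an illustration only and is not taken from the programs written for this paper: the function name, the seed handling and the default α1 = 0 are assumptions, while the parameter values q = 0.5, d = 0, φ = 0.9 and ση = 0.2 are those of the LL design with CSV in Table 1.

```python
import math
import random

def simulate_local_level_csv(n, q, d, phi, sigma_eta, alpha1=0.0, seed=0):
    """Simulate y_t from model (1) with sigma_t^2 = exp(h_t) and h_t as in (2).

    Hypothetical helper for illustration; alpha1 is the fixed, known initial level.
    """
    rng = random.Random(seed)
    # h_1 ~ N(d, sigma_eta^2 / (1 - phi^2)), the stationary distribution of (2)
    h = rng.gauss(d, sigma_eta / math.sqrt(1.0 - phi ** 2))
    alpha = alpha1
    ys, hs = [], []
    for _ in range(n):
        sigma_t = math.exp(0.5 * h)        # sigma_t^2 = exp(h_t)
        eps = rng.gauss(0.0, sigma_t)      # eps_t ~ N(0, sigma_t^2)
        xi = rng.gauss(0.0, sigma_t * q)   # xi_t ~ N(0, sigma_t^2 q^2)
        ys.append(alpha + eps)             # y_t = alpha_t + eps_t
        hs.append(h)
        alpha = alpha + xi                 # alpha_{t+1} = alpha_t + xi_t
        h = (1.0 - phi) * d + phi * h + sigma_eta * rng.gauss(0.0, 1.0)
    return ys, hs

ys, hs = simulate_local_level_csv(n=200, q=0.5, d=0.0, phi=0.9, sigma_eta=0.2)
```

Setting q = 0 in this sketch keeps the level constant, which corresponds to the basic SV special case mentioned above.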
Related models for this class of linear dynamic models with a common stochastic variance have
been proposed in the literature. An example is the contribution of Shephard (1994) that discusses
simultaneous inference of parameters related to stochastic mean and variance equations. The so-
called local scale model of Shephard does not consider the common variance as stochastic but it
treats the measurement variance as a stochastic variable. Further, the stochastic specification of
the variance is based on the gamma-beta transition model and the estimation method is different
from the maximum likelihood method presented in this paper. Other related contributions are
presented by Nabeya and Tanaka (1988) who consider a linear regression model with the constant
replaced by a random walk process and with the variances replaced by a deterministically time-
varying common variance for both the regression disturbance and the random walk innovation,
by Engle and Smith (1999) who present a model where the Stochastic Permanent Break (STOP-
BREAK) process mingles transitory shocks and permanent shifts randomly in a local level model
with a single GARCH error process, and by Bos, Mahieu, and van Dijk (2000) who carry out a
Bayesian analysis of a local level model with different stochastic and deterministic specifications
of variance processes for exchange rates.
Our proposed generalisation of the linear Gaussian state space model is partly motivated by
the fact that economic time series such as inflation, interest rates, production indices and other
monetary and financial series can be subject to time-varying heteroskedasticity that is potentially
difficult to model explicitly. A standard solution for dealing with heteroskedasticity in econometrics
is to specify a deterministic function of an exogenous variable, for example,
\[ \sigma_t^2 = \sigma^2 \exp(x_t), \qquad t = 1, \ldots, n, \]
where xt is an exogenous variable. More generally, a linear equation of explanatory variables can
be used to obtain
\[ \sigma_t^2 = \exp(\gamma_0 + \gamma_1 x_{1,t} + \ldots + \gamma_k x_{k,t}), \qquad t = 1, \ldots, n, \]
where γ0, . . . , γk are fixed and unknown coefficients and x1,t, . . . , xk,t are exogenous explanatory
variables. The coefficients can be estimated by numerically maximising the loglikelihood function
of the underlying model. In practice it is hard to select appropriate exogenous variables that can
give an adequate description of the heteroskedasticity. Furthermore this solution is clearly not
practical for forecasting when no future values of explanatory variables are available.
The inclusion of a common stochastic variance can be regarded as a limited extension of the
local level model since the stochastic model for the common variance applies to all variances
associated with the disturbances of the local level model. However, the proposed generalisation
may be interesting for the following reasons. Firstly, many models used in practice including
the autoregressive integrated moving average model and the dynamic regression model have state
space representations with a single disturbance variable. In such cases the supposed restriction
does not exist. This also applies to the local level model (1) with a single disturbance (that is,
ξt = qεt for t = 1, . . . , n) as proposed by Ord, Koehler, and Snyder (1997). Secondly, when
models with multiple disturbances are considered such as the local level model (1), it may not be
straightforward to attribute the heteroskedasticity to a particular disturbance in practice. In such
cases it is reasonable to let the common variance of the model be stochastic and time-varying.
The state space model with a common stochastic variance poses an estimation problem since
the loglikelihood function is not tractable by linear methods such as the Kalman filter due to
the nonlinearities caused by the stochastic variance equation. Similar considerations apply to
stochastic volatility models. Advanced but practical simulation methods have been developed
to compute the loglikelihood function for such models; see, for example, the references to the
stochastic volatility literature given earlier in this section. However, in the case of stochastic
volatility models, estimation is only considered for parameters related to the stochastic variance
and the deterministic mean equations. In this paper we consider the simultaneous estimation of
parameters related to both the stochastic mean and common stochastic variance equations by the
method of maximum likelihood. The key to the development in this paper is the notion that we
can isolate the common variance in the statistical treatment of the mean equation. Subsequently,
the common variance can then be treated as a stochastic variable and modelled separately as a
result. In the case of the local level model with a common stochastic variance, the new results
imply that we can consider the case of q > 0, rather than q = 0, and we are able to estimate q
simultaneously with parameters φ and ση associated with the model of σt.
The extension of this model leads to a nonlinear "weighted" Kalman filter in which the weights
associated with the observations yt are determined by the model for the common stochastic vari-
ance. This becomes clear when we write the local level model (1) with a common stochastic
variance as
\[ y_t = \sigma_t(\alpha_t^* + \varepsilon_t^*), \qquad \alpha_{t+1}^* = \alpha_t^* + \xi_t^*, \]
where α∗t = αt/σt, ε∗t = εt/σt and ξ∗t = ξt/σt such that ε∗t ∼ N (0, 1) and ξ∗t ∼ N (0, q2). Thus the
local level model for the weighted observations yt/σt has the constant q as the signal-to-noise ratio
and therefore the same forecast function as for model (1) with constant variances. However, the
loglikelihood function for observation y1, . . . , yn generated by model (1) with common stochastic
variance (2) depends on σt for t = 1, . . . , n. The estimation of q is therefore subject to the model
specification for σt. Further this leads to observation weights of forecast functions that change
stochastically over time as a function of σt. The same conclusion applies to observation weights
of estimated components such as for the level component αt.
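As a small illustration of the forecast function in the constant-variance case, the sketch below iterates the variance recursion of the Kalman filter for the local level model with unit common variance and compares the resulting gain with its closed-form steady-state value. In the steady state the one-step forecast takes the exponentially weighted moving average form a_{t+1} = (1 − K) a_t + K y_t, where the weight K depends only on q. The function names are hypothetical and the code is a sketch under these assumptions, not the authors' implementation.

```python
import math

def steady_state_gain(q):
    """Closed-form steady-state Kalman gain of the local level model (unit common variance)."""
    q2 = q * q
    # Steady-state state variance P solves P^2 - q^2 P - q^2 = 0
    P = 0.5 * (q2 + math.sqrt(q2 * q2 + 4.0 * q2))
    return P / (P + 1.0)

def iterated_gain(q, P1=0.0, steps=200):
    """Gain after iterating P_{t+1} = P_t (1 - K_t) + q^2 with K_t = P_t / (P_t + 1)."""
    P = P1
    for _ in range(steps):
        K = P / (P + 1.0)
        P = P * (1.0 - K) + q * q
    return P / (P + 1.0)

K = steady_state_gain(0.5)
# EWMA form of the one-step forecast: a_next = (1 - K) * a + K * y
```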
The remainder of the paper is organised as follows. The next section introduces the general model and
discusses some examples of models of interest in economics, finance and other fields of empirical
research. The estimation methodology is discussed in section 3. A Monte Carlo study is carried
out to investigate the small sample properties of the estimation method and the encouraging
results of this study are reported in section 4. In an empirical illustration in section 5 we show
that a joint stochastic model for the mean and the variance equations may provide a solid basis
for analysing monthly time series of U.S. inflation and of U.S. industrial production. The final
section discusses the contribution of this paper and further extensions which we leave
for future research.
2 Time series model with a common stochastic variance
2.1 General specification
The linear Gaussian state space model can be formulated with the inclusion of a common fixed
variance σ2, that is
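\[
\begin{aligned}
y_t &= c_t + Z_t \alpha_t + G_t \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2 I_{r_\varepsilon}), \quad \sigma^2 > 0, \\
\alpha_{t+1} &= T_t \alpha_t + H_t \varepsilon_t, \qquad t = 1, \ldots, n,
\end{aligned}
\tag{3}
\]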
where yt is a p × 1 vector of observations, ct is a p × 1 vector of fixed effects, αt is an mα × 1 vector
of unobserved states and εt is an rε × 1 vector of disturbances. The system matrices Zt, Tt, Gt
and Ht are assumed to be fixed for all time points t = 1, . . . , n. Unknown elements of the system
matrices will be treated as parameters to be estimated by the method of maximum likelihood.
The fixed effects vector ct is given by ct = c+Xc,tδc where c is a p×1 vector of constants while the
regression coefficients associated with the p×kc matrix Xc,t of explanatory variables are collected
in the kc × 1 vector δc. Initially we assume that α1 ∼ N (a1, σ2P1) where a1 and P1 are known;
later we will consider cases where elements of α1 are generated by a diffuse density. Similar state
space formulations are discussed in De Jong (1989).
The first equation is the observation equation in which the p×mα matrix Zt selects (or weights)
the appropriate elements of the state vector αt relevant for elements in yt. Similarly, the p × rε
matrix Gt selects the appropriate elements of the disturbance vector εt. The second equation is
the state equation with the mα ×mα transition matrix Tt and the mα × rε disturbance selection
matrix Ht. For many practical time series models the system matrices are time-invariant.
The time series state space model (3) can be extended by replacing σ2 with the time-varying
stochastic variable σ2t so that
\[ \varepsilon_t \sim N(0, \sigma_t^2 I_{r_\varepsilon}), \qquad \sigma_t^2 = \exp(h_t), \qquad t = 1, \ldots, n, \tag{4} \]
where ht is a scalar and treated as the common log-variance of the model. The time series process
of the stochastic variable ht is formulated by the state space representation
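\[
\begin{aligned}
h_t &= d_t + A_t \beta_t + C_t \eta_t, \qquad \eta_t \sim N(0, I_{r_\eta}), \\
\beta_{t+1} &= B_t \beta_t + D_t \eta_t, \qquad t = 1, \ldots, n,
\end{aligned}
\tag{5}
\]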
where scalar dt is a fixed effect, βt is an mβ × 1 vector of unobserved states for the log-variance
equation and ηt is the rη × 1 vector of disturbances. The log-variance system vectors At and Ct
and system matrices Bt and Dt are assumed to be fixed for t = 1, . . . , n. In section 3 we will show
how unknown elements in these system matrices can be estimated by the method of maximum
likelihood. The fixed effect dt can be modelled in the same way as ct but with specification
dt = d+Xd,tδd where d is a constant, Xd,t is a 1×kd vector of explanatory variables and δd is the
kd×1 regression coefficient vector. The initial log-variance state vector is given by β1 ∼ N (b1, Q1)
where initially b1 and Q1 are assumed known.
2.2 Gaussian linear model with fixed time-varying variances
Let us first consider the time series model (3) where the common variance is deterministic, that is
σ2t = exp(d+Xd,tδd) for t = 1, . . . , n. A special case of the general model is the regression model
that is considered by Nabeya and Tanaka (1988) and is given by
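\[
\begin{aligned}
y_t &= X_{c,t} \delta_c + \sigma_t(\mu_t + \varepsilon_t), \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2), \\
\mu_{t+1} &= \mu_t + \xi_t, \qquad \xi_t \sim N(0, \sigma_\xi^2),
\end{aligned}
\tag{6}
\]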
with stochastically time-varying constant µt and deterministically time-varying common standard
deviation σt for t = 1, . . . , n. They used this model for developing asymptotic tests for parameter
constancy over time. This special case of the linear Gaussian state space model can be analysed
using the Kalman filter and the associated methods of maximum likelihood estimation, diagnostic
checking, signal extraction and forecasting; see, for example, Harvey (1989), Shumway and Stoffer
(2000) and Durbin and Koopman (2001, Part I). These treatments also allow for cases where other
elements of the system matrices vary over time deterministically.
Model (6) for which the common variance σ2 is allowed to vary over time is considered in this
paper and maximum likelihood estimation of its parameters is treated generally in section 3.
2.3 Stochastic volatility model
Another special case of the general model (3), (4) and (5) is obtained by taking αt = 0 such that
the mean specification is fixed and that only the variance specification of the model is stochastic.
The restriction leads to the specification of a stochastic volatility (SV) model of the form
\[ y_t = c_t + \sigma_\varepsilon \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_t^2), \qquad \sigma_t^2 = \exp(h_t), \qquad t = 1, \ldots, n, \tag{7} \]
where σε replaces Gt in equation (3). The log-volatility ht can be modelled as the stationary
autoregressive process (2) that we have as a special case of (5) with dt = (1−φ)d, At = 1, Bt = φ,
Ct = 0, Dt = ση > 0, b1 = d, Q1 = σ2η/(1 − φ2) and with |φ| < 1. Various contributions have
appeared in the literature for estimating SV models by maximum likelihood using importance
sampling techniques; see Danielson (1994), Shephard and Pitt (1997), Durbin and Koopman
(1997), Sandmann and Koopman (1998) and Durham and Gallant (2002). Other methods of
inference have also been considered in the context of SV models including Bayesian methods,
e.g. Kim, Shephard, and Chib (1998), and efficient method of moments procedures, e.g. Gallant,
Hsieh, and Tauchen (1997).
2.4 ARMA models with stochastic variances
It is well-known that the autoregressive moving average (ARMA) model can be represented in
state space. The general model allows the ARMA(p∗, q∗) model to have a stochastic variance
specification, that is
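\[ y_t = \varphi_1 y_{t-1} + \ldots + \varphi_{p^*} y_{t-p^*} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_{q^*} \varepsilon_{t-q^*}, \qquad \varepsilon_t \sim N(0, \sigma_t^2), \tag{8} \]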
for t = 1, . . . , n and with orders p∗ ≥ 0 and q∗ ≥ 0. The log-variance ht = log σ2t can be
modelled by an autoregressive process such as in (2) or by any other time series process that can
be represented in the general state space form (5).
2.5 Unobserved components models with common stochastic variances
The local level model (1) is a special case of the unobserved components time series model that
consists of the level αt and the irregular εt as the two unobservables. More elaborate specifications
within this class of time series models are discussed in Harvey (1989). The so-called basic structural
time series model consists of trend, seasonal and irregular components. This model with a common
stochastic variance is given by
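\[
\begin{aligned}
y_t &= \mu_t + \gamma_t + q_1 \varepsilon_{1t}, \\
\mu_{t+1} &= \mu_t + \beta_t + q_2 \varepsilon_{2t}, \\
\beta_{t+1} &= \beta_t + q_3 \varepsilon_{3t}, \\
S(L) \gamma_{t+1} &= q_4 \varepsilon_{4t},
\end{aligned}
\tag{9}
\]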
for t = 1, . . . , n and where S(L) is the seasonal sum operator 1 + L + . . . + L^(s−1) for seasonal length
s and lag operator L such that Lyt = yt−1. Variable εit is the i-th element of the disturbance
vector εt which is distributed as in (4) and with the common log-variance given by (5), for i =
1, . . . , 4. The fixed coefficients q1, . . . , q4 are unknown and can be estimated by maximum
likelihood. The variances are uniquely identified in this model provided that the common log-
variance ht follows a stationary process with zero mean and some constant variance σ2h. In this
case we note that the variance of the ith disturbance is q2i σ2t with unconditional expectation
\[ q_i^2 \exp\{E(h_t) + \tfrac{1}{2} \mathrm{var}(h_t)\} = q_i^2 \exp(\tfrac{1}{2} \sigma_h^2), \]
for i = 1, . . . , 4. When ht is modelled as a nonstationary process, the restriction of the zero mean
for this process is replaced by the restriction that h0 = 0 is fixed.
The state space representation of the mean equation of the model (9) for quarterly data is
based on the 5× 1 state vector
\[ \alpha_t = (\mu_t \;\; \beta_t \;\; \gamma_t \;\; \gamma_{2t} \;\; \gamma_{3t})', \]
where γ2t and γ3t are auxiliary variables required for the formulation of the seasonal component
in state space. Other specifications for the components, in particular the seasonal component, can
be considered and they are discussed in detail by Harvey (1989). We finally note that the initial
state vector is diffuse and requires specific modifications for the analysis which are discussed in
section 3.5.
2.6 Single source of error models
Ord, Koehler, and Snyder (1997) consider unobserved components models such as the ones dis-
cussed in the previous section but with ε1t = ε2t = ε3t = ε4t; they refer to this class of models
as single source of error models. Nonlinear extensions of this class of models are also considered.
They argue that the single source of error models cover a larger extent of the parameter space
in the stationary representation of the model and that it therefore will produce more accurate
model-based forecasts. In a general treatment of time series models with correlated disturbances,
Harvey and Koopman (2000) argue that single source of error models (or models with perfectly
correlated disturbances) are not appropriate for signal extraction. It is noted that the common
stochastic variance discussed in this paper applies naturally to the single source of error models.
A specific example of a nonlinear single source of error model is discussed by Engle and Smith
(1999) and is known as the stochastic permanent break (STOPBREAK) model. In its simplest
form the model is given by
yt = µt + qtεt, µt+1 = µt + εt, t = 1, . . . , n, (10)
where qt is a function of past realisations of εt. The STOPBREAK model can be cast into
the general specification given earlier in section 2.1 when qt is deterministic and not a function of
lagged εt’s and when σ2 is replaced by the common stochastic variance σ2t as modelled by (4) and
(5).
3 Estimation by maximum likelihood
In this section we develop a method for computing the Monte Carlo estimate of the loglikelihood
function for state space models with common stochastic variance. In the standard situation, with
deterministic variances, the density p(y|ψ) of the data can be computed through a prediction-error
decomposition (see section 3.1). The loglikelihood is then the sum of the individual contributions
of the prediction errors vt. With stochastic variance however, the Kalman equations cannot be
used for extracting the vt’s as the variances are not available in the filtering equations. Only
after conditioning on the common stochastic variance can the loglikelihood be computed from the
output of the Kalman filter. Importance sampling is used to adjust for the bias introduced by
conditioning, as explained in subsequent sections.
3.1 Kalman filter
Consider model (3), (4) and (5) and define vector ψ as the collection of unknown elements
of the system matrices for the mean and variance equations. Given a realised sequence for
σ∗ = (σ1, . . . , σn)′, the joint density of the observations y = (y′1, . . . y′n)′ can be obtained via
the prediction error decomposition. In particular, the conditional logdensity of y is given by
\[ \log p(y|\sigma^*;\psi) = -\frac{np}{2} \log 2\pi - \frac{1}{2} \sum_{t=1}^{n} \left( \log|\sigma_t^2 F_t| + \sigma_t^{-2} v_t' F_t^{-1} v_t \right), \tag{11} \]
where vt is the vector of one-step ahead prediction errors and σ2tFt is its variance matrix; see
Schweppe (1965) and Harvey (1989). Both vt and Ft are produced by the Kalman filter as given
by
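\[
\begin{aligned}
v_t &= y_t - c_t - Z_t a_t, & F_t &= Z_t P_t Z_t' + G_t G_t', \\
K_t &= T_t P_t Z_t' F_t^{-1}, \\
a_{t+1} &= T_t a_t + K_t v_t, & P_{t+1} &= T_t P_t (T_t - K_t Z_t)' + H_t H_t',
\end{aligned}
\tag{12}
\]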
for t = 1, . . . , n and where a1 and P1 are known values and represent the unconditional mean
and variance of the initial state vector, respectively. The vector at is the linear estimator or
predictor of the state αt conditional on {y1, . . . , yt−1} with variance matrix Pt, for t = 1, . . . , n.
The intermediate matrix Kt is known as the Kalman gain. Proofs of the Kalman filter are given by
Anderson and Moore (1979) and Durbin and Koopman (2001, §4.2.1). We note that the Kalman
filter equations as given by (12) do not depend on σ∗. This implies that vt and Ft do not change
when σ∗ varies, for t = 1, . . . , n.
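The recursions (12) and the conditional logdensity (11) can be sketched in Python for the scalar local level model (1), for which Zt = Tt = 1, GtG′t = 1 and HtH′t = q². This is a minimal illustration under those assumptions, with a fixed and known α1 (so P1 = 0 by default); the function names are hypothetical and the code is not the SsfPack implementation used for this paper.

```python
import math

def kalman_local_level(y, q, a1=0.0, P1=0.0):
    """Kalman filter (12) for the local level model, run without reference to sigma*.

    Returns the prediction errors v_t and the scaled variances F_t.
    """
    a, P = a1, P1
    v, F = [], []
    for yt in y:
        vt = yt - a                  # v_t = y_t - Z a_t with Z = 1, c_t = 0
        Ft = P + 1.0                 # F_t = Z P_t Z' + G G' with G G' = 1
        Kt = P / Ft                  # K_t = T P_t Z' / F_t with T = 1
        a = a + Kt * vt              # a_{t+1} = T a_t + K_t v_t
        P = P * (1.0 - Kt) + q * q   # P_{t+1} = T P_t (T - K_t Z)' + H H'
        v.append(vt)
        F.append(Ft)
    return v, F

def cond_loglik(v, F, sigma2):
    """Conditional logdensity (11) for scalar observations, given sigma_t^2 for each t."""
    n = len(v)
    s = sum(math.log(s2 * Ft) + vt * vt / (s2 * Ft)
            for vt, Ft, s2 in zip(v, F, sigma2))
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * s
```

The filter output is computed once; only `cond_loglik` needs to be re-evaluated for each new sequence of variances.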
The innovations vt can be regarded as linear transformations of y for a given ψ; see Harvey
(1989). The consequence of the well-known Gaussian properties is that the innovation vectors are
also normally distributed and given by
\[ v_t \sim N(0, \sigma_t^2 F_t), \qquad t = 1, \ldots, n. \tag{13} \]
It is further noticed that the Jacobian of the transformation from y to v = (v′1, . . . v′n)′ is unity.
Therefore the density of y is the same as the density of v, that is
p(y|σ∗;ψ) = p(v|σ∗;ψ),
where we can take v as being generated by the Gaussian model (13) for given σ∗.
The key to the simultaneous estimation of parameters related to the stochastic mean and
variance equations of model (3)–(5) is the result that the statistical properties of the innovations
can be summarised by (13). This shows that σ2t is the common denominator of the variance matrix
of vt. When σ2t is the constant σ2 in model (3), it is a well-known fact that σ2 can be concentrated
out of the loglikelihood function and that equation (13) holds for σ2t = σ2; see Harvey (1989). The
maximum likelihood estimate of σ2 is then simply given by
\[ \hat\sigma^2 = \frac{1}{n} \sum_{t=1}^{n} v_t' F_t^{-1} v_t. \]
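For the constant-variance case, the concentrated estimate and the resulting loglikelihood can be computed directly from the Kalman filter output. The sketch below assumes scalar observations and takes lists of prediction errors vt and scaled variances Ft as input; the function names are hypothetical.

```python
import math

def concentrated_sigma2(v, F):
    """Estimate of sigma^2: the average of v_t^2 / F_t (scalar observations)."""
    return sum(vt * vt / Ft for vt, Ft in zip(v, F)) / len(v)

def concentrated_loglik(v, F):
    """Gaussian loglikelihood evaluated at sigma_t^2 = concentrated_sigma2(v, F) for all t."""
    n = len(v)
    s2 = concentrated_sigma2(v, F)
    # At the estimate, the sum of v_t^2 / (s2 * F_t) equals n
    return -0.5 * (n * math.log(2.0 * math.pi) + n * math.log(s2)
                   + sum(math.log(Ft) for Ft in F) + n)
```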
3.2 Importance sampling
When σt follows a stochastic process as given by (4) and (5) we cannot express the Gaussian
density p(y;ψ) in analytical terms. This also applies to density p(v;ψ) associated with model
(13). The difficulty is well recognised in the econometric and statistical literature on stochastic
volatility models; see, for example, the overview articles by Ghysels, Harvey, and Renault (1996)
and Shephard (1996). We note that model (13) with σt modelled as (2) can be regarded as the SV
model for the innovations vt. A standard tool in evaluating non-tractable densities is importance
sampling and is applied to SV models by Danielson (1994), Shephard and Pitt (1997), Sandmann
and Koopman (1998) and Durham and Gallant (2002). The same approach will be taken in this
paper. The technique involves approximating the solution of the density via averages of simulations
from an approximating model. Importance sampling was first used in econometrics by Kloek and
Van Dijk (1978) in their work on computing posterior densities. Geweke (1989) proposes to use
the Lindeberg-Levy central limit theory in order to assess the accuracy of the importance sampler.
Koopman and Shephard (2002) discuss practical procedures based on extreme value theory for
empirically testing the asymptotic normality of the importance sampling estimator.
Given the innovations and their scaled variance matrices (both can be computed without
observing σ∗), we express the density of the observations as
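\[
\begin{aligned}
p(y;\psi) &= \int p(y|\sigma^*;\psi) \, p(\sigma^*;\psi) \, d\sigma^* \\
&= \int p(v|\sigma^*;\psi) \, p(\sigma^*;\psi) \, d\sigma^*,
\end{aligned}
\tag{14}
\]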
where p(v|σ∗;ψ) is the density of model (13) as given by (11) for a realised value of σ∗. Given a
realised value of y, the evaluation of the resulting likelihood function (14) via importance sampling
is based on an importance sampling density g(σ∗|v;ψ) from which it is relatively easy to simulate.
For considerations of feasibility, the importance density g(·) is chosen to be close to the
density p(·). The observation density associated with the true model can then be represented by
\[ p(y;\psi) = \int p(v|\sigma^*;\psi) \, \frac{p(\sigma^*;\psi)}{g(\sigma^*|v;\psi)} \, g(\sigma^*|v;\psi) \, d\sigma^*, \tag{15} \]
which we approximate via importance sampling. This requires generating Monte Carlo simulations
from the importance density g(σ∗|v;ψ) to obtain the estimator
\[ \hat p(y;\psi) = \frac{1}{M} \sum_{i=1}^{M} p(v|\sigma^{*(i)};\psi) \, \frac{p(\sigma^{*(i)};\psi)}{g(\sigma^{*(i)}|v;\psi)}, \tag{16} \]
where σ∗(i) is a realisation from the importance density g(σ∗|v;ψ). The Monte Carlo approxima-
tion (16) of the true density (15) will be based on a Gaussian importance density g(σ∗|y;ψ) in
this paper. We note that the estimator (16) is only subject to simulation error.
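The mechanics of the estimator (16) can be illustrated with a deliberately small toy problem: a single observation y with y|h ∼ N(0, exp(h)) and h ∼ N(0, s²), so that the integral is one-dimensional. The sketch below draws h from a Gaussian importance density N(m_g, s_g²) and averages the weighted conditional densities. The toy model, the function names and the choice of importance density are assumptions made for illustration; the paper's importance density is instead derived from the approximating model described next.

```python
import math
import random

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def is_likelihood(y, s, m_g, s_g, M=20000, seed=1):
    """Importance sampling estimate of p(y) = int p(y|h) p(h) dh, in the form of (16)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(M):
        h = rng.gauss(m_g, s_g)                          # h^(i) drawn from g
        log_w = (log_normal_pdf(y, 0.0, math.exp(h))     # p(y | h^(i))
                 + log_normal_pdf(h, 0.0, s * s)         # p(h^(i))
                 - log_normal_pdf(h, m_g, s_g * s_g))    # g(h^(i))
        total += math.exp(log_w)
    return total / M

estimate = is_likelihood(y=1.0, s=0.5, m_g=0.1, s_g=0.6)
```

Choosing an importance density with somewhat heavier tails than the target keeps the ratio p/g bounded, which is one reason the toy example uses s_g slightly larger than s.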
The importance density g(σ∗|y;ψ) = g(σ∗|v;ψ) for model (13) can be based on the approxi-
mating linear Gaussian state space model
vt = ht + ut, ut ∼ NID(rt, st), t = 1, . . . , n, (17)
where vt is obtained from the time series y1, . . . , yn by the Kalman filter as described in §3.1
and where ht is given by the model specification for ht = log σ2t as in (5). The time-varying
mean rt and variance st are identified via a recursive procedure for which the details are given
by Shephard and Pitt (1997) and Durbin and Koopman (2001). Generating simulations from the
importance density g(σ∗|v;ψ) is in effect based on the approximating model (17) and is referred
to as simulation smoothing. Various simulation smoothing methods exist to compute conditional
draws from the linear Gaussian state space model. A recent account of simulation smoothing
together with the development of a new, simple and efficient device for this is given by Durbin
and Koopman (2002).
In practice, focus is on the logdensity and in the case of a Gaussian importance density it can
be shown that
\[ \log p(y;\psi) = \log g(v;\psi) + \log \left[ \frac{1}{M} \sum_{i=1}^{M} \frac{p(v|\sigma^{*(i)};\psi)}{g(v|\sigma^{*(i)};\psi)} \right] + O(M^{-3/2}), \tag{18} \]
where log g(v;ψ) is the logdensity of the approximating model that can be evaluated by the Kalman
filter; the details of this result are given by Durbin and Koopman (1997).
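When the logdensity is computed in practice, the ratios inside the average can underflow because each is a ratio of densities of an entire series. A common numerical device, assumed here and not described in the paper, is to work with log-weights and subtract their maximum before exponentiating. The sketch below shows this for the sample average in (18); the function and argument names are hypothetical.

```python
import math

def log_mean_exp(log_w):
    """Numerically stable log of the average of exp(log_w[i])."""
    m = max(log_w)
    return m + math.log(sum(math.exp(lw - m) for lw in log_w) / len(log_w))

def loglik_estimate(log_g_v, log_p_cond, log_g_cond):
    """log g(v) plus the log of the average ratio p(v|sigma*(i)) / g(v|sigma*(i))."""
    log_w = [lp - lg for lp, lg in zip(log_p_cond, log_g_cond)]
    return log_g_v + log_mean_exp(log_w)
```

For instance, log-weights around −1000 would all exponentiate to zero in double precision, whereas `log_mean_exp` returns a finite value.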
3.3 Estimation procedure
It has been shown that we can evaluate the loglikelihood function for a linear Gaussian model
with a common stochastic time-varying variance using importance sampling methods. The full
procedure is relatively easy to implement and can be summarised by the following four steps:
1. Apply the Kalman filter to model (3) without any consideration of $\sigma_1^2, \ldots, \sigma_n^2$; store vt and
Ft for t = 1, . . . , n.
2. Consider model (13) and obtain an approximating model; compute the loglikelihood function
of the approximating model using the Kalman filter and generate M samples from the
resulting Gaussian importance density g(σ∗|y;ψ) using the device of Durbin and Koopman
(2002).
3. Compute the Monte Carlo estimate of the loglikelihood function given by
\[ \log \hat\ell(y;\psi) = \log g(v;\psi) + \log \left[ \frac{1}{M} \sum_{i=1}^{M} \frac{p(v|\sigma^{*(i)};\psi)}{g(v|\sigma^{*(i)};\psi)} \right]. \]
4. Maximise this loglikelihood with respect to ψ; the evaluations of the Monte Carlo estimator
$\log \hat\ell(y;\psi)$ for different values of ψ are based on the same set of random numbers for the
sampling of σ∗ from the Gaussian density g(σ∗|y;ψ), to ensure that the loglikelihood function
is smooth in ψ.
This procedure is numerically stable and is implemented in the object-oriented matrix programming
language Ox of Doornik (1999) using the state space functions in the Ox library SsfPack as
documented by Koopman, Shephard, and Doornik (1999). The programs written for this paper
can be obtained from www.ssfpack.com.
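The role of the common random numbers in the last step can be seen in a toy example. The Python sketch below evaluates a Monte Carlo loglikelihood in a single parameter s for one observation y with y|h ∼ N(0, exp(h)) and h ∼ N(0, s²), using the prior as importance density so that the weights reduce to p(y|h). The toy model, the function name and the default values are assumptions for illustration. Because the standard normal draws are regenerated from the same seed at every call, the estimate is a deterministic and smooth function of s, which is what a numerical optimiser requires.

```python
import math
import random

def is_loglik(s, y=1.0, M=2000, seed=123):
    """Toy Monte Carlo loglikelihood in s, reusing the same N(0, 1) draws for every s."""
    rng = random.Random(seed)                 # same seed, hence the same random numbers
    z = [rng.gauss(0.0, 1.0) for _ in range(M)]
    total = 0.0
    for zi in z:
        h = s * zi                            # draw from N(0, s^2) built from the common z_i
        # p(y | h) for y | h ~ N(0, exp(h))
        total += math.exp(-0.5 * (math.log(2.0 * math.pi) + h + y * y * math.exp(-h)))
    return math.log(total / M)

# Nearby parameter values give nearby estimates when the draws are held fixed
diff_common = abs(is_loglik(0.5 + 1e-6) - is_loglik(0.5))
```

With a fresh seed at every evaluation the simulation noise would typically dominate such small parameter changes, and the estimated surface would no longer be smooth.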
3.4 Signal extraction and forecasting
The state vector αt and disturbance vector εt of model (3) with known σ2t can be estimated by
standard state space methods; see Durbin and Koopman (2001). For example, state smoothing
recursions exist for the linear Gaussian state space model that compute α̂t = E(αt|y) and Vt =
var(αt|y) for t = n, . . . , 1. In the case of a stochastic time-varying process for σt we need to take
account of the variation of σt because a realisation of αt depends partially on σt. In a similar way
as for the construction of the likelihood function, we can develop an importance estimator for αt
which is given by
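\[ \hat\alpha_t = \omega^{-1} \sum_{i=1}^{M} \omega_i \alpha_t^{(i)}, \qquad \omega_i = \frac{p(v|\sigma^{*(i)};\psi)}{g(v|\sigma^{*(i)};\psi)}, \qquad \omega = \sum_{i=1}^{M} \omega_i, \tag{19} \]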
where α(i)t is the smoothed estimate E(αt|y) for αt in model (3) with σ replaced by time-varying
standard deviations in the n× 1 vector σ∗ = σ∗(i). The smoothed state variance is given by
\[ \mathrm{var}(\alpha_t|y) = \omega^{-1} \sum_{i=1}^{M} \omega_i V_t^{(i)}, \]
where V(i)t is the smoothed state variance Vt for model (3) with σ replaced by time-varying
standard deviations in σ∗ = σ∗(i).
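The weighted average in (19) is straightforward to code once the importance weights and the per-draw smoothed quantities are available. The sketch below handles a single time point t and takes log-weights as input so that the normalisation is numerically stable; the function name and the use of log-weights are assumptions for illustration.

```python
import math

def importance_weighted_mean(log_w, values):
    """Self-normalised weighted average of per-draw quantities, as in (19).

    log_w[i] is the log of the importance weight of draw i and values[i] is the
    corresponding smoothed quantity (for example, the smoothed state at time t).
    """
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]   # rescaling by exp(m) cancels in the ratio
    total = sum(w)
    return sum(wi * xi for wi, xi in zip(w, values)) / total
```

The same function can be applied to the per-draw smoothed variances and to the per-draw observation weights.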
In a similar fashion we can compute the filtered estimate of the state vector E(αt|y1, . . . , yt).
However, the importance sampling weights ωi are ratios of densities of v which is a linear trans-
formation of y. This estimate can therefore not be properly defined as a filtered estimate. Other
simulation techniques such as Monte Carlo filtering (or particle filtering), as reviewed in Doucet,
de Freitas, and Gordon (2000), should be employed for filtering. These techniques are not considered
in this paper. Finally, forecasts can be computed by importance estimation techniques
since forecasts are conditional on y.
Algorithms for computing observation weights for the construction of filtered and smoothed
estimates of the state vector are reviewed in Harvey and Koopman (2000) for the linear Gaussian
unobserved components model. These weights can also be computed for models with a common
stochastic variance and they can be constructed in a similar way as in (19). In particular, the
observation weights of the estimated state vector αt are given by
\[ w_{t,j} = \omega^{-1} \sum_{i=1}^{M} \omega_i w_{t,j}^{(i)}, \qquad j = 1, \ldots, n, \]
where w(i)t,j is the jth observation weight of the smoothed state αt for the model (3) with σ replaced
by time-varying standard deviations in σ∗ = σ∗(i).
3.5 Diffuse initialisation
Time series models with nonstationary dynamic specifications require diffuse initial conditions of
the state vector when such models are cast in state space form. The consequences of diffuse initial
conditions for Kalman filtering and smoothing have been discussed elsewhere; see, for example,
the treatment given by Durbin and Koopman (2001, Chapter 5) and the references therein. In
the case of simulation smoothing, diffuse initialisations can be accounted for in a straightforward
fashion for which the details are given by Durbin and Koopman (2002).
4 Small-sample properties: A Monte Carlo study
Since the unknown parameters are estimated by exact maximum likelihood, subject to simulation
error, the asymptotic properties of the estimators apply as usual. Given the numerical involvement
of the estimation procedure, it is interesting to investigate the finite sample properties of the
estimators. We therefore carry out a Monte Carlo study.
4.1 Design of the Monte Carlo study
We consider the following three different model specifications for the mean equation: the stationary
autoregressive moving average (ARMA) model (8), the local level model (1) and the unobserved
components model (9). Further, the common stochastic variance (CSV) is based on the
log-variance ht = log σ_t^2 which is either fixed at zero, ht = 0, or is modelled by the autoregressive model
(2). The different data generation processes (DGPs) and their parameter values are presented in
Table 1.
Table 1: Data generation processes of the Monte Carlo study

        No CSV                                     CSV
ARMA    yt as in (8) with p∗ = q∗ = 1,             yt as in (8) with p∗ = q∗ = 1,
        ϕ1 = 0.8, θ1 = −0.6,                       ϕ1 = 0.8, θ1 = −0.6,
        ht = 0.                                    ht as in (2) with d = 0, φ = 0.9, ση = 0.2.
LL      yt as in (1) with q = 0.5,                 yt as in (1) with q = 0.5,
        ht = 0.                                    ht as in (2) with d = 0, φ = 0.9, ση = 0.2.
BSM     yt as in (9) with s = 4, q3 = 0, β1 = 0,   yt as in (9) with s = 4, q3 = 0, β1 = 0,
        q1 = 1, q2 = 0.5, q4 = 0.2,                q1 = 1, q2 = 0.5, q4 = 0.2,
        ht = 0.                                    ht as in (2) with d = 0, φ = 0.9, ση = 0.2.
For each DGP, M = 500 series are generated with three different numbers of observations:
n = 100, n = 500 and n = 1000. Each generated series is estimated twice using two different model
specifications. Both models adopt the “true” mean equation of the DGP while the common log-
variance is either fixed at zero (see column No CSV) or stochastic and modelled as an autoregressive
process (see column CSV). In the remaining part of this section we discuss the simulation results
that are based on this simulation design and that are presented in Tables 2–8.
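As an illustration of one cell of this design, the sketch below simulates a local level series with a common stochastic variance. The exact forms of models (1) and (2) are defined outside this section, so the recursions used here are assumptions: yt = µt + σt εt, µt+1 = µt + q σt ξt and ht+1 = φ ht + ση ηt with d = 0 and σt = exp(ht/2). Starting ht and µt at zero is a further simplification, and the function name is hypothetical.

```python
import math
import random

def simulate_local_level_csv(n, q=0.5, phi=0.9, sigma_eta=0.2, seed=0):
    """Simulate n observations from a local level model whose disturbances
    share a common time-varying standard deviation sigma_t = exp(h_t / 2).

    Setting sigma_eta = 0 gives the 'No CSV' case with h_t = 0 throughout.
    """
    rng = random.Random(seed)
    h, mu = 0.0, 0.0          # assumed starting values (not from the paper)
    y, sigma_path = [], []
    for _ in range(n):
        sigma = math.exp(h / 2.0)
        y.append(mu + sigma * rng.gauss(0.0, 1.0))       # observation
        sigma_path.append(sigma)
        mu = mu + q * sigma * rng.gauss(0.0, 1.0)        # level update
        h = phi * h + sigma_eta * rng.gauss(0.0, 1.0)    # log-variance update
    return y, sigma_path
```

Repeating such a call for 500 different seeds and for n = 100, 500 and 1000 would mirror the replication scheme described above.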
4.2 ARMA simulation results
The simulation results for the ARMA model are presented in Table 2 which consists of two panels:
the first panel considers the DGP without a common stochastic variance (No CSV) and the second
panel considers the DGP with CSV. Each panel has four vertical blocks of which the last three
are associated with the three different sample sizes (100, 500 and 1000). In each of these blocks a
summary of the estimation results is presented for the M = 500 simulated series without CSV
and with CSV. The sample mean of the estimated parameters for the M series is given together
with its sample standard deviation. Further, averages of two diagnostic statistics are reported
together with the percentages of M diagnostics that have not passed the corresponding critical
value. The first diagnostic test statistic is the normality test of Doornik and Hansen (1994),
which is an adapted version of the test for normality of Bowman and Shenton (1975), and is
χ² distributed with two degrees of freedom. The second diagnostic is the heteroskedasticity test statistic
$$H(h) = \sum_{t=T-h+1}^{T} v_t^2 \Big/ \sum_{t=d+1}^{d+1+h} v_t^2,$$
which is F_{h,h} distributed with h ≈ n/3. Finally the
average value of the M maximised log-likelihood values is reported for both estimated models in
each block.
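The heteroskedasticity statistic can be computed directly from the prediction residuals. The sketch below is a simplified version in which both sums contain exactly h terms (the last h residuals over the first h residuals after discarding d initial ones); that choice and the function name `h_statistic` are assumptions made for the illustration.

```python
def h_statistic(residuals, h, d=0):
    """Ratio of the sum of squares of the last h residuals to the sum of
    squares of the first h residuals, after dropping d initial residuals."""
    usable = residuals[d:]
    if h <= 0 or 2 * h > len(usable):
        raise ValueError("need 0 < h <= len(residuals[d:]) / 2")
    numerator = sum(v * v for v in usable[-h:])
    denominator = sum(v * v for v in usable[:h])
    return numerator / denominator
```

Values well above one point to a larger variance at the end of the sample, values well below one to a larger variance at the start.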
Table 2: Small sample results for the ARMA model

Generating no SV
                    n = 100                     n = 500                     n = 1000
Estimating          No SV         SV            No SV         SV            No SV         SV
              true  p      s.d.   p      s.d.   p      s.d.   p      s.d.   p      s.d.   p      s.d.
ϕ             0.8   0.72   0.24   0.70   0.27   0.79   0.08   0.78   0.08   0.79   0.05   0.79   0.05
ϑ             −0.6  −0.53  0.28   −0.51  0.30   −0.59  0.10   −0.59  0.11   −0.60  0.07   −0.59  0.07
σARMA         1     0.98   0.07   0.96   0.08   1.00   0.03   0.99   0.04   1.00   0.02   1.00   0.02
σSV                               0.16   0.23                 0.11   0.13                 0.09   0.11
φSV                               0.36   0.33                 0.37   0.31                 0.33   0.27
                    Stat   fRej   Stat   fRej   Stat   fRej   Stat   fRej   Stat   fRej   Stat   fRej
Normality-DH        2.09   0.07   2.03   0.07   2.00   0.05   1.90   0.05   2.04   0.06   1.88   0.05
H(n/3)              1.10   0.35   1.10   0.34   1.02   0.14   1.02   0.13   1.01   0.04   1.01   0.06
LL                  -139.70       -139.06       -707.35       -708.98       -1416.60      -1419.20

Generating SV
                    n = 100                     n = 500                     n = 1000
Estimating          No SV         SV            No SV         SV            No SV         SV
              true  p      s.d.   p      s.d.   p      s.d.   p      s.d.   p      s.d.   p      s.d.
ϕ             0.8   0.70   0.30   0.69   0.29   0.78   0.09   0.78   0.09   0.79   0.05   0.79   0.05
ϑ             −0.6  −0.50  0.33   −0.49  0.33   −0.59  0.12   −0.58  0.11   −0.59  0.07   −0.59  0.07
σARMA         1     1.03   0.12   0.97   0.12   1.05   0.06   1.00   0.06   1.05   0.04   1.00   0.04
ση            0.2                 0.28   0.25                 0.22   0.10                 0.21   0.08
φ             0.9                 0.64   0.34                 0.84   0.16                 0.87   0.08
                    Stat   fRej   Stat   fRej   Stat   fRej   Stat   fRej   Stat   fRej   Stat   fRej
Normality-DH        4.04   0.23   2.49   0.09   10.90  0.56   2.28   0.05   20.89  0.84   2.85   0.10
H(n/3)              1.23   0.41   1.07   0.33   1.04   0.27   1.01   0.02   1.02   0.20   1.00   0.01
LL                  -144.37       -142.84       -732.86       -727.81       -1469.69      -1453.81

Reported are the average value of the estimated parameters (p) together with the standard
deviations (s.d.) of the estimates. The average value of the Normality (DH) and heteroskedasticity (H)
test statistics are reported, together with the fraction of rejections (fRej), using a significance level
of 5%. Furthermore, the average loglikelihood value (LL) is given.
The true DGP model that forms the basis of the results in Table 2 is the ARMA model (8)
with p∗ = q∗ = 1 and with coefficients ϕ = 0.8, ϑ = −0.6 and σ2t = 1 for t = 1, . . . , n for the
case without CSV. Estimating the coefficients using a realised time series from the true model
may be difficult due to the existence of strong (negative) correlation between estimates of ϕ and ϑ
caused by possible root cancellations. For the DGP with CSV we consider model (2) for σ2t with
coefficients d = 0, φ = 0.9 and ση = 0.2.
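The root cancellation issue can be made concrete with a short worked derivation. It assumes that model (8) takes the common ARMA(1,1) form y_t = ϕ y_{t−1} + ε_t + ϑ ε_{t−1}; this sign convention is an assumption, since (8) is defined outside this section.

```latex
(1 - \phi L)\, y_t = (1 + \vartheta L)\, \varepsilon_t
\quad\Longrightarrow\quad
y_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j},
\qquad \psi_0 = 1, \quad \psi_j = (\phi + \vartheta)\,\phi^{\,j-1} \quad (j \ge 1).

% With \phi = 0.8 and \vartheta = -0.6:
\psi_j = 0.2 \times 0.8^{\,j-1}, \qquad
\psi_1 = 0.2, \quad \psi_2 = 0.16, \quad \psi_3 = 0.128.
```

Under this convention the dynamics enter through ϕ + ϑ and powers of ϕ, so the series approaches white noise as ϕ + ϑ approaches zero and the two coefficients become hard to separate, which is in line with the strong negative correlation between their estimates.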
For the smaller sample sizes, estimates of ϕ and ϑ have relatively large standard errors but
they become smaller for larger samples. The estimated coefficients for the mean equation of the
model are not biased irrespective of the specification of the common variance part of the model.
The estimated coefficients for the variance equation of the model tend to be insignificant when
the DGP is without CSV.
When the DGP has a fixed variance (No CSV), the normality test is rejected in about 5% of
the cases, so the size of the test is correct. When the DGP includes the CSV specification but it
is not estimated, normality is rejected in 23% of the cases for n = 100 and in 84% of the cases for
n = 1000. The heteroskedasticity test appears to be less reliable for detecting
model misspecification due to the omission of a common stochastic variance. When CSV is both
generated and estimated, the size of the normality test is not correct for all sample sizes; the
rejection rate of the normality test can be as high as 10%.
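For readers who want to reproduce this kind of rejection count, the sketch below computes the classical Bowman and Shenton statistic from skewness and kurtosis. It is not the Doornik and Hansen adaptation used in the tables, which applies further transformations; it is shown only because it shares the χ² reference distribution with two degrees of freedom, whose survival function is exp(−x/2). The function name is hypothetical.

```python
import math

def bowman_shenton(residuals):
    """Return (statistic, p_value) of the skewness-kurtosis normality test,
    n * (S^2 / 6 + (K - 3)^2 / 24), referred to a chi-square(2) distribution."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((x - mean) ** 2 for x in residuals) / n
    m3 = sum((x - mean) ** 3 for x in residuals) / n
    m4 = sum((x - mean) ** 4 for x in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n * (skewness ** 2 / 6.0 + (kurtosis - 3.0) ** 2 / 24.0)
    p_value = math.exp(-statistic / 2.0)   # chi-square(2) upper tail
    return statistic, p_value
```

A χ² variable with two degrees of freedom has mean 2, which matches the average statistics of about 2 reported in the No CSV panel of Table 2.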
The estimation of state space models with a common stochastic variance is more demanding
compared to the estimation of standard state space models. However, the extra computational
burden due to the use of simulation methods is not excessive. As an overall conclusion we empha-
size that estimation results can be obtained quickly and the precision is good.
4.3 Other simulation results
A Monte Carlo study has also been carried out for the class of unobserved components models
with and without a common stochastic variance equation. Tables 7 and 8 (included at the end of
this paper) report the results for DGPs based on model (9). Table 7 concerns the local level model
(1) with σ2 = 1 and q = 0.5 or, equivalently, the basic structural model (9) with q1 = 1, q2 = 0.5
and βt = γt = 0 for t = 1, . . . , n. The DGP with CSV has the same specification for the common
stochastic variance σ2t as for the ARMA model of the previous section. Table 8 concerns the local
level model with a stochastic seasonal component, that is model (9) with q3 = 0 and β1 = 0 such
that βt = 0 for t = 1, . . . , n. The parameter values for this DGP are selected as q1 = 1, q2 = 0.5
and q4 = 0.2. The same specification for the CSV is taken as for the other models.
The Monte Carlo results for the two models are similar to those for the ARMA model
reported in the previous section. It is surprising to see that when the DGP is taken with no CSV
(that is, ht = 0) and estimation is based on the model with CSV, the estimated coefficients of
the mean equation have no bias while the estimated coefficients of the variance equation appear
not to be significant. The reverse case (CSV is part of DGP but it is not estimated) does also
produce unbiased estimates for the mean equation but in this case the normality test statistics
are unsatisfactory and on these grounds most of the estimated models would have correctly been
rejected when sample sizes are sufficiently large. The heteroskedasticity tests are also unreliable
in this context.
5 Analysing macro-economic time series: two illustrations
5.1 U.S. monthly inflation rate
The first difference of the logarithm of the monthly U.S. CPI index (see footnote 1) between 1957:1
and 2001:9 is taken as the indicator of U.S. inflation. The data are presented in figure 1. In
terms of an unobserved components model, the time series plot of inflation suggests the inclusion
of level and seasonal components. Therefore we base our analysis on model (9) with s = 12 and
βt = 0 for t = 1, . . . , n and n = 537. The common variance specification for σ2t is based on the
log-variance specification (2) with d = 0. The parameter vector ψ for this model is given by
ψ = (q1, q2, q4, φ, ση)′,
where the coefficients q1, q2 and q4 refer to equation (9) and φ and ση refer to (2). The mean
equation requires a diffuse initial condition for the state vector.
Figure 1: (i) The U.S. monthly consumer price index (CPI) and (ii) the difference of log CPI
(inflation)
The estimation results reported in table 3 have been obtained by numerically maximising the
loglikelihood function which is computed either by the Kalman filter only (for the fixed variance
case) or by the Kalman filter and importance sampling methods described in section 3 (for the
general model with a common stochastic variance). The numerical optimisation is carried out by
the BFGS method as implemented in the Ox function library of Doornik (1999). In all cases
convergence was obtained rapidly and required 10 to 50 loglikelihood evaluations depending on
the specification of the model and the starting value of the parameter vector.
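To make the Kalman filter part of the loglikelihood evaluation concrete, the following sketch evaluates the Gaussian loglikelihood of a local level model for a given path of standard deviations. It is a simplified stand-in, not the implementation used for the results: the model form (yt = µt + σt εt, µt+1 = µt + q σt ξt), the large initial variance `p1` in place of an exact diffuse initialisation, and the function name are all assumptions.

```python
import math

def local_level_loglik(y, sigma, q, a1=0.0, p1=1e7):
    """Loglikelihood via the Kalman filter for a local level model with
    observation s.d. sigma[t] and level disturbance s.d. q * sigma[t]."""
    a, p = a1, p1
    loglik = 0.0
    for t, (obs, s) in enumerate(zip(y, sigma)):
        v = obs - a              # one-step ahead prediction error
        f = p + s * s            # prediction error variance
        if t > 0:
            # The first error is dropped as a crude substitute for an
            # exact diffuse treatment of the initial level.
            loglik -= 0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        k = p / f                # Kalman gain
        a = a + k * v            # predicted level for the next period
        p = p * s * s / f + (q * s) ** 2   # equals p * (1 - k) + level variance
    return loglik
```

With a constant path `sigma = [s] * n` this corresponds to the fixed variance case; with a simulated path σ∗(i) it gives the kind of conditional Gaussian likelihood that enters the importance sampling estimate. A numerical optimiser such as BFGS would then be applied to the negative of the resulting loglikelihood.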
The parameter estimates of three model specifications are reported in table 3 together with 95%
(asymmetric) confidence intervals of the estimates and some diagnostic test statistics. The final
rows report the normality test of Doornik and Hansen (1994), the Box-Ljung Q test of Ljung and
Box (1978) for residual correlation (with 21 degrees of freedom for the LL-Seas model, 14 degrees
of freedom for the LL-Seas-CSV-δ model), the ARCH-test of Engle (1982) for correlation in the
squared residuals and the heteroskedasticity test statistics H(n/3) together with the corresponding
p-values in parentheses. More details of the test statistics have been given in section 4.

Footnote 1: Source: Bureau of Labor Statistics, series CUUR0000SA0L1E, with the city average CPI index, all items less food
and energy, not seasonally adjusted.

Table 3: Estimation results of models for the U.S. monthly inflation rate

Parameter        LL-Seas                   LL-Seas-CSV               LL-Seas-CSV-δ
q1 (s.d. irr)    0.1579 [0.144, 0.173]     0.2493 [0.197, 0.316]     0.2892 [0.235, 0.356]
q2 (s.d. level)  0.0474 [0.035, 0.063]     0.0554 [0.038, 0.080]     0.0538 [0.037, 0.078]
q4 (s.d. seas)   0.0249 [0.018, 0.035]     0.0703 [0.052, 0.096]     0.0853 [0.064, 0.115]
φ (CSV)                                    0.9935 [0.961, 0.999]     0.9994 [0.988, 1.000]
ση (CSV)                                   0.2222 [0.159, 0.311]     0.0809 [0.040, 0.164]
δL, 1974:2                                                           0.5605 [0.372, 0.749]
δL, 1974:11                                                          −0.4744 [−0.667, −0.282]
δI, 1980:7                                                           −1.2057 [−1.627, −0.785]
δL, 1981:9                                                           −0.5260 [−0.794, −0.258]
δL, 1982:8                                                           −0.3683 [−0.643, −0.094]
LL               85.52                     146.33                    179.18
Test statistic
Normality-DH     102.085 (0.000)           0.354 (0.838)             0.930 (0.628)
Q                54.474 (0.000)            26.204 (0.125)            29.553 (0.009)
ARCH             4.904 (0.270)             7.097 (0.229)             8.121 (0.215)
H(n/3)           0.586 (1.000)             0.693 (0.992)             0.719 (0.985)
Time             0.97                      1:03.72                   3:53.88

Parameter estimates with 95% confidence intervals of three models for U.S. inflation 1957-2001.
The three models consist of a combination of level, seasonal plus irregular components
(LL-Seas), common stochastic variance components (LL-Seas-CSV) and interventions
(LL-Seas-CSV-δ), respectively. The loglikelihood (LL) value and various diagnostic test statistics
(with p-values) are also reported. The last row reports the estimation time.
In the second column, labelled as LL-Seas, the estimation results for the local level model
plus a stochastic seasonal component and with a fixed common variance are presented. The test
statistics based on the standardised residuals reveal that the model does not capture all dynamic
properties of the time series accurately. The residuals have larger values in the year 1974
and in the early 1980s than in the other years. We therefore consider the inclusion
of a common stochastic variance such that the overall variance of the model can be different for
different time periods in the sample.
The third column with label LL-Seas-CSV in table 3 reports the estimated parameters for the
model with a common stochastic variance. This specification produces satisfactory results and,
based on the diagnostic test statistics, we conclude that a more appropriate model for the U.S.
inflation is found compared to model LL-Seas. Some concern is raised by the relatively large value
for the Box-Ljung test but we emphasise that we only consider a simple parsimonious model for
inflation. In fact, it is surprising that the inclusion of the common stochastic variance can improve
the fit dramatically for such a simple model. Similar gains may be obtained when richer dynamic
mean specifications for the U.S. inflation series are considered.
The autoregressive parameter φ of the common stochastic variance is estimated at a value of
0.9935 which is close to the unit root. This type of near-random walk behaviour does not cause a
problem for the estimation method employed and may be caused by neglecting outliers and shifts
in the inflation series. To investigate the robustness of the new model and to ensure that our
results do not rely on a few atypical observations we have included some dummy variables that
take account of an outlier in July 1980 and some possible level shifts in the inflation series. The
estimation results can be found in the column labelled as LL-Seas-CSV-δ and we observe that a
further increase of the loglikelihood value is obtained. The estimated value of φ is even closer to
unity. Further, even though all breaks are significant, the diagnostics do not indicate that the
model with dummy variables for outliers and breaks is superior to the model without
the dummy variables. We do not continue the analysis by considering other possible specifications
for the common variance such as a random walk model or a stochastic spline model.
Figure 2: Prediction residuals for the estimated LL-Seas-CSV-δ model: (i) standardised residuals;
(ii) autocorrelogram of residuals (continuous lines) and squared residuals; (iii) scaled cumulative
sum of squared residuals.
Figure 2 presents graphs of the standardised prediction residuals together with the autocorrela-
tion functions of the residuals and their squares and the cumulative sum (CUSUM) of the squared
residuals. These graphical diagnostic tests are based on one-step ahead prediction residuals and
they raise no concern with respect to the appropriateness of the LL-Seas-CSV-δ model for the
inflation series. From these results, we may conclude that the common stochastic variance is an
important characteristic of U.S. inflation rates. However, a more elaborate model with a richer
dynamic specification for the mean equation may well give a better fit and may provide a more
economically meaningful interpretation.
Figure 3 displays the estimated time-varying unobserved components trend µt and seasonal γt
together with the estimated time-varying standard deviation of the observation equation q1σt =
q1 exp(ht/2) for the final model reported in table 3. The estimation of the components is discussed
in section 3.4. In the first graph of figure 3 the jumps in the estimated level component are
associated with the level shifts in the model LL-Seas-CSV-δ. We note that when the common
stochastic variance component is included, the model is flexible and can account for those shifts
and the breaks in the model are not absolutely necessary. The seasonality in the second graph
is smooth but clearly time-varying. The final graph of figure 3 presents the estimated series q1σt
which corresponds to the estimate of q1 in the LL-Seas model (that is approximately 0.16). The
common stochastic variance component displays a higher level in the early 1980s corresponding to
a higher uncertainty in the early period of the 1980s which is sometimes referred to as the “second
oil crisis”. Further it shows a gradual reduction in the level in the early 1990s corresponding to
the less volatile period associated with the Volcker-Greenspan regime at the U.S. Federal Reserve.
We therefore conclude that some interesting features of the U.S. inflation series are effectively and
realistically represented by the common stochastic variance in this basic model.

Figure 3: Estimation results for the LL-Seas-CSV-δ: (i) estimated level (including level breaks),
(ii) estimated seasonal and (iii) estimated standard deviation of the irregular component (scaled
by the estimated time-varying common stochastic standard deviation)
Figure 4 plots the observation weights for the trend plus seasonal signal µt + γt which are
computed as discussed in section 3.4. The weights are presented for the estimated models LL-
Seas (upper panels) and LL-Seas-CSV-δ (bottom panels). The panels on the left hand side are
associated with time point 1981:1 and we observe that the distribution of the weights has heavier
tails for the LL-Seas model with a constant variance compared to the model with a common
stochastic variance. The latter model allows for the increased variability at the beginning of
the 1980s and therefore the observations surrounding time point 1981:1 are more relevant than
observations in more tranquil periods and are given more weight. Also the weighting pattern is
asymmetric whereas for the model with a constant variance the weighting pattern is symmetric.
The weights associated with time point 1995:8 are the same as for time point 1981:1 in the upper
panel (apart from end-point effects) because the model has constant variances. In the bottom
panel it is shown that less weight is given to observations in the 1980s and the beginning of
the 1990s whereas relatively more weight is given to observations after 1995:8. This reflects that
observations from 1993 onwards can be relied upon with more confidence.
Panels: LLS, 1981:1; LLS-CSV-δ, 1981:1; LLS, 1995:8; LLS-CSV-δ, 1995:8.
Figure 4: Observation weights for the signal of the LL-Seas and LL-Seas-CSV-δ models at periods
1981:1 and 1995:8
Table 4: Forecasting results for the inflation data
LL-Seas 1 3 6 12
MFE −0.004 −0.007 −0.005 −0.011
MAPE 0.123 0.131 0.120 0.104
RMSE 0.154 0.163 0.149 0.133
LL-Seas-CSV 1 3 6 12
MFE −0.004 −0.006 −0.005 −0.010
MAPE 0.107 0.109 0.103 0.098
RMSE 0.137 0.138 0.131 0.126
LL-Seas-CSV-δ 1 3 6 12
MFE −0.005 −0.008 −0.007 −0.011
MAPE 0.108 0.109 0.104 0.099
RMSE 0.137 0.138 0.133 0.127
The table reports the mean forecast error (MFE), the
mean absolute prediction error (MAPE) and the root mean
squared forecast error (RMSE), at horizons of 1, 3, 6 and 12
months ahead. Forecasting starts at 1985:1.
To investigate whether the common stochastic variance component improves the forecasting
performance, we have carried out the following forecasting exercise. Starting with the sample
1957:1-1985:1, the model is estimated and h-step forecasts are computed for h = 1, 3, 6, 12; for
the details, see section 3. The four forecast errors e_{m,h} = y_{m+h} − y_{m+h|m} are computed where
m refers to the final observation in the estimation sample and y_{t|m} is the forecast of y_t given
the observations of the estimation sample for t > m. The estimation and forecasting steps are
repeated for samples that are increased by one observation at a time until sample 1957:1-2000:9
is reached. Finally, some statistics are computed for the collection of forecast errors.
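The expanding-sample scheme described above can be sketched as a simple loop. The forecasting rule is left as a user-supplied function; the seasonal naive rule shown here is only a hypothetical placeholder for re-estimating the model and forecasting from it, and all names are illustrative.

```python
def expanding_window_errors(y, first_origin, horizons, forecast_fn):
    """Collect forecast errors e_{m,h} = y_{m+h} - forecast for samples
    y_1..y_m, with m growing by one observation at a time."""
    errors = {h: [] for h in horizons}
    max_h = max(horizons)
    # The last origin leaves max_h observations for evaluation.
    for m in range(first_origin, len(y) - max_h + 1):
        sample = y[:m]                       # observations 1, ..., m
        for h in horizons:
            forecast = forecast_fn(sample, h)
            errors[h].append(y[m + h - 1] - forecast)
    return errors

def seasonal_naive(sample, h, s=12):
    """Placeholder forecaster: the most recent observed value that falls
    in the same season as period m + h."""
    years_back = -(-h // s)                  # ceiling of h / s
    return sample[len(sample) + h - years_back * s - 1]
```

A call such as `expanding_window_errors(y, first_origin, [1, 3, 6, 12], seasonal_naive)` returns one list of errors per horizon, which can then be summarised by the statistics of table 4.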
Table 4 reports the mean forecast error (MFE), the mean absolute prediction error (MAPE)
and the root mean squared error (RMSE) which are defined by
$$\mathrm{MFE} = \sum e_{t+h}/k, \qquad \mathrm{MAPE} = \sum |e_{t+h}|/k, \qquad \mathrm{RMSE} = \sqrt{\sum e_{t+h}^2/k},$$
where k is the number of forecast errors for horizons h = 1, 3, 6, 12 months. We compare results
for the three inflation models as considered in table 3. The models with the common stochastic
variance outperform the forecast statistics of the model with constant variance for all forecasting
horizons. This is mainly due to the gradual change of the variance after 1985 which seems a
salient feature of US inflation data. By allowing for these changes in the common variance of the
model, more accurate forecasts can be produced. The last panel of table 4 gives the results for the
LL-Seas-CSV model with the pre-1985 level breaks. These statistics are similar to the ones based
on the same model without the breaks. This is probably due to the fact that no breaks occur in
the forecasting periods. Finally we note that for all models the MAPE and RMSE statistics are
lower at horizons 6 and 12 than at horizons 1 and 3 although this is less true for the two models
with a common stochastic variance. It appears that the seasonal effect is strong and therefore
good forecasts can be produced at yearly horizons in particular.
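The three summary statistics defined above are straightforward to compute from a list of forecast errors for one horizon. The sketch below follows the definitions given in the text; the function name is an assumption. Note that MAPE here is the mean absolute prediction error in the units of the series, not a percentage error.

```python
import math

def forecast_statistics(errors):
    """Return (MFE, MAPE, RMSE) for a list of forecast errors."""
    k = len(errors)
    mfe = sum(errors) / k                             # mean forecast error (bias)
    mape = sum(abs(e) for e in errors) / k            # mean absolute prediction error
    rmse = math.sqrt(sum(e * e for e in errors) / k)  # root mean squared error
    return mfe, mape, rmse
```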
5.2 US industrial production
Another macroeconomic time series of interest is U.S. industrial production (see footnote 2). The monthly time
series from 1960 to 2001 is presented in levels, in logarithms and as percentages of month-to-month
growth rates in figure 5. The time series plot of growth reveals that the variability of the series is
lower after the early 1980s.
Footnote 2: Industrial Production – Market Group – Total Index – Not Seasonally Adjusted. Source: Economagic, series
b500001 ipnsa, Federal Reserve, Board of Governors. Available is 1919:1-2001:12 but we use only 1960:1-2001:12.

Figure 5: The U.S. monthly industrial production: (i) in levels, (ii) in logs and (iii) in percentage
growth

Table 5: Estimation results for U.S. industrial growth ∆ log(IP)

Parameter        Seas                      AR-Seas                   AR-Seas-CSV
q1 (s.d. irr)    0.9499 [0.885, 1.019]     0.7248 [0.648, 0.811]     0.7463 [0.594, 0.938]
q4 (s.d. seas)   0.0311 [0.024, 0.040]     0.0365 [0.029, 0.046]     0.0344 [0.025, 0.046]
q5 (s.d. ar1)                              0.5434 [0.426, 0.693]     0.4980 [0.352, 0.704]
ϕ (ar1)                                    0.8756 [0.704, 0.935]     0.9471 [0.884, 0.970]
φ (CSV)                                                              0.9432 [0.810, 1.077]
ση (CSV)                                                             0.2072 [0.051, 0.848]
LL               -753.89                   -715.56                   -699.42
Test statistics
Normality-DH     32.260 (0.000)            17.397 (0.000)            5.559 (0.062)
Box-Ljung Q      82.733 (0.000)            24.271 (0.186)            29.261 (0.032)
ARCH             35.894 (0.105)            17.540 (0.149)            1.829 (0.405)
H(n/3)           0.477 (1.000)             0.480 (1.000)             0.785 (0.939)
Time             0.58                      1.48                      1:19.34

Parameter estimates with 95% confidence intervals of three models for U.S. monthly
industrial growth 1960-2001. The three models consist of a combination of seasonal plus
irregular components (Seas), autoregressive (AR-Seas) component and common stochastic
variance (AR-Seas-CSV) component respectively. The loglikelihood (LL) value and
various diagnostic test statistics (with p-values) are also reported. The last row reports
the estimation time.

The graph of production growth shows further that seasonal variation is the most prominent
feature. We therefore start to estimate an unobserved components time series model with seasonal
and irregular components, that is model (9) with µt = βt = 0 (and q2 = q3 = 0). The estimation
results of this simple model are reported in the column of table 5 indicated by Seas. The large
Box-Ljung Q portmanteau test (see footnote 3) for serial correlation indicates that the model is not appropriate
for the growth series. A closer investigation shows that the serial correlation coefficient of the
prediction residuals at the first lag is relatively large. Therefore an autoregressive component is
included in the model that is given by
$$y_t = \rho_t + \gamma_t + \varepsilon_{1t}, \qquad t = 1, \ldots, n,$$
where γt and ε1t are specified in (9) and ρt is the autoregressive component given by
$$\rho_{t+1} = \phi \rho_t + q_5 \varepsilon_{5t}, \qquad t = 1, \ldots, n,$$
with $\rho_1 \sim N\{0, q_5^2/(1-\phi^2)\}$. The estimation results of this model for the production growth series
are reported in the column of table 5 indicated by AR-Seas. The increase of the loglikelihood value
and the decrease of the Box-Ljung Q test-statistic for serial correlation are substantial compared to
the results of the seasonal plus irregular model. The estimate of the autoregressive ϕ is close to 0.9
and provides the expected persistency in U.S. growth whereas the time-variation of the seasonal
component remains unchanged. The normality test statistic however raises some concern about
the specification of the current model for U.S. growth in production. Some graphical diagnostics
are presented in figure 6 and they are reasonable although the prediction residuals are subject to
some unusually large values during the oil crises in the middle of the 1970s and the early 1980s.
They appear even more clearly in the final plot of figure 6, that is the cumulative sum of squared
prediction residuals.
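The stationary initial condition of the autoregressive component can be illustrated with a short simulation sketch. It treats the disturbances as standard normal with a fixed scale q5, ignoring the common variance, and the function names are assumptions made for the illustration.

```python
import math
import random

def stationary_variance(phi, q5):
    """Unconditional variance q5^2 / (1 - phi^2) of a stationary AR(1)."""
    if abs(phi) >= 1.0:
        raise ValueError("the AR(1) process is stationary only for |phi| < 1")
    return q5 ** 2 / (1.0 - phi ** 2)

def simulate_ar1(n, phi, q5, seed=0):
    """Simulate rho_1, ..., rho_n with rho_1 drawn from the stationary
    distribution and rho_{t+1} = phi * rho_t + q5 * eps_t."""
    rng = random.Random(seed)
    rho = rng.gauss(0.0, math.sqrt(stationary_variance(phi, q5)))
    path = []
    for _ in range(n):
        path.append(rho)
        rho = phi * rho + q5 * rng.gauss(0.0, 1.0)
    return path
```

For example, `stationary_variance(0.8756, 0.5434)` evaluates the initial variance implied by the AR-Seas estimates in table 5. The expression grows without bound as ϕ approaches one, which is why nonstationary components need the diffuse treatment of section 3.5 instead.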
It was observed earlier that the overall variation of growth is locally varying over time and
therefore we introduce the common stochastic variance. The estimation of this model does not
give numerical problems and the loglikelihood value increases by a value of almost 16 at the expense
of two additional unknown parameters and more computing time. The loglikelihood ratio clearly
indicates that this more elaborate model is preferable. The reported estimated parameter values
show that U.S. growth in the last, say, forty years is very persistent (the estimated autoregressive
coefficient for ρt is close to 0.95) and the persistency of the uncertainty in growth, as indicated
by the autoregressive coefficient φ of the CSV, is also substantial given the estimated value 0.94.
It is surprising to observe that the estimated confidence intervals for the seasonal component and
the irregular component remain close to their estimated values for the model without CSV. This
may indicate that the time-varying heteroskedasticity is most applicable to the autoregressive
component.
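The loglikelihood ratio comparison can be worked out from the values in table 5. The sketch below uses the fact that the upper tail probability of a χ² distribution with two degrees of freedom is exp(−x/2). The χ² reference is used here as a rough guide only: under the restriction of no stochastic variance the parameter ση lies on the boundary of its parameter space, so the standard distribution is an assumption rather than an exact result.

```python
import math

def lr_test_two_restrictions(loglik_restricted, loglik_unrestricted):
    """Likelihood ratio statistic and chi-square(2) upper tail probability."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    p_value = math.exp(-lr / 2.0)
    return lr, p_value

# Loglikelihood values of AR-Seas and AR-Seas-CSV as reported in table 5.
lr, p_value = lr_test_two_restrictions(-715.56, -699.42)
```

The statistic is twice the loglikelihood gain of 16.14, which is far above the 5% critical value of about 5.99 for two degrees of freedom.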
It is shown that some improvements in the modelling of macroeconomic time series can be
obtained by allowing the common variance to change stochastically over time. This is confirmed
by the graphical diagnostic plots for the prediction residuals of the final estimated model presented
in figure 7. The residuals are well-behaved and by considering a richer dynamic specification for
the mean equation we may well reduce the remaining serial correlation in the residuals further.

Footnote 3: The Q-statistic has 21 degrees of freedom for the smallest model, 17 degrees of freedom for the largest.

Figure 6: Prediction residuals of the estimated model “AR-Seas” with AR(1), seasonal and
irregular components: (i) plot of standardised residuals; (ii) autocorrelogram of residuals (continuous
lines) and squared residuals; (iii) scaled cumulative sum of squared residuals.
Figure 7: Residuals of the model with AR(1), seasonal, irregular and SV components: (i) plot of
standardised residuals; (ii) autocorrelogram of residuals (continuous lines) and squared residuals;
(iii) scaled cumulative sum of squared residuals.
Figure 8: Estimation results for the production growth model: (i) estimated autoregression, (ii)
estimated seasonal and (iii) estimated standard deviation of the irregular component (scaled by
the estimated time-varying common stochastic standard deviation)
Finally, the plots of the estimated components of interest in figure 8 are informative and they
display the main characteristics of the dynamics in U.S. growth. Further analyses may consider an
autoregressive component ρt that is based on a second order autoregressive process with complex
roots or on a cyclical component based on time-varying trigonometric terms. The introduction of
(lagged) explanatory variables may lead to an empirical model that is of more interest from an
economic point of view.
Panels: AR-Seas, 1974:4; AR-Seas, 1993:6; AR-Seas-CSV, 1974:4; AR-Seas-CSV, 1993:6.
Figure 9: Weights for the AR-Seas and AR-Seas-CSV models, at the high-variability period 1974:4
and the low uncertainty period 1993:6
The observation weights of the estimated signal γt +ρt are presented for the time points 1974:4
and 1993:6 in figure 9. The top panels display the weights for the AR-Seas model with a fixed
variance and are therefore equivalent. The bottom panels present the weight patterns implied by
the model with a common stochastic variance and they are different from each other but also from
the associated weights of the top panels. The main difference is that the patterns of the bottom
panels are not symmetric and therefore different weights are assigned to observations with an
equal distance to the time index of the signal. This is caused by the (common) heteroskedasticity
modelled as a stochastic function of time.
The same forecasting exercise as for US inflation is carried out for US industrial growth, where
forecasts are made based on the data until 1980:1 up to 2000:12. For each subsample the model with
a fixed variance and the model with a common stochastic variance are estimated and forecasts are
computed for horizons h = 1, 3, 6, 12. The results are reported in table 6. The mean of the forecast
errors reduces considerably for the model with a common stochastic variance, leading to less bias
in the forecasts. Note that the bias is still higher than in the case of US inflation. It is
surprising that the variation of the forecast errors is almost the same for both models. This is
probably due to the fact that the main feature of US growth is described only by the seasonal
component. The autoregressive component and the heteroskedastic nature of the time series seem
less important.

Table 6: Forecasting results for US industrial growth

AR-Seas            1       3       6      12
MFE            0.103   0.136   0.191   0.161
MAPE           0.697   0.707   0.696   0.677
RMSE           0.921   0.925   0.886   0.859

AR-Seas-CSV        1       3       6      12
MFE            0.073   0.088   0.134   0.115
MAPE           0.704   0.705   0.693   0.688
RMSE           0.929   0.929   0.892   0.880

The table reports the mean forecast error (MFE), the mean absolute prediction error (MAPE) and
the root mean squared forecast error (RMSE), at horizons of 1, 3, 6 and 12 months ahead.
Forecasting starts at 1980:1.
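The three accuracy measures of table 6 can be computed from realised values and forecasts as in the following minimal sketch. The function name `forecast_measures`, the sign convention (error = actual minus forecast) and the toy numbers are assumptions made for illustration. MAPE denotes the mean absolute prediction error, as in the table notes, and not a percentage error.

```python
import numpy as np

def forecast_measures(actual, forecast):
    """Return MFE, MAPE (mean absolute prediction error) and RMSE."""
    # Assumed convention: forecast error = realised value minus forecast.
    errors = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return {
        "MFE": errors.mean(),                  # bias of the forecasts
        "MAPE": np.abs(errors).mean(),         # mean absolute prediction error
        "RMSE": np.sqrt((errors**2).mean()),   # root mean squared forecast error
    }

# Toy example with three forecast errors: 1, -1 and 2.
measures = forecast_measures([1.0, 0.0, 2.0], [0.0, 1.0, 0.0])
print(measures)
```

A lower MFE with a similar RMSE, the pattern seen for US industrial growth, indicates less bias without a reduction in the spread of the errors.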
6 Conclusions
In this paper, we have considered the possibility of simultaneously modelling both the mean
equation and the common stochastic variance equation within a general class of linear time series
models. The state space framework is flexible and useful in, for example, dissecting time series
into unobserved components so that level, slope and seasonal effects in a time series can be
analysed separately. With the inclusion of a stochastic volatility component into the model,
further flexibility is attained that can be useful for the modelling of macroeconomic time series.
After an introduction of the general model together with a discussion of some special cases in
section 2, a short theoretical exposition is given in section 3. A Monte Carlo experiment is carried
out to show the small sample properties of the estimates produced by the importance sampling
estimation method. These simulation based maximum likelihood estimates are obtained subject
to simulation error which can be made arbitrarily small by enlarging the number of simulations.
The methodology is illustrated by modelling a monthly time series of U.S. inflation for the period
1957–2001 (537 observations). It is found that a simple local level model with a seasonal component
and a common stochastic variance is appropriate and that it can incorporate some important
dynamic features of inflation, a series that is difficult to model since it has been subject to some
shocks in the last forty years. The autoregressive common log-variance has been able to capture
(i) the volatile period of the 1970s and early 1980s associated with oil crises and (ii) the less
volatile period after the end of the 1980s associated with the Volcker-Greenspan regime at the U.S.
Federal Reserve. The role that the common stochastic variance plays in the analysis is illustrated
by presenting the observation weights associated with the estimated signal of the time series and
by presenting forecasting results. We show that US inflation forecasts that are based on a simple
descriptive model can be improved by incorporating a common stochastic variance component.
A similar analysis is carried out for the monthly percentage growth of US industrial production
in the years 1960 to 2001. The in-sample fit of the model with a common stochastic variance
component improves significantly although the forecast results are less affected by the extension.
This may be due to the highly seasonal nature of the time series.
The framework of a common stochastic variance can be exploited in other directions. In
this paper we have focused on stochastic heteroskedasticity as a function of time although it can
also be related to other characteristics of the observation yt. For example, the actual value of the
observation or the value of another variable associated with yt may be a source of heteroskedasticity.
Let us assume that heteroskedasticity is a stochastic function of the variable xt associated
with yt. The treatment of a common stochastic variance as a function of xt is similar to the
method described in section 3. The Kalman filter is used to obtain innovations based on the state
space model (3) conditional on the observations and on a given value of σ∗. It is noticed that
the innovations are independently distributed. An intermediate step is to re-order the innovations
vt and their scaled variances Ft subject to the magnitude of, say, xt. Denote these re-ordered
innovations by $v^o_j$ and their variances by $F^o_j$. It is still valid to treat the re-ordered
innovations as being generated by the model
$$v^o_j \sim N(0, \sigma^{o2}_j), \qquad j = 1, \ldots, n.$$
We then can use model (5) for the re-ordered quantities (so using index j instead of time index t)
and obtain a Monte Carlo estimate of the loglikelihood function since the equality
$p(y|\psi) = p(v^o|\psi)$, where $v^o$ is the stack of the re-ordered innovations, still applies.
Further research is required to
investigate the effectiveness of this development but this short exposition reveals the flexibility of
the proposed methodology for modelling stochastic heteroskedasticity.
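The re-ordering step can be sketched as follows. This is only an illustration under stated assumptions: the innovations, their scaled variances and the variable xt are simulated here rather than produced by a Kalman filter, and the names `reorder_innovations` and `gaussian_loglik` are hypothetical. The check at the end uses a fixed common variance only to show that a Gaussian loglikelihood built from independent innovations does not depend on their ordering.

```python
import numpy as np

def reorder_innovations(v, F, x):
    """Sort innovations v and scaled variances F by the magnitude of x."""
    order = np.argsort(x, kind="stable")   # index j runs over increasing x
    return np.asarray(v)[order], np.asarray(F)[order], order

def gaussian_loglik(v, F, sigma2=1.0):
    """Loglikelihood of independent innovations v_j ~ N(0, sigma2 * F_j)."""
    var = sigma2 * np.asarray(F)
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + np.asarray(v) ** 2 / var)

# Simulated stand-ins for Kalman filter output (assumption, not real data).
rng = np.random.default_rng(0)
n = 200
F = rng.uniform(0.5, 2.0, size=n)     # scaled innovation variances
v = rng.normal(scale=np.sqrt(F))      # independent innovations
x = rng.normal(size=n)                # variable driving the heteroskedasticity

v_o, F_o, order = reorder_innovations(v, F, x)

# The loglikelihood is a sum over independent terms, so it is unchanged.
print(np.isclose(gaussian_loglik(v, F), gaussian_loglik(v_o, F_o)))
```

With the innovations ordered by xt, a stochastic variance process indexed by j rather than by time can then be fitted to the re-ordered series.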
References
Anderson, B. D. O. and J. B. Moore (1979). Optimal Filtering. Englewood Cliffs: Prentice-Hall.
Bos, C. S., R. J. Mahieu, and H. K. van Dijk (2000). Daily exchange rate behaviour and hedging
28
of currency risk. J. Applied Econometrics 15 (6), 671–696.
Bowman, K. O. and L. R. Shenton (1975). Omnibus test contours for departures from normality
based on√b1 and b2. Biometrika 62, 243–250.
Brockwell, P. J. and R. A. Davis (1987). Time Series: Theory and Methods. New York: Springer-
Verlag.
Danielson, J. (1994). Stochastic volatility in asset prices: Estimation with simulated maximum
likelihood. Journal of Econometrics 61, 375–400.
De Jong, P. (1989, December). Smoothing and interpolation with the state-space model. J.
American Statistical Association 84 (408), 1082–1088.
Doornik, J. A. (1999). Object-Oriented Matrix Programming using Ox (3rd ed.). London: Tim-
berlake Consultants Ltd. See http://www.nuff.ox.ac.uk/Users/Doornik.
Doornik, J. A. and H. Hansen (1994). An omnibus test for univariate and multivariate normality.
Technical report, Nuffield College, Oxford, UK.
Doucet, A., J. F. G. deFreitas, and N. J. Gordon (Eds.) (2000). Sequential Monte Carlo methods
in practice. New York: Springer-Verlag.
Durbin, J. and S. J. Koopman (1997). Monte Carlo maximum likelihood estimation of non-
Gaussian state space model. Biometrika 84, 669–84.
Durbin, J. and S. J. Koopman (2001). Time Series Analysis by State Space Methods. Oxford:
Oxford University Press.
Durbin, J. and S. J. Koopman (2002). A simple and efficient simulation smoother for state space
time series analysis. Biometrika 89, 603–616.
Durham, G. B. and A. Gallant (2002). Numerical techniques for maximum likelihood estimation
of continuous-time diffusion. J. Business and Economic Statist. 20 (3), 297–316.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance
of United Kingdom inflations. Econometrica 50, 987–1008.
Engle, R. F. and A. D. Smith (1999). Stochastic permanent breaks. Review of Economics and
Statistics 81 (4), 553–574.
Gallant, A. R., D. Hsieh, and G. Tauchen (1997). Estimation of stochastic volatility models
with diagnostics. Journal of Econometrics 81 (1), 159–192.
Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration.
Econometrica 57 (6).
29
Ghysels, E., A. C. Harvey, and E. Renault (1996). Stochastic volatility. In G. Maddala and
C. Rao (Eds.), Handbook of Statistics, Volume 14, Statistical Methods in Finance, pp. 119–
191. North-Holland, Amsterdam.
Hamilton, J. (1994). Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cam-
bridge: Cambridge University Press.
Harvey, A. C. and S. J. Koopman (2000). Signal extraction and the formulation of unobserved
components models. Econometrics Journal 3, 84–107.
Harvey, A. C., E. Ruiz, and N. Shephard (1994). Multivariate stochastic variance models. Rev.
Economic Studies 61, 247–264.
Jacquier, E., N. G. Polson, and P. E. Rossi (1994). Bayesian analysis of stochastic volatility
models. J. Business and Economic Statist. 12, 371–417.
Kim, C. J. and C. R. Nelson (1999). State Space Models with Regime Switching. Cambridge,
Massachusetts: MIT Press.
Kim, S., N. Shephard, and S. Chib (1998). Stochastic volatility: Likelihood inference and com-
parison with ARCH models. Rev. Economic Studies 64, 361–393.
Kloek, T. and H. K. Van Dijk (1978). Bayesian estimates of equation system parameters: An
application of integration by Monte Carlo. Econometrica 46, 1–20.
Koopman, S. J. and N. Shephard (2002). Testing the assumptions behind the use of importance
sampling. Discussion paper, Nuffield College, Oxford.
Koopman, S. J., N. Shephard, and J. A. Doornik (1999). Statistical algorithms for models in
state space using SsfPack 2.2. Econometrics Journal 2, 107–160.
Ljung, G. M. and G. E. P. Box (1978). On a measure of lack of fit in time series models.
Biometrika 66, 67–72.
Nabeya, S. and K. Tanaka (1988). Asymptotic theory of a test for the constancy of regression
coefficients against the random walk alternative. Annals of Statistics 16, 218–235.
Ord, J. K., A. B. Koehler, and R. D. Snyder (1997). Estimation and prediction for a class of
dynamic nonlinear statistical models. J. American Statistical Association 92, 1621–1629.
Sandmann, G. and S. J. Koopman (1998). Estimation of stochastic volatility models via Monte
Carlo maximum likelihood. Journal of Econometrics 87, 271–301.
Schweppe, F. (1965). Evaluation of likelihood functions for Gaussian signals. IEEE Transactions
on Information Theory 11, 61–70.
Shephard, N. (1994). Partial non-Gaussian state space. Biometrika 81 (1), 115–131.
30
Shephard, N. (1996). Statistical aspects of ARCH and stochastic volatility. In D. Cox, D. Hink-
ley, and O. Barndorff-Nielsen (Eds.), Time Series Models in Econometrics, Finance and
Other Fields, Number 65 in Monographs on Statistics and Applied Probability, pp. 1–67.
Chapman and Hall, London.
Shephard, N. and M. K. Pitt (1997). Likelihood analysis of non-Gaussian measurement time
series. Biometrika 84, 653–667.
Shumway, R. H. and D. S. Stoffer (2000). Time Series Analysis and its Applications. Springer
Texts in Statistics. New York: Springer.
Taylor, S. J. (1994). Modeling stochastic volatility: A review and comparative study. Mathe-
matical Finance 4 (2), 183–204.
Table 7: Small sample results for the Local Level model

Generating no SV
                                 n = 100                       n = 500                       n = 1000
Estimating                No SV           SV            No SV           SV            No SV           SV
                  p        p  s.d.        p  s.d.        p  s.d.        p  s.d.        p  s.d.        p  s.d.
q1              1.0     1.00  0.10     0.98  0.11     1.00  0.04     0.99  0.04     1.00  0.03     0.99  0.03
q2              0.5     0.48  0.11     0.48  0.11     0.50  0.05     0.49  0.05     0.50  0.04     0.50  0.03
ση                                     0.15  0.23                    0.11  0.13                    0.08  0.11
φ                                      0.38  0.31                    0.34  0.30                    0.35  0.28
                        Stat  fRej     Stat  fRej     Stat  fRej     Stat  fRej     Stat  fRej     Stat  fRej
Normality-DH            1.96  0.04     1.31  0.02     1.99  0.05     1.29  0.01     2.01  0.05     1.52  0.03
H(n/3)                  1.12  0.33     1.05  0.29     1.01  0.14     1.02  0.12     1.00  0.04     1.00  0.02
LL                   -164.12        -164.20        -830.95        -831.98       -1663.95       -1664.71

Generating SV
                                 n = 100                       n = 500                       n = 1000
Estimating                No SV           SV            No SV           SV            No SV           SV
                  p        p  s.d.        p  s.d.        p  s.d.        p  s.d.        p  s.d.        p  s.d.
q1              1.0     1.05  0.15     0.98  0.14     1.05  0.07     1.00  0.06     1.05  0.05     1.00  0.05
q2              0.5     0.51  0.13     0.49  0.13     0.52  0.06     0.50  0.06     0.52  0.04     0.50  0.04
ση              0.2                    0.23  0.22                    0.22  0.10                    0.20  0.07
φ               0.9                    0.68  0.33                    0.85  0.13                    0.88  0.08
                        Stat  fRej     Stat  fRej     Stat  fRej     Stat  fRej     Stat  fRej     Stat  fRej
Normality-DH            3.86  0.19     1.49  0.02     9.42  0.51     2.42  0.06    17.92  0.80     3.75  0.17
H(n/3)                  1.20  0.37     1.02  0.27     1.08  0.28     1.00  0.03     1.03  0.20     1.00  0.00
LL                   -168.64        -165.79        -856.56        -850.34       -1713.49       -1699.89
See table 2 for a description of the entries in the table.
Table 8: Small sample results for the Local Level plus Seasonal model

Generating no SV
                                 n = 100                       n = 500                       n = 1000
Estimating                No SV           SV            No SV           SV            No SV           SV
                  p        p  s.d.        p  s.d.        p  s.d.        p  s.d.        p  s.d.        p  s.d.
q1              1.0     0.99  0.15     0.98  0.18     0.99  0.06     0.99  0.06     1.00  0.04     0.99  0.04
q2              0.5     0.48  0.12     0.48  0.13     0.50  0.05     0.49  0.05     0.50  0.03     0.50  0.03
q4              0.2     0.20  0.05     0.19  0.06     0.20  0.02     0.20  0.02     0.20  0.01     0.20  0.01
ση                                     0.18  0.24                    0.11  0.13                    0.09  0.11
φ                                      0.39  0.34                    0.33  0.31                    0.29  0.28
                        Stat  fRej     Stat  fRej     Stat  fRej     Stat  fRej     Stat  fRej     Stat  fRej
Normality-DH            2.04  0.05     1.24  0.01     1.91  0.05     1.35  0.02     1.89  0.05     1.46  0.03
H(n/3)                  1.06  0.14     1.02  0.11     1.02  0.01     0.99  0.00     1.01  0.00     1.00  0.00
LL                   -182.91        -182.73        -939.40        -938.98       -1884.43       -1882.91

Generating SV
                                 n = 100                       n = 500                       n = 1000
Estimating                No SV           SV            No SV           SV            No SV           SV
                  p        p  s.d.        p  s.d.        p  s.d.        p  s.d.        p  s.d.        p  s.d.
q1              1.0     1.03  0.21     1.12  0.52     1.04  0.08     1.01  0.08     1.05  0.06     1.00  0.06
q2              0.5     0.51  0.13     0.56  0.31     0.53  0.06     0.51  0.06     0.53  0.04     0.51  0.04
q4              0.2     0.20  0.06     0.22  0.12     0.21  0.02     0.20  0.03     0.21  0.02     0.20  0.02
ση              0.2                    0.24  0.23                    0.19  0.09                    0.17  0.06
φ               0.9                    0.70  0.32                    0.87  0.13                    0.90  0.07
                        Stat  fRej     Stat  fRej     Stat  fRej     Stat  fRej     Stat  fRej     Stat  fRej
Normality-DH            3.18  0.15     1.33  0.02     7.87  0.39     1.65  0.02    13.05  0.63     2.38  0.06
H(n/3)                  1.21  0.28     0.96  0.07     1.05  0.10     0.98  0.00     1.03  0.03     0.99  0.00
LL                   -186.52        -184.97        -962.74        -959.00       -1937.07       -1922.74
See table 2 for a description of the entries in the table.