Time Series Analysis
Andrea Beccarini
Center for Quantitative Economics
Winter 2013/2014
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 1 / 143
Introduction: Objectives

Time series are ubiquitous in economics, and very important in macroeconomics and financial economics
GDP, inflation rates, unemployment, interest rates, stock prices
You will learn . . .
the formal mathematical treatment of time series and stochastic processes
what the most important standard models in economics are
how to fit models to real-world time series
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 2 / 143
Introduction: Prerequisites
Descriptive Statistics
Probability Theory
Statistical Inference
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 3 / 143
Introduction: Class and material
Class
Class teacher: Sarah Meyer
Time: Tu., 12:00-14:00
Location: CAWM 3
Start: 22 October 2013
Material
Course page on Blackboard
Slides and class material are (or will be) downloadable
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 4 / 143
Introduction: Literature

Neusser, Klaus (2011), Zeitreihenanalyse in den Wirtschaftswissenschaften, 3. Aufl., Teubner, Wiesbaden. −→ available online in the RUB-Netz
Hamilton, James D. (1994), Time Series Analysis, Princeton University Press, Princeton.
Pfaff, Bernhard (2006), Analysis of Integrated and Cointegrated Time Series with R, Springer, New York.
Schlittgen, Rainer and Streitberg, Bernd (1997), Zeitreihenanalyse, 7. Aufl., Oldenbourg, München.
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 5 / 143
Basics: Definition

Definition: Time series
A sequence of observations ordered by time is called a time series
Time series can be univariate or multivariate
Time can be discrete or continuous
The states can be discrete or continuous
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 6 / 143
Basics: Definition

Typical notations
x_1, x_2, . . . , x_T
or x(1), x(2), . . . , x(T)
or x_t, t = 1, . . . , T
or (x_t)_{t≥0}

This course is about . . .
univariate time series
in discrete time
with continuous states
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 7 / 143
Basics: Examples

Quarterly GDP Germany, 1991 I to 2012 II
[Figure: time series plot of quarterly German GDP (in current billion Euro, roughly 350 to 650), 1991 to 2012]
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 8 / 143
Basics: Examples

DAX index and log(DAX), 31.12.1964 to 6.4.2009
[Figure: two panels, the DAX index and the logarithm of the DAX, 1965 to 2009]
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 9 / 143
Basics: Definition

Definition: Stochastic process
A sequence (X_t)_{t∈T} of random variables, all defined on the same probability space (Ω, A, P), is called a stochastic process with discrete time parameter (usually T = N or T = Z)
Short version: A stochastic process is a sequence of random variables
A stochastic process depends on both chance and time
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 10 / 143
Basics: Definition

Distinguish four cases: both time and chance can be fixed or variable

             t fixed                        t variable
ω fixed      X_t(ω) is a real number        X_t(ω) is a sequence of real numbers (path, realization, trajectory)
ω variable   X_t(ω) is a random variable    X_t(ω) is a stochastic process

process.R
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 11 / 143
Basics: Examples

Example 1: White noise
ε_t ∼ NID(0, σ²)

Example 2: Random walk
X_t = X_{t−1} + ε_t and X_0 = 0
ε_t ∼ NID(0, σ²)

Example 3: A random constant
X_t = Z
Z ∼ N(0, σ²)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 12 / 143
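The three example processes are easy to simulate. A minimal sketch in Python (illustrative only; the course material itself uses R scripts such as process.R), using only the standard library:

```python
# Sketch: simulating the three example processes (white noise, random walk,
# random constant) with Python's standard library.
import random

random.seed(1)
T, sigma = 200, 1.0

# Example 1: white noise -- independent N(0, sigma^2) draws
eps = [random.gauss(0.0, sigma) for _ in range(T)]

# Example 2: random walk -- X_t = X_{t-1} + eps_t, started at X_0 = 0
x = [0.0]
for e in eps:
    x.append(x[-1] + e)

# Example 3: a random constant -- the whole path equals one draw Z
z = random.gauss(0.0, sigma)
const_path = [z] * T

print(len(x))
```

Note how chance enters differently: the random constant draws ω once and the path is flat, while white noise draws a fresh value every period.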
Basics: Moment functions

Definition: Moment functions
The following functions of time are called moment functions:
μ(t) = E(X_t)          (expectation function)
σ²(t) = Var(X_t)       (variance function)
γ(s, t) = Cov(X_s, X_t) (covariance function)

Correlation function (autocorrelation function)
ρ(s, t) = γ(s, t) / (√σ²(s) · √σ²(t))

moments.R [1]
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 13 / 143
Basics: Estimation of moment functions

Usually, the moment functions are unknown and have to be estimated
Problem: Only a single path (realization) can be observed

X_1^(1)   X_1^(2)   . . .   X_1^(n)
X_2^(1)   X_2^(2)   . . .   X_2^(n)
  ...       ...     . . .     ...
X_T^(1)   X_T^(2)   . . .   X_T^(n)

Can we still estimate the expectation function μ(t) and the autocovariance function γ(s, t)? Under which conditions?
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 14 / 143
Basics: Estimation of moment functions

X_1^(1)   X_1^(2)   . . .   X_1^(n)
X_2^(1)   X_2^(2)   . . .   X_2^(n)
  ...       ...     . . .     ...
X_T^(1)   X_T^(2)   . . .   X_T^(n)

Usually, the expectation function μ(t) should be estimated by averaging over realizations,
μ̂(t) = (1/n) Σ_{i=1}^{n} X_t^(i)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 15 / 143
Basics: Estimation of moment functions

X_1^(1)   X_1^(2)   . . .   X_1^(n)
X_2^(1)   X_2^(2)   . . .   X_2^(n)
  ...       ...     . . .     ...
X_T^(1)   X_T^(2)   . . .   X_T^(n)

Under certain conditions, μ(t) can be estimated by averaging over time,
μ̂ = (1/T) Σ_{t=1}^{T} X_t^(1)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 15 / 143
Basics: Estimation of moment functions

X_1^(1)   X_1^(2)   . . .   X_1^(n)
X_2^(1)   X_2^(2)   . . .   X_2^(n)
  ...       ...     . . .     ...
X_T^(1)   X_T^(2)   . . .   X_T^(n)

Usually, the autocovariance γ(t, t + h) should be estimated by averaging over realizations,
γ̂(t, t + h) = (1/n) Σ_{i=1}^{n} (X_t^(i) − μ̂(t))(X_{t+h}^(i) − μ̂(t + h))
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 16 / 143
Basics: Estimation of moment functions

X_1^(1)   X_1^(2)   . . .   X_1^(n)
X_2^(1)   X_2^(2)   . . .   X_2^(n)
  ...       ...     . . .     ...
X_T^(1)   X_T^(2)   . . .   X_T^(n)

Under certain conditions, γ(t, t + h) can be estimated by averaging over time,
γ̂(t, t + h) = (1/T) Σ_{t=1}^{T−h} (X_t^(1) − μ̂)(X_{t+h}^(1) − μ̂)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 16 / 143
Basics: Definition

Moment functions cannot be estimated without additional assumptions since only one path is observed
There are restrictions which allow estimation of the moment functions
Restriction of the time heterogeneity: The distribution of (X_t(ω))_{t∈T} must not be completely different for each t ∈ T
Restriction of the memory: If the values of the process are coupled too closely over time, the individual observations do not supply any (or only insufficient) information about the distribution
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 17 / 143
Basics: Restriction of time heterogeneity: Stationarity

Definition: Strong stationarity
Let (X_t)_{t∈T} be a stochastic process, and let t_1, . . . , t_n ∈ T be an arbitrary number n ∈ N of arbitrary time points.
(X_t)_{t∈T} is called strongly stationary if for arbitrary h ∈ Z
P(X_{t_1} ≤ x_1, . . . , X_{t_n} ≤ x_n) = P(X_{t_1+h} ≤ x_1, . . . , X_{t_n+h} ≤ x_n)
Implication: all univariate marginal distributions are identical
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 18 / 143
Basics: Restriction of time heterogeneity: Stationarity

Definition: Weak stationarity
(X_t)_{t∈T} is called weakly stationary if
1. the expectation exists and is constant: E(X_t) = μ < ∞ for all t ∈ T
2. the variance exists and is constant: Var(X_t) = σ² < ∞ for all t ∈ T
3. for all t, s, r ∈ Z (in the admissible range): γ(t, s) = γ(t + r, s + r)

Simplified notation for covariance and correlation functions
γ(h) = γ(t, t + h)
ρ(h) = ρ(t, t + h)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 19 / 143
Basics: Restriction of time heterogeneity: Stationarity

Strong stationarity implies weak stationarity (but only if the first two moments exist)
A stochastic process is called Gaussian if the joint distribution of X_{t_1}, . . . , X_{t_n} is multivariate normal
For Gaussian processes, weak and strong stationarity coincide
Intuition: An observed time series can be regarded as a realization of a stationary process if a gliding window of "appropriate width" always displays "qualitatively the same" picture

stationary.R
Examples [2]
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 20 / 143
Basics: Restriction of memory: Ergodicity

Definition: Ergodicity (I)
Let (X_t)_{t∈T} be a weakly stationary stochastic process with expectation μ and autocovariance γ(h); define
μ̂_T = (1/T) Σ_{t=1}^{T} X_t
(X_t)_{t∈T} is called (expectation) ergodic if
lim_{T→∞} E[(μ̂_T − μ)²] = 0
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 21 / 143
Basics: Restriction of memory: Ergodicity

Definition: Ergodicity (II)
Let (X_t)_{t∈T} be a weakly stationary stochastic process with expectation μ and autocovariance γ(h); define
γ̂(h) = (1/T) Σ_{t=1}^{T−h} (X_t − μ̂)(X_{t+h} − μ̂)
(X_t)_{t∈T} is called (covariance) ergodic if for all h ∈ Z
lim_{T→∞} E[(γ̂(h) − γ(h))²] = 0
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 22 / 143
Basics: Restriction of memory: Ergodicity

Ergodicity is consistency (in quadratic mean) of the estimators μ̂ of μ and γ̂(h) of γ(h) for dependent observations
The process (X_t)_{t∈T} is expectation ergodic if (γ(h))_{h∈Z} is absolutely summable, i.e.
Σ_{h=−∞}^{∞} |γ(h)| < ∞
The dependence between far-away observations must be sufficiently small
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 23 / 143
Basics: Restriction of memory: Ergodicity

Ergodicity condition (for the autocovariance): A stationary Gaussian process (X_t)_{t∈T} with absolutely summable autocovariance function γ(h) is (autocovariance) ergodic
Under ergodicity, the law of large numbers holds even if the observations are dependent
If the dependence γ(h) does not diminish fast enough, the estimators are no longer consistent

Examples [3]
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 24 / 143
Basics: Estimation of moment functions

Summary of estimators (electricity.R)
μ̂ = X̄_T = (1/T) Σ_{t=1}^{T} X_t
γ̂(h) = (1/T) Σ_{t=1}^{T−h} (X_t − μ̂)(X_{t+h} − μ̂)
ρ̂(h) = γ̂(h) / γ̂(0)
Sometimes, γ̂(h) is defined with factor 1/(T − h)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 25 / 143
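The three estimators above are one-liners. A Python sketch (illustrative only; the course's electricity.R presumably computes similar quantities in R):

```python
# Sketch: the summary estimators mu-hat, gamma-hat(h), rho-hat(h) in plain Python.
def mean_est(x):
    # mu-hat = (1/T) * sum of X_t
    return sum(x) / len(x)

def autocov_est(x, h):
    # gamma-hat(h) = (1/T) * sum_{t=1}^{T-h} (X_t - mu-hat)(X_{t+h} - mu-hat)
    T, m = len(x), mean_est(x)
    return sum((x[t] - m) * (x[t + h] - m) for t in range(T - h)) / T

def autocorr_est(x, h):
    # rho-hat(h) = gamma-hat(h) / gamma-hat(0)
    return autocov_est(x, h) / autocov_est(x, 0)

x = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0]
print(mean_est(x), autocorr_est(x, 0))
```

Dividing by T (not T − h) matches the convention on the slide; the alternative factor 1/(T − h) would only change the small-sample behavior.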
Basics: Estimation of moment functions

A closer look at the expectation estimator μ̂
The estimator μ̂ is unbiased, i.e. E(μ̂) = μ [4]
The variance of μ̂ is [5]
Var(μ̂) = γ(0)/T + (2/T) Σ_{h=1}^{T−1} (1 − h/T) γ(h)
Under ergodicity, for T → ∞
T · Var(μ̂) → γ(0) + 2 Σ_{h=1}^{∞} γ(h) = Σ_{h=−∞}^{∞} γ(h)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 26 / 143
Basics: Estimation of moment functions

For Gaussian processes, μ̂ is normally distributed
μ̂ ∼ N(μ, Var(μ̂))
and asymptotically
√T (μ̂ − μ) → Z ∼ N(0, γ(0) + 2 Σ_{h=1}^{∞} γ(h))
For non-Gaussian processes, μ̂ is (often) asymptotically normal with the same limiting distribution,
√T (μ̂ − μ) → Z ∼ N(0, γ(0) + 2 Σ_{h=1}^{∞} γ(h))
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 27 / 143
Basics: Estimation of moment functions

A closer look at the autocovariance estimators γ̂(h)
For Gaussian processes with absolutely summable covariance function,
(√T (γ̂(0) − γ(0)), . . . , √T (γ̂(K) − γ(K)))′
is multivariate normal with expectation vector (0, . . . , 0)′ and
T · Cov(γ̂(h_1), γ̂(h_2)) = Σ_{r=−∞}^{∞} (γ(r) γ(r + h_1 + h_2) + γ(r − h_2) γ(r + h_1))
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 28 / 143
Basics: Estimation of moment functions

A closer look at the autocorrelation estimators ρ̂(h)
For Gaussian processes with absolutely summable covariance function, the random vector
(√T (ρ̂(0) − ρ(0)), . . . , √T (ρ̂(K) − ρ(K)))′
is multivariate normal with expectation vector (0, . . . , 0)′ and a complicated covariance matrix
Be careful: For small to medium sample sizes the autocovariance and autocorrelation estimators are biased!

autocorr.R
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 29 / 143
Basics: Estimation of moment functions

An important special case for autocorrelation estimators:
Let (ε_t) be a white-noise process with Var(ε_t) = σ² < ∞; then
E(ρ̂(h)) = −T⁻¹ + O(T⁻²)
Cov(ρ̂(h_1), ρ̂(h_2)) = T⁻¹ + O(T⁻²) for h_1 = h_2, and O(T⁻²) else
For white-noise processes and long time series, the empirical autocorrelations are approximately independent normal random variables with expectation −T⁻¹ and variance T⁻¹
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 30 / 143
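The small negative bias E(ρ̂(h)) ≈ −1/T can be checked by a quick Monte Carlo experiment. A Python sketch (sample sizes and replication counts are illustrative, not from the course):

```python
# Sketch: Monte Carlo check of E(rho-hat(1)) ≈ -1/T for Gaussian white noise.
import random

random.seed(42)

def autocorr(x, h):
    # rho-hat(h) with the 1/T convention used on the slides
    T = len(x)
    m = sum(x) / T
    g0 = sum((v - m) ** 2 for v in x) / T
    gh = sum((x[t] - m) * (x[t + h] - m) for t in range(T - h)) / T
    return gh / g0

T, reps = 50, 2000
mean_r1 = sum(autocorr([random.gauss(0, 1) for _ in range(T)], 1)
              for _ in range(reps)) / reps
print(round(mean_r1, 3), -1 / T)  # the Monte Carlo mean should be near -1/T
```

With T = 50 the theoretical approximation is −0.02, and the simulated mean lands close to it.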
Mathematical digression (I): Complex numbers

Some quadratic equations do not have real solutions, e.g.
x² + 1 = 0
Still it is possible (and sensible) to define solutions to such equations
The definition in common notation is
i = √−1
where i is the number which, when squared, equals −1
The number i is called imaginary (i.e. not real)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 31 / 143
Mathematical digression (I): Complex numbers

Other imaginary numbers follow from this definition, e.g.
√−16 = √16 · √−1 = 4i
√−5 = √5 · √−1 = √5 i
Further, it is possible to define numbers that contain both a real part and an imaginary part, e.g. 5 − 8i or a + bi
Such numbers are called complex and the set of complex numbers is denoted as C
The pair a + bi and a − bi is called conjugate complex
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 32 / 143
Mathematical digression (I): Complex numbers

Geometric interpretation:
[Figure: the complex plane, with real axis, imaginary axis, the point a+bi, real part a, imaginary part b, absolute value r, and angle θ]
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 33 / 143
Mathematical digression (I): Complex numbers

Polar coordinates and Cartesian coordinates
z = a + bi = r · (cos θ + i sin θ) = r e^{iθ}
a = r cos θ
b = r sin θ
r = √(a² + b²)
θ = arctan(|b/a|)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 34 / 143
Mathematical digression (I): Complex numbers

Rules of calculus:
Addition: (a + bi) + (c + di) = (a + c) + (b + d)i
Multiplication (Cartesian coordinates): (a + bi) · (c + di) = (ac − bd) + (ad + bc)i
Multiplication (polar coordinates): r_1 e^{iθ_1} · r_2 e^{iθ_2} = r_1 r_2 e^{i(θ_1 + θ_2)}
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 35 / 143
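These rules can be verified directly with a language's built-in complex type. A Python sketch (illustrative; the course material uses R, which also supports complex arithmetic):

```python
# Sketch: checking the addition and multiplication rules with Python's
# built-in complex numbers and the cmath module.
import cmath

z1 = 3 + 4j          # a + bi with a = 3, b = 4
z2 = 1 - 2j          # c + di with c = 1, d = -2

# Addition: real and imaginary parts add componentwise
assert z1 + z2 == (3 + 1) + (4 - 2) * 1j

# Multiplication in Cartesian coordinates: (ac - bd) + (ad + bc)i
a, b, c, d = 3.0, 4.0, 1.0, -2.0
assert z1 * z2 == complex(a * c - b * d, a * d + b * c)

# Polar form: moduli multiply, angles add
r1, t1 = cmath.polar(z1)
r2, t2 = cmath.polar(z2)
prod = cmath.rect(r1 * r2, t1 + t2)
print(abs(prod - z1 * z2) < 1e-12)
```

The last check confirms that multiplying moduli and adding angles gives the same product as the Cartesian rule, up to floating-point rounding.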
Mathematical digression (I): Complex numbers

Addition:
[Figure, built up in three steps: in the complex plane, a+bi and c+di are added by the parallelogram rule, giving (a+c)+(b+d)i]
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 36 / 143
Mathematical digression (I): Complex numbers

Multiplication:
[Figure, built up in two steps: in the complex plane, multiplying numbers with polar coordinates (r_1, θ_1) and (r_2, θ_2) gives absolute value r = r_1 · r_2 and angle θ = θ_1 + θ_2]
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 37 / 143
Mathematical digression (I): Complex numbers

The quadratic equation
x² + px + q = 0
has the solutions
x = −p/2 ± √(p²/4 − q)
If p²/4 − q < 0, the solutions are complex (and conjugate)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 38 / 143
Mathematical digression (I): Complex numbers

Example: The solutions of
x² − 2x + 5 = 0
are
x = −(−2)/2 + √((−2)²/4 − 5) = 1 + 2i
and
x = −(−2)/2 − √((−2)²/4 − 5) = 1 − 2i
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 39 / 143
Mathematical digression (II): Linear difference equations

First-order difference equation with initial value x_0:
x_t = c + φ_1 x_{t−1}
p-th order difference equation with initial values:
x_t = c + φ_1 x_{t−1} + . . . + φ_p x_{t−p}
A sequence (x_t)_{t=0,1,...} that satisfies the difference equation is called a solution of the difference equation

Examples (diffequation.R)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 40 / 143
Mathematical digression (II): Linear difference equations

We only consider the homogeneous case, i.e. c = 0
The general solution of the first-order difference equation
x_t = φ_1 x_{t−1}
is
x_t = A · φ_1^t
with arbitrary constant A, since x_t = A φ_1^t = φ_1 A φ_1^{t−1} = φ_1 x_{t−1}
The constant is pinned down by the initial condition, A = x_0
The sequence x_t = A φ_1^t is convergent if and only if |φ_1| < 1
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 41 / 143
Mathematical digression (II): Linear difference equations

Solution of the p-th order difference equation
x_t = φ_1 x_{t−1} + . . . + φ_p x_{t−p}
Let x_t = A z^{−t}; then
A z^{−t} = φ_1 A z^{−(t−1)} + . . . + φ_p A z^{−(t−p)}
z^{−t} = φ_1 z^{−(t−1)} + . . . + φ_p z^{−(t−p)}
and thus
1 − φ_1 z − . . . − φ_p z^p = 0
Characteristic polynomial, characteristic equation
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 42 / 143
Mathematical digression (II): Linear difference equations

There are p (possibly complex, possibly nondistinct) solutions of the characteristic equation
Denote the solutions (called roots) by z_1, . . . , z_p
If all roots are real and distinct, then
x_t = A_1 z_1^{−t} + . . . + A_p z_p^{−t}
is a solution of the homogeneous difference equation
If there are complex roots, the solution is oscillating
The constants A_1, . . . , A_p can be pinned down with p initial conditions (x_0, x_{−1}, . . . , x_{−p+1})
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 43 / 143
Mathematical digression (II): Linear difference equations

Stability condition: The linear difference equation
x_t = φ_1 x_{t−1} + . . . + φ_p x_{t−p}
is stable (i.e. convergent) if and only if all roots of the characteristic polynomial
1 − φ_1 z − . . . − φ_p z^p = 0
are outside the unit circle, i.e. |z_i| > 1 for all i = 1, . . . , p
In R, the stability condition can be checked easily using the commands polyroot (base package) or armaRoots (fArma package)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 44 / 143
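For p = 2 the stability check can even be done by hand with the quadratic formula. A Python sketch (illustrative; in R one would simply call polyroot as noted above):

```python
# Sketch: stability check for a second-order difference equation
# x_t = phi1*x_{t-1} + phi2*x_{t-2} via the roots of 1 - phi1*z - phi2*z^2 = 0.
import cmath

def char_roots(phi1, phi2):
    # Solve -phi2*z^2 - phi1*z + 1 = 0 with the quadratic formula
    a, b, c = -phi2, -phi1, 1.0
    disc = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + disc) / (2 * a), (-b - disc) / (2 * a))

def is_stable(phi1, phi2):
    # Stable iff both roots lie outside the unit circle
    return all(abs(z) > 1 for z in char_roots(phi1, phi2))

print(is_stable(0.5, 0.3))   # the AR(2)-type coefficients used on slide 87
print(is_stable(1.2, 0.3))   # an explosive case
```

Using cmath.sqrt keeps the code correct when the discriminant is negative and the roots are a conjugate complex pair.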
ARMA models: Definition

Definition: ARMA process
Let (ε_t)_{t∈T} be a white noise process; the stochastic process
X_t = φ_1 X_{t−1} + . . . + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q}
with φ_p, θ_q ≠ 0 is called an ARMA(p, q) process
AutoRegressive Moving Average process
ARMA processes are important since every stationary process can be approximated by an ARMA process
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 45 / 143
ARMA models: Lag operator and lag polynomial

The lag operator is a convenient notational tool
The lag operator L shifts the time index of a stochastic process
L (X_t)_{t∈T} = (X_{t−1})_{t∈T}
L X_t = X_{t−1}
Rules
L² X_t = L(L X_t) = X_{t−2}
Lⁿ X_t = X_{t−n}
L⁻¹ X_t = X_{t+1}
L⁰ X_t = X_t
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 46 / 143
ARMA models: Lag operator and lag polynomial

Lag polynomial
A(L) = a_0 + a_1 L + a_2 L² + . . . + a_p L^p
Example: Let A(L) = 1 − 0.5L and B(L) = 1 + 4L²; then
C(L) = A(L)B(L) = (1 − 0.5L)(1 + 4L²) = 1 − 0.5L + 4L² − 2L³
Lag polynomials can be treated in the same way as ordinary polynomials
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 47 / 143
ARMA models: Lag operator and lag polynomial

Define the lag polynomials
Φ(L) = 1 − φ_1 L − . . . − φ_p L^p
Θ(L) = 1 + θ_1 L + . . . + θ_q L^q
The ARMA(p, q) process can be written compactly as
Φ(L) X_t = Θ(L) ε_t
Important special cases
MA(q) process: X_t = ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q}
AR(1) process: X_t = φ_1 X_{t−1} + ε_t
AR(p) process: X_t = φ_1 X_{t−1} + . . . + φ_p X_{t−p} + ε_t
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 48 / 143
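Treating lag polynomials like ordinary polynomials means their product is an ordinary polynomial product of the coefficient sequences. A Python sketch reproducing the example C(L) = (1 − 0.5L)(1 + 4L²) (illustrative; not course code):

```python
# Sketch: lag polynomials as coefficient lists, multiplied like ordinary polynomials.
def poly_mult(a, b):
    # a[i] is the coefficient of L^i; the product has degree len(a)+len(b)-2
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

A = [1.0, -0.5]        # 1 - 0.5L
B = [1.0, 0.0, 4.0]    # 1 + 4L^2
print(poly_mult(A, B)) # coefficients of 1 - 0.5L + 4L^2 - 2L^3
```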
ARMA models: MA(q) process

The MA(q) process is
X_t = Θ(L) ε_t
X_t = ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q}
with ε_t ∼ NID(0, σ²_ε)
Expectation function
E(X_t) = E(ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q})
       = E(ε_t) + θ_1 E(ε_{t−1}) + . . . + θ_q E(ε_{t−q})
       = 0
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 49 / 143
ARMA models: MA(q) process

Autocovariance function
γ(s, t) = E[(ε_s + θ_1 ε_{s−1} + . . . + θ_q ε_{s−q})(ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q})]
= E[ε_s ε_t + θ_1 ε_s ε_{t−1} + θ_2 ε_s ε_{t−2} + . . . + θ_q ε_s ε_{t−q}
  + θ_1 ε_{s−1} ε_t + θ_1² ε_{s−1} ε_{t−1} + θ_1 θ_2 ε_{s−1} ε_{t−2} + . . . + θ_1 θ_q ε_{s−1} ε_{t−q}
  + . . .
  + θ_q ε_{s−q} ε_t + θ_1 θ_q ε_{s−q} ε_{t−1} + θ_2 θ_q ε_{s−q} ε_{t−2} + . . . + θ_q² ε_{s−q} ε_{t−q}]
The expectations of the cross products are
E(ε_s ε_t) = 0 for s ≠ t, and σ²_ε for s = t
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 50 / 143
ARMA models: MA(q) process

Define θ_0 = 1; then
γ(t, t) = σ²_ε Σ_{i=0}^{q} θ_i²
γ(t − 1, t) = σ²_ε Σ_{i=0}^{q−1} θ_i θ_{i+1}
γ(t − 2, t) = σ²_ε Σ_{i=0}^{q−2} θ_i θ_{i+2}
. . .
γ(t − q, t) = σ²_ε θ_0 θ_q = σ²_ε θ_q
γ(s, t) = 0 for s < t − q
Hence, MA(q) processes are always stationary

Simulation of MA(q) processes (maqsim.R)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 51 / 143
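For q = 1 these formulas give ρ(1) = θ_1/(1 + θ_1²), which a simulation can confirm. A Python sketch (illustrative; the course's maqsim.R is R code, and the sample size here is an arbitrary choice):

```python
# Sketch: simulate an MA(1) process and compare the sample autocorrelation at
# lag 1 with the theoretical value rho(1) = theta1 / (1 + theta1^2).
import random

random.seed(7)
theta1, T = 0.7, 100_000
eps = [random.gauss(0.0, 1.0) for _ in range(T + 1)]
x = [eps[t] + theta1 * eps[t - 1] for t in range(1, T + 1)]

m = sum(x) / T
g0 = sum((v - m) ** 2 for v in x) / T                             # gamma-hat(0)
g1 = sum((x[t] - m) * (x[t + 1] - m) for t in range(T - 1)) / T   # gamma-hat(1)
print(round(g1 / g0, 3), round(theta1 / (1 + theta1 ** 2), 3))
```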
ARMA models: AR(1) process

The AR(1) process is
Φ(L) X_t = ε_t
(1 − φ_1 L) X_t = ε_t
X_t = φ_1 X_{t−1} + ε_t
with ε_t ∼ NID(0, σ²_ε)
Expectation and variance function [6]
Stability condition: AR(1) processes are stable if |φ_1| < 1
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 52 / 143
ARMA models: AR(1) process

Stationarity: Stable AR(1) processes are weakly stationary if [7]
E(X_0) = 0
Var(X_0) = σ²_ε / (1 − φ_1²)
Nonstationary stable processes converge towards stationarity [8]
It is common parlance to call stable processes stationary
Covariance function of the stationary AR(1) process [9]
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 53 / 143
ARMA models: AR(p) process

The AR(p) process is
Φ(L) X_t = ε_t
X_t = φ_1 X_{t−1} + . . . + φ_p X_{t−p} + ε_t
with ε_t ∼ NID(0, σ²_ε)
Assumption: ε_t is independent of X_{t−1}, X_{t−2}, . . . (innovations)
Expectation function [10]
The covariance function is complicated (ar2autocov.R)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 54 / 143
ARMA models: AR(p) process

AR(p) processes are stable if all roots of the characteristic equation
Φ(z) = 0
are larger than 1 in absolute value, |z_i| > 1 for i = 1, . . . , p
An AR(p) process is weakly stationary if the joint distribution of the p initial values (X_0, X_{−1}, . . . , X_{−(p−1)}) is "appropriate"
Stable AR(p) processes converge towards stationarity; they are often called stationary

Simulation of AR(p) processes (arpsim.R)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 55 / 143
ARMA models: Invertibility

AR and MA processes can be inverted (into each other)
Example: Consider the stable AR(1) process with |φ_1| < 1
X_t = φ_1 X_{t−1} + ε_t
    = φ_1 (φ_1 X_{t−2} + ε_{t−1}) + ε_t
    = φ_1² X_{t−2} + φ_1 ε_{t−1} + ε_t
    . . .
    = φ_1ⁿ X_{t−n} + φ_1^{n−1} ε_{t−(n−1)} + . . . + φ_1² ε_{t−2} + φ_1 ε_{t−1} + ε_t
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 56 / 143
ARMA models: Invertibility

Since |φ_1| < 1,
X_t = Σ_{i=0}^{∞} φ_1^i ε_{t−i} = ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2} + . . .
with θ_i = φ_1^i
A stable AR(1) process can be written as an MA(∞) process (the same is true for stable AR(p) processes)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 57 / 143
ARMA models: Invertibility

Using lag polynomials this can be written as
(1 − φ_1 L) X_t = ε_t
X_t = (1 − φ_1 L)⁻¹ ε_t
X_t = Σ_{i=0}^{∞} (φ_1 L)^i ε_t
General compact and elegant notation
Φ(L) X_t = ε_t
X_t = (Φ(L))⁻¹ ε_t = Θ(L) ε_t
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 58 / 143
ARMA models: Invertibility

An MA(q) process can be written as AR(∞) if all roots of Θ(z) = 0 are larger than 1 in absolute value (invertibility condition)
Example: MA(1) with |θ_1| < 1; from
X_t = ε_t + θ_1 ε_{t−1}
θ_1 X_{t−1} = θ_1 ε_{t−1} + θ_1² ε_{t−2}
we find X_t = θ_1 X_{t−1} + ε_t − θ_1² ε_{t−2}
Repeated substitution of the ε_{t−i} terms yields
X_t = Σ_{i=1}^{∞} φ_i X_{t−i} + ε_t with φ_i = (−1)^{i+1} θ_1^i
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 59 / 143
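The AR(1)-to-MA(∞) inversion can be verified numerically: starting the recursion at zero, X_t equals exactly the finite sum Σ_{i=0}^{t−1} φ_1^i ε_{t−i}. A Python sketch (illustrative, not course code):

```python
# Sketch: check that the stable AR(1) recursion started at zero agrees with its
# (truncated) MA representation X_t = sum_{i=0}^{t-1} phi^i eps_{t-i}.
import random

random.seed(3)
phi, T = 0.6, 300
eps = [random.gauss(0.0, 1.0) for _ in range(T)]

# AR(1) recursion with X_0 = 0
x = [0.0] * T
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

# MA form at the last observation
t = T - 1
ma = sum(phi ** i * eps[t - i] for i in range(t))
print(abs(x[t] - ma) < 1e-9)
```

Because the process is started at zero, the truncated sum is exact; for a stationary start the omitted tail would vanish geometrically since |φ_1| < 1.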
ARMA models: Invertibility

Summary
ARMA(p, q) processes are stable if all roots of
Φ(z) = 0
are larger than 1 in absolute value
ARMA(p, q) processes are invertible if all roots of
Θ(z) = 0
are larger than 1 in absolute value
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 60 / 143
ARMA models: Invertibility

Sometimes (e.g. for proofs), it is useful to write an ARMA(p, q) process either as AR(∞) or as MA(∞)
ARMA(p, q) can be written as AR(∞) or MA(∞):
Φ(L) X_t = Θ(L) ε_t
X_t = (Φ(L))⁻¹ Θ(L) ε_t
(Θ(L))⁻¹ Φ(L) X_t = ε_t
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 61 / 143
ARMA models: Deterministic components

Until now we only considered processes with zero expectation
Many processes have both a zero-expectation stochastic component (Y_t) and a non-zero deterministic component (D_t)
Examples:
linear trend D_t = a + bt
exponential trend D_t = ab^t
seasonal patterns
Let (X_t)_{t∈Z} be a stochastic process with deterministic component D_t and define Y_t = X_t − D_t
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 62 / 143
ARMA models: Deterministic components

Then E(Y_t) = 0 and
Cov(Y_t, Y_s) = E[(Y_t − E(Y_t))(Y_s − E(Y_s))]
             = E[(X_t − D_t − E(X_t − D_t))(X_s − D_s − E(X_s − D_s))]
             = E[(X_t − E(X_t))(X_s − E(X_s))]
             = Cov(X_t, X_s)
The covariance function does not depend on the deterministic component
To derive the covariance function of a stochastic process, simply drop the deterministic component
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 63 / 143
ARMA models: Deterministic components

Special case: D_t = μ_t = μ
ARMA(p, q) process with constant (non-zero) expectation
X_t − μ = φ_1 (X_{t−1} − μ) + . . . + φ_p (X_{t−p} − μ) + ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q}
The process can also be written as
X_t = c + φ_1 X_{t−1} + . . . + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q}
where c = μ (1 − φ_1 − . . . − φ_p)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 64 / 143
ARMA models: Deterministic components

Wold's representation theorem: Every stationary stochastic process (X_t)_{t∈T} can be represented as
X_t = Σ_{h=0}^{∞} ψ_h ε_{t−h} + D_t
with ψ_0 = 1, Σ_{h=0}^{∞} ψ_h² < ∞ and ε_t white noise with variance σ² > 0
Stationary stochastic processes can be written as a sum of a deterministic process and an MA(∞) process
Often, low-order ARMA(p, q) processes can approximate MA(∞) processes well
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 65 / 143
ARMA models: Linear processes and filters

Definition: Linear process
Let (ε_t)_{t∈Z} be a white noise process; a stochastic process (X_t)_{t∈Z} is called linear if it can be written as
X_t = Σ_{h=−∞}^{∞} ψ_h ε_{t−h} = Ψ(L) ε_t
where the coefficients are absolutely summable, i.e. Σ_{h=−∞}^{∞} |ψ_h| < ∞.
The lag polynomial Ψ(L) is called a (linear) filter
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 66 / 143
ARMA models: Linear processes and filters

Some special filters
Change from previous period (difference filter): Ψ(L) = 1 − L
Change from last year (for quarterly or monthly data): Ψ(L) = 1 − L⁴ or Ψ(L) = 1 − L¹²
Elimination of seasonal influences (quarterly data):
Ψ(L) = (1 + L + L² + L³)/4
Ψ(L) = 0.125L² + 0.25L + 0.25 + 0.25L⁻¹ + 0.125L⁻²
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 67 / 143
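Applying such filters is a matter of sliding a coefficient window over the series. A Python sketch for two of the filters above (illustrative; the toy series is made up):

```python
# Sketch: the difference filter Psi(L) = 1 - L and the quarterly moving average
# (1 + L + L^2 + L^3)/4 applied to a toy series.
def diff_filter(x):
    # (1 - L) X_t = X_t - X_{t-1}; the first observation is lost
    return [x[t] - x[t - 1] for t in range(1, len(x))]

def quarterly_ma(x):
    # ((1 + L + L^2 + L^3)/4) X_t needs X_t, ..., X_{t-3}
    return [sum(x[t - 3:t + 1]) / 4 for t in range(3, len(x))]

x = [1.0, 2.0, 4.0, 7.0, 11.0]
print(diff_filter(x))   # [1.0, 2.0, 3.0, 4.0]
print(quarterly_ma(x))  # [3.5, 6.0]
```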
ARMA models: Linear processes and filters

Hodrick-Prescott filter (an important tool in empirical macroeconomics)
Decompose a time series (X_t) into a long-term growth component (G_t) and a short-term cyclical component (C_t)
X_t = G_t + C_t
Trade-off between goodness-of-fit and smoothness of G_t
Minimize the criterion function
Σ_{t=1}^{T} (X_t − G_t)² + λ Σ_{t=2}^{T−1} [(G_{t+1} − G_t) − (G_t − G_{t−1})]²
with respect to G_t for given smoothness parameter λ
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 68 / 143
ARMA models: Linear processes and filters

The FOCs of the minimization problem are
(G_1, . . . , G_T)′ = A (X_1, . . . , X_T)′
where A = (I + λK′K)⁻¹ with

K = [ 1 −2  1  0  0 . . .  0  0  0
      0  1 −2  1  0 . . .  0  0  0
      0  0  1 −2  1 . . .  0  0  0
      ...
      0  0  0  0  0 . . .  1 −2  1 ]
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 69 / 143
ARMA models: Linear processes and filters

The HP filter is a linear filter
Typical values for the smoothing parameter λ:
λ = 10      annual data
λ = 1600    quarterly data
λ = 14400   monthly data
Implementation in R (code by Olaf Posch)
Empirical examples (hpfilter.R)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 70 / 143
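The FOC G = (I + λK′K)⁻¹X translates directly into code: build K, form the matrix, solve the linear system. A self-contained Python sketch (not Olaf Posch's R code; plain Gaussian elimination is used instead of a linear-algebra library):

```python
# Sketch: the HP filter via its first-order conditions G = (I + lam*K'K)^{-1} X.
def hp_filter(x, lam):
    T = len(x)
    # K is (T-2) x T with rows [..., 1, -2, 1, ...] (second differences)
    K = [[0.0] * T for _ in range(T - 2)]
    for r in range(T - 2):
        K[r][r], K[r][r + 1], K[r][r + 2] = 1.0, -2.0, 1.0
    # M = I + lam * K'K
    M = [[(1.0 if i == j else 0.0) +
          lam * sum(K[r][i] * K[r][j] for r in range(T - 2))
          for j in range(T)] for i in range(T)]
    # Solve M g = x by Gaussian elimination with partial pivoting
    A = [row[:] + [x[i]] for i, row in enumerate(M)]
    for c in range(T):
        p = max(range(c, T), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, T):
            f = A[r][c] / A[c][c]
            for k in range(c, T + 1):
                A[r][k] -= f * A[c][k]
    g = [0.0] * T
    for i in range(T - 1, -1, -1):
        g[i] = (A[i][T] - sum(A[i][j] * g[j] for j in range(i + 1, T))) / A[i][i]
    return g

# A perfectly linear series has zero second differences (Kx = 0), so the
# extracted trend reproduces the series for any lambda.
trend = hp_filter([1.0, 2.0, 3.0, 4.0, 5.0], 1600.0)
print(all(abs(trend[t] - (t + 1)) < 1e-6 for t in range(5)))
```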
Estimation of ARMA models: The estimation problem

Problem: The parameters φ_1, . . . , φ_p, θ_1, . . . , θ_q, σ²_ε of an ARMA(p, q) process are usually unknown
They have to be estimated from an observed time series X_1, . . . , X_T
Standard estimation methods:
Least squares (OLS)
Maximum likelihood (ML)
Assumption: the lag orders p and q are known
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 71 / 143
Estimation of ARMA models: Least squares estimation of AR(p) models

The AR(p) model with non-zero constant expectation
X_t = c + φ_1 X_{t−1} + . . . + φ_p X_{t−p} + ε_t
can be written in matrix notation as

[ X_{p+1} ]   [ 1  X_p      X_{p−1}  . . .  X_1     ] [ c   ]   [ ε_{p+1} ]
[ X_{p+2} ] = [ 1  X_{p+1}  X_p      . . .  X_2     ] [ φ_1 ] + [ ε_{p+2} ]
[   ...   ]   [ ...                                 ] [ ... ]   [   ...   ]
[ X_T     ]   [ 1  X_{T−1}  X_{T−2}  . . .  X_{T−p} ] [ φ_p ]   [ ε_T     ]

Compact notation: y = Xβ + u
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 72 / 143
Estimation of ARMA models: Least squares estimation of AR(p) models

The standard least squares estimator is
β̂ = (X′X)⁻¹ X′y
The matrix of exogenous variables X is stochastic −→ the usual results for OLS regression do not hold
But: There is no contemporaneous correlation between the error term and the exogenous variables
Hence, the OLS estimators are consistent and asymptotically efficient
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 73 / 143
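For p = 1 the estimator β̂ = (X′X)⁻¹X′y reduces to a simple regression of X_t on (1, X_{t−1}). A Python sketch that simulates an AR(1) with known parameters and recovers them by OLS (illustrative; parameter values and sample size are made up):

```python
# Sketch: OLS estimation of X_t = c + phi*X_{t-1} + eps_t via the closed-form
# simple-regression solution (the 2-regressor special case of (X'X)^{-1} X'y).
import random

random.seed(11)
c_true, phi_true, T = 1.0, 0.5, 20_000

x = [c_true / (1 - phi_true)]          # start at the stationary mean
for _ in range(T):
    x.append(c_true + phi_true * x[-1] + random.gauss(0.0, 1.0))

# Regress X_t on (1, X_{t-1})
y, z = x[1:], x[:-1]
n = len(y)
zbar, ybar = sum(z) / n, sum(y) / n
phi_hat = (sum((z[i] - zbar) * (y[i] - ybar) for i in range(n))
           / sum((z[i] - zbar) ** 2 for i in range(n)))
c_hat = ybar - phi_hat * zbar
print(round(c_hat, 2), round(phi_hat, 2))
```

Consistency shows up directly: with this many observations both estimates land close to the true values c = 1 and φ_1 = 0.5.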
Estimation of ARMA models: Least squares estimation of ARMA models

Solve the ARMA equation
X_t = c + φ_1 X_{t−1} + . . . + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q}
for ε_t,
ε_t = X_t − c − φ_1 X_{t−1} − . . . − φ_p X_{t−p} − θ_1 ε_{t−1} − . . . − θ_q ε_{t−q}
Define the residuals as functions of the unknown parameters
ε̂_t(d, f_1, . . . , f_p, g_1, . . . , g_q) = X_t − d − f_1 X_{t−1} − . . . − f_p X_{t−p} − g_1 ε̂_{t−1} − . . . − g_q ε̂_{t−q}
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 74 / 143
Estimation of ARMA models: Least squares estimation of ARMA models

Define the sum of squared residuals
S(d, f_1, . . . , f_p, g_1, . . . , g_q) = Σ_{t=1}^{T} (ε̂_t(d, f_1, . . . , f_p, g_1, . . . , g_q))²
The least squares estimators are
(ĉ, φ̂_1, . . . , φ̂_p, θ̂_1, . . . , θ̂_q) = arg min S(d, f_1, . . . , f_p, g_1, . . . , g_q)
Since the residuals are defined recursively, one needs starting values ε̂_0, . . . , ε̂_{−q+1} and X_0, . . . , X_{−p+1} to calculate ε̂_1
Easiest way: Set all starting values to zero ("conditional estimation")
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 75 / 143
Estimation of ARMA models: Least squares estimation of ARMA models

The first-order conditions are a nonlinear equation system which cannot be solved easily
Minimization by standard numerical methods (implemented in all usual statistical packages)
Either solve the nonlinear first-order-condition system or minimize S directly
Simple special case: ARMA(1, 1)

arma11.R
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 76 / 143
Estimation of ARMA models: Maximum likelihood estimation

Additional assumption: The innovations ε_t are normally distributed
Implication: ARMA processes are Gaussian
The joint distribution of X_1, . . . , X_T is multivariate normal
X = (X_1, . . . , X_T)′ ∼ N(μ, Σ)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 77 / 143
Estimation of ARMA models: Maximum likelihood estimation

Expectation vector
μ = E(X_1, . . . , X_T)′ = (c/(1 − φ_1 − . . . − φ_p), . . . , c/(1 − φ_1 − . . . − φ_p))′

Covariance matrix
Σ = Cov(X_1, X_2, . . . , X_T)′ =
[ γ(0)     γ(1)     . . .  γ(T−1)
  γ(1)     γ(0)     . . .  γ(T−2)
  ...
  γ(T−1)   γ(T−2)   . . .  γ(0)   ]
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 78 / 143
Estimation of ARMA models: Maximum likelihood estimation

The expectation vector and the covariance matrix contain all unknown parameters ψ = (φ_1, . . . , φ_p, θ_1, . . . , θ_q, c, σ²_ε)
The likelihood function is
L(ψ; X) = (2π)^{−T/2} (det Σ)^{−1/2} exp(−(1/2)(X − μ)′ Σ⁻¹ (X − μ))
and the loglikelihood function is
ln L(ψ; X) = −(T/2) ln(2π) − (1/2) ln(det Σ) − (1/2)(X − μ)′ Σ⁻¹ (X − μ)
The ML estimators are ψ̂ = arg max ln L(ψ; X)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 79 / 143
Estimation of ARMA models: Maximum likelihood estimation

The loglikelihood function has to be maximized by numerical methods
Standard properties of ML estimators:
1. consistency
2. asymptotic efficiency
3. asymptotically jointly normally distributed
4. the covariance matrix of the estimators can be consistently estimated
Example: ML estimation of an ARMA(3, 3) model for the interest rate spread (arma33.R)
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 80 / 143
Estimation of ARMA models: Hypothesis tests

Since the estimation method is maximum likelihood, the classical tests (Wald, LR, LM) are applicable
General null and alternative hypotheses
H_0: g(ψ) = 0
H_1: not H_0
where g(ψ) is an m-valued function of the parameters
Example: If H_0: φ_1 = 0, then m = 1 and g(ψ) = φ_1
Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 81 / 143
Estimation of ARMA models: Hypothesis tests

Likelihood ratio test statistic
LR = 2(ln L(ψ̂_ML) − ln L(ψ̂_R))
where ψ̂_ML and ψ̂_R are the unrestricted and restricted estimators
Under the null hypothesis,
LR →_d U ∼ χ²_m
and H_0 is rejected at significance level α if LR > χ²_{m;1−α}
Disadvantage: Two models must be estimated
Estimation of ARMA models: Hypothesis tests
For the Wald test we only consider g(ψ) = ψ − ψ0, i.e.
H0 : ψ = ψ0
H1 : not H0
Test statistic

W = (ψ̂ − ψ0)′ Cov(ψ̂)^(−1) (ψ̂ − ψ0)

If the null hypothesis is true then W →d U ∼ χ²m

The asymptotic covariance matrix can be estimated consistently as Cov(ψ̂) = H^(−1) where H is the Hessian matrix returned by the maximization procedure
Estimation of ARMA models: Hypothesis tests
Test example 1:
H0 : φ1 = 0
H1 : φ1 ≠ 0
Test example 2
H0 : ψ = ψ0
H1 : not H0
Illustration (arma33.R)
Estimation of ARMA models: Model selection
Usually, the lag orders p and q of an ARMA model are unknown
Trade-off: Goodness-of-fit against parsimony
Akaike’s information criterion for the model with non-zero expectation

AIC = ln σ̂² + 2(p + q + 1)/T

where the first term, ln σ̂², measures goodness-of-fit and the second term is the complexity penalty
Choose the model with the smallest AIC
Estimation of ARMA models: Model selection
Bayesian information criterion BIC (Schwarz information criterion)
BIC = ln σ̂² + (p + q + 1) · ln T / T
Hannan-Quinn information criterion
HQ = ln σ̂² + 2(p + q + 1) · ln(ln T) / T
Both BIC and HQ are consistent while the AIC tends to overfit
Illustration (arma33.R)
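The three criteria differ only in their penalty terms, which a short sketch (Python, not the course's R code; the σ̂² values are invented) makes explicit:

```python
import math

def info_criteria(sigma2_hat, p, q, T):
    """AIC, BIC and Hannan-Quinn for an ARMA(p, q) with non-zero expectation."""
    k = p + q + 1
    aic = math.log(sigma2_hat) + 2.0 * k / T
    bic = math.log(sigma2_hat) + k * math.log(T) / T
    hq  = math.log(sigma2_hat) + 2.0 * k * math.log(math.log(T)) / T
    return aic, bic, hq

# invented residual variances: the bigger model fits only marginally better
aic_small, bic_small, hq_small = info_criteria(1.00, 1, 0, 200)   # ARMA(1, 0)
aic_big,   bic_big,   hq_big   = info_criteria(0.99, 3, 3, 200)   # ARMA(3, 3)
```

For ln T > 2 the BIC charges more per parameter than the AIC, which is why it selects the large model less often.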
Estimation of ARMA models: Model selection
Another illustration: The true model is ARMA(2, 1) with Xt = 0.5Xt−1 + 0.3Xt−2 + εt + 0.7εt−1; 1000 samples of size n = 500 were generated; the table shows the model orders p and q as selected by AIC and BIC

         # orders selected by AIC       # orders selected by BIC
                    q                              q
p       0    1    2    3    4    5     0    1    2    3    4    5
0       0    0    0    0    0    0     0    0    0    0    0    0
1       0   18   64   23   14    6     0  310  167    4    0    0
2       0  171   21   16    5    7     0  503    3    1    0    0
3       0    7   35   58   80   45     1    0    2    1    0    0
4       9    2   12  139   37   44     6    1    0    0    0    0
5      11    6   12   56   46   56     1    0    0    0    0    0
Integrated processes: Difference operator
Define the difference operator
∆ = 1 − L,

then

∆Xt = Xt − Xt−1
Second order differences
∆² = ∆(∆) = (1 − L)² = 1 − 2L + L²
Higher orders ∆ⁿ are defined in the same way; note that ∆ⁿ ≠ 1 − Lⁿ
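A quick sketch of the operator (plain Python, illustrative data): applying ∆ twice to the quadratic sequence x_t = t² yields a constant, exactly as the expansion ∆² = 1 − 2L + L² predicts:

```python
def diff(x, n=1):
    """Apply the difference operator Delta = 1 - L n times."""
    for _ in range(n):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

x = [1, 4, 9, 16, 25]   # x_t = t^2
d1 = diff(x)            # first differences
d2 = diff(x, 2)         # second differences of t^2 are constant
# Delta^2 = 1 - 2L + L^2 gives the same result as differencing twice
d2_direct = [x[t] - 2 * x[t - 1] + x[t - 2] for t in range(2, len(x))]
```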
Integrated processes: Definition
Definition: Integrated process
A stochastic process is called integrated of order 1 if
∆Xt = µ + Ψ(L)εt

where εt is white noise, Ψ(1) ≠ 0, and ∑_{j=0}^{∞} j·|ψj| < ∞
Common notation: Xt ∼ I (1)
I (1) processes are also called difference stationary or unit root processes
Stochastic and deterministic trends
Trend stationary processes are not I (1) (since Ψ(1) = 0)
Integrated processes: Definition
Stationary processes are sometimes called I (0)
Higher order integrations are possible, e.g. Xt ∼ I (2) with ∆²Xt ∼ I (0)
In general, Xt ∼ I (d) means that ∆dXt ∼ I (0)
Most economic time series are either I (0) or I (1)
Some economic time series may be I (2)
Integrated processes: Definition
Example 1: The random walk with drift, Xt = b + Xt−1 + εt , is I (1) because
∆Xt = Xt − Xt−1 = b + εt = b + Ψ(L)εt

where ψ0 = 1 and ψj = 0 for j ≠ 0
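A small simulation sketch of Example 1 (Python; drift and sample size are illustrative): differencing a random walk with drift leaves b + εt, so the sample mean of ∆Xt should be close to b:

```python
import random

random.seed(42)
b = 0.5                                   # drift
x = [0.0]
for _ in range(5000):
    x.append(b + x[-1] + random.gauss(0.0, 1.0))

dx = [x[t] - x[t - 1] for t in range(1, len(x))]
mean_dx = sum(dx) / len(dx)               # should be close to the drift b
```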
Integrated processes: Definition
Example 2: The trend stationary process, Xt = a + bt + εt , is not I (1) because
∆Xt = b + εt − εt−1 = b + Ψ(L)εt

with ψ0 = 1, ψ1 = −1 and ψj = 0 for all other j, so that Ψ(1) = 0
Integrated processes: Definition
Example 3: The "AR(2) process"

Xt = b + (1 + φ)Xt−1 − φXt−2 + εt
(1 − φL)(1 − L)Xt = b + εt
is I (1) if |φ| < 1 because ∆Xt = Ψ(L)(b + εt) with

Ψ(L) = (1 − φL)^(−1) = 1 + φL + φ²L² + φ³L³ + φ⁴L⁴ + . . .

and thus Ψ(1) = ∑_{i=0}^{∞} φ^i = 1/(1 − φ) ≠ 0. The roots of the characteristic equation are z = 1 and z = 1/φ
Integrated processes: Definition
Example 4: The process
Xt = 0.5Xt−1 − 0.4Xt−2 + εt
is a stationary (stable) zero expectation AR(2) process; the process
Yt = a + bt + Xt
is trend stationary and I (0) since
∆Yt = b + ∆Xt
with ∆Xt = Ψ(L)εt = (1 − L)(1 − 0.5L + 0.4L²)^(−1) εt and therefore Ψ(1) = 0 (i0andi1.R)
Integrated processes: Definition
Definition: ARIMA process
Let (εt)t∈T be a white noise process; the stochastic process (Xt)t∈Z is called integrated autoregressive moving-average process of the orders p, d and q, or ARIMA(p, d, q), if ∆^d Xt is an ARMA(p, q) process

Φ(L) ∆^d Xt = Θ(L) εt
For d > 0 the process is nonstationary (I (d)) even if all roots of Φ(z) = 0 are outside the unit circle
Simulation of an ARIMA(p, d , q) process (arimapdqsim.R)
Integrated processes: Deterministic versus stochastic trends
Why is it important to distinguish deterministic and stochastic trends?
Reason 1: Long-term forecasts and forecasting errors
Deterministic trend: The forecasting error variance is bounded
Stochastic trend: The forecasting error variance is unbounded
Illustrations
i0andi1.R
Integrated processes: Deterministic versus stochastic trends
Why is it important to distinguish deterministic and stochastic trends?
Reason 2: Spurious regression
OLS regressions will show spurious relationships between time series with (deterministic or stochastic) trends
Detrending works if the series have deterministic trends,but it does not help if the series are integrated
Illustrations
spurious1.R
Integrated processes: Integrated processes and parameter estimation
OLS estimators (and ML estimators) are consistent and asymptotically normal for stationary processes
The asymptotic normality is lost if the processes are integrated
We only look at the very special case
Xt = φ1Xt−1 + εt
with εt ∼ NID(0, 1) and X0 = 0
The AR(1) process is stationary if |φ1| < 1 and has a unit root if |φ1| = 1
Integrated processes: Integrated processes and parameter estimation
The usual OLS estimator of φ1 is

φ̂1 = (∑_{t=1}^{T} Xt Xt−1) / (∑_{t=1}^{T} X²t−1)
What does the distribution of φ̂1 look like?
Influence of φ and T
Consistency?
Asymptotic normality?
Illustration (phihat.R)
Integrated processes: Integrated processes and parameter estimation
Consistency and asymptotic normality for I (0) processes (|φ1| < 1)

plim φ̂1 = φ1
√T (φ̂1 − φ1) →d Z ∼ N(0, 1 − φ²1)

Consistency and asymptotic normality for I (1) processes (φ1 = 1)

plim φ̂1 = 1
T (φ̂1 − 1) →d V

where V is a nondegenerate, nonnormal random variable
Root-T -consistency and superconsistency
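The different convergence rates can be seen in a small Monte Carlo sketch (Python; sample size and replication count are illustrative): on average the OLS estimate sits much closer to the true value in the unit root case than in the stationary case, reflecting superconsistency:

```python
import random

def ar1_ols(phi, T, seed):
    """Simulate X_t = phi*X_{t-1} + eps_t (X_0 = 0) and return the OLS estimate of phi."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(T):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

T, reps = 400, 200
stat = [abs(ar1_ols(0.5, T, s) - 0.5) for s in range(reps)]   # stationary case
unit = [abs(ar1_ols(1.0, T, s) - 1.0) for s in range(reps)]   # unit root case
spread_stat = sum(stat) / reps
spread_unit = sum(unit) / reps            # much smaller: superconsistency
```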
Integrated processes: Unit root tests
Importance to distinguish between trend stationarity and difference stationarity
Test of hypothesis that a process has a unit root (i.e. is I (1))
Classical approaches: (Augmented) Dickey-Fuller-Test,Phillips-Perron-Test
Basic tool: Linear regression
Xt = deterministics + φXt−1 + εt
∆Xt = deterministics + (φ − 1)Xt−1 + εt     with β := φ − 1
Integrated processes: Unit root tests
Null and alternative hypothesis
H0 : φ = 1 (unit root)
H1 : |φ| < 1 (no unit root)
or, equivalently,
H0 : β = 0 (unit root)
H1 : β < 0 (no unit root)
Unit root tests are one-sided; explosive processes are ruled out
Rejecting the null hypothesis is evidence in favour of stationarity
If the null hypothesis is not rejected, there could be a unit root
Integrated processes: DF test and ADF test
Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests
Possible regressions
Xt = φXt−1 + εt                  or   ∆Xt = βXt−1 + εt
Xt = a + φXt−1 + εt              or   ∆Xt = a + βXt−1 + εt
Xt = a + bt + φXt−1 + εt         or   ∆Xt = a + bt + βXt−1 + εt
Assumption for Dickey-Fuller test: no autocorrelation in εt
If there is autocorrelation in εt , use the augmented DF test
Integrated processes: DF test and ADF test
Dickey-Fuller regression, case 1: no constant, no trend
∆Xt = βXt−1 + εt
Null and alternative hypotheses
H0 : β = 0
H1 : β < 0
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around zero
Integrated processes: DF test and ADF test
Dickey-Fuller regression, case 2: constant, no trend
∆Xt = a + βXt−1 + εt
Null and alternative hypotheses
H0 : β = 0     or     H0 : β = 0, a = 0
H1 : β < 0     or     H1 : β < 0, a ≠ 0
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around a constant
Integrated processes: DF test and ADF test
Dickey-Fuller regression, case 3: constant and trend
∆Xt = a + bt + βXt−1 + εt
Null and alternative hypotheses
H0 : β = 0 or β = 0, b = 0
H1 : β < 0 or β < 0, b ≠ 0
Null hypothesis: stochastic trend with drift
Alternative hypothesis: trend stationary process
Integrated processes: DF test and ADF test
Dickey-Fuller test statistics for single hypotheses
“ρ-test”: T · β̂
“τ-test”: β̂ / σ̂β̂
The τ -test statistic is computed in the same way asthe usual t-test statistic
Reject the null hypothesis if the test statistics are too small
The critical values are not the quantiles of the t-distribution
There are tables with the correct critical values(e.g. Hamilton, table B.6)
Integrated processes: DF test and ADF test
The Dickey-Fuller test statistics for the joint hypotheses arecomputed in the same way as the usual F -test statistics
Reject the null hypothesis if the test statistic is too large
The critical values are not the quantiles of the F -distribution
There are tables with the correct critical values(e.g. Hamilton, table B.7)
Illustrations (dftest.R)
Integrated processes: DF test and ADF test
If there is autocorrelation in εt the DF test does not work (dftest.R)
Augmented Dickey-Fuller test (ADF test) regressions:
∆Xt = γ1∆Xt−1 + . . . + γp∆Xt−p + βXt−1 + εt
∆Xt = a + γ1∆Xt−1 + . . . + γp∆Xt−p + βXt−1 + εt
∆Xt = a + bt + γ1∆Xt−1 + . . . + γp∆Xt−p + βXt−1 + εt
The added lagged differences capture the autocorrelation
The number of lags p must be large enough to make εt white noise
The critical values remain the same as in the no-correlation case
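A bare-bones sketch of the ADF regression with a constant (Python with numpy rather than the course's R scripts; it computes only the τ statistic, and the critical values still have to come from the Dickey-Fuller tables):

```python
import numpy as np

def adf_tau(x, p=1):
    """tau statistic of the ADF regression with constant:
    dX_t = a + gamma_1*dX_{t-1} + ... + gamma_p*dX_{t-p} + beta*X_{t-1} + eps_t"""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    y = dx[p:]
    cols = [np.ones_like(y)]                                    # constant
    cols += [dx[p - j: len(dx) - j] for j in range(1, p + 1)]   # lagged differences
    cols.append(x[p:-1])                                        # lagged level X_{t-1}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    se_beta = np.sqrt(s2 * np.linalg.inv(X.T @ X)[-1, -1])
    return coef[-1] / se_beta       # compare to Dickey-Fuller tables, not t tables

rng = np.random.default_rng(1)
rw = np.cumsum(rng.standard_normal(500))  # random walk: unit root
ar = np.empty(500)
ar[0] = 0.0
for t in range(1, 500):                   # stationary AR(1)
    ar[t] = 0.5 * ar[t - 1] + rng.standard_normal()
tau_rw, tau_ar = adf_tau(rw), adf_tau(ar)
```

With the large-sample 5% critical value of about −2.86 for the constant-only case, the stationary AR(1) series produces a τ far below the critical value, while the random walk typically does not.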
Integrated processes: DF test and ADF test
Further interesting topics (but we skip these)
Phillips-Perron test
Structural breaks and unit roots
KPSS test of stationarity
H0 : Xt ∼ I (0)
H1 : Xt ∼ I (1)
Integrated processes: Regression with integrated processes
Spurious regression: If Xt and Yt are independent but both I (1) then the regression

Yt = α + βXt + ut

will result in an estimated coefficient β̂ that is significantly different from 0 with probability 1 as T → ∞

BUT: The regression

Yt = α + βXt + ut

may be sensible even though Xt and Yt are I (1)
Cointegration
Integrated processes: Regression with integrated processes
Definition: Cointegration
Two stochastic processes (Xt)t∈T and (Yt)t∈T are cointegrated if both processes are I (1) and there is a constant β such that the process (Yt − βXt) is I (0)
If β is known, cointegration can be tested using a standard unit root test on the process (Yt − βXt)
If β is unknown, it can be estimated from the linear regression
Yt = α + βXt + ut
and cointegration is tested using a modified unit root test on the residual process (ût)t=1,...,T
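The two-step idea (estimate β by OLS, then examine the residual process) can be sketched as follows (Python, simulated data with a known cointegrating relation; a real application would apply the modified unit root test with the appropriate critical values to û):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 500
x = np.cumsum(rng.standard_normal(T))             # X_t is I(1)
y = 2.0 + 1.5 * x + rng.standard_normal(T)        # cointegrated, true beta = 1.5

# Step 1: estimate alpha and beta by OLS
A = np.column_stack([np.ones(T), x])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(A, y, rcond=None)
u = y - alpha_hat - beta_hat * x                  # residual process u_hat

# Step 2: the residuals should behave like an I(0) process; here we only
# inspect their AR(1) coefficient, which should be well below 1
phi_u = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
```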
GARCH models: Conditional expectation
Let (X, Y) be a bivariate random variable with a joint density function, then

E (X |Y = y) = ∫_{−∞}^{∞} x fX|Y=y (x) dx

is the conditional expectation of X given Y = y
E (X |Y ) denotes a random variable with realization E (X |Y = y) if the random variable Y realizes as y
Both E (X |Y ) and E (X |Y = y) are called conditional expectation
GARCH models: Conditional variance
Let (X, Y) be a bivariate random variable with a joint density function, then

Var(X |Y = y) = ∫_{−∞}^{∞} (x − E (X |Y = y))² fX|Y=y (x) dx

is the conditional variance of X given Y = y

Var(X |Y ) denotes a random variable with realization Var(X |Y = y) if the random variable Y realizes as y
Both Var(X |Y = y) and Var(X |Y ) are called conditional variance
GARCH models: Rules for conditional expectations
1 Law of iterated expectations: E (E (X |Y )) = E (X )
2 If X and Y are independent, then E (X |Y ) = E (X )
3 The condition can be treated like a constant, E (XY |Y ) = Y · E (X |Y )

4 The conditional expectation is a linear operator. For a1, . . . , an ∈ R:

E (∑_{i=1}^{n} ai Xi |Y ) = ∑_{i=1}^{n} ai E (Xi |Y )
GARCH models: Basics
Some economic time series show volatility clusters, e.g. stock returns, commodity price changes, inflation rates, . . .

Simple autoregressive models cannot capture volatility clusters since their conditional variance is constant
Example: Stationary AR(1)-process, Xt = αXt−1 + εt with |α| < 1; then

Var(Xt) = σ²X = σ²ε / (1 − α²),

and the conditional variance is

Var(Xt |Xt−1) = σ²ε
GARCH models: Basics
In the following, we will focus on stock returns
Empirical fact: squared (or absolute) returns are positively autocorrelated
Implication: Returns are not independent over time
The dependence is nonlinear
How can we model this kind of dependence?
GARCH models: ARCH(1)-process
Definition: ARCH(1)-process
The stochastic process (Xt)t∈Z is called ARCH(1)-process if
E (Xt |Xt−1) = 0
Var(Xt |Xt−1) = σ²t = α0 + α1 X²t−1

for all t ∈ Z, with α0, α1 > 0
Often, an additional assumption is
Xt | (Xt−1 = xt−1) ∼ N(0, α0 + α1 x²t−1)
GARCH models: ARCH(1)-process
The unconditional distribution of Xt is a non-normal distribution
Leptokurtosis: The tails are heavier than the tails of the normal distribution
Example of an ARCH(1)-process
Xt = εt σt

where (εt)t∈Z is white noise with σ²ε = 1 and

σt = √(α0 + α1 X²t−1)
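A direct simulation sketch of such a process (Python; α0, α1 and the sample size are illustrative and chosen so that the fourth moment exists): returns themselves are roughly uncorrelated while squared returns are clearly autocorrelated:

```python
import math
import random

random.seed(3)
a0, a1 = 0.3, 0.4                         # illustrative; a1 < 1/sqrt(3), kurtosis exists
T = 2500
x = [0.0]
for _ in range(T):
    sigma_t = math.sqrt(a0 + a1 * x[-1] ** 2)
    x.append(sigma_t * random.gauss(0.0, 1.0))
x = x[1:]

def acf1(z):
    """First-order sample autocorrelation."""
    m = sum(z) / len(z)
    num = sum((z[t] - m) * (z[t - 1] - m) for t in range(1, len(z)))
    den = sum((v - m) ** 2 for v in z)
    return num / den

rho_x  = acf1(x)                          # close to zero: no linear dependence
rho_x2 = acf1([v * v for v in x])         # clearly positive: volatility clustering
```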
GARCH models: ARCH(1)-process
One can show that

E (Xt |Xt−1) = 0
E (Xt) = 0
Var(Xt |Xt−1) = α0 + α1 X²t−1
Var(Xt) = α0 / (1 − α1)
Cov(Xt , Xt−i) = 0 for i > 0

Stationarity condition: 0 < α1 < 1

The unconditional kurtosis is 3(1 − α²1)/(1 − 3α²1) if εt ∼ N(0, 1). If α1 > √(1/3) ≈ 0.57735, the kurtosis does not exist.
GARCH models: ARCH(1)-process
Squared returns follow

X²t = α0 + α1 X²t−1 + vt

with vt = σ²t (ε²t − 1)
Thus, squared returns of ARCH(1) are AR(1)
The process (vt)t∈Z is white noise
E (vt) = 0
Var(vt) = E (v²t) = const.
Cov(vt , vt−i ) = 0 (i = 1, 2, . . .)
GARCH models: ARCH(1)-process
Simulation of an ARCH(1)-process for t = 1, . . . , 2500
Parameters: α0 = 0.05, α1 = 0.95, start value X0 = 0
Conditional distribution: εt ∼ N(0, 1)
archsim.R
Check whether the simulated time series shows the typical stylized facts of return distributions
GARCH models: Estimation of an ARCH(1)-process
Of course, we do not know the true values of the model parameters α0 and α1
How can we estimate the unknown parameters α0 and α1 ?
Observations X1, . . . ,XT
Because of

X²t = α0 + α1 X²t−1 + vt

a possible estimation method is OLS
GARCH models: Estimation of an ARCH(1)-process
OLS estimator of α1

α̂1 = ∑_{t=2}^{T} (X²t − X̄²t)(X²t−1 − X̄²t−1) / ∑_{t=2}^{T} (X²t−1 − X̄²t−1)² ≈ ρ̂(X²t , X²t−1)

(X̄²t and X̄²t−1 denote the respective sample means)

Careful: These estimators are only consistent if the kurtosis exists (i.e. if α1 < √(1/3))
Test of ARCH-effects
H0 : α1 = 0
H1 : α1 > 0
GARCH models: Estimation of an ARCH(1)-process
For T large, under H0

√T α̂1 ∼ N(0, 1)

Reject H0 if √T α̂1 > Φ^(−1)(1 − α)

Second version of this test: Consider the R² of the regression

X²t = α0 + α1 X²t−1 + vt ,

then under H0

T α̂²1 ≈ T R² ∼ χ²1 (approximately)

Reject H0 if T R² > χ²1;1−α
GARCH models: ARCH(p)-process
Definition: ARCH(p)-process
The stochastic process (Xt)t∈Z is called ARCH(p)-process if
E (Xt |Xt−1, . . . , Xt−p) = 0
Var(Xt |Xt−1, . . . , Xt−p) = σ²t = α0 + α1 X²t−1 + . . . + αp X²t−p

for t ∈ Z, where αi ≥ 0 for i = 0, 1, . . . , p − 1 and αp > 0
Often, an additional assumption is that
Xt | (Xt−1 = xt−1, . . . , Xt−p = xt−p) ∼ N(0, σ²t)
GARCH models: ARCH(p)-process
Example of an ARCH(p)-process
Xt = εt σt

where (εt)t∈Z is white noise with σ²ε = 1 and

σt = √(α0 + α1 X²t−1 + . . . + αp X²t−p)

An ARCH(p) process is weakly stationary if all roots of 1 − α1 z − α2 z² − . . . − αp z^p = 0 are outside the unit circle

Then, for all t ∈ Z, E (Xt) = 0 and

Var(Xt) = α0 / (1 − ∑_{i=1}^{p} αi)
GARCH models: ARCH(p)-process
If (Xt)t∈Z is a stationary ARCH(p) process, then (X²t)t∈Z is a stationary AR(p) process

X²t = α0 + α1 X²t−1 + . . . + αp X²t−p + vt
As to the error term,
E (vt) = 0
Var(vt) = const.
Cov(vt , vt−i ) = 0 for i = 1, 2, . . .
Simulating an ARCH(p) is easy
GARCH models: Estimation of ARCH(p) models
OLS estimation of
X²t = α0 + α1 X²t−1 + . . . + αp X²t−p + vt
Test of ARCH-effects
H0 : α1 = α2 = . . . = αp = 0 vs H1 : not H0
Let R2 denote the coefficient of determination of the regression
Under H0, the test statistic T R² ∼ χ²p (approximately); thus reject H0 if T R² > χ²p;1−α
GARCH models: Maximum likelihood estimation
Basic idea of the maximum likelihood estimation method: Choose parameters such that the joint density of the observations

fX1,...,XT (x1, . . . , xT)

is maximized

Let X1, . . . , XT denote a random sample from X

The density fX (x; θ) depends on R unknown parameters θ = (θ1, . . . , θR)
GARCH models: Maximum likelihood estimation
ML estimation of θ: Maximize the (log)likelihood function

L(θ) = fX1,...,XT (x1, . . . , xT; θ) = ∏_{t=1}^{T} fX (xt; θ)

ln L(θ) = ∑_{t=1}^{T} ln fX (xt; θ)

ML estimate

θ̂ = argmax [ln L(θ)]
GARCH models: Maximum likelihood estimation
Since observations are independent in random samples

fX1,...,XT (x1, . . . , xT) = ∏_{t=1}^{T} fXt (xt)

or

ln fX1,...,XT (x1, . . . , xT) = ∑_{t=1}^{T} ln fXt (xt) = ∑_{t=1}^{T} ln fX (xt)
But: ARCH-returns are not independent!
GARCH models: Maximum likelihood estimation
Factorization with dependent observations

fX1,...,XT (x1, . . . , xT) = ∏_{t=1}^{T} fXt|Xt−1,...,X1 (xt | xt−1, . . . , x1)

or

ln fX1,...,XT (x1, . . . , xT) = ∑_{t=1}^{T} ln fXt|Xt−1,...,X1 (xt | xt−1, . . . , x1)

Hence, for an ARCH(1)-process

fX1,...,XT (x1, . . . , xT) = fX1 (x1) ∏_{t=2}^{T} (2πσ²t)^(−1/2) exp(−(1/2)(xt/σt)²)
GARCH models: Maximum likelihood estimation
The marginal density of X1 is complicated but becomes negligible for large T and, therefore, will be dropped from now on
Log-likelihood function (without initial marginal density)

ln L(α0, α1 | x1, . . . , xT) = −(T/2) ln 2π − (1/2) ∑_{t=2}^{T} ln σ²t − (1/2) ∑_{t=2}^{T} (xt/σt)²

where σ²t = α0 + α1 x²t−1

ML-estimation of α0 and α1 by numerical maximization of ln L(α0, α1) with respect to α0 and α1
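As a sketch of the idea (Python; a crude grid search stands in for a proper numerical optimizer, and the true parameters are known because the data are simulated):

```python
import math
import random

# simulate an ARCH(1) with known (illustrative) parameters
random.seed(11)
a0_true, a1_true = 0.2, 0.4
x = [0.0]
for _ in range(3000):
    s = math.sqrt(a0_true + a1_true * x[-1] ** 2)
    x.append(s * random.gauss(0.0, 1.0))

def loglik(a0, a1, x):
    """Conditional Gaussian log-likelihood; the marginal density of x_1 is dropped."""
    ll = 0.0
    for t in range(1, len(x)):
        s2 = a0 + a1 * x[t - 1] ** 2
        ll += -0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(s2) - 0.5 * x[t] ** 2 / s2
    return ll

# crude grid search in place of a proper numerical optimizer
grid = [i / 10 for i in range(1, 10)]
a0_hat, a1_hat = max(((a0, a1) for a0 in grid for a1 in grid),
                     key=lambda p: loglik(p[0], p[1], x))
```

With a reasonably long sample the maximizer lands on the grid point closest to the true parameter pair; real software replaces the grid by a quasi-Newton routine and also returns the Hessian for standard errors.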
GARCH models: GARCH(p,q)-process
Definition: GARCH(p,q)-process
The stochastic process (Xt)t∈Z is called GARCH(p, q)-process if
E (Xt |Xt−1, Xt−2, . . .) = 0
Var(Xt |Xt−1, Xt−2, . . .) = σ²t = α0 + α1 X²t−1 + . . . + αp X²t−p + β1 σ²t−1 + . . . + βq σ²t−q

for t ∈ Z with αi , βi ≥ 0
Often, an additional assumption is that
(Xt |Xt−1 = xt−1, Xt−2 = xt−2, . . .) ∼ N(0, σ²t)
GARCH models: GARCH(p,q)-process
Conditional variance of GARCH(1, 1)

Var(Xt |Xt−1, Xt−2, . . .) = σ²t = α0 + α1 X²t−1 + β1 σ²t−1 = α0/(1 − β1) + α1 ∑_{i=1}^{∞} β1^(i−1) X²t−i

Unconditional variance

Var(Xt) = α0 / (1 − ∑_{i=1}^{p} αi − ∑_{j=1}^{q} βj)
GARCH models: GARCH(p,q)-process
Necessary condition for weak stationarity

∑_{i=1}^{p} αi + ∑_{j=1}^{q} βj < 1
(Xt)t∈Z has no autocorrelation
GARCH-processes can be written as ARMA(max(p, q), q)-processes in the squared returns

Example: GARCH(1, 1)-process with Xt = εt σt and σ²t = α0 + α1 X²t−1 + β1 σ²t−1
GARCH models: Estimation of GARCH(p,q)-processes
Estimation of the ARMA(max (p, q) , q)-process in the squared returns
Alternative (and better) method: Maximum likelihood
For a GARCH(1, 1)-process

fX1,...,XT (x1, . . . , xT) = fX1 (x1) ∏_{t=2}^{T} (2πσ²t)^(−1/2) exp(−(1/2)(xt/σt)²)
GARCH models: Estimation of GARCH(p,q)-processes
Again, the density of X1 can be neglected
Log-Likelihood function

ln L(α0, α1, β1 | x1, . . . , xT) = −(T/2) ln 2π − (1/2) ∑_{t=2}^{T} ln σ²t − (1/2) ∑_{t=2}^{T} (xt/σt)²

with σ²t = α0 + α1 x²t−1 + β1 σ²t−1 and σ²1 = 0
ML-estimation of α0, α1 and β1 by numerical maximization
GARCH models: Estimation of GARCH(p,q)-processes
Conditional h-step forecast of the volatility σ²t+h in a GARCH(1, 1) model

E (σ²t+h |Xt , Xt−1, . . .) = (α1 + β1)^h (σ²t − α0/(1 − α1 − β1)) + α0/(1 − α1 − β1)
If the process is stationary

lim_{h→∞} E (σ²t+h |Xt , Xt−1, . . .) = α0 / (1 − α1 − β1)
Simulation of GARCH-processes is easy; the estimation can be computer intensive
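The forecast recursion is easy to evaluate (Python sketch, illustrative parameters): the forecasts decay geometrically from the current conditional variance toward the unconditional variance:

```python
# GARCH(1,1) volatility forecast:
# E[sigma^2_{t+h}] = (a1 + b1)^h * (sigma2_t - lrv) + lrv, with lrv = a0 / (1 - a1 - b1)
a0, a1, b1 = 0.1, 0.1, 0.8               # illustrative, a1 + b1 < 1 (stationary)
lrv = a0 / (1.0 - a1 - b1)               # long-run (unconditional) variance
sigma2_t = 2.5                           # current conditional variance, above lrv

forecasts = [(a1 + b1) ** h * (sigma2_t - lrv) + lrv for h in range(1, 51)]
```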
GARCH models: Residuals of an estimated GARCH(1,1) model
Careful: Residuals are slightly different from what you know from OLS regressions
Estimates: α̂0, α̂1, β̂1, µ̂
From σ̂²t = α̂0 + α̂1 X²t−1 + β̂1 σ̂²t−1 and Xt = µ + σt εt we calculate the standardized residuals

ε̂t = (Xt − µ̂) / σ̂t = (Xt − µ̂) / √(α̂0 + α̂1 X²t−1 + β̂1 σ̂²t−1)
Histogram of the standardized residuals
GARCH models: AR(p)-ARCH(q)-models
Definition: (Xt)t∈Z is called AR(p)-ARCH(q)-process if
Xt = µ + φ1 Xt−1 + εt
σ²t = α0 + α1 ε²t−1

where εt ∼ N(0, σ²t) (written here for p = q = 1)
mean equation / variance equation
Maximum likelihood estimation
GARCH models: Extensions of the GARCH model
There are a number of possible extensions to the GARCH model:
Empirical fact: Negative shocks have a larger impact on volatility than positive shocks (leverage effect)
News impact curve
Nonnormal innovations, e.g. εt ∼ tν