
Financial Econometrics - Time Series

Johnew Zhang

March 5, 2018

Contents

1 Basic Time Series Concepts
  1.1 Some Processes
  1.2 How should we define market efficiency?

2 Forecasting
  2.1 Wold's Decomposition Theorem

3 Introduction to the Generalized Method of Moments (Hayashi)
  3.1 Endogeneity Bias
  3.2 Two-Stage Least Squares
  3.3 Single Equation GMM
  3.4 Hansen's J-Test
  3.5 Likelihood-Ratio Test of H0
  3.6 Newey-West Motivation
  3.7 Hansen-Hodrick JPE (1980)
  3.8 Non-linear GMM: Consumption-based Asset Pricing
  3.9 The Asymptotic Distribution of the Orthogonality Conditions
  3.10 Hansen-Hodrick (1983)

4 Vector Auto-regression
  4.1 Maximum Likelihood Estimation
  4.2 Vector Auto-regression
  4.3 Choice of Lag Length
  4.4 Campbell (1991) and Hodrick (1992)
  4.5 Impulse Response Functions
  4.6 Variance Decompositions
  4.7 Models of Non-Stationary Time Series
    4.7.1 Comparing Forecasts
  4.8 Hodrick-Prescott Filter
  4.9 Special Cases
  4.10 Beveridge-Nelson Decomposition
  4.11 Fractional Integration
    4.11.1 GARCH
  4.12 Testing for a Unit Root
  4.13 Cointegration
    4.13.1 Purchasing Power Parity
  4.14 Price-Dividend Ratio and Campbell-Shiller Decomposition
    4.14.1 Lettau-Ludvigson: Log Consumption-Wealth Ratio
    4.14.2 Hamilton's Canonical Example
  4.15 Normalizations
  4.16 Triangular Representation
  4.17 Error Correction VAR
  4.18 Testing for Cointegration


1 Basic Time Series Concepts

Suppose we have a stochastic process $Y_t$ and need to do inference using the sample $\{y_t\}_{t=1}^T$. Let $f_{Y_t}(y_t)$ be the unconditional density of $Y_t$; its mean is
$$E(Y_t) = \int_{-\infty}^{\infty} y_t f_{Y_t}(y_t)\,dy_t$$
For example, if $Y_t = \mu + \varepsilon_t$ with $E(\varepsilon_t) = 0$ and $V(\varepsilon_t) = \sigma^2$, it is easy to see that $E(Y_t) = \mu$. Alternatively, there may be a deterministic trend, so we can define $Y_t = \beta t + \varepsilon_t$, in which case $E(Y_t) = \beta t$. The variance is
$$V(Y_t) = E[(Y_t - \mu_t)^2] = \int_{-\infty}^{\infty} (y_t - \mu_t)^2 f_{Y_t}(y_t)\,dy_t = \gamma_{0t} = \sigma^2$$

Definition 1. Auto-covariance: for $\{Y_t\}_{t=1}^T$, consider the vector $x_t = (Y_t, Y_{t-1}, \cdots, Y_{t-j})'$ with joint distribution $f_{Y_t,\cdots,Y_{t-j}}$. The $j$th auto-covariance is
$$\gamma_{jt} = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} (y_t - \mu_t)(y_{t-j} - \mu_{t-j})\, f_{Y_t,\cdots,Y_{t-j}}(y_t,\cdots,y_{t-j})\,dy_t\cdots dy_{t-j}$$
Serial correlation implies non-zero auto-covariances.

Definition 2. Stationarity:

• Covariance stationarity means $E(Y_t) = \mu$ and $E[(Y_t-\mu)(Y_{t-j}-\mu)] = \gamma_j$, neither depending on time $t$. For scalars $\gamma_j = \gamma_{-j}$; for the matrix case, $C_j = C_{-j}'$.

• Strict stationarity: the entire joint distribution of $(Y_t, Y_{t-j_1}, Y_{t-j_2}, \cdots, Y_{t-j_n})$ depends only on the intervals separating the dates $(j_1, j_2, \cdots, j_n)$, not on $t$ itself.

• Note that a process that is strictly stationary with finite second moments must be covariance-stationary. Strict stationarity alone does not imply weak stationarity (for example, the Cauchy distribution has no second moments), and weak stationarity does not imply strict stationarity. If the process is Gaussian, weak stationarity is equivalent to strict stationarity.

Definition 3. Ergodicity: time-series averages converge to the unconditional moments as $T\to\infty$; in particular $\bar y = \frac{1}{T}\sum_{t=1}^T y_t \to \mu$ as $T\to\infty$.

1.1 Some Processes

• $\varepsilon_t$ is white noise if $E[\varepsilon_t] = 0$, $E[\varepsilon_t^2] = \sigma^2$ and $E[\varepsilon_t\varepsilon_{t-j}] = 0$ for all $j \neq 0$.

• A first-order moving average process, MA(1), is $y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}$. Then
$$E[y_t] = E[\varepsilon_t] + \theta E[\varepsilon_{t-1}] + \mu = \mu$$
$$V(y_t) = E[(y_t-\mu)^2] = E[(\varepsilon_t+\theta\varepsilon_{t-1})^2] = E[\varepsilon_t^2 + 2\theta\varepsilon_t\varepsilon_{t-1} + \theta^2\varepsilon_{t-1}^2] = (1+\theta^2)\sigma^2$$
The first auto-covariance is
$$E[(y_t-\mu)(y_{t-1}-\mu)] = E[(\varepsilon_t+\theta\varepsilon_{t-1})(\varepsilon_{t-1}+\theta\varepsilon_{t-2})] = \theta\sigma^2$$
so the first-order auto-correlation is
$$\frac{\gamma_1}{\gamma_0} = \frac{\theta\sigma^2}{(1+\theta^2)\sigma^2} = \frac{\theta}{1+\theta^2}$$
and $\gamma_j = 0$ for $j > 1$.

• A $p$th-order moving average process is
$$y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_p\varepsilon_{t-p}$$
with $E[y_t] = \mu$ and, using $E[\varepsilon_t\varepsilon_{t-j}] = 0$ and $E[\varepsilon_{t-j}^2] = \sigma^2$,
$$V(y_t) = E[(y_t-\mu)^2] = \Big(1 + \sum_{i=1}^p\theta_i^2\Big)\sigma^2$$
The $j$th autocovariance is
$$\gamma_j = E[(y_t-\mu)(y_{t-j}-\mu)] = (\theta_j + \theta_{j+1}\theta_1 + \cdots + \theta_p\theta_{p-j})\sigma^2, \quad j = 1,\cdots,p$$
and $\gamma_j = 0$ for $j > p$.

• An infinite-order moving average process is
$$y_t = \mu + \sum_{j=0}^\infty\psi_j\varepsilon_{t-j}, \qquad V(y_t) = E[(y_t-\mu)^2] = \Big(\sum_{j=0}^\infty\psi_j^2\Big)\sigma^2$$
where the $\psi_j$ are square summable. If $\sum_{j=0}^\infty|\psi_j| < \infty$ (absolute summability), the process is ergodic for the mean.

• Autoregressive processes. The first-order AR process is
$$y_t = c + \phi y_{t-1} + \varepsilon_t$$
with $|\phi| < 1$ for covariance stationarity. Define the lag operator $L$ such that $Ly_t = y_{t-1}$. Then $(1-\phi L)y_t = c + \varepsilon_t$ and
$$y_t = (1-\phi L)^{-1}(c+\varepsilon_t) = \frac{c}{1-\phi} + \varepsilon_t + \phi\varepsilon_{t-1} + \phi^2\varepsilon_{t-2} + \cdots$$
$$E[y_t] = \frac{c}{1-\phi}, \qquad V(y_t) = \sigma^2(1+\phi^2+\phi^4+\phi^6+\cdots) = \frac{\sigma^2}{1-\phi^2}$$
The first-order autocovariance is
$$E[(y_t-\mu)(y_{t-1}-\mu)] = E[(\varepsilon_t+\phi\varepsilon_{t-1}+\phi^2\varepsilon_{t-2}+\cdots)(\varepsilon_{t-1}+\phi\varepsilon_{t-2}+\phi^2\varepsilon_{t-3}+\cdots)] = \phi\sigma^2+\phi^3\sigma^2+\phi^5\sigma^2+\cdots = \frac{\phi\sigma^2}{1-\phi^2}$$
The $j$th autocovariance is $\phi^j\sigma^2/(1-\phi^2)$, so the autocorrelation is simply $\gamma_j/\gamma_0 = \phi^j$.

For the AR(p) process $y_t = c + \phi_1y_{t-1} + \phi_2y_{t-2} + \cdots + \phi_py_{t-p} + \varepsilon_t$,
$$(1 - \phi_1L - \phi_2L^2 - \cdots - \phi_pL^p)y_t = c + \varepsilon_t$$
Factor the $p$th-order polynomial as
$$(1 - \phi_1z - \phi_2z^2 - \cdots - \phi_pz^p) = (1-\lambda_1z)(1-\lambda_2z)\cdots(1-\lambda_pz)$$
The roots $z_j = 1/\lambda_j$ must lie outside the unit circle (equivalently $|\lambda_j| < 1$) for covariance stationarity.

For the variance of the AR(p), multiply $y_t - \mu = \phi_1(y_{t-1}-\mu) + \cdots + \phi_p(y_{t-p}-\mu) + \varepsilon_t$ by $(y_t-\mu)$ and take expectations:
$$\gamma_0 = \phi_1\gamma_1 + \phi_2\gamma_2 + \cdots + \phi_p\gamma_p + \sigma^2$$
Multiplying instead by $(y_{t-j}-\mu)$ and taking expectations gives
$$\gamma_1 = \phi_1\gamma_0 + \phi_2\gamma_1 + \cdots + \phi_p\gamma_{p-1}$$
$$\vdots$$
$$\gamma_p = \phi_1\gamma_{p-1} + \phi_2\gamma_{p-2} + \cdots + \phi_p\gamma_0$$
Dividing through by $\gamma_0$ gives a system of equations in the autocorrelations that can be solved recursively,
$$\rho_j = \phi_1\rho_{j-1} + \phi_2\rho_{j-2} + \cdots + \phi_p\rho_{j-p}, \qquad j \geq 1$$
These are the Yule-Walker equations.

• ARMA(p, q) is defined by
$$y_t = c + \phi_1y_{t-1} + \cdots + \phi_py_{t-p} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}$$
For an ARMA(1, 1) without a constant term, $(1-\phi_1L)y_t = (1+\theta_1L)\varepsilon_t$; notice that if $\theta_1 = -\phi_1$ the lag polynomials cancel and $y_t$ is white noise. (A small simulation sketch of these moment formulas follows this list.)
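The following is a minimal simulation sketch (not from the notes; the parameter values are arbitrary) checking the AR(1) autocorrelation $\rho_j = \phi^j$ and the MA(1) first-order autocorrelation $\theta/(1+\theta^2)$ against sample estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
T, phi, theta, sigma = 100_000, 0.8, 0.5, 1.0

# Simulate AR(1): y_t = phi*y_{t-1} + eps_t
eps = rng.normal(0.0, sigma, T)
y_ar = np.zeros(T)
for t in range(1, T):
    y_ar[t] = phi * y_ar[t - 1] + eps[t]

# Simulate MA(1): y_t = eps_t + theta*eps_{t-1}
y_ma = eps[1:] + theta * eps[:-1]

def sample_autocorr(x, j):
    """Sample autocorrelation at lag j."""
    x = x - x.mean()
    return (x[j:] * x[:-j]).sum() / (x * x).sum()

for j in (1, 2, 3):
    print(f"AR(1) lag {j}: sample {sample_autocorr(y_ar, j):.3f}  theory {phi**j:.3f}")
print(f"MA(1) lag 1: sample {sample_autocorr(y_ma, 1):.3f}  theory {theta/(1+theta**2):.3f}")
print(f"MA(1) lag 2: sample {sample_autocorr(y_ma, 2):.3f}  theory 0.000")
```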

Definition 4. Autocovariance generating function: when the $\gamma_j$ are absolutely summable, define
$$g_y(z) = \sum_{j=-\infty}^{\infty}\gamma_jz^j$$
for complex scalar $z$.

The Fourier transform of a time series $x_t$ is $x(\omega) = \sum_{t=-\infty}^{\infty}e^{-i\omega t}x_t$, a complex function of the frequency $\omega$. The inverse Fourier transform is
$$x_t = \frac{1}{2\pi}\int_{-\pi}^{\pi}e^{i\omega t}x(\omega)\,d\omega$$
Hence we can define the Fourier transform of the autocovariances as
$$S(\omega) = \sum_{j=-\infty}^{\infty}e^{-i\omega j}\gamma_j = \gamma_0 + \sum_{j=1}^{\infty}\gamma_j\big(e^{-i\omega j}+e^{i\omega j}\big) = \gamma_0 + 2\sum_{j=1}^{\infty}\gamma_j\cos(\omega j)$$
using $\gamma_j = \gamma_{-j}$, $\cos(x) = \cos(-x)$ and $\sin(x) = -\sin(-x)$, so that the imaginary parts cancel.

For the auto-correlations,
$$f(\omega) = \frac{S(\omega)}{\gamma_0} = \sum_{j=-\infty}^{\infty}e^{-i\omega j}\rho_j, \qquad \rho_j = \frac{\gamma_j}{\gamma_0}$$
The inverse is
$$\rho_j = \frac{1}{2\pi}\int_{-\pi}^{\pi}e^{i\omega j}f(\omega)\,d\omega$$
When $j = 0$,
$$1 = \frac{1}{2\pi}\int_{-\pi}^{\pi}f(\omega)\,d\omega$$
so $f(\omega)/2\pi$ integrates to one and looks like a density function. This is called the spectral density function.


For MA(1),
$$g_y(z) = \theta\sigma^2z^{-1} + (1+\theta^2)\sigma^2z^0 + \theta\sigma^2z^1 = \sigma^2\big(\theta z^{-1} + (1+\theta^2) + \theta z\big) = \sigma^2(1+\theta z)(1+\theta z^{-1})$$
For MA(q),
$$g_y(z) = \sigma^2(1+\theta_1z+\theta_2z^2+\cdots+\theta_qz^q)(1+\theta_1z^{-1}+\theta_2z^{-2}+\cdots+\theta_qz^{-q})$$
For AR(1),
$$g_y(z) = \frac{\sigma^2}{(1-\phi z)(1-\phi z^{-1})}$$
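As a quick numerical check (a sketch, not part of the notes): for an MA(1), evaluating the autocovariance generating function at $z = e^{-i\omega}$ should reproduce $S(\omega) = \gamma_0 + 2\gamma_1\cos\omega$, and $f(\omega)/2\pi$ should integrate to one.

```python
import numpy as np

theta, sigma2 = 0.5, 1.0
gamma0 = (1 + theta**2) * sigma2     # MA(1) variance
gamma1 = theta * sigma2              # MA(1) first autocovariance

omega = np.linspace(-np.pi, np.pi, 7)
S_from_gammas = gamma0 + 2 * gamma1 * np.cos(omega)
# g_y(z) at z = exp(-i*omega): sigma^2 (1 + theta z)(1 + theta z^{-1})
z = np.exp(-1j * omega)
S_from_gfun = sigma2 * (1 + theta * z) * (1 + theta / z)
print(np.allclose(S_from_gammas, S_from_gfun.real))   # True

# (1/2pi) * integral of f(omega) = S(omega)/gamma0 over [-pi, pi] is 1
grid = np.linspace(-np.pi, np.pi, 100_001)
f = (gamma0 + 2 * gamma1 * np.cos(grid)) / gamma0
print(np.trapz(f, grid) / (2 * np.pi))                 # ~1.0
```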

Definition 5. Invertibility: $\varepsilon_t$ is recoverable from the history of $y_t$. For the MA(1)
$$y_t = \mu + (1+\theta L)\varepsilon_t,$$
if $|\theta| < 1$ we can multiply by $(1+\theta L)^{-1}$:
$$(1 - \theta L + \theta^2L^2 - \theta^3L^3 + \cdots)(y_t-\mu) = \varepsilon_t$$
so the MA process is invertible, with autocovariance generating function
$$g_y(z) = \sigma^2(1+\theta z)(1+\theta z^{-1})$$
Now consider another process $\tilde y_t$ with $(\tilde y_t - \mu) = (1+\tilde\theta L)\tilde\varepsilon_t$, where $\tilde\theta = \theta^{-1}$ and $\tilde\sigma^2 = \sigma^2\theta^2$. Then
$$g_{\tilde y}(z) = \tilde\sigma^2(1+\tilde\theta z)(1+\tilde\theta z^{-1}) = \sigma^2\theta^2(1+\theta^{-1}z)(1+\theta^{-1}z^{-1}) = \sigma^2(\theta+z)(\theta+z^{-1}) = \sigma^2(1+\theta z)(1+\theta z^{-1})$$
so $\tilde y_t$ has the same autocovariances and the same mean as $y_t$, but since $|\tilde\theta| > 1$ it is not invertible: $\tilde\varepsilon_t$ cannot be recovered from the history of $\tilde y_t$.

1.2 How should we define market efficiency?

Market efficiency suggests there should be no predictability of returns. Let $p_t = \log S_t$ and $p_{t+1} = p_t + \mu + \varepsilon_{t+1}$. The stock return is $r_{t+1} = p_{t+1} - p_t = \mu + \varepsilon_{t+1}$, so there is no serial correlation. Early research failed to reject $\rho = 0$. One way to look at this is to run the regression
$$r_{t+1} = \mu + \rho r_t + \varepsilon_{t+1}$$
with OLS estimator
$$\hat\rho = \sum_{t=1}^{T}(r_{t+1}-\bar r)(r_t-\bar r)\Big/\sum_{t=1}^{T}(r_t-\bar r)^2$$
Under the null hypothesis $\rho = 0$, the test statistic $\hat\rho\big/\sqrt{\hat\sigma_{\hat\rho}^2/T}\sim N(0,1)$ asymptotically. Under the alternative, $\rho > 0$. We can do a Monte Carlo under the null or the alternative with sample size $T$ and generate a histogram for the test statistic.

(See Poterba and Summers.) Suppose $p_t = p_t^* + u_t$ where $u_t$ is serially correlated and $p_t^*$ is a random walk. Then $r_t = p_t - p_{t-1} = p_t^* - p_{t-1}^* + u_t - u_{t-1}$ with $u_t = \rho u_{t-1} + v_t$, $v_t$ i.i.d. There will be serial correlation in returns, since $r_t = p_t^* - p_{t-1}^* + (\rho-1)u_{t-1} + v_t$. Suppose $\rho = 0.98$, so the transitory component is very persistent. They examine a variance ratio,
$$V\Big(\sum_{k=1}^{24}r_{t+k}\Big)\Big/2\,V\Big(\sum_{k=1}^{12}r_{t+k}\Big)$$
which equals one under the random-walk null.
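A minimal Monte Carlo sketch of the testing idea just described (the return mean and volatility are assumed, illustrative values only): simulate returns under the null of no predictability, compute the t-type statistic for $\hat\rho$, and look at its distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_sims, mu, sigma = 600, 5000, 0.005, 0.04
stats = np.empty(n_sims)

for s in range(n_sims):
    r = mu + sigma * rng.normal(size=T + 1)          # returns under the null: iid
    x, y = r[:-1] - r[:-1].mean(), r[1:] - r[1:].mean()
    rho_hat = (x * y).sum() / (x * x).sum()          # slope of r_{t+1} on r_t
    stats[s] = rho_hat * np.sqrt(T)                  # approx N(0,1) under the null

print("mean", stats.mean().round(3), "std", stats.std().round(3))
print("5% / 95% quantiles:", np.quantile(stats, [0.05, 0.95]).round(2))
```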


2 Forecasting

Suppose we want to forecast $Y_{t+1}$ based on $x_t$, the data known at $t$. Let $Y^*_{t+1,t}$ be the forecast. With quadratic loss $E[Y_{t+1}-Y^*_{t+1,t}]^2$, Hamilton shows the conditional expectation is the minimum mean-squared-error forecast.

The linear projection is $Y^*_{t+1,t} = x_t'\alpha$, with forecast error $Y_{t+1} - x_t'\alpha$. The linear projection makes the forecast error orthogonal to $x_t$, that is
$$E[x_t(Y_{t+1}-x_t'\alpha)] = 0$$
$$E[x_tY_{t+1}] - E[x_tx_t']\alpha = 0$$
$$\alpha = E[x_tx_t']^{-1}E[x_tY_{t+1}] \quad\text{(population moments)}$$
The OLS regression of $y_{t+1}$ on $x_t$ is
$$y_{t+1} = x_t'\beta + u_{t+1}, \qquad t = 1,\cdots,T$$
with sample estimate
$$b = \Big[\sum_{t=1}^Tx_tx_t'\Big]^{-1}\Big[\sum_{t=1}^Tx_ty_{t+1}\Big]$$
With covariance stationarity, the sample moments converge to the population moments as $T\to\infty$:
$$\frac{1}{T}\sum_{t=1}^Tx_tx_t'\xrightarrow{p}E[x_tx_t'], \qquad \frac{1}{T}\sum_{t=1}^Tx_ty_{t+1}\xrightarrow{p}E[x_tY_{t+1}], \qquad b\xrightarrow{p}\alpha$$
Here we are assuming the data are ergodic for second moments.
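A small sketch of the sample counterpart $b = (\sum x_tx_t')^{-1}\sum x_ty_{t+1}$, with an illustrative data-generating process assumed:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 10_000
x1 = rng.normal(size=T)
x_t = np.column_stack([np.ones(T), x1])              # x_t = (1, x_{1t})'
alpha_true = np.array([0.2, 0.7])
y_next = x_t @ alpha_true + rng.normal(size=T)       # y_{t+1} = x_t'alpha + u_{t+1}

Sxx = x_t.T @ x_t / T                                # (1/T) sum x_t x_t'
Sxy = x_t.T @ y_next / T                             # (1/T) sum x_t y_{t+1}
b = np.linalg.solve(Sxx, Sxy)                        # sample projection coefficients
print(b)                                             # close to alpha_true

# forecast errors are (numerically) orthogonal to x_t
u = y_next - x_t @ b
print(np.abs(x_t.T @ u / T).max())                   # ~0
```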

2.1 Wold’s Decomposition Theorem

Any covariance-stationary process can be written as
$$Y_t = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j} + k_t$$
where $k_t$ is linearly deterministic, $\psi_0 = 1$, $\sum_{j=0}^\infty\psi_j^2 < \infty$, and $\varepsilon_t = Y_t - \hat E[Y_t|Y_{t-1},Y_{t-2},\cdots]$ is the linear projection error.

Suppose we know the $\varepsilon_t$'s; what is the forecast of $Y_{t+s}$?
$$Y_{t+s} = \varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \cdots + \psi_{s-1}\varepsilon_{t+1} + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \cdots$$
$$\hat E[Y_{t+s}|\varepsilon_t,\varepsilon_{t-1},\cdots] = \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \cdots$$
The MSE of the forecast is $(1+\psi_1^2+\psi_2^2+\cdots+\psi_{s-1}^2)\sigma^2$ where $\sigma^2 = V(\varepsilon)$.

Define $\psi(L)/L^s = L^{-s} + \psi_1L^{1-s} + \psi_2L^{2-s} + \cdots$, and the annihilation operator $[\,\cdot\,]_+$ that sets negative powers of $L$ to zero. Then
$$\hat E[Y_{t+s}|\varepsilon_t,\varepsilon_{t-1},\cdots] = \Big[\frac{\psi(L)}{L^s}\Big]_+\varepsilon_t$$
Now consider the forecast of $Y_{t+s}$ based on $Y_t, Y_{t-1},\cdots$. For an invertible representation, $\eta(L)Y_t = \varepsilon_t$ with
$$\eta(L) = \sum_{j=0}^\infty\eta_jL^j, \qquad \eta_0 = 1, \qquad \sum_{j=0}^\infty|\eta_j| < \infty, \qquad \eta(L) = \psi(L)^{-1}$$
Then
$$\hat E[Y_{t+s}|Y_t,Y_{t-1},\cdots] = \Big[\frac{\psi(L)}{L^s}\Big]_+\eta(L)Y_t = \Big[\frac{\psi(L)}{L^s}\Big]_+\frac{1}{\psi(L)}Y_t$$
This is called the Wiener-Kolmogorov prediction formula.
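As a sketch of the forecast-MSE formula, assume the $\psi$ weights come from an AR(1), where $\psi_j = \phi^j$; the $s$-step MSE $(1+\psi_1^2+\cdots+\psi_{s-1}^2)\sigma^2$ can then be checked by simulation:

```python
import numpy as np

rng = np.random.default_rng(3)
phi, sigma, s, T = 0.9, 1.0, 5, 50_000
psi = phi ** np.arange(s)                       # psi_0..psi_{s-1} for an AR(1)
mse_theory = sigma**2 * (psi**2).sum()

# simulate the AR(1) and compare with realized s-step forecast errors
eps = rng.normal(0, sigma, T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + eps[t]
forecast = phi**s * y[:-s]                      # E[y_{t+s} | y_t, ...] = phi^s y_t
errors = y[s:] - forecast
print(mse_theory, errors.var())                 # close to each other
```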


3 Introduction to the Generalized Method of Moments (Hayashi)

3.1 Endogeneity Bias

Consider a coffee market with demand $q_t^d = \alpha_0 + \alpha_1p_t + u_t$, where $u_t$ is an unobservable demand shifter and $\alpha_1 < 0$. Supply is $q_t^s = \beta_0 + \beta_1p_t + v_t$, where $v_t$ shifts supply and $\beta_1 > 0$. In equilibrium $q_t^d = q_t^s = q_t$, and we observe $p_t$ and $q_t$.

The solution is
$$p_t = \frac{\beta_0-\alpha_0}{\alpha_1-\beta_1} + \frac{v_t-u_t}{\alpha_1-\beta_1}$$
$$q_t = \frac{\alpha_1\beta_0-\alpha_0\beta_1}{\alpha_1-\beta_1} + \frac{\alpha_1v_t-\beta_1u_t}{\alpha_1-\beta_1}$$
Since $\alpha_1 < 0$ and $\beta_1 > 0$, $\alpha_1-\beta_1 < 0$, so a positive demand shock $u_t > 0$ raises $p_t$.

OLS of $q_t$ on $p_t$ gives $q_t = \delta_0 + \delta_1p_t + \varepsilon_t$, with
$$\delta_1 = \frac{\mathrm{cov}(q_t,p_t)}{\mathrm{var}(p_t)} = \frac{\mathrm{cov}(\alpha_0+\alpha_1p_t+u_t,\,p_t)}{\mathrm{var}(p_t)} = \alpha_1 + \frac{\mathrm{cov}(u_t,p_t)}{\mathrm{var}(p_t)} \neq \alpha_1$$
This is called endogeneity bias of OLS, or simultaneous equations bias. The solution to this dilemma is that we can estimate the demand curve if we have another variable $x_t$ that shifts the supply curve. Write $v_t = \beta_2x_t + \zeta_t$, where $\zeta_t$ is a new shock, so that
$$q_t^d = \alpha_0 + \alpha_1p_t + u_t$$
$$q_t^s = \beta_0 + \beta_1p_t + \beta_2x_t + \zeta_t$$
with $E[x_t\zeta_t] = 0$ and $E[x_tu_t] = 0$. Solving again,
$$p_t = \frac{\beta_0-\alpha_0}{\alpha_1-\beta_1} + \frac{\beta_2}{\alpha_1-\beta_1}x_t + \frac{\zeta_t-u_t}{\alpha_1-\beta_1}$$
$$q_t = \frac{\alpha_1\beta_0-\alpha_0\beta_1}{\alpha_1-\beta_1} + \frac{\alpha_1\beta_2}{\alpha_1-\beta_1}x_t + \frac{\alpha_1\zeta_t-\beta_1u_t}{\alpha_1-\beta_1}$$
This is the reduced form of the simultaneous equation system: the endogenous variables are expressed in terms of the exogenous variables. OLS of $p_t$ on $x_t$ estimates $\delta_1 = \beta_2/(\alpha_1-\beta_1)$; OLS of $q_t$ on $x_t$ estimates $\delta_2 = \alpha_1\beta_2/(\alpha_1-\beta_1)$. Thus
$$\frac{\delta_2}{\delta_1} = \alpha_1$$
This estimator is called the instrumental variables estimator with $x_t$ as the instrument.

3.2 Two-Stage Least Squares

1. Regress $p_t$ on $x_t$ with OLS: $p_t = \pi_0 + \pi_1x_t + \text{error}$. The fitted value is $\hat p_t$, and $p_t = \hat p_t + (p_t-\hat p_t)$, where the residual is orthogonal to $x_t$.

2. Regress $q_t$ on $\hat p_t$:
$$q_t = \alpha_0 + \alpha_1\hat p_t + \big[u_t + \alpha_1(p_t-\hat p_t)\big]$$
where the bracketed composite error is orthogonal to $\hat p_t$, so OLS of $q_t$ on $\hat p_t$ gives
$$\hat\alpha_1 = \frac{\mathrm{cov}(q_t,\hat p_t)}{\mathrm{var}(\hat p_t)} \xrightarrow{p} \alpha_1$$
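A minimal sketch of the endogeneity problem and the IV fix (all parameter values assumed): OLS of $q$ on $p$ is biased, while the ratio of the two reduced-form coefficients recovers $\alpha_1$.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 100_000
a0, a1, b0, b1, b2 = 10.0, -1.0, 2.0, 0.5, 1.0     # demand/supply parameters
u, zeta, x = rng.normal(size=(3, T))               # demand shock, supply shock, instrument

# reduced form: solve q^d = q^s for p, then compute q from the demand curve
p = (b0 - a0) / (a1 - b1) + b2 / (a1 - b1) * x + (zeta - u) / (a1 - b1)
q = a0 + a1 * p + u

def slope(a, b):
    return ((a - a.mean()) * (b - b.mean())).mean() / b.var()

ols = slope(q, p)                                  # biased estimate of a1
d1, d2 = slope(p, x), slope(q, x)                  # reduced-form slopes
print("OLS:", round(ols, 3), " IV (d2/d1):", round(d2 / d1, 3), " true:", a1)
```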


The fundamental equation of asset pricing says
$$E_t[m_{t+1}R_{t+1}] = 1$$
Can we estimate the parameters in this equation?

3.3 Single Equation GMM

Suppose
$$y_t = z_t'\delta + \varepsilon_t, \qquad t = 1,\cdots,T$$
where $z_t$ is $L\times1$. The assumptions are:

A3.1 Linearity.

A3.2 Ergodic stationarity: $x_t$ is a $K\times1$ vector of instruments, and $w_t$, the vector of unique elements of $(y_t, z_t', x_t')'$, is stationary and ergodic.

A3.3 $E[x_t\varepsilon_t] = 0$: the orthogonality conditions. Define $g_t = x_t\varepsilon_t = x_t(y_t-z_t'\delta)$, a $K\times1$ function of the data and the parameters.

A3.4 $K \geq L$ (the order condition), and $E[x_tz_t'] = \Sigma_{xz}$ has full column rank (the rank condition for identification). Weak instruments satisfy this condition only marginally. Under identification, $E[g_t(w_t,\delta_0)] = 0$ at the true $\delta_0$ and $E[g_t(w_t,\delta)] \neq 0$ for $\delta \neq \delta_0$. In population terms,
$$E[x_t(y_t-z_t'\delta)] = 0 \iff \sigma_{xy} - \Sigma_{xz}\delta = 0$$
a system of $K$ equations in $L \leq K$ unknowns; $K \geq L$ is necessary for a solution. Over-identification is $K > L$, just (exact) identification is $K = L$, and under-identification is $K < L$.

A3.5 $g_t$ is a martingale difference sequence (MDS) with finite second moments: $E[g_t|g_{t-1},g_{t-2},\cdots] = 0$. (For comparison, $x_t$ is a martingale if $E[x_t|\Phi_{t-1}] = x_{t-1}$.) If $s_t = \sum_{j=1}^tg_j = g_t + s_{t-1}$, then $E_{t-1}[s_t] = E_{t-1}[g_t] + s_{t-1} = s_{t-1}$, so $s_t$ is a martingale and $g_t = s_t - s_{t-1}$ is a martingale difference sequence.

A3.6 Finite fourth moments of the $w_t$ process.

If $g_t$ is an MDS, then $E[g_tg_t'] = S$ is the variance of $g_t$. Billingsley (1961) provides a central limit theorem for MDS: if $g_t$ is a stationary ergodic MDS with $E[g_tg_t'] = S$, then with sample mean $\bar g = \frac{1}{T}\sum_{t=1}^Tg_t$,
$$\sqrt T\,\bar g = \frac{1}{\sqrt T}\sum_{t=1}^Tg_t\xrightarrow{d}N(0,S)$$
Comments:

• If the instruments include a constant, then $E[\varepsilon_t] = 0$.

• An alternative to A3.5 is $E[\varepsilon_t|x_t,x_{t-1},\cdots] = 0$.

• $g_tg_t' = \varepsilon_t^2x_tx_t'$.

• We will later relax the linearity assumption and allow serial correlation in $g_t$.


GMM starts from an economic model that gives a set of theoretical orthogonality conditions $E[x_t\varepsilon_t] = 0$ (the population moments). GMM chooses parameters to set a weighted average of the corresponding sample moments as close to zero as possible. The sample moment is
$$g_T(\delta) = \frac{1}{T}\sum_{t=1}^Tg_t(w_t,\delta)$$
where $E[g_t(\delta_0)] = 0$. Here
$$g_T(\delta) = \frac{1}{T}\sum_{t=1}^Tx_ty_t - \Big(\frac{1}{T}\sum_{t=1}^Tx_tz_t'\Big)\delta = s_{xy} - S_{xz}\delta$$
If $K = L$ (just identified), $\hat\delta$ sets the sample orthogonality conditions exactly to zero: $\hat\delta = S_{xz}^{-1}s_{xy}$. If $x_t = z_t$, this is OLS.

If $K > L$ (over-identified), the GMM objective function is
$$J(\delta,W) = T\,g_T(\delta)'Wg_T(\delta)$$
where $W$ must be positive definite. Choose $\hat\delta$ as the argmin of $J(\delta,W)$:
$$J(\delta,W) = T(s_{xy}-S_{xz}\delta)'W(s_{xy}-S_{xz}\delta)$$
The first-order condition is
$$S_{xz}'Ws_{xy} - S_{xz}'WS_{xz}\hat\delta = 0 \implies \hat\delta = \big[S_{xz}'WS_{xz}\big]^{-1}S_{xz}'Ws_{xy}$$
This is the single-equation GMM (instrumental variables) estimator.

For its distribution, note that
$$s_{xy} = \frac{1}{T}\sum_{t=1}^Tx_ty_t = \frac{1}{T}\sum_{t=1}^Tx_t(z_t'\delta_0+\varepsilon_t) = S_{xz}\delta_0 + \frac{1}{T}\sum_{t=1}^Tx_t\varepsilon_t = S_{xz}\delta_0 + g_T(\delta_0)$$
so
$$\hat\delta = (S_{xz}'WS_{xz})^{-1}S_{xz}'W(S_{xz}\delta_0 + g_T) = \delta_0 + (S_{xz}'WS_{xz})^{-1}S_{xz}'Wg_T$$
$$\sqrt T(\hat\delta-\delta_0) = (S_{xz}'WS_{xz})^{-1}S_{xz}'W\,\frac{1}{\sqrt T}\sum_{t=1}^Tg_t$$
where $\frac{1}{\sqrt T}\sum g_t\xrightarrow{d}N(0,S)$ and $S = E[g_tg_t']$ is the variance of $g_t$. As $T\to\infty$, $S_{xz}\xrightarrow{p}\Sigma_{xz}$, so
$$\mathrm{Avar}(\hat\delta) = (\Sigma_{xz}'W\Sigma_{xz})^{-1}\Sigma_{xz}'WSW\Sigma_{xz}(\Sigma_{xz}'W\Sigma_{xz})^{-1}$$
To estimate Avar, use $S_{xz}$ for $\Sigma_{xz}$; we also need an estimate $\hat S$ of $S$:
$$\hat S = \frac{1}{T}\sum_{t=1}^T\hat g_t\hat g_t' = \frac{1}{T}\sum_{t=1}^T\hat\varepsilon_t^2x_tx_t', \qquad \hat\varepsilon_t = y_t - z_t'\hat\delta = \varepsilon_t + z_t'(\delta_0-\hat\delta)$$
Then
$$\frac{1}{T}\sum_{t=1}^T\hat\varepsilon_t^2 = \frac{1}{T}\sum_{t=1}^T\Big(\varepsilon_t^2 - 2(\hat\delta-\delta_0)'z_t\varepsilon_t + (\hat\delta-\delta_0)'z_tz_t'(\hat\delta-\delta_0)\Big)$$
As $T\to\infty$, $\frac{1}{T}\sum\varepsilon_t^2\to E[\varepsilon_t^2]$, $\hat\delta-\delta_0\to0$, $\frac{1}{T}\sum z_t\varepsilon_t$ converges to a finite limit so the middle term vanishes, and $\frac{1}{T}\sum z_tz_t'$ converges to a finite limit, so $\hat S$ is consistent.


To test hypotheses we use $\sqrt T(\hat\delta-\delta_0)\xrightarrow{d}N(0,\widehat{\mathrm{Avar}}(\hat\delta))$. To test that the $i$th element equals a hypothesized value, $\delta_i = \delta_{i,0}$, form
$$\frac{\hat\delta_i-\delta_{i,0}}{se(\hat\delta_i)}, \qquad se(\hat\delta_i) = \sqrt{e_i'\widehat{\mathrm{Avar}}(\hat\delta)e_i/T}$$
where $e_i = (0,\cdots,1,\cdots,0)'$ has a 1 in the $i$th position. This is robust to conditional heteroskedasticity.

The Wald test of a vector of linear restrictions is $H_0: R\delta_0 = r$, where the number of rows of $R$ is the number of restrictions. Then
$$\sqrt T(R\hat\delta-R\delta_0)\xrightarrow{d}N(0, R\widehat{\mathrm{Avar}}(\hat\delta)R')$$
$$\text{Wald} = T(R\hat\delta-r)'\big[R\widehat{\mathrm{Avar}}(\hat\delta)R'\big]^{-1}(R\hat\delta-r)\xrightarrow{d}\chi^2(\#\text{restrictions})$$
For a non-linear restriction $H_0: a(\delta_0) = 0$, let $A(\delta) = \nabla_\delta a(\delta)$. Since
$$a(\hat\delta)\approx a(\delta_0) + A(\delta_0)(\hat\delta-\delta_0), \qquad \sqrt T\,a(\hat\delta)\xrightarrow{d}A(\delta_0)\sqrt T(\hat\delta-\delta_0),$$
the Wald test is
$$T\,a(\hat\delta)'\big[A(\hat\delta)\widehat{\mathrm{Avar}}(\hat\delta)A(\hat\delta)'\big]^{-1}a(\hat\delta)$$

What should $W$ be? Efficient GMM uses $W = S^{-1}$, which gives
$$\mathrm{Avar}(\hat\delta) = (\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}$$
Hansen (1982), Theorem 3.2, proves this is the smallest asymptotic variance of $\hat\delta$ attainable from these orthogonality conditions (Hayashi p. 245, Problem 3).

However, we do not know $S$. The two-step efficient GMM procedure is:

1. Use a known $W$, e.g. $W = I_K$; Hayashi recommends $W = S_{xx}^{-1}$. With that choice,
$$\hat\delta_1 = (S_{xz}'S_{xx}^{-1}S_{xz})^{-1}S_{xz}'S_{xx}^{-1}s_{xy}, \qquad \hat\varepsilon_t = y_t - z_t'\hat\delta_1$$
2. Use $\hat\varepsilon_t$ to estimate
$$\hat S_1 = \frac{1}{T}\sum_{t=1}^T\hat\varepsilon_t^2x_tx_t'$$
3. Use $W = \hat S_1^{-1}$ to estimate
$$\hat\delta_2 = (S_{xz}'\hat S_1^{-1}S_{xz})^{-1}S_{xz}'\hat S_1^{-1}s_{xy}$$
4. Either stop, using $(S_{xz}'\hat S_1^{-1}S_{xz})^{-1}$ as the estimated Avar of $\hat\delta_2$, or iterate until convergence. Monte Carlo evidence suggests iterating helps, but it is not required.

Limit the number of orthogonality conditions: $S$ has $\frac{K(K+1)}{2}$ distinct elements to estimate, so with $T$ observations and $K$ instrument series the number of unknowns can explode very quickly. A two-step GMM sketch follows.
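The following is a sketch of the two-step procedure for the linear model on simulated data (the data-generating process is assumed for illustration): a first step with $W = S_{xx}^{-1}$, a second step with $W = \hat S_1^{-1}$, and the J-statistic of the next subsection.

```python
import numpy as np

rng = np.random.default_rng(5)
T, delta0 = 5000, np.array([1.0, 2.0])
# two instruments beyond the constant -> K = 3 > L = 2 (over-identified)
x2, x3, v, w = rng.normal(size=(4, T))
z2 = 0.8 * x2 + 0.4 * x3 + v                       # regressor correlated with the error through v
e = 0.5 * v + w                                    # error correlated with z2, so OLS would be biased
Z = np.column_stack([np.ones(T), z2])              # z_t (L = 2)
X = np.column_stack([np.ones(T), x2, x3])          # x_t (K = 3)
y = Z @ delta0 + e

Sxz, sxy, Sxx = X.T @ Z / T, X.T @ y / T, X.T @ X / T

def gmm(W):
    return np.linalg.solve(Sxz.T @ W @ Sxz, Sxz.T @ W @ sxy)

d1 = gmm(np.linalg.inv(Sxx))                       # step 1: W = Sxx^{-1}
eps1 = y - Z @ d1
S1 = (X * eps1[:, None]).T @ (X * eps1[:, None]) / T   # S_hat = (1/T) sum e_t^2 x_t x_t'
W2 = np.linalg.inv(S1)
d2 = gmm(W2)                                       # step 2: efficient GMM

gT = X.T @ (y - Z @ d2) / T
J = T * gT @ W2 @ gT                               # ~ chi2(K - L) under the model
print("delta_hat:", d2.round(3), " J:", round(J, 2))
```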


3.4 Hansen’s J-Test

The model may deliver over-identifying restrictions because the number of orthogonality conditions $K$ exceeds the number of parameters $L$. The GMM objective function is
$$J(\delta,\hat S^{-1}) = Tg_T(\delta)'\hat S^{-1}g_T(\delta)$$
minimized at $\hat\delta$. If it could be evaluated at the true $\delta_0$, we would have
$$J = \sqrt Tg_T(\delta_0)'\hat S^{-1}\sqrt Tg_T(\delta_0)\xrightarrow{d}\chi^2(K)$$
because $\sqrt Tg_T(\delta_0)\xrightarrow{d}N(0,S)$. But the estimation of $\delta$ sets $L$ linear combinations of $\sqrt Tg_T(\hat\delta)$ to zero, so
$$J(\hat\delta,\hat S^{-1})\xrightarrow{d}\chi^2(K-L)$$
This is Hansen's J-test of the over-identifying restrictions. To see the loss of rank,
$$\sqrt Tg_T(\hat\delta) = \sqrt T\,\frac{1}{T}\sum_{t=1}^Tx_t(y_t-z_t'\hat\delta) = \sqrt T(s_{xy}-S_{xz}\hat\delta) = \sqrt T\big[I - S_{xz}(S_{xz}'\hat S^{-1}S_{xz})^{-1}S_{xz}'\hat S^{-1}\big]s_{xy} = \sqrt TBs_{xy}$$
where $B$ is not of full rank: $BS_{xz} = 0$.

3.5 Likelihood-Ratio Test of H0

Here $H_0$ places $r$ restrictions on the parameters.

1. Estimate without restrictions and obtain $\hat S_1$; compute $Tg_T(\hat\delta)'\hat S_1^{-1}g_T(\hat\delta)$, where $\hat\delta$ is the unrestricted estimator.

2. Estimate imposing the $H_0$ restrictions, using the same $\hat S_1$: $Tg_T(\tilde\delta)'\hat S_1^{-1}g_T(\tilde\delta)$, where $\tilde\delta$ is the restricted estimator.

Since the constrained minimum can be no smaller than the unconstrained one, the difference
$$J(\tilde\delta,\hat S_1^{-1}) - J(\hat\delta,\hat S_1^{-1})\xrightarrow{d}\chi^2(r)$$

3.6 Newey-West Motivation

Suppose $\{y_t\}_{t=1}^T$ is $n$-dimensional and covariance stationary with mean $E[y_t] = \mu$. The sample mean is $\hat\mu = \frac{1}{T}\sum_{t=1}^Ty_t$, with $E[\hat\mu] = \frac{1}{T}\sum_{t=1}^TE[y_t] = \mu$, so $\hat\mu$ is unbiased. The variance is
$$E[(\hat\mu-\mu)(\hat\mu-\mu)'] = E\Big[\frac{1}{T}\sum_{t=1}^T(y_t-\mu)\,\frac{1}{T}\sum_{s=1}^T(y_s-\mu)'\Big]$$
$$= \frac{1}{T^2}E\Big[(y_1-\mu)\sum_{t=1}^T(y_t-\mu)' + (y_2-\mu)\sum_{t=1}^T(y_t-\mu)' + \cdots + (y_T-\mu)\sum_{t=1}^T(y_t-\mu)'\Big]$$
$$= \frac{1}{T^2}\Big\{T\Gamma_0 + (T-1)[\Gamma_1+\Gamma_1'] + \cdots + [\Gamma_{T-1}+\Gamma_{T-1}']\Big\}$$
so that
$$T\,V(\hat\mu) = \Gamma_0 + \frac{T-1}{T}[\Gamma_1+\Gamma_1'] + \cdots + \frac{1}{T}[\Gamma_{T-1}+\Gamma_{T-1}'], \qquad \lim_{T\to\infty}T\,V(\hat\mu) = \sum_{j=-\infty}^\infty\Gamma_j$$
Equivalently, with $g_t(y_t,\mu) = y_t-\mu$,
$$\lim_{T\to\infty}T\,E[(\hat\mu-\mu)(\hat\mu-\mu)'] = \lim_{T\to\infty}E\Big[\Big(\frac{1}{\sqrt T}\sum_{t=1}^Tg_t(y_t,\mu)\Big)\Big(\frac{1}{\sqrt T}\sum_{t=1}^Tg_t(y_t,\mu)\Big)'\Big]$$
so the variance of $\sqrt Tg_T$ is $S = \sum_{j=-\infty}^\infty\Gamma_j$.

Recall the spectral density $S(\omega) = \frac{1}{2\pi}\sum_{j=-\infty}^\infty\Gamma_je^{-i\omega j}$; the long-run variance above is just the spectral density at frequency zero (up to the $2\pi$ normalization). A sketch of a kernel estimator of $S$ follows.
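The following is a sketch of a Bartlett-kernel (Newey-West) estimator of $S = \sum_j\Gamma_j$ for a possibly autocorrelated moment series $g_t$; the truncation lag $q$ is the user's assumption.

```python
import numpy as np

def newey_west(g, q):
    """Long-run variance of sqrt(T)*gbar: Gamma_0 + sum_{j=1}^q w_j (Gamma_j + Gamma_j'),
    with Bartlett weights w_j = 1 - j/(q+1).  g is a (T x k) array of g_t'."""
    T = g.shape[0]
    g = g - g.mean(axis=0)                  # center the moments
    S = g.T @ g / T                         # Gamma_0
    for j in range(1, q + 1):
        Gj = g[j:].T @ g[:-j] / T           # Gamma_j
        S += (1 - j / (q + 1)) * (Gj + Gj.T)
    return S

# example: an MA(2) moment series, so Gamma_j = 0 beyond lag 2
rng = np.random.default_rng(6)
e = rng.normal(size=20_000)
g = (e[2:] + 0.5 * e[1:-1] + 0.25 * e[:-2])[:, None]
print(newey_west(g, q=12))                  # close to (1 + 0.5 + 0.25)^2 = 3.0625
```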

3.7 Hansen-Hodrick JPE(1980)

Forward exchange rates as predictors of future spot rates: the hypothesis is
$$F_{t,k} = E_t[S_{t+k}]$$
where
$$S_{t+k} = E_t[S_{t+k}] + \varepsilon_{t,t+k}$$
with $\varepsilon_{t,t+k}$ the reaction to news between $t$ and $t+k$. However, the level data are not very stationary, so the paper proposes using the rate of appreciation $s_{t+k}-s_t$ in logs (e.g. 0.05 means a 5% appreciation of the dollar) and the forward premium $f_{t,k}-s_t$ in logs (e.g. 0.02 means it is 2% more expensive to purchase pounds with dollars for delivery in $k$ periods).

With rational expectations,
$$s_{t+k}-s_t = E_t(s_{t+k}-s_t) + u_{t+k,t}, \qquad E_t(u_{t+k,t}) = 0$$
and under the null hypothesis
$$E_t[s_{t+k}-s_t] = \alpha + (f_{t,k}-s_t)$$
The regression is
$$s_{t+k}-s_t = \alpha + \beta(f_{t,k}-s_t) + u_{t+k,t}$$
where $\beta = 1$ is the null of interest. What are legitimate instruments? Anything in the information set at $t$ can be used (e.g. a constant and the forward premium). The orthogonality condition is
$$E\Big[u_{t+k,t}\begin{pmatrix}1\\ f_{t,k}-s_t\end{pmatrix}\Big] = 0$$
so
$$g_t(\delta) = \big[(s_{t+k}-s_t) - \alpha - \beta(f_{t,k}-s_t)\big]\begin{pmatrix}1\\ f_{t,k}-s_t\end{pmatrix}, \qquad \delta = (\alpha,\beta)'$$
Let $y_{t+k} = s_{t+k}-s_t$, $x_t = (1,\ f_{t,k}-s_t)'$, $y = (y_{1+k},\cdots,y_{T+k})'$, $X = (x_1,\cdots,x_T)'$ and $u = (u_{1+k,1},\cdots,u_{T+k,T})'$. Then $g_T(\delta) = \frac{1}{T}X'u$ and, by GMM,
$$J(\delta,W) = Tg_T(\delta)'Wg_T(\delta) = T\Big[\frac{1}{T}X'(y-X\delta)\Big]'W\Big[\frac{1}{T}X'(y-X\delta)\Big]$$
Since the system is just identified, $\hat\delta = (X'X)^{-1}X'y$ is OLS:
$$\hat\delta = (X'X)^{-1}X'(X\delta_0+u) = \delta_0 + \Big(\frac{X'X}{T}\Big)^{-1}g_T(\delta_0)$$
$$\sqrt T(\hat\delta-\delta_0) = \Big(\frac{X'X}{T}\Big)^{-1}\sqrt Tg_T(\delta_0), \qquad \sqrt Tg_T(\delta_0)\xrightarrow{d}N(0,S)$$
where now
$$S = \sum_{j=-\infty}^\infty\Gamma_j, \qquad \Gamma_j = E[u_{t+k,t}x_tu_{t+k-j,t-j}x_{t-j}']$$
Because the data are sampled more finely than the forecast horizon (overlapping observations), $\Gamma_j \neq 0$ for $|j| < k$ and $\Gamma_j = 0$ for $|j| \geq k$. Hansen-Hodrick GMM therefore uses
$$\hat S = \hat\Gamma_0 + \sum_{j=1}^{k-1}(\hat\Gamma_j+\hat\Gamma_j')$$
Sometimes this estimator does not turn out to be positive definite, which is where Newey-West comes in. (A small sketch with overlapping errors follows.)
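A sketch of the overlapping-observations covariance truncated at $k-1$ lags, as in Hansen-Hodrick, applied to an OLS regression with $k$-period-ahead errors (the data-generating process is assumed for illustration):

```python
import numpy as np

def hansen_hodrick_cov(X, u, k):
    """Avar of sqrt(T)(delta_hat - delta_0) for OLS with k-period overlapping errors:
    (X'X/T)^{-1} S (X'X/T)^{-1},  S = Gamma_0 + sum_{j=1}^{k-1}(Gamma_j + Gamma_j')."""
    T = X.shape[0]
    g = X * u[:, None]                               # g_t = x_t u_{t+k,t}
    S = g.T @ g / T
    for j in range(1, k):
        Gj = g[j:].T @ g[:-j] / T
        S += Gj + Gj.T
    Q = np.linalg.inv(X.T @ X / T)
    return Q @ S @ Q

# simulate: s_{t+k}-s_t = alpha + beta*(f-s)_t + u, with u overlapping (MA(k-1))
rng = np.random.default_rng(7); T, k = 2000, 3
fp = rng.normal(size=T)                              # forward premium (regressor)
e = rng.normal(size=T + k)
u = np.array([e[t:t + k].sum() for t in range(T)])   # overlapping forecast errors
y = 0.0 + 1.0 * fp + u
X = np.column_stack([np.ones(T), fp])
delta = np.linalg.solve(X.T @ X, X.T @ y)
V = hansen_hodrick_cov(X, y - X @ delta, k)
se = np.sqrt(np.diag(V) / T)
print("beta_hat:", round(delta[1], 3), " HH s.e.:", round(se[1], 3))
```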

3.8 Non-linear GMM: Consumption-based Asset Pricing

Let $p_{jt}$ be the real price of asset $j$, $d_{jt}$ the real dividend of asset $j$, and $u'(c_t)$ the marginal utility of consumption. The first-order condition for equilibrium investment in an asset equates the marginal cost today with the discounted expected marginal benefit tomorrow:
$$u'(c_t)p_{jt} = E_t\big[\beta u'(c_{t+1})(p_{j,t+1}+d_{j,t+1})\big], \qquad j = 1,\cdots,N \qquad (1)$$
One utility function people use is CRRA, $u(c_t) = \frac{c_t^{1-\alpha}}{1-\alpha}$, and the gross real return is
$$R_{j,t+1} = \frac{p_{j,t+1}+d_{j,t+1}}{p_{jt}}$$
We can divide equation (1) by $u'(c_t)p_{jt}$ and take unconditional expectations:
$$1 = E\Big[\beta\Big(\frac{c_{t+1}}{c_t}\Big)^{-\alpha}R_{j,t+1}\Big]$$
must hold for $j = 1,\cdots,N$. The orthogonality condition is
$$E\Big[\beta\Big(\frac{c_{t+1}}{c_t}\Big)^{-\alpha}R_{j,t+1} - 1\Big] = 0$$
with $\theta = (\beta,\alpha)'$. Define the $N\times1$ vector of pricing errors
$$\varepsilon_{t+1}\Big(\theta,R_{t+1},\frac{c_{t+1}}{c_t}\Big) = \beta\Big(\frac{c_{t+1}}{c_t}\Big)^{-\alpha}R_{t+1} - 1$$
Theory implies the conditional moment restrictions
$$E_t\Big[\varepsilon_{t+1}\Big(\theta,R_{t+1},\frac{c_{t+1}}{c_t}\Big)\Big] = 0 \implies E\Big[\varepsilon_{t+1}\Big(\theta,R_{t+1},\frac{c_{t+1}}{c_t}\Big)\otimes x_t\Big] = 0, \qquad x_t\in\Phi_t$$
(with $M$ instruments, usually one of which is a constant). Define
$$g_t(\theta,w_{t+1}) = \varepsilon_{t+1}\Big(\theta,R_{t+1},\frac{c_{t+1}}{c_t}\Big)\otimes x_t$$
where $w_{t+1}$ collects the unique elements of the data. Then $E[g_t(\theta_0,w_{t+1})] = 0$, and $g_t(\theta,w_{t+1})$ is a $K = MN$-dimensional time-series function of the data and the parameters.

A1 $w_{t+1}$ is stationary and ergodic. Then, when $g_t(\theta,w_{t+1})$ is continuous in $\theta$ for all $w_{t+1}$ and differentiable with respect to $\theta$,
$$g_T(\theta) = \frac{1}{T}\sum_{t=1}^Tg_t(\theta,w_{t+1})\xrightarrow{p}E[g_t(\theta,w_{t+1})]$$
(the sample mean of the orthogonality conditions), and
$$G_T(\theta) = \nabla_\theta g_T(\theta)\xrightarrow{p}E[G(\theta)]$$


A2 Identification: $E[g_t(\theta,w_{t+1})] \neq 0$ for all $\theta \neq \theta_0$, and $= 0$ at $\theta_0$.

A3 $\sqrt Tg_T(\theta_0)\xrightarrow{d}N(0,S)$, where $S = \sum_{j=-\infty}^\infty\Gamma_j$, though theory will often limit the number of non-zero $\Gamma_j$.

The GMM estimator minimizes
$$J_T(\theta) = Tg_T(\theta)'Wg_T(\theta)$$
for some positive-definite symmetric $K\times K$ weighting matrix $W$. For the over-identified case $K > p$, the first-order condition is
$$G_T(\hat\theta)'Wg_T(\hat\theta) = 0$$
that is, $p$ linear combinations of the sample average orthogonality conditions are set to zero:
$$a_Tg_T(\hat\theta) = 0, \qquad a_T = G_T(\hat\theta)'W$$
(Hansen and Cochrane use this notation.) Applying the mean-value theorem,
$$g_T(\hat\theta) = g_T(\theta_0) + G_T(\bar\theta)(\hat\theta-\theta_0)$$
and substituting into the FOC,
$$G_T(\hat\theta)'W\big[g_T(\theta_0)+G_T(\bar\theta)(\hat\theta-\theta_0)\big] = 0$$
$$\sqrt T(\hat\theta-\theta_0) = -\big[G_T(\hat\theta)'WG_T(\bar\theta)\big]^{-1}G_T(\hat\theta)'W\sqrt Tg_T(\theta_0)$$
Under standard regularity conditions $G_T(\hat\theta)$ and $G_T(\bar\theta)$ converge to $E[G(\theta_0)] = G$, and with $\sqrt Tg_T(\theta_0)\xrightarrow{d}N(0,S)$,
$$\sqrt T(\hat\theta-\theta_0)\xrightarrow{d}N(0,\mathrm{Avar}(\hat\theta)), \qquad \mathrm{Avar}(\hat\theta) = (G'WG)^{-1}G'WSWG(G'WG)^{-1}$$
where $S$ is the asymptotic variance of $\sqrt Tg_T(\theta,w_{t+1})$. Setting $W = S^{-1}$ is optimal:
$$\mathrm{Avar}(\hat\theta) = (G'S^{-1}G)^{-1}$$
In practice:

1. Calculate $\hat\theta_1$ with a known $W$, e.g. $W = I$.

2. Calculate $\hat S_1$ using $\hat\theta_1$ to estimate the variance of $g_t(\theta,w_{t+1})$, imposing any lag restrictions $\Gamma_j = 0$.

3. Use $W = \hat S_1^{-1}$ to get $\hat\theta_2$; either stop or iterate to convergence.

4. Form $G_T(\hat\theta) = \nabla_\theta g_T(\hat\theta)$, either analytically or numerically (define a procedure that calculates $g_T(\theta)$ and take its numerical gradient at $\hat\theta$).

5. Do tests with
$$\sqrt T(\hat\theta-\theta_0)\xrightarrow{d}N\big(0,\big[G_T(\hat\theta)'\hat S^{-1}G_T(\hat\theta)\big]^{-1}\big)$$

As an example, suppose we have $N$ assets and use only a constant as the instrument. Then
$$\varepsilon_{t+1} = \beta\Big(\frac{c_{t+1}}{c_t}\Big)^{-\alpha}R_{t+1} - 1, \qquad g_t(\theta,w_{t+1}) = \varepsilon_{t+1}, \qquad g_T(\theta) = \frac{1}{T}\sum_{t=1}^Tg_t(\theta,w_{t+1})$$
where $\theta = (\beta,\alpha)'$. The gradient is
$$G_T(\theta) = \nabla_\theta g_T(\theta) = \Big[\frac{1}{T}\sum_{t=1}^T\Big(\frac{c_{t+1}}{c_t}\Big)^{-\alpha}R_{t+1}\ ;\ \frac{1}{T}\sum_{t=1}^T-\beta\log\Big(\frac{c_{t+1}}{c_t}\Big)\Big(\frac{c_{t+1}}{c_t}\Big)^{-\alpha}R_{t+1}\Big]$$
Since $E_t[\varepsilon_{t+1}] = 0$ by theory, $\varepsilon_{t+1}$ is serially uncorrelated and
$$\hat S = \frac{1}{T}\sum_{t=1}^Tg_t(\hat\theta,w_{t+1})g_t(\hat\theta,w_{t+1})'$$
$$\sqrt T\Big[\begin{pmatrix}\hat\beta\\ \hat\alpha\end{pmatrix}-\begin{pmatrix}\beta_0\\ \alpha_0\end{pmatrix}\Big]\xrightarrow{d}N\big(0,(G_T(\hat\theta)'\hat S^{-1}G_T(\hat\theta))^{-1}\big)$$
A numerical sketch follows.
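The following is a sketch of two-step non-linear GMM for the CRRA Euler equation on simulated data; the consumption-growth process and the two returns (one risky, one riskless) are constructed so that the moment conditions hold by design, purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
T = 5000
beta0, alpha0 = 0.97, 2.0
cg = np.exp(rng.normal(0.02, 0.03, size=T))                  # gross consumption growth c_{t+1}/c_t
m = beta0 * cg ** (-alpha0)                                  # true stochastic discount factor
risky = (1.0 + rng.normal(0.0, 0.05, size=T)) / m            # E[m * risky] = 1 by construction
riskfree = np.full(T, 1.0 / m.mean())                        # E[m * riskfree] = 1
R = np.column_stack([risky, riskfree])                       # N = 2 assets, constant instrument

def pricing_errors(theta):
    b, a = theta
    return b * cg[:, None] ** (-a) * R - 1.0                 # T x N matrix of eps_{t+1}

def J(theta, W):
    gT = pricing_errors(theta).mean(axis=0)
    return T * gT @ W @ gT

th1 = minimize(J, x0=np.array([0.9, 1.0]), args=(np.eye(2),), method="Nelder-Mead").x
eps1 = pricing_errors(th1)
S = eps1.T @ eps1 / T                                        # serially uncorrelated by theory
th2 = minimize(J, x0=th1, args=(np.linalg.inv(S),), method="Nelder-Mead").x
print("estimates (beta, alpha):", th2.round(3), " true:", (beta0, alpha0))
```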

3.9 The Asymptotic Distribution of the Orthogonality Conditions

Start again from
$$g_T(\hat\theta) = g_T(\theta_0) + G_T(\bar\theta)(\hat\theta-\theta_0)$$
Because of the first-order condition,
$$G_T'Wg_T(\hat\theta) = 0 = G_T'Wg_T(\theta_0) + G_T'WG_T(\hat\theta-\theta_0)$$
so
$$\hat\theta-\theta_0 = -(G_T'WG_T)^{-1}G_T'Wg_T(\theta_0)$$
$$g_T(\hat\theta) = g_T(\theta_0) - G_T(G_T'WG_T)^{-1}G_T'Wg_T(\theta_0) = \big[I - G_T(G_T'WG_T)^{-1}G_T'W\big]g_T(\theta_0)$$
Since $\sqrt Tg_T(\theta_0)\xrightarrow{d}N(0,S)$,
$$\sqrt Tg_T(\hat\theta)\xrightarrow{d}N\big(0,[I-G(G'WG)^{-1}G'W]S[I-G(G'WG)^{-1}G'W]'\big)$$
If $W = S^{-1}$, this reduces to
$$[I-G(G'S^{-1}G)^{-1}G'S^{-1}]S[I-G(G'S^{-1}G)^{-1}G'S^{-1}]' = S - G(G'S^{-1}G)^{-1}G' - G(G'S^{-1}G)^{-1}G' + G(G'S^{-1}G)^{-1}G' = S - G(G'S^{-1}G)^{-1}G'$$
Asymptotically,
$$\sqrt Tg_T(\hat\theta)\xrightarrow{d}N\big(0,\ S - G_T(G_T'\hat S^{-1}G_T)^{-1}G_T'\big)$$
From Lemma 4.2 (in Hayashi, or Hamilton),
$$Tg_T(\hat\theta)'\hat S^{-1}g_T(\hat\theta)\sim\chi^2(r-p)$$
where $r$ is the number of moment equations and $p$ the number of parameters.

$\hat S$ estimates the variance of $g_t(\theta)$, $V(g_t(\theta)) = E[g_t(\theta)g_t(\theta)']$. Under the null, $E[g_t(\theta)] = 0$. A few practical tips to improve the test:

1. With $g_t(\theta)$ serially uncorrelated, instead of $\hat S = \frac{1}{T}\sum_tg_t(\hat\theta)g_t(\hat\theta)'$, the power of the test can be improved by centering:
$$\hat S = \frac{1}{T}\sum_{t=1}^T\big[g_t(\hat\theta)-g_T(\hat\theta)\big]\big[g_t(\hat\theta)-g_T(\hat\theta)\big]'$$
2. Scale the data so that the variances of the elements of $g_t(\theta)$ are similar.

3. Keep the model relatively small: $S$ has $\frac{K(K+1)}{2}$ unknown elements, so its size can grow very large.

4. In the asset pricing model $E_t[m_{t+1}(\theta)R_{t+1}] = 1$, if the instruments are 1 and $x_t$, the orthogonality conditions become
$$E\Big[m_{t+1}(\theta)R_{t+1}\otimes\begin{pmatrix}1\\x_t\end{pmatrix} - 1\otimes\begin{pmatrix}1\\x_t\end{pmatrix}\Big] = 0$$


3.10 Hansen-Hodrick (1983)

Consider a conditional CAPM with constant $\beta$s in excess returns:
$$E_t(R_{it+1}) = \beta_iE_t(R_{mt+1}), \qquad i = 1,\cdots,N$$
We know from rational expectations that
$$R_{it+1} = E_t(R_{it+1}) + \varepsilon_{it+1}, \qquad \varepsilon_{it+1}\perp\Phi_t$$
where $\Phi_t$ is the investors' information set. Consider the linear projection of $E_t[R_{mt+1}]$ onto observable $x_t$:
$$E_t(R_{mt+1}) = \alpha + \delta'x_t + v_t, \qquad v_t\perp(1,x_t')'$$
This is a latent-variable approach. Then
$$R_{it+1} = \beta_i(\alpha+\delta'x_t) + \beta_iv_t + \varepsilon_{it+1}, \qquad i = 1,\cdots,N$$
Call
$$u_{it+1} = \beta_iv_t + \varepsilon_{it+1}\perp\begin{pmatrix}1\\x_t\end{pmatrix}$$
Normalize $\beta_1 = 1$. Then we can write
$$\begin{pmatrix}R_{1t+1}\\ \vdots\\ R_{Nt+1}\end{pmatrix} = \begin{pmatrix}1\\ \beta_2\\ \vdots\\ \beta_N\end{pmatrix}(\alpha+\delta'x_t) + \begin{pmatrix}u_{1t+1}\\ u_{2t+1}\\ \vdots\\ u_{Nt+1}\end{pmatrix}$$
with orthogonality conditions
$$E\Big[\begin{pmatrix}u_{1t+1}\\ \vdots\\ u_{Nt+1}\end{pmatrix}\otimes\begin{pmatrix}1\\x_t\end{pmatrix}\Big] = 0$$
Let $x_t$ have $m$ elements and set $k = m+1$ (the constant plus $x_t$). There are $Nk$ orthogonality conditions and $(N-1)+k$ parameters (from the $\beta$'s and from $\alpha$ and $\delta$), so the GMM system is over-identified whenever $Nk > N-1+k$.

4 Vector Auto-regression

4.1 Maximum Likelihood Estimation

Let $f(y_t|x_t,Y_{t-1};\theta)$ be the probability density of $y_t$ given $x_t$ and the past history $Y_{t-1}$. Viewed as a function of the unknown $\theta$, this is the likelihood contribution. Since
$$\int_Af(y_t|x_t,Y_{t-1};\theta)\,dy_t = 1,$$
Cramér (1946) shows that under appropriate regularity conditions we can differentiate under the integral with respect to $\theta$:
$$\int_A\frac{\partial f(y_t|x_t,Y_{t-1};\theta)}{\partial\theta}\,dy_t = 0$$
Multiplying and dividing by $f$,
$$\int_A\frac{\partial f(y_t|x_t,Y_{t-1};\theta)/\partial\theta}{f(y_t|x_t,Y_{t-1};\theta)}\,f(y_t|x_t,Y_{t-1};\theta)\,dy_t = 0$$


Thus
$$E\Big[\frac{\partial\log f(y_t|x_t,Y_{t-1};\theta)}{\partial\theta}\Big] = 0$$
Define the $t$-th score function $s_t(\theta) = \frac{\partial\log f(y_t|x_t,Y_{t-1};\theta)}{\partial\theta}$. The same argument applied to the conditional density gives
$$E_{t-1}[s_t(\theta)] = 0 \quad\text{and}\quad E[s_t(\theta)] = 0$$
Hence maximum likelihood is GMM on the score function. The likelihood is
$$L(\theta) = \prod_{t=1}^Tf(y_t|x_t,Y_{t-1};\theta)$$
since the innovations (basically the residuals) in $y_t$ are serially uncorrelated, and the log-likelihood is
$$\ell(\theta) = \sum_{t=1}^T\log f(y_t|x_t,Y_{t-1};\theta)$$
Maximizing $\ell(\theta)$ sets
$$\frac{1}{T}\sum_{t=1}^Ts_t(\hat\theta) = \frac{1}{T}\sum_{t=1}^Ts_t(\theta_0) + \frac{1}{T}\sum_{t=1}^T\big(s_t(\hat\theta)-s_t(\theta_0)\big) = 0$$
We know that
$$\sqrt T\Big(\frac{1}{T}\sum_{t=1}^Ts_t(\theta_0)\Big)\xrightarrow{d}N(0,S), \qquad S = E[s_t(\theta_0)s_t(\theta_0)']$$
because $s_t(\theta_0)$ is serially uncorrelated, and
$$\frac{1}{T}\sum_{t=1}^T\frac{\partial s_t(\theta)}{\partial\theta'}\xrightarrow{p}E\Big[\frac{\partial^2\log f}{\partial\theta\partial\theta'}\Big] = -G$$
so that
$$\sqrt T(\hat\theta-\theta_0)\xrightarrow{d}N(0,G^{-1}SG^{-1})$$
where $G$ has the same (square) dimension as $S$. In MLE, if the model is true, $S = G = \mathcal{I}$, Fisher's information matrix:
$$S = E\Big[\frac{\partial\log f}{\partial\theta}\frac{\partial\log f}{\partial\theta}'\Big] = -E\Big[\frac{\partial^2\log f}{\partial\theta\partial\theta'}\Big], \qquad \sqrt T(\hat\theta-\theta_0)\xrightarrow{d}N(0,\mathcal{I}^{-1})$$

4.2 Vector Auto-regression

We will work with the first-order vector autoregression
$$y_t = Ay_{t-1} + \varepsilon_t$$
with zero means, and more generally with
$$y_t = C + \Phi_1y_{t-1} + \cdots + \Phi_py_{t-p} + \varepsilon_t$$
where $y_t$, $C$ and $\varepsilon_t$ are $N\times1$, each $\Phi_i$ is $N\times N$, and $\varepsilon_t$ is serially uncorrelated with $\varepsilon_t\sim N(0,\Omega)$. The key point is that the VAR completely characterizes the autocorrelation structure of $y_t$.


We have $T+p$ observations on $y_t$ and condition on the first $p$. The goal is to estimate
$$\theta = (C,\Phi_1,\cdots,\Phi_p,\Omega)$$
The conditional distribution of $y_t$ given past data is
$$y_t\sim N(C+\Phi_1y_{t-1}+\cdots+\Phi_py_{t-p},\ \Omega) = N(\Pi'x_t,\Omega)$$
where
$$x_t = \begin{pmatrix}1\\ y_{t-1}\\ \vdots\\ y_{t-p}\end{pmatrix}, \qquad \Pi' = (C,\Phi_1,\cdots,\Phi_p)$$
Equation by equation, $y_{jt} = C_j + \Phi_{1j}'y_{t-1} + \cdots + \Phi_{pj}'y_{t-p} + \varepsilon_{jt}$. The conditional density of the $t$-th observation is
$$f(y_t|x_t,\theta) = (2\pi)^{-N/2}\big|\Omega^{-1}\big|^{1/2}\exp\Big[-\frac{1}{2}(y_t-\Pi'x_t)'\Omega^{-1}(y_t-\Pi'x_t)\Big]$$
so the log-likelihood function is
$$\ell(\theta) = \sum_{t=1}^T\log f(y_t|x_t,\theta) = -\frac{TN}{2}\log(2\pi) + \frac{T}{2}\log\big|\Omega^{-1}\big| - \frac{1}{2}\sum_{t=1}^T(y_t-\Pi'x_t)'\Omega^{-1}(y_t-\Pi'x_t)$$
Choosing $\theta$ to maximize $\ell(\theta)$ gives
$$\hat\Pi' = \Big[\sum_{t=1}^Ty_tx_t'\Big]\Big[\sum_{t=1}^Tx_tx_t'\Big]^{-1}$$
which is OLS equation by equation. Useful matrix calculus results:

1. For a quadratic form with $A$ non-symmetric,
$$\frac{\partial x'Ax}{\partial a_{ij}} = x_ix_j, \qquad \frac{\partial x'Ax}{\partial A} = xx'$$
2.
$$\frac{\partial\log|A|}{\partial A} = (A')^{-1}$$
Differentiating $\ell$ with respect to $\Omega^{-1}$,
$$\frac{\partial\ell(\theta)}{\partial\Omega^{-1}} = \frac{T}{2}\Omega - \frac{1}{2}\sum_{t=1}^T\hat\varepsilon_t\hat\varepsilon_t' = 0 \implies \hat\Omega = \frac{1}{T}\sum_{t=1}^T\hat\varepsilon_t\hat\varepsilon_t'$$
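A sketch of estimating a VAR(p) by OLS equation by equation (the bivariate VAR(1) used to generate the data is assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(9)
T, N, p = 5000, 2, 1
A_true = np.array([[0.5, 0.1], [0.0, 0.8]])
y = np.zeros((T, N))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.normal(size=N)

# build x_t = (1, y_{t-1}', ..., y_{t-p}')' and regress each equation on x_t
X = np.column_stack([np.ones(T - p)] + [y[p - j - 1:T - j - 1] for j in range(p)])
Y = y[p:]
Pi_hat = np.linalg.solve(X.T @ X, X.T @ Y).T          # Pi' = (C, Phi_1, ..., Phi_p)
resid = Y - X @ Pi_hat.T
Omega_hat = resid.T @ resid / (T - p)
print("Phi_1 estimate:\n", Pi_hat[:, 1:].round(3))
print("Omega estimate:\n", Omega_hat.round(3))
```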


4.3 Choice of Lag Length

Given $\hat\Omega$, the maximized value of $\ell(\theta)$ is
$$\ell(\hat\Omega,\hat\Pi) = -\frac{TN}{2}\log(2\pi) + \frac{T}{2}\log\big|\hat\Omega^{-1}\big| - \frac{1}{2}\sum_{t=1}^T\hat\varepsilon_t'\hat\Omega^{-1}\hat\varepsilon_t = -\frac{TN}{2}\big(1+\log(2\pi)\big) + \frac{T}{2}\log\big|\hat\Omega^{-1}\big|$$
since
$$\frac{1}{2}\sum_{t=1}^T\hat\varepsilon_t'\hat\Omega^{-1}\hat\varepsilon_t = \frac{1}{2}\mathrm{tr}\Big(\sum_{t=1}^T\hat\varepsilon_t'\hat\Omega^{-1}\hat\varepsilon_t\Big) = \frac{1}{2}\mathrm{tr}\Big(\hat\Omega^{-1}\sum_{t=1}^T\hat\varepsilon_t\hat\varepsilon_t'\Big) = \frac{1}{2}\mathrm{tr}\big(\hat\Omega^{-1}T\hat\Omega\big) = \frac{TN}{2}$$
where $\hat\Omega = \frac{1}{T}\sum_{t=1}^T\hat\varepsilon_t\hat\varepsilon_t'$.

Suppose we want to test lag length $p_0$ against $p_1 > p_0$. Imposing $p_0$ amounts to $N^2(p_1-p_0)$ zero restrictions. The likelihood-ratio statistic is
$$2(\ell_1-\ell_0) = 2\Big(\frac{T}{2}\log\big|\hat\Omega_1^{-1}\big| - \frac{T}{2}\log\big|\hat\Omega_0^{-1}\big|\Big) = T\big(\log|\hat\Omega_0| - \log|\hat\Omega_1|\big)\sim\chi^2\big(N^2(p_1-p_0)\big)$$
For small samples, Sims (1980) argues for the degrees-of-freedom correction
$$2(\ell_1-\ell_0) = (T-k)\big(\log|\hat\Omega_0| - \log|\hat\Omega_1|\big)$$
where $k = 1+Np_1$ is the number of parameters per equation in the unrestricted model. The other criteria are the Akaike and Schwarz information criteria: choose the lag length $p$ that minimizes
$$\log\big|\hat\Omega_p\big| + (pN^2+N)\frac{C(T)}{T}$$
where $C(T) = 2$ for AIC and $C(T) = \log T$ for SIC.

4.4 Campbell (1991) and Hodrick (1992)

Let $z_t = [\log R_t,\ D_t/P_t,\ rb_t]'$ where $rb_t = i_t - \frac{1}{12}\sum_{j=1}^{12}i_{t-j}$ is a detrended (relative bill) interest rate, $\log R_t$ is the continuously compounded return, $D_t$ is the dividend and $P_t$ the price. Here $z_t$ is de-meaned. The VAR(1) is
$$z_{t+1} = Az_t + u_{t+1}$$
$$(I-AL)z_{t+1} = u_{t+1} \implies z_{t+1} = (I-AL)^{-1}u_{t+1} = u_{t+1} + Au_t + A^2u_{t-1} + \cdots$$
$u_{t+1}$ is serially uncorrelated with innovation covariance matrix $E[u_{t+1}u_{t+1}'] = V$. The unconditional variance of $z_{t+1}$ is
$$C(0) = E[z_{t+1}z_{t+1}'] = V + AVA' + A^2VA'^2 + \cdots = \sum_{j=0}^\infty A^jVA'^j$$
Alternatively, $E[z_{t+1}z_{t+1}'] = AE[z_tz_t']A' + E[u_{t+1}u_{t+1}']$, i.e. $C(0) = AC(0)A' + V$. Hamilton's Proposition 10.4 states that $\mathrm{vec}(XYZ) = (Z'\otimes X)\mathrm{vec}(Y)$, where vec is the stacking operator. Then
$$\mathrm{vec}(C(0)) = (A\otimes A)\mathrm{vec}(C(0)) + \mathrm{vec}(V) \implies \mathrm{vec}(C(0)) = \big[I_{N^2} - A\otimes A\big]^{-1}\mathrm{vec}(V)$$
The autocovariances are
$$C(1) = E[z_{t+1}z_t'] = E[(Az_t+u_{t+1})z_t'] = AC(0)$$
$$C(2) = E[z_{t+2}z_t'] = E[(A^2z_t+Au_{t+1}+u_{t+2})z_t'] = A^2C(0)$$
$$C(j) = E[z_{t+j}z_t'] = A^jC(0), \qquad C(-j) = C(j)'$$
Suppose we are interested in long-horizon predictability. Let
$$\log R_{t+k,k} = \log R_{t+1} + \cdots + \log R_{t+k}$$
To get its variance, first compute the variance of $\sum_{j=1}^kz_{t+j}$:
$$V_k = E\Big[\Big(\sum_{j=1}^kz_{t+j}\Big)\Big(\sum_{j=1}^kz_{t+j}\Big)'\Big] = kC(0) + \sum_{j=1}^{k-1}(k-j)\big(C(j)+C(j)'\big)$$
Then $V(\log R_{t+k,k}) = e_1'V_ke_1$ where $e_1 = (1,0,0)'$.

Fama-French looked at
$$\log R_{t+k,k} = \alpha_{k,1} + \beta_{k,1}\frac{D_t}{P_t} + u_{t+k,k}$$
which we can estimate by GMM with overlapping data, where
$$\beta_{k,1} = \mathrm{Cov}(\log R_{t+1}+\cdots+\log R_{t+k},\ D_t/P_t)\big/\mathrm{Var}(D_t/P_t)$$
But from the VAR all the autocovariances are determined. In particular, the implied slope coefficient is
$$\beta_{k,1} = \frac{e_1'[C(1)+\cdots+C(k)]e_2}{e_2'C(0)e_2} = \frac{e_1'(A+A^2+\cdots+A^k)C(0)e_2}{e_2'C(0)e_2}$$
and we can use the delta method to get its variance. The $k$-period variance ratio is
$$VR_k = \frac{\mathrm{Var}(\log R_{t+1}+\cdots+\log R_{t+k})}{k\,\mathrm{Var}(\log R_{t+1})} = \frac{e_1'\big[kC(0)+\sum_{j=1}^{k-1}(k-j)(C(j)+C(j)')\big]e_1}{k\,e_1'C(0)e_1}$$
The $R^2$ from the implied regression is the explained variance over the total variance:
$$R_1^2(k) = \frac{\beta_{k,1}^2\,e_2'C(0)e_2}{e_1'V_ke_1}$$
The explanatory power of the VAR at horizon $k$ is
$$R_2^2(k) = 1 - \frac{\text{innovation variance}}{\text{total variance}}$$
which requires the $k$-period innovation variance. The innovations in $z_{t+j}$ relative to time-$t$ information are $u_{t+1,1} = u_{t+1}$ at $k = 1$,
$$u_{t+2,2} = u_{t+2} + Au_{t+1}, \qquad u_{t+3,3} = u_{t+3} + Au_{t+2} + A^2u_{t+1}$$
$$u_{t+k,k} = \big[I + AL + A^2L^2 + \cdots + A^{k-1}L^{k-1}\big]u_{t+k} = \big[(I-AL)^{-1}(I-A^kL^k)\big]u_{t+k}$$
so the $k$-period forecast error of the sum $\sum_{j=1}^kz_{t+j}$ has variance
$$W_k = \sum_{j=1}^k(I-A)^{-1}(I-A^j)V(I-A^j)'(I-A)'^{-1}$$
and
$$R_2^2(k) = 1 - \frac{e_1'W_ke_1}{e_1'V_ke_1}$$
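A sketch computing $C(0)$ from $\mathrm{vec}(C(0)) = (I-A\otimes A)^{-1}\mathrm{vec}(V)$ and the implied $\beta_{k,1}$ and $VR_k$; the matrices $A$ and $V$ below are made-up inputs, not estimates.

```python
import numpy as np

A = np.array([[0.05, 0.20, -0.10],     # hypothetical VAR(1) matrix for z_t
              [0.00, 0.95,  0.00],
              [0.00, 0.00,  0.80]])
V = np.diag([0.002, 0.0001, 0.0005])   # hypothetical innovation covariance
N = A.shape[0]

vecC0 = np.linalg.solve(np.eye(N * N) - np.kron(A, A), V.reshape(-1))
C0 = vecC0.reshape(N, N)
C = lambda j: np.linalg.matrix_power(A, j) @ C0     # C(j) = A^j C(0)

e1 = np.eye(N)[:, 0]   # return element
e2 = np.eye(N)[:, 1]   # dividend-price element
k = 12
beta_k = sum(e1 @ C(j) @ e2 for j in range(1, k + 1)) / (e2 @ C0 @ e2)
Vk = k * C0 + sum((k - j) * (C(j) + C(j).T) for j in range(1, k))
VR_k = (e1 @ Vk @ e1) / (k * (e1 @ C0 @ e1))
print("implied beta_{k,1}:", round(beta_k, 3), " VR_k:", round(VR_k, 3))
```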

Midterm March 7, 9-12 am, Uris 332, Closed Book and Notes


4.5 Impulse Response Functions

For a univariate $y_t$, the impulse response is the change $E_t[y_{t+s}] - E_{t-1}[y_{t+s}]$ in response to a shock $\varepsilon_t = 1$. With
$$y_t = \alpha + \sum_{j=0}^\infty\theta_j\varepsilon_{t-j}, \qquad \theta_0 = 1,$$
the impulse response function is $E_t[y_{t+s}] - E_{t-1}[y_{t+s}] = \theta_s$ for $s = 1, 2, \ldots$.

Consider the VAR $y_t = \mu + \Phi y_{t-1} + \varepsilon_t$ with $E[\varepsilon_t\varepsilon_t'] = \Omega$ of full rank. Then
$$y_t = (I-\Phi L)^{-1}(\mu+\varepsilon_t) = (I-\Phi)^{-1}\mu + \sum_{j=0}^\infty\Phi^j\varepsilon_{t-j}$$
We are interested in the impulse response of $y_{k,t+j}$ to the shock $\varepsilon_{h,t}$, that is,
$$e_k'\Phi^je_h$$
where $e_k$ and $e_h$ are indicator vectors; equivalently $e_k'\Psi_je_h$ where $y_t = \bar\mu + \sum_{j=0}^\infty\Psi_j\varepsilon_{t-j}$. But the experiment "$\varepsilon_{h,t} = 1$ with $\varepsilon_{j,t} = 0$ for $j\neq h$" makes little sense when the elements of $\varepsilon_t$ are correlated, because it never happens in the data (you cannot really observe it).

Since $\Omega$ is real, symmetric and positive definite, we can write
$$\Omega = ADA'$$
where $A$ is lower triangular with 1's on the diagonal and $D$ is diagonal with positive entries. Now consider the process $u_t = A^{-1}\varepsilon_t$. Then
$$E[u_tu_t'] = A^{-1}E[\varepsilon_t\varepsilon_t'](A^{-1})' = A^{-1}ADA'(A^{-1})' = D$$
so the $u_t$'s are mutually uncorrelated. Now consider $Au_t = \varepsilon_t$:
$$u_{1t} = \varepsilon_{1t}$$
$$u_{2t} = \varepsilon_{2t} - a_{21}u_{1t}$$
$$\vdots$$
$$u_{jt} = \varepsilon_{jt} - a_{j1}u_{1t} - a_{j2}u_{2t} - \cdots - a_{j,j-1}u_{j-1,t}$$
Because the $u_{jt}$ are uncorrelated, $u_{jt}$ is the projection error of $\varepsilon_{jt}$ onto $(u_{1t},\cdots,u_{j-1,t})$ and the $a_{jk}$ are projection coefficients. Let $x_{t-1}$ denote $(y_{t-1},y_{t-2},\cdots)$. Then
$$\varepsilon_{1t} = y_{1t} - \hat E[y_{1t}|x_{t-1}],\ \cdots,\ \varepsilon_{jt} = y_{jt} - \hat E[y_{jt}|x_{t-1}]$$
The change in the projection is
$$\frac{\partial\hat E[\varepsilon_{jt}|y_{1t},x_{t-1}]}{\partial y_{1t}} = a_{j1}, \qquad \frac{\partial\hat E[\varepsilon_t|y_{1t},x_{t-1}]}{\partial y_{1t}} = a_1$$
Consequently,
$$\frac{\partial\hat E[y_{t+s}|y_{1t},x_{t-1}]}{\partial y_{1t}} = \Psi_sa_1$$
This is the orthogonalized impulse response function. The issue is that the orthogonalization (the ordering of the variables) requires theory to make sense.
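A sketch of the orthogonalized impulse responses $\Psi_sa_1$ using the Cholesky factorization of $\Omega$ (with $\Omega = ADA'$, the Cholesky factor is $P = AD^{1/2}$); the VAR matrices below are made up for illustration.

```python
import numpy as np

Phi = np.array([[0.6, 0.2],
                [0.1, 0.5]])                  # hypothetical VAR(1) coefficient matrix
Omega = np.array([[1.0, 0.4],
                  [0.4, 0.5]])                # innovation covariance

P = np.linalg.cholesky(Omega)                 # P = A D^{1/2}, lower triangular
A = P / np.diag(P)                            # unit-diagonal lower-triangular A
D = np.diag(np.diag(P) ** 2)                  # diagonal variances of the u_t

horizons = 6
for s in range(horizons):
    Psi_s = np.linalg.matrix_power(Phi, s)    # Psi_s = Phi^s for a VAR(1)
    # response of y_{t+s} to a one-unit innovation u_{1t} (first orthogonalized shock)
    print(s, (Psi_s @ A[:, 0]).round(3))
```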


4.6 Variance Decompositions

What percent of the forecast-error variance is due to $u_{jt}$? The $s$-step forecast error is
$$y_{t+s} - \hat y_{t+s|t} = \varepsilon_{t+s} + \Psi_1\varepsilon_{t+s-1} + \Psi_2\varepsilon_{t+s-2} + \cdots + \Psi_{s-1}\varepsilon_{t+1}$$
$$MSE(\hat y_{t+s|t}) = \Omega + \Psi_1\Omega\Psi_1' + \cdots + \Psi_{s-1}\Omega\Psi_{s-1}'$$
With $\Omega = ADA'$ and $D_{jj} = \mathrm{var}(u_{jt})$,
$$\Omega = a_1a_1'\mathrm{var}(u_{1t}) + a_2a_2'\mathrm{var}(u_{2t}) + \cdots + a_ma_m'\mathrm{var}(u_{mt})$$
where $a_j$ is the $j$th column of $A$, so
$$MSE(\hat y_{t+s|t}) = \sum_{j=1}^m\mathrm{var}(u_{jt})\big[a_ja_j' + \Psi_1a_ja_j'\Psi_1' + \cdots + \Psi_{s-1}a_ja_j'\Psi_{s-1}'\big]$$
The contribution of $u_{jt}$ is
$$\mathrm{var}(u_{jt})\big[a_ja_j' + \Psi_1a_ja_j'\Psi_1' + \cdots + \Psi_{s-1}a_ja_j'\Psi_{s-1}'\big]$$
Since the MSE converges to $\Gamma_0$, the unconditional variance of $y_t$, as $s\to\infty$, the decomposition becomes a decomposition of the unconditional variance.

4.7 Models of Non-Stationary Time Series

See Hamilton Chapter 15 and Hayashi Chapter 9. A stationary process is
$$y_t = \mu + \sum_{j=0}^\infty\psi_j\varepsilon_{t-j}, \qquad \psi_0 = 1$$
where $\sum_{j=0}^\infty|\psi_j| < \infty$ and $\psi(z) = 0$ has roots outside the unit circle. Then

• $E[y_t] = \mu$

• $E[y_{t+s}|y_t,y_{t-1},\cdots]\to\mu$ as $s\to\infty$.

However, most economic and financial data are not stationary (we typically first take natural logs). A few approaches attempt to deal with the non-stationarity problem:

• Deterministic time trend:
$$y_t = \mu + \delta t + \psi(L)\varepsilon_t$$
with $\psi(L)\varepsilon_t$ as above; here $y_t$ is trend stationary. Campbell, Lettau, Malkiel, and Xu (2001, JF) argue that aggregate idiosyncratic return volatility had a trend.

• Unit-root processes:
$$(1-L)y_t = \delta + \psi(L)\varepsilon_t$$
where the right-hand side is stationary and we also assume $\psi(1)\neq0$. The process is stationary after the first difference, e.g. log GDP, the log price level, the log exchange rate:
$$\Delta\ln GDP_t = \text{growth rate}, \qquad \Delta\ln P_t = \text{inflation rate}, \qquad \Delta\ln S_t = \text{rate of appreciation}$$
Why require $\psi(1)\neq0$? Suppose $y_t$ were already stationary, $y_t = \mu + \chi(L)\varepsilon_t$. Then $(1-L)y_t = (1-L)\chi(L)\varepsilon_t = \psi(L)\varepsilon_t$ with $\psi(1) = 0$; requiring $\psi(1)\neq0$ rules out starting with a stationary process.

The prototypical unit-root process is a random walk with drift:
$$y_t = \delta + y_{t-1} + \varepsilon_t, \qquad \varepsilon_t\ \text{i.i.d.}$$


In difference form, $\Delta y_t = \delta + \varepsilon_t$. Unit-root processes are integrated of order 1, I(1). The name comes from the analogy with integration: just as $dy(t)/dt = x(t)$ implies $y(t) = \int x(t)\,dt$, the difference equation $\Delta y_t = x_t$ implies
$$y_t = x_t + y_{t-1} = x_t + x_{t-1} + y_{t-2} = \cdots = \sum_{j=0}^\infty x_{t-j}$$
so $y_t$ is the sum over time of $x_t$. By analogy, I(2) processes are integrated of order 2:
$$(1-L)^2y_t = k + \psi(L)\varepsilon_t$$
ARMA(p, q) was a stationary AR(p) combined with an MA(q); ARIMA(p, d, q) differences the series $d$ times and then fits AR(p) and MA(q) components:
$$\phi(L)(1-L)^dy_t = \theta(L)\varepsilon_t$$
Typically (1, 1, 1) is enough.

4.7.1 Comparing Forecasts

If $Y_t$ is the level of GDP and $y_t = \ln Y_t$, then $\Delta y_t$ is the growth rate of GDP, driven by population, labor force participation, investment and technological change; these are usually stationary.

Trend stationary:
$$y_t = \alpha + \delta t + \psi(L)\varepsilon_t$$
$$y_{t+s} = \alpha + \delta(t+s) + \psi(L)\varepsilon_{t+s}$$
$$\hat y_{t+s,t} = E[y_{t+s}|y_t,\cdots] = \alpha + \delta(t+s) + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \cdots$$
so $E[\hat y_{t+s,t} - \alpha - \delta(t+s)]\to0$ as the $\psi_j$ die out: the forecast reverts to the trend line. The forecast error is
$$y_{t+s} - \hat y_{t+s,t} = \varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \cdots + \psi_{s-1}\varepsilon_{t+1}$$
with MSE $\sigma^2(1+\psi_1^2+\cdots+\psi_{s-1}^2)$. As $s\to\infty$ the MSE goes to the unconditional variance of $\psi(L)\varepsilon_t$, which is bounded.

Unit root:
$$\Delta y_t = \delta + \psi(L)\varepsilon_t$$
$$y_{t+s} = \Delta y_{t+s} + \Delta y_{t+s-1} + \cdots + \Delta y_{t+1} + y_t = (\delta+\psi(L)\varepsilon_{t+s}) + (\delta+\psi(L)\varepsilon_{t+s-1}) + \cdots + (\delta+\psi(L)\varepsilon_{t+1}) + y_t$$
so $\hat y_{t+s,t}\to s\delta + y^*$ as $s\to\infty$: the forecast grows from wherever the series is. The forecast error is
$$y_{t+s} - \hat y_{t+s,t} = \varepsilon_{t+s} + (1+\psi_1)\varepsilon_{t+s-1} + (1+\psi_1+\psi_2)\varepsilon_{t+s-2} + \cdots + (1+\psi_1+\psi_2+\cdots+\psi_{s-1})\varepsilon_{t+1}$$
with
$$MSE = \sigma^2\big[1 + (1+\psi_1)^2 + \cdots + (1+\psi_1+\psi_2+\cdots+\psi_{s-1})^2\big]$$
which grows without bound as $s\to\infty$.

4.8 Hodrick-Prescott Filter

Let $y_t = \log(GDP_t) = g_t + c_t$, where $g_t$ is a smooth trend ($\Delta g_t$ is stationary) and $c_t$ is the cyclical component. We want to minimize the cyclical component subject to $g_t$ not varying very much:
$$\min_{\{g_t\}_{t=1}^T}\ \sum_{t=1}^T(y_t-g_t)^2 + \lambda\sum_{t=3}^T\big((g_t-g_{t-1}) - (g_{t-1}-g_{t-2})\big)^2$$
For quarterly data the standard choice is $\lambda = 1600$. (This is a particular unobserved-components model.) A direct implementation sketch follows.
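A direct linear-algebra sketch of the HP problem: the first-order condition gives $g = (I+\lambda K'K)^{-1}y$, where $K$ is the $(T-2)\times T$ second-difference matrix (the example data below are made up).

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """Return (trend, cycle) from min sum (y-g)^2 + lam * sum ((g_t-g_{t-1})-(g_{t-1}-g_{t-2}))^2."""
    T = len(y)
    K = np.zeros((T - 2, T))
    for i in range(T - 2):
        K[i, i:i + 3] = [1.0, -2.0, 1.0]       # second difference of g
    g = np.linalg.solve(np.eye(T) + lam * K.T @ K, y)
    return g, y - g

# example: trend + cycle + noise
rng = np.random.default_rng(10)
t = np.arange(200)
y = 0.01 * t + 0.5 * np.sin(2 * np.pi * t / 32) + 0.1 * rng.normal(size=200)
trend, cycle = hp_filter(y, lam=1600.0)
print(trend[:5].round(3), cycle.std().round(3))
```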

4.9 Special Cases

Random walk with drift: $\hat y_{t+s,t} = s\delta + y_t$, so the log series is expected to grow at rate $\delta$ from wherever it currently is.

ARIMA(0, 1, 1):
$$\Delta y_t = \delta + \varepsilon_t + \theta\varepsilon_{t-1}$$
$$\hat y_{t+1,t} = \delta + y_t + \theta\varepsilon_t, \qquad y_{t+1} - \hat y_{t+1,t} = \varepsilon_{t+1}$$
For $\delta = 0$, $\varepsilon_t = y_t - \hat y_{t,t-1}$, so
$$\hat y_{t+1,t} = y_t + \theta(y_t - \hat y_{t,t-1}) = (1+\theta)y_t - \theta\hat y_{t,t-1}$$
For $|\theta| < 1$,
$$(1+\theta L)\hat y_{t+1,t} = (1+\theta)y_t \implies \hat y_{t+1,t} = \frac{(1+\theta)y_t}{1-(-\theta L)} = (1+\theta)\sum_{j=0}^\infty(-\theta)^jy_{t-j}$$
This is exponential smoothing when $\theta < 0$ (so that the weights $(-\theta)^j$ are positive and decline geometrically); the right-hand side is how people modeled expectation formation in the 1960s. Friedman (1957) used such adaptive revisions in measuring permanent income; Muth (1961) showed exponential smoothing is only rational if the series is ARIMA(0, 1, 1).

4.10 Beveridge-Nelson Decomposition

Every unit-root process can be decomposed into a random walk with drift plus a zero-mean stationary component. Let
$$(1-L)y_t = \mu + a(L)\varepsilon_t$$
where the roots of $a(z)$ are outside the unit circle. Then
$$y_t = z_t + c_t$$
where $z_t$ is a random walk with drift and $c_t$ is the stationary part. The claim is that
$$z_t = \mu + z_{t-1} + a(1)\varepsilon_t, \qquad c_t = a^*(L)\varepsilon_t, \quad a_j^* = -\sum_{k=j+1}^\infty a_k$$
Proof (by construction). We need $(1-L)y_t = (1-L)z_t + (1-L)c_t$ with $(1-L)z_t = \mu + a(1)\varepsilon_t$ and $(1-L)c_t = (1-L)a^*(L)\varepsilon_t$. Write out the identity $a(L) = a(1) + (1-L)a^*(L)$ term by term:
$$a(1) = a_0 + a_1 + a_2 + a_3 + \cdots$$
$$(1-L)a_0^* = -(a_1+a_2+a_3+\cdots) + (a_1+a_2+a_3+\cdots)L$$
$$(1-L)a_1^*L = -(a_2+a_3+\cdots)L + (a_2+a_3+\cdots)L^2$$
$$\vdots$$
Summing, the coefficient on $L^0$ is $a(1) - \sum_{k\geq1}a_k = a_0$, the coefficient on $L^1$ is $\sum_{k\geq1}a_k - \sum_{k\geq2}a_k = a_1$, and so on, so
$$(1-L)y_t = \mu + \big(a(1) + (1-L)a^*(L)\big)\varepsilon_t = \mu + a(L)\varepsilon_t$$
as required.

4.11 Fractional Integration

ARIMA(p, d, q) implies $(1-L)^dy_t = \psi(L)\varepsilon_t$, with an MA($\infty$) representation whose impulse response function decays geometrically, e.g.
$$(1-\rho L)y_t = \varepsilon_t \implies y_t = \varepsilon_t + \rho\varepsilon_{t-1} + \rho^2\varepsilon_{t-2} + \rho^3\varepsilon_{t-3} + \cdots$$
Granger and Joyeux (1980) and Hosking (1981) consider non-integer $d$; $[(1-L)^d]^{-1}$ exists for $d < \frac{1}{2}$, and
$$y_t = (1-L)^{-d}\psi(L)\varepsilon_t$$
To expand $(1-L)^{-d}$, let $f(z) = (1-z)^{-d}$. Then
$$\frac{df}{dz} = d(1-z)^{-(d+1)}, \qquad \frac{d^2f}{dz^2} = (d+1)d(1-z)^{-(d+2)}, \qquad \ldots$$
The power series expansion of $f(z)$ around $z = 0$ is
$$f(z) = f(0) + \frac{df}{dz}\Big|_{z=0}z + \frac{1}{2!}\frac{d^2f}{dz^2}\Big|_{z=0}z^2 + \cdots$$
$$(1-z)^{-d} = 1 + dz + \frac{1}{2}(d+1)dz^2 + \frac{1}{3!}(d+2)(d+1)dz^3 + \cdots$$
so
$$(1-L)^{-d} = \sum_{j=0}^\infty h_jL^j, \qquad h_0 = 1, \quad h_j = \frac{1}{j!}(d+j-1)(d+j-2)\cdots(d+1)d$$
Therefore
$$y_t = (1-L)^{-d}\varepsilon_t = h_0\varepsilon_t + h_1\varepsilon_{t-1} + h_2\varepsilon_{t-2} + \cdots$$
an infinite-order MA whose impulse response function decays more slowly than geometric decay (long memory; cf. Bollerslev's fractionally integrated GARCH).
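A sketch computing the $h_j$ coefficients recursively, $h_j = h_{j-1}(d+j-1)/j$, and comparing their decay with a geometric AR(1)-type decay (the values of $d$ and $\rho$ are assumed):

```python
import numpy as np

d, n = 0.4, 50
h = np.ones(n)
for j in range(1, n):
    h[j] = h[j - 1] * (d + j - 1) / j        # h_j = (1/j!) (d+j-1)...(d+1) d

geometric = 0.9 ** np.arange(n)              # AR(1)-style impulse response for comparison
print("h_j at lags 1, 10, 40:", h[[1, 10, 40]].round(4))
print("0.9^j at lags 1, 10, 40:", geometric[[1, 10, 40]].round(4))
# the fractional weights decay hyperbolically (roughly like j^(d-1)): long memory
```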


4.11.1 GARCH

The GARCH(1, 1) model is
$$y_t = \mu + \varepsilon_t, \qquad \varepsilon_t|\Phi_{t-1}\sim N(0,h_t)$$
with the conditional variance process
$$h_t = \omega + \beta h_{t-1} + \alpha\varepsilon_{t-1}^2, \qquad h_t = E_{t-1}[\varepsilon_t^2]$$
Taking unconditional expectations,
$$E[h_t] = \omega + \beta E[h_{t-1}] + \alpha E[\varepsilon_{t-1}^2]$$
and with $V = E[h_t] = E[h_{t-1}] = E[\varepsilon_{t-1}^2]$,
$$V = \frac{\omega}{1-\alpha-\beta}, \qquad \alpha+\beta < 1$$
Applying fractional integration to GARCH gives the fractionally integrated FIGARCH model.
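A minimal GARCH(1, 1) simulation sketch (parameter values assumed), checking that the sample variance is close to $\omega/(1-\alpha-\beta)$:

```python
import numpy as np

rng = np.random.default_rng(11)
T, omega, alpha, beta = 100_000, 0.05, 0.08, 0.90
h = np.empty(T)
eps = np.empty(T)
h[0] = omega / (1 - alpha - beta)                       # start at the unconditional variance
eps[0] = np.sqrt(h[0]) * rng.normal()
for t in range(1, T):
    h[t] = omega + beta * h[t - 1] + alpha * eps[t - 1] ** 2
    eps[t] = np.sqrt(h[t]) * rng.normal()

print("sample var:", eps.var().round(3),
      "theory:", round(omega / (1 - alpha - beta), 3))   # both ~2.5
```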

4.12 Testing for a Unit Root

Section 15.4 of Hamilton gives a good discussion of this topic. Suppose the truth is
$$y_t = y_{t-1} + \varepsilon_t$$
and we estimate $y_t = \rho y_{t-1} + \varepsilon_t$. If the truth were instead $\rho = 0.9999$, even with 10,000 observations you would not be able to distinguish it from $\rho = 1$.

Consider
$$y_t = \rho y_{t-1} + u_t$$
where $u_t$ is i.i.d. $N(0,\sigma^2)$. Estimate $\rho$ with OLS:
$$\hat\rho = \frac{\sum_{t=1}^Ty_{t-1}y_t}{\sum_{t=1}^Ty_{t-1}^2} = \frac{\sum_{t=1}^Ty_{t-1}(\rho y_{t-1}+u_t)}{\sum_{t=1}^Ty_{t-1}^2} = \rho + \frac{\sum_{t=1}^Ty_{t-1}u_t}{\sum_{t=1}^Ty_{t-1}^2}$$
When $y_t$ is stationary,
$$\sqrt T(\hat\rho-\rho) = \frac{(1/\sqrt T)\sum_{t=1}^Ty_{t-1}u_t}{(1/T)\sum_{t=1}^Ty_{t-1}^2}\xrightarrow{d}N(0,\Omega)$$
where $\Omega = Q^{-1}SQ^{-1}$, $Q = E[y_{t-1}^2]$, and $S = E[y_{t-1}^2u_t^2] = Q\sigma^2$ under homoskedasticity. Hence
$$\Omega = Q^{-1}Q\sigma^2Q^{-1} = \sigma^2Q^{-1}, \qquad Q = E[y_{t-1}^2] = \frac{\sigma^2}{1-\rho^2}$$
$$\Omega = \frac{\sigma^2}{\sigma^2/(1-\rho^2)} = 1-\rho^2 \implies \sqrt T(\hat\rho-\rho)\xrightarrow{d}N(0,1-\rho^2)$$
Notice that if $\rho = 1$ we would have $N(0,0)$, a degenerate distribution, which is impossible; the law of large numbers and this convergence argument only work for $|\rho| < 1$.

Rather than scaling by $\sqrt T$, scale by $T$:
$$T(\hat\rho-\rho) = \frac{(1/T)\sum_{t=1}^Ty_{t-1}u_t}{(1/T^2)\sum_{t=1}^Ty_{t-1}^2}$$
If $y_t$ is a random walk ($\rho = 1$) with $y_0 = 0$, then
$$y_t = u_t + u_{t-1} + \cdots + u_1 \sim N(0,t\sigma^2) \qquad (2)$$
For the numerator, use $y_t^2 = (y_{t-1}+u_t)^2 = y_{t-1}^2 + 2y_{t-1}u_t + u_t^2$, so $y_{t-1}u_t = \frac{1}{2}(y_t^2 - y_{t-1}^2 - u_t^2)$. Therefore
$$\sum_{t=1}^Ty_{t-1}u_t = \frac{1}{2}(y_T^2 - y_0^2) - \frac{1}{2}\sum_{t=1}^Tu_t^2 = \frac{1}{2}y_T^2 - \frac{1}{2}\sum_{t=1}^Tu_t^2$$
$$\frac{1}{T}\sum_{t=1}^Ty_{t-1}u_t = \frac{1}{2T}y_T^2 - \frac{1}{2T}\sum_{t=1}^Tu_t^2$$
Dividing by $\sigma^2$,
$$\frac{1}{2}\Big(\frac{y_T}{\sigma\sqrt T}\Big)^2 - \frac{1}{2\sigma^2T}\sum_{t=1}^Tu_t^2 \xrightarrow{d} \frac{1}{2}\chi^2(1) - \frac{1}{2} = \frac{1}{2}\big(\chi^2(1)-1\big)$$
since $y_T/(\sigma\sqrt T)\sim N(0,1)$ by (2) and $\frac{1}{T}\sum u_t^2\to\sigma^2$. For the denominator, using (2),
$$E\Big[\frac{1}{T^2}\sum_{t=1}^Ty_{t-1}^2\Big] = \frac{\sigma^2}{T^2}\sum_{t=1}^T(t-1) = \frac{\sigma^2T(T-1)}{2T^2}\to\frac{\sigma^2}{2}$$
and its limiting distribution (a functional of Brownian motion) follows from the functional central limit theorem. So $T(\hat\rho-\rho)$ has a non-standard limiting distribution; one can show that $T(\hat\rho-1) < 0$ about 68% of the time, even though $\rho = 1$. Hayashi discusses the Dickey-Fuller test on page 487 and tabulates critical values in Table B5 on page 762.

To test, estimate
$$y_t = \hat\rho y_{t-1} + \varepsilon_t$$
and calculate $T(\hat\rho-1)$. For $T = 100$, the probability that $T(\hat\rho-1) < 1.31$ is 95% and the probability that $T(\hat\rho-1) < -7.9$ is 5%. Reject $\rho = 1$ if $T(\hat\rho-1) < -7.9$ at the 5% critical value, i.e. $\hat\rho - 1 < \frac{1}{100}(-7.9)$, or $\hat\rho < 1 - 0.079 = 0.921$: if $\hat\rho < 0.92$, you can reject $H_0:\rho = 1$. (A Monte Carlo sketch follows.)
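A Monte Carlo sketch of $T(\hat\rho-1)$ under the null of a random walk, reproducing the asymmetry discussed above:

```python
import numpy as np

rng = np.random.default_rng(12)
T, n_sims = 100, 20_000
stats = np.empty(n_sims)
for s in range(n_sims):
    u = rng.normal(size=T)
    y = np.cumsum(u)                               # random walk, y_0 = 0
    ylag, ycur = y[:-1], y[1:]
    rho_hat = (ylag * ycur).sum() / (ylag * ylag).sum()
    stats[s] = T * (rho_hat - 1.0)

print("P[T(rho_hat - 1) < 0] ~", (stats < 0).mean().round(2))        # ~0.68
print("5% and 95% quantiles:", np.quantile(stats, [0.05, 0.95]).round(2))
```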

4.13 Cointegration

Let $y_t$ be $m\times1$, where each $y_{it}$ is I(1), $i = 1,\cdots,m$. The series are cointegrated if there exists an $m\times1$ vector of constants $a$ such that $a'y_t$ is stationary.


4.13.1 Purchasing Power Parity

Let $S_t$ be the exchange rate in dollars per pound (\$/£), $P_t^{\$}$ the dollar price level and $P_t^{\pounds}$ the pound price level. The internal purchasing power of the dollar is
$$\frac{1}{P_t^{\$}} = \frac{\text{goods}}{\$}$$
and its external purchasing power in the UK is
$$\frac{1}{S_t} = \frac{\pounds}{\$}, \qquad \frac{1}{P_t^{\pounds}} = \frac{\text{goods}}{\pounds}$$
Purchasing power parity equates the two:
$$\frac{1}{P_t^{\$}} = \frac{1}{S_t}\,\frac{1}{P_t^{\pounds}}$$
Taking logs (lower case letters),
$$s_t^{PPP} = p_t^{\$} - p_t^{\pounds}$$
In the data $s_t \neq s_t^{PPP}$, and
$$s_t - p_t^{\$} + p_t^{\pounds} = \text{deviation from PPP}$$
where $s_t$, $p_t^{\$}$ and $p_t^{\pounds}$ are each I(1). Here $a' = (1,-1,1)$ and
$$a'\begin{pmatrix}s_t\\ p_t^{\$}\\ p_t^{\pounds}\end{pmatrix} = \text{stationary process}$$
if PPP holds in the long run: the three series are cointegrated.

4.14 Price-Dividend Ratio and Campbell-Shiller Decomposition

Let $r_{t+1}$ be the log return on a stock, $p_t$ the log price and $d_t$ the log dividend.
$$\exp(r_{t+1}) = \frac{P_{t+1}+D_{t+1}}{P_t}$$
Factor out $D_{t+1}$ and divide the numerator and denominator by $D_t$:
$$\exp(r_{t+1}) = \frac{\big[\frac{P_{t+1}}{D_{t+1}}+1\big]\frac{D_{t+1}}{D_t}}{\frac{P_t}{D_t}}$$
Take logs:
$$r_{t+1} = \log\big[\exp(p_{t+1}-d_{t+1})+1\big] + \Delta d_{t+1} - (p_t-d_t)$$
Linearizing the first term around the mean log price-dividend ratio $\overline{p-d}$,
$$r_{t+1} \approx \log\big[1+\exp(\overline{p-d})\big] + \frac{\exp(\overline{p-d})}{1+\exp(\overline{p-d})}\big(p_{t+1}-d_{t+1}-\overline{p-d}\big) + \Delta d_{t+1} - (p_t-d_t)$$
$$r_{t+1} = \kappa + \rho(p_{t+1}-d_{t+1}) + \Delta d_{t+1} - (p_t-d_t)$$
where $\kappa$ is a constant and $\rho = \exp(\overline{p-d})/(1+\exp(\overline{p-d}))$. This holds (approximately) ex post and ex ante, so take $E_t$ of both sides and solve forward:
$$(p_t-d_t)(1-\rho L^{-1}) = \kappa + \Delta d_{t+1} - r_{t+1}$$
$$p_t-d_t = \frac{\kappa}{1-\rho} + E_t\sum_{j=1}^\infty\rho^{j-1}(\Delta d_{t+j}-r_{t+j}) \qquad\text{(Campbell-Shiller decomposition)}$$
Here $r_{t+j}$ is stationary, $d_t$ is I(1), $\Delta d_t$ is stationary, $p_t$ is I(1), and therefore $(p_t-d_t)$ is stationary: prices and dividends are cointegrated with vector $(1,-1)$. A similar argument applies to the dividend-to-earnings ratio $D_t/EA_t$, where $d_t-ea_t$ is stationary.

4.14.1 Lettau-Ludvigson: Log Consumption-Wealth Ratio

$$cay_t = c_t - \alpha w_t$$
is the cointegrating residual between log consumption and log wealth, which is stationary if the two series are cointegrated.


4.14.2 Hamilton’s Canonical Example

Consider two series,
$$y_{1t} = \gamma y_{2t} + u_{1t}$$
$$y_{2t} = y_{2t-1} + u_{2t}$$
where $u_{1t}$ and $u_{2t}$ are serially uncorrelated white noise. Since $\Delta y_{2t} = u_{2t}$, $y_{2t}$ is I(1) and ARIMA(0, 1, 0). Also
$$\Delta y_{1t} = \gamma\Delta y_{2t} + u_{1t} - u_{1t-1} = \gamma u_{2t} + u_{1t} - u_{1t-1} = v_t + \theta v_{t-1}$$
for some white noise $v_t$, so $\Delta y_{1t}$ is an MA(1) and $y_{1t}$ is ARIMA(0, 1, 1). Meanwhile
$$y_{1t} - \gamma y_{2t} = u_{1t}$$
is stationary, so $y_{1t}$ and $y_{2t}$ are cointegrated with vector $(1,-\gamma)$.

For a VAR representation we need the forecast errors $\varepsilon_{1t}$ and $\varepsilon_{2t}$ relative to $\Phi_{t-1}$:
$$\varepsilon_{2t} = u_{2t}, \qquad E_{t-1}(y_{1t}) = \gamma E_{t-1}(y_{2t})$$
$$y_{1t} - E_{t-1}(y_{1t}) = \varepsilon_{1t} = \gamma\big(y_{2t}-E_{t-1}(y_{2t})\big) + u_{1t} = \gamma\varepsilon_{2t} + u_{1t}$$
Hence $u_{1t} = \varepsilon_{1t} - \gamma\varepsilon_{2t}$. Postulate a stationary moving-average representation in $\Delta y_{1t}$ and $\Delta y_{2t}$:
$$\begin{pmatrix}\Delta y_{1t}\\ \Delta y_{2t}\end{pmatrix} = \Psi(L)\begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}$$
Can we invert $\Psi(L)$ to get a finite-order VAR in differences? Substituting for the $u$'s,
$$\Delta y_{1t} = \gamma u_{2t} + u_{1t} - u_{1t-1} = \gamma\varepsilon_{2t} + (\varepsilon_{1t}-\gamma\varepsilon_{2t}) - (\varepsilon_{1t-1}-\gamma\varepsilon_{2t-1}) = (1-L)\varepsilon_{1t} + \gamma L\varepsilon_{2t}$$
so
$$\Psi(L) = \begin{pmatrix}1-L & \gamma L\\ 0 & 1\end{pmatrix}$$
$\Psi(z)$ has a root at $z = 1$, so $|\Psi(1)| = 0$ and $\Psi(z)^{-1}$ does not exist: there is no finite-order VAR in first differences alone. Instead, use $u_{1t-1} = y_{1t-1} - \gamma y_{2t-1}$:
$$\Delta y_{1t} = \gamma u_{2t} + u_{1t} - u_{1t-1} = -(y_{1t-1}-\gamma y_{2t-1}) + \gamma u_{2t} + u_{1t}$$
$$\begin{pmatrix}\Delta y_{1t}\\ \Delta y_{2t}\end{pmatrix} = \begin{pmatrix}-1 & \gamma\\ 0 & 0\end{pmatrix}\begin{pmatrix}y_{1t-1}\\ y_{2t-1}\end{pmatrix} + \begin{pmatrix}\gamma u_{2t}+u_{1t}\\ u_{2t}\end{pmatrix} = \begin{pmatrix}-1 & \gamma\\ 0 & 0\end{pmatrix}\begin{pmatrix}y_{1t-1}\\ y_{2t-1}\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}$$
The cointegrated VAR puts the lagged cointegrating variable on the right-hand side: this is the error-correction representation.


4.15 Normalizations

If $y_t$ is $m\times1$ with each $y_{it}$ I(1) and $a'y_t$ stationary, the cointegrating vector $a$ is not unique: for any scalar $b$, $(ba)'y_t$ is also stationary, so a normalization such as $a_{11} = 1$ is appropriate. There may be $h < m$ linearly independent cointegrating vectors, which we stack in $A$ ($m\times h$):
$$A' = \begin{pmatrix}a_1'\\ \vdots\\ a_h'\end{pmatrix}$$
$\Delta y_t$ is stationary with $\delta = E[\Delta y_t]$; define $u_t = \Delta y_t - \delta$ and write its Wold decomposition as
$$u_t = \varepsilon_t + \Psi_1\varepsilon_{t-1} + \Psi_2\varepsilon_{t-2} + \cdots = \Psi(L)\varepsilon_t, \qquad E[\varepsilon_t\varepsilon_{t-s}'] = \begin{cases}\Omega & s = 0\\ 0 & s\neq0\end{cases}$$
$$\Psi(1) = I_m + \Psi_1 + \Psi_2 + \cdots$$
Claim: if $A'y_t$ is stationary, then necessarily
$$A'\Psi(1) = 0 \quad\text{and}\quad A'\delta = 0$$
Proof. $\Delta y_t = \delta + \Psi(L)\varepsilon_t$ is a vector MA representation. Iterating back along the path,
$$y_t = y_0 + \delta t + (u_t + u_{t-1} + \cdots + u_1)$$
Doing the Beveridge-Nelson decomposition,
$$\Psi(L) = \Psi(1) + (1-L)\alpha(L), \qquad \alpha(L) = \sum_{j=0}^\infty\alpha_jL^j, \quad \alpha_j = -(\Psi_{j+1}+\Psi_{j+2}+\cdots)$$
$$u_t = \Psi(L)\varepsilon_t = \Psi(1)\varepsilon_t + \alpha(L)(\varepsilon_t-\varepsilon_{t-1})$$
Define $\eta_t = \alpha(L)\varepsilon_t$, a stationary substitute for the transitory part of the $u_t$'s. Then
$$y_t = y_0 + \delta t + \big[\Psi(1)\varepsilon_t + (\eta_t-\eta_{t-1}) + \Psi(1)\varepsilon_{t-1} + (\eta_{t-1}-\eta_{t-2}) + \cdots + \Psi(1)\varepsilon_1 + \eta_1 - \eta_0\big]$$
$$y_t = y_0 + \delta t + \Psi(1)[\varepsilon_t+\varepsilon_{t-1}+\cdots+\varepsilon_1] + \eta_t - \eta_0$$
This is the multivariate Beveridge-Nelson decomposition. Then
$$A'y_t = A'y_0 + A'\delta t + A'\Psi(1)[\varepsilon_t+\cdots+\varepsilon_1] + A'\eta_t - A'\eta_0$$
so $A'\delta = 0$ and $A'\Psi(1) = 0$ are required for stationarity.

In Hamilton's canonical example,
$$\Psi(z) = \begin{pmatrix}1-z & \gamma z\\ 0 & 1\end{pmatrix}, \qquad \Psi(1) = \begin{pmatrix}0 & \gamma\\ 0 & 1\end{pmatrix}, \qquad a' = (1,-\gamma)$$
$$a'\Psi(1) = (1,-\gamma)\begin{pmatrix}0 & \gamma\\ 0 & 1\end{pmatrix} = 0$$


4.16 Triangular Representation

With
$$A' = \begin{pmatrix}a_1'\\ \vdots\\ a_h'\end{pmatrix}$$
such that $A'y_t$ is stationary, $A'\delta = 0$ and $A'\Psi(1) = 0$, we can normalize
$$A' = \begin{pmatrix}1 & 0 & \cdots & -\gamma_{1,h+1} & -\gamma_{1,h+2} & \cdots & -\gamma_{1,m}\\ 0 & 1 & \cdots & -\gamma_{2,h+1} & -\gamma_{2,h+2} & \cdots & -\gamma_{2,m}\\ \vdots & & & & & & \vdots\\ 0 & \cdots & 1 & -\gamma_{h,h+1} & -\gamma_{h,h+2} & \cdots & -\gamma_{h,m}\end{pmatrix} = \big[I_h,\ -\Gamma_{h\times(m-h)}\big]$$
Let $z_t = A'y_t$ with $E[z_t] = \mu_1^*$, and partition
$$y_t = \begin{pmatrix}y_{1t}\\ y_{2t}\end{pmatrix}$$
with $y_{1t}$ the first $h$ elements and $y_{2t}$ the remaining $g = m-h$. Demeaning $z_t$, $z_t^* = z_t - \mu_1^*$, we have
$$z_t^* + \mu_1^* = A'y_t = [I_h,\ -\Gamma]\begin{pmatrix}y_{1t}\\ y_{2t}\end{pmatrix} \implies y_{1t} = \Gamma y_{2t} + \mu_1^* + z_t^*$$
and
$$\Delta y_{2t} = \delta_2 + u_{2t}$$
where $u_{2t}$ may be serially correlated. Write the stationary components as a Wold decomposition:
$$\begin{pmatrix}z_t^*\\ u_{2t}\end{pmatrix} = \sum_{s=0}^\infty\begin{pmatrix}H_s\\ J_s\end{pmatrix}\varepsilon_{t-s}$$
where $\varepsilon_t$ is $m\times1$, $H_s$ is $h\times m$ and $J_s$ is $g\times m$. Applying the Beveridge-Nelson decomposition to $y_{2t}$,
$$y_{2t} = y_{2s} + \delta_2t + J(1)[\varepsilon_1+\cdots+\varepsilon_t] + \eta_{2t} - \eta_{2s}$$
so we can write
$$y_{2t} = \mu_2 + \delta_2t + \zeta_{2t} + \eta_{2t}$$
where $\mu_2 = y_{2s} - \eta_{2s}$, $\zeta_{2t}$ is a vector of random walks and $\eta_{2t}$ is stationary. Then
$$y_{1t} = \Gamma y_{2t} + \mu_1^* + z_t^* = \mu_1 + \Gamma(\delta_2t + \zeta_{2t}) + \eta_{1t}$$
where $\mu_1 = \mu_1^* + \Gamma\mu_2$ and $\eta_{1t} = z_t^* + \Gamma\eta_{2t}$. This is the Stock-Watson common-trends representation of the $y_t$ series: $y_t$ is a linear combination of $g$ deterministic trends $\delta_2t$, $g$ common random walks $\zeta_{2t}$, and stationary components
$$\begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix} + \begin{pmatrix}\eta_{1t}\\ \eta_{2t}\end{pmatrix}$$

4.17 Error Correction VAR

Let $y_t$ follow a $p$th-order non-stationary VAR,
$$y_t = \alpha + \Phi_1y_{t-1} + \cdots + \Phi_py_{t-p} + \varepsilon_t, \qquad \Phi(L)y_t = \alpha + \varepsilon_t, \quad \Phi(L) = I - \Phi_1L - \Phi_2L^2 - \cdots - \Phi_pL^p$$
and suppose the differences are stationary with the Wold representation $\Delta y_t = \delta + \Psi(L)\varepsilon_t$. Applying $\Phi(L)$ to $\Delta y_t$ in two ways,
$$(1-L)\Phi(L)y_t = (1-L)(\alpha+\varepsilon_t) \qquad\text{and}\qquad \Phi(L)\Delta y_t = \Phi(1)\delta + \Phi(L)\Psi(L)\varepsilon_t$$
Matching the deterministic terms requires $\Phi(1)\delta = 0$ (since $(1-L)\alpha = 0$ automatically), and matching the lag polynomials on $\varepsilon_t$ requires
$$(1-L)I_m = \Phi(L)\Psi(L)$$
identically as polynomials in the lag operator. Evaluating at $z = 1$,
$$(1-z)I_m = \Phi(z)\Psi(z)\big|_{z=1} \implies \Phi(1)\Psi(1) = 0$$
For any row of $\Phi(1)$, denoted $\pi'$, we have $\pi'\Psi(1) = 0$ and $\pi'\delta = 0$, so each row of $\Phi(1)$ determines a cointegrating vector: $\Phi(1) = BA'$ for some matrix $B$. Hence $\Phi(1)$ is singular,
$$|I_m - \Phi_1z - \Phi_2z^2 - \cdots - \Phi_pz^p| = 0 \ \text{at}\ z = 1,$$
so the VAR has at least one unit root.

The levels VAR can be rewritten as
$$y_t = \rho_1\Delta y_{t-1} + \rho_2\Delta y_{t-2} + \cdots + \rho_{p-1}\Delta y_{t-p+1} + \alpha + \rho y_{t-1} + \varepsilon_t$$
where
$$\rho = \Phi_1 + \cdots + \Phi_p, \qquad \rho_s = -[\Phi_{s+1}+\cdots+\Phi_p], \quad s = 1,\cdots,p-1$$
Subtracting $y_{t-1}$ from both sides,
$$\Delta y_t = \rho_1\Delta y_{t-1} + \cdots + \rho_{p-1}\Delta y_{t-p+1} + \alpha + (\rho-I)y_{t-1} + \varepsilon_t$$
with
$$\rho - I = -[I - \Phi_1 - \cdots - \Phi_p] = -\Phi(1) = -BA'$$
so
$$\Delta y_t = \rho_1\Delta y_{t-1} + \cdots + \rho_{p-1}\Delta y_{t-p+1} + \alpha - BA'y_{t-1} + \varepsilon_t$$
where $A'y_{t-1}$ contains the lagged cointegrating residuals (the error-correction terms) and $B$ the adjustment coefficients.

4.18 Testing for Cointegration

PPP theory says that
$$z_t = s_t - p_t^{\$} + p_t^{\pounds}$$
is stationary, with $a = (1,-1,1)$. We can use Dickey-Fuller tests to check for a unit root in each individual series and in $z_t$: if each series is I(1) but $z_t$ is stationary, then $s_t$, $p_t^{\$}$ and $p_t^{\pounds}$ are cointegrated. In Hamilton's example with the lira-dollar exchange rate, each series was I(1), but for $z_t$ one could not reject a unit root.

More generally, normalize $a_1 = 1$ and estimate the $(m-1)$ cointegrating parameters by OLS:
$$y_{1t} = \gamma_2y_{2t} + \gamma_3y_{3t} + \cdots + \gamma_my_{mt} + \varepsilon_{1t}$$
Minimizing the sum of squared residuals minimizes the sample second moment of the candidate $z_t$. If there is cointegration,
$$\frac{1}{T}\sum_{t=1}^T\hat\varepsilon_{1t}^2 \to E[z_t^2]$$
otherwise $\frac{1}{T}$SSR diverges and $\frac{1}{T^2}$SSR converges to a functional of Brownian motion.
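A sketch of the two-step idea just described (the cointegrated pair below is simulated for illustration): estimate the cointegrating regression by OLS, then inspect the residual for mean reversion, e.g. with the $T(\hat\rho-1)$ statistic from Section 4.12.

```python
import numpy as np

rng = np.random.default_rng(13)
T = 1000
y2 = np.cumsum(rng.normal(size=T))                     # I(1) driver
y1 = 0.8 * y2 + rng.normal(size=T)                     # cointegrated with vector (1, -0.8)

# step 1: cointegrating regression of y1 on y2
gamma_hat = (y2 @ y1) / (y2 @ y2)
z = y1 - gamma_hat * y2                                # candidate stationary residual

# step 2: unit-root check on the residual via T(rho_hat - 1)
rho_hat = (z[:-1] @ z[1:]) / (z[:-1] @ z[:-1])
print("gamma_hat:", round(gamma_hat, 3),
      " T(rho-1) on residual:", round(T * (rho_hat - 1), 1))
# a large negative value indicates mean reversion, i.e. cointegration
# (critical values differ from the univariate Dickey-Fuller case because gamma is estimated)
```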
