
Chapter 10: Basic Time Series Regression

1. Time series data have temporal ordering. The past can affect the future, but not vice versa. That is the biggest difference from cross-sectional data.

2. In time series data there is a variable that serves as the index for the time periods. That variable also indicates the frequency of observations. For instance, we have yearly data if one observation becomes available each year. The Stata command tsset declares that the data are a time series.

3. A regression that uses time series can have a causal interpretation, provided that the regressor series is uncorrelated with the error series. The regression result will be biased when relevant series are omitted. More often, time series are used for forecasting purposes.

4. Consider the causal study first.

(a) For simplicity a simple regression is considered

Yt = β0 + β1Xt + ut (1)

where subscript t is used to emphasize that data are time series.

(b) Model (1) is a static model because it only captures the immediate or contemporaneous effect of X on Y. When Xt changes by one unit, it only has an effect on Yt (in the same period); Yt+1, Yt+2, and so on are unaffected.

(c) It is easy to account for lag effects by using lagged values. There are the first lag, the second lag, and so on. The table below is one example:

t    Yt     Yt−1    Yt−2    ∆Yt
1    2.2    na      na      na
2    3      2.2     na      0.8
3    -2     3       2.2     -5

i. t is the index series. t = 1 corresponds to the first (or earliest) observation.

ii. Yt−1 is the first lag of Yt. When t = 2, Yt−1 = Y2−1 = Y1 = 2.2. So the second

observation of Yt−1 is the first observation of Yt. The first observation of Yt−1


is a missing value (denoted by na) because there are no data for Y0. Essentially, you get the first lag by shifting the whole series down one period.

iii. Yt−2 is the second lag of Yt. You get the second lag by pushing the whole

series two periods down (and two missing values are generated).

iv. ∆Yt is the first difference. By definition,

∆Yt ≡ Yt − Yt−1 (2)

For example, when t = 2, ∆Y2 ≡ Y2 − Y2−1 = 3 − 2.2 = 0.8. ∆Y1 is missing since Y0 is missing.

v. Exercise: Use the above table to find ∆Yt−1 for t = 1, 2, 3.

vi. The Stata commands to generate the first lag and second lag are

gen ylag1 = y[_n-1]

gen ylag2 = y[_n-2]

vii. Alternatively, you can refer to the first and second lags by using the lag operator L (after declaring the time series with tsset):

L.y, L2.y
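For concreteness, here is a minimal Stata sketch (assuming the data contain a time index t and a series y; both names are illustrative) showing the two ways of creating lags, plus the difference operator D:

* declare the time series using the (assumed) time index t
tsset t
* lags via explicit subscripting
gen ylag1 = y[_n-1]
gen ylag2 = y[_n-2]
* equivalent lags via the lag operator, and the first difference via D
gen ylag1b = L.y
gen ylag2b = L2.y
gen dy = D.y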

(d) The distributed lag (DL) model uses lagged values of X to account for lag effects. A DL model with two lags is

Yt = β0 + δ0Xt + δ1Xt−1 + δ2Xt−2 + ut (3)

where each parameter δj is called a (short-run) multiplier.

i. It follows that a change in Xt has an effect on Yt, Yt+1, and Yt+2. After that, the effect of Xt vanishes.


ii. Mathematically, we can show

dYt/dXt = δ0 (4)
dYt+1/dXt = δ1 (5)
dYt+2/dXt = δ2 (6)
dYt+j/dXt = 0, (j > 2) (7)

where dYt+1/dXt can be obtained from the regression where Yt+1 is the dependent variable.

iii. When we graph δj as a function of j, we obtain the lag distribution.

iv. The cumulative effect of Xt on Yt, Yt+1 and Yt+2 is called long run propensity

(LRP) or long run multiplier.

LRP ≡ δ0 + δ1 + δ2 (8)

There are two steps to obtain the estimate and standard error for LRP. First,

write θ = δ0 + δ1 + δ2, which implies that δ0 = θ− δ1 − δ2. Then equation (3)

becomes

Yt = β0 + δ0Xt + δ1Xt−1 + δ2Xt−2 + ut (9)

= β0 + (θ − δ1 − δ2)Xt + δ1Xt−1 + δ2Xt−2 + ut (10)

= β0 + θXt + δ1(Xt−1 −Xt) + δ2(Xt−2 −Xt) + ut (11)

The last equation suggests running the transformed regression of Yt on Xt, (Xt−1 − Xt), and (Xt−2 − Xt). Then the estimated θ is the estimated LRP, se(θ) is the standard error of the LRP, and the confidence interval for θ is the confidence interval for the LRP.

v. Due to multicollinearity, it can be difficult to obtain precise estimates of the individual short run multipliers δj. However, we can often get a good estimate of the LRP.

vi. To mitigate multicollinearity, we can impose some restriction on the lag

distribution. For example, if we assume the lag distribution is flat, i.e.,


δ0 = δ1 = δ2 = δ, then model (3) reduces to

Yt = β0 + δZt + ut, where Zt ≡ Xt + Xt−1 + Xt−2 (12)

A Stata sketch of this restricted regression is given below.
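A minimal sketch of the restricted regression (12), assuming series y and x that have already been declared as time series with tsset (the names are illustrative):

* impose the flat lag distribution by summing the current value and two lags of x
gen z = x + L.x + L2.x
* regression (12): the coefficient on z estimates the common multiplier delta
reg y z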

(e) If the effect of X never dies out, in theory we would need to include infinitely many lags of X, which is infeasible. Instead we may consider an autoregressive distributed lag (ADL) model given by

Yt = β0 + ρYt−1 + δXt + ut (13)

By recursive substitution

Yt+1 = β0 + ρYt + δXt+1 + ut+1 (14)

= β0 + ρ(β0 + ρYt−1 + δXt + ut) + δXt+1 + ut+1 (15)

= . . .+ ρδXt + . . . (16)

we can show the effect of X on Y is

dYt/dXt = δ, dYt+1/dXt = ρδ, . . . , dYt+j/dXt = ρ^j δ (17)

Alternatively, you can apply the chain rule of calculus and get

dYt+1/dXt = (dYt+1/dYt)(dYt/dXt) = ρδ (18)
dYt+2/dXt = (dYt+2/dYt+1)(dYt+1/dXt) = ρ(ρδ) = ρ^2 δ (19)

The effect of X decreases exponentially if |ρ| < 1, but never becomes zero no

matter how large j is.
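To make the ADL model and its dynamic multipliers concrete, here is a minimal Stata sketch (assuming series y and x that have been tsset; nlcom is used to get delta-method standard errors for products of coefficients):

* ADL(1,0) model (13): y on its own first lag and the current x
reg y L.y x
* contemporaneous effect (delta)
display _b[x]
* effect after one period (rho*delta)
nlcom _b[L.y]*_b[x]
* effect after two periods (rho^2*delta)
nlcom (_b[L.y]^2)*_b[x]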

5. We get the autoregressive (AR) model (or autoregression) if X is excluded from the

ADL model. More explicitly, a first order autoregressive model is

Yt = β0 + ρYt−1 + ut (20)

(a) The AR model is more suitable for forecasting than ADL and DL models since

we do not need to forecast X before forecasting Y. This is because in an AR model


the regressor is just the lagged dependent variable.

(b) The AR model can be used to test for serial correlation. A series is serially correlated if ρ in (20) is significantly different from 0 (e.g., when its p-value is less than 0.05).

(c) (Optional) A time series has a unit root (and becomes non-stationary) if it is extremely serially correlated (highly persistent). In theory, a series has a unit root if ρ = 1 in (20); in practice, ρ ≈ 1 is suggestive of a unit root. We can show that the variance of a unit root series goes to infinity as t rises. The conventional law of large numbers and central limit theorem both require finite variance, so neither applies to a unit root series. That means, in general, we need to difference a unit root series (to make it stationary) before using it in a regression. One exception: you do not need to difference unit root series if they are cointegrated. The leading example of a unit root series is the random walk.

(d) The AR model with one lag can be fitted by the Stata command

reg y L.y

(e) Then the one-period-ahead forecast is computed as

Yt+1 = β0 + ρYt (21)

where β0 and ρ denote the OLS estimates from (20). The two-period-ahead forecast is

Yt+2 = β0 + ρYt+1 (22)

where Yt+1 on the right-hand side is the forecast from (21). In general the h-period-ahead forecast is

Yt+h = β0 + ρYt+h−1, (h = 2, 3, . . .) (23)

In short, all forecasts can be computed in an iterative (recursive) fashion; a Stata sketch of this recursion appears after this list.

(f) The AR model with two lags can be fitted by the Stata command

reg y L.y L2.y

(g) We can add a third lag, a fourth lag, and so on until the newly added lag has an insignificant coefficient.
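A minimal Stata sketch of the recursive forecasts in (21)-(23) of point (e), assuming an AR(1) model fitted to a series y that has been tsset (the names are illustrative):

* fit the AR(1) model
reg y L.y
* start the recursion from the last observed value of y
scalar f1 = _b[_cons] + _b[L.y]*y[_N]
* each later forecast plugs in the previous forecast
scalar f2 = _b[_cons] + _b[L.y]*f1
scalar f3 = _b[_cons] + _b[L.y]*f2
display "h=1: " f1 "  h=2: " f2 "  h=3: " f3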


6. If the time series is trending, we can add a linear trend (and a quadratic trend) to the AR model and run the regression

Yt = β0 + ρYt−1 + ct + ut (24)

where t is the linear trend. According to the Frisch-Waugh theorem, the estimated coefficient ρ in (24) can be obtained by regressing detrended Yt onto detrended Yt−1, where the detrended series is the residual from regressing Yt on the trend. In short, ρ in (24) describes the behavior of the deviation of the series around its trend.
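A minimal Stata sketch of regression (24), assuming a series y that has been tsset and sorted by the time variable (the trend is generated here for illustration):

* linear time trend (observation number works because the data are sorted by time)
gen t = _n
* AR(1) model with a linear trend, equation (24)
reg y L.y t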

7. The trend term can also be used in the DL model as a proxy for unobserved trending factors.

8. If we have monthly or quarterly data, there may be a regular up-and-down pattern at specific intervals. This phenomenon is called seasonality. For instance, sales data typically go up in November and December. It is easy to account for seasonality by using seasonal dummies, each of which equals one during a particular season.
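For example, with quarterly data a minimal sketch might look like this (assuming the data have been tsset with a quarterly date variable named qdate; all names are illustrative):

* extract the quarter (1-4) from the quarterly date variable
gen q = quarter(dofq(qdate))
* AR(1) regression with seasonal dummies; i.q creates a dummy for each quarter (one omitted as the base)
reg y L.y i.q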


Example: Chapter 10

1. The data are in 311_fertil3.dta. See Examples 10.4 and 10.8 in the textbook for details.

2. The dependent variable is the general fertility rate (gfr), the number of children born per 1000 women of childbearing age. The key independent variable is the personal tax exemption (pe). We want to control for two variables: ww2 is a dummy variable that equals one during World War II (years 1941-1945), and pill is another dummy variable that equals one from 1963 onward (the year the birth control pill became available).

3. The data contain 72 yearly observations from 1913 to 1984. In 1913, for example,

gfr = 124.7, pe = 0 and pill = 0.

4. The Stata command twoway line gfr pe year draws the time series plot of gfr (red line) and pe (blue line). The plot shows that pe was trending upward before the mid-1940s. Starting in 1950, both gfr and pe have generally trended downward. We also notice that the biggest rise in pe happened around WWII.

5. The Stata command reg gfr pe ww2 pill estimates the static model, equation (10.18) in the textbook. All variables are significant, and the signs of their coefficients are as expected. The coefficient on pe indicates that a 12-dollar (= 1/0.083) increase in pe increases gfr by about one birth per 1000 women. Being in WWII causes about 24 fewer births per 1000 women. Having access to the birth control pill causes, on average, about 31 fewer births per 1000 women.

6. The static model has the drawback that it only captures the immediate or contemporaneous effect of pe on gfr. By contrast, the distributed lag model can capture both the immediate and the lagged effects. We cannot ignore the lag effects as long as it takes people a while to adjust. After generating the first and second lags of pe, the Stata command

reg gfr pe pelag1 pelag2 ww2 pill

estimates the distributed lag model, equation (10.19) in the textbook. The F test shows that pe and its two lags are jointly significant, even though each is individually insignificant. The individual insignificance is due to the correlation among pe and its lagged values (multicollinearity).

7. Next we run the transformed regression (shown at the top of page 359 in the 5th edition) in order to estimate the long run propensity (LRP). The estimated LRP is .1007191 with a standard error of .0298027.
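As a possible alternative to the transformed regression, the LRP and its standard error can also be obtained with lincom after estimating (10.19) directly; this is a sketch, assuming pelag1 and pelag2 have been generated as above, and it should reproduce the same LRP estimate:

* distributed lag model (10.19)
reg gfr pe pelag1 pelag2 ww2 pill
* the LRP is the sum of the coefficients on pe and its two lags
lincom pe + pelag1 + pelag2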


8. We may worry that regression (10.18) produces spurious (biased) results. From the time series plot, it is obvious that gfr and pe tended to move together (both moved down) after the 1950s. We may suspect that there is an omitted trending variable that is correlated with both gfr and pe. In other words, we suspect the observed correlation between gfr and pe is possibly due to that omitted variable, and so it tells us nothing about causality.

9. Then we include a linear trend as a proxy for the unobserved trending factors. The coefficient on the trend is -1.149872 and significant, confirming that gfr is trending downward overall. After the trend is controlled for, the coefficient on pe becomes .2788778, much bigger than in the regression without the trend. This new estimate measures the effect of pe on detrended gfr. Put differently, pe explains the deviation of gfr around its trend better than it explains the level of gfr.

10. We obtain equation (10.35) in the textbook after using both the trend and the squared trend as proxies. The squared trend is significant, confirming the nonlinear trending behavior of gfr.

11. So far we have used these time series for the purpose of inferring causality. In reality, the main use of time series is forecasting. We cannot use Stata's time series commands until we declare the data to be a time series, which is done by the command tsset year

12. Autoregression (AR) is one example of a time series model. It is a popular forecasting

model because it uses a variable’s own history to predict its future. In other words, no

other variables are needed. Regressors in autoregression consist of lagged dependent

variables only.

13. Autoregression is rationalized by the fact that most time series are serially correlated. That is, the current level of most series is correlated with its previous or earlier levels.

Autoregression utilizes the idea that history can repeat itself.

14. We start with the first order AR model for gfr. The Stata command is

reg gfr L.gfr

where L.gfr refers to the first lag gfrt−1. By using the lag operator L, we can skip generating the lagged value with [_n-1].


15. The estimated AR(1) model is

gfrt = 1.304937 + .9777202 gfrt−1

It is easy to obtain forecasts from the AR model. For example, suppose gfr1979 = 100; then the forecast of gfr in 1980 is gfr1980 = 1.304937 + .9777202(100) = 99.076957.
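In Stata, the in-sample one-step-ahead fitted values from the AR(1) model can be obtained with predict (a minimal sketch):

* AR(1) regression
reg gfr L.gfr
* fitted values: gfrhat_t = b0 + rho*gfr_(t-1)
predict gfrhat, xb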

16. By contrast, if we use a distributed lag model, then we need to forecast X before forecasting Y, which is a harder job.

17. The AR(2) model (with two lags) can be estimated by the command reg gfr L.gfr L2.gfr, where L2.gfr refers to the second lag gfrt−2. Both the first and second lags are significant, implying that the AR(1) model is misspecified because it omits the second lag.

18. In theory we can keep adding lags until the newly added lag becomes insignificant.

19. Exercise: Use the lag operator L to rewrite the Stata command for regression (10.19) in the textbook.

20. Exercise: Please estimate the following model

gfrt = β0 + β1pet + β2ww2t + β3pillt + β4gfrt−1 + ut

where the lagged dependent variable is used as a proxy for the unobserved factors. Compare it to regression (10.34) in the textbook.

21. Exercise: Continue the above exercise. Can you reject the hypothesis H0: β4 = 1? How would you estimate the restricted model that imposes this hypothesis?


[Figure: time series plot, 1913 to 1984, of gfr (births per 1000 women 15-44) and pe (real value of the personal exemption, $)]


Do File

* Do file for chapter 10

set more off

clear

capture log close

cd "I:\311"

log using 311log.txt, text replace

use 311_fertil3.dta, clear

list in 1/5

sum pe

* time plot for gfr and pe

twoway line gfr pe year

* generate dummy variable for World War II (1941-1945)

gen ww2 = 0

replace ww2 = 1 if year==1941 | year==1942 | year == 1943 | year == 1944 | year == 1945

* equation 10.18 in textbook, static model

reg gfr pe ww2 pill

* generate the first and second lagged value of pe

gen pelag1 = pe[_n-1]

gen pelag2 = pe[_n-2]

* equation 10.19 in textbook, finite distributed lag model

reg gfr pe pelag1 pelag2 ww2 pill

* F test for joint significance of pe and its lags

test pe pelag1 pelag2

* estimate LRP and its standard error based on transformed regression

gen dpe1 = pelag1 - pe

gen dpe2 = pelag2 - pe

reg gfr pe dpe1 dpe2 ww2 pill

* generate trend and squared trend

gen t = _n

gen tsq = t^2

* equation 10.34 in textbook

reg gfr pe ww2 pill t

* equation 10.35 in textbook


reg gfr pe ww2 pill t tsq

* declare the data as time series

tsset year

* AR(1) regression

reg gfr L.gfr

* AR(2) regression

reg gfr L.gfr L2.gfr

log close

exit
