Lecture Note 11 (2014 S)



    Lecture note 11

    Forecasting, Box-Jenkins, and Unit Root Tests

    11.1 Forecast Evaluation

    11.2 Forecast Exercise

    11.3 Box-Jenkins Methodology

    11.4 Unit Root Tests


    11.1 Forecast Evaluation

Recall: Suppose we know the true data-generating process, say an AR(1), $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$. Then the one-step-ahead forecast would be $E_t y_{t+1} = a_0 + a_1 y_t$.

$\varepsilon_{t+1}$ is purely random at time $t$; it is the unforecastable part of $y_{t+1}$. Therefore, we always encounter forecast error.

In practice, we never know the actual order of the process or the parameters, i.e., $a_0, a_1, \ldots, a_p$. We need to estimate all these parameters. Therefore, the one-step-ahead forecast is $\hat{y}_{t+1} = \hat{a}_0 + \hat{a}_1 y_t$.

Obviously, $\hat{y}_{t+1}$ will not be the same as $E_t y_{t+1}$, since the forecast is made using the estimated model.


Suppose the true data-generating process is, say, an AR(2). If we fit an AR(2) model, it will fit the generated data well. However, sometimes a simpler model, say an AR(1), may produce a better forecast result than the true model.

Why so?

Generally, a large model contains in-sample parameter-estimation error, which induces forecast errors.

Clark and West (2007), Dimitrios and Guerard (2004), and Liu and Enders (2003) find that forecasts from overly parsimonious models with little parameter uncertainty can be better than forecasts from models consistent with the actual data-generating process.

How good are my forecasts?

o Compare the forecasts to the realizations.

o Compare the forecasts across different forecast models.

o Either way, we need comparison criteria.


    How do we calculate and compare the forecast errors among the models?

Suppose we have 500 observations, denoted $y_1, \ldots, y_{500}$. We can use 90% of the observations, i.e., the first 450, to estimate the competing models. The more observations we have, the more data we can withhold for forecast evaluation.

We use the estimated models to forecast $y_{451}$. Since we know the realization of $y_{451}$, we can easily calculate the one-step-ahead forecast error for each competing model.

At $t = 451$, we can use $y_1, \ldots, y_{451}$ to re-estimate the models. We then use the newly estimated models to forecast $y_{452}$ and calculate the forecast errors. [This is the recursive scheme.]

Repeating the process, we will have 50 forecasts ($\hat{y}_{451}, \ldots, \hat{y}_{500}$) and 50 forecast errors ($e_{451}, \ldots, e_{500}$) for each competing model.
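A minimal Stata sketch of this exercise under the recursive scheme, assuming the data are tsset with time variable t and an AR(1) is one of the competing models (the variable names yhat and e1 are illustrative, not from the notes):

gen yhat = .
forvalues h = 450/499 {
    quietly arima y if t <= `h', ar(1)         // re-estimate on observations 1..h
    quietly predict tmp, xb                    // one-step-ahead predictions
    quietly replace yhat = tmp if t == `h' + 1
    drop tmp
}
gen e1 = y - yhat if t > 450                   // the 50 one-step forecast errors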


Remarks: Use the AR(1) and one-step-ahead forecasts as an example.

(a) Recursive (expanding-window) scheme:

At time 450, when we forecast $y_{451}$, the AR(1) is estimated on $y_1, \ldots, y_{450}$. At time 451, when we forecast $y_{452}$, the AR(1) is estimated on $y_1, \ldots, y_{451}$. At time 452, when we forecast $y_{453}$, the AR(1) is estimated on $y_1, \ldots, y_{452}$.

(b) Rolling (fixed-width window) scheme:

At time 450, when we forecast $y_{451}$, the AR(1) is estimated on $y_1, \ldots, y_{450}$. At time 451, when we forecast $y_{452}$, the AR(1) is estimated on $y_2, \ldots, y_{451}$. At time 452, when we forecast $y_{453}$, the AR(1) is estimated on $y_3, \ldots, y_{452}$.

(c) Fixed scheme:

At time 450, when we forecast $y_{451}$, the AR(1) is estimated on $y_1, \ldots, y_{450}$. At time 451, when we forecast $y_{452}$, the AR(1) is still estimated on $y_1, \ldots, y_{450}$. At time 452, when we forecast $y_{453}$, the AR(1) is still estimated on $y_1, \ldots, y_{450}$.


    Forecast Evaluation Criteria

Let $e_t = y_t - \hat{y}_t$ denote the forecast error and $H$ the number of out-of-sample forecasts.

(A) Mean Squared Error: $\mathrm{MSE} = \frac{1}{H}\sum_t e_t^2$

(B) Root Mean Squared Error: $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$

(C) Mean Percentage Error: $\mathrm{MPE} = \frac{100}{H}\sum_t \frac{e_t}{y_t}$

(D) Mean Absolute Error: $\mathrm{MAE} = \frac{1}{H}\sum_t |e_t|$

(E) Mean Absolute Percentage Error: $\mathrm{MAPE} = \frac{100}{H}\sum_t \left|\frac{e_t}{y_t}\right|$
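These criteria are easy to compute once the forecast errors are in a variable; a sketch, assuming the errors for the 50 held-out periods are stored in e and the realizations in y (names illustrative):

gen sq  = e^2
gen ab  = abs(e)
gen pe  = 100*e/y                  // percentage error
gen ape = abs(pe)
quietly summarize sq
di "MSE  = " r(mean) "   RMSE = " sqrt(r(mean))
quietly summarize ab
di "MAE  = " r(mean)
quietly summarize pe
di "MPE  = " r(mean) " percent"
quietly summarize ape
di "MAPE = " r(mean) " percent"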


How do we know whether the MSEs of two models are statistically different from each other? Put differently, how do we know that the difference between two models is not due to pure chance?

    The F Statistic

Taking the MSE as an example, we can use the F test, $F = \mathrm{MSE}_A / \mathrm{MSE}_B \sim F(H, H)$, where $\mathrm{MSE}_A$ is calculated from the forecast errors of Model A and $\mathrm{MSE}_B$ from the forecast errors of Model B. Note that the larger MSE should be put in the numerator.

If both models are equally good in terms of forecasting performance, the F statistic should be close to 1; if the F statistic is larger than the critical value $F_{\alpha}(H, H)$, then we reject the null that Models A and B are equally good.

o Note: the F test here is valid only if

(i) the forecast errors have zero mean and are normally distributed;

(ii) the forecast errors are not serially correlated (which is often violated when we have j-step-ahead forecasts, j > 1);

(iii) the forecast errors of the two competing models are not contemporaneously correlated.
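A sketch of the F comparison in Stata, assuming the squared forecast errors are stored in sqE_A and sqE_B as on the later slides:

quietly summarize sqE_A
scalar mseA = r(mean)
scalar H    = r(N)
quietly summarize sqE_B
scalar mseB = r(mean)
scalar F = max(mseA, mseB)/min(mseA, mseB)   // larger MSE in the numerator
di "F = " F "   5% critical value = " invFtail(H, H, 0.05)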


    The Granger-Newbold Test

Define the general loss function as $g(e_t)$. For example, the quadratic loss function is $g(e_t) = e_t^2$.

Denote the loss differential between the two forecasts by $d_t = g(e_{1t}) - g(e_{2t})$.

The two models are equally good if $E(d_t) = 0$ for all $t$; otherwise, $E(d_t) \neq 0$.

If Assumptions (i) and (ii) discussed under the F test hold, Granger and Newbold (1976) showed that, for the quadratic loss function, testing $E(d_t) = 0$ is equivalent to testing $\rho_{xz} = 0$, where $x_t = e_{1t} + e_{2t}$ and $z_t = e_{1t} - e_{2t}$.

$r_{xz}$ is just the sample correlation coefficient between $x_t$ and $z_t$; the statistic $r_{xz}/\sqrt{(1 - r_{xz}^2)/(H - 1)}$ follows a $t$ distribution with $H - 1$ degrees of freedom.
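A sketch of the Granger-Newbold computation in Stata, assuming e1 and e2 hold the two models' forecast errors:

gen x = e1 + e2
gen z = e1 - e2
quietly corr x z
di "GN t-statistic = " r(rho)/sqrt((1 - r(rho)^2)/(r(N) - 1))   // compare with t(H-1)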


    Example:

One-step-ahead forecast errors for 7 periods:

e1:  0.01   0.05  -0.01   0.02   0.03   0.07   0.05
e2:  0.02   0.06   0.01  -0.04   0.01   0.03   0.10

Answer:

x = e1 + e2:   0.03   0.11   0.00  -0.02   0.04   0.10   0.16
z = e1 - e2:  -0.01  -0.01  -0.02   0.06   0.02   0.04  -0.05

. corr x z

             |        x        z
-------------+------------------
           x |   1.0000
           z |  -0.5915   1.0000

. display (-0.5915)/sqrt((1-0.5915*0.5915)/6)
-1.7969294

Since $|-1.797| < t_{0.025,6} = 2.447$, we don't reject that both models are equally good.


Harvey et al. (1997) suggested regressing $d_t$ on a constant, i.e., $d_t = \beta_0 + u_t$. If the null $H_0: \beta_0 = 0$ is not rejected (against the alternative $H_1: \beta_0 \neq 0$), both models are equally good. Otherwise, one is better than the other.

A simple and quick way is to regress $d_t$ (the difference of the forecast errors between the two models) on a constant term and use the t test to determine whether the estimated intercept is statistically different from zero.

. gen d = e1 - e2
. reg d

      Source |       SS       df       MS              Number of obs =       7
-------------+------------------------------           F(  0,     6) =    0.00
       Model |           0     0           .           Prob > F      =       .
    Residual |  .011571428     6  .001928571           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared =  0.0000
       Total |  .011571428     6  .001928571           Root MSE      =  .04392

           d |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -.0042857   .0165985    -0.26   0.805    -.0449008    .0363294
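For j-step-ahead forecasts (j > 1) the loss differentials are serially correlated, so the plain OLS standard error on the intercept is unreliable. One option (a sketch, not from the notes) is to use Newey-West standard errors:

Stata> tsset t
Stata> newey d, lag(2) /* HAC standard errors; use lag(j-1) for j-step-ahead forecasts */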


Stata> arima y, arima(2,0,0) noconstant

The information criteria for the estimated model:

(a) The AIC and BIC suggest which model is better in-sample: the preferred model fits the 90 estimation observations better.

(b) But this does not necessarily mean the preferred model delivers a better forecast result.

(c) The estimates are based on 90% of the sample observations. We can evaluate the forecast performance of the competing models on the remaining 10% of the observations.

ARIMA regression
Sample: 1 - 100                                 Number of obs      =       100
                                                Wald chi2(2)       =    120.05
Log likelihood = -146.5302                      Prob > chi2        =    0.0000

             |             OPG
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .8056681   .0917879     8.78   0.000     .6257671    .9855691
         L2. |  -.0255046   .1024235    -0.25   0.803    -.2262508    .1752417
-------------+----------------------------------------------------------------
      /sigma |   1.042314   .0551962    18.88   0.000      .934131    1.150496

       Model |    Obs    ll(null)   ll(model)     df         AIC        BIC
-------------+-----------------------------------------------------------------
           . |    100           .   -146.5302      3    299.0604   306.8759
Note: N=Obs used in calculating BIC; see [R] BIC note.
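To compare competing specifications directly, one can store each model's results and tabulate the information criteria side by side (a sketch; the stored names ar1 and ar2 are illustrative):

quietly arima y, arima(1,0,0) noconstant
estimates store ar1
quietly arima y, arima(2,0,0) noconstant
estimates store ar2
estimates stats ar1 ar2        // AIC and BIC for both models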


One step ahead forecast

Stata> predict yA if t > (90) /* one-step-ahead forecasts for y after the 90th observation */

Stata> tsline y yA /* graphically compare the true value y and the forecast yA */

Stata> tsline yA if t > (90) || tsline y if t < (91) /* plot the forecast yA and historical y together */

Forecasts from the first model:

[Figure: two panels plotting the actual series y against the one-step-ahead prediction (xb), t = 0 to 100.]


Forecasts from the second model:

Note: Although the BIC and AIC suggest one of the models is better, graphically the forecast points are very similar between these two models.

[Figure: two panels plotting the actual series y against the one-step-ahead prediction (xb) for the second model, t = 0 to 100.]


    Forecast Evaluation: One step ahead forecast

Stata> generate errorA = y - yA /* calculate the forecast errors */

Stata> generate SqError = errorA*errorA /* square the forecast errors */

Stata> summarize SqError /* the mean squared error is the reported mean */

Model A:

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
       sqE_A |        10    2.568757    5.488667    .1343741   18.02609

Model B:

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
       sqE_B |        11    2.330664    5.306183    .0487499   18.17375

Comparing the MSEs of Model A and Model B, Model B's MSE is smaller.


Suppose we create a variable called d, defined as the difference between errorA and errorB (where errorA contains the forecast errors of Model A). Regress d on the constant term only and obtain the following result.

. gen d = errorA - errorB
(90 missing values generated)

. reg d

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  0,     9) =    0.00
       Model |           0     0           .           Prob > F      =       .
    Residual |  .007554524     9  .000839392           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared =  0.0000
       Total |  .007554524     9  .000839392           Root MSE      =  .02897

           d |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   -.008101   .0091618    -0.88   0.400    -.0288265    .0126245

Another test is the Granger-Newbold test. We create the variables x = errorA + errorB and z = errorA - errorB, and use the t test discussed previously.

. corr x z
(obs=10)
             |        x        z
-------------+------------------
           x |   1.0000
           z |   0.0106   1.0000

. di (0.01606)/sqrt((1-0.1606*0.1606)/9)
.04881362

The t statistic is 0.0488, which is clearly less than the 5% critical value of the t distribution with 9 degrees of freedom (2.262). Both models are equally good in terms of forecasting performance.


Confidence Interval for the One-Step Forecast

We use Model A as an example.

Stata> predict sigma2, mse /* mean squared error of the prediction */

Stata> generate upper = yA + 1.96*sqrt(sigma2)

Stata> generate lower = yA - 1.96*sqrt(sigma2)

Stata> tsline yA upper lower if t > (90) || tsline y if t < (91) /* plot the forecast yA with its bands and historical y together */

[Figure: one-step-ahead prediction with 95% CI bands for t > 90, plotted with historical y, t = 0 to 100.]


    J-step ahead forecast

Stata> predict y_J, dynamic(90) /* at time 90, forecast j steps ahead */
Stata> tsline y_J if t > (90) || tsline y if t < (91)

The true model's (long-run) mean is 0. [Recall: for an AR(1), $E(y_t) = a_0/(1 - a_1)$.] Therefore, over time the forecast value converges to 0.

[Figure: J-step-ahead forecasts converging toward 0, plotted with historical y, t = 0 to 100.]


    Example:

We simulate 200 data points from an AR model with a nonzero mean and use the first 180 data points to estimate the model. Then we forecast 20 points ahead.

Stata> arima y if t <= 180, ar(1) /* the model specification here is illustrative */
Stata> predict y_J, dynamic(180) /* at time 180, forecast j steps ahead */
Stata> tsline y_J if t > (179) || tsline y if t < (181)

The j-step forecasts converge to the long-run mean of the process.

[Figure: J-step-ahead forecasts from t = 180 converging to the long-run mean, plotted with historical y, t = 0 to 200.]


    Example: Different commands for estimating ARIMA models.

We simulate 10,000 data points from an AR(1); judging from the estimates below, the process is approximately $y_t = 2 + 0.5\,y_{t-1} + \varepsilon_t$ with $\varepsilon_t \sim N(0, 1)$.

Stata A> arima y, arima(1,0,0)

Stata B> arima y, ar(1) /* for an ARMA(p, q) model we can type: arima y, ar(1/p) ma(1/q) */

Both commands produce identical output:

ARIMA regression
Sample: 0 - 9999                                Number of obs      =     10000
                                                Wald chi2(1)       =   3359.37
Log likelihood = -14251.09                      Prob > chi2        =    0.0000

             |             OPG
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
       _cons |   4.017689   .0200645   200.24   0.000     3.978363    4.057015
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .4988331   .0086065    57.96   0.000     .4819647    .5157015
-------------+----------------------------------------------------------------
      /sigma |   1.006175   .0069936   143.87   0.000     .9924676    1.019882

Note that arima reports _cons as the unconditional mean of y, $a_0/(1 - a_1) = 2/(1 - 0.5) = 4$, not the regression intercept.


Stata C> reg y L1.y /* note that with 3 lags of y, we can type: reg y L1.y L2.y L3.y */

Stata D> arima y L1.y /* the same regression estimated by maximum likelihood via arima */

Note: Command C gives us the same estimates, but after reg we cannot use arima's dynamic forecasting machinery (predict, dynamic()).

. reg y L1.y

      Source |       SS       df       MS              Number of obs =    9999
-------------+------------------------------           F(  1,  9997) = 3307.20
       Model |   3345.9872     1   3345.9872           Prob > F      =  0.0000
    Residual |  10114.2468  9997   1.0117282           R-squared     =  0.2486
-------------+------------------------------           Adj R-squared =  0.2485
       Total |   13460.234  9998  1.34629266           Root MSE      =  1.0058

           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |
         L1. |   .4984087   .0086667    57.51   0.000     .4814201    .5153972
       _cons |   2.015774   .0362439    55.62   0.000     1.944729     2.08682

. arima y L1.y

ARIMA regression
Sample: 1 - 9999                                Number of obs      =      9999
                                                Wald chi2(1)       =   3351.87
Log likelihood = -14245.26                      Prob > chi2        =    0.0000

             |             OPG
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
           y |
         L1. |   .4984087   .0086088    57.90   0.000     .4815358    .5152816
       _cons |   2.015774   .0357956    56.31   0.000     1.945616    2.085932
-------------+----------------------------------------------------------------
      /sigma |   1.005746   .0069965   143.75   0.000     .9920334    1.019459


11.3 Box-Jenkins Methodology

We have discussed the AR(p), MA(q), and ARMA(p, q) models.

Several assumptions are imposed on these models.

The time series is weakly stationary; put differently, its mean, variance, and autocovariances are invariant with respect to time. Thus,

o The constructed model (AR, MA, or ARMA) is stable.

o If the series is not stationary, the system will explode.

o We can't do the forecasting exercise on a nonstationary series.

$\varepsilon_t$ is white noise:

o Its mean is zero, its variance is $\sigma^2$, and the errors are uncorrelated across time periods.


    Example:

Suppose we are given raw time-series data, say $y_t$. How do we model (or forecast) $y_t$? State the procedure step by step.

(1) Identification

o Stationary or non-stationary? (If non-stationary, take the difference of the variable first, $\Delta y_t = y_t - y_{t-1}$. Sometimes we may also want to remove a seasonal effect, e.g., by using a seasonal difference.)

o If the series is stationary, we can use correlograms to decide the model and the number of lags.

(2) Estimation

o Run the regressions. If we can't decide on the model in stage (1), we can use the AIC (or BIC) criterion here.

(3) Diagnostic Checking

We need to include enough AR and MA terms to make sure the residuals are white noise.

o The coefficients on the p-th and q-th lags must be significant, but the interior ones need not be. We can skip the interim terms if they are not useful.

o Test the residuals; if the residuals are white noise, the model is considered OK.

(4) Forecasting (either one-step-ahead or j-step-ahead). A Stata sketch of the full sequence follows below.
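A minimal sketch of steps (1)-(4) in Stata, assuming the series y is tsset; the lag orders and the forecast origin are illustrative:

* (1) Identification
tsline y                  // eyeball trend/seasonality
ac y                      // ACF: slow decay hints at nonstationarity
pac y                     // PACF: helps pick the AR order
dfuller y, regress        // formal unit root test (Section 11.4)

* (2) Estimation
arima y, arima(2,0,0)     // a candidate model; compare candidates with estat ic
estat ic

* (3) Diagnostic checking
predict res, residuals
wntestq res               // Ljung-Box Q test: are the residuals white noise?

* (4) Forecasting
predict yhat, xb          // one-step-ahead forecasts
predict yJ, dynamic(450)  // j-step-ahead forecasts from an illustrative origin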


    11.4 Unit Root Tests

Stationarity is extremely important for time-series analysis, not just for the AR, MA, and ARMA models.

If a time series is not stationary, we must transform the nonstationary series into a stationary one before any regression analysis. Otherwise, the regression results are not reliable. Usually, taking the (first) difference of the variable is one possible way to stationarize the series, i.e., $\Delta y_t = y_t - y_{t-1}$.

Graphically, if the ACF of the series does not die out quickly enough after many lags, then the series is very likely nonstationary.
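In Stata this visual check takes one or two commands (standard commands; the variable name is illustrative):

Stata> corrgram y, lags(40) /* tabulated ACF and PACF with Q statistics */
Stata> ac y /* an ACF that declines almost linearly suggests a unit root */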


    Example: We analyze the hp (UK housing price) variable.

Graphically, looking at the PACF we might conclude that the hp variable can be modeled as an AR(1). However, if we look at the ACF, it dies out very slowly (the series is very likely nonstationary!).


    What if you still intend to use the AR(1) model?

    Dependent Variable: HP

Method: Least Squares
Date: 10/25/11  Time: 17:54

    Sample (adjusted): 1991M02 2007M05

    Included observations: 196 after adjustments

    Convergence achieved after 10 iterations

    Variable Coefficient Std. Error t-Statistic Prob.

    C 25800.90 12064.50 2.138580 0.0337

    AR(1) 1.010519 0.001691 597.4181 0.0000

    R-squared 0.999457 Mean dependent var 88796.29

    Adjusted R-squared 0.999454 S.D. dependent var 42311.82

    S.E. of regression 988.7413 Akaike info criterion 16.64089

    Sum squared resid 1.90E+08 Schwarz criterion 16.67434

    Log likelihood -1628.808 F-statistic 356908.3

    Durbin-Watson stat 1.394712 Prob(F-statistic) 0.000000

    Inverted AR Roots 1.01

The estimated AR process is nonstationary.

The statistical report shows that the variable is not stationary:

the estimated AR(1) coefficient is 1.0105 > 1;

the inverted AR root (1.01) lies outside the unit circle.

Therefore, the model is not reliable, and the implied process is eventually explosive.


We've seen that if the AR root is 1 or larger, the variable is nonstationary, and therefore the regression result is not reliable. This type of regression is called a spurious regression: usually the R-squared is very high, but it is misleading.

Graphically, for nonstationary variables the ACF does not die out easily (very persistent!).

On the other hand, for stationary variables the ACF declines exponentially.

How do we statistically test whether a time series is stationary? Suppose we have an AR(1) model, $y_t = a_1 y_{t-1} + \varepsilon_t$. A unit root exists if $a_1 = 1$.

Therefore, we can test whether $a_1$ is significantly different from 1.

Alternatively, subtracting $y_{t-1}$ from both sides, we can test the regression $\Delta y_t = \gamma y_{t-1} + \varepsilon_t$, where $\gamma = a_1 - 1$, and examine whether $\gamma$ is zero.


    11.4.1 Dickey Fuller (DF) Test

$H_0: \gamma = 0$ (there is a unit root) versus $H_1: \gamma < 0$ (the series is stationary).

There are three versions of the DF test:

Case 1: $\Delta y_t = \gamma y_{t-1} + \varepsilon_t$
o Case 1 is the simplest form.

Case 2: $\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$
o We include an intercept in the model.

Case 3: $\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \varepsilon_t$
o We include an intercept and a time trend in the model.

Whether to include the intercept and/or time trend is an empirical question. The three cases map onto dfuller options as sketched below.

The DF test is very restrictive, because it applies only to AR(1) processes.
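A sketch of the mapping in Stata (y is illustrative):

Stata> dfuller y, noconstant /* Case 1: no intercept, no trend */
Stata> dfuller y /* Case 2: intercept only (the default) */
Stata> dfuller y, trend /* Case 3: intercept and time trend */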


    Example: Fertility Rate

    Example: Singapore Inflation Rate (data file can be found in edventure)

[Figure: Singapore monthly inflation rate (inf), 1980m1 to 2010m1.]


Stata> dfuller inf, regress trend

/* inf (inflation) is the variable name; we include a time trend, and the DF regression contains one lag of inf */

Stata> dfuller inf, regress /* without including the trend */

With the trend:

Dickey-Fuller test for unit root                   Number of obs   =       357

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -3.784            -3.986            -3.426            -3.130
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0174

Without the trend:

Dickey-Fuller test for unit root                   Number of obs   =       357

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -3.798            -3.451            -2.876            -2.570
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0029


    Augmented Dickey Fuller (ADF) Test

As discussed previously, the DF test applies only to AR(1). But an AR(1) might not capture all the serial correlation in $y_t$, in which case an AR(p) is more appropriate. The ADF is a more general unit root test than the DF test, as it can be used to test AR(p) models.

The unit root test model¹ is as follows, with $H_0: \gamma = 0$ (there is a unit root) versus $H_1: H_0$ is wrong.

Similarly, we can include an intercept and/or a time trend in the model:

Case (1): $\Delta y_t = \gamma y_{t-1} + \sum_{i=1}^{p-1} \beta_i \Delta y_{t-i} + \varepsilon_t$

Case (2): $\Delta y_t = a_0 + \gamma y_{t-1} + \sum_{i=1}^{p-1} \beta_i \Delta y_{t-i} + \varepsilon_t$

Case (3): $\Delta y_t = a_0 + a_2 t + \gamma y_{t-1} + \sum_{i=1}^{p-1} \beta_i \Delta y_{t-i} + \varepsilon_t$

¹ It is called the augmented DF test because the test regression is augmented by lags of $\Delta y_t$.


We can graph the series and decide which model (Case 1, 2, or 3) to use. Another natural question: how many lags of $\Delta y_t$ should we include?

Schwert (1989) suggested setting the number of lags no larger than $p_{\max} = \big[\,12\,(T/100)^{1/4}\,\big]$, where $T$ is the sample size.

Example:

If we have 358 monthly inflation data points, then we set the number of lags in the ADF test at most up to $12\,(358/100)^{1/4} \approx 16$.

If the estimate on the 16th lagged difference is not significant, then we perform the unit root test again using 15 lags. Keep reducing the number of lags until the estimate on the last included lag is significant. A sketch of this rule follows below.
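A sketch of this testing-down rule in Stata; the loop logic is illustrative, and it runs the ADF regression directly so the t statistic on the last lagged difference is accessible:

forvalues p = 16(-1)1 {
    quietly reg D.inf L.inf L(1/`p')D.inf          // ADF regression, Case 2
    local t = abs(_b[L`p'D.inf]/_se[L`p'D.inf])    // t-stat on the last lag
    if `t' > 1.645 {
        di "chosen lag length: `p' (|t| = " %5.2f `t' ")"
        continue, break
    }
}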


At first, we include 16 lags.

Stata> dfuller inf, regress lags(16)

Since the estimate on the 16th lagged difference is insignificant (p-value 0.312), we can rerun the test including only 15 lags. Even with 16 lags the ADF test already rejects the null hypothesis (p-value 0.0202).

The inflation rate variable doesn't contain a unit root. Therefore, it is a stationary process.

Augmented Dickey-Fuller test for unit root         Number of obs   =       341

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -3.196            -3.453            -2.876            -2.570
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0202

       D.inf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inf |
         L1. |  -.0587156   .0183713    -3.20   0.002    -.0948581   -.0225731
         LD. |   .0012893   .0555281     0.02   0.981    -.1079531    .1105318
        L2D. |   .1678053   .0553966     3.03   0.003     .0588216     .276789
        L3D. |   .1720609   .0560726     3.07   0.002     .0617474    .2823744
        L4D. |   .0803618   .0567872     1.42   0.158    -.0313576    .1920813
        L5D. |   .1268932   .0488078     2.60   0.010     .0308719    .2229146
        L6D. |   .0498249   .0494312     1.01   0.314    -.0474228    .1470726
        L7D. |   .0525277   .0494198     1.06   0.289    -.0446976     .149753
        L8D. |   .0068582   .0490985     0.14   0.889     -.089735    .1034514
        L9D. |   .1000233   .0489448     2.04   0.042     .0037324    .1963141
       L10D. |   .0263241   .0488399     0.54   0.590    -.0697603    .1224086
       L11D. |   .0423739   .0475525     0.89   0.374    -.0511779    .1359256
       L12D. |   -.456482   .0468371    -9.75   0.000    -.5486263   -.3643378
       L13D. |  -.0278045   .0527845    -0.53   0.599    -.1316493    .0760403
       L14D. |   .1053225   .0521724     2.02   0.044     .0026819    .2079632
       L15D. |    .071637   .0514199     1.39   0.165    -.0295233    .1727973
       L16D. |   .0526748   .0519866     1.01   0.312    -.0496001    .1549498
       _cons |   .1012828   .0377618     2.68   0.008     .0269927    .1755729


11.4.2 Other Unit Root Tests

The Phillips-Perron (PP) unit root test is very popular in the analysis of financial time series.

Unlike the (A)DF tests, the PP test corrects for any serial correlation and heteroskedasticity in the errors. Thus, practically, the PP test is better than the ADF test because it is more robust to violations of the classical linear assumptions. Another advantage is that we do not need to specify the number of lags.

Example: Inflation rate

Stata> pperron inf

Phillips-Perron test for unit root                 Number of obs   =       357
                                                   Newey-West lags =         5

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(rho)          -27.674           -20.386           -14.000           -11.200
 Z(t)             -4.243            -3.451            -2.876            -2.570
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0006

The test statistic is larger (in absolute value) than the critical values. We reject the null hypothesis of a unit root.


Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) developed a complement to traditional unit root tests, such as the ADF and PP tests.

For the ADF and PP tests: $H_0$: there is a unit root versus $H_1$: the series is stationary.

For the KPSS test: $H_0$: there is no unit root (the series is stationary) versus $H_1$: there is a unit root.

Statistically speaking, the ADF (or PP) test looks for evidence to reject a unit root, and the KPSS test looks for evidence to reject no unit root.

Example:

Stata> kpss inf

KPSS test for inf

Maxlag = 16 chosen by Schwert criterion
Autocovariances weighted by Bartlett kernel

Critical values for H0: inf is trend stationary

10%: 0.119   5%: 0.146   2.5%: 0.176   1%: 0.216

Lag order    Test statistic
    16            .129
    15            .132
    14            .135
    13            .139
    12            .143
    11            .149
    10            .156
     9            .165
     8            .176
     7             .19
     6             .21
     5            .236
     4            .274
     3            .332
     2             .43
     1            .628
     0            1.22

At the Schwert maximum lag (16), the test statistic (.129) is below the 5% critical value (0.146), so we do not reject trend stationarity, consistent with the ADF and PP results.


    Appendix

    [A] ARIMA and Other Models

If we perform unit root tests and find that the time series $y_t$ is not stationary, then we must take the first difference of the series, i.e., $\Delta y_t = y_t - y_{t-1}$.

To make sure the first-difference variable is stationary, we perform the unit root tests again on $\Delta y_t$. If $\Delta y_t$ is stationary, then we can start the B-J approach. Otherwise, take the difference of $\Delta y_t$ again.

We call this an ARIMA²(p, 1, q) model. The 1 indicates that the series is stationary after the first difference is taken.

Example:

What does the 2 mean in ARIMA(p, 2, q)?

[Ans] The series must be differenced twice, $\Delta^2 y_t = \Delta y_t - \Delta y_{t-1}$, before it becomes stationary.

Recall the Box-Jenkins approach: Stationarity → Identification → Estimation → Checking → Forecasting.

Sometimes nonstationarity is due to seasonal effects, i.e., some seasons have a different pattern than the others. In that case we can take first-order differences together with a seasonal difference, e.g., at lag 4 for quarterly data.

Example: energy consumption / electricity usage. The demand is very high during summer in Taiwan, but the high demand happens during winter in Chicago, NYC, etc. Electricity demand usually shows seasonal variation.

² AutoRegressive Integrated Moving Average (ARIMA).
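In Stata, first and seasonal differences are available through time-series operators (a sketch for quarterly data; the variable name is illustrative):

Stata> gen dy = D.y /* first difference: y(t) - y(t-1) */
Stata> gen s4y = S4.y /* seasonal difference at lag 4: y(t) - y(t-4) */
Stata> gen ds4y = D.S4.y /* first difference of the seasonal difference */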


ARIMA is a purely classical statistical technique; we don't really need to know the (economic) structure. Therefore, this class of models has been used in many fields, for example physics, biology, environmental science, etc.

We can extend the ARMA models to models that have more economic meaning.

Example:

AR(p): a single series regressed on its own lags. Vector autoregression (VAR): multiple time series in a generalized AR model.

    [B] Parameter Instability and Structural Change (* not tested in the exam)

Time-series data are often nonstationary. This can be due to a time trend, a seasonal component, and/or a structural change.

Trend (stationary): once the trend is removed, the series is a stationary process.

How do we test for a structural change (or break)? How do we model the change?

Suppose we know the date of the event, for instance the global financial crisis, 9/11, etc., that changed the (econometric) system. If we suspect a structural change at a particular date, it is straightforward to use the Chow test. Taking an AR(2) as an example (matching the simulation below):

(1) $y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_{1t}$
(2) $y_t = b_0 + b_1 y_{t-1} + b_2 y_{t-2} + \varepsilon_{2t}$

We use the data before the change, for example before 9/11 or July 1997, to estimate Model 1, and the data on and after the change to estimate Model 2.

$H_0: a_0 = b_0,\ a_1 = b_1,\ a_2 = b_2$ versus $H_1: H_0$ is wrong.


The Chow statistic is $F = \frac{(SSR_R - SSR_1 - SSR_2)/k}{(SSR_1 + SSR_2)/(T - 2k)}$, where $SSR_R$ is the sum of squared residuals from the whole-sample (restricted) regression, $SSR_1$ and $SSR_2$ are the sums of squared residuals from the two subsamples, $k$ is the number of coefficients in each regression, and $T$ is the combined number of observations in the two subsample regressions.

Example:

We simulated 100 data points from Model 1 and another 113 data points from Model 2, which differs (mainly) in its intercept.

In practice, we plot the graph.

We suspect there is a structural change at date 101, because after date 100 y increases dramatically. In this simple exercise we also suspect that the model is an AR(2), because of the ac and pac graphs (if we are lucky enough).

We can see that the main difference before and after date 100 is (perhaps) only the intercept: there is a jump.

So we run the AR(2) regression on the whole data set to get $SSR_R$. In addition, we run two regression models: one uses data points 1-100, the other uses data points 101-213.

[Figure: simulated series y with a visible jump after t = 100, t = 0 to 200.]


The whole data set (points 1-213):

. reg y l1.y l2.y

      Source |       SS       df       MS              Number of obs =     213
-------------+------------------------------           F(  2,   210) = 5172.54
       Model |  36640.1616     2  18320.0808           Prob > F      =  0.0000
    Residual |  743.777018   210  3.54179533           R-squared     =  0.9801
-------------+------------------------------           Adj R-squared =  0.9799
       Total |  37383.9386   212  176.339333           Root MSE      =   1.882

           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |
         L1. |   .9205832     .06888    13.37   0.000     .7847984    1.056368
         L2. |   .0716431   .0690029     1.04   0.300     -.064384    .2076702
       _cons |   .0184165   .1335337     0.14   0.890    -.2448218    .2816548

For Model (1):

. reg y l1.y l2.y if t <= 100

      Source |       SS       df       MS              Number of obs =      98
-------------+------------------------------           F(  2,    95) =  889.39
       Model |   1921.9442     2  960.972099           Prob > F      =  0.0000
    Residual |  102.646597    95   1.0804905           R-squared     =  0.9493
-------------+------------------------------           Adj R-squared =  0.9482
       Total |   2024.5908    97  20.8720701           Root MSE      =  1.0395

           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |
         L1. |   .7065436   .0993953     7.11   0.000      .509219    .9038683
         L2. |    .211723   .0948349     2.23   0.028     .0234518    .3999942
       _cons |  -1.618651   .3933838    -4.11   0.000    -2.399617   -.8376855


    For Model (2)

. reg y l1.y l2.y if t > (103)

      Source |       SS       df       MS              Number of obs =     112
-------------+------------------------------           F(  2,   109) =  116.15
       Model |  276.248939     2  138.124469           Prob > F      =  0.0000
    Residual |  129.619879   109   1.1891732           R-squared     =  0.6806
-------------+------------------------------           Adj R-squared =  0.6748
       Total |  405.868817   111  3.65647583           Root MSE      =  1.0905

           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |
         L1. |   .6979468   .0948083     7.36   0.000     .5100398    .8858538
         L2. |   .0949843    .090681     1.05   0.297    -.0847425     .274711
       _cons |   1.806572   .4558879     3.96   0.000     .9030167    2.710127

From the three regressions: $SSR_R = 743.78$, $SSR_1 = 102.65$, $SSR_2 = 129.62$, with $k = 3$ and $T = 98 + 112 = 210$. We calculate the F statistic and compare it with the critical value:

$F = \frac{(743.78 - 102.65 - 129.62)/3}{(102.65 + 129.62)/(210 - 6)} \approx 149.8$

At the 5% significance level, $F_{0.05}(3, 204) \approx 2.65$. Therefore, we reject the null hypothesis of no structural change.

To use the Chow test, we need to specify the date of the structural change and assume that the change fully manifests itself at that date. This may not always be appropriate; for example, there is no particular date at which we can say that significant climate change occurred.

In addition, enough observations must be included in each subsample. Otherwise, the estimated coefficients have little precision.
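The whole Chow computation can be scripted by pulling each regression's sum of squared residuals from e(rss) (a sketch; the cutoff dates follow the notes):

quietly reg y l1.y l2.y                 // whole sample (restricted)
scalar ssrR = e(rss)
quietly reg y l1.y l2.y if t <= 100     // subsample 1
scalar ssr1 = e(rss)
scalar n1   = e(N)
quietly reg y l1.y l2.y if t > 103      // subsample 2
scalar ssr2 = e(rss)
scalar n2   = e(N)
scalar k = 3                            // coefficients per regression
scalar F = ((ssrR - ssr1 - ssr2)/k) / ((ssr1 + ssr2)/(n1 + n2 - 2*k))
di "Chow F = " F "   p-value = " Ftail(k, n1 + n2 - 2*k, F)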


We can use recursive estimation to detect whether the estimated coefficients change abruptly.

Stata> rolling, recursive window(#) clear: regress y l1.y
/* first estimation set: 1 to #; second set: 1 to (#+1); third set: 1 to (#+2), etc. */
/* note that Stata will clear all results except the rolling estimates */

Stata> tsset end
/* we always use this command here: we must tell Stata again that this is time-series data, since the previous data were cleared; the results are indexed by the variable end */

Stata> tsline coefficient_name1 coefficient_name2 ...

Example:

Observations 1-100 were simulated from Model 1, and the remaining observations were simulated from Model 2 (the design of the previous example).

> With window(70), 131 sets of estimates are produced; the number of observations increases by 1 each time (samples 1-70, 1-71, ..., 1-200).

> Since all results except the 131 sets of estimates were cleared, we need to use the tsset command again. Then we plot the intercept estimates and slope estimates against time.

. rolling, recursive window(70) clear: reg y l1.y
(running regress on estimation sample)

Rolling replications (131)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
...............................

. tsset end
        time variable:  end, 70 to 200
                delta:  1 unit

. tsline _b_cons _stat_1


The intercept estimates (_b[_cons]) change dramatically after the 100th data point. This signals that there might be a structural change.

Similarly, the slope estimates seem to increase after the 100th data point, but the change stays roughly within (0.75, 0.95).

The estimated intercepts are within (1.55, 2) for the first 30 sets of estimates; after the 100th point, the estimated intercept converges toward 0.5. This is because, for the very last few sets of estimates, the model is largely estimated on data simulated from a model with an intercept of 0.5.

[Figure: recursive estimates _b[_cons] and _b[L.y] plotted against end, 70 to 200; vertical range roughly 0.5 to 2.]