
4. Stochastic Time Series Models

In Chapters 2 and 3 we considered forecast models of the form

$$z_t = f(x_t; \beta) + \epsilon_t$$

In these forecasting models it is usually assumed that the errors $\epsilon_t$ are uncorrelated, which in turn implies that the observations $z_t$ are uncorrelated. However, this assumption is rarely met in practice.

Thus, models that capture the autocorrelation structure have to be considered. Models such as AR(p), MA(q), ARMA(p,q), and ARIMA(p,d,q) will be studied in this chapter.


    4.1 Stochastic Processes

An observed time series $(z_1, z_2, \ldots, z_n)$ can be thought of as a particular realization of a stochastic process.

Stochastic processes in general can be described by an $n$-dimensional probability distribution $p(z_1, z_2, \ldots, z_n)$.

Assuming joint normality, there are only $n$ observations but $n + n(n+1)/2$ unknown parameters ($n$ means plus $n(n+1)/2$ variances and covariances). To infer such a general probability structure from just one realization of the stochastic process is impossible.

Hence the stationarity assumption is used to simplify the probability structure.


    Stationary Stochastic Process

The joint distributions are stable with respect to shifts in time:

$$p(z_{t_1}, \ldots, z_{t_m}) = p(z_{t_1+k}, \ldots, z_{t_m+k})$$

    Autocorrelation Functions (ACF)

The autocovariances

$$\gamma_k = \mathrm{Cov}(z_t, z_{t-k}) = E(z_t - \mu)(z_{t-k} - \mu)$$

The autocorrelations

$$\rho_k = \frac{\mathrm{Cov}(z_t, z_{t-k})}{[\mathrm{Var}(z_t)\,\mathrm{Var}(z_{t-k})]^{1/2}} = \frac{\gamma_k}{\gamma_0}$$


    Sample Autocorrelation Function (SACF)

$$r_k = \frac{\sum_{t=k+1}^{n}(z_t - \bar z)(z_{t-k} - \bar z)}{\sum_{t=1}^{n}(z_t - \bar z)^2}, \qquad k = 0, 1, 2, \ldots$$

For uncorrelated observations, the variance of $r_k$ is approximately given by

$$\mathrm{Var}(r_k) \approx \frac{1}{n}$$
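As a concrete illustration, here is a minimal numpy sketch (function and variable names are my own, not from the text) that computes $r_k$ exactly as defined above and counts how many lags fall outside the $\pm 2/\sqrt{n}$ limits implied by $\mathrm{Var}(r_k) \approx 1/n$:

```python
import numpy as np

def sample_acf(z, max_lag):
    """Sample autocorrelations r_k, k = 1..max_lag, as defined above."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    dev = z - z.mean()
    denom = np.sum(dev**2)                      # sum_t (z_t - zbar)^2
    return np.array([np.sum(dev[k:] * dev[:n - k]) / denom
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
z = rng.normal(size=200)                        # white noise: true rho_k = 0
r = sample_acf(z, 10)
bound = 2 / np.sqrt(len(z))                     # approx 95% limits, Var(r_k) ~ 1/n
print(np.round(r, 3))
print("outside +-2/sqrt(n):", np.sum(np.abs(r) > bound))
```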


    4.2 Stochastic Difference Equation Models

A time series in which successive values are autocorrelated can be represented as a linear combination (linear filter) of a sequence of uncorrelated random variables. The linear filter representation is given by

$$z_t - \mu = a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots = \sum_{j=0}^{\infty}\psi_j a_{t-j}, \qquad \psi_0 = 1.$$

The random variables $\{a_t;\ t = 0, \pm 1, \pm 2, \ldots\}$ are a sequence of uncorrelated random variables from a fixed distribution with mean $E(a_t) = 0$, variance $\mathrm{Var}(a_t) = E(a_t^2) = \sigma^2$, and $\mathrm{Cov}(a_t, a_{t-k}) = E(a_t a_{t-k}) = 0$ for all $k \neq 0$.

$$E(z_t) = \mu, \qquad \gamma_0 = E(z_t - \mu)^2 = \sigma^2\sum_{j=0}^{\infty}\psi_j^2,$$

$$\gamma_k = E[(z_t - \mu)(z_{t-k} - \mu)] = \sigma^2\sum_{j=0}^{\infty}\psi_j\psi_{j+k}, \qquad \rho_k = \sum_{j=0}^{\infty}\psi_j\psi_{j+k}\Big/\sum_{j=0}^{\infty}\psi_j^2.$$


    Autoregressive Processes

    First-Order Autoregressive Process (AR(1))

$$(z_t - \mu) - \phi(z_{t-1} - \mu) = a_t \qquad\text{or}\qquad (1 - \phi B)(z_t - \mu) = a_t$$

We have

$$z_t - \mu = (1 - \phi B)^{-1}a_t = (1 + \phi B + \phi^2 B^2 + \cdots)a_t = a_t + \phi a_{t-1} + \phi^2 a_{t-2} + \cdots$$

Application of the general results for the linear filter leads to

$$\gamma_0 = \sigma^2\sum_{j=0}^{\infty}\psi_j^2 = \sigma^2\sum_{j=0}^{\infty}\phi^{2j} = \frac{\sigma^2}{1-\phi^2}$$

$$\gamma_k = \sigma^2\sum_{j=0}^{\infty}\psi_j\psi_{j+k} = \sigma^2\sum_{j=0}^{\infty}\phi^j\phi^{j+k} = \frac{\sigma^2\phi^k}{1-\phi^2} = \phi^k\gamma_0, \qquad k = 0, 1, 2, \ldots$$

$$\rho_k = \phi^k, \qquad k = 0, 1, 2, \ldots$$


Another approach to obtain the ACF of the AR(1) process, which is more suitable for higher-order situations, is the following. Without loss of generality, assume $\mu = 0$ and $z_t = \phi z_{t-1} + a_t$. Then we have

$$\gamma_k = E(z_t z_{t-k}) = \phi E(z_{t-1}z_{t-k}) + E(a_t z_{t-k}) = \phi\gamma_{k-1} + E(a_t z_{t-k})$$

where for $k = 0$, $E(a_t z_t) = E[a_t(a_t + \phi a_{t-1} + \phi^2 a_{t-2} + \cdots)] = \sigma^2$, and for $k > 0$, $E(a_t z_{t-k}) = 0$.

Thus

$$\gamma_0 = \phi\gamma_1 + \sigma^2, \qquad \gamma_k = \phi\gamma_{k-1} \quad (k \ge 1), \quad \text{so } \gamma_1 = \phi\gamma_0,$$

$$\gamma_0 = \phi^2\gamma_0 + \sigma^2 \;\Rightarrow\; \gamma_0 = \frac{\sigma^2}{1-\phi^2}, \qquad \gamma_k = \phi^k\gamma_0, \qquad \rho_k = \phi^k$$
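A quick numerical check of $\rho_k = \phi^k$, simulating an AR(1) series and comparing sample autocorrelations with the theoretical values (a sketch; the seed, $\phi$, and series length are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n = 0.8, 5000
a = rng.normal(size=n)
z = np.zeros(n)
for t in range(1, n):                 # z_t = phi*z_{t-1} + a_t  (mu = 0)
    z[t] = phi * z[t - 1] + a[t]

dev = z - z.mean()
denom = np.sum(dev**2)
for k in range(1, 6):
    r_k = np.sum(dev[k:] * dev[:n - k]) / denom
    print(k, round(r_k, 3), round(phi**k, 3))   # sample r_k vs theoretical phi^k
```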


Second-Order Autoregressive Process [AR(2)]

$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t, \qquad (1 - \phi_1 B - \phi_2 B^2)z_t = a_t$$

and

$$z_t = \frac{1}{1 - \phi_1 B - \phi_2 B^2}\,a_t = (1 + \psi_1 B + \psi_2 B^2 + \cdots)a_t$$

The coefficients $(\psi_1, \psi_2, \ldots)$ can be determined by equating coefficients of powers of $B$ in

$$1 = (1 - \phi_1 B - \phi_2 B^2)(1 + \psi_1 B + \psi_2 B^2 + \cdots):$$

$B$: $\psi_1 - \phi_1 = 0 \Rightarrow \psi_1 = \phi_1$; $\;B^2$: $\psi_2 - \phi_1\psi_1 - \phi_2 = 0 \Rightarrow \psi_2 = \phi_1^2 + \phi_2$; $\;B^3$: $\psi_3 - \phi_1\psi_2 - \phi_2\psi_1 = 0 \Rightarrow \psi_3 = \phi_1^3 + 2\phi_1\phi_2$; and in general

$$\psi_j = \phi_1\psi_{j-1} + \phi_2\psi_{j-2}, \qquad j \ge 2.$$
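The $\psi$-weight recursion is easy to compute directly. A small sketch (parameter values are illustrative) that reproduces $\psi_1 = \phi_1$, $\psi_2 = \phi_1^2 + \phi_2$, and $\psi_3 = \phi_1^3 + 2\phi_1\phi_2$:

```python
import numpy as np

def ar2_psi_weights(phi1, phi2, m):
    """psi_j from psi_j = phi1*psi_{j-1} + phi2*psi_{j-2}, with psi_0 = 1."""
    psi = np.zeros(m + 1)
    psi[0] = 1.0
    for j in range(1, m + 1):
        psi[j] = phi1 * psi[j - 1] + (phi2 * psi[j - 2] if j >= 2 else 0.0)
    return psi

print(ar2_psi_weights(0.5, 0.3, 5))
# psi_1 = 0.5, psi_2 = 0.5^2 + 0.3 = 0.55, psi_3 = 0.5^3 + 2*0.5*0.3 = 0.425, ...
```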


    Stationarity

$$\text{AR}(p):\quad (1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) = (1 - G_1 B)\cdots(1 - G_p B).$$

Stationarity corresponds to the conditions $|G_i^{-1}| > 1$, that is, $|G_i| < 1$, $i = 1, 2, \ldots, p$. In particular, for AR(2) stationarity corresponds to

$$\phi_1 + \phi_2 < 1, \qquad \phi_2 - \phi_1 < 1, \qquad |\phi_2| < 1.$$


Actually, the general solution of the difference equation

$$(1 - \phi_1 B - \phi_2 B^2)\rho_k = 0$$

is

$$\rho_k = \begin{cases} A_1 G_1^k + A_2 G_2^k, & G_1 \neq G_2 \text{ (distinct real roots)},\\[4pt] (A_1 + A_2 k)G^k, & G_1 = G_2 = G,\\[4pt] \{A_1\sin(\lambda k) + A_2\cos(\lambda k)\}(p^2 + q^2)^{k/2}, & G_{1,2} = p \pm qi,\ \ \lambda = \arcsin\!\big(q/(p^2 + q^2)^{1/2}\big). \end{cases}$$

So, when the process is stationary, the correlogram shows damped exponentials and/or damped sinusoidal waves.


    The general AR(p) model

$$\gamma_0 = \phi_1\gamma_1 + \cdots + \phi_p\gamma_p + \sigma^2 \;\Rightarrow\; \gamma_0 = \frac{\sigma^2}{1 - \phi_1\rho_1 - \cdots - \phi_p\rho_p}$$

$$\gamma_k = \phi_1\gamma_{k-1} + \cdots + \phi_p\gamma_{k-p}, \quad k > 0; \qquad \rho_k = \phi_1\rho_{k-1} + \cdots + \phi_p\rho_{k-p}.$$

Letting $k = 1, 2, \ldots, p$, we have the Yule-Walker equations:

$$\boldsymbol{\rho} = P\boldsymbol{\phi}, \qquad P = \begin{pmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{p-1}\\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{p-2}\\ \vdots & \vdots & \ddots & & \vdots\\ \rho_{p-1} & \rho_{p-2} & \rho_{p-3} & \cdots & 1 \end{pmatrix}$$

Method of moments:

$$\hat{\boldsymbol{\phi}} = \hat P^{-1}\hat{\boldsymbol{\rho}}$$
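A sketch of the method-of-moments computation, building the Toeplitz matrix $P$ from $(\rho_1, \ldots, \rho_p)$ and solving $\boldsymbol\rho = P\boldsymbol\phi$ (the AR(2) test values use $\rho_1 = \phi_1/(1-\phi_2)$ and $\rho_2 = \phi_1\rho_1 + \phi_2$, which follow from the recursions above):

```python
import numpy as np

def yule_walker_phi(rho):
    """Solve rho = P phi for phi, where rho = (rho_1, ..., rho_p) and
    P is the Toeplitz correlation matrix with rho_0 = 1 on the diagonal."""
    p = len(rho)
    r_full = np.concatenate(([1.0], rho))       # rho_0 = 1
    P = np.array([[r_full[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(P, rho)

phi1, phi2 = 0.5, 0.3
rho1 = phi1 / (1 - phi2)
rho2 = phi1 * rho1 + phi2
print(yule_walker_phi(np.array([rho1, rho2])))   # recovers [0.5, 0.3]
```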


    Partial Autocorrelations

When considering the correlation coefficient between two random variables, it is fairer to first remove the influence of other random variables related to them. In the AR(p) framework, consider fitting the model

$$z_t = \phi_{k1}z_{t-1} + \phi_{k2}z_{t-2} + \cdots + \phi_{kk}z_{t-k} + a_t$$

The last coefficient $\phi_{kk}$, viewed as a function of the lag $k$, is the partial autocorrelation function (PACF). Stationarity implies that $\phi(B)$ has all its roots outside the unit circle, and therefore $|\phi_{kk}| < 1$.


AR(p): Using the Yule-Walker Equations

$$\begin{pmatrix}\rho_1\\ \rho_2\\ \vdots\\ \rho_k\end{pmatrix} = \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{k-1}\\ \rho_1 & 1 & \cdots & \rho_{k-2}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{pmatrix}\begin{pmatrix}\phi_{k1}\\ \phi_{k2}\\ \vdots\\ \phi_{kk}\end{pmatrix}$$

By Cramer's rule, $\phi_{kk}$ is a ratio of two determinants: the numerator is the correlation matrix with its last column replaced by $(\rho_1, \ldots, \rho_k)'$, and the denominator is the determinant of the correlation matrix itself:

$$\phi_{kk} = \begin{vmatrix} 1 & \rho_1 & \cdots & \rho_{k-2} & \rho_1\\ \rho_1 & 1 & \cdots & \rho_{k-3} & \rho_2\\ \vdots & \vdots & & \vdots & \vdots\\ \rho_{k-1} & \rho_{k-2} & \cdots & \rho_1 & \rho_k \end{vmatrix} \Bigg/ \begin{vmatrix} 1 & \rho_1 & \cdots & \rho_{k-1}\\ \rho_1 & 1 & \cdots & \rho_{k-2}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{vmatrix}$$

The basic features of AR(p):

ACF: combinations of damped exponentials and/or sinusoidal waves; PACF: zero for lags larger than p.


Sample Partial Autocorrelation Function (SPACF)

1. Use the sample correlation coefficients $r_j$ in place of $\rho_j$ and solve the Yule-Walker equations for $\hat\phi_{kk}$. This scheme is impractical if the lag $k$ is large.

2. Recursive forms:

$$\hat\phi_{kk} = \frac{r_k - \sum_{j=1}^{k-1}\hat\phi_{k-1,j}\,r_{k-j}}{1 - \sum_{j=1}^{k-1}\hat\phi_{k-1,j}\,r_j}, \qquad \hat\phi_{k,j} = \hat\phi_{k-1,j} - \hat\phi_{kk}\hat\phi_{k-1,k-j}, \quad j = 1, 2, \ldots, k-1.$$

3. Standard errors of $\hat\phi_{kk}$: If the data follow an AR(p) process, then for lags greater than $p$ the variance of $\hat\phi_{kk}$ can be approximated by $\mathrm{Var}(\hat\phi_{kk}) \approx n^{-1}$ for $k > p$. Approximate standard errors for $\hat\phi_{kk}$ are thus given by $n^{-1/2}$.
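The recursive form in item 2 (the Durbin-Levinson recursion) translates directly into code. A sketch, with names of my own choosing, applied to the theoretical ACF of an AR(1), for which the PACF should cut off after lag 1:

```python
import numpy as np

def sample_pacf(r, max_lag):
    """PACF phi_hat_kk from the recursion above; r[k-1] holds r_k."""
    phi = np.zeros((max_lag + 1, max_lag + 1))  # phi[k, j] = phi_hat_{k,j}
    pacf = np.zeros(max_lag)
    phi[1, 1] = r[0]
    pacf[0] = r[0]
    for k in range(2, max_lag + 1):
        num = r[k - 1] - np.sum(phi[k - 1, 1:k] * r[k - 2::-1])
        den = 1.0 - np.sum(phi[k - 1, 1:k] * r[:k - 1])
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        pacf[k - 1] = phi[k, k]
    return pacf

phi_true = 0.6
r = phi_true ** np.arange(1, 7)          # rho_k = phi^k for an AR(1)
print(np.round(sample_pacf(r, 6), 4))    # (0.6, 0, 0, 0, 0, 0)
```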


    Moving Average MA(q) Model

$$z_t - \mu = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)a_t = \theta(B)a_t$$

Autocovariances and Autocorrelations

$$\gamma_0 = (1 + \theta_1^2 + \cdots + \theta_q^2)\sigma^2$$

$$\gamma_k = (-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q)\sigma^2, \quad k = 1, 2, \ldots, q; \qquad \gamma_k = 0, \quad k > q$$

$$\rho_k = \frac{-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q}{1 + \theta_1^2 + \cdots + \theta_q^2}, \quad k = 1, 2, \ldots, q; \qquad \rho_k = 0, \quad k > q$$

    Partial Autocorrelation Function for MA Process

The ACF of a moving average process of order $q$ cuts off after lag $q$ ($\rho_k = 0$, $k > q$). However, its PACF is infinite in extent (it tails off).

In general, the PACF of an MA process is dominated by combinations of damped exponentials and/or damped sine waves.
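A small sketch that evaluates the MA(q) autocorrelations from the formulas above (using the coefficient vector $(1, -\theta_1, \ldots, -\theta_q)$ so that $\sum_j c_j c_{j+k}$ reproduces $-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q$):

```python
import numpy as np

def ma_acf(theta, max_lag):
    """rho_k for the MA(q) model z_t = (1 - theta_1 B - ... - theta_q B^q) a_t."""
    c = np.concatenate(([1.0], -np.asarray(theta, dtype=float)))
    q = len(c) - 1
    gamma0 = np.sum(c**2)                       # (1 + theta_1^2 + ... + theta_q^2)
    rho = []
    for k in range(1, max_lag + 1):
        rho.append(np.sum(c[:q - k + 1] * c[k:]) / gamma0 if k <= q else 0.0)
    return np.array(rho)

print(np.round(ma_acf([0.5], 3), 4))   # MA(1): rho_1 = -0.5/1.25 = -0.4, then 0
```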


Example: MA(1) Model: $z_t - \mu = a_t - \theta a_{t-1}$

$$\gamma_0 = (1 + \theta^2)\sigma^2, \qquad \gamma_1 = -\theta\sigma^2, \qquad \rho_1 = \frac{-\theta}{1 + \theta^2}, \quad |\rho_1| \le \tfrac{1}{2}$$

Invertibility condition: since both $\theta$ and $1/\theta$ are roots of the equation

$$\rho_1\theta^2 + \theta + \rho_1 = 0,$$

the two values lead to the same ACF, so we restrict ourselves to the situation with $|\theta| < 1$.


    Autoregressive Moving Average (ARMA(p,q)) Processes

$$(1 - \phi_1 B - \cdots - \phi_p B^p)(z_t - \mu) = (1 - \theta_1 B - \cdots - \theta_q B^q)a_t$$

or

$$z_t - \mu = \phi_1(z_{t-1} - \mu) + \cdots + \phi_p(z_{t-p} - \mu) + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$$

For $k > q - p$, the ACF is dominated by the AR part; for $k > p - q$, the PACF is dominated by the MA part.

Summary: The ACF and PACF of a mixed ARMA(p,q) process are both infinite in extent and tail off (die down) as the lag $k$ increases. Eventually (for $k > q - p$) the ACF is determined by the autoregressive part of the model, and the PACF is eventually (for $k > p - q$) determined by the moving average part of the model.


    ARMA(1,1) Process

$$(1 - \phi B)(z_t - \mu) = (1 - \theta B)a_t$$

ACF:

$$\gamma_0 = \frac{1 + \theta^2 - 2\phi\theta}{1 - \phi^2}\,\sigma^2, \qquad \gamma_1 = \frac{(1 - \phi\theta)(\phi - \theta)}{1 - \phi^2}\,\sigma^2,$$

$$\rho_1 = \frac{(1 - \phi\theta)(\phi - \theta)}{1 + \theta^2 - 2\phi\theta}, \qquad \rho_k = \phi\rho_{k-1}, \quad k > 1.$$

PACF: $\phi_{11} = \rho_1$; thereafter it tails off, similar to the PACF of an MA(1).


4.3 Nonstationary Processes

Nonstationarity, Differencing, and Transformations

M1: $z_t = \beta_0 + \beta_1 t + a_t$
M2: $(1 - B)z_t = \beta_1 + a_t$, or $z_t = z_{t-1} + \beta_1 + a_t$.
M3: $(1 - B)^2 z_t = a_t$, or $z_t = 2z_{t-1} - z_{t-2} + a_t$.

Nonstationary Variance

Examples: 1. Growth rates. 2. Demand for repair parts.


    An Alternative Argument for Differencing

Let us assume that the data $z_t$ are described by a nonstationary level

$$\mu_t = E(z_t) = \beta_0 + \beta_1 t$$

and that we wish to entertain an ARMA model for a function of $z_t$ that has a level independent of time. Consider the model $\phi_{p+1}(B)z_t = \theta_0 + \theta(B)a_t$, where $\theta_0$ represents the level of $\phi_{p+1}(B)z_t$. To achieve a $\theta_0$ that is constant, independent of time, we need

$$\phi_{p+1}(B) = (1 - B)\phi(B)$$

and then

$$\theta_0 = \phi_{p+1}(B)(\beta_0 + \beta_1 t) = \phi(B)(1 - B)(\beta_0 + \beta_1 t) = \phi(B)\beta_1$$


If $\mu_t = \beta_0 + \beta_1 t + \cdots + \beta_d t^d$, we have to choose the autoregressive operator in $\phi_{p+d}(B)z_t = \theta_0 + \theta(B)a_t$ such that $\phi_{p+d}(B) = (1 - B)^d\phi(B)$, and then

$$\theta_0 = \phi_{p+d}(B)(\beta_0 + \beta_1 t + \cdots + \beta_d t^d) = \phi(B)(1 - B)^d(\beta_0 + \beta_1 t + \cdots + \beta_d t^d) = \phi(B)\beta_d\,d!$$


    Autoregressive Integrated Moving Average (ARIMA) Models

Transform a nonstationary series $z_t$ into a stationary one by considering the relevant differences

$$w_t = \nabla^d z_t = (1 - B)^d z_t.$$

Then use an ARMA model to describe $w_t$. The corresponding model can be written as

$$\phi(B)w_t = \theta_0 + \theta(B)a_t$$

or

$$\phi(B)(1 - B)^d z_t = \theta_0 + \theta(B)a_t \qquad \text{[ARIMA(p,d,q)]}$$

The orders p, d, q are usually small. The model is called integrated, since $z_t$ can be thought of as the summation (integration) of the stationary series $w_t$. For example, if $d = 1$,

$$z_t = (1 - B)^{-1}w_t = \sum_{k=-\infty}^{t} w_k$$
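Differencing and its inverse, summation, are one-liners in numpy. A sketch showing that integrating $w_t = (1-B)z_t$ (given the starting value $z_1$) recovers $z_t$:

```python
import numpy as np

rng = np.random.default_rng(2)
z = np.cumsum(rng.normal(size=100)) + 5.0       # a random walk: ARIMA(0,1,0)

w = np.diff(z)                                  # w_t = (1 - B) z_t, stationary
z_back = z[0] + np.concatenate(([0.0], np.cumsum(w)))  # integrate w back to z

print(np.allclose(z, z_back))                   # True: z is the "integral" of w
```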


    Choice of the proper degree of differencing

Plot of the series.

Examination of the SACF. A stationary AR(p) process requires that all $G_i$ in

$$(1 - \phi_1 B - \cdots - \phi_p B^p) = (1 - G_1 B)\cdots(1 - G_p B)$$

are such that $|G_i| < 1$ ($i = 1, 2, \ldots, p$). Now assume one of them, say $G_1$, approaches 1; that is, $G_1 = 1 - \delta$, where $\delta$ is a small positive number. Then the autocorrelations

$$\rho_k = A_1 G_1^k + \cdots + A_p G_p^k \approx A_1 G_1^k.$$

So the exponential decay will be slow and almost linear [i.e., $A_1 G_1^k = A_1(1 - \delta)^k \approx A_1(1 - \delta k)$].


Consider the ARMA(1,1) model

$$(1 - \phi B)z_t = (1 - \theta B)a_t$$

The ACF of this process is given by

$$\rho_1 = \frac{(1 - \phi\theta)(\phi - \theta)}{1 + \theta^2 - 2\phi\theta}, \qquad \rho_k = \phi\rho_{k-1}$$

If we let $\phi \to 1$, then $\rho_k \to 1$ for any finite $k > 0$. Again, the autocorrelations decay very slowly when $\phi$ is near 1.

It should be pointed out that the slow decay in the SACF can start at values of $r_1$ considerably smaller than 1.


    Dangers of Overdifferencing:

    Although further differences of a stationary series will again be stationary,

    overdifferencing can lead to serious difficulties.

    Effect of overdifferencing on the autocorrelation structure.

Consider the stationary MA(1) process $z_t = (1 - \theta B)a_t$. The autocorrelation function of this process is given by

$$\rho_k = \begin{cases} \dfrac{-\theta}{1 + \theta^2}, & k = 1\\[6pt] 0, & k > 1 \end{cases}$$

The first difference of the process is

$$(1 - B)z_t = \big(1 - (1 + \theta)B + \theta B^2\big)a_t$$

and the autocorrelations of this difference are given by

$$\rho_k = \begin{cases} \dfrac{-(1 + \theta)^2}{2(1 + \theta + \theta^2)}, & k = 1\\[6pt] \dfrac{\theta}{2(1 + \theta + \theta^2)}, & k = 2\\[6pt] 0, & k > 2 \end{cases}$$


    Effect of overdifferencing on variance.

Consider the same MA(1) process. Its variance is given by

$$\gamma_0(z) = (1 + \theta^2)\sigma^2$$

The variance of the overdifferenced series $w_t = (1 - B)z_t$, which follows an MA(2) process, is given by

$$\gamma_0(w) = 2(1 + \theta + \theta^2)\sigma^2$$

Hence

$$\gamma_0(w) - \gamma_0(z) = (1 + \theta)^2\sigma^2 > 0$$
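A simulation sketch of this variance inflation (the choice $\theta = 0.5$ and the seed are arbitrary): the MA(1) variance is $(1+\theta^2)\sigma^2 = 1.25$ while the overdifferenced series has variance $2(1+\theta+\theta^2)\sigma^2 = 3.5$:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma = 0.5, 1.0
a = rng.normal(scale=sigma, size=200_001)
z = a[1:] - theta * a[:-1]              # MA(1): z_t = a_t - theta*a_{t-1}
w = np.diff(z)                          # overdifferenced series

print(round(z.var(), 3), 1 + theta**2)                  # ~1.25
print(round(w.var(), 3), 2 * (1 + theta + theta**2))    # ~3.5, larger
```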


Consider the stationary AR(1) process $(1 - \phi B)z_t = a_t$ with variance

$$\gamma_0(z) = \frac{\sigma^2}{1 - \phi^2}$$

The first difference $w_t$ follows the ARMA(1,1) process $(1 - \phi B)w_t = (1 - B)a_t$. The variance of this process is given by

$$\gamma_0(w) = \frac{2(1 - \phi)\sigma^2}{1 - \phi^2}$$

and

$$\gamma_0(w) - \gamma_0(z) = \frac{(1 - 2\phi)\sigma^2}{1 - \phi^2}$$

Hence when $\phi < \tfrac{1}{2}$, overdifferencing will increase the variance.


    Regression and ARIMA Models

ARIMA(0,1,1) model: $(1 - B)z_t = (1 - \theta B)a_t$

Since $\psi(B) = (1 - \theta B)/(1 - B) = 1 + (1 - \theta)B + (1 - \theta)B^2 + \cdots$, we can relate the variable $z_{t+l}$ at time $t + l$ to a fixed origin $t$ and express it as

$$z_{t+l} = \beta_0^{(t)} + e_t(l)$$

where $e_t(l) = a_{t+l} + (1 - \theta)[a_{t+l-1} + \cdots + a_{t+1}]$ is a function of the random shocks that enter the system after time $t$, and $\beta_0^{(t)} = (1 - \theta)\sum_{j=-\infty}^{t} a_j$ depends on all the shocks (observations) up to and including time $t$.

The level $\beta_0^{(t)}$ can be updated according to

$$\beta_0^{(t+1)} = \beta_0^{(t)} + (1 - \theta)a_{t+1}$$

In the special case when $\theta = 1$, $e_t(l) = a_{t+l}$ and $\beta_0^{(t+1)} = \beta_0^{(t)} = \beta_0$. In this case, the model is equivalent to the constant-mean model $z_{t+l} = \beta_0 + a_{t+l}$.


ARIMA(0,2,2) model: $(1 - B)^2 z_{t+l} = (1 - \theta_1 B - \theta_2 B^2)a_{t+l}$

Using an argument similar to that followed for the ARIMA(0,1,1) model, the ARIMA(0,2,2) model can be written as

$$z_{t+l} = \beta_0^{(t)} + \beta_1^{(t)} l + e_t(l)$$

where the $e_t(l)$ are correlated and given by

$$e_t(l) = a_{t+l} + \psi_1 a_{t+l-1} + \cdots + \psi_{l-1}a_{t+1}$$

where the $\psi$ weights may be obtained from

$$(1 + \psi_1 B + \psi_2 B^2 + \cdots)(1 - B)^2 = 1 - \theta_1 B - \theta_2 B^2$$

These weights are given by $\psi_j = (1 + \theta_2) + j(1 - \theta_1 - \theta_2)$, $j \ge 1$.


Furthermore, the trend parameters $\beta_0^{(t)}$ and $\beta_1^{(t)}$ adapt with time according to

$$\beta_0^{(t+1)} = \beta_0^{(t)} + \beta_1^{(t)} + (1 + \theta_2)a_{t+1}$$

$$\beta_1^{(t+1)} = \beta_1^{(t)} + (1 - \theta_1 - \theta_2)a_{t+1}$$

Again, the model above can be thought of as a generalization of the deterministic trend regression model

$$z_{t+l} = \beta_0 + \beta_1 l + a_{t+l}$$

In the special case when $\theta_1 \to 2$ and $\theta_2 \to -1$, the two models above are equivalent.


    4.4 Forecasting

ARIMA model: $\phi(B)(1 - B)^d z_t = \theta(B)a_t$

Autoregressive representation:

$$z_t = \pi_1 z_{t-1} + \pi_2 z_{t-2} + \cdots + a_t$$

where the $\pi$ weights are given by

$$1 - \pi_1 B - \pi_2 B^2 - \cdots = \frac{\phi(B)(1 - B)^d}{\theta(B)}$$

The $l$-step-ahead forecast from origin $n$ can be written as a linear combination of the current and past observations,

$$z_n(l) = \alpha_0^* z_n + \alpha_1^* z_{n-1} + \alpha_2^* z_{n-2} + \cdots$$

or, equivalently, of the current and past shocks,

$$z_n(l) = \alpha_0 a_n + \alpha_1 a_{n-1} + \alpha_2 a_{n-2} + \cdots$$


Minimum mean square error (MMSE) forecasts: the mean square error $E[z_{n+l} - z_n(l)]^2$ is minimized.

Using the $\psi$ weights in $\psi(B) = [\phi(B)(1 - B)^d]^{-1}\theta(B)$, write the model in its linear filter representation

$$z_{n+l} = a_{n+l} + \psi_1 a_{n+l-1} + \cdots + \psi_{l-1}a_{n+1} + \psi_l a_n + \psi_{l+1}a_{n-1} + \cdots$$

Then the mean square error is expressed as

$$E[z_{n+l} - z_n(l)]^2 = E[a_{n+l} + \psi_1 a_{n+l-1} + \cdots + \psi_{l-1}a_{n+1} + (\psi_l - \alpha_0)a_n + (\psi_{l+1} - \alpha_1)a_{n-1} + \cdots]^2$$

$$= (1 + \psi_1^2 + \cdots + \psi_{l-1}^2)\sigma^2 + \sum_{j=0}^{\infty}(\psi_{l+j} - \alpha_j)^2\sigma^2.$$


The mean square error is minimized by choosing $\alpha_j = \psi_{l+j}$. The minimum mean square error (MMSE) forecast is therefore

$$z_n(l) = \psi_l a_n + \psi_{l+1}a_{n-1} + \cdots$$

The forecast error is given by

$$e_n(l) = z_{n+l} - z_n(l) = a_{n+l} + \psi_1 a_{n+l-1} + \cdots + \psi_{l-1}a_{n+1}$$

and its variance by

$$\mathrm{Var}[e_n(l)] = \sigma^2(1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_{l-1}^2)$$

Since

$$E(a_{n+j}|z_n, z_{n-1}, \ldots) = \begin{cases} a_{n+j}, & j \le 0\\ 0, & j > 0 \end{cases}$$

we have

$$E(z_{n+l}|z_n, z_{n-1}, \ldots) = E[(a_{n+l} + \psi_1 a_{n+l-1} + \cdots + \psi_{l-1}a_{n+1} + \psi_l a_n + \psi_{l+1}a_{n-1} + \cdots)|z_n, z_{n-1}, \ldots] = \psi_l a_n + \psi_{l+1}a_{n-1} + \cdots$$

That is, the MMSE forecast equals the conditional expectation of $z_{n+l}$ given the observations through time $n$.


    Examples:

AR(1) process: $z_t - \mu = \phi(z_{t-1} - \mu) + a_t$

AR(2) process: $z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t$

ARIMA(0,1,1) process: $z_t = z_{t-1} + a_t - \theta a_{t-1}$

ARIMA(1,1,1) process: $(1 - \phi B)(1 - B)z_t = \theta_0 + (1 - \theta B)a_t$, or
$z_t = \theta_0 + (1 + \phi)z_{t-1} - \phi z_{t-2} + a_t - \theta a_{t-1}$

ARIMA(0,2,2) process: $(1 - B)^2 z_t = (1 - \theta_1 B - \theta_2 B^2)a_t$, or
$z_t = 2z_{t-1} - z_{t-2} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2}$
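For the AR(1) case these formulas specialize to $z_n(l) = \mu + \phi^l(z_n - \mu)$ and $\psi_j = \phi^j$. A sketch (the numbers echo the yield-data example later in this chapter, up to small rounding differences):

```python
import numpy as np

def ar1_forecast(z_n, mu, phi, sigma2, max_l):
    """AR(1) MMSE forecasts z_n(l) = mu + phi^l (z_n - mu) and
    Var[e_n(l)] = sigma2 * (1 + phi^2 + ... + phi^(2(l-1)))."""
    out = []
    for l in range(1, max_l + 1):
        fc = mu + phi**l * (z_n - mu)
        var = sigma2 * np.sum(phi ** (2 * np.arange(l)))
        out.append((fc, var))
    return out

for l, (fc, var) in enumerate(ar1_forecast(0.49, 0.97, 0.85, 0.024, 3), start=1):
    print(l, round(fc, 2), round(var, 3), round(1.96 * var**0.5, 2))
```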


    Prediction Limits

It is usually not sufficient to calculate point forecasts alone. The uncertainty associated with a forecast should be expressed, and prediction limits should always be computed.

It is possible, but not very easy, to calculate a simultaneous prediction region for a set of future observations. Therefore, we give probability limits only for individual forecasts.

Assuming that the $a_t$ (and hence the $z$'s) are normally distributed, we can calculate $100(1 - \alpha)$ percent prediction limits from

$$z_n(l) \pm u_{\alpha/2}\{\mathrm{Var}[e_n(l)]\}^{1/2}$$

where $u_{\alpha/2}$ is the $100(1 - \alpha/2)$ percentage point of the standard normal distribution.


Because $z_n(l)$ and $\mathrm{Var}[e_n(l)]$ are not known exactly, they need to be estimated. Hence the estimated prediction limits are given by

$$\hat z_n(l) \pm u_{\alpha/2}\{\hat V[e_n(l)]\}^{1/2}$$

Consider the AR(1) process for the yield data. We found $\hat z_{156}(1) = 0.56$, $\hat z_{156}(2) = 0.62$, $\hat z_{156}(3) = 0.68$, $\hat V[e_{156}(1)] = 0.024$, $\hat V[e_{156}(2)] = 0.041$, $\hat V[e_{156}(3)] = 0.054$. Thus the 95 percent prediction limits for $z_{156+l}$, $l = 1, 2, 3$, are given by

$z_{157}$: $\hat z_{156}(1) \pm 1.96\{\hat V[e_{156}(1)]\}^{1/2} = .56 \pm 1.96(.15) = .56 \pm .29$

$z_{158}$: $\hat z_{156}(2) \pm 1.96\{\hat V[e_{156}(2)]\}^{1/2} = .62 \pm 1.96(.20) = .62 \pm .39$

$z_{159}$: $\hat z_{156}(3) \pm 1.96\{\hat V[e_{156}(3)]\}^{1/2} = .68 \pm 1.96(.23) = .68 \pm .45$


    Forecast Updating

Suppose we are at time $n$ and we are predicting $l + 1$ steps ahead. The forecast is given by

$$z_n(l+1) = \psi_{l+1}a_n + \psi_{l+2}a_{n-1} + \cdots$$

After $z_{n+1}$ has become available, we wish to update our prediction of $z_{n+l+1}$. Since $z_{n+1}(l)$ can be written as

$$z_{n+1}(l) = \psi_l a_{n+1} + \psi_{l+1}a_n + \psi_{l+2}a_{n-1} + \cdots$$

we find that

$$z_{n+1}(l) = z_n(l+1) + \psi_l a_{n+1} = z_n(l+1) + \psi_l[z_{n+1} - z_n(1)]$$

The updated forecast is a linear combination of the previous forecast of $z_{n+l+1}$ made at time $n$ and the most recent one-step-ahead forecast error $e_n(1) = z_{n+1} - z_n(1) = a_{n+1}$.


Use the yield data example shown before. We already obtained the forecasts for $z_{157}, z_{158}, z_{159}$ from time origin $n = 156$. Suppose now that $z_{157} = 0.74$ is observed, and we need to update the forecasts for $z_{158}$ and $z_{159}$.

Since the model under consideration is AR(1) with parameter estimate $\hat\phi = 0.85$, the estimated $\psi$ weights are given by $\hat\psi_j = \hat\phi^j = 0.85^j$.

Then the updated forecasts are given by

$z_{158}$: $\hat z_{157}(1) = \hat z_{156}(2) + \hat\psi_1[z_{157} - \hat z_{156}(1)] = 0.62 + 0.85(0.74 - 0.56) = 0.77$

$z_{159}$: $\hat z_{157}(2) = \hat z_{156}(3) + \hat\psi_2[z_{157} - \hat z_{156}(1)] = 0.68 + 0.85^2(0.74 - 0.56) = 0.81$

The updated 95 percent prediction limits for $z_{158}$ and $z_{159}$ become

$z_{158}$: $\hat z_{157}(1) \pm 1.96\{\hat V[e_{157}(1)]\}^{1/2} = .77 \pm .29$

$z_{159}$: $\hat z_{157}(2) \pm 1.96\{\hat V[e_{157}(2)]\}^{1/2} = .81 \pm .39$
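A sketch reproducing the updating arithmetic above (the dictionary of origin-156 forecasts is just a convenient container):

```python
# Updating formula: z_{n+1}(l) = z_n(l+1) + psi_l * [z_{n+1} - z_n(1)],
# with psi_j = phi^j for the AR(1) model.
phi = 0.85
z156_fc = {1: 0.56, 2: 0.62, 3: 0.68}     # forecasts made at origin n = 156
z157 = 0.74                                # newly observed value
err = z157 - z156_fc[1]                    # one-step-ahead forecast error a_157

z157_fc1 = z156_fc[2] + phi**1 * err       # updated forecast of z_158
z157_fc2 = z156_fc[3] + phi**2 * err       # updated forecast of z_159
print(round(z157_fc1, 2), round(z157_fc2, 2))   # 0.77, 0.81
```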


    4.5 Model Specification

Step 1. Variance-stabilizing transformations. If the variance of the series changes with the level, then a logarithmic transformation will often stabilize the variance. If the logarithmic transformation still does not stabilize the variance, a more general approach employing the class of power transformations can be adopted.

Step 2. Degree of differencing. If the series or its appropriate transformation is not mean stationary, then the proper degree of differencing has to be determined. For the series and its differences, we can examine

1. Plots of the time series
2. Plots of the SACF
3. Sample variances of the successive differences


4.6 Model Estimation

After specifying the form of the model, it is necessary to estimate its parameters. Let $(z_1, z_2, \ldots, z_N)$ represent the vector of the original observations and $w = (w_1, \ldots, w_n)'$ the vector of the $n = N - d$ stationary differences.

Maximum Likelihood Estimates

The ARIMA model can be written as

$$a_t = \theta_1 a_{t-1} + \cdots + \theta_q a_{t-q} + w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p}$$

The joint probability density function of $a = (a_1, a_2, \ldots, a_n)'$ is given by

$$p(a\,|\,\phi, \theta, \sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{n}a_t^2\right)$$


    Exact likelihood function for three important models

1. AR(1): $a_t = w_t - \phi w_{t-1}$

$$L(\phi, \sigma^2\,|\,w) = (2\pi\sigma^2)^{-n/2}(1 - \phi^2)^{1/2}\exp\left\{-\frac{1}{2\sigma^2}\left[(1 - \phi^2)w_1^2 + \sum_{t=2}^{n}(w_t - \phi w_{t-1})^2\right]\right\}$$

2. MA(1): $a_t = \theta a_{t-1} + w_t$

$$L(\theta, \sigma^2\,|\,w) = (2\pi\sigma^2)^{-n/2}\left[\frac{1 - \theta^2}{1 - \theta^{2(n+1)}}\right]^{1/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{t=0}^{n}E^2(a_t|w)\right\}$$

3. ARMA(1,1): $a_t = \theta a_{t-1} + w_t - \phi w_{t-1}$

$$L(\phi, \theta, \sigma^2\,|\,w) = (2\pi\sigma^2)^{-n/2}|Z'Z|^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}S(\phi, \theta)\right\}$$

where

$$|Z'Z| = \frac{(1 - \theta^2)(1 - \phi^2) + (1 - \theta^{2n})(\phi - \theta)^2}{(1 - \theta^2)(1 - \phi^2)}$$

and

$$S(\phi, \theta) = E^2(a_0|w) + \frac{1 - \phi^2}{(\phi - \theta)^2}\big[E(w_0|w) - E(a_0|w)\big]^2 + \sum_{t=1}^{n}E^2(a_t|w)$$


Maximum likelihood estimates (MLE) of the parameters $(\phi, \theta, \sigma^2)$ of the ARIMA model can be obtained by maximizing the likelihood function. In general, closed-form solutions cannot be found. However, various algorithms are available to compute the MLEs or close approximations.

1. AR(1): It is difficult to get a closed-form solution, and iterative methods have to be employed.

2. MA(1): We need to calculate $E(a_t|w)$, $t = 1, 2, \ldots, n$. These can be calculated using the recurrence relation

$$E(a_t|w) = \theta E(a_{t-1}|w) + w_t, \qquad t = 1, 2, \ldots, n$$

provided $E(a_0|w)$ is available. The difficulties in obtaining the MLE are thus twofold: (1) calculating $E(a_0|w)$, and (2) the likelihood function is a complicated function of $\theta$.

3. ARMA(1,1): Similarly to MA(1), to calculate the likelihood function we need $E(a_0|w)$ and $E(w_0|w)$. From these we can calculate

$$E(a_t|w) = \theta E(a_{t-1}|w) + w_t - \phi E(w_{t-1}|w)$$


In summary, there are two major difficulties with the maximum likelihood estimation approach.

1. The presence of the function $g_1(\phi, \theta, \sigma^2)$ in the likelihood function makes the maximization difficult.

2. The approach requires the calculation of the conditional expectations $E(u|w)$, where $u = (u_{1-p-q}, \ldots, u_{-1}, u_0)'$. This in turn requires the calculation of

$$E(a_t|w),\ t = 1 - q, \ldots, -1, 0 \qquad\text{and}\qquad E(w_t|w),\ t = 1 - p, \ldots, -1, 0$$


Evaluation of the Initial Expectations

There are two methods available for the evaluation of $E(u_t|w)$ ($t \le 0$): the method of least squares and backforecasting.

Method of least squares.

For the ARMA(1,1) model, and in general, $E(u|w) = \hat u$, where $\hat u$ is the least squares estimator of $u$ in

$$Lw = Zu + a$$

where $L$ and $Z$ are $(n + p + q) \times n$ and $(n + p + q) \times (p + q)$ matrices involving the parameters $\phi$ and $\theta$ (for details see Appendix 5 in the textbook).

For given values of $(\phi, \theta)$, we can calculate $\hat u$, and hence $S(\phi, \theta)$. Then we can search for a minimum value of $S(\phi, \theta)$ in the space of $(\phi, \theta)$.


    Method of backforecasting.

The MMSE forecast of $z_{n+l}$ made at time $n$ is given by the conditional expectation $E(z_{n+l}|z_n, z_{n-1}, \ldots)$. This result may be used to compute the unknown quantities $E(a_t|w)$ ($t = 1 - q, \ldots, -1, 0$) and $E(w_t|w)$ ($t = 1 - p, \ldots, -1, 0$).

The MMSE forecasts of $a_t$ and $w_t$ ($t \le 0$) can be found by considering the backward model

$$e_t = \theta_1 e_{t+1} + \theta_2 e_{t+2} + \cdots + \theta_q e_{t+q} + w_t - \phi_1 w_{t+1} - \cdots - \phi_p w_{t+p}.$$

The ACF of this backward model is the same as that of the original model. Furthermore, $e_t$ is a white-noise sequence with $\mathrm{Var}(e_t) = \sigma^2$.

This method of evaluating $E(a_t|w)$ and $E(w_t|w)$ is called backforecasting (or backcasting).


Consider the method of backforecasting for a simple MA(1) process $a_t = \theta a_{t-1} + w_t$.

This process can be expressed in the backward form as

$$e_t = \theta e_{t+1} + w_t$$

When the process is run backwards, $e_0, e_{-1}, \ldots$ are random shocks that are future to the observations $w_n, w_{n-1}, \ldots, w_1$. Hence,

$$E(e_t|w) = 0, \qquad t \le 0$$

Furthermore, since the MA(1) process has a memory of only one period, the correlation between $a_{-1}$ and $w_1, w_2, \ldots, w_n$ is zero, and

$$E(a_t|w) = 0, \qquad t \le -1$$


With the value of $E(w_0|w)$, as before we can compute $E(a_t|w)$ ($t = 0, 1, 2, \ldots, n$):

$$E(a_0|w) = \theta E(a_{-1}|w) + E(w_0|w) = E(w_0|w)$$

$$E(a_1|w) = \theta E(a_0|w) + w_1$$

$$\vdots$$

$$E(a_n|w) = \theta E(a_{n-1}|w) + w_n$$

The sum of squares $S(\theta)$ for any $\theta$ can then be obtained from

$$S(\theta) = \sum_{t=0}^{n}E^2(a_t|w),$$

and its minimum can be found.
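A sketch of the whole backforecasting computation for the MA(1), under the assumptions above: since $E(e_0|w) = 0$ and $w_0 = e_0 - \theta e_1$, the backcast of $w_0$ is $E(w_0|w) = -\theta E(e_1|w)$, with $E(e_1|w)$ obtained from the backward recursion $E(e_t|w) = \theta E(e_{t+1}|w) + w_t$ started at $E(e_{n+1}|w) = 0$. A grid search then minimizes $S(\theta)$ (grid and simulation settings are arbitrary):

```python
import numpy as np

def backcast_sum_of_squares(w, theta):
    """Unconditional sum of squares S(theta) for w_t = a_t - theta*a_{t-1}."""
    n = len(w)
    e = 0.0
    for t in range(n - 1, -1, -1):       # backward: E(e_t|w) = theta*E(e_{t+1}|w) + w_t
        e = theta * e + w[t]             # after the loop, e = E(e_1|w)
    w0 = -theta * e                      # backcast E(w_0|w)
    a = w0                               # E(a_0|w) = E(w_0|w) since E(a_{-1}|w) = 0
    s = a**2
    for t in range(n):                   # forward: E(a_t|w) = theta*E(a_{t-1}|w) + w_t
        a = theta * a + w[t]
        s += a**2
    return s

rng = np.random.default_rng(4)
shocks = rng.normal(size=301)
w = shocks[1:] - 0.6 * shocks[:-1]       # simulate MA(1) with theta = 0.6
grid = np.linspace(-0.95, 0.95, 39)
print(round(grid[np.argmin([backcast_sum_of_squares(w, th) for th in grid])], 3))
```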


    Conditional Least Squares Estimates

Computationally simpler estimates can be obtained by minimizing the conditional sum of squares

$$S_c(\phi, \theta) = \sum_{t=p+1}^{n}a_t^2,$$

where the starting values $a_p, a_{p-1}, \ldots, a_{p+1-q}$ in the calculation of the appropriate $a_t$'s are set equal to zero. The resulting estimates are called conditional least squares estimates (CLSE).


In the AR(p) case, the CLSE is obtained by minimizing

$$S_c(\phi) = \sum_{t=p+1}^{n}a_t^2 = \sum_{t=p+1}^{n}(w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p})^2$$

which leads to the least squares estimate

$$\hat\phi_c = (X'X)^{-1}X'y$$

where $y = (w_{p+1}, w_{p+2}, \ldots, w_n)'$ and

$$X = \begin{pmatrix} w_p & w_{p-1} & \cdots & w_1\\ w_{p+1} & w_p & \cdots & w_2\\ \vdots & \vdots & & \vdots\\ w_{n-1} & w_{n-2} & \cdots & w_{n-p} \end{pmatrix}$$

In the special case of the AR(1) process, this reduces to $\hat\phi_c = \sum_{t=2}^{n}w_{t-1}w_t \big/ \sum_{t=2}^{n}w_{t-1}^2$.
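A sketch of the AR(p) conditional least squares computation via the regression formulation above (simulation parameters are arbitrary):

```python
import numpy as np

def ar_clse(w, p):
    """Conditional least squares estimate phi_hat_c = (X'X)^{-1} X'y for AR(p)."""
    w = np.asarray(w, dtype=float)
    y = w[p:]                                            # (w_{p+1}, ..., w_n)'
    X = np.column_stack([w[p - j: len(w) - j] for j in range(1, p + 1)])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(5)
n, phi1, phi2 = 2000, 0.5, 0.3
w = np.zeros(n)
a = rng.normal(size=n)
for t in range(2, n):
    w[t] = phi1 * w[t - 1] + phi2 * w[t - 2] + a[t]
print(np.round(ar_clse(w, 2), 3))    # typically close to [0.5, 0.3]
```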


    Nonlinear Estimation

If the model contains moving average terms, then the conditional and unconditional sums of squares are not quadratic functions of the parameters $(\phi, \theta)$. This is because $E(u_t|w)$ in $S(\phi, \theta)$ is nonlinear in the parameters. Hence nonlinear least squares procedures must be used to minimize $S(\phi, \theta)$ or $S_c(\phi, \theta)$.

We now illustrate the nonlinear estimation procedure and derive the unconditional least squares estimates of the parameters in the ARMA(1,1) model

$$a_t = \theta a_{t-1} + w_t - \phi w_{t-1}$$


Step 1. Computation at the starting point. For given $w$ and a starting point $(\phi, \theta) = (\phi_0, \theta_0)$, using backforecasting or the method of least squares we can get

$$u_{-1,0} = E(a_0|w, \phi_0, \theta_0)$$

and

$$u_{0,0} = \frac{(1 - \phi_0^2)^{1/2}}{\phi_0 - \theta_0}\big[E(w_0|w, \phi_0, \theta_0) - E(a_0|w, \phi_0, \theta_0)\big]$$

The remaining elements in $u_0 = (u_{-1,0}, u_{0,0}, u_{1,0}, \ldots, u_{n,0})'$ can be obtained from $u_{t0} = E(a_t|w, \phi_0, \theta_0)$ for $t = 1, 2, \ldots, n$.

Step 2. Linearization. Regard $u_t = E(a_t|w, \phi, \theta)$ ($t = -1, 0, 1, \ldots, n$) as a function of $\beta = (\phi, \theta)'$ and linearize this function at the starting point $\beta_0 = (\phi_0, \theta_0)'$ in a first-order Taylor series expansion:

$$u_t = u_{t0} + (\phi - \phi_0)\frac{\partial u_t}{\partial\phi}\bigg|_{\beta=\beta_0} + (\theta - \theta_0)\frac{\partial u_t}{\partial\theta}\bigg|_{\beta=\beta_0} = u_{t0} - x_{1,t}(\phi - \phi_0) - x_{2,t}(\theta - \theta_0)$$

where

$$x_{1,t} = -\frac{\partial u_t}{\partial\phi}\bigg|_{\beta=\beta_0} \qquad\text{and}\qquad x_{2,t} = -\frac{\partial u_t}{\partial\theta}\bigg|_{\beta=\beta_0}$$


So we have

$$u_{t0} = (x_{1,t},\, x_{2,t})\begin{pmatrix}\phi - \phi_0\\ \theta - \theta_0\end{pmatrix} + u_t$$

Considering this equation for $t = -1, 0, \ldots, n$, we obtain

$$u_0 = X(\beta - \beta_0) + u$$

where $u = (u_{-1}, u_0, \ldots, u_n)'$ and

$$X = \begin{pmatrix} x_{1,-1} & x_{2,-1}\\ x_{1,0} & x_{2,0}\\ \vdots & \vdots\\ x_{1,n} & x_{2,n} \end{pmatrix}$$

The least squares estimate of $\delta = \beta - \beta_0$ may be obtained from $\hat\delta = (X'X)^{-1}X'u_0$. Hence we obtain a modified estimate of $\beta$ as

$$\beta_1 = \beta_0 + \hat\delta$$


Step 3. Repeat Steps 1 and 2 with $\beta_1$ as the new point of expansion in the Taylor series, and continue the process until convergence occurs. Convergence is usually determined by one of the following criteria:

1. The relative change in the sum of squares $\sum_t u_t^2$ from two consecutive iterations is less than a prespecified small value.

2. The largest relative change in the parameter estimates of two consecutive iterations is less than a prespecified small value.

When convergence occurs, we have an estimate based on local linear least squares, and this is taken as the unconditional least squares estimate of $\beta$. The conditional least squares estimate can be obtained in a similar fashion by replacing the vector $u$ by $a = (a_2, a_3, \ldots, a_n)'$ and setting $a_1 = 0$.
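A minimal Gauss-Newton sketch for the conditional sum of squares of the ARMA(1,1) model. Forward-difference numerical derivatives stand in for the analytic $x_{1,t}$, $x_{2,t}$, and all names and tolerances are illustrative:

```python
import numpy as np

def resid(beta, w):
    """Conditional residuals a_t = theta*a_{t-1} + w_t - phi*w_{t-1}, a_1 = 0."""
    phi, theta = beta
    a = np.zeros(len(w))
    for t in range(1, len(w)):
        a[t] = w[t] - phi * w[t - 1] + theta * a[t - 1]
    return a[1:]                          # (a_2, ..., a_n), as in the text

def gauss_newton(w, beta0, n_iter=20, h=1e-6):
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        u0 = resid(beta, w)
        # Columns of X: numerical versions of x_{1,t}, x_{2,t} = -d(resid)/d(beta_i):
        X = np.column_stack([
            -(resid(beta + h * np.eye(2)[i], w) - u0) / h for i in range(2)])
        delta = np.linalg.lstsq(X, u0, rcond=None)[0]   # (X'X)^{-1} X' u0
        beta = beta + delta
        if np.max(np.abs(delta)) < 1e-8:  # convergence criterion 2 above
            break
    return beta

rng = np.random.default_rng(6)
n, phi, theta = 3000, 0.7, 0.4
a = rng.normal(size=n)
w = np.zeros(n)
for t in range(1, n):                     # simulate ARMA(1,1)
    w[t] = phi * w[t - 1] + a[t] - theta * a[t - 1]
print(np.round(gauss_newton(w, [0.1, 0.1]), 2))   # typically close to [0.7, 0.4]
```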


For small $k$, the true standard error of the residual autocorrelations $r_a(k)$ can be much smaller than the usual approximation $n^{-1/2}$. In general, the true standard error depends on

1. the form of the fitted model,
2. the true parameter values,
3. the value of $k$.


    Portmanteau Test

Under the null hypothesis of model adequacy, the large-sample distribution of the residual autocorrelations $r_a = (r_a(1), \ldots, r_a(K))'$ is multivariate normal, and

$$Q^* = n\sum_{k=1}^{K}r_a^2(k)$$

has a large-sample chi-square distribution with $K - p - q$ degrees of freedom.

For small sample sizes, this test is quite conservative; that is, the chance of incorrectly rejecting the null hypothesis of model adequacy is smaller than the chosen significance level. The modified test statistic is

$$Q = n(n+2)\sum_{k=1}^{K}\frac{r_a^2(k)}{n - k}$$
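A sketch computing the modified statistic $Q$ from a residual series (names are my own; `n_params` plays the role of $p + q$):

```python
import numpy as np

def ljung_box_q(residuals, K, n_params):
    """Modified portmanteau statistic Q = n(n+2) * sum_k r_a(k)^2 / (n-k),
    to be compared with a chi-square on K - (p+q) degrees of freedom."""
    e = np.asarray(residuals, dtype=float)
    n = len(e)
    dev = e - e.mean()
    denom = np.sum(dev**2)
    r = np.array([np.sum(dev[k:] * dev[:n - k]) / denom for k in range(1, K + 1)])
    q = n * (n + 2) * np.sum(r**2 / (n - np.arange(1, K + 1)))
    return q, K - n_params                  # statistic and degrees of freedom

rng = np.random.default_rng(7)
q, df = ljung_box_q(rng.normal(size=200), K=24, n_params=2)
print(round(q, 1), df)   # for a white-noise series, Q is typically near K
```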


    4.8 Examples

Yield data

This series consists of monthly differences between the yield on mortgages and the yield on government loans in the Netherlands from January 1961 to December 1973. The observations for January to March 1974 are also available, but they are held back to illustrate forecasting and updating.

[Figures: time series plot and SACF]

AR(1) model: $z_t - \mu = \phi(z_{t-1} - \mu) + a_t$

The maximum likelihood estimates (standard errors in parentheses) are

$$\hat\mu = 0.97\ (0.08), \qquad \hat\phi = 0.85\ (0.04), \qquad \hat\sigma^2 = 0.024$$


    Forecasting

$$\hat z_{156}(1) = \hat\mu + \hat\phi(z_{156} - \hat\mu) = .97 + .85(.49 - .97) = .56$$

Prediction limits:

$$\hat z_{156}(1) \pm 1.96\hat\sigma = 0.56 \pm 1.96(.024)^{1/2} = 0.56 \pm 0.29$$

$$\hat z_{156}(2) = \hat\mu + \hat\phi^2(z_{156} - \hat\mu) = .97 + .85^2(.49 - .97) = .62$$

Prediction limits:

$$\hat z_{156}(2) \pm 1.96\hat\sigma(1 + \hat\phi^2)^{1/2} = 0.62 \pm 1.96(.024)^{1/2}(1 + .85^2)^{1/2} = 0.62 \pm 0.39$$

$$\hat z_{156}(3) = \hat\mu + \hat\phi^3(z_{156} - \hat\mu) = .97 + .85^3(.49 - .97) = .68$$

Prediction limits:

$$\hat z_{156}(3) \pm 1.96\hat\sigma(1 + \hat\phi^2 + \hat\phi^4)^{1/2} = 0.68 \pm 1.96(.024)^{1/2}(1 + .85^2 + .85^4)^{1/2} = 0.68 \pm 0.45$$


Suppose $z_{157} = 0.74$ has now become available. Forecasts for $z_{158}$ and $z_{159}$ are updated according to

$z_{158}$: $\hat z_{157}(1) = \hat z_{156}(2) + \hat\psi_1[z_{157} - \hat z_{156}(1)] = 0.62 + .85(.74 - .56) = .77$

$z_{159}$: $\hat z_{157}(2) = \hat z_{156}(3) + \hat\psi_2[z_{157} - \hat z_{156}(1)] = 0.68 + .85^2(.74 - .56) = .81$

The prediction interval for $z_{158}$ is $0.77 \pm 0.29$, and that for $z_{159}$ is $0.81 \pm 0.39$.


    Growth Rates

Quarterly growth rates of Iowa nonfarm income from the second quarter of 1948 to the fourth quarter of 1978 ($N = 123$) are shown in Figure 5.8a.

The plot of the series indicates that no variance-stabilizing transformation is necessary.

$z_t$ is nonstationary, and the first difference $\nabla z_t$ should be analyzed. The second difference is also stationary but would amount to overdifferencing. This is indicated by the increase in the sample variance [$s^2(\nabla z_t) = 1.58$, $s^2(\nabla^2 z_t) = 4.82$].

The SACF of $\nabla z_t$ in Figure 5.11b indicates a cutoff after lag 1. The SPACF in Figure 5.15 shows a rough exponential decay. These patterns lead us to consider the model

$$\nabla z_t = \theta_0 + (1 - \theta B)a_t$$


Forecasting

The predictions $\hat z_{123}(\ell)$ are described below:

$$\hat z_{123}(\ell) = \hat z_{123}(1) = z_{123} - \hat\theta a_{123} = 3.38 - .88(.718) = 2.75$$

Note that $a_{123} = .718$ is the last residual.

The estimated variance of the $\ell$-step-ahead forecast error is

$$\hat V[e_{123}(\ell)] = [1 + (\ell - 1)(1 - \hat\theta)^2]\hat\sigma^2 = [1 + (\ell - 1)(1 - 0.88)^2](.95)$$

After a new observation $z_{124} = 1.55$ has become available, we update the forecast:

$$\hat z_{124}(\ell) = \hat z_{123}(\ell + 1) + \hat\psi_\ell[z_{124} - \hat z_{123}(1)] = 2.75 + (1 - 0.88)(1.55 - 2.75) = 2.61$$

Furthermore, after $z_{125} = 2.93$ has become available,

$$\hat z_{125}(\ell) = \hat z_{124}(\ell + 1) + \hat\psi_\ell[z_{125} - \hat z_{124}(1)] = 2.61 + (1 - 0.88)(2.93 - 2.61) = 2.65$$


Demand for Repair Parts

This series consists of the monthly demand for repair parts for a large Midwestern production company from January 1972 to June 1979 ($N = 90$). The series is shown in Figure 5.9a.

A logarithmic transformation is needed to stabilize the variance, and the first differences $\nabla\ln z_t$ are stationary. Further differencing is not necessary, since it results in an increase in the sample variance [$s^2(\nabla\ln z_t) = .034$, $s^2(\nabla^2\ln z_t) = .099$].

The sample autocorrelations and partial autocorrelations are shown in Figures 5.12b and 5.17. They indicate a cutoff in the SACF after lag 1 and a rough exponential decay in the SPACF. Thus we consider the ARIMA(0,1,1) model

$$(1 - B)y_t = (1 - \theta B)a_t$$

where $y_t = \ln z_t$. The ML estimates are

$$\hat\theta = 0.57\ (.09), \qquad \hat\sigma^2 = 0.027$$


    Forecasting

The forecasting model is given in terms of the transformed series $y_t = \ln z_t$. The MMSE forecasts $\hat y_n(\ell)$ and the associated prediction interval $(c_1, c_2)$ for $y_{n+\ell}$ can be derived as usual.

However, to derive forecasts of the original observations $z_{n+\ell} = \exp(y_{n+\ell})$, we have to back-transform the forecasts and the prediction interval. They are given by

$$\hat z_n(\ell) = \exp[\hat y_n(\ell)] \qquad\text{and}\qquad [\exp(c_1), \exp(c_2)]$$

The forecasts of the transformed series $y_{n+\ell}$ are MMSE forecasts. However, if the transformation is nonlinear, as in the case of the logarithmic transformation, the MMSE property is not preserved for $\hat z_n(\ell) = \exp[\hat y_n(\ell)]$.

Nevertheless, the forecast $\hat z_n(\ell)$ will be the median of the predictive distribution of $z_{n+\ell}$.


For the present example, the prediction $\hat y_{90}(\ell)$ and the corresponding 95 percent prediction limits are given by

$$\hat y_{90}(\ell) = \hat y_{90}(1) = y_{90} - \hat\theta a_{90} = 7.864 - .57(.234) = 7.731$$

and

$$\hat y_{90}(\ell) \pm 1.96[1 + (\ell - 1)(1 - \hat\theta)^2]^{1/2}\hat\sigma = 7.731 \pm 1.96[1 + (\ell - 1)(1 - .57)^2]^{1/2}(.027)^{1/2}.$$

Forecasts and 95 percent prediction intervals for $y_{91}, y_{92}$ and $z_{91}, z_{92}$ are given in Table 5.6.

The prediction intervals are quite large. This is due partly to the large uncertainty ($\hat\sigma^2$) in the past data and partly to the unexplained correlation at lag 12.
