
4. Stochastic Time Series Models

In Chapters 2 and 3 we considered forecast models of the form

$$z_t = f(x_t; \beta) + \epsilon_t$$

In these forecasting models it is usually assumed that the errors $\epsilon_t$ are uncorrelated, which in turn implies that the observations $z_t$ are uncorrelated. However, this assumption is rarely met in practice.

Thus, models that capture the autocorrelation structure have to be considered. Models such as AR(p), MA(q), ARMA(p,q), and ARIMA(p,d,q) will be studied in this chapter.


    4.1 Stochastic Processes

An observed time series $(z_1, z_2, \ldots, z_n)$ can be thought of as a particular realization of a stochastic process.

Stochastic processes in general can be described by an $n$-dimensional probability distribution $p(z_1, z_2, \ldots, z_n)$.

Assuming joint normality, there are only $n$ observations but $n + n(n+1)/2$ unknown parameters ($n$ means plus $n(n+1)/2$ variances and covariances). To infer such a general probability structure from just one realization of the stochastic process is impossible.

Hence the stationarity assumption is used to simplify the probability structure.


    Stationary Stochastic Process

The joint distributions are stable with respect to shifts in time:

$$p(z_{t_1}, \ldots, z_{t_m}) = p(z_{t_1+k}, \ldots, z_{t_m+k})$$

    Autocorrelation Functions (ACF)

The autocovariances

$$\gamma_k = \mathrm{Cov}(z_t, z_{t-k}) = E(z_t - \mu)(z_{t-k} - \mu)$$

The autocorrelations

$$\rho_k = \frac{\mathrm{Cov}(z_t, z_{t-k})}{[\mathrm{Var}(z_t)\,\mathrm{Var}(z_{t-k})]^{1/2}} = \frac{\gamma_k}{\gamma_0}$$


    Sample Autocorrelation Function (SACF)

$$r_k = \frac{\sum_{t=k+1}^{n}(z_t - \bar z)(z_{t-k} - \bar z)}{\sum_{t=1}^{n}(z_t - \bar z)^2}, \qquad k = 0, 1, 2, \ldots$$

For uncorrelated observations, the variance of $r_k$ is approximately given by

$$\mathrm{Var}(r_k) \approx \frac{1}{n}$$
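As a concrete illustration, here is a minimal numpy sketch (function and variable names are my own, not from the text) that computes $r_k$ exactly as defined above and counts how many lags fall outside the $\pm 2/\sqrt{n}$ limits implied by $\mathrm{Var}(r_k) \approx 1/n$:

```python
import numpy as np

def sample_acf(z, max_lag):
    """Sample autocorrelations r_k, k = 1..max_lag, as defined above."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    dev = z - z.mean()
    denom = np.sum(dev**2)                      # sum_t (z_t - zbar)^2
    return np.array([np.sum(dev[k:] * dev[:n - k]) / denom
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
z = rng.normal(size=200)                        # white noise: true rho_k = 0
r = sample_acf(z, 10)
bound = 2 / np.sqrt(len(z))                     # approx 95% limits, Var(r_k) ~ 1/n
print(np.round(r, 3))
print("outside +-2/sqrt(n):", np.sum(np.abs(r) > bound))
```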


    4.2 Stochastic Difference Equation Models

A time series in which successive values are autocorrelated can be represented as a linear combination (linear filter) of a sequence of uncorrelated random variables. The linear filter representation is given by

$$z_t - \mu = a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots = \sum_{j=0}^{\infty}\psi_j a_{t-j}, \qquad \psi_0 = 1.$$

The random variables $\{a_t;\ t = 0, \pm 1, \pm 2, \ldots\}$ are a sequence of uncorrelated random variables from a fixed distribution with mean $E(a_t) = 0$, variance $\mathrm{Var}(a_t) = E(a_t^2) = \sigma^2$, and $\mathrm{Cov}(a_t, a_{t-k}) = E(a_t a_{t-k}) = 0$ for all $k \neq 0$.

$$E(z_t) = \mu, \qquad \gamma_0 = E(z_t - \mu)^2 = \sigma^2\sum_{j=0}^{\infty}\psi_j^2,$$

$$\gamma_k = E[(z_t - \mu)(z_{t-k} - \mu)] = \sigma^2\sum_{j=0}^{\infty}\psi_j\psi_{j+k}, \qquad \rho_k = \sum_{j=0}^{\infty}\psi_j\psi_{j+k}\Big/\sum_{j=0}^{\infty}\psi_j^2.$$


    Autoregressive Processes

    First-Order Autoregressive Process (AR(1))

$$(z_t - \mu) - \phi(z_{t-1} - \mu) = a_t \qquad\text{or}\qquad (1 - \phi B)(z_t - \mu) = a_t$$

We have

$$z_t - \mu = (1 - \phi B)^{-1}a_t = (1 + \phi B + \phi^2 B^2 + \cdots)a_t = a_t + \phi a_{t-1} + \phi^2 a_{t-2} + \cdots$$

Application of the general results for the linear filter leads to

$$\gamma_0 = \sigma^2\sum_{j=0}^{\infty}\psi_j^2 = \sigma^2\sum_{j=0}^{\infty}\phi^{2j} = \frac{\sigma^2}{1-\phi^2}$$

$$\gamma_k = \sigma^2\sum_{j=0}^{\infty}\psi_j\psi_{j+k} = \sigma^2\sum_{j=0}^{\infty}\phi^j\phi^{j+k} = \frac{\sigma^2\phi^k}{1-\phi^2} = \phi^k\gamma_0, \qquad k = 0, 1, 2, \ldots$$

$$\rho_k = \phi^k, \qquad k = 0, 1, 2, \ldots$$


Another approach to obtain the ACF of the AR(1) process, which is more suitable for higher-order situations, is the following. Without loss of generality, assume $\mu = 0$ and $z_t = \phi z_{t-1} + a_t$. Then we have

$$\gamma_k = E(z_t z_{t-k}) = \phi E(z_{t-1}z_{t-k}) + E(a_t z_{t-k}) = \phi\gamma_{k-1} + E(a_t z_{t-k})$$

where for $k = 0$, $E(a_t z_t) = E[a_t(a_t + \phi a_{t-1} + \phi^2 a_{t-2} + \cdots)] = \sigma^2$, and for $k > 0$, $E(a_t z_{t-k}) = 0$.

Thus

$$\gamma_0 = \phi\gamma_1 + \sigma^2, \qquad \gamma_k = \phi\gamma_{k-1} \quad (k \ge 1), \quad \text{so } \gamma_1 = \phi\gamma_0,$$

$$\gamma_0 = \phi^2\gamma_0 + \sigma^2 \;\Rightarrow\; \gamma_0 = \frac{\sigma^2}{1-\phi^2}, \qquad \gamma_k = \phi^k\gamma_0, \qquad \rho_k = \phi^k$$
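A quick numerical check of $\rho_k = \phi^k$, simulating an AR(1) series and comparing sample autocorrelations with the theoretical values (a sketch; the seed, $\phi$, and series length are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n = 0.8, 5000
a = rng.normal(size=n)
z = np.zeros(n)
for t in range(1, n):                 # z_t = phi*z_{t-1} + a_t  (mu = 0)
    z[t] = phi * z[t - 1] + a[t]

dev = z - z.mean()
denom = np.sum(dev**2)
for k in range(1, 6):
    r_k = np.sum(dev[k:] * dev[:n - k]) / denom
    print(k, round(r_k, 3), round(phi**k, 3))   # sample r_k vs theoretical phi^k
```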


Second-Order Autoregressive Process [AR(2)]

$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t, \qquad (1 - \phi_1 B - \phi_2 B^2)z_t = a_t$$

and

$$z_t = \frac{1}{1 - \phi_1 B - \phi_2 B^2}\,a_t = (1 + \psi_1 B + \psi_2 B^2 + \cdots)a_t$$

The coefficients $(\psi_1, \psi_2, \ldots)$ can be determined by equating coefficients of powers of $B$ in

$$1 = (1 - \phi_1 B - \phi_2 B^2)(1 + \psi_1 B + \psi_2 B^2 + \cdots):$$

$B$: $\psi_1 - \phi_1 = 0 \Rightarrow \psi_1 = \phi_1$; $\;B^2$: $\psi_2 - \phi_1\psi_1 - \phi_2 = 0 \Rightarrow \psi_2 = \phi_1^2 + \phi_2$; $\;B^3$: $\psi_3 - \phi_1\psi_2 - \phi_2\psi_1 = 0 \Rightarrow \psi_3 = \phi_1^3 + 2\phi_1\phi_2$; and in general

$$\psi_j = \phi_1\psi_{j-1} + \phi_2\psi_{j-2}, \qquad j \ge 2.$$
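The $\psi$-weight recursion is easy to compute directly. A small sketch (parameter values are illustrative) that reproduces $\psi_1 = \phi_1$, $\psi_2 = \phi_1^2 + \phi_2$, and $\psi_3 = \phi_1^3 + 2\phi_1\phi_2$:

```python
import numpy as np

def ar2_psi_weights(phi1, phi2, m):
    """psi_j from psi_j = phi1*psi_{j-1} + phi2*psi_{j-2}, with psi_0 = 1."""
    psi = np.zeros(m + 1)
    psi[0] = 1.0
    for j in range(1, m + 1):
        psi[j] = phi1 * psi[j - 1] + (phi2 * psi[j - 2] if j >= 2 else 0.0)
    return psi

print(ar2_psi_weights(0.5, 0.3, 5))
# psi_1 = 0.5, psi_2 = 0.5^2 + 0.3 = 0.55, psi_3 = 0.5^3 + 2*0.5*0.3 = 0.425, ...
```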


    Stationarity

$$\text{AR}(p):\quad (1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) = (1 - G_1 B)\cdots(1 - G_p B).$$

Stationarity corresponds to the conditions $|G_i^{-1}| > 1$, that is, $|G_i| < 1$, $i = 1, 2, \ldots, p$. In particular, for AR(2) stationarity corresponds to

$$\phi_1 + \phi_2 < 1, \qquad \phi_2 - \phi_1 < 1, \qquad |\phi_2| < 1.$$


Actually, the general solution of the difference equation

$$(1 - \phi_1 B - \phi_2 B^2)\rho_k = 0$$

is

$$\rho_k = \begin{cases} A_1 G_1^k + A_2 G_2^k, & G_1 \neq G_2 \text{ (distinct real roots)},\\[4pt] (A_1 + A_2 k)G^k, & G_1 = G_2 = G,\\[4pt] \{A_1\sin(\lambda k) + A_2\cos(\lambda k)\}(p^2 + q^2)^{k/2}, & G_{1,2} = p \pm qi,\ \ \lambda = \arcsin\!\big(q/(p^2 + q^2)^{1/2}\big). \end{cases}$$

So, when the process is stationary, the correlogram shows damped exponentials and/or damped sinusoidal waves.


    The general AR(p) model

$$\gamma_0 = \phi_1\gamma_1 + \cdots + \phi_p\gamma_p + \sigma^2 \;\Rightarrow\; \gamma_0 = \frac{\sigma^2}{1 - \phi_1\rho_1 - \cdots - \phi_p\rho_p}$$

$$\gamma_k = \phi_1\gamma_{k-1} + \cdots + \phi_p\gamma_{k-p}, \quad k > 0; \qquad \rho_k = \phi_1\rho_{k-1} + \cdots + \phi_p\rho_{k-p}.$$

Letting $k = 1, 2, \ldots, p$, we have the Yule-Walker equations:

$$\boldsymbol{\rho} = P\boldsymbol{\phi}, \qquad P = \begin{pmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{p-1}\\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{p-2}\\ \vdots & \vdots & \ddots & & \vdots\\ \rho_{p-1} & \rho_{p-2} & \rho_{p-3} & \cdots & 1 \end{pmatrix}$$

Method of moments:

$$\hat{\boldsymbol{\phi}} = \hat P^{-1}\hat{\boldsymbol{\rho}}$$
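A sketch of the method-of-moments computation, building the Toeplitz matrix $P$ from $(\rho_1, \ldots, \rho_p)$ and solving $\boldsymbol\rho = P\boldsymbol\phi$ (the AR(2) test values use $\rho_1 = \phi_1/(1-\phi_2)$ and $\rho_2 = \phi_1\rho_1 + \phi_2$, which follow from the recursions above):

```python
import numpy as np

def yule_walker_phi(rho):
    """Solve rho = P phi for phi, where rho = (rho_1, ..., rho_p) and
    P is the Toeplitz correlation matrix with rho_0 = 1 on the diagonal."""
    p = len(rho)
    r_full = np.concatenate(([1.0], rho))       # rho_0 = 1
    P = np.array([[r_full[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(P, rho)

phi1, phi2 = 0.5, 0.3
rho1 = phi1 / (1 - phi2)
rho2 = phi1 * rho1 + phi2
print(yule_walker_phi(np.array([rho1, rho2])))   # recovers [0.5, 0.3]
```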


    Partial Autocorrelations

When considering the correlation coefficient between two random variables, it is fairer to first remove the influence of other random variables related to them. In the AR(p) framework, consider fitting the model

$$z_t = \phi_{k1}z_{t-1} + \phi_{k2}z_{t-2} + \cdots + \phi_{kk}z_{t-k} + a_t$$

The last coefficient $\phi_{kk}$, viewed as a function of the lag $k$, is the partial autocorrelation function (PACF). Stationarity implies that $\phi(B)$ has all its roots outside the unit circle, and therefore $|\phi_{kk}| < 1$.


AR(p): Using the Yule-Walker Equations

$$\begin{pmatrix}\rho_1\\ \rho_2\\ \vdots\\ \rho_k\end{pmatrix} = \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{k-1}\\ \rho_1 & 1 & \cdots & \rho_{k-2}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{pmatrix}\begin{pmatrix}\phi_{k1}\\ \phi_{k2}\\ \vdots\\ \phi_{kk}\end{pmatrix}$$

By Cramer's rule, $\phi_{kk}$ is a ratio of two determinants: the numerator is the correlation matrix with its last column replaced by $(\rho_1, \ldots, \rho_k)'$, and the denominator is the determinant of the correlation matrix itself:

$$\phi_{kk} = \begin{vmatrix} 1 & \rho_1 & \cdots & \rho_{k-2} & \rho_1\\ \rho_1 & 1 & \cdots & \rho_{k-3} & \rho_2\\ \vdots & \vdots & & \vdots & \vdots\\ \rho_{k-1} & \rho_{k-2} & \cdots & \rho_1 & \rho_k \end{vmatrix} \Bigg/ \begin{vmatrix} 1 & \rho_1 & \cdots & \rho_{k-1}\\ \rho_1 & 1 & \cdots & \rho_{k-2}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{vmatrix}$$

The basic features of AR(p):

ACF: combinations of damped exponentials and/or sinusoidal waves; PACF: zero for lags larger than p.


Sample Partial Autocorrelation Function (SPACF)

1. Use the sample correlation coefficients $r_j$ in place of $\rho_j$ and solve the Yule-Walker equations for $\hat\phi_{kk}$. This scheme is impractical if the lag $k$ is large.

2. Recursive forms:

$$\hat\phi_{kk} = \frac{r_k - \sum_{j=1}^{k-1}\hat\phi_{k-1,j}\,r_{k-j}}{1 - \sum_{j=1}^{k-1}\hat\phi_{k-1,j}\,r_j}, \qquad \hat\phi_{k,j} = \hat\phi_{k-1,j} - \hat\phi_{kk}\hat\phi_{k-1,k-j}, \quad j = 1, 2, \ldots, k-1.$$

3. Standard errors of $\hat\phi_{kk}$: If the data follow an AR(p) process, then for lags greater than $p$ the variance of $\hat\phi_{kk}$ can be approximated by $\mathrm{Var}(\hat\phi_{kk}) \approx n^{-1}$ for $k > p$. Approximate standard errors for $\hat\phi_{kk}$ are thus given by $n^{-1/2}$.
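The recursive form in item 2 (the Durbin-Levinson recursion) translates directly into code. A sketch, with names of my own choosing, applied to the theoretical ACF of an AR(1), for which the PACF should cut off after lag 1:

```python
import numpy as np

def sample_pacf(r, max_lag):
    """PACF phi_hat_kk from the recursion above; r[k-1] holds r_k."""
    phi = np.zeros((max_lag + 1, max_lag + 1))  # phi[k, j] = phi_hat_{k,j}
    pacf = np.zeros(max_lag)
    phi[1, 1] = r[0]
    pacf[0] = r[0]
    for k in range(2, max_lag + 1):
        num = r[k - 1] - np.sum(phi[k - 1, 1:k] * r[k - 2::-1])
        den = 1.0 - np.sum(phi[k - 1, 1:k] * r[:k - 1])
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        pacf[k - 1] = phi[k, k]
    return pacf

phi_true = 0.6
r = phi_true ** np.arange(1, 7)          # rho_k = phi^k for an AR(1)
print(np.round(sample_pacf(r, 6), 4))    # (0.6, 0, 0, 0, 0, 0)
```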


    Moving Average MA(q) Model

$$z_t - \mu = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)a_t = \theta(B)a_t$$

Autocovariances and Autocorrelations

$$\gamma_0 = (1 + \theta_1^2 + \cdots + \theta_q^2)\sigma^2$$

$$\gamma_k = (-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q)\sigma^2, \quad k = 1, 2, \ldots, q; \qquad \gamma_k = 0, \quad k > q$$

$$\rho_k = \frac{-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q}{1 + \theta_1^2 + \cdots + \theta_q^2}, \quad k = 1, 2, \ldots, q; \qquad \rho_k = 0, \quad k > q$$

    Partial Autocorrelation Function for MA Process

The ACF of a moving average process of order $q$ cuts off after lag $q$ ($\rho_k = 0$, $k > q$). However, its PACF is infinite in extent (it tails off).

In general, the PACF of an MA process is dominated by combinations of damped exponentials and/or damped sine waves.
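A small sketch that evaluates the MA(q) autocorrelations from the formulas above (using the coefficient vector $(1, -\theta_1, \ldots, -\theta_q)$ so that $\sum_j c_j c_{j+k}$ reproduces $-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q$):

```python
import numpy as np

def ma_acf(theta, max_lag):
    """rho_k for the MA(q) model z_t = (1 - theta_1 B - ... - theta_q B^q) a_t."""
    c = np.concatenate(([1.0], -np.asarray(theta, dtype=float)))
    q = len(c) - 1
    gamma0 = np.sum(c**2)                       # (1 + theta_1^2 + ... + theta_q^2)
    rho = []
    for k in range(1, max_lag + 1):
        rho.append(np.sum(c[:q - k + 1] * c[k:]) / gamma0 if k <= q else 0.0)
    return np.array(rho)

print(np.round(ma_acf([0.5], 3), 4))   # MA(1): rho_1 = -0.5/1.25 = -0.4, then 0
```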


Example: MA(1) Model: $z_t - \mu = a_t - \theta a_{t-1}$

$$\gamma_0 = (1 + \theta^2)\sigma^2, \qquad \gamma_1 = -\theta\sigma^2, \qquad \rho_1 = \frac{-\theta}{1 + \theta^2}, \quad |\rho_1| \le \tfrac{1}{2}$$

Invertibility condition: since both $\theta$ and $1/\theta$ are roots of the equation

$$\rho_1\theta^2 + \theta + \rho_1 = 0,$$

the two values lead to the same ACF, so we restrict ourselves to the situation with $|\theta| < 1$.


    Autoregressive Moving Average (ARMA(p,q)) Processes

$$(1 - \phi_1 B - \cdots - \phi_p B^p)(z_t - \mu) = (1 - \theta_1 B - \cdots - \theta_q B^q)a_t$$

or

$$z_t - \mu = \phi_1(z_{t-1} - \mu) + \cdots + \phi_p(z_{t-p} - \mu) + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$$

For $k > q - p$, the ACF is dominated by the AR part; for $k > p - q$, the PACF is dominated by the MA part.

Summary: The ACF and PACF of a mixed ARMA(p,q) process are both infinite in extent and tail off (die down) as the lag $k$ increases. Eventually (for $k > q - p$) the ACF is determined by the autoregressive part of the model, and the PACF is eventually (for $k > p - q$) determined by the moving average part of the model.


    ARMA(1,1) Process

$$(1 - \phi B)(z_t - \mu) = (1 - \theta B)a_t$$

ACF:

$$\gamma_0 = \frac{1 + \theta^2 - 2\phi\theta}{1 - \phi^2}\,\sigma^2, \qquad \gamma_1 = \frac{(1 - \phi\theta)(\phi - \theta)}{1 - \phi^2}\,\sigma^2,$$

$$\rho_1 = \frac{(1 - \phi\theta)(\phi - \theta)}{1 + \theta^2 - 2\phi\theta}, \qquad \rho_k = \phi\rho_{k-1}, \quad k > 1.$$

PACF: $\phi_{11} = \rho_1$; thereafter it tails off, similar to the PACF of an MA(1).


4.3 Nonstationary Processes

Nonstationarity, Differencing, and Transformations

M1: $z_t = \beta_0 + \beta_1 t + a_t$
M2: $(1 - B)z_t = \beta_1 + a_t$, or $z_t = z_{t-1} + \beta_1 + a_t$.
M3: $(1 - B)^2 z_t = a_t$, or $z_t = 2z_{t-1} - z_{t-2} + a_t$.

Nonstationary Variance

Examples: 1. Growth rates. 2. Demand for repair parts.


    An Alternative Argument for Differencing

Let us assume that the data $z_t$ are described by a nonstationary level

$$\mu_t = E(z_t) = \beta_0 + \beta_1 t$$

and that we wish to entertain an ARMA model for a function of $z_t$ that has a level independent of time. Consider the model $\phi_{p+1}(B)z_t = \theta_0 + \theta(B)a_t$, where $\theta_0$ represents the level of $\phi_{p+1}(B)z_t$. To achieve a $\theta_0$ that is constant, independent of time, we need

$$\phi_{p+1}(B) = (1 - B)\phi(B)$$

and then

$$\theta_0 = \phi_{p+1}(B)(\beta_0 + \beta_1 t) = \phi(B)(1 - B)(\beta_0 + \beta_1 t) = \phi(B)\beta_1$$


If $\mu_t = \beta_0 + \beta_1 t + \cdots + \beta_d t^d$, we have to choose the autoregressive operator in $\phi_{p+d}(B)z_t = \theta_0 + \theta(B)a_t$ such that $\phi_{p+d}(B) = (1 - B)^d\phi(B)$, and then

$$\theta_0 = \phi_{p+d}(B)(\beta_0 + \beta_1 t + \cdots + \beta_d t^d) = \phi(B)(1 - B)^d(\beta_0 + \beta_1 t + \cdots + \beta_d t^d) = \phi(B)\beta_d\,d!$$


    Autoregressive Integrated Moving Average (ARIMA) Models

Transform a nonstationary series $z_t$ into a stationary one by considering the relevant differences

$$w_t = \nabla^d z_t = (1 - B)^d z_t.$$

Then use an ARMA model to describe $w_t$. The corresponding model can be written as

$$\phi(B)w_t = \theta_0 + \theta(B)a_t$$

or

$$\phi(B)(1 - B)^d z_t = \theta_0 + \theta(B)a_t \qquad \text{[ARIMA(p,d,q)]}$$

The orders p, d, q are usually small. The model is called integrated, since $z_t$ can be thought of as the summation (integration) of the stationary series $w_t$. For example, if $d = 1$,

$$z_t = (1 - B)^{-1}w_t = \sum_{k=-\infty}^{t} w_k$$
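Differencing and its inverse, summation, are one-liners in numpy. A sketch showing that integrating $w_t = (1-B)z_t$ (given the starting value $z_1$) recovers $z_t$:

```python
import numpy as np

rng = np.random.default_rng(2)
z = np.cumsum(rng.normal(size=100)) + 5.0       # a random walk: ARIMA(0,1,0)

w = np.diff(z)                                  # w_t = (1 - B) z_t, stationary
z_back = z[0] + np.concatenate(([0.0], np.cumsum(w)))  # integrate w back to z

print(np.allclose(z, z_back))                   # True: z is the "integral" of w
```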


    Choice of the proper degree of differencing

Plot of the series.

Examination of the SACF. A stationary AR(p) process requires that all $G_i$ in

$$(1 - \phi_1 B - \cdots - \phi_p B^p) = (1 - G_1 B)\cdots(1 - G_p B)$$

are such that $|G_i| < 1$ ($i = 1, 2, \ldots, p$). Now assume one of them, say $G_1$, approaches 1; that is, $G_1 = 1 - \delta$, where $\delta$ is a small positive number. Then the autocorrelations

$$\rho_k = A_1 G_1^k + \cdots + A_p G_p^k \approx A_1 G_1^k.$$

So the exponential decay will be slow and almost linear [i.e., $A_1 G_1^k = A_1(1 - \delta)^k \approx A_1(1 - \delta k)$].


Consider the ARMA(1,1) model

$$(1 - \phi B)z_t = (1 - \theta B)a_t$$

The ACF of this process is given by

$$\rho_1 = \frac{(1 - \phi\theta)(\phi - \theta)}{1 + \theta^2 - 2\phi\theta}, \qquad \rho_k = \phi\rho_{k-1}$$

If we let $\phi \to 1$, then $\rho_k \to 1$ for any finite $k > 0$. Again, the autocorrelations decay very slowly when $\phi$ is near 1.

It should be pointed out that the slow decay in the SACF can start at values of $r_1$ considerably smaller than 1.


    Dangers of Overdifferencing:

    Although further differences of a stationary series will again be stationary,

    overdifferencing can lead to serious difficulties.

    Effect of overdifferencing on the autocorrelation structure.

Consider the stationary MA(1) process $z_t = (1 - \theta B)a_t$. The autocorrelation function of this process is given by

$$\rho_k = \begin{cases} \dfrac{-\theta}{1 + \theta^2}, & k = 1\\[6pt] 0, & k > 1 \end{cases}$$

The first difference of the process is

$$(1 - B)z_t = \big(1 - (1 + \theta)B + \theta B^2\big)a_t$$

and the autocorrelations of this difference are given by

$$\rho_k = \begin{cases} \dfrac{-(1 + \theta)^2}{2(1 + \theta + \theta^2)}, & k = 1\\[6pt] \dfrac{\theta}{2(1 + \theta + \theta^2)}, & k = 2\\[6pt] 0, & k > 2 \end{cases}$$


    Effect of overdifferencing on variance.

Consider the same MA(1) process. Its variance is given by

$$\gamma_0(z) = (1 + \theta^2)\sigma^2$$

The variance of the overdifferenced series $w_t = (1 - B)z_t$, which follows an MA(2) process, is given by

$$\gamma_0(w) = 2(1 + \theta + \theta^2)\sigma^2$$

Hence

$$\gamma_0(w) - \gamma_0(z) = (1 + \theta)^2\sigma^2 > 0$$
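A simulation sketch of this variance inflation (the choice $\theta = 0.5$ and the seed are arbitrary): the MA(1) variance is $(1+\theta^2)\sigma^2 = 1.25$ while the overdifferenced series has variance $2(1+\theta+\theta^2)\sigma^2 = 3.5$:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma = 0.5, 1.0
a = rng.normal(scale=sigma, size=200_001)
z = a[1:] - theta * a[:-1]              # MA(1): z_t = a_t - theta*a_{t-1}
w = np.diff(z)                          # overdifferenced series

print(round(z.var(), 3), 1 + theta**2)                  # ~1.25
print(round(w.var(), 3), 2 * (1 + theta + theta**2))    # ~3.5, larger
```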


Consider the stationary AR(1) process $(1 - \phi B)z_t = a_t$ with variance

$$\gamma_0(z) = \frac{\sigma^2}{1 - \phi^2}$$

The first difference $w_t$ follows the ARMA(1,1) process $(1 - \phi B)w_t = (1 - B)a_t$. The variance of this process is given by

$$\gamma_0(w) = \frac{2(1 - \phi)\sigma^2}{1 - \phi^2}$$

and

$$\gamma_0(w) - \gamma_0(z) = \frac{(1 - 2\phi)\sigma^2}{1 - \phi^2}$$

Hence when $\phi < \tfrac{1}{2}$, overdifferencing will increase the variance.


    Regression and ARIMA Models

ARIMA(0,1,1) model: $(1 - B)z_t = (1 - \theta B)a_t$

Since $\psi(B) = (1 - \theta B)/(1 - B) = 1 + (1 - \theta)B + (1 - \theta)B^2 + \cdots$, we can relate the variable $z_{t+l}$ at time $t + l$ to a fixed origin $t$ and express it as

$$z_{t+l} = \beta_0^{(t)} + e_t(l)$$

where $e_t(l) = a_{t+l} + (1 - \theta)[a_{t+l-1} + \cdots + a_{t+1}]$ is a function of the random shocks that enter the system after time $t$, and $\beta_0^{(t)} = (1 - \theta)\sum_{j=-\infty}^{t} a_j$ depends on all the shocks (observations) up to and including time $t$.

The level $\beta_0^{(t)}$ can be updated according to

$$\beta_0^{(t+1)} = \beta_0^{(t)} + (1 - \theta)a_{t+1}$$

In the special case when $\theta = 1$, $e_t(l) = a_{t+l}$ and $\beta_0^{(t+1)} = \beta_0^{(t)} = \beta_0$. In this case, the model is equivalent to the constant-mean model $z_{t+l} = \beta_0 + a_{t+l}$.


ARIMA(0,2,2) model: $(1 - B)^2 z_{t+l} = (1 - \theta_1 B - \theta_2 B^2)a_{t+l}$

Using an argument similar to that followed for the ARIMA(0,1,1) model, the ARIMA(0,2,2) model can be written as

$$z_{t+l} = \beta_0^{(t)} + \beta_1^{(t)} l + e_t(l)$$

where the $e_t(l)$ are correlated and given by

$$e_t(l) = a_{t+l} + \psi_1 a_{t+l-1} + \cdots + \psi_{l-1}a_{t+1}$$

where the $\psi$ weights may be obtained from

$$(1 + \psi_1 B + \psi_2 B^2 + \cdots)(1 - B)^2 = 1 - \theta_1 B - \theta_2 B^2$$

These weights are given by $\psi_j = (1 + \theta_2) + j(1 - \theta_1 - \theta_2)$, $j \ge 1$.


Furthermore, the trend parameters $\beta_0^{(t)}$ and $\beta_1^{(t)}$ adapt with time according to

$$\beta_0^{(t+1)} = \beta_0^{(t)} + \beta_1^{(t)} + (1 + \theta_2)a_{t+1}$$

$$\beta_1^{(t+1)} = \beta_1^{(t)} + (1 - \theta_1 - \theta_2)a_{t+1}$$

Again, the model above can be thought of as a generalization of the deterministic trend regression model

$$z_{t+l} = \beta_0 + \beta_1 l + a_{t+l}$$

In the special case when $\theta_1 \to 2$ and $\theta_2 \to -1$, the two models above are equivalent.


    4.4 Forecasting

ARIMA model: $\phi(B)(1 - B)^d z_t = \theta(B)a_t$

Autoregressive representation:

$$z_t = \pi_1 z_{t-1} + \pi_2 z_{t-2} + \cdots + a_t$$

where the $\pi$ weights are given by

$$1 - \pi_1 B - \pi_2 B^2 - \cdots = \frac{\phi(B)(1 - B)^d}{\theta(B)}$$

The $l$-step-ahead forecast from origin $n$ can be written as a linear combination of the current and past observations,

$$z_n(l) = \alpha_0^* z_n + \alpha_1^* z_{n-1} + \alpha_2^* z_{n-2} + \cdots$$

or, equivalently, of the current and past shocks,

$$z_n(l) = \alpha_0 a_n + \alpha_1 a_{n-1} + \alpha_2 a_{n-2} + \cdots$$


Minimum mean square error (MMSE) forecasts: the mean square error $E[z_{n+l} - z_n(l)]^2$ is minimized.

Using the $\psi$ weights in $\psi(B) = [\phi(B)(1 - B)^d]^{-1}\theta(B)$, write the model in its linear filter representation

$$z_{n+l} = a_{n+l} + \psi_1 a_{n+l-1} + \cdots + \psi_{l-1}a_{n+1} + \psi_l a_n + \psi_{l+1}a_{n-1} + \cdots$$

Then the mean square error is expressed as

$$E[z_{n+l} - z_n(l)]^2 = E[a_{n+l} + \psi_1 a_{n+l-1} + \cdots + \psi_{l-1}a_{n+1} + (\psi_l - \alpha_0)a_n + (\psi_{l+1} - \alpha_1)a_{n-1} + \cdots]^2$$

$$= (1 + \psi_1^2 + \cdots + \psi_{l-1}^2)\sigma^2 + \sum_{j=0}^{\infty}(\psi_{l+j} - \alpha_j)^2\sigma^2.$$


The mean square error is minimized by choosing $\alpha_j = \psi_{l+j}$. The minimum mean square error (MMSE) forecast is therefore

$$z_n(l) = \psi_l a_n + \psi_{l+1}a_{n-1} + \cdots$$

The forecast error is given by

$$e_n(l) = z_{n+l} - z_n(l) = a_{n+l} + \psi_1 a_{n+l-1} + \cdots + \psi_{l-1}a_{n+1}$$

and its variance by

$$\mathrm{Var}[e_n(l)] = \sigma^2(1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_{l-1}^2)$$

Since

$$E(a_{n+j}|z_n, z_{n-1}, \ldots) = \begin{cases} a_{n+j}, & j \le 0\\ 0, & j > 0 \end{cases}$$

we have

$$E(z_{n+l}|z_n, z_{n-1}, \ldots) = E[(a_{n+l} + \psi_1 a_{n+l-1} + \cdots + \psi_{l-1}a_{n+1} + \psi_l a_n + \psi_{l+1}a_{n-1} + \cdots)|z_n, z_{n-1}, \ldots] = \psi_l a_n + \psi_{l+1}a_{n-1} + \cdots$$

That is, the MMSE forecast equals the conditional expectation of $z_{n+l}$ given the observations through time $n$.


    Examples:

AR(1) process: $z_t - \mu = \phi(z_{t-1} - \mu) + a_t$

AR(2) process: $z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t$

ARIMA(0,1,1) process: $z_t = z_{t-1} + a_t - \theta a_{t-1}$

ARIMA(1,1,1) process: $(1 - \phi B)(1 - B)z_t = \theta_0 + (1 - \theta B)a_t$, or
$z_t = \theta_0 + (1 + \phi)z_{t-1} - \phi z_{t-2} + a_t - \theta a_{t-1}$

ARIMA(0,2,2) process: $(1 - B)^2 z_t = (1 - \theta_1 B - \theta_2 B^2)a_t$, or
$z_t = 2z_{t-1} - z_{t-2} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2}$
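For the AR(1) case these formulas specialize to $z_n(l) = \mu + \phi^l(z_n - \mu)$ and $\psi_j = \phi^j$. A sketch (the numbers echo the yield-data example later in this chapter, up to small rounding differences):

```python
import numpy as np

def ar1_forecast(z_n, mu, phi, sigma2, max_l):
    """AR(1) MMSE forecasts z_n(l) = mu + phi^l (z_n - mu) and
    Var[e_n(l)] = sigma2 * (1 + phi^2 + ... + phi^(2(l-1)))."""
    out = []
    for l in range(1, max_l + 1):
        fc = mu + phi**l * (z_n - mu)
        var = sigma2 * np.sum(phi ** (2 * np.arange(l)))
        out.append((fc, var))
    return out

for l, (fc, var) in enumerate(ar1_forecast(0.49, 0.97, 0.85, 0.024, 3), start=1):
    print(l, round(fc, 2), round(var, 3), round(1.96 * var**0.5, 2))
```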


    Prediction Limits

It is usually not sufficient to calculate point forecasts alone. The uncertainty associated with a forecast should be expressed, and prediction limits should always be computed.

It is possible, but not very easy, to calculate a simultaneous prediction region for a set of future observations. Therefore, we give probability limits only for individual forecasts.

Assuming that the $a_t$ (and hence the $z$'s) are normally distributed, we can calculate $100(1 - \alpha)$ percent prediction limits from

$$z_n(l) \pm u_{\alpha/2}\{\mathrm{Var}[e_n(l)]\}^{1/2}$$

where $u_{\alpha/2}$ is the $100(1 - \alpha/2)$ percentage point of the standard normal distribution.


Because $z_n(l)$ and $\mathrm{Var}[e_n(l)]$ are not known exactly, they need to be estimated. Hence the estimated prediction limits are given by

$$\hat z_n(l) \pm u_{\alpha/2}\{\hat V[e_n(l)]\}^{1/2}$$

Consider the AR(1) process for the yield data. We found $\hat z_{156}(1) = 0.56$, $\hat z_{156}(2) = 0.62$, $\hat z_{156}(3) = 0.68$, $\hat V[e_{156}(1)] = 0.024$, $\hat V[e_{156}(2)] = 0.041$, $\hat V[e_{156}(3)] = 0.054$. Thus the 95 percent prediction limits for $z_{156+l}$, $l = 1, 2, 3$, are given by

$z_{157}$: $\hat z_{156}(1) \pm 1.96\{\hat V[e_{156}(1)]\}^{1/2} = .56 \pm 1.96(.15) = .56 \pm .29$

$z_{158}$: $\hat z_{156}(2) \pm 1.96\{\hat V[e_{156}(2)]\}^{1/2} = .62 \pm 1.96(.20) = .62 \pm .39$

$z_{159}$: $\hat z_{156}(3) \pm 1.96\{\hat V[e_{156}(3)]\}^{1/2} = .68 \pm 1.96(.23) = .68 \pm .45$


    Forecast Updating

Suppose we are at time $n$ and we are predicting $l + 1$ steps ahead. The forecast is given by

$$z_n(l+1) = \psi_{l+1}a_n + \psi_{l+2}a_{n-1} + \cdots$$

After $z_{n+1}$ has become available, we wish to update our prediction of $z_{n+l+1}$. Since $z_{n+1}(l)$ can be written as

$$z_{n+1}(l) = \psi_l a_{n+1} + \psi_{l+1}a_n + \psi_{l+2}a_{n-1} + \cdots$$

we find that

$$z_{n+1}(l) = z_n(l+1) + \psi_l a_{n+1} = z_n(l+1) + \psi_l[z_{n+1} - z_n(1)]$$

The updated forecast is a linear combination of the previous forecast of $z_{n+l+1}$ made at time $n$ and the most recent one-step-ahead forecast error $e_n(1) = z_{n+1} - z_n(1) = a_{n+1}$.


Use the yield data example shown before. We already obtained the forecasts for $z_{157}, z_{158}, z_{159}$ from time origin $n = 156$. Suppose now that $z_{157} = 0.74$ is observed, and we need to update the forecasts for $z_{158}$ and $z_{159}$.

Since the model under consideration is AR(1) with parameter estimate $\hat\phi = 0.85$, the estimated $\psi$ weights are given by $\hat\psi_j = \hat\phi^j = 0.85^j$.

Then the updated forecasts are given by

$z_{158}$: $\hat z_{157}(1) = \hat z_{156}(2) + \hat\psi_1[z_{157} - \hat z_{156}(1)] = 0.62 + 0.85(0.74 - 0.56) = 0.77$

$z_{159}$: $\hat z_{157}(2) = \hat z_{156}(3) + \hat\psi_2[z_{157} - \hat z_{156}(1)] = 0.68 + 0.85^2(0.74 - 0.56) = 0.81$

The updated 95 percent prediction limits for $z_{158}$ and $z_{159}$ become

$z_{158}$: $\hat z_{157}(1) \pm 1.96\{\hat V[e_{157}(1)]\}^{1/2} = .77 \pm .29$

$z_{159}$: $\hat z_{157}(2) \pm 1.96\{\hat V[e_{157}(2)]\}^{1/2} = .81 \pm .39$
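A sketch reproducing the updating arithmetic above (the dictionary of origin-156 forecasts is just a convenient container):

```python
# Updating formula: z_{n+1}(l) = z_n(l+1) + psi_l * [z_{n+1} - z_n(1)],
# with psi_j = phi^j for the AR(1) model.
phi = 0.85
z156_fc = {1: 0.56, 2: 0.62, 3: 0.68}     # forecasts made at origin n = 156
z157 = 0.74                                # newly observed value
err = z157 - z156_fc[1]                    # one-step-ahead forecast error a_157

z157_fc1 = z156_fc[2] + phi**1 * err       # updated forecast of z_158
z157_fc2 = z156_fc[3] + phi**2 * err       # updated forecast of z_159
print(round(z157_fc1, 2), round(z157_fc2, 2))   # 0.77, 0.81
```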


    4.5 Model Specification

Step 1. Variance-stabilizing transformations. If the variance of the series changes with the level, then a logarithmic transformation will often stabilize the variance. If the logarithmic transformation still does not stabilize the variance, a more general approach employing the class of power transformations can be adopted.

Step 2. Degree of differencing. If the series or its appropriate transformation is not mean stationary, then the proper degree of differencing has to be determined. For the series and its differences, we can examine

1. Plots of the time series
2. Plots of the SACF
3. Sample variances of the successive differences


4.6 Model Estimation

After specifying the form of the model, it is necessary to estimate its parameters. Let $(z_1, z_2, \ldots, z_N)$ represent the vector of the original observations and $w = (w_1, \ldots, w_n)'$ the vector of the $n = N - d$ stationary differences.

Maximum Likelihood Estimates

The ARIMA model can be written as

$$a_t = \theta_1 a_{t-1} + \cdots + \theta_q a_{t-q} + w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p}$$

The joint probability density function of $a = (a_1, a_2, \ldots, a_n)'$ is given by

$$p(a\,|\,\phi, \theta, \sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{n}a_t^2\right)$$


    Exact likelihood function for three important models

1. AR(1): $a_t = w_t - \phi w_{t-1}$

$$L(\phi, \sigma^2\,|\,w) = (2\pi\sigma^2)^{-n/2}(1 - \phi^2)^{1/2}\exp\left\{-\frac{1}{2\sigma^2}\left[(1 - \phi^2)w_1^2 + \sum_{t=2}^{n}(w_t - \phi w_{t-1})^2\right]\right\}$$

2. MA(1): $a_t = \theta a_{t-1} + w_t$

$$L(\theta, \sigma^2\,|\,w) = (2\pi\sigma^2)^{-n/2}\left[\frac{1 - \theta^2}{1 - \theta^{2(n+1)}}\right]^{1/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{t=0}^{n}E^2(a_t|w)\right\}$$

3. ARMA(1,1): $a_t = \theta a_{t-1} + w_t - \phi w_{t-1}$

$$L(\phi, \theta, \sigma^2\,|\,w) = (2\pi\sigma^2)^{-n/2}|Z'Z|^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}S(\phi, \theta)\right\}$$

where

$$|Z'Z| = \frac{(1 - \theta^2)(1 - \phi^2) + (1 - \theta^{2n})(\phi - \theta)^2}{(1 - \theta^2)(1 - \phi^2)}$$

and

$$S(\phi, \theta) = E^2(a_0|w) + \frac{1 - \phi^2}{(\phi - \theta)^2}\big[E(w_0|w) - E(a_0|w)\big]^2 + \sum_{t=1}^{n}E^2(a_t|w)$$


Maximum likelihood estimates (MLE) of the parameters $(\phi, \theta, \sigma^2)$ of the ARIMA model can be obtained by maximizing the likelihood function. In general, closed-form solutions cannot be found. However, various algorithms are available to compute the MLEs or close approximations.

1. AR(1): It is difficult to get a closed-form solution, and iterative methods have to be employed.

2. MA(1): We need to calculate $E(a_t|w)$, $t = 1, 2, \ldots, n$. These can be calculated using the recurrence relation

$$E(a_t|w) = \theta E(a_{t-1}|w) + w_t, \qquad t = 1, 2, \ldots, n$$

provided $E(a_0|w)$ is available. The difficulties in obtaining the MLE are thus twofold: (1) calculating $E(a_0|w)$, and (2) the likelihood function is a complicated function of $\theta$.

3. ARMA(1,1): Similarly to MA(1), to calculate the likelihood function we need $E(a_0|w)$ and $E(w_0|w)$. From these we can calculate

$$E(a_t|w) = \theta E(a_{t-1}|w) + w_t - \phi E(w_{t-1}|w)$$


In summary, there are two major difficulties with the maximum likelihood estimation approach.

1. The presence of the function $g_1(\phi, \theta, \sigma^2)$ in the likelihood function makes the maximization difficult.

2. The approach requires the calculation of the conditional expectations $E(u|w)$, where $u = (u_{1-p-q}, \ldots, u_{-1}, u_0)'$. This in turn requires the calculation of

$$E(a_t|w),\ t = 1 - q, \ldots, -1, 0 \qquad\text{and}\qquad E(w_t|w),\ t = 1 - p, \ldots, -1, 0$$


Evaluation of the Initial Expectations

There are two methods available for the evaluation of $E(u_t|w)$ ($t \le 0$): the method of least squares and backforecasting.

Method of least squares.

For the ARMA(1,1) model, and in general, $E(u|w) = \hat u$, where $\hat u$ is the least squares estimator of $u$ in

$$Lw = Zu + a$$

where $L$ and $Z$ are $(n + p + q) \times n$ and $(n + p + q) \times (p + q)$ matrices involving the parameters $\phi$ and $\theta$ (for details see Appendix 5 in the textbook).

For given values of $(\phi, \theta)$, we can calculate $\hat u$, and hence $S(\phi, \theta)$. Then we can search for a minimum value of $S(\phi, \theta)$ in the space of $(\phi, \theta)$.


    Method of backforecasting.

The MMSE forecast of $z_{n+l}$ made at time $n$ is given by the conditional expectation $E(z_{n+l}|z_n, z_{n-1}, \ldots)$. This result may be used to compute the unknown quantities $E(a_t|w)$ ($t = 1 - q, \ldots, -1, 0$) and $E(w_t|w)$ ($t = 1 - p, \ldots, -1, 0$).

The MMSE forecasts of $a_t$ and $w_t$ ($t \le 0$) can be found by considering the backward model

$$e_t = \theta_1 e_{t+1} + \theta_2 e_{t+2} + \cdots + \theta_q e_{t+q} + w_t - \phi_1 w_{t+1} - \cdots - \phi_p w_{t+p}.$$

The ACF of this backward model is the same as that of the original model. Furthermore, $e_t$ is a white-noise sequence with $\mathrm{Var}(e_t) = \sigma^2$.

This method of evaluating $E(a_t|w)$ and $E(w_t|w)$ is called backforecasting (or backcasting).


Consider the method of backforecasting for a simple MA(1) process $a_t = \theta a_{t-1} + w_t$.

This process can be expressed in the backward form as

$$e_t = \theta e_{t+1} + w_t$$

When the process is run backwards, $e_0, e_{-1}, \ldots$ are random shocks that are future to the observations $w_n, w_{n-1}, \ldots, w_1$. Hence,

$$E(e_t|w) = 0, \qquad t \le 0$$

Furthermore, since the MA(1) process has a memory of only one period, the correlation between $a_{-1}$ and $w_1, w_2, \ldots, w_n$ is zero, and

$$E(a_t|w) = 0, \qquad t \le -1$$


With the value of $E(w_0|w)$, as before we can compute $E(a_t|w)$ ($t = 0, 1, 2, \ldots, n$):

$$E(a_0|w) = \theta E(a_{-1}|w) + E(w_0|w) = E(w_0|w)$$

$$E(a_1|w) = \theta E(a_0|w) + w_1$$

$$\vdots$$

$$E(a_n|w) = \theta E(a_{n-1}|w) + w_n$$

The sum of squares $S(\theta)$ for any $\theta$ can then be obtained from

$$S(\theta) = \sum_{t=0}^{n}E^2(a_t|w),$$

and its minimum can be found.
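A sketch of the whole backforecasting computation for the MA(1), under the assumptions above: since $E(e_0|w) = 0$ and $w_0 = e_0 - \theta e_1$, the backcast of $w_0$ is $E(w_0|w) = -\theta E(e_1|w)$, with $E(e_1|w)$ obtained from the backward recursion $E(e_t|w) = \theta E(e_{t+1}|w) + w_t$ started at $E(e_{n+1}|w) = 0$. A grid search then minimizes $S(\theta)$ (grid and simulation settings are arbitrary):

```python
import numpy as np

def backcast_sum_of_squares(w, theta):
    """Unconditional sum of squares S(theta) for w_t = a_t - theta*a_{t-1}."""
    n = len(w)
    e = 0.0
    for t in range(n - 1, -1, -1):       # backward: E(e_t|w) = theta*E(e_{t+1}|w) + w_t
        e = theta * e + w[t]             # after the loop, e = E(e_1|w)
    w0 = -theta * e                      # backcast E(w_0|w)
    a = w0                               # E(a_0|w) = E(w_0|w) since E(a_{-1}|w) = 0
    s = a**2
    for t in range(n):                   # forward: E(a_t|w) = theta*E(a_{t-1}|w) + w_t
        a = theta * a + w[t]
        s += a**2
    return s

rng = np.random.default_rng(4)
shocks = rng.normal(size=301)
w = shocks[1:] - 0.6 * shocks[:-1]       # simulate MA(1) with theta = 0.6
grid = np.linspace(-0.95, 0.95, 39)
print(round(grid[np.argmin([backcast_sum_of_squares(w, th) for th in grid])], 3))
```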


    Conditional Least Squares Estimates

Computationally simpler estimates can be obtained by minimizing the conditional sum of squares

$$S_c(\phi, \theta) = \sum_{t=p+1}^{n}a_t^2,$$

where the starting values $a_p, a_{p-1}, \ldots, a_{p+1-q}$ in the calculation of the appropriate $a_t$'s are set equal to zero. The resulting estimates are called conditional least squares estimates (CLSE).


In the AR(p) case, the CLSE is obtained by minimizing

$$S_c(\phi) = \sum_{t=p+1}^{n}a_t^2 = \sum_{t=p+1}^{n}(w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p})^2$$

which leads to the least squares estimate

$$\hat\phi_c = (X'X)^{-1}X'y$$

where $y = (w_{p+1}, w_{p+2}, \ldots, w_n)'$ and

$$X = \begin{pmatrix} w_p & w_{p-1} & \cdots & w_1\\ w_{p+1} & w_p & \cdots & w_2\\ \vdots & \vdots & & \vdots\\ w_{n-1} & w_{n-2} & \cdots & w_{n-p} \end{pmatrix}$$

In the special case of the AR(1) process, this reduces to $\hat\phi_c = \sum_{t=2}^{n}w_{t-1}w_t \big/ \sum_{t=2}^{n}w_{t-1}^2$.
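A sketch of the AR(p) conditional least squares computation via the regression formulation above (simulation parameters are arbitrary):

```python
import numpy as np

def ar_clse(w, p):
    """Conditional least squares estimate phi_hat_c = (X'X)^{-1} X'y for AR(p)."""
    w = np.asarray(w, dtype=float)
    y = w[p:]                                            # (w_{p+1}, ..., w_n)'
    X = np.column_stack([w[p - j: len(w) - j] for j in range(1, p + 1)])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(5)
n, phi1, phi2 = 2000, 0.5, 0.3
w = np.zeros(n)
a = rng.normal(size=n)
for t in range(2, n):
    w[t] = phi1 * w[t - 1] + phi2 * w[t - 2] + a[t]
print(np.round(ar_clse(w, 2), 3))    # typically close to [0.5, 0.3]
```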


    Nonlinear Estimation

If the model contains moving average terms, then the conditional and unconditional sums of squares are not quadratic functions of the parameters $(\phi, \theta)$. This is because $E(u_t|w)$ in $S(\phi, \theta)$ is nonlinear in the parameters. Hence nonlinear least squares procedures must be used to minimize $S(\phi, \theta)$ or $S_c(\phi, \theta)$.

We now illustrate the nonlinear estimation procedure and derive the unconditional least squares estimates of the parameters in the ARMA(1,1) model

$$a_t = \theta a_{t-1} + w_t - \phi w_{t-1}$$


Step 1. Computation at the starting point. For given $w$ and a starting point $(\phi, \theta) = (\phi_0, \theta_0)$, using backforecasting or the method of least squares we can get

$$u_{-1,0} = E(a_0|w, \phi_0, \theta_0)$$

and

$$u_{0,0} = \frac{(1 - \phi_0^2)^{1/2}}{\phi_0 - \theta_0}\big[E(w_0|w, \phi_0, \theta_0) - E(a_0|w, \phi_0, \theta_0)\big]$$

The remaining elements in $u_0 = (u_{-1,0}, u_{0,0}, u_{1,0}, \ldots, u_{n,0})'$ can be obtained from $u_{t0} = E(a_t|w, \phi_0, \theta_0)$ for $t = 1, 2, \ldots, n$.

Step 2. Linearization. Regard $u_t = E(a_t|w, \phi, \theta)$ ($t = -1, 0, 1, \ldots, n$) as a function of $\beta = (\phi, \theta)'$ and linearize this function at the starting point $\beta_0 = (\phi_0, \theta_0)'$ in a first-order Taylor series expansion:

$$u_t = u_{t0} + (\phi - \phi_0)\frac{\partial u_t}{\partial\phi}\bigg|_{\beta=\beta_0} + (\theta - \theta_0)\frac{\partial u_t}{\partial\theta}\bigg|_{\beta=\beta_0} = u_{t0} - x_{1,t}(\phi - \phi_0) - x_{2,t}(\theta - \theta_0)$$

where

$$x_{1,t} = -\frac{\partial u_t}{\partial\phi}\bigg|_{\beta=\beta_0} \qquad\text{and}\qquad x_{2,t} = -\frac{\partial u_t}{\partial\theta}\bigg|_{\beta=\beta_0}$$


So we have

$$u_{t0} = (x_{1,t},\, x_{2,t})\begin{pmatrix}\phi - \phi_0\\ \theta - \theta_0\end{pmatrix} + u_t$$

Considering this equation for $t = -1, 0, \ldots, n$, we obtain

$$u_0 = X(\beta - \beta_0) + u$$

where $u = (u_{-1}, u_0, \ldots, u_n)'$ and

$$X = \begin{pmatrix} x_{1,-1} & x_{2,-1}\\ x_{1,0} & x_{2,0}\\ \vdots & \vdots\\ x_{1,n} & x_{2,n} \end{pmatrix}$$

The least squares estimate of $\delta = \beta - \beta_0$ may be obtained from $\hat\delta = (X'X)^{-1}X'u_0$. Hence we obtain a modified estimate of $\beta$ as

$$\beta_1 = \beta_0 + \hat\delta$$


Step 3. Repeat Steps 1 and 2 with $\beta_1$ as the new point of expansion in the Taylor series, and continue the process until convergence occurs. Convergence is usually determined by one of the following criteria:

1. The relative change in the sum of squares $\sum_t u_t^2$ from two consecutive iterations is less than a prespecified small value.

2. The largest relative change in the parameter estimates of two consecutive iterations is less than a prespecified small value.

When convergence occurs, we have an estimate based on local linear least squares, and this is taken as the unconditional least squares estimate of $\beta$. The conditional least squares estimate can be obtained in a similar fashion by replacing the vector $u$ by $a = (a_2, a_3, \ldots, a_n)'$ and setting $a_1 = 0$.
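A minimal Gauss-Newton sketch for the conditional sum of squares of the ARMA(1,1) model. Forward-difference numerical derivatives stand in for the analytic $x_{1,t}$, $x_{2,t}$, and all names and tolerances are illustrative:

```python
import numpy as np

def resid(beta, w):
    """Conditional residuals a_t = theta*a_{t-1} + w_t - phi*w_{t-1}, a_1 = 0."""
    phi, theta = beta
    a = np.zeros(len(w))
    for t in range(1, len(w)):
        a[t] = w[t] - phi * w[t - 1] + theta * a[t - 1]
    return a[1:]                          # (a_2, ..., a_n), as in the text

def gauss_newton(w, beta0, n_iter=20, h=1e-6):
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        u0 = resid(beta, w)
        # Columns of X: numerical versions of x_{1,t}, x_{2,t} = -d(resid)/d(beta_i):
        X = np.column_stack([
            -(resid(beta + h * np.eye(2)[i], w) - u0) / h for i in range(2)])
        delta = np.linalg.lstsq(X, u0, rcond=None)[0]   # (X'X)^{-1} X' u0
        beta = beta + delta
        if np.max(np.abs(delta)) < 1e-8:  # convergence criterion 2 above
            break
    return beta

rng = np.random.default_rng(6)
n, phi, theta = 3000, 0.7, 0.4
a = rng.normal(size=n)
w = np.zeros(n)
for t in range(1, n):                     # simulate ARMA(1,1)
    w[t] = phi * w[t - 1] + a[t] - theta * a[t - 1]
print(np.round(gauss_newton(w, [0.1, 0.1]), 2))   # typically close to [0.7, 0.4]
```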


For small $k$, the true standard error of the residual autocorrelations $r_a(k)$ can be much smaller than the usual approximation $n^{-1/2}$. In general, the true standard error depends on

1. the form of the fitted model,
2. the true parameter values,
3. the value of $k$.


    Portmanteau Test

Under the null hypothesis of model adequacy, the large-sample distribution of the residual autocorrelations $r_a = (r_a(1), \ldots, r_a(K))'$ is multivariate normal, and

$$Q^* = n\sum_{k=1}^{K}r_a^2(k)$$

has a large-sample chi-square distribution with $K - p - q$ degrees of freedom.

For small sample sizes, this test is quite conservative; that is, the chance of incorrectly rejecting the null hypothesis of model adequacy is smaller than the chosen significance level. The modified test statistic is

$$Q = n(n+2)\sum_{k=1}^{K}\frac{r_a^2(k)}{n - k}$$
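A sketch computing the modified statistic $Q$ from a residual series (names are my own; `n_params` plays the role of $p + q$):

```python
import numpy as np

def ljung_box_q(residuals, K, n_params):
    """Modified portmanteau statistic Q = n(n+2) * sum_k r_a(k)^2 / (n-k),
    to be compared with a chi-square on K - (p+q) degrees of freedom."""
    e = np.asarray(residuals, dtype=float)
    n = len(e)
    dev = e - e.mean()
    denom = np.sum(dev**2)
    r = np.array([np.sum(dev[k:] * dev[:n - k]) / denom for k in range(1, K + 1)])
    q = n * (n + 2) * np.sum(r**2 / (n - np.arange(1, K + 1)))
    return q, K - n_params                  # statistic and degrees of freedom

rng = np.random.default_rng(7)
q, df = ljung_box_q(rng.normal(size=200), K=24, n_params=2)
print(round(q, 1), df)   # for a white-noise series, Q is typically near K
```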


    4.8 Examples

Yield data

This series consists of monthly differences between the yield on mortgages and the yield on government loans in the Netherlands from January 1961 to December 1973. The observations for January to March 1974 are also available, but they are held back to illustrate forecasting and updating.

[Figures: time series plot and SACF]

AR(1) model: $z_t - \mu = \phi(z_{t-1} - \mu) + a_t$

The maximum likelihood estimates (standard errors in parentheses) are

$$\hat\mu = 0.97\ (0.08), \qquad \hat\phi = 0.85\ (0.04), \qquad \hat\sigma^2 = 0.024$$


    Forecasting

$$\hat z_{156}(1) = \hat\mu + \hat\phi(z_{156} - \hat\mu) = .97 + .85(.49 - .97) = .56$$

Prediction limits:

$$\hat z_{156}(1) \pm 1.96\hat\sigma = 0.56 \pm 1.96(.024)^{1/2} = 0.56 \pm 0.29$$

$$\hat z_{156}(2) = \hat\mu + \hat\phi^2(z_{156} - \hat\mu) = .97 + .85^2(.49 - .97) = .62$$

Prediction limits:

$$\hat z_{156}(2) \pm 1.96\hat\sigma(1 + \hat\phi^2)^{1/2} = 0.62 \pm 1.96(.024)^{1/2}(1 + .85^2)^{1/2} = 0.62 \pm 0.39$$

$$\hat z_{156}(3) = \hat\mu + \hat\phi^3(z_{156} - \hat\mu) = .97 + .85^3(.49 - .97) = .68$$

Prediction limits:

$$\hat z_{156}(3) \pm 1.96\hat\sigma(1 + \hat\phi^2 + \hat\phi^4)^{1/2} = 0.68 \pm 1.96(.024)^{1/2}(1 + .85^2 + .85^4)^{1/2} = 0.68 \pm 0.45$$


Suppose $z_{157} = 0.74$ has now become available. Forecasts for $z_{158}$ and $z_{159}$ are updated according to

$z_{158}$: $\hat z_{157}(1) = \hat z_{156}(2) + \hat\psi_1[z_{157} - \hat z_{156}(1)] = 0.62 + .85(.74 - .56) = .77$

$z_{159}$: $\hat z_{157}(2) = \hat z_{156}(3) + \hat\psi_2[z_{157} - \hat z_{156}(1)] = 0.68 + .85^2(.74 - .56) = .81$

The prediction interval for $z_{158}$ is $0.77 \pm 0.29$, and that for $z_{159}$ is $0.81 \pm 0.39$.


    Growth Rates

Quarterly growth rates of Iowa nonfarm income from the second quarter of 1948 to the fourth quarter of 1978 ($N = 123$) are shown in Figure 5.8a.

The plot of the series indicates that no variance-stabilizing transformation is necessary.

$z_t$ is nonstationary, and the first difference $\nabla z_t$ should be analyzed. The second difference is also stationary but would amount to overdifferencing. This is indicated by the increase in the sample variance [$s^2(\nabla z_t) = 1.58$, $s^2(\nabla^2 z_t) = 4.82$].

The SACF of $\nabla z_t$ in Figure 5.11b indicates a cutoff after lag 1. The SPACF in Figure 5.15 shows a rough exponential decay. These patterns lead us to consider the model

$$\nabla z_t = \theta_0 + (1 - \theta B)a_t$$


Forecasting

The predictions $\hat z_{123}(\ell)$ are described below:

$$\hat z_{123}(\ell) = \hat z_{123}(1) = z_{123} - \hat\theta a_{123} = 3.38 - .88(.718) = 2.75$$

Note that $a_{123} = .718$ is the last residual.

The estimated variance of the $\ell$-step-ahead forecast error is

$$\hat V[e_{123}(\ell)] = [1 + (\ell - 1)(1 - \hat\theta)^2]\hat\sigma^2 = [1 + (\ell - 1)(1 - 0.88)^2](.95)$$

After a new observation $z_{124} = 1.55$ has become available, we update the forecast:

$$\hat z_{124}(\ell) = \hat z_{123}(\ell + 1) + \hat\psi_\ell[z_{124} - \hat z_{123}(1)] = 2.75 + (1 - 0.88)(1.55 - 2.75) = 2.61$$

Furthermore, after $z_{125} = 2.93$ has become available,

$$\hat z_{125}(\ell) = \hat z_{124}(\ell + 1) + \hat\psi_\ell[z_{125} - \hat z_{124}(1)] = 2.61 + (1 - 0.88)(2.93 - 2.61) = 2.65$$


Demand for Repair Parts

This series consists of the monthly demand for repair parts for a large Midwestern production company from January 1972 to June 1979 ($N = 90$). The series is shown in Figure 5.9a.

A logarithmic transformation is needed to stabilize the variance, and the first differences $\nabla\ln z_t$ are stationary. Further differencing is not necessary, since it results in an increase in the sample variance [$s^2(\nabla\ln z_t) = .034$, $s^2(\nabla^2\ln z_t) = .099$].

The sample autocorrelations and partial autocorrelations are shown in Figures 5.12b and 5.17. They indicate a cutoff in the SACF after lag 1 and a rough exponential decay in the SPACF. Thus we consider the ARIMA(0,1,1) model

$$(1 - B)y_t = (1 - \theta B)a_t$$

where $y_t = \ln z_t$. The ML estimates are

$$\hat\theta = 0.57\ (.09), \qquad \hat\sigma^2 = 0.027$$


    Forecasting

The forecasting model is given in terms of the transformed series $y_t = \ln z_t$. The MMSE forecasts $\hat y_n(\ell)$ and the associated prediction interval $(c_1, c_2)$ for $y_{n+\ell}$ can be derived as usual.

However, to derive forecasts of the original observations $z_{n+\ell} = \exp(y_{n+\ell})$, we have to back-transform the forecasts and the prediction interval. They are given by

$$\hat z_n(\ell) = \exp[\hat y_n(\ell)] \qquad\text{and}\qquad [\exp(c_1), \exp(c_2)]$$

The forecasts of the transformed series $y_{n+\ell}$ are MMSE forecasts. However, if the transformation is nonlinear, as in the case of the logarithmic transformation, the MMSE property is not preserved for $\hat z_n(\ell) = \exp[\hat y_n(\ell)]$.

Nevertheless, the forecast $\hat z_n(\ell)$ will be the median of the predictive distribution of $z_{n+\ell}$.


For the present example, the prediction $\hat y_{90}(\ell)$ and the corresponding 95 percent prediction limits are given by

$$\hat y_{90}(\ell) = \hat y_{90}(1) = y_{90} - \hat\theta a_{90} = 7.864 - .57(.234) = 7.731$$

and

$$\hat y_{90}(\ell) \pm 1.96[1 + (\ell - 1)(1 - \hat\theta)^2]^{1/2}\hat\sigma = 7.731 \pm 1.96[1 + (\ell - 1)(1 - .57)^2]^{1/2}(.027)^{1/2}.$$

Forecasts and 95 percent prediction intervals for $y_{91}, y_{92}$ and $z_{91}, z_{92}$ are given in Table 5.6.

The prediction intervals are quite large. This is due partly to the large uncertainty ($\hat\sigma^2$) in the past data and partly to the unexplained correlation at lag 12.
