8/11/2019 Chapter 4X
4. Stochastic Time Series Models
In Chapters 2 and 3 we considered forecast models of the form

$$z_t = f(x_t; \beta) + \epsilon_t$$

In these forecasting models it is usually assumed that the errors $\epsilon_t$ are uncorrelated, which in turn implies that the observations $z_t$ are uncorrelated. However, this assumption is rarely met in practice.

Thus models that capture the correlation structure have to be considered. Models such as AR(p), MA(q), ARMA(p,q), and ARIMA(p,d,q) will be studied in this chapter.
4.1 Stochastic Processes
An observed time series $(z_1, z_2, \ldots, z_n)$ can be thought of as a particular realization of a stochastic process.

Stochastic processes in general can be described by an $n$-dimensional probability distribution $p(z_1, z_2, \ldots, z_n)$.

Assuming joint normality, there are only $n$ observations but $n + n(n+1)/2$ unknown parameters (the means plus the variances and covariances). Inferring such a general probability structure from just one realization of the stochastic process is impossible.

Hence the stationarity assumption is used to simplify the probability structure.
Stationary Stochastic Process
The joint distributions are stable with respect to time shifts:

$$p(z_{t_1}, \ldots, z_{t_m}) = p(z_{t_1+k}, \ldots, z_{t_m+k})$$

Autocorrelation Function (ACF)

The autocovariances:

$$\gamma_k = \mathrm{Cov}(z_t, z_{t-k}) = E[(z_t - \mu)(z_{t-k} - \mu)]$$

The autocorrelations:

$$\rho_k = \frac{\mathrm{Cov}(z_t, z_{t-k})}{[\mathrm{Var}(z_t)\,\mathrm{Var}(z_{t-k})]^{1/2}} = \frac{\gamma_k}{\gamma_0}$$
Sample Autocorrelation Function (SACF)
$$r_k = \frac{\sum_{t=k+1}^{n} (z_t - \bar{z})(z_{t-k} - \bar{z})}{\sum_{t=1}^{n} (z_t - \bar{z})^2}, \qquad k = 0, 1, 2, \ldots$$

For uncorrelated observations, the variance of $r_k$ is approximately given by

$$\mathrm{Var}(r_k) \approx \frac{1}{n}$$
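As a sketch of how these quantities can be computed (Python, with simulated data; numpy is assumed available), the SACF and the resulting $2/\sqrt{n}$ rule of thumb for uncorrelated data look like this:

```python
import numpy as np

def sample_acf(z, max_lag):
    """Sample autocorrelations r_1..r_max_lag, following the formula above."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = z - z.mean()
    denom = np.sum(d ** 2)
    return np.array([np.sum(d[k:] * d[:n - k]) / denom
                     for k in range(1, max_lag + 1)])

# For uncorrelated observations Var(r_k) ~ 1/n, so the usual rule of thumb
# flags |r_k| > 2/sqrt(n) as evidence of autocorrelation.
rng = np.random.default_rng(0)
r = sample_acf(rng.normal(size=400), 10)
print(r.shape)  # (10,)
```

For white noise of length 400 the bound $2/\sqrt{n} = 0.1$, and the computed $r_k$ should mostly fall inside it.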
4.2 Stochastic Difference Equation Models
A time series in which successive values are autocorrelated can be represented as a linear combination (linear filter) of a sequence of uncorrelated random variables. The linear filter representation is given by

$$z_t - \mu = a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots = \sum_{j=0}^{\infty} \psi_j a_{t-j}, \qquad \psi_0 = 1.$$

The random variables $\{a_t;\ t = 0, \pm 1, \pm 2, \ldots\}$ are a sequence of uncorrelated random variables from a fixed distribution with mean $E(a_t) = 0$, variance $\mathrm{Var}(a_t) = E(a_t^2) = \sigma^2$, and $\mathrm{Cov}(a_t, a_{t-k}) = E(a_t a_{t-k}) = 0$ for all $k \neq 0$.

$$E(z_t) = \mu, \qquad \gamma_0 = E(z_t - \mu)^2 = \sigma^2 \sum_{j=0}^{\infty} \psi_j^2,$$

$$\gamma_k = E[(z_t - \mu)(z_{t-k} - \mu)] = \sigma^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k},$$

$$\rho_k = \sum_{j=0}^{\infty} \psi_j \psi_{j+k} \Big/ \sum_{j=0}^{\infty} \psi_j^2.$$
Autoregressive Processes
First-Order Autoregressive Process (AR(1))

$$(z_t - \mu) - \phi(z_{t-1} - \mu) = a_t \quad\Longleftrightarrow\quad (1 - \phi B)(z_t - \mu) = a_t$$

We have

$$z_t - \mu = (1 - \phi B)^{-1} a_t = (1 + \phi B + \phi^2 B^2 + \cdots) a_t = a_t + \phi a_{t-1} + \phi^2 a_{t-2} + \cdots$$

Application of the general results for linear filters leads to

$$\gamma_0 = \sigma^2 \sum_{j=0}^{\infty} \psi_j^2 = \sigma^2 \sum_{j=0}^{\infty} \phi^{2j} = \frac{\sigma^2}{1 - \phi^2}$$

$$\gamma_k = \sigma^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k} = \sigma^2 \sum_{j=0}^{\infty} \phi^j \phi^{j+k} = \frac{\sigma^2 \phi^k}{1 - \phi^2} = \phi^k \gamma_0, \qquad k = 0, 1, 2, \ldots$$

$$\rho_k = \phi^k, \qquad k = 0, 1, 2, \ldots$$
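A quick numerical check of $\rho_k = \phi^k$ (a Python sketch; the simulation length and seed are arbitrary choices, not from the slides):

```python
import numpy as np

# Simulate an AR(1) with phi = 0.8 and compare the sample ACF to phi**k.
rng = np.random.default_rng(1)
phi, n = 0.8, 20000
z = np.zeros(n)
for t in range(1, n):
    z[t] = phi * z[t - 1] + rng.normal()

d = z - z.mean()
denom = np.sum(d ** 2)
r = [np.sum(d[k:] * d[:n - k]) / denom for k in range(1, 4)]
# r should be close to [phi, phi**2, phi**3] = [0.8, 0.64, 0.512]
print(np.round(r, 2))
```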
Another approach to obtain the ACF of the AR(1) model; this approach is more suitable for higher-order situations. Without loss of generality, assume $\mu = 0$ and $z_t = \phi z_{t-1} + a_t$. Then we have

$$\gamma_k = E(z_t z_{t-k}) = \phi E(z_{t-1} z_{t-k}) + E(a_t z_{t-k}) = \phi\gamma_{k-1} + E(a_t z_{t-k})$$

where, for $k = 0$,

$$E(a_t z_t) = E[a_t(a_t + \phi a_{t-1} + \phi^2 a_{t-2} + \cdots)] = \sigma^2,$$

and, for $k > 0$, $E(a_t z_{t-k}) = 0$. Thus

$$\gamma_0 = \phi\gamma_{-1} + \sigma^2 = \phi\gamma_1 + \sigma^2$$

$$\gamma_k = \phi\gamma_{k-1} \quad (k \geq 1), \qquad (\gamma_1 = \phi\gamma_0)$$

$$\gamma_0 = \phi^2\gamma_0 + \sigma^2 \;\Longrightarrow\; \gamma_0 = \frac{\sigma^2}{1 - \phi^2},$$

$$\gamma_k = \phi^k\gamma_0, \qquad \rho_k = \phi^k.$$
Second-Order Autoregressive Process [AR(2)]

$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t$$

$$(1 - \phi_1 B - \phi_2 B^2) z_t = a_t$$

and

$$z_t = \frac{1}{1 - \phi_1 B - \phi_2 B^2}\, a_t = (1 + \psi_1 B + \psi_2 B^2 + \cdots) a_t$$

The coefficients $(\psi_1, \psi_2, \ldots)$ can be determined by equating coefficients in

$$1 = (1 - \phi_1 B - \phi_2 B^2)(1 + \psi_1 B + \psi_2 B^2 + \cdots):$$

$$B^1:\ \psi_1 - \phi_1 = 0 \;\Rightarrow\; \psi_1 = \phi_1; \qquad B^2:\ \psi_2 - \phi_1\psi_1 - \phi_2 = 0 \;\Rightarrow\; \psi_2 = \phi_1^2 + \phi_2;$$

$$B^3:\ \psi_3 - \phi_1\psi_2 - \phi_2\psi_1 = 0 \;\Rightarrow\; \psi_3 = \phi_1^3 + 2\phi_1\phi_2;$$

$$\psi_j = \phi_1\psi_{j-1} + \phi_2\psi_{j-2}, \qquad j \geq 2.$$
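The recursion for the $\psi$ weights is straightforward to implement; a minimal sketch (the parameter values are illustrative, chosen so the arithmetic is exact):

```python
def ar2_psi(phi1, phi2, m):
    """psi_0..psi_m for a stationary AR(2), via psi_j = phi1*psi_{j-1} + phi2*psi_{j-2}."""
    psi = [1.0, phi1]                      # psi_0 = 1, psi_1 = phi1
    for _ in range(2, m + 1):
        psi.append(phi1 * psi[-1] + phi2 * psi[-2])
    return psi

# phi1 = 0.5, phi2 = 0.25: psi_2 = phi1^2 + phi2 = 0.5, psi_3 = phi1^3 + 2*phi1*phi2 = 0.375
print(ar2_psi(0.5, 0.25, 3))  # [1.0, 0.5, 0.5, 0.375]
```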
Stationarity
$$\mathrm{AR}(p):\quad (1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) = (1 - G_1 B)\cdots(1 - G_p B).$$

Stationarity requires $|G_i^{-1}| > 1$, i.e. $|G_i| < 1$, $i = 1, 2, \ldots, p$ (all roots of $\phi(B) = 0$ lie outside the unit circle). In particular, for AR(2) stationarity corresponds to

$$\phi_1 + \phi_2 < 1, \qquad \phi_2 - \phi_1 < 1, \qquad -1 < \phi_2 < 1.$$
Actually, the general solution of the difference equation

$$(1 - \phi_1 B - \phi_2 B^2)\rho_k = 0$$

is

$$\rho_k = \begin{cases} A_1 G_1^k + A_2 G_2^k, & G_1 \neq G_2, \\ (A_1 + A_2 k)G^k, & G_1 = G_2 = G, \\ \{A_1\sin(\lambda k) + A_2\cos(\lambda k)\}(p^2 + q^2)^{k/2}, & G_{1,2} = p \pm qi,\ \ \lambda = \arcsin\!\big(q/\sqrt{p^2 + q^2}\big). \end{cases}$$

So, when the process is stationary, the correlogram shows damped exponentials and/or damped sinusoidal waves.
The general AR(p) model
$$\gamma_0 = \phi_1\gamma_1 + \cdots + \phi_p\gamma_p + \sigma^2 \quad\Longrightarrow\quad \gamma_0 = \frac{\sigma^2}{1 - \phi_1\rho_1 - \cdots - \phi_p\rho_p}$$

$$\gamma_k = \phi_1\gamma_{k-1} + \cdots + \phi_p\gamma_{k-p}, \qquad k > 0$$

$$\rho_k = \phi_1\rho_{k-1} + \cdots + \phi_p\rho_{k-p}.$$

Letting $k = 1, 2, \ldots, p$, we have the Yule-Walker equations:

$$\boldsymbol{\rho} = P\boldsymbol{\phi}, \qquad P = \begin{pmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{p-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{p-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{p-1} & \rho_{p-2} & \rho_{p-3} & \cdots & 1 \end{pmatrix}$$

Method of moments:

$$\hat{\boldsymbol{\phi}} = \hat{P}^{-1}\hat{\boldsymbol{\rho}}$$
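A sketch of the method-of-moments step (Python, numpy assumed; the input autocorrelations here are the exact theoretical AR(2) values, so the solve recovers the parameters):

```python
import numpy as np

def yule_walker(rho):
    """Solve rho = P phi for phi, given rho = (rho_1, ..., rho_p)."""
    p = len(rho)
    P = np.array([[1.0 if i == j else rho[abs(i - j) - 1] for j in range(p)]
                  for i in range(p)])
    return np.linalg.solve(P, np.asarray(rho))

# AR(2): rho_1 = phi1/(1 - phi2) and rho_2 = phi1*rho_1 + phi2
phi1, phi2 = 0.5, 0.3
rho1 = phi1 / (1 - phi2)
rho2 = phi1 * rho1 + phi2
print(np.allclose(yule_walker([rho1, rho2]), [phi1, phi2]))  # True
```

In practice the sample autocorrelations $r_k$ replace the theoretical $\rho_k$.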
Partial Autocorrelations
When considering the correlation between two random variables, it is fair to first remove the influence of other random variables related to them. In the AR(p) model, the lag-$k$ partial autocorrelation $\phi_{kk}$ is the last coefficient in the regression

$$z_t = \phi_{k1} z_{t-1} + \phi_{k2} z_{t-2} + \cdots + \phi_{kk} z_{t-k} + a_t.$$

Stationarity implies that $\phi(B)$ has all its roots outside the unit circle, and $|\phi_{kk}| \leq 1$.
AR(p): Using the Yule-Walker equations

$$\begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{pmatrix} = \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{pmatrix}$$

By Cramer's rule,

$$\phi_{kk} = \frac{\begin{vmatrix} 1 & \rho_1 & \cdots & \rho_{k-2} & \rho_1 \\ \rho_1 & 1 & \cdots & \rho_{k-3} & \rho_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & \rho_1 & \rho_k \end{vmatrix}}{\begin{vmatrix} 1 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{vmatrix}}$$

The basic features of AR(p):

ACF: combinations of damped exponentials/sinusoidal waves; PACF: zero for lags larger than p.
Sample Partial Autocorrelation Function (SPACF)

1. Use the sample correlation coefficients $r_j$ in place of $\rho_j$ and solve the Yule-Walker equations for $\hat{\phi}_{kk}$. This scheme is impractical if the lag $k$ is large.

2. Recursive forms:

$$\hat{\phi}_{kk} = \frac{r_k - \sum_{j=1}^{k-1} \hat{\phi}_{k-1,j}\, r_{k-j}}{1 - \sum_{j=1}^{k-1} \hat{\phi}_{k-1,j}\, r_j},$$

$$\hat{\phi}_{kj} = \hat{\phi}_{k-1,j} - \hat{\phi}_{kk}\hat{\phi}_{k-1,k-j}, \qquad j = 1, 2, \ldots, k-1.$$

3. Standard errors of $\hat{\phi}_{kk}$: if the data follow an AR(p) process, then for lags greater than $p$ the variance of $\hat{\phi}_{kk}$ can be approximated by

$$\mathrm{Var}(\hat{\phi}_{kk}) \approx \frac{1}{n} \quad \text{for } k > p.$$

Approximate standard errors for $\hat{\phi}_{kk}$ are given by $n^{-1/2}$.
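The recursive forms above (the Durbin-Levinson recursion) can be sketched as follows; the input is the vector of sample autocorrelations:

```python
def spacf(r, K):
    """phi_kk for k = 1..K from autocorrelations r = [r_1, r_2, ...],
    using the recursive forms above (Durbin-Levinson)."""
    phi = {(1, 1): r[0]}
    for k in range(2, K + 1):
        num = r[k - 1] - sum(phi[(k - 1, j)] * r[k - 1 - j] for j in range(1, k))
        den = 1.0 - sum(phi[(k - 1, j)] * r[j - 1] for j in range(1, k))
        phi[(k, k)] = num / den
        for j in range(1, k):
            phi[(k, j)] = phi[(k - 1, j)] - phi[(k, k)] * phi[(k - 1, k - j)]
    return [phi[(k, k)] for k in range(1, K + 1)]

# For exact AR(1) autocorrelations r_k = phi**k, the PACF cuts off after lag 1.
p = spacf([0.6 ** k for k in range(1, 5)], 4)
print(round(p[0], 6), all(abs(v) < 1e-9 for v in p[1:]))  # 0.6 True
```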
Moving Average MA(q) Model
$$z_t - \mu = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q) a_t = \theta(B) a_t$$

Autocovariances and autocorrelations:

$$\gamma_0 = (1 + \theta_1^2 + \cdots + \theta_q^2)\sigma^2$$

$$\gamma_k = (-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q)\sigma^2, \qquad k = 1, 2, \ldots, q; \qquad \gamma_k = 0, \quad k > q$$

$$\rho_k = \frac{-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q}{1 + \theta_1^2 + \cdots + \theta_q^2}, \qquad k = 1, 2, \ldots, q; \qquad \rho_k = 0, \quad k > q$$

Partial Autocorrelation Function for MA Processes

The ACF of a moving average process of order $q$ cuts off after lag $q$ ($\rho_k = 0$, $k > q$). However, its PACF is infinite in extent (it tails off).

In general, the PACF is dominated by combinations of damped exponentials and/or damped sine waves.
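The theoretical MA(q) autocorrelations above are easy to compute directly; a small sketch:

```python
def ma_acf(theta, max_lag):
    """Theoretical rho_k for z_t = (1 - theta_1 B - ... - theta_q B^q) a_t."""
    q = len(theta)
    c = [1.0] + [-th for th in theta]          # coefficients of a_t, a_{t-1}, ..., a_{t-q}
    gamma0 = sum(v * v for v in c)
    return [sum(c[j] * c[j + k] for j in range(q - k + 1)) / gamma0 if k <= q else 0.0
            for k in range(1, max_lag + 1)]

# MA(1), theta = 0.5: rho_1 = -theta/(1 + theta^2) = -0.4 and rho_k = 0 for k > 1
print(ma_acf([0.5], 3))  # [-0.4, 0.0, 0.0]
```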
Example: MA(1) model $z_t = a_t - \theta a_{t-1}$

$$\gamma_0 = (1 + \theta^2)\sigma^2, \qquad \gamma_1 = -\theta\sigma^2,$$

$$\rho_1 = \frac{-\theta}{1 + \theta^2}, \qquad |\rho_1| \leq \tfrac{1}{2}.$$

Invertibility condition: since both $\theta$ and $1/\theta$ are roots of the equation

$$\rho_1\theta^2 + \theta + \rho_1 = 0,$$

we restrict ourselves to the situation with $|\theta| < 1$.
Autoregressive Moving Average (ARMA(p,q)) Processes
$$(1 - \phi_1 B - \cdots - \phi_p B^p)(z_t - \mu) = (1 - \theta_1 B - \cdots - \theta_q B^q) a_t$$

or

$$z_t - \mu = \phi_1(z_{t-1} - \mu) + \cdots + \phi_p(z_{t-p} - \mu) + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$$

For $k > q - p$ the ACF is dominated by the AR part; for $k > p - q$ the PACF is dominated by the MA part.

Summary. The ACF and PACF of a mixed ARMA(p,q) process are both infinite in extent and tail off (die down) as the lag $k$ increases. Eventually (for $k > q - p$) the ACF is determined by the autoregressive part of the model, and eventually (for $k > p - q$) the PACF is determined by the moving average part.
ARMA(1,1) Process
$$(1 - \phi B)(z_t - \mu) = (1 - \theta B) a_t$$

ACF:

$$\gamma_0 = \frac{1 + \theta^2 - 2\phi\theta}{1 - \phi^2}\,\sigma^2, \qquad \gamma_1 = \frac{(1 - \phi\theta)(\phi - \theta)}{1 - \phi^2}\,\sigma^2,$$

$$\rho_1 = \frac{(1 - \phi\theta)(\phi - \theta)}{1 + \theta^2 - 2\phi\theta}, \qquad \rho_k = \phi\rho_{k-1}, \quad k > 1.$$

PACF: $\phi_{11} = \rho_1$; thereafter it tails off, similarly to an MA(1).
4.3 Nonstationary processes
Nonstationary, Differencing, and Transformations
M1: $z_t = \beta_0 + \beta_1 t + a_t$
M2: $(1 - B)z_t = \beta_1 + a_t$, or $z_t = z_{t-1} + \beta_1 + a_t$.
M3: $(1 - B)^2 z_t = a_t$, or $z_t = 2z_{t-1} - z_{t-2} + a_t$.
Nonstationary Variance
Examples: 1. Growth rates. 2. Demand for repair parts.
An Alternative Argument for Differencing
Let us assume that the data $z_t$ are described by a nonstationary level

$$\mu_t = E(z_t) = \beta_0 + \beta_1 t$$

and that we wish to entertain an ARMA model for a function of $z_t$ whose level is independent of time. Consider the model

$$\phi_{p+1}(B) z_t = \theta_0 + \theta(B) a_t$$

where $\theta_0$ represents the level implied by $\phi_{p+1}(B)$. To achieve a $\theta_0$ that is constant, independent of time, take

$$\phi_{p+1}(B) = (1 - B)\phi(B)$$

and then

$$\theta_0 = \phi_{p+1}(B)(\beta_0 + \beta_1 t) = \phi(B)(1 - B)(\beta_0 + \beta_1 t) = \phi(B)\beta_1.$$
If $\mu_t = \beta_0 + \beta_1 t + \cdots + \beta_d t^d$, we have to choose the autoregressive operator in $\phi_{p+d}(B) z_t = \theta_0 + \theta(B) a_t$ such that

$$\phi_{p+d}(B) = (1 - B)^d\phi(B)$$

and then

$$\theta_0 = \phi_{p+d}(B)(\beta_0 + \beta_1 t + \cdots + \beta_d t^d) = \phi(B)(1 - B)^d(\beta_0 + \beta_1 t + \cdots + \beta_d t^d) = \phi(B)\beta_d\, d!.$$
Autoregressive Integrated Moving Average (ARIMA) Models
Transform a nonstationary series $z_t$ into a stationary one by considering relevant differences

$$w_t = \nabla^d z_t = (1 - B)^d z_t.$$

Then use an ARMA model to describe $w_t$. The corresponding model can be written as

$$\phi(B) w_t = \theta_0 + \theta(B) a_t$$

or

$$\phi(B)(1 - B)^d z_t = \theta_0 + \theta(B) a_t.$$

This is the ARIMA(p,d,q) model; the orders p, d, q are usually small. The model is called integrated, since $z_t$ can be thought of as the summation (integration) of the stationary series $w_t$. For example, if $d = 1$,

$$z_t = (1 - B)^{-1} w_t = \sum_{k=-\infty}^{t} w_k.$$
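The differencing/integration relationship for $d = 1$ can be sketched numerically (Python, numpy assumed; the random-walk data are simulated for illustration):

```python
import numpy as np

# d = 1: w_t = (1 - B) z_t, and z is recovered by summation (integration) of w.
rng = np.random.default_rng(2)
z = np.cumsum(rng.normal(size=100))           # a random walk, ARIMA(0,1,0)
w = np.diff(z)                                # w_t = z_t - z_{t-1} is stationary
z_rebuilt = z[0] + np.concatenate(([0.0], np.cumsum(w)))
print(np.allclose(z_rebuilt, z))  # True
```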
Choice of the proper degree of differencing
Plot of the series
Examination of the SACF. A stationary AR(p) process requires that all $G_i$ in

$$(1 - \phi_1 B - \cdots - \phi_p B^p) = (1 - G_1 B)\cdots(1 - G_p B)$$

satisfy $|G_i| < 1$ ($i = 1, 2, \ldots, p$). Now assume one of them, say $G_1$, approaches 1; that is, $G_1 = 1 - \delta$, where $\delta$ is a small positive number. Then the autocorrelations

$$\rho_k = A_1 G_1^k + \cdots + A_p G_p^k \approx A_1 G_1^k.$$

So the exponential decay will be slow and almost linear [i.e., $A_1 G_1^k = A_1(1 - \delta)^k \approx A_1(1 - \delta k)$].
Consider the ARMA(1,1) model

$$(1 - \phi B) z_t = (1 - \theta B) a_t$$

The ACF of this process is given by

$$\rho_1 = \frac{(1 - \phi\theta)(\phi - \theta)}{1 + \theta^2 - 2\phi\theta}, \qquad \rho_k = \phi\rho_{k-1}.$$

If we let $\phi \to 1$, then $\rho_k \to 1$ for any finite $k > 0$. Again, the autocorrelations decay very slowly when $\phi$ is near 1.

It should be pointed out that the slow decay in the SACF can start at values of $r_1$ considerably smaller than 1.
Dangers of Overdifferencing

Although further differences of a stationary series will again be stationary, overdifferencing can lead to serious difficulties.

Effect of overdifferencing on the autocorrelation structure. Consider the stationary MA(1) process $z_t = (1 - \theta B) a_t$. The autocorrelation function of this process is given by

$$\rho_k = \begin{cases} \dfrac{-\theta}{1 + \theta^2}, & k = 1 \\[4pt] 0, & k > 1 \end{cases}$$

The first difference of the process is

$$(1 - B) z_t = \big(1 - (1 + \theta)B + \theta B^2\big) a_t$$

and the autocorrelations of this difference are given by

$$\rho_k = \begin{cases} \dfrac{-(1 + \theta)^2}{2(1 + \theta + \theta^2)}, & k = 1 \\[4pt] \dfrac{\theta}{2(1 + \theta + \theta^2)}, & k = 2 \\[4pt] 0, & k > 2 \end{cases}$$
Effect of overdifferencing on variance. Consider the same MA(1) process. Its variance is given by

$$\gamma_0(z) = (1 + \theta^2)\sigma^2$$

The variance of the overdifferenced series $w_t = (1 - B) z_t$, which follows an MA(2) process, is given by

$$\gamma_0(w) = 2(1 + \theta + \theta^2)\sigma^2$$

Hence

$$\gamma_0(w) - \gamma_0(z) = (1 + \theta)^2\sigma^2 > 0.$$
Consider the stationary AR(1) process $(1 - \phi B) z_t = a_t$ with variance

$$\gamma_0 = \frac{\sigma^2}{1 - \phi^2}$$

The first difference $w_t$ follows the ARMA(1,1) process $(1 - \phi B) w_t = (1 - B) a_t$. The variance of this process is given by

$$\gamma_0(w) = \frac{2\sigma^2(1 - \phi)}{1 - \phi^2}$$

and

$$\gamma_0(w) - \gamma_0(z) = \frac{(1 - 2\phi)\sigma^2}{1 - \phi^2}.$$

Hence, when $\phi < \tfrac{1}{2}$, overdifferencing will increase the variance.
Regression and ARIMA Models
ARIMA(0,1,1) model: $(1 - B) z_t = (1 - \theta B) a_t$

Since $\psi(B) = (1 - \theta B)/(1 - B) = 1 + (1 - \theta)B + (1 - \theta)B^2 + \cdots$, we can relate the variable $z_{t+l}$ at time $t + l$ to a fixed origin $t$ and express it as

$$z_{t+l} = \beta_0^{(t)} + e_t(l)$$

where $e_t(l) = a_{t+l} + (1 - \theta)[a_{t+l-1} + \cdots + a_{t+1}]$ is a function of the random shocks that enter the system after time $t$, and

$$\beta_0^{(t)} = (1 - \theta)\sum_{j=-\infty}^{t} a_j$$

depends on all the shocks (observations) up to and including time $t$.

The level $\beta_0^{(t)}$ can be updated according to

$$\beta_0^{(t+1)} = \beta_0^{(t)} + (1 - \theta) a_{t+1}$$

In the special case when $\theta = 1$, $e_t(l) = a_{t+l}$ and $\beta_0^{(t+1)} = \beta_0^{(t)} = \beta_0$. In this case, the model is equivalent to the constant mean model $z_{t+l} = \beta_0 + a_{t+l}$.
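The level-updating formula is exactly simple exponential smoothing with smoothing constant $1 - \theta$; a tiny sketch with made-up numbers:

```python
# Level updating for ARIMA(0,1,1): beta0 <- beta0 + (1 - theta) * a_{t+1},
# i.e. simple exponential smoothing with smoothing constant alpha = 1 - theta.
theta = 0.5
level = 10.0                 # current level estimate beta0^(t) (illustrative value)
z_next = 12.0                # newly observed z_{t+1} (illustrative value)
a_next = z_next - level      # one-step-ahead forecast error a_{t+1}
level = level + (1 - theta) * a_next
print(level)  # 11.0
```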
ARIMA(0,2,2) model: $(1 - B)^2 z_{t+l} = (1 - \theta_1 B - \theta_2 B^2) a_{t+l}$

Using an argument similar to that for the ARIMA(0,1,1) model, the ARIMA(0,2,2) can be written as

$$z_{t+l} = \beta_0^{(t)} + \beta_1^{(t)} l + e_t(l)$$

where the $e_t(l)$ are correlated and given by

$$e_t(l) = a_{t+l} + \psi_1 a_{t+l-1} + \cdots + \psi_{l-1} a_{t+1}$$

and the $\psi$ weights may be obtained from

$$(1 + \psi_1 B + \psi_2 B^2 + \cdots)(1 - B)^2 = 1 - \theta_1 B - \theta_2 B^2$$

These weights can be written as $\psi_j = (1 + \theta_2) + j(1 - \theta_1 - \theta_2)$, $j \geq 1$.
Furthermore, the trend parameters $\beta_0^{(t)}$ and $\beta_1^{(t)}$ adapt with time according to

$$\beta_0^{(t+1)} = \beta_0^{(t)} + \beta_1^{(t)} + (1 + \theta_2) a_{t+1}$$

$$\beta_1^{(t+1)} = \beta_1^{(t)} + (1 - \theta_1 - \theta_2) a_{t+1}$$

Again, the model above can be thought of as a generalization of the deterministic trend regression model

$$z_{t+l} = \beta_0 + \beta_1 l + a_{t+l}$$

In the special case when $\theta_1 \to 2$ and $\theta_2 \to -1$, the two models are equivalent.
4.4 Forecasting
ARIMA model: $\phi(B)(1 - B)^d z_t = \theta(B) a_t$

Autoregressive representation:

$$z_t = \pi_1 z_{t-1} + \pi_2 z_{t-2} + \cdots + a_t$$

where the $\pi$ weights are given by

$$1 - \pi_1 B - \pi_2 B^2 - \cdots = \frac{\phi(B)(1 - B)^d}{\theta(B)}$$

The $l$-step-ahead forecast is a linear combination of current and past observations,

$$\hat{z}_n(l) = \pi_0^* z_n + \pi_1^* z_{n-1} + \pi_2^* z_{n-2} + \cdots$$

or, equivalently, of current and past shocks,

$$\hat{z}_n(l) = \psi_0^* a_n + \psi_1^* a_{n-1} + \psi_2^* a_{n-2} + \cdots$$
Minimum mean square error (MMSE) forecasts: the mean square error $E[z_{n+l} - \hat{z}_n(l)]^2$ is minimized.

Using the $\psi$ weights in $\psi(B) = \theta(B)/[\phi(B)(1 - B)^d]$, write the model in its linear filter representation:

$$z_{n+l} = a_{n+l} + \psi_1 a_{n+l-1} + \cdots + \psi_{l-1} a_{n+1} + \psi_l a_n + \psi_{l+1} a_{n-1} + \cdots$$

Writing the forecast as $\hat{z}_n(l) = \psi_l^* a_n + \psi_{l+1}^* a_{n-1} + \cdots$, the mean square error is

$$E[z_{n+l} - \hat{z}_n(l)]^2 = E\big[a_{n+l} + \psi_1 a_{n+l-1} + \cdots + \psi_{l-1} a_{n+1} + (\psi_l - \psi_l^*) a_n + (\psi_{l+1} - \psi_{l+1}^*) a_{n-1} + \cdots\big]^2$$

$$= (1 + \psi_1^2 + \cdots + \psi_{l-1}^2)\sigma^2 + \sum_{j=0}^{\infty} (\psi_{l+j} - \psi_{l+j}^*)^2\sigma^2,$$

which is minimized by choosing $\psi_{l+j}^* = \psi_{l+j}$.
The minimum mean square error (MMSE) forecast is therefore

$$\hat{z}_n(l) = \psi_l a_n + \psi_{l+1} a_{n-1} + \cdots$$

The forecast error is given by

$$e_n(l) = z_{n+l} - \hat{z}_n(l) = a_{n+l} + \psi_1 a_{n+l-1} + \cdots + \psi_{l-1} a_{n+1}$$

and its variance by

$$\mathrm{Var}[e_n(l)] = \sigma^2(1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_{l-1}^2)$$

Since

$$E(a_{n+j}\,|\,z_n, z_{n-1}, \ldots) = \begin{cases} a_{n+j}, & j \leq 0 \\ 0, & j > 0 \end{cases}$$

we have

$$E(z_{n+l}\,|\,z_n, z_{n-1}, \ldots) = E[(a_{n+l} + \psi_1 a_{n+l-1} + \cdots + \psi_{l-1} a_{n+1} + \psi_l a_n + \psi_{l+1} a_{n-1} + \cdots)\,|\,z_n, z_{n-1}, \ldots] = \psi_l a_n + \psi_{l+1} a_{n-1} + \cdots$$

That is, the MMSE forecast is the conditional expectation of $z_{n+l}$ given the data.
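The forecast error variance formula can be sketched directly from the $\psi$ weights; with the Yield-data AR(1) estimates (used later in this chapter) it reproduces the quoted variances:

```python
def forecast_error_var(psi, l, sigma2):
    """Var[e_n(l)] = sigma^2 * (1 + psi_1^2 + ... + psi_{l-1}^2); psi[j] holds psi_j."""
    return sigma2 * (1.0 + sum(psi[j] ** 2 for j in range(1, l)))

# AR(1): psi_j = phi**j, so Var[e_n(l)] = sigma^2 (1 - phi**(2l)) / (1 - phi**2).
# With phi = 0.85 and sigma^2 = 0.024 this matches 0.024, 0.041, 0.054 (up to rounding).
phi, sigma2 = 0.85, 0.024
psi = [phi ** j for j in range(10)]
v = [forecast_error_var(psi, l, sigma2) for l in (1, 2, 3)]
print([round(x, 3) for x in v])  # [0.024, 0.041, 0.054]
```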
Examples:
AR(1) process: $z_t - \mu = \phi(z_{t-1} - \mu) + a_t$

AR(2) process: $z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t$

ARIMA(0,1,1) process: $z_t = z_{t-1} + a_t - \theta a_{t-1}$

ARIMA(1,1,1) process: $(1 - \phi B)(1 - B) z_t = \theta_0 + (1 - \theta B) a_t$, or $z_t = \theta_0 + (1 + \phi) z_{t-1} - \phi z_{t-2} + a_t - \theta a_{t-1}$

ARIMA(0,2,2) process: $(1 - B)^2 z_t = (1 - \theta_1 B - \theta_2 B^2) a_t$, or $z_t = 2z_{t-1} - z_{t-2} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2}$
Prediction Limits
It is usually not sufficient to calculate point forecasts alone. The uncertainty associated with a forecast should be expressed, and prediction limits should always be computed.

It is possible, but not very easy, to calculate a simultaneous prediction region for a set of future observations. Therefore, we give probability limits only for individual forecasts.

Assuming that the $a_t$ (and hence the $z$'s) are normally distributed, we can calculate $100(1 - \alpha)$ percent prediction limits from

$$\hat{z}_n(l) \pm u_{\alpha/2}\{\mathrm{Var}[e_n(l)]\}^{1/2}$$

where $u_{\alpha/2}$ is the $100(1 - \alpha/2)$ percentage point of the standard normal distribution.
Because the parameters in $\hat{z}_n(l)$ and $\mathrm{Var}[e_n(l)]$ are not known exactly, they need to be estimated. Hence the estimated prediction limits are given by

$$\hat{z}_n(l) \pm u_{\alpha/2}\{\hat{V}[e_n(l)]\}^{1/2}$$

Consider the AR(1) process for the Yield data. We found $\hat{z}_{156}(1) = 0.56$, $\hat{z}_{156}(2) = 0.62$, $\hat{z}_{156}(3) = 0.68$, $\hat{V}[e_{156}(1)] = 0.024$, $\hat{V}[e_{156}(2)] = 0.041$, $\hat{V}[e_{156}(3)] = 0.054$. Thus the 95 percent prediction limits for $z_{156+l}$, $l = 1, 2, 3$, are given by

$$z_{157}:\ \hat{z}_{156}(1) \pm 1.96\{\hat{V}[e_{156}(1)]\}^{1/2} = .56 \pm 1.96(.15) = .56 \pm .29$$

$$z_{158}:\ \hat{z}_{156}(2) \pm 1.96\{\hat{V}[e_{156}(2)]\}^{1/2} = .62 \pm 1.96(.20) = .62 \pm .39$$

$$z_{159}:\ \hat{z}_{156}(3) \pm 1.96\{\hat{V}[e_{156}(3)]\}^{1/2} = .68 \pm 1.96(.23) = .68 \pm .45$$
Forecast Updating
Suppose we are at time $n$ and predicting $l + 1$ steps ahead. The forecast is given by

$$\hat{z}_n(l+1) = \psi_{l+1} a_n + \psi_{l+2} a_{n-1} + \cdots$$

After $z_{n+1}$ has become available, we can update our prediction of $z_{n+l+1}$. Since $\hat{z}_{n+1}(l)$ can be written as

$$\hat{z}_{n+1}(l) = \psi_l a_{n+1} + \psi_{l+1} a_n + \psi_{l+2} a_{n-1} + \cdots$$

we find that

$$\hat{z}_{n+1}(l) = \hat{z}_n(l+1) + \psi_l a_{n+1} = \hat{z}_n(l+1) + \psi_l[z_{n+1} - \hat{z}_n(1)]$$

The updated forecast is a linear combination of the previous forecast of $z_{n+l+1}$ made at time $n$ and the most recent one-step-ahead forecast error

$$e_n(1) = z_{n+1} - \hat{z}_n(1) = a_{n+1}.$$
Consider the Yield data example again. We already obtained forecasts of $z_{157}, z_{158}, z_{159}$ from time origin $n = 156$. Suppose now that $z_{157} = 0.74$ is observed, and we need to update the forecasts for $z_{158}$ and $z_{159}$.

Since the model under consideration is AR(1) with parameter estimate $\hat{\phi} = 0.85$, the estimated $\psi$ weights are given by $\hat{\psi}_j = \hat{\phi}^j = 0.85^j$.

Then the updated forecasts are given by

$$\hat{z}_{158}:\ \hat{z}_{156}(2) + \hat{\psi}_1[z_{157} - \hat{z}_{156}(1)] = 0.62 + 0.85(0.74 - 0.56) = 0.77$$

$$\hat{z}_{159}:\ \hat{z}_{156}(3) + \hat{\psi}_2[z_{157} - \hat{z}_{156}(1)] = 0.68 + 0.85^2(0.74 - 0.56) = 0.81$$

The updated 95 percent prediction limits for $z_{158}$ and $z_{159}$ become

$$z_{158}:\ \hat{z}_{157}(1) \pm 1.96\{\hat{V}[e_{157}(1)]\}^{1/2} = .77 \pm .29$$

$$z_{159}:\ \hat{z}_{157}(2) \pm 1.96\{\hat{V}[e_{157}(2)]\}^{1/2} = .81 \pm .39$$
4.5 Model Specification
Step 1. Variance-stabilizing transformations. If the variance of the series changes with the level, then a logarithmic transformation will often stabilize the variance. If the logarithmic transformation still does not stabilize the variance, a more general approach employing the class of power transformations can be adopted.

Step 2. Degree of differencing. If the series, or its appropriate transformation, is not mean stationary, then the proper degree of differencing has to be determined. For the series and its differences, we can examine:

1. Plots of the time series
2. Plots of the SACF
3. Sample variances of the successive differences
4.6 Model Estimation

After specifying the form of the model, it is necessary to estimate its parameters. Let $(z_1, z_2, \ldots, z_N)$ represent the vector of the original observations and $w = (w_1, \ldots, w_n)'$ the vector of the $n = N - d$ stationary differences.

Maximum Likelihood Estimates

The ARIMA model can be written as

$$a_t = \theta_1 a_{t-1} + \cdots + \theta_q a_{t-q} + w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p}$$

The joint probability density function of $a = (a_1, a_2, \ldots, a_n)'$ is given by

$$p(a\,|\,\phi, \theta, \sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{n} a_t^2\right)$$
Exact likelihood function for three important models
1. AR(1): $a_t = w_t - \phi w_{t-1}$

$$L(\phi, \sigma^2\,|\,w) = (2\pi\sigma^2)^{-n/2}(1 - \phi^2)^{1/2}\exp\left\{-\frac{1}{2\sigma^2}\left[(1 - \phi^2)w_1^2 + \sum_{t=2}^{n}(w_t - \phi w_{t-1})^2\right]\right\}$$

2. MA(1): $a_t = \theta a_{t-1} + w_t$

$$L(\theta, \sigma^2\,|\,w) = (2\pi\sigma^2)^{-n/2}\left[\frac{1 - \theta^2}{1 - \theta^{2(n+1)}}\right]^{1/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{t=0}^{n} E^2(a_t\,|\,w)\right\}$$

3. ARMA(1,1): $a_t = \theta a_{t-1} + w_t - \phi w_{t-1}$

$$L(\phi, \theta, \sigma^2\,|\,w) = (2\pi\sigma^2)^{-n/2}|Z'Z|^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}S(\phi, \theta)\right\}$$

where

$$|Z'Z| = \frac{(1 - \phi^2)(1 - \theta^2) + (1 - \theta^{2n})(\phi - \theta)^2}{(1 - \phi^2)(1 - \theta^2)}$$

and

$$S(\phi, \theta) = E^2(a_0\,|\,w) + \frac{1 - \phi^2}{(\phi - \theta)^2}\left[E(w_0\,|\,w) - E(a_0\,|\,w)\right]^2 + \sum_{t=1}^{n} E^2(a_t\,|\,w)$$
Maximum likelihood estimates (MLEs) of the parameters $(\phi, \theta, \sigma^2)$ of the ARIMA model can be obtained by maximizing the likelihood function. In general, closed-form solutions cannot be found; however, various algorithms are available to compute the MLEs or close approximations.

1. AR(1): It is difficult to obtain a closed-form solution, and iterative methods have to be employed.

2. MA(1): We need to calculate $E(a_t\,|\,w)$, $t = 1, 2, \ldots, n$. These can be calculated using the recurrence relation

$$E(a_t\,|\,w) = \theta E(a_{t-1}\,|\,w) + w_t, \qquad t = 1, 2, \ldots, n$$

provided $E(a_0\,|\,w)$ is available. The difficulties of obtaining the MLE are thus twofold: (1) calculating $E(a_0\,|\,w)$, and (2) the likelihood function is a complicated function of $\theta$.

3. ARMA(1,1): Similar to MA(1); to calculate the likelihood function, we need $E(a_0\,|\,w)$ and $E(w_0\,|\,w)$. From these we can calculate

$$E(a_t\,|\,w) = \theta E(a_{t-1}\,|\,w) + w_t - \phi E(w_{t-1}\,|\,w)$$
In summary, there are two major difficulties with the maximum likelihood estimation approach.

1. The presence of the function $g_1(\phi, \theta, \sigma^2)$ (the determinant factor) in the likelihood function makes the maximization difficult.

2. This approach requires the calculation of the conditional expectations $E(u\,|\,w)$, where $u = (u_{1-p-q}, \ldots, u_{-1}, u_0)'$. This in turn requires the calculation of

$$E(a_t\,|\,w),\ t = 1 - q, \ldots, -1, 0 \qquad \text{and} \qquad E(w_t\,|\,w),\ t = 1 - p, \ldots, -1, 0$$
Evaluation of the Initial Expectations

There are two methods available for the evaluation of $E(u_t\,|\,w)$ $(t \leq 0)$: the method of least squares and backforecasting.

Method of least squares. For the ARMA(1,1) model, it is true in general that $E(u\,|\,w) = \hat{u}$, where $\hat{u}$ is the least squares estimator of $u$ in

$$Lw = Zu + e$$

where $L$ and $Z$ are $(n + p + q) \times n$ and $(n + p + q) \times (p + q)$ matrices involving the parameters $\phi$ and $\theta$ (for details see Appendix 5 in the textbook).

For given values of $(\phi, \theta)$ we can calculate $\hat{u}$, and hence $S(\phi, \theta)$. Then we can search for the minimum value of $S(\phi, \theta)$ in the $(\phi, \theta)$ space.
Method of backforecasting. The MMSE forecast of $z_{n+l}$ made at time $n$ is given by the conditional expectation $E(z_{n+l}\,|\,z_n, z_{n-1}, \ldots)$. This result may be used to compute the unknown quantities $E(a_t\,|\,w)$ $(t = 1 - q, \ldots, -1, 0)$ and $E(w_t\,|\,w)$ $(t = 1 - p, \ldots, -1, 0)$.

The MMSE forecasts of $a_t$ and $w_t$ $(t \leq 0)$ can be found by considering the backward model

$$e_t = \theta_1 e_{t+1} + \theta_2 e_{t+2} + \cdots + \theta_q e_{t+q} + w_t - \phi_1 w_{t+1} - \cdots - \phi_p w_{t+p}.$$

The ACF of this backward model is the same as that of the original model. Furthermore, $e_t$ is a white-noise sequence with $\mathrm{Var}(e_t) = \sigma^2$.

This method of evaluating $E(a_t\,|\,w)$ and $E(w_t\,|\,w)$ is called backforecasting (or backcasting).
The method of backforecasting for a simple MA(1) process $a_t = \theta a_{t-1} + w_t$: this process can be expressed in the backward form as

$$e_t = \theta e_{t+1} + w_t$$

When the process runs backwards, $e_0, e_{-1}, \ldots$ are random shocks that are future to the observations $w_n, w_{n-1}, \ldots, w_1$. Hence

$$E(e_t\,|\,w) = 0, \qquad t \leq 0$$

Furthermore, since the MA(1) process has a memory of only one period, the correlation between $a_{-1}$ and $w_1, w_2, \ldots, w_n$ is zero, and

$$E(a_t\,|\,w) = 0, \qquad t \leq -1$$
With the value of $E(w_0\,|\,w)$, we can compute $E(a_t\,|\,w)$ $(t = 0, 1, 2, \ldots, n)$ as before:

$$E(a_0\,|\,w) = \theta E(a_{-1}\,|\,w) + E(w_0\,|\,w) = E(w_0\,|\,w)$$

$$E(a_1\,|\,w) = \theta E(a_0\,|\,w) + w_1$$

$$\vdots$$

$$E(a_n\,|\,w) = \theta E(a_{n-1}\,|\,w) + w_n$$

The sum of squares $S(\theta)$ for any $\theta$ can be obtained from

$$S(\theta) = \sum_{t=0}^{n} E^2(a_t\,|\,w),$$

and its minimum can be found.
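A sketch of this backforecasting scheme for the MA(1) model (Python; starting the backward recursion at $E(e_{n+1}\,|\,w) = 0$ is the usual truncation approximation, adequate for $|\theta| < 1$, and the grid search is one simple way to locate the minimum of $S(\theta)$):

```python
import numpy as np

def ma1_S(theta, w):
    """S(theta) = sum_{t=0}^n E^2(a_t|w) for a_t = theta*a_{t-1} + w_t,
    with E(w_0|w) obtained via the backward model e_t = theta*e_{t+1} + w_t."""
    e = 0.0
    for wt in reversed(w):              # backward pass; ends with (approx.) E(e_1|w)
        e = theta * e + wt
    a = -theta * e                      # E(a_0|w) = E(w_0|w); w_0 = e_0 - theta*e_1, E(e_0|w) = 0
    s = a * a
    for wt in w:                        # forward pass: E(a_t|w) = theta*E(a_{t-1}|w) + w_t
        a = theta * a + wt
        s += a * a
    return s

# Simulate w_t = a_t - 0.5*a_{t-1} and locate the grid minimum of S(theta).
rng = np.random.default_rng(3)
a = rng.normal(size=501)
w = a[1:] - 0.5 * a[:-1]
grid = np.linspace(-0.9, 0.9, 181)
best = min(grid, key=lambda th: ma1_S(th, w))
print(round(best, 2))  # typically close to the true value 0.5
```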
Conditional Least Squares Estimates
Computationally simpler estimates can be obtained by minimizing the conditional sum of squares

$$S_c(\phi, \theta) = \sum_{t=p+1}^{n} a_t^2,$$

where the starting values $a_p, a_{p-1}, \ldots, a_{p+1-q}$ in the calculation of the appropriate $a_t$'s are set equal to zero. The resulting estimates are called conditional least squares estimates (CLSEs).
In the AR(p) case, the CLSE is obtained by minimizing

$$S_c(\phi) = \sum_{t=p+1}^{n} a_t^2 = \sum_{t=p+1}^{n} (w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p})^2$$

which leads to the least squares estimate

$$\hat{\phi}_c = (X'X)^{-1}X'y$$

where $y = (w_{p+1}, w_{p+2}, \ldots, w_n)'$ and

$$X = \begin{pmatrix} w_p & w_{p-1} & \cdots & w_1 \\ w_{p+1} & w_p & \cdots & w_2 \\ \vdots & \vdots & & \vdots \\ w_{n-1} & w_{n-2} & \cdots & w_{n-p} \end{pmatrix}$$

In the special case of the AR(1) process, this reduces to $\hat{\phi}_c = \sum_{t=2}^{n} w_t w_{t-1}\big/\sum_{t=2}^{n} w_{t-1}^2$.
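The CLSE regression above can be sketched in a few lines (Python, numpy assumed; the simulated series is illustrative):

```python
import numpy as np

def ar_clse(w, p):
    """Conditional LS estimate phi_hat = (X'X)^{-1} X'y for an AR(p) in w."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    y = w[p:]                                            # (w_{p+1}, ..., w_n)'
    X = np.column_stack([w[p - j:n - j] for j in range(1, p + 1)])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Special case AR(1): phi_hat = sum_t w_t w_{t-1} / sum_t w_{t-1}^2
rng = np.random.default_rng(4)
w = np.zeros(1000)
for t in range(1, 1000):
    w[t] = 0.7 * w[t - 1] + rng.normal()
phi_hat = ar_clse(w, 1)[0]
direct = np.sum(w[1:] * w[:-1]) / np.sum(w[:-1] ** 2)
print(np.isclose(phi_hat, direct))  # True
```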
Nonlinear Estimation
If the model contains moving average terms, then the conditional and unconditional sums of squares are not quadratic functions of the parameters $(\phi, \theta)$. This is because $E(u_t\,|\,w)$ in $S(\phi, \theta)$ is nonlinear in the parameters. Hence nonlinear least squares procedures must be used to minimize $S(\phi, \theta)$ or $S_c(\phi, \theta)$.

We now illustrate the nonlinear estimation procedure and derive the unconditional least squares estimates of the parameters in the ARMA(1,1) model

$$a_t = \theta a_{t-1} + w_t - \phi w_{t-1}$$
Step 1. Computation at the starting point. For given $w$ and a starting point $(\phi, \theta) = (\phi_0, \theta_0)$, using backforecasting or the method of least squares, we can get

$$u_{-1,0} = E(a_0\,|\,w, \phi_0, \theta_0) \quad\text{and}\quad u_{0,0} = \frac{(1 - \phi_0^2)^{1/2}}{\phi_0 - \theta_0}\left[E(w_0\,|\,w, \phi_0, \theta_0) - E(a_0\,|\,w, \phi_0, \theta_0)\right]$$

The remaining elements in $u_0 = (u_{-1,0}, u_{0,0}, u_{1,0}, \ldots, u_{n,0})'$ can be obtained from $u_{t,0} = E(a_t\,|\,w, \phi_0, \theta_0)$ for $t = 1, 2, \ldots, n$.

Step 2. Linearization. Regard $u_t = E(a_t\,|\,w, \phi, \theta)$ $(t = -1, 0, 1, \ldots, n)$ as a function of $\beta = (\phi, \theta)$ and linearize this function at the starting point $\beta_0 = (\phi_0, \theta_0)$ in a first-order Taylor series expansion:

$$u_t = u_{t,0} + (\phi - \phi_0)\left.\frac{\partial u_t}{\partial \phi}\right|_{\beta = \beta_0} + (\theta - \theta_0)\left.\frac{\partial u_t}{\partial \theta}\right|_{\beta = \beta_0} = u_{t,0} - x_{1,t}(\phi - \phi_0) - x_{2,t}(\theta - \theta_0)$$

where

$$x_{1,t} = -\left.\frac{\partial u_t}{\partial \phi}\right|_{\beta = \beta_0} \quad\text{and}\quad x_{2,t} = -\left.\frac{\partial u_t}{\partial \theta}\right|_{\beta = \beta_0}$$
So we have

$$u_{t,0} = (x_{1,t},\ x_{2,t})\begin{pmatrix} \phi - \phi_0 \\ \theta - \theta_0 \end{pmatrix} + u_t$$

Considering this equation for $t = -1, 0, \ldots, n$, we obtain

$$u_0 = X(\beta - \beta_0) + u$$

where $u = (u_{-1}, u_0, \ldots, u_n)'$ and

$$X = \begin{pmatrix} x_{1,-1} & x_{2,-1} \\ x_{1,0} & x_{2,0} \\ \vdots & \vdots \\ x_{1,n} & x_{2,n} \end{pmatrix}$$

The least squares estimate of $\delta = \beta - \beta_0$ may be obtained from $\hat{\delta} = (X'X)^{-1}X'u_0$. Hence we obtain a modified estimate of $\beta$ as

$$\beta_1 = \beta_0 + \hat{\delta}$$
Step 3. Repeat Steps 1 and 2 with $\beta_1$ as the new point of expansion in the Taylor series, and continue the process until convergence occurs. Convergence is usually determined by one of the following criteria:

1. The relative change in the sum of squares $\sum_t u_t^2$ from two consecutive iterations is less than a prespecified small value.

2. The largest relative change in the parameter estimates of two consecutive iterations is less than a prespecified small value.

When convergence occurs, we have an estimate based on local linear least squares, and this is taken as the unconditional least squares estimate of $\beta$. The conditional least squares estimate can be obtained in a similar fashion by replacing the vector $u$ by $a = (a_2, a_3, \ldots, a_n)'$ and setting $a_1 = 0$.
For small $k$, the true standard error can be much smaller. In general, the true standard errors depend on:

1. the form of the fitted model,
2. the true parameter values,
3. the value of $k$.
Portmanteau Test
Under the null hypothesis of model adequacy, the large-sample distribution of the residual autocorrelations $r_a$ is multivariate normal, and

$$Q = n\sum_{k=1}^{K} r_a^2(k)$$

has a large-sample chi-square distribution with $K - p - q$ degrees of freedom.

For small sample sizes this test is quite conservative; that is, the chance of incorrectly rejecting the null hypothesis of model adequacy is smaller than the chosen significance level. The modified test statistic is

$$Q^* = n(n+2)\sum_{k=1}^{K} \frac{r_a^2(k)}{n - k}$$
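The modified statistic is easy to compute from the residuals; a sketch (Python, numpy assumed, white-noise residuals simulated for illustration):

```python
import numpy as np

def ljung_box(residuals, K, p, q):
    """Q* = n(n+2) * sum_{k=1}^K r_a^2(k)/(n-k), with df = K - p - q."""
    a = np.asarray(residuals, dtype=float)
    n = len(a)
    d = a - a.mean()
    denom = np.sum(d ** 2)
    q_stat = sum(np.sum(d[k:] * d[:n - k]) ** 2 / denom ** 2 / (n - k)
                 for k in range(1, K + 1))
    return n * (n + 2) * q_stat, K - p - q

rng = np.random.default_rng(5)
q_stat, df = ljung_box(rng.normal(size=300), K=24, p=1, q=1)
print(df)  # 22
# Under adequacy Q* is roughly chi-square with df degrees of freedom,
# so values near df are unexceptional; values far above it signal misfit.
```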
4.8 Examples
Yield data. This series consists of monthly differences between the yield on mortgages and the yield on government loans in the Netherlands from January 1961 to December 1973. The observations for January to March 1974 are also available, but they are held back to illustrate forecasting and updating.

Figures: time series plot and SACF.

AR(1) model: $z_t - \mu = \phi(z_{t-1} - \mu) + a_t$

The maximum likelihood estimates (standard errors in parentheses) are

$$\hat{\mu} = 0.97\,(0.08), \qquad \hat{\phi} = 0.85\,(0.04), \qquad \hat{\sigma}^2 = 0.024$$
Forecasting

$$\hat{z}_{156}(1) = \hat{\mu} + \hat{\phi}(z_{156} - \hat{\mu}) = .97 + .85(.49 - .97) = .56$$

Prediction limits:

$$\hat{z}_{156}(1) \pm 1.96\hat{\sigma} = 0.56 \pm 1.96(.024)^{1/2} = 0.56 \pm 0.29$$

$$\hat{z}_{156}(2) = \hat{\mu} + \hat{\phi}^2(z_{156} - \hat{\mu}) = .97 + .85^2(.49 - .97) = .62$$

Prediction limits:

$$\hat{z}_{156}(2) \pm 1.96\hat{\sigma}(1 + \hat{\psi}_1^2)^{1/2} = 0.62 \pm 1.96(.024)^{1/2}(1 + .85^2)^{1/2} = 0.62 \pm 0.39$$

$$\hat{z}_{156}(3) = \hat{\mu} + \hat{\phi}^3(z_{156} - \hat{\mu}) = .97 + .85^3(.49 - .97) = .68$$

Prediction limits:

$$\hat{z}_{156}(3) \pm 1.96\hat{\sigma}(1 + \hat{\psi}_1^2 + \hat{\psi}_2^2)^{1/2} = 0.68 \pm 1.96(.024)^{1/2}(1 + .85^2 + .85^4)^{1/2} = 0.68 \pm 0.45$$
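These computations can be replicated directly (a Python sketch using the estimates above; the half-widths come out as .30/.40/.45 here rather than .29/.39/.45 because the slides round $\hat{\sigma}$ to .15/.20/.23 before multiplying by 1.96):

```python
mu, phi, sigma2 = 0.97, 0.85, 0.024       # AR(1) estimates for the Yield data
z_n = 0.49                                # z_156
rows = []
for l in (1, 2, 3):
    zhat = mu + phi ** l * (z_n - mu)                       # point forecast
    var_e = sigma2 * sum(phi ** (2 * j) for j in range(l))  # Var[e_n(l)]
    rows.append((round(zhat, 2), round(1.96 * var_e ** 0.5, 2)))
print(rows)  # [(0.56, 0.3), (0.62, 0.4), (0.68, 0.45)]
```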
Suppose $z_{157} = 0.74$ has now become available. The forecasts for $z_{158}$ and $z_{159}$ are updated according to

$$\hat{z}_{158}:\ \hat{z}_{157}(1) = \hat{z}_{156}(2) + \hat{\psi}_1[z_{157} - \hat{z}_{156}(1)] = 0.62 + .85(.74 - .56) = .77$$

$$\hat{z}_{159}:\ \hat{z}_{157}(2) = \hat{z}_{156}(3) + \hat{\psi}_2[z_{157} - \hat{z}_{156}(1)] = 0.68 + .85^2(.74 - .56) = .81$$

The prediction interval for $z_{158}$ is $0.77 \pm 0.29$, and that for $z_{159}$ is $0.81 \pm 0.39$.
Growth Rates
Quarterly growth rates of Iowa nonfarm income from the second quarter of 1948 to the fourth quarter of 1978 ($N = 123$) are shown in Figure 5.8a.

The plot of the series indicates that no variance-stabilizing transformation is necessary.

$z_t$ is nonstationary, and the first difference $\nabla z_t$ should be analyzed. The second difference is also stationary but would amount to overdifferencing. This is also indicated by the increase in the sample variance $[s^2(\nabla z_t) = 1.58,\ s^2(\nabla^2 z_t) = 4.82]$.

The SACF of $\nabla z_t$ in Figure 5.11b indicates a cutoff after lag 1. The SPACF in Figure 5.15 shows a rough exponential decay. These patterns lead us to consider the model

$$\nabla z_t = \theta_0 + (1 - \theta B) a_t$$
Forecasting

The predictions $\hat{z}_{123}(\ell)$ are described below:

$$\hat{z}_{123}(\ell) = \hat{z}_{123}(1) = z_{123} - \hat{\theta}\hat{a}_{123} = 3.38 - .88(.718) = 2.75$$

Note that $\hat{a}_{123} = .718$ is the last residual.

The estimated variance of the $\ell$-step-ahead forecast error is

$$\hat{V}[e_{123}(\ell)] = [1 + (\ell - 1)(1 - \hat{\theta})^2]\hat{\sigma}^2 = [1 + (\ell - 1)(1 - 0.88)^2](.95)$$

After a new observation $z_{124} = 1.55$ has become available, we update the forecast:

$$\hat{z}_{124}(\ell) = \hat{z}_{123}(\ell + 1) + (1 - \hat{\theta})[z_{124} - \hat{z}_{123}(1)] = 2.75 + (1 - 0.88)(1.55 - 2.75) = 2.61$$

Furthermore, after $z_{125} = 2.93$ has become available,

$$\hat{z}_{125}(\ell) = \hat{z}_{124}(\ell + 1) + (1 - \hat{\theta})[z_{125} - \hat{z}_{124}(1)] = 2.61 + (1 - 0.88)(2.93 - 2.61) = 2.65$$
Demand for Repair Parts. This series consists of the monthly demand for repair parts for a large Midwestern production company from January 1972 to June 1979 ($N = 90$). The series is shown in Figure 5.9a.

A logarithmic transformation is needed to stabilize the variance, and the first differences $\nabla \ln z_t$ are stationary. Further differencing is not necessary, since it results in an increase in the sample variance $[s^2(\nabla \ln z_t) = .034,\ s^2(\nabla^2 \ln z_t) = .099]$.

The sample autocorrelations and partial autocorrelations are shown in Figures 5.12b and 5.17. They indicate a cutoff in the SACF after lag 1 and a rough exponential decay in the SPACF. Thus we consider the ARIMA(0,1,1) model

$$(1 - B) y_t = (1 - \theta B) a_t$$

where $y_t = \ln z_t$. The ML estimates are

$$\hat{\theta} = 0.57\,(.09), \qquad \hat{\sigma}^2 = 0.027$$
Forecasting
The forecasting model is given in terms of the transformed series $y_t = \ln z_t$. The MMSE forecasts $\hat{y}_n(\ell)$ and the associated prediction interval $(c_1, c_2)$ for $y_{n+\ell}$ can be derived as usual.

However, to obtain forecasts of the original observations $z_{n+\ell} = \exp(y_{n+\ell})$, we have to back-transform the forecasts and the prediction interval. They are given by

$$\hat{z}_n(\ell) = \exp[\hat{y}_n(\ell)] \qquad \text{and} \qquad [\exp(c_1),\ \exp(c_2)]$$

The forecasts of the transformed series $y_{n+\ell}$ are MMSE forecasts. However, if the transformation is nonlinear, as in the case of the logarithmic transformation, the MMSE property is not preserved for $\hat{z}_n(\ell) = \exp[\hat{y}_n(\ell)]$.

Nevertheless, the forecast $\hat{z}_n(\ell)$ is the median of the predictive distribution of $z_{n+\ell}$.
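A small sketch of the back-transformation (Python; the numbers come from this example, and the lognormal mean correction $\exp(s^2/2)$ is a standard fact not stated on the slides):

```python
import math

# Back-transforming a log-scale forecast (illustrative numbers from this example).
y_hat = 7.731                      # MMSE forecast of y_{n+1} = ln z_{n+1}
half = 1.96 * math.sqrt(0.027)     # half-width of the 95% interval for l = 1
c1, c2 = y_hat - half, y_hat + half
z_hat = math.exp(y_hat)            # median of the predictive distribution of z
interval = (math.exp(c1), math.exp(c2))
# If the forecast error is N(0, s2) on the log scale, the predictive *mean* of z
# is exp(y_hat + s2/2), which always exceeds the median exp(y_hat).
mean_z = math.exp(y_hat + 0.027 / 2)
print(mean_z > z_hat, interval[0] < z_hat < interval[1])  # True True
```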
For the present example, the prediction $\hat{y}_{90}(\ell)$ and the corresponding 95 percent prediction limits are given by

$$\hat{y}_{90}(\ell) = \hat{y}_{90}(1) = y_{90} - \hat{\theta}\hat{a}_{90} = 7.864 - .57(.234) = 7.731$$

and

$$\hat{y}_{90}(\ell) \pm 1.96[1 + (\ell - 1)(1 - \hat{\theta})^2]^{1/2}\hat{\sigma} = 7.731 \pm 1.96[1 + (\ell - 1)(1 - .57)^2]^{1/2}(.027)^{1/2}.$$

Forecasts and 95 percent prediction intervals for $y_{91}, y_{92}$ and $z_{91}, z_{92}$ are given in Table 5.6.

The prediction intervals are quite large. This is due partly to the large uncertainty ($\hat{\sigma}^2$) in the past data and partly to the unexplained correlation at lag 12.