Chapter 3: Box-Jenkins Seasonal Modelling
3.1 Stationarity Transformation
A “pre-differencing transformation” is often used to stabilize the seasonal variation of the time series. A common transformation is of the form:
$y_t^* = \ln(y_t)$
• “Differencing transformation”:
1) $Z_t = y_t^* - y_{t-1}^*$ (first non-seasonal difference)
2) $Z_t = y_t^* - y_{t-L}^*$ (first seasonal difference, where L is the number of seasons in a year)
3) $Z_t = y_t^* - y_{t-1}^* - y_{t-L}^* + y_{t-L-1}^*$ (first seasonal and first non-seasonal difference)
Of course, one can also obtain second and higher order differences by simply applying the same rule.
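The transformations above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the artificial series are hypothetical and do not come from the source. It takes natural logs, applies seasonal and non-seasonal differences, and shows how many observations are lost.

```python
import math

def difference(series, lag=1):
    """Return z[t] = series[t] - series[t - lag]; the first `lag` values are lost."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def stationarity_transform(y, d=0, D=0, L=12, take_log=True):
    """Apply y* = ln(y) (optional), then D seasonal and d non-seasonal differences."""
    z = [math.log(v) for v in y] if take_log else list(y)
    for _ in range(D):
        z = difference(z, L)   # seasonal difference at lag L
    for _ in range(d):
        z = difference(z, 1)   # non-seasonal difference at lag 1
    return z

# Hypothetical monthly series: growth trend times a period-12 seasonal pattern
y = [100.0 * (1.01 ** t) * (1.0 + 0.2 * math.sin(2 * math.pi * t / 12))
     for t in range(48)]

z = stationarity_transform(y, d=1, D=1, L=12)
print(len(y) - len(z))  # observations lost to differencing: d + D*L = 13
```

With d = D = 1 and L = 12, thirteen observations are lost, which agrees with the SAS note "The first 13 observations were eliminated by differencing" in the examples that follow.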
3.2 Autocorrelation and Partial Autocorrelation
To determine if the data are stationary, we examine the behaviour of the autocorrelation and partial autocorrelation of the series at both the seasonal and non-seasonal level.
The behaviour of the SAC and SPAC functions at lags 1 to L-3 is often considered as the behaviour of these functions at the non-seasonal level.
A spike (significant memory) is said to exist at a given lag if the absolute value of the corresponding SAC or SPAC is greater than twice its standard error.
The time series is considered to be stationary if the SAC of the series cuts off or dies down reasonably quickly at both the seasonal & non-seasonal levels.
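The spike rule can be sketched as follows. This is a minimal sketch under stated assumptions: the standard error of the SAC is taken from Bartlett's approximation, which the source does not spell out, and the function names are hypothetical. The input values are the lag 1 to 12 autocorrelations reported in the SAS output of Example 3.2 (differencing 1,12; 165 observations).

```python
import math

def sample_acf(z, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series z."""
    n = len(z)
    mean = sum(z) / n
    c0 = sum((v - mean) ** 2 for v in z) / n
    r = []
    for k in range(1, max_lag + 1):
        ck = sum((z[t] - mean) * (z[t + k] - mean) for t in range(n - k)) / n
        r.append(ck / c0)
    return r

def spikes(r, n):
    """Lags k with |r_k| > 2 * se(r_k), where (assumed Bartlett approximation)
    se(r_k) = sqrt((1 + 2 * sum_{j<k} r_j^2) / n)."""
    flagged = []
    cum = 0.0  # running sum of squared autocorrelations at earlier lags
    for k, rk in enumerate(r, start=1):
        se = math.sqrt((1.0 + 2.0 * cum) / n)
        if abs(rk) > 2.0 * se:
            flagged.append(k)
        cum += rk * rk
    return flagged

# SAC values at lags 1..12 copied from the SAS output of Example 3.2
r = [0.18377, -0.02746, -0.01038, -0.14050, -0.00218, 0.09669,
     0.02004, 0.09039, 0.09551, 0.00022, -0.07000, -0.27669]
print(spikes(r, 165))  # [1, 12]
```

Only lag 1 (non-seasonal level) and lag 12 (seasonal level) are flagged, which matches the reading of the SAC given later in the chapter.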
Example 3.1
• Figure 3.1 shows the monthly passenger totals ($y_t$), in thousands of passengers, from 1949 to 1959. The plot reveals a pattern of increasing seasonal variation.
• Figure 3.2 shows $y_t^* = \ln(y_t)$, which seems to have equalized the seasonal variation.
Figure 3.1: Monthly total international airline passengers (in thousands), 1949-1959. Vertical axis: no. of passengers (0 to 600); horizontal axis: date (Dec-48 to Dec-59).
Figure 3.2: Natural logarithms of monthly total international airline passengers, 1949-1959. Vertical axis: no. of passengers on the log scale (4.5 to 6.5); horizontal axis: date (Sep-48 to Sep-59).
• The following SAS output shows the SACs of $y_t^*$, of its first difference at the non-seasonal level, at the seasonal level, and at both the non-seasonal and seasonal levels.
• On the basis of the SACs, it appears that a first difference at the seasonal level alone, or at both the seasonal and non-seasonal levels, is necessary to ensure the stationarity of the data.
ARIMA Procedure
Name of variable = LY.
Mean of working series = 5.486478
Standard deviation = 0.414728
Number of observations = 132

Autocorrelations
Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
 0   0.171999    1.00000   | |********************|
 1   0.163124    0.94840   | . |******************* |
 2   0.152803    0.88839   | . |****************** |
 3   0.143954    0.83694   | . |***************** |
 4   0.136137    0.79150   | . |**************** |
 5   0.130741    0.76013   | . |*************** |
 6   0.126696    0.73661   | . |*************** |
 7   0.123230    0.71646   | . |************** |
 8   0.121237    0.70487   | . |************** |
 9   0.122719    0.71349   | . |************** |
10   0.124451    0.72355   | . |************** |
11   0.127306    0.74015   | . |*************** |
12   0.128377    0.74638   | . |*************** |
13   0.120171    0.69867   | . |************** |
14   0.110539    0.64267   | . |*************. |
15   0.102490    0.59587   | . |************ . |
16   0.094860    0.55151   | . |*********** . |
17   0.089022    0.51757   | . |********** . |
18   0.084737    0.49266   | . |********** . |
19   0.081216    0.47219   | . |********* . |
20   0.079499    0.46220   | . |********* . |
21   0.080921    0.47047   | . |********* . |
22   0.082292    0.47845   | . |********** . |
23   0.084129    0.48913   | . |********** . |
24   0.084738    0.49267   | . |********** . |
"." marks two standard errors
ARIMA Procedure
Name of variable = LY.
Period(s) of Differencing = 1.
Mean of working series = 0.009812
Standard deviation = 0.106038
Number of observations = 131
NOTE: The first observation was eliminated by differencing.

Autocorrelations
Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
 0   0.011244     1.00000   | |********************|
 1   0.0021211    0.18864   | . |**** |
 2   -0.0014190   -0.12620  | .***| . |
 3   -0.0017381   -0.15458  | .***| . |
 4   -0.0036763   -0.32696  | *******| . |
 5   -0.000749    -0.06661  | . *| . |
 6   0.00045338   0.04032   | . |* . |
 7   -0.0011063   -0.09839  | . **| . |
 8   -0.0038510   -0.34250  | *******| . |
 9   -0.0012310   -0.10948  | . **| . |
10   -0.0013408   -0.11925  | . **| . |
11   0.0022435    0.19953   | . |****. |
12   0.0093677    0.83312   | . |***************** |
13   0.0022267    0.19803   | . |**** . |
14   -0.0015966   -0.14200  | . ***| . |
15   -0.0012365   -0.10996  | . **| . |
16   -0.0032543   -0.28942  | ******| . |
17   -0.0005262   -0.04680  | . *| . |
18   0.00039747   0.03535   | . |* . |
19   -0.0011731   -0.10433  | . **| . |
20   -0.0035000   -0.31128  | .******| . |
21   -0.0012046   -0.10713  | . **| . |
22   -0.000954    -0.08485  | . **| . |
23   0.0020942    0.18625   | . |**** . |
24   0.0080211    0.71337   | . |************** |
"." marks two standard errors
ARIMA Procedure
Name of variable = LY.
Period(s) of Differencing = 12.
Mean of working series = 0.121282
Standard deviation = 0.063215
Number of observations = 120
NOTE: The first 12 observations were eliminated by differencing.

Autocorrelations
Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
 0   0.0039962    1.00000   | |********************|
 1   0.0029424    0.73631   | . |*************** |
 2   0.0025646    0.64176   | . |************* |
 3   0.0019980    0.49997   | . |********** |
 4   0.0018314    0.45830   | . |********* |
 5   0.0015802    0.39543   | . |******** |
 6   0.0013155    0.32920   | . |******* |
 7   0.0010092    0.25255   | . |***** . |
 8   0.00079972   0.20012   | . |**** . |
 9   0.00058932   0.14747   | . |*** . |
10   0.00003062   0.00766   | . | . |
11   -0.0004257   -0.10653  | . **| . |
12   -0.0009502   -0.23779  | . *****| . |
13   -0.0005842   -0.14618  | . ***| . |
14   -0.0005817   -0.14556  | . ***| . |
15   -0.0004511   -0.11287  | . **| . |
16   -0.0006197   -0.15507  | . ***| . |
17   -0.0004318   -0.10805  | . **| . |
18   -0.0005272   -0.13193  | . ***| . |
19   -0.0005622   -0.14069  | . ***| . |
20   -0.0006994   -0.17501  | . ****| . |
21   -0.0005544   -0.13872  | . ***| . |
22   -0.000448    -0.11211  | . **| . |
23   -0.0001579   -0.03950  | . *| . |
24   -0.0003788   -0.09480  | . **| . |
"." marks two standard errors
ARIMA Procedure
Name of variable = LY.
Period(s) of Differencing = 1,12.
Mean of working series = 0.001322
Standard deviation = 0.044889
Number of observations = 119
NOTE: The first 13 observations were eliminated by differencing.

Autocorrelations
Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
 0   0.0020150    1.00000   | |********************|
 1   -0.0006363   -0.31578  | ******| . |
 2   0.00021167   0.10505   | . |** . |
 3   -0.0004326   -0.21469  | ****| . |
 4   0.00009253   0.04592   | . |* . |
 5   0.00006174   0.03064   | . |* . |
 6   0.00008917   0.04425   | . |* . |
 7   -0.0001315   -0.06527  | . *| . |
 8   0.0000188    0.00933   | . | . |
 9   0.00032781   0.16268   | . |***. |
10   -0.0000966   -0.04794  | . *| . |
11   0.00014118   0.07007   | . |* . |
12   -0.0008188   -0.40633  | ********| . |
13   0.00031546   0.15655   | . |*** . |
14   -0.0000898   -0.04457  | . *| . |
15   0.00028601   0.14194   | . |*** . |
16   -0.000294    -0.14590  | . ***| . |
17   0.00018672   0.09266   | . |** . |
18   -0.0000634   -0.03145  | . *| . |
19   0.00010845   0.05382   | . |* . |
20   -0.0002755   -0.13673  | . ***| . |
21   0.00006769   0.03359   | . |* . |
22   -0.0001636   -0.08119  | . **| . |
23   0.00044341   0.22005   | . |****. |
24   -0.0000687   -0.03409  | . *| . |
"." marks two standard errors
Example 3.2
• Figure 3.3 shows the monthly values of the number of people ($X_t$) in Wisconsin employed in trade from 1961 to 1975. No pre-differencing transformation appears to be necessary.
Figure 3.3: Number of employees (in thousands), 1961-1975. Vertical axis: no. of employees (220 to 400); horizontal axis: date (Mar-60 to Aug-76).
• Next, let’s examine the SACs of $X_t$, of its first difference at the non-seasonal level, at the seasonal level, and at both the seasonal and non-seasonal levels.
ARIMA Procedure
Name of variable = X.
Mean of working series = 307.5584
Standard deviation = 46.62852
Number of observations = 178

Autocorrelations
Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
 0   2174.219    1.00000   | |********************|
 1   2111.301    0.97106   | . |******************* |
 2   2046.143    0.94109   | . |******************* |
 3   1990.467    0.91549   | . |****************** |
 4   1953.651    0.89855   | . |****************** |
 5   1923.082    0.88449   | . |****************** |
 6   1894.387    0.87130   | . |***************** |
 7   1857.165    0.85418   | . |***************** |
 8   1822.990    0.83846   | . |***************** |
 9   1795.368    0.82575   | . |***************** |
10   1781.604    0.81942   | . |**************** |
11   1766.588    0.81252   | . |**************** |
12   1754.960    0.80717   | . |**************** |
13   1689.253    0.77695   | . |**************** |
14   1622.604    0.74629   | . |*************** |
15   1565.605    0.72008   | . |************** |
16   1526.444    0.70207   | . |************** |
17   1493.548    0.68694   | . |**************. |
18   1462.579    0.67269   | . |************* . |
19   1424.437    0.65515   | . |************* . |
20   1390.875    0.63971   | . |************* . |
21   1363.633    0.62718   | . |************* . |
22   1347.737    0.61987   | . |************ . |
23   1328.662    0.61110   | . |************ . |
24   1312.463    0.60365   | . |************ . |
"." marks two standard errors
ARIMA Procedure
Name of variable = X.
Period(s) of Differencing = 1.
Mean of working series = 0.902825
Standard deviation = 7.210001
Number of observations = 177
NOTE: The first observation was eliminated by differencing.

Autocorrelations
Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
 0   51.984116    1.00000   | |********************|
 1   1.341360     0.02580   | . |* . |
 2   -10.104648   -0.19438  | ****| . |
 3   -16.397040   -0.31542  | ******| . |
 4   -6.537721    -0.12576  | ***| . |
 5   0.720104     0.01385   | . | . |
 6   11.646511    0.22404   | . |**** |
 7   0.382655     0.00736   | . | . |
 8   -5.583873    -0.10741  | . **| . |
 9   -15.804044   -0.30402  | ******| . |
10   -9.291756    -0.17874  | ****| . |
11   2.139864     0.04116   | . |* . |
12   46.868231    0.90159   | . |****************** |
13   0.801322     0.01541   | . | . |
14   -9.690318    -0.18641  | .****| . |
15   -15.285807   -0.29405  | ******| . |
16   -6.236594    -0.11997  | . **| . |
17   0.881801     0.01696   | . | . |
18   10.680823    0.20546   | . |**** . |
19   0.496121     0.00954   | . | . |
20   -4.968756    -0.09558  | . **| . |
21   -14.320935   -0.27549  | ******| . |
22   -8.286359    -0.15940  | . ***| . |
23   1.685671     0.03243   | . |* . |
24   42.361435    0.81489   | . |**************** |
"." marks two standard errors
ARIMA Procedure
Name of variable = X.
Period(s) of Differencing = 12.
Mean of working series = 10.3759
Standard deviation = 5.005722
Number of observations = 166
NOTE: The first 12 observations were eliminated by differencing.

Autocorrelations
Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
 0   25.057251   1.00000   | |********************|
 1   23.551046   0.93989   | . |******************* |
 2   21.750363   0.86803   | . |***************** |
 3   19.984942   0.79757   | . |**************** |
 4   18.383410   0.73366   | . |*************** |
 5   17.031926   0.67972   | . |************** |
 6   15.647808   0.62448   | . |************ |
 7   14.141135   0.56435   | . |*********** |
 8   12.707374   0.50713   | . |********** |
 9   11.123315   0.44392   | . |*********. |
10   9.421701    0.37601   | . |******** . |
11   7.755107    0.30950   | . |****** . |
12   6.024674    0.24044   | . |***** . |
13   5.018099    0.20027   | . |**** . |
14   4.119250    0.16439   | . |*** . |
15   3.165849    0.12634   | . |*** . |
16   2.245328    0.08961   | . |** . |
17   1.057665    0.04221   | . |* . |
18   -0.103884   -0.00415  | . | . |
19   -0.936067   -0.03736  | . *| . |
20   -1.623877   -0.06481  | . *| . |
21   -2.257332   -0.09009  | . **| . |
22   -2.941722   -0.11740  | . **| . |
23   -3.670260   -0.14647  | . ***| . |
24   -4.472118   -0.17848  | . ****| . |
"." marks two standard errors
ARIMA Procedure
Name of variable = X.
Period(s) of Differencing = 1,12.
Mean of working series = 0.087273
Standard deviation = 1.438735
Number of observations = 165
NOTE: The first 13 observations were eliminated by differencing.

Autocorrelations
Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
 0   2.069959     1.00000   | |********************|
 1   0.380397     0.18377   | . |**** |
 2   -0.056837    -0.02746  | . *| . |
 3   -0.021478    -0.01038  | . | . |
 4   -0.290834    -0.14050  | ***| . |
 5   -0.0045074   -0.00218  | . | . |
 6   0.200142     0.09669   | . |**. |
 7   0.041474     0.02004   | . | . |
 8   0.187094     0.09039   | . |**. |
 9   0.197702     0.09551   | . |**. |
10   0.0004563    0.00022   | . | . |
11   -0.144889    -0.07000  | . *| . |
12   -0.572732    -0.27669  | ******| . |
13   -0.200208    -0.09672  | . **| . |
14   0.056730     0.02741   | . |* . |
15   0.0061858    0.00299   | . | . |
16   0.287759     0.13902   | . |***. |
17   0.049923     0.02412   | . | . |
18   -0.209991    -0.10145  | . **| . |
19   -0.198252    -0.09578  | . **| . |
20   -0.113819    -0.05499  | . *| . |
21   -0.039443    -0.01906  | . | . |
22   -0.039793    -0.01922  | . | . |
23   0.106062     0.05124   | . |* . |
24   -0.165247    -0.07983  | . **| . |
"." marks two standard errors
Notations
Now, suppose that $y_t^*$ is a pre-differencing transformed series. The general stationarity transformation is:
$Z_t = \nabla_L^D \nabla^d y_t^* = (1 - B^L)^D (1 - B)^d y_t^*$
where B is the lag (backward shift) operator, D is the degree of seasonal differencing and d is the degree of non-seasonal differencing.
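As a quick check on the operator notation, the case d = D = 1 can be expanded by hand. The worked step below is a sketch that uses only the definition of the backward shift operator, $B^k y_t^* = y_{t-k}^*$; it recovers the combined first seasonal and first non-seasonal difference listed in Section 3.1.

```latex
\begin{align*}
Z_t &= (1 - B^L)(1 - B)\, y_t^* \\
    &= (1 - B - B^L + B^{L+1})\, y_t^* \\
    &= y_t^* - y_{t-1}^* - y_{t-L}^* + y_{t-L-1}^*
\end{align*}
```

The highest power of B is L + 1, so d = D = 1 with L = 12 uses up the first 13 observations.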
3.3 Estimation and Diagnostic Checking
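The general seasonal Box-Jenkins model can be written in the form
$\phi_p(B)\,\Phi_P(B^L)\,Z_t = \delta + \theta_q(B)\,\Theta_Q(B^L)\,\varepsilon_t$
where
$\phi_p(B) = (1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p)$
is the non-seasonal autoregressive operator of order p,
$\Phi_P(B^L) = (1 - \Phi_{1,L} B^L - \Phi_{2,L} B^{2L} - \dots - \Phi_{P,L} B^{PL})$
is the seasonal autoregressive operator of order P,
$\theta_q(B) = (1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q)$
is the non-seasonal moving average operator of order q,
$\Theta_Q(B^L) = (1 - \Theta_{1,L} B^L - \Theta_{2,L} B^{2L} - \dots - \Theta_{Q,L} B^{QL})$
is the seasonal moving average operator of order Q,
$\delta = \mu\,\phi_p(B)\,\Phi_P(B^L)$, with $\mu$ the mean of $Z_t$, and $\varepsilon_t$ is the random shock.
The ARIMA notation is usually written as ARIMA(p, d, q)(P, D, Q)$_L$.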
Identification of the orders p, q, P and Q is basically the same as in non-seasonal Box-Jenkins models. The following table provides some guidelines for choosing non-seasonal and seasonal operators.
• Estimation is usually carried out using maximum likelihood, as in the case of non-seasonal Box-Jenkins analysis.
• As an example, consider the SPAC of the time series of Example 3.2, after first differencing at both the seasonal and non-seasonal levels.
Partial Autocorrelations
Lag  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
 1   0.18377   | . |**** |
 2   -0.06337  | . *| . |
 3   0.00689   | . | . |
 4   -0.14706  | ***| . |
 5   0.05599   | . |* . |
 6   0.07694   | . |**. |
 7   -0.00961  | . | . |
 8   0.08102   | . |**. |
 9   0.07084   | . |* . |
10   0.00110   | . | . |
11   -0.07361  | . *| . |
12   -0.25948  | *****| . |
13   0.01555   | . | . |
14   0.00615   | . | . |
15   -0.03042  | . *| . |
16   0.09184   | . |**. |
17   -0.01900  | . | . |
18   -0.04354  | . *| . |
19   -0.08056  | .**| . |
20   0.03107   | . |* . |
21   0.03687   | . |* . |
22   -0.06723  | . *| . |
23   0.03476   | . |* . |
24   -0.18999  | ****| . |
• At the non-seasonal level, both the SAC and the SPAC appear to have a significant spike at lag 1 and to cut off after lag 1.
• One can tentatively identify an AR(1), MA(1) or ARMA(1, 1) model for the non-seasonal part of the series.
• At the seasonal level, the SPAC appears to be dying down, while the SAC cuts off after lag 12. Hence a seasonal MA(1) model is identified.
• Combining both the seasonal and non-seasonal levels, we have the following tentative models:
ARIMA(1, 1, 0)(0, 1, 1)$_{12}$, ARIMA(0, 1, 1)(0, 1, 1)$_{12}$, ARIMA(1, 1, 1)(0, 1, 1)$_{12}$
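To make the notation concrete, the second tentative model can be written out in full. The expansion below is a sketch: the shock symbol $\varepsilon_t$ and the coefficient names $\theta_1$ and $\Theta_{1,12}$ are assumed labels for the non-seasonal and seasonal moving average parameters, and $Z_t = (1 - B^{12})(1 - B) X_t$ is the differenced series.

```latex
\begin{align*}
Z_t &= \delta + (1 - \theta_1 B)(1 - \Theta_{1,12} B^{12})\,\varepsilon_t \\
    &= \delta + \varepsilon_t - \theta_1 \varepsilon_{t-1}
       - \Theta_{1,12}\,\varepsilon_{t-12}
       + \theta_1 \Theta_{1,12}\,\varepsilon_{t-13}
\end{align*}
```

The cross term at lag 13 comes from multiplying the two moving average factors; it is not a separate parameter.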
• The SAS program for estimating these models is as follows:
data employ;
  input @7 x;
  cards;
239.6
236.4
236.8
241.5
etc.
;
proc arima data=employ;
  identify var=x(1,12);
  estimate p=1 q=(12) printall plot method=ml;
  estimate q=(1)(12) printall plot method=ml;
  estimate p=1 q=(1)(12) printall plot method=ml;
run;
Estimation results of an ARIMA(1, 1, 0)(0, 1, 1)$_{12}$ model

Maximum Likelihood Estimation
Parameter  Estimate  Approx. Std Error  T Ratio  Lag
MU         0.07694   0.07647            1.01     0
MA1,1      0.41307   0.07493            5.51     12
AR1,1      0.16005   0.07702            2.08     1

Constant Estimate = 0.0646288
Variance Estimate = 1.80012263
Std Error Estimate = 1.34168649
AIC = 570.489073
SBC = 579.80691
Number of Residuals = 165

Autocorrelation Check of Residuals
To Lag  Chi Square  DF  Prob   Autocorrelations
 6      4.42         4  0.352   0.011 -0.048  0.039 -0.103 -0.001  0.106
12      7.61        10  0.667  -0.059  0.024  0.100 -0.020  0.013  0.059
18      11.92       16  0.750  -0.085  0.050 -0.035  0.082  0.041 -0.063
24      19.71       22  0.601  -0.146 -0.046  0.027 -0.020  0.103 -0.075
30      24.38       28  0.662  -0.090  0.061 -0.064 -0.044  0.068 -0.030
Estimation results of an ARIMA(0, 1, 1)(0, 1, 1)$_{12}$ model

Maximum Likelihood Estimation
Parameter  Estimate  Approx. Std Error  T Ratio  Lag
MU         0.07723   0.07570            1.02     0
MA1,1      -0.17261  0.07695            -2.24    1
MA2,1      0.40941   0.07502            5.46     12

Constant Estimate = 0.07723315
Variance Estimate = 1.79723527
Std Error Estimate = 1.34061004
AIC = 570.185014
SBC = 579.50285
Number of Residuals = 165

Autocorrelation Check of Residuals
To Lag  Chi Square  DF  Prob   Autocorrelations
 6      4.04         4  0.400   0.000 -0.025  0.040 -0.101 -0.002  0.105
12      7.15        10  0.711  -0.058  0.026  0.098 -0.020  0.013  0.058
18      11.57       16  0.773  -0.087  0.055 -0.039  0.083  0.036 -0.061
24      19.20       22  0.633  -0.142 -0.046  0.027 -0.026  0.102 -0.076
30      23.68       28  0.698  -0.089  0.059 -0.065 -0.043  0.063 -0.031
Estimation results of an ARIMA(1, 1, 1)(0, 1, 1)$_{12}$ model

Maximum Likelihood Estimation
Parameter  Estimate  Approx. Std Error  T Ratio  Lag
MU         0.07778   0.07374            1.05     0
MA1,1      -0.45907  0.37080            -1.24    1
MA2,1      0.40276   0.07533            5.35     12
AR1,1      -0.29315  0.39756            -0.74    1

Constant Estimate = 0.10058242
Variance Estimate = 1.80600827
Std Error Estimate = 1.34387807
AIC = 571.896463
SBC = 584.320245
Number of Residuals = 165

Autocorrelation Check of Residuals
To Lag  Chi Square  DF  Prob   Autocorrelations
 6      3.19         3  0.364   0.010  0.019  0.015 -0.087 -0.005  0.101
12      5.88         9  0.752  -0.050  0.031  0.093 -0.011  0.009  0.054
18      10.55       15  0.784  -0.089  0.059 -0.042  0.089  0.027 -0.060
24      18.00       21  0.649  -0.140 -0.053  0.025 -0.028  0.097 -0.076
30      21.80       27  0.747  -0.085  0.048 -0.064 -0.039  0.052 -0.033
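Each SAS output reports both MU and a Constant Estimate. The sketch below checks, using only the figures printed above, that the constant equals MU times one minus the sum of the autoregressive coefficients, which is the relation that applies when there is no seasonal autoregressive factor. The dictionary layout and the function name are hypothetical; small differences are expected because the printed estimates are rounded.

```python
# (MU, AR coefficients, reported constant) copied from the three SAS outputs
fits = {
    "ARIMA(1,1,0)(0,1,1)12": (0.07694, [0.16005], 0.0646288),
    "ARIMA(0,1,1)(0,1,1)12": (0.07723, [], 0.07723315),
    "ARIMA(1,1,1)(0,1,1)12": (0.07778, [-0.29315], 0.10058242),
}

def constant_from_mean(mu, ar_coefs):
    """delta = mu * (1 - phi_1 - ... - phi_p), valid with no seasonal AR factor."""
    return mu * (1.0 - sum(ar_coefs))

for name, (mu, ar, reported) in fits.items():
    implied = constant_from_mean(mu, ar)
    # Print the implied constant next to the value SAS reports
    print(f"{name}: implied {implied:.5f}, reported {reported:.5f}")
```

For the pure moving average model the constant and MU coincide, since there is no autoregressive factor to scale the mean.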
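Diagnostic checking is conducted using the Ljung-Box-Pierce statistic
$Q^* = n(n+2) \sum_{l=1}^{k} (n-l)^{-1} r_l^2$
where n is the number of observations available after differencing, $r_l$ is the lag-$l$ sample autocorrelation of the residuals and k is the number of residual autocorrelations included.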
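The statistic can be reproduced from the residual autocorrelations printed by SAS. The following is a minimal sketch assuming the standard Ljung-Box form; the function name is hypothetical, and the inputs are the first six residual autocorrelations of the ARIMA(1, 1, 0)(0, 1, 1)12 fit with n = 165 residuals.

```python
def ljung_box(resid_acf, n):
    """Q* = n (n + 2) * sum_{l=1}^{k} r_l^2 / (n - l) for residual autocorrelations r_l."""
    return n * (n + 2) * sum(
        r * r / (n - lag) for lag, r in enumerate(resid_acf, start=1)
    )

# Residual autocorrelations at lags 1..6, copied from the first SAS output
r6 = [0.011, -0.048, 0.039, -0.103, -0.001, 0.106]
q = ljung_box(r6, 165)
print(round(q, 2))  # close to the 4.42 reported by SAS (inputs are rounded)
```

In the SAS outputs the degrees of freedom equal the number of lags minus the number of estimated autoregressive and moving average parameters (for example 6 - 2 = 4 for this model), and a large p-value indicates no evidence of remaining autocorrelation in the residuals.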