UNIVERSITY OF UTAH
GUIDED READING
TIME SERIES
Problems from Chapter 3 of Shumway and Stoffer’s Book
Author: Curtis MILLER
Supervisor: Prof. Lajos HORVATH
November 10, 2015
UNIVERSITY OF UTAH DEPARTMENT OF MATHEMATICS
ARIMA Models
1 ESTIMATION
1.1 AR(2) MODEL FOR cmort
To estimate the AR(2) process, I first use ordinary least squares (OLS). I then use the Yule-Walker estimate. This is shown in the R code below:
library(astsa)  # provides the cmort series

# OLS estimate
# demean = T fits the model to cmort - mean(cmort)
# intercept is not set, so it defaults to demean and an intercept is still estimated
cmort.ar2.ols <- ar.ols(cmort, order = 2, demean = T)

# Yule-Walker estimate
cmort.ar2.yw <- ar.yw(cmort, order = 2, demean = T)
1.1.1 PARAMETER ESTIMATE COMPARISON
# OLS estimate
cmort.ar2.ols

##
## Call:
## ar.ols(x = cmort, order.max = 2, demean = T)
##
## Coefficients:
##      1       2
## 0.4286  0.4418
##
## Intercept: -0.04672 (0.2527)
##
## Order selected 2  sigma^2 estimated as  32.32

# Yule-Walker estimate
cmort.ar2.yw

##
## Call:
## ar.yw.default(x = cmort, order.max = 2, demean = T)
##
## Coefficients:
##      1       2
## 0.4339  0.4376
##
## Order selected 2  sigma^2 estimated as  32.84
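Looking at the coefficients of the AR(2) model estimated using the two methods, I see very little difference: the estimates differ only in the third decimal place (0.4286 versus 0.4339 for the first lag and 0.4418 versus 0.4376 for the second). OLS and Yule-Walker estimation produce similar results.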
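The same comparison can be repeated on simulated data, where the true coefficients are known. The following is a minimal sketch, assuming illustrative AR(2) coefficients of 0.43 and 0.44 (chosen to be close to the estimates above, not taken from the cmort fit); the object names are introduced here only for the example.

```r
# Minimal sketch: compare OLS and Yule-Walker on a simulated AR(2) series.
# The coefficients 0.43 and 0.44 are illustrative assumptions, not estimates.
set.seed(1)
x <- arima.sim(model = list(ar = c(0.43, 0.44)), n = 500)

# aic = FALSE forces order 2 instead of letting AIC pick the order
fit.ols <- ar.ols(x, order.max = 2, aic = FALSE, demean = TRUE)
fit.yw  <- ar.yw(x, order.max = 2, aic = FALSE, demean = TRUE)

# ar.ols stores its coefficients in a 3-d array, so flatten before comparing
est <- rbind(OLS = as.vector(fit.ols$ar), YW = as.vector(fit.yw$ar))
colnames(est) <- c("phi1", "phi2")
print(round(est, 3))
```

With a few hundred observations the two rows should typically agree to about two decimal places, which mirrors the behavior seen for cmort.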
1.1.2 STANDARD ERROR COMPARISON
# The standard error of the OLS estimates
cmort.ar2.ols$asy.se.coef$ar

## [1] 0.03979433 0.03976163

# The variance matrix of the Yule-Walker estimates
cmort.ar2.yw$asy.var.coef

##              [,1]         [,2]
## [1,]  0.001601043 -0.001235314
## [2,] -0.001235314  0.001601043

# Corresponding standard error of both parameters
sqrt(cmort.ar2.yw$asy.var.coef[1,1])

## [1] 0.04001303
Looking at the above R output, the two methods give nearly the same standard errors for the parameters: about 0.0398 for both OLS estimates and 0.0400 for both Yule-Walker estimates.
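Because the Yule-Walker fit returns a covariance matrix rather than standard errors, both standard errors can be read off its diagonal in one step. A minimal sketch using the matrix printed above (the names V and se.yw are introduced here for illustration):

```r
# Asymptotic covariance matrix of the Yule-Walker estimates, copied from the output above
V <- matrix(c( 0.001601043, -0.001235314,
              -0.001235314,  0.001601043), nrow = 2, byrow = TRUE)

# Standard errors are the square roots of the diagonal entries
se.yw <- sqrt(diag(V))
print(se.yw)

# Implied correlation between the two coefficient estimates
print(cov2cor(V)[1, 2])
```

The off-diagonal entry is negative, so the two coefficient estimates are negatively correlated even though their standard errors are equal.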
1.2 AR(1) SIMULATION AND ESTIMATION
First I generate the data:
# sd is an argument of the innovation generator (rnorm), not part of the model list
ar1.sim <- arima.sim(model = list(ar = c(.99)), n = 50, sd = 1)
I first estimate the parameter from the simulation using the Yule-Walker estimate.
ar1.sim.yw <- ar.yw(ar1.sim, order = 1)
# Model estimates
ar1.sim.yw

##
## Call:
## ar.yw.default(x = ar1.sim, order.max = 1)
##
## Coefficients:
##      1
## 0.8946
##
## Order selected 1  sigma^2 estimated as  1.144

# Model covariance matrix
ar1.sim.yw$asy.var.coef

##             [,1]
## [1,] 0.004158676
Here, I would perform inference on the model by assuming the estimator is asymptotically Normally distributed. I would use the covariance matrix listed above for estimating the standard error, which gives sqrt(0.004158676) ≈ 0.0645.
Bootstrap results in R could be done as follows:
library(boot)  # provides tsboot
tsboot(ar1.sim, function(d) {
  return(ar.yw(d, order = 1)$ar)
}, R = 2000)

##
## MODEL BASED BOOTSTRAP FOR TIME SERIES
##
##
## Call:
## tsboot(tseries = ar1.sim, statistic = function(d) {
##     return(ar.yw(d, order = 1)$ar)
## }, R = 2000)
##
##
## Bootstrap Statistics :
##      original  bias    std. error
## t1* 0.8946416       0           0
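The bootstrap standard error is zero, while the theoretical standard error is non-zero. The zero appears to be an artifact of the call rather than a property of the estimator: tsboot defaults to sim = "model", and with no ran.gen function supplied every replicate is the original series, so the statistic never varies.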
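A bootstrap that actually resamples can be written in base R by redrawing the fitted residuals and rebuilding the series. The following is a sketch under stated assumptions, not the method used above: the series is freshly simulated with an assumed coefficient of 0.9 and n = 200, and the names res, boot.phi, boot.se, and asy.se are hypothetical.

```r
# Sketch of a residual (model-based) bootstrap for the Yule-Walker AR(1) estimate.
# phi = 0.9 and n = 200 are assumed values used only for illustration.
set.seed(42)
x <- arima.sim(model = list(ar = 0.9), n = 200)

fit <- ar.yw(x, order.max = 1, aic = FALSE)
phi.hat <- as.numeric(fit$ar)
res <- as.numeric(fit$resid)
res <- res[!is.na(res)]            # the first residual is NA
res <- res - mean(res)             # centre the residuals

B <- 500
boot.phi <- replicate(B, {
  # rebuild a series from resampled residuals using the fitted coefficient
  e <- sample(res, length(x) + 50, replace = TRUE)
  x.star <- stats::filter(e, filter = phi.hat, method = "recursive")
  x.star <- x.star[-(1:50)]        # drop the burn-in values
  as.numeric(ar.yw(x.star, order.max = 1, aic = FALSE)$ar)
})

boot.se <- sd(boot.phi)
asy.se  <- sqrt(as.numeric(fit$asy.var.coef))
print(c(bootstrap = boot.se, asymptotic = asy.se))
```

Unlike the call above, each replicate here is a different series, so the bootstrap standard error is strictly positive and can be compared with the asymptotic one.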
2 INTEGRATED MODELS FOR NONSTATIONARY DATA
2.1 EWMA MODEL FOR GLACIAL VARVE DATA
Here I am interested in the varve dataset. In fact, I am interested in analyzing log(varve), since the log transformation stabilizes the variance of the series. I will be estimating an EWMA model for this data.
logvarve <- log(varve)
# EWMA for logvarve with lambda = .25
logvarve.ima.25 <- HoltWinters(logvarve[1:100],
                               alpha = 1 - .25,
                               beta = FALSE,
                               gamma = FALSE)
logvarve.ima.5 <- HoltWinters(logvarve[1:100],
                              alpha = 1 - .5,
                              beta = FALSE,
                              gamma = FALSE)
logvarve.ima.75 <- HoltWinters(logvarve[1:100],
                               alpha = 1 - .75,
                               beta = FALSE,
                               gamma = FALSE)
# Plotting results
par(mfrow = c(3,1))
plot(logvarve.ima.25, main = "EWMA Fit with Lambda = .25")
plot(logvarve.ima.5, main = "EWMA Fit with Lambda = .5")
plot(logvarve.ima.75, main = "EWMA Fit with Lambda = .75")
The results are shown in Figure 2.1. With a small smoothing parameter (λ), the results are very sensitive to the immediate past, while a high smoothing parameter leads to more stable predictions.
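The role of λ is easiest to see in the recursion itself. The sketch below writes out the one-step-ahead EWMA forecast by hand; ewma() is a hypothetical helper introduced here, and starting the recursion at the first observation is an assumption of this sketch.

```r
# Minimal EWMA sketch: one-step-ahead forecasts from
#   xhat[t+1] = (1 - lambda) * x[t] + lambda * xhat[t]
# ewma() is a hypothetical helper; starting the recursion at x[1] is an assumption.
ewma <- function(x, lambda) {
  xhat <- numeric(length(x))
  xhat[1] <- x[1]                       # forecast of x[2]
  for (t in seq_along(x)[-1]) {
    xhat[t] <- (1 - lambda) * x[t] + lambda * xhat[t - 1]
  }
  xhat                                  # xhat[t] is the forecast of x[t + 1]
}

x <- c(1, 2, 3)
print(ewma(x, lambda = 0.5))
print(ewma(x, lambda = 0))              # lambda = 0 repeats the latest observation
```

With λ = 0 the forecast is just the latest observation, and as λ approaches 1 the forecast barely moves, which is the trade-off described above.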
Figure 2.1: EWMA fit for different smoothing parameters (three panels, Lambda = .25, .5, and .75; observed and fitted values against time)
3 BUILDING ARIMA MODELS
3.1 AR(1) MODEL FOR GNP DATA
Here I am investigating how well an AR(1) (or, more exactly, an ARIMA(1,1,0)) model fits the natural log of U.S. GNP data. I estimate this ARIMA model.
gnpgr = diff(log(gnp)) # growth rate of GNP
gnp.model <- sarima(gnpgr, 1, 0, 0, details = F) # AR(1) model fit
I see disturbing trends in the diagnostic plots shown in Figure 3.1. The residual plot should look like white noise, but the variance appears to decrease over time, and the Q-Q plot suggests non-normality. Fortunately, the ACF of the residuals and the p-values for the Ljung-Box statistic look as they should. Still, other models (probably ones that do not assume Gaussian white noise) may be better.
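The Ljung-Box p-values in the diagnostic panel can also be computed directly with Box.test from base R. The sketch below does this for an AR(1) fit to simulated data; the coefficient 0.35 and the length 220 are assumed values, and the series is not the GNP data.

```r
# Sketch: Ljung-Box test on the residuals of an AR(1) fit to simulated data.
# phi = 0.35 and n = 220 are assumed values chosen only for illustration.
set.seed(7)
x <- arima.sim(model = list(ar = 0.35), n = 220)
fit <- arima(x, order = c(1, 0, 0))

# fitdf = 1 removes one degree of freedom for the estimated AR coefficient
lb <- Box.test(residuals(fit), lag = 20, type = "Ljung-Box", fitdf = 1)
print(lb$statistic)
print(lb$p.value)
```

A large p-value is consistent with uncorrelated residuals; it says nothing about Normality, which is why the Q-Q plot has to be checked separately.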
3.2 FITTING CRUDE OIL PRICES WITH AN ARIMA(p,d,q) MODEL
My objective is to fit an ARIMA(p,d,q) model for the oil dataset. I start by examining the data:
# Prepare layout
old.par <- par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(4, 1),
               cex.axis = .75)
plot(oil, xaxt = 'n')
mtext(text = "Oil Price", side = 2, line = 2, cex = .75)
plot(log(oil), xaxt = 'n')
mtext(text = "Natural Logarithm of Oil Price", side = 2, line = 2, cex = .75)
plot(diff(oil), xaxt = 'n')
mtext(text = "First Difference in Oil Price", side = 2, line = 2, cex = .75)
plot(diff(log(oil)))
mtext(text = "Percent Change in Oil Price", side = 2, line = 2, cex = .75)
The first plot in Figure 3.2 shows that oil prices clearly are not a stationary process, and it appears that the variance of the process increases with time. Taking the natural log of oil prices helps control the increasing variability, but not the nonstationary behavior of the series. When looking at the change in oil price from one period to the next, I do see a process that looks more stationary, but the nonconstant variance is not removed. The final attempt is to look at the differences in the natural log of oil prices (which can be interpreted as the approximate percentage change in oil prices). This appears to be stationary with a mostly constant variance. However, there are large deviations around 2009, and even prior, that would lead one to conclude that the white noise is not Gaussian, which threatens estimation and inference.
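The reading of the differenced logs as a percentage change rests on a first-order approximation. Writing r_t for the relative price change (a symbol introduced here for illustration), a short derivation is:

```latex
\Delta \log(\text{oil}_t)
  = \log\!\left(\frac{\text{oil}_t}{\text{oil}_{t-1}}\right)
  = \log(1 + r_t)
  \approx r_t,
\qquad
r_t = \frac{\text{oil}_t - \text{oil}_{t-1}}{\text{oil}_{t-1}}.
```

The approximation is accurate when |r_t| is small, so it is weakest exactly in the periods with the largest price movements.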
Figure 3.1: Diagnostic plots for the AR(1) model (panels: standardized residuals, ACF of residuals, Normal Q-Q plot of standardized residuals, p-values for the Ljung-Box statistic)
Figure 3.2: Basic plots of the oil series (panels: oil price, natural logarithm of oil price, first difference in oil price, percent change in oil price)
I now look at the ACF and PACF of ∆ log(oil_t):
par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(2, 1),
    cex.axis = .75)
acf(diff(log(oil)), xaxt = 'n')
mtext(text = "Sample ACF", side = 2, line = 2)
pacf(diff(log(oil)))
mtext(text = "Sample PACF", side = 2, line = 2)
When looking at the sample PACF in Figure 3.3, I see that the PACF is nonzero as far out as eight lags, which may suggest that p = 8 + 1 = 9, or that we should consider lagging the AR term out to as far as nine lags.
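Reading the AR order from the PACF relies on the fact that the theoretical PACF of an AR(p) process is zero beyond lag p. A small sketch with ARMAacf illustrates this for an AR(2) process; the coefficients 0.5 and 0.3 are assumed values and have nothing to do with the oil series.

```r
# Theoretical PACF of an AR(2) process.
# The coefficients 0.5 and 0.3 are assumed for illustration.
p <- ARMAacf(ar = c(0.5, 0.3), lag.max = 6, pacf = TRUE)  # lags 1..6
print(round(p, 4))
```

The value at lag 2 equals the second AR coefficient and every later value is zero, so the last clearly nonzero lag of the sample PACF is a natural first guess for p.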
# write(capture.output(), "") used only to make presentation
# easier
write(capture.output(sarima(log(oil),
                            p = 9, d = 1, q = 0))[32:38], "")

##
## Coefficients:
##          ar1      ar2     ar3      ar4     ar5      ar6
##       0.1678  -0.1189  0.1844  -0.0713  0.0486  -0.0715
## s.e.  0.0429   0.0432  0.0436   0.0442  0.0444   0.0443
##           ar7     ar8     ar9  constant
##       -0.0158  0.1135  0.0525    0.0017
The ninth lag does not appear statistically significant, so I drop the number of lags down to eight. I now use the following ARIMA model (with diagnostic plots shown):
oil.model <- sarima(log(oil), p = 8, d = 1, q = 0, details = F)
write(capture.output(oil.model)[8:14], "")

## Coefficients:
##          ar1      ar2     ar3      ar4     ar5      ar6
##       0.1742  -0.1200  0.1814  -0.0689  0.0448  -0.0621
## s.e.  0.0426   0.0433  0.0436   0.0442  0.0443   0.0437
##           ar7     ar8  constant
##       -0.0218  0.1224    0.0017
## s.e.   0.0435  0.0428    0.0026
The residuals clearly do not appear to be Gaussian; there are large price movements that make this assumption doubtful, and the Q-Q plot does not support the Normality assumption. The ACF of the residuals can get large for some distant lags but otherwise is within the band of reasonable values. The p-values of the Ljung-Box statistics suggest that we do not have dependence in our residuals for large lags. This may be the best fit an ARIMA model can provide.
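Significance of individual lags can be checked by dividing each estimate by its standard error. The sketch below does this for the ARIMA(8,1,0) coefficients reported above; the Normal reference distribution is an assumption, and the non-Gaussian residuals noted above mean the p-values should be read loosely.

```r
# z-statistics for the ARIMA(8,1,0) fit, using the estimates and
# standard errors reported in the output above
est <- c(ar1 = 0.1742, ar2 = -0.1200, ar3 = 0.1814, ar4 = -0.0689, ar5 = 0.0448,
         ar6 = -0.0621, ar7 = -0.0218, ar8 = 0.1224, constant = 0.0017)
se  <- c(0.0426, 0.0433, 0.0436, 0.0442, 0.0443, 0.0437, 0.0435, 0.0428, 0.0026)

z <- est / se
p <- 2 * pnorm(-abs(z))          # two-sided p-values under asymptotic Normality
print(round(cbind(z, p), 3))

# Terms whose |z| exceeds the 5% two-sided critical value
print(names(z)[abs(z) > qnorm(0.975)])
```

By this rough criterion the eighth lag is retained while several intermediate lags are not individually significant.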
Figure 3.3: Sample ACF and PACF for percentage change in oil price
Figure 3.4: Diagnostic plots for the ARIMA(8,1,0) model for the log(oil) series (panels: standardized residuals, ACF of residuals, Normal Q-Q plot of standardized residuals, p-values for the Ljung-Box statistic)
4 REGRESSION WITH AUTOCORRELATED ERRORS
4.1 MONTHLY SALES DATA
4.1.1 ARIMA MODEL FITTING
The problem first asks for an ARIMA model for the sales data series. I first plot the series.
par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(3, 1),
    cex.axis = .75)
plot(sales, xaxt = 'n')
mtext(text = "Sales", side = 2, line = 2, cex = .75)
plot(diff(sales), xaxt = 'n')
mtext(text = "First Order Difference in Sales",
      side = 2, line = 2, cex = .75)
plot(diff(diff(sales)))
mtext(text = "Second Order Difference in Sales",
      side = 2, line = 2, cex = .75)
Figure 4.1 shows the plots of the sales series. Clearly, sales_t is not stationary. Surprisingly, neither is ∆sales_t; this series shows periodicity. It takes a second-order differencing, ∆(∆sales_t), to find a stationary series.
I next examine the ACF and PACF functions to try and identify the order of the AR and MA terms.
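The mechanics of second-order differencing in R can be checked on a toy series. The sketch below uses a synthetic quadratic trend (an assumption for illustration, not the sales data) to show that diff(diff(x)) and diff(x, differences = 2) agree and that two differences remove a trend that one difference leaves behind.

```r
# A quadratic trend needs two differences before it becomes constant.
# The series below is synthetic (an assumption for illustration), not the sales data.
t <- 1:20
x <- ts(5 + 2 * t + 0.5 * t^2)

d1 <- diff(x)                      # still trending (linear in t)
d2 <- diff(x, differences = 2)     # same as diff(diff(x))
print(range(d1))
print(unique(as.numeric(d2)))
```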
par(mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 1), mfrow = c(2, 1),
    cex.axis = .75)
acf(diff(diff(sales)), xaxt = 'n')
mtext(text = "Sample ACF", side = 2, line = 2)
pacf(diff(diff(sales)))
mtext(text = "Sample PACF", side = 2, line = 2)
As shown in Figure 4.2, the sample ACF cuts off after one lag and the sample PACF appears to be trailing off, so I believe that an ARIMA(0,2,1) should provide a good fit for the data.
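The pattern described here (ACF cutting off after lag 1, PACF tailing off) is the theoretical signature of an MA(1) process. A brief sketch with ARMAacf, assuming an illustrative coefficient of θ = -0.75:

```r
# Theoretical ACF and PACF of an MA(1) process.
# theta = -0.75 is an assumed value used only for illustration.
theta <- -0.75
rho <- ARMAacf(ma = theta, lag.max = 5)               # lags 0..5
phi <- ARMAacf(ma = theta, lag.max = 5, pacf = TRUE)  # lags 1..5

print(round(rho, 4))   # zero beyond lag 1
print(round(phi, 4))   # shrinks in magnitude but never cuts off
```

The lag-1 autocorrelation equals θ / (1 + θ²), and all higher-lag autocorrelations vanish.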
par(old.par)
sales.model <- sarima(sales, p = 0, d = 2, q = 1, details = F)
write(capture.output(sales.model)[8:14], "")

## Coefficients:
##           ma1
##       -0.7480
## s.e.   0.0662
##
## sigma^2 estimated as 1.866:  log likelihood = -256.57,  aic = 517.14
Figure 4.1: Basic plots of the sales series (panels: sales, first order difference in sales, second order difference in sales)
Figure 4.2: Sample ACF and PACF for second order difference in sales
Figure 4.3: Diagnostic plots for the ARIMA(0,2,1) model for the sales series (panels: standardized residuals, ACF of residuals, Normal Q-Q plot of standardized residuals, p-values for the Ljung-Box statistic)
Looking at the diagnostic plots in Figure 4.3, the ARIMA(0,2,1) seems to fit well. The error terms appear Gaussian, there are no strong autocorrelations in the residuals, and the error terms do not appear to be dependent.
4.1.2 RELATIONSHIP BETWEEN sales AND lead
I examine the CCF of sales and lead and a lag plot of ∆sales_t and ∆lead_{t−3} to determine if a regression involving these variables is reasonable.
ccf(diff(sales), diff(lead), main = "CCF of sales and lead")
As seen in Figure 4.4, while the differenced sales and lead series are uncorrelated at most lags, around lag 3 they become highly correlated. This fact is emphasized by a lag plot.
lag2.plot(lead, sales, max.lag = 3)
Figure 4.5 shows a linear relationship with a third lag of lead and contemporary sales. This would justify regressing ∆sales_t on ∆lead_{t−3}.
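How a lead-lag relationship shows up in the CCF can be seen on synthetic data where the delay is known. In the sketch below y is constructed to follow x by three periods; the series, the delay, and the names cc and peak.lag are assumptions made for illustration.

```r
# Synthetic example: y follows x with a delay of three periods (an assumption for illustration)
set.seed(123)
n <- 300
x <- rnorm(n)
y <- c(rnorm(3), 2 * x[1:(n - 3)]) + rnorm(n, sd = 0.5)

# ccf(y, x) at lag k estimates cor(y[t + k], x[t])
cc <- ccf(y, x, lag.max = 10, plot = FALSE)
peak.lag <- cc$lag[which.max(abs(cc$acf))]
print(peak.lag)
```

The sign of the lag at which the peak occurs depends on the order of the two arguments, so it is worth confirming the convention on a toy case like this before interpreting a real CCF.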
4.1.3 REGRESSION WITH ARMA ERRORS
Given that the variable lead seems to provide useful information about sales, I try to regress sales on lead. More specifically, I try to regress ∆sales_t on ∆lead_{t−3}, while viewing the error term as being some unknown ARMA process.
saleslead <- ts.intersect(diff(sales), lag(diff(lead), k = -3))
salesnew <- saleslead[,1]
leadnew <- saleslead[,2]
fit <- lm(salesnew ~ leadnew)
acf2(resid(fit))
##         ACF  PACF
##  [1,]  0.59  0.59
##  [2,]  0.40  0.09
##  [3,]  0.34  0.11
##  [4,]  0.31  0.10
##  [5,]  0.23 -0.02
##  [6,]  0.15 -0.04
##  [7,]  0.13  0.03
##  [8,]  0.13  0.03
##  [9,]  0.01 -0.15
## [10,]  0.02  0.07
## [11,]  0.09  0.10
## [12,]  0.01 -0.13
## [13,] -0.01  0.03
## [14,] -0.07 -0.09
## [15,] -0.07 -0.04
## [16,] -0.02  0.09
## [17,] -0.05 -0.03
## [18,] -0.03  0.02
## [19,]  0.04  0.11
## [20,]  0.05  0.03
## [21,]  0.02 -0.07
## [22,]  0.00 -0.01
## [23,] -0.01 -0.04

Figure 4.4: CCF of sales and lead
Figure 4.5: Lag plot of sales and lead (panels: sales(t) against lead(t−0), lead(t−1), lead(t−2), and lead(t−3), with correlations 0.95, 0.95, 0.94, and 0.94)
Figure 4.6: Sample ACF and PACF for residuals from linear fit
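The alignment performed by lag and ts.intersect in the regression setup can be verified on a toy series. In this sketch the series 1, ..., 10 and the column name x.lag3 are illustrative; stats::lag is spelled out because other packages may mask lag().

```r
# Toy check of how lag() and ts.intersect() line up two series.
# stats::lag is spelled out because other packages may mask lag().
x <- ts(1:10)
aligned <- ts.intersect(x, x.lag3 = stats::lag(x, k = -3))
print(aligned)
```

Each row pairs x at time t with x at time t − 3, and the first three time points are dropped because the lagged series has no values there.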
Figure 4.6 shows the ACF and the PACF of the residuals of the "naïve" fit. The PACF cuts off at 1 and the ACF trails off, so this appears to be an AR(1) process.
arima.fit <- sarima(salesnew, 1, 0, 0, xreg=cbind(leadnew), details = F)
As shown in Figure 4.7, the diagnostic plots for the process, when interpreting the error terms as an ARMA(1,0) process, look very good. Normality of the white noise residuals, the ACF of the white noise residuals, and the tests of dependence all show desirable properties.
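The same kind of model can be fitted with arima from base R, which also accepts an xreg argument. The sketch below uses simulated data with assumed true values (intercept 0.4, slope 2.8, AR coefficient 0.6); the names lead.sim and sales.sim are hypothetical stand-ins, not the differenced series used above.

```r
# Sketch: regression with AR(1) errors fitted by stats::arima on simulated data.
# True values (intercept 0.4, slope 2.8, phi 0.6) are assumptions for illustration.
set.seed(2015)
n <- 500
lead.sim <- rnorm(n)
err <- arima.sim(model = list(ar = 0.6), n = n)
sales.sim <- 0.4 + 2.8 * lead.sim + as.numeric(err)

# The regressor is passed as a named column so its coefficient is labelled "lead"
fit <- arima(sales.sim, order = c(1, 0, 0), xreg = cbind(lead = lead.sim))
print(round(coef(fit), 3))
print(round(sqrt(diag(fit$var.coef)), 3))   # standard errors
```

The regression coefficient and the AR coefficient are estimated jointly, so the standard error of the slope accounts for the autocorrelation that the plain lm fit ignores.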
stargazer(arima.fit$fit,
          covariate.labels = c("$\\phi$", "Intercept",
                               "$\\Delta \\text{lead}_{t-3}$"),
          dep.var.labels = c("$\\Delta \\text{sales}_t$"),
          label = "tab:prob35h",
          title = "Coefficients of the model for $\\Delta \\text{sales}_t$",
          table.placement = "ht")
Table 4.1 shows the estimates of the coefficients of the model. The AR(1) term (φ) and the coefficient of the ∆lead_{t−3} term are statistically significant at the 1% level, and the intercept is significant at the 5% level.
Figure 4.7: Diagnostic plots for the model for the sales series with ARMA(1,0) error terms (panels: standardized residuals, ACF of residuals, Normal Q-Q plot of standardized residuals, p-values for the Ljung-Box statistic)

Table 4.1: Coefficients of the model for ∆sales_t

                     Dependent variable: ∆sales_t
φ                    0.645*** (0.063)
Intercept            0.362**  (0.177)
∆lead_{t−3}          2.788*** (0.143)
Observations         146
Log Likelihood       −168.717
σ²                   0.588
Akaike Inf. Crit.    345.433
Note: *p<0.1; **p<0.05; ***p<0.01

5 MULTIPLICATIVE SEASONAL ARIMA MODELS
5.1 ACF OF AN ARIMA(p,d,q) × (P,D,Q)_s MODEL
The problem asks for a plot of the theoretical ACF of an ARIMA(0,0,1) × (1,0,0)_12 model, with Φ = 0.8 and θ = 0.5. This model is:
$$x_t = 0.8\,x_{t-12} + w_t + 0.5\,w_{t-1} \tag{5.1}$$
The ACF is computed and plotted below:
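# Phi = .8 enters as the lag-12 AR coefficient; lag 0 is dropped so that
# index k on the plot corresponds to lag k
ACF <- ARMAacf(ar = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .8),
               ma = c(.5), lag.max = 15)[-1]
plot(ACF, type = "h", xlab = "lag",
     xlim = c(1, 15), ylim = c(-.5,1)); abline(h=0)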
Figure 5.1 shows the theoretical ACF of the process.
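The spikes in the plot can be checked against a closed form. For the model in (5.1), the autocorrelations are ρ(12h) = Φ^h and ρ(12h ± 1) = θ/(1 + θ²) · Φ^h, and zero at all other lags. The sketch below compares these values with ARMAacf; the vector name closed is introduced here for illustration.

```r
# Check the closed-form ACF of x_t = Phi x_{t-12} + w_t + theta w_{t-1}
Phi <- 0.8
theta <- 0.5
rho <- ARMAacf(ar = c(rep(0, 11), Phi), ma = theta, lag.max = 25)

# Closed form: rho(12h) = Phi^h and rho(12h +/- 1) = theta / (1 + theta^2) * Phi^h
closed <- c("1"  = theta / (1 + theta^2),
            "11" = theta / (1 + theta^2) * Phi,
            "12" = Phi,
            "13" = theta / (1 + theta^2) * Phi,
            "24" = Phi^2)
print(cbind(ARMAacf = rho[names(closed)], closed.form = closed))
```

The two columns agree, and lags away from multiples of 12 and their immediate neighbors (for example lag 2) have zero autocorrelation.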
Figure 5.1: ACF of a seasonal ARIMA process