TRANSCRIPT
Intervention models
Something’s happened around t = 200
The first example
Seems like a series that is generally stationary, but shifts level around t = 200.
Look separately at the parts before and after the level shift.
There are in total 400 time points. Select the first 190 and the last 190.
First 190 values
Could be an AR(1) or an MA(1) or an ARMA(1,1). Quite clearly stationary!
Last 190 values
Points more towards an ARMA(1,1)
The change in level would most probably be modelled using a step function

$$S_t^{(200)} = \begin{cases} 0, & t < 200 \\ 1, & t \geq 200 \end{cases}$$

A complete intervention model for the time series can therefore be

$$Y_t = \theta_0 + \omega\, S_t^{(200)} + \frac{1 + \theta B}{1 - \phi B}\, e_t$$

where the last term is the ARMA(1,1) part, since there seems to be a permanent, immediate, constant change in level at t = 200.
How can this model be fitted using R?
library(TSA)   # provides arimax
strange.model <- arimax(strange, order=c(1,0,1),
    xtransf=data.frame(step200=1*(seq(strange)>=200)),
    transfer=list(c(0,0)))
The arimax command works like the arima command, but allows inclusion of covariates.
The argument xtransf is followed by a data frame in which each column corresponds to a covariate time series (with the same number of observations as Yt).
Here this data frame is constructed with the command 1*(seq(strange)>=200)
The command seq(strange) returns the indices of the vector strange
The command seq(strange)>=200 returns a vector (with the same length as strange) in which a term is FALSE if the corresponding index of strange is less than 200 and TRUE otherwise.
Finally, multiplication by 1 transforms FALSE into 0 and TRUE into 1, and the variable in the data frame is given the name step200 (for convenience). Hence, the resulting column is a step function of the kind we want.
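For intuition, the same 0/1 step covariate can be built by hand; a minimal Python sketch of the construction (the series length 400 is taken from the example above):

```python
# Build the step covariate: indices >= 200 become 1, everything before becomes 0.
# This mirrors R's 1*(seq(strange)>=200) with a 1-based index like seq().
n = 400  # series length, as in the example

step200 = [1 * (t >= 200) for t in range(1, n + 1)]

print(step200[197:202])  # values around the change point t = 200
```

The comparison yields booleans and the multiplication by 1 coerces them to 0/1, exactly as in the R idiom.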
The argument transfer is followed by a list comprising one two-dimensional vector for each covariate specified by xtransf.
Here we have the argument transfer=list(c(0,0)), implying that the covariate shall be included as it stands (no lagging, no filtering). Note that the argument must always be a list (even if there is only one covariate).
Giving an argument c(r,s) where both r and s are > 0 will enter the term

$$\frac{\omega_0 + \omega_1 B + \cdots + \omega_s B^s}{1 - \delta_1 B - \cdots - \delta_r B^r}\, X_t$$

into the model.
Since we have specified c(0,0), the term included will simply be

$$\omega_0\, S_t^{(200)}$$
print(strange.model)
Series: strange
ARIMA(1,0,1) with non-zero mean

Coefficients:
         ar1      ma1  intercept  step200-MA0
      0.9824  -1.0000    10.0026       1.9958
s.e.  0.0111   0.0064     0.0350       0.0606

sigma^2 estimated as 0.9826:  log likelihood = -564.82
AIC=1137.64   AICc=1137.79   BIC=1157.6
Thus, the estimated model is

$$Y_t = 10.0026 + 1.9958\, S_t^{(200)} + \frac{1 - B}{1 - 0.9824 B}\, e_t$$
tsdiag(strange.model)
Seems to be some autocorrelation left in the residuals. Try an ARMA(1,2)
strange.model2 <- arimax(strange, order=c(1,0,2),
    xtransf=data.frame(step200=1*(seq(strange)>=200)),
    transfer=list(c(0,0)))
print(strange.model2)
Series: strange
ARIMA(1,0,2) with non-zero mean

Coefficients:
         ar1      ma1      ma2  intercept  step200-MA0
      0.9730  -0.7781  -0.2219    10.0012       1.9972
s.e.  0.0133   0.0525   0.0521     0.0317       0.0557

sigma^2 estimated as 0.9406:  log likelihood = -556.28
AIC=1122.56   AICc=1122.77   BIC=1146.5
The coefficients seem to be significantly different from zero (divide each by its s.e. and compare with 2). The log-likelihood is slightly higher.
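The rule of thumb can be checked directly from the printed output; a small Python sketch using the estimates and standard errors from the ARIMA(1,0,2) fit above:

```python
# Rough significance check: estimate / s.e., compared with 2 in absolute value.
# Values are copied from the printed arimax output above.
coefs = {"ar1": (0.9730, 0.0133), "ma1": (-0.7781, 0.0525),
         "ma2": (-0.2219, 0.0521), "intercept": (10.0012, 0.0317),
         "step200-MA0": (1.9972, 0.0557)}

for name, (est, se) in coefs.items():
    ratio = est / se
    print(f"{name:12s} t-ratio = {ratio:8.2f}  |ratio| > 2: {abs(ratio) > 2}")
```

Even the smallest ratio (for ma2) clearly exceeds 2 in absolute value.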
tsdiag(strange.model2)
Clear improvement!
plot(y=strange, x=seq(strange), type="l", xlab="Time")
lines(y=fitted(strange.model), x=seq(strange), col="blue", lwd=2)
lines(y=fitted(strange.model2), x=seq(strange), col="red", lwd=1)
legend("bottomright", legend=c("original","model1","model2"),
       col=c("black","blue","red"), lty=1, lwd=c(1,2,1))
Model 2 (the ARMA(1,2)) is less smooth, but may follow the correlation structure better. However, this cannot be clearly seen from the plot.
The second example
Seems like a series that is stationary from the beginning, but gets a linear drift (upward trend) around t = 200.
Look at the part before the drift.
There are in total 400 time points. Select the first 200.
First 200 values
Looks (again) like an ARMA(1,1)
eacf(strange[1:200])
AR/MA 0 1 2 3 4 5 6 7 8 9 10 11 12 13
0     x o o o o o o o o o o  o  o  o
1     o o o o o o o o o o o  o  o  o
2     x o o o o o o o o o o  o  o  o
3     x x x o o o o o o o o  o  o  o
4     o x x o o o o o o o o  o  o  o
5     o x x o o o o o o o o  o  o  o
6     x x o o o o o o o o o  o  o  o
7     x x o o o o o o o o o  o  o  o
The drift in level could be modelled using a linearly increasing step function, obtained by filtering the step with

$$\frac{B}{1-B}\, S_t^{(200)}$$

A complete intervention model for the time series can therefore be

$$Y_t = \theta_0 + \omega\, \frac{B}{1-B}\, S_t^{(200)} + \frac{1 + \theta B}{1 - \phi B}\, e_t$$
The term

$$\omega\, \frac{B}{1-B}\, S_t^{(200)}$$

will be problematic to estimate. However, the following holds:

$$\frac{B}{1-B}\, S_t^{(200)} = \sum_{j=1}^{\infty} S_{t-j}^{(200)} = (t - 200)\, S_t^{(200)} = \begin{cases} 0, & t \leq 200 \\ t - 200, & t > 200 \end{cases}$$
Hence, create a covariate that is 0 until t = 200 and then 1, 2, …, 200, and use it with transfer=list(c(0,0)).
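The identity between the filtered step and the ramp covariate is easy to verify numerically; a Python sketch (series length 400 as in the example):

```python
# Check that filtering the step S_t with B/(1-B) -- i.e. summing all past
# values S_{t-1} + S_{t-2} + ... -- gives the ramp 0, ..., 0, 1, 2, ..., 200.
n = 400
S = [1 if t >= 200 else 0 for t in range(1, n + 1)]   # step at t = 200

filtered = [sum(S[:i]) for i in range(n)]             # sum of S_1 .. S_{t-1}
ramp = [max(t - 200, 0) for t in range(1, n + 1)]     # 0 until t = 200, then 1, 2, ...

print(filtered == ramp)
```

The two sequences agree term by term, which is exactly why the ramp covariate can replace the awkward B/(1−B) transfer term.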
Alternatively, and more efficiently, this variable can be included as an ordinary explanatory variable (a regression predictor), using the argument xreg.
strange_b.model <- arimax(strange_b, order=c(1,0,1),
    xreg=data.frame(x=c(rep(0,200),1:200)))
print(strange_b.model)
Call:
arimax(x = strange_b, order = c(1, 0, 1), xreg = data.frame(x = c(rep(0, 200), 1:200)))

Coefficients:
         ar1     ma1  intercept       x
      0.1219  0.0382     9.9993  0.0192
s.e.  0.3783  0.3827     0.0744  0.0009

sigma^2 estimated as 0.9884:  log likelihood = -565.25,  aic = 1138.5
Note! This can also be seen as a simple linear regression model with ARMA(1,1) error terms.
tsdiag(strange_b.model)
Satisfactory!
Transfer-function models
Consider the data set boardings referred to in Exercise 11.16
data(boardings)
summary(boardings)
 log.boardings     log.price
 Min.   :12.40   Min.   :4.649
 1st Qu.:12.49   1st Qu.:4.973
 Median :12.53   Median :5.038
 Mean   :12.53   Mean   :5.104
 3rd Qu.:12.57   3rd Qu.:5.241
 Max.   :12.70   Max.   :5.684
Two time-series, both with log-transformed values
plot.ts(boardings)
Could the price affect the boardings?
The cross-correlation function
Cross-covariance function:

$$\gamma_{t,s}(X, Y) = \mathrm{Cov}(X_t, Y_s)$$

If $X_t$ and $Y_t$ are both (weakly) stationary,

$$\gamma_k(X, Y) = \mathrm{Cov}(X_t, Y_{t-k})$$

Note! $\mathrm{Cov}(X_t, Y_{t-k}) = \mathrm{Cov}(Y_{t-k}, X_t)$, so $\gamma_k(X, Y) = \gamma_{-k}(Y, X)$, but in general $\gamma_k(X, Y) \neq \gamma_{-k}(X, Y)$.

The cross-correlation function

$$\rho_k(X, Y) = \mathrm{Corr}(X_t, Y_{t-k}) = \frac{\mathrm{Cov}(X_t, Y_{t-k})}{\sqrt{\mathrm{Var}(X_t)\,\mathrm{Var}(Y_t)}}$$

…measures the degree of linear dependence between the two series.
Sample cross-correlation function
$$r_k(X, Y) = \frac{\sum_t (X_t - \bar{X})(Y_{t-k} - \bar{Y})}{\sqrt{\sum_t (X_t - \bar{X})^2 \sum_t (Y_t - \bar{Y})^2}}$$
With R: ccf
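For intuition, the sample cross-correlation can also be computed by hand; a minimal Python sketch (not R's ccf implementation, and the toy data are made up for illustration):

```python
from math import sqrt

def sample_ccf(x, y, k):
    """Sample cross-correlation r_k: correlates X_t with Y_{t-k}."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ts = range(max(k, 0), min(n, n + k))   # t for which both X_t and Y_{t-k} exist
    num = sum((x[t] - xbar) * (y[t - k] - ybar) for t in ts)
    den = sqrt(sum((v - xbar) ** 2 for v in x) * sum((v - ybar) ** 2 for v in y))
    return num / den

x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
y = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0]   # toy series: x shifted one step
print(sample_ccf(x, y, 1))
```

Note the asymmetry: r_k(X, Y) equals r_{−k}(Y, X), but not r_{−k}(X, Y), mirroring the property of the theoretical cross-covariance.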
For the boardings data set, we can try to calculate the cross-correlation function between the two series
ccf(boardings[,1], boardings[,2], main="boardings & price", ylab="CCF")
Typical look when at least one of the times series is non-stationary
Take first-order regular differences
diff_boardings <- diff(boardings[,1])
diff_price <- diff(boardings[,2])
ccf(diff_boardings, diff_price, ylab="CCF")
Still not satisfactory. Since we have monthly data, we should possibly try first-order seasonal differences as well.
diffs_boardings <- diff(diff_boardings, 12)
diffs_price <- diff(diff_price, 12)
ccf(diffs_boardings, diffs_price, ylab="CCF")
Better, but how do we interpret this plot?
The two significant spikes at negative lags say that the difference in price depends on the difference in boardings some months earlier. The significant spike at lag 6 says that the difference in boardings depends on the difference in price some months earlier.
What explains what?
A problem: since both series show autocorrelation, it inevitably becomes part of the cross-correlations we are estimating (cf. autocorrelation and partial autocorrelation).
To solve this we need to “remove” the autocorrelation in the two series before we investigate the cross-correlation.
We should estimate cross-correlations between residual series from modelling with ARMA-models
This procedure is known as pre-whitening
Normal procedure:
1. Find a suitable ARMA model for the (differenced) series that is assumed to constitute the covariate series.
2. Fit this model to both series.
3. Investigate the cross-correlations between the residual series.
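The three steps can be sketched in Python on simulated data; a conceptual illustration only (simplified to an AR(1) filter with a crude least-squares fit, not the procedure TSA uses):

```python
import random

def fit_ar1(x):
    """Crude least-squares AR(1) coefficient (means removed)."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def ar1_filter(x, phi):
    """Residuals after removing the AR(1) structure: e_t = x_t - phi * x_{t-1}."""
    return [x[t] - phi * x[t - 1] for t in range(1, len(x))]

# Simulated example: x is AR(1) with phi = 0.7; y depends on x two steps back
random.seed(1)
n = 300
x = [0.0]
for _ in range(n - 1):
    x.append(0.7 * x[-1] + random.gauss(0, 1))
y = [0.8 * x[t - 2] + random.gauss(0, 0.5) if t >= 2 else 0.0 for t in range(n)]

phi = fit_ar1(x)           # step 1: model for the covariate series
ex = ar1_filter(x, phi)    # step 2: filter BOTH series with the same model
ey = ar1_filter(y, phi)
# step 3: cross-correlate ex and ey (with ccf / the sample CCF)
print(round(phi, 2))
```

The key point is that the same filter is applied to both series, so that any remaining cross-correlation between ex and ey reflects a genuine lead-lag relation rather than shared autocorrelation.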
Could be an ARMA(1,1)×(1,0)₁₂
or an ARMA(1,1)×(1,1)₁₂
model1 <- arima(diffs_price, order=c(1,0,1),
    seasonal=list(order=c(1,0,0), period=12))
tsdiag(model1)
Could do!
model2 <- arima(diffs_price, order=c(1,0,1),
    seasonal=list(order=c(1,0,1), period=12))
tsdiag(model2)
The Ljung-Box test was not possible to carry out here!
Better!
model21 <- arima(diffs_boardings, order=c(1,0,1),
    seasonal=list(order=c(1,0,1), period=12))
Applying the last model to the differenced boardings series
ccf(residuals(model2),residuals(model21),ylab="CCF")
Well, not that much cross-correlation left…
The TSA package provides the command prewhiten, with which prewhitening is performed and the resulting CCF is plotted. The default set-up is that an AR model is fitted to the covariate series (the first series specified). The AR model that minimizes AIC is chosen.
The model can however be specified.
prewhiten(diffs_price,diffs_boardings,x.model=model2, ylab="CCF")
Should be the same as the manually developed CCF earlier
With the default settings
pw=prewhiten(diffs_price,diffs_boardings,ylab="CCF")
Picture is clearer?
No significant cross-correlations left
What AR model has been used?
print(pw)

$ccf

Autocorrelations of series 'X', by lag

-1.0833 -1.0000 -0.9167 -0.8333 -0.7500 -0.6667 -0.5833 -0.5000 -0.4167
  0.131   0.057  -0.053  -0.167  -0.034   0.120   0.228  -0.129  -0.181
-0.3333 -0.2500 -0.1667 -0.0833  0.0000  0.0833  0.1667  0.2500  0.3333
  0.009   0.164   0.100   0.098   0.031  -0.065   0.019  -0.023  -0.078
 0.4167  0.5000  0.5833  0.6667  0.7500  0.8333  0.9167  1.0000  1.0833
 -0.349   0.027  -0.155  -0.225   0.041  -0.027  -0.167  -0.097   0.200

$model

Call:
ar.ols(x = x)

Coefficients:
      1        2        3        4        5        6        7        8        9       10
-0.2145   0.0361  -0.1226  -0.4786  -0.1827   0.1392  -0.0133   0.1616  -0.1462   0.1395

Intercept: 0.002233 (0.00302)

Order selected 10  sigma^2 estimated as 0.0004016
Check with a scatter plot
Reasonable that there is no significant cross-correlation
Another example
Observations of the input gas rate to a gas furnace and the percentage of carbon dioxide (CO2) in the output from the same furnace
stationary?
Not that far from stationary. In that case an AR(2) would be the first choice.
However, we also try first-order regular differences
gasrate_diff <- diff(gasrate)
gasrate series
More stationary than before?
CO2 series
Stationary. AR(2)?
prewhiten(gasrate,CO2,ylab="CCF")