Part II: Estimation and Forecasting with ARIMA models
Firmin Doko Tchatoka
firmin.dokotchatoka@adelaide.edu.au
https://www.adelaide.edu.au/directory/firmin.dokotchatoka
The University of Adelaide
July 9, 2018
Firmin Doko Tchatoka (UoA) 2018 ES-Summer Institute-Cotonou July 9, 2018 1 / 41
Targeted objectives of this module
Goals
• Study a class of parametric univariate time series models: ARIMA = AR/I/MA
− AR models: definition, stationarity, autocorrelation & partial autocorrelation functions
− MA models: autocorrelation & partial autocorrelation functions
− General ARMA models
- Estimation
- Forecasting
• Forecasting SARIMA models
ARIMA Autoregressive (AR) models
ARIMA & AR processes
Definition
ARIMA = Autoregressive Integrated Moving Average. To understand ARIMA
processes, it is important to know the properties of the AR, I, and MA parts of it.
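• A time series $y_t \sim AR(p)$ ≡ linear regression of $y_t$ on a constant and the first $p$ lags of $y_t$:
$y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$ (1)
− $\varepsilon_t$ is an i.i.d. error term with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma^2$
− Cannot use (1) directly to forecast $y_{t+\tau}$ because the $\phi$'s and $\sigma^2$ are unknown ⇒ must be estimated
− Lag length $p$ is also unknown: we only have data $y_1, y_2, \dots, y_T$
− AR(1): $y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t$; AR(2): $y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t$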
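As a rough illustration of how the unknown parameters in (1) can be recovered from data, the sketch below simulates an AR(1) and estimates $\phi_0$, $\phi_1$ and $\sigma^2$ by OLS of $y_t$ on a constant and $y_{t-1}$. The parameter values, the sample size, the seed and the function names are illustrative assumptions, not taken from the slides, and OLS stands in here for whatever estimator the software actually uses.

```python
import random

def simulate_ar1(phi0, phi1, sigma, n, seed=0, burn_in=200):
    """Simulate n observations of y_t = phi0 + phi1*y_{t-1} + eps_t (|phi1| < 1)."""
    rng = random.Random(seed)
    y = phi0 / (1.0 - phi1)  # start at the stationary mean
    path = []
    for t in range(n + burn_in):
        y = phi0 + phi1 * y + rng.gauss(0.0, sigma)
        if t >= burn_in:      # discard the start-up values
            path.append(y)
    return path

def fit_ar1_ols(y):
    """OLS of y_t on a constant and y_{t-1}; returns (phi0_hat, phi1_hat, sigma2_hat)."""
    x, z = y[:-1], y[1:]      # regressor (lagged value) and regressand
    m = len(z)
    x_bar, z_bar = sum(x) / m, sum(z) / m
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxz = sum((a - x_bar) * (b - z_bar) for a, b in zip(x, z))
    phi1_hat = sxz / sxx
    phi0_hat = z_bar - phi1_hat * x_bar
    resid = [b - phi0_hat - phi1_hat * a for a, b in zip(x, z)]
    sigma2_hat = sum(e * e for e in resid) / (m - 2)
    return phi0_hat, phi1_hat, sigma2_hat

# Illustrative true values: phi0 = 1.0, phi1 = 0.6, sigma = 1.0
y = simulate_ar1(phi0=1.0, phi1=0.6, sigma=1.0, n=5000)
phi0_hat, phi1_hat, sigma2_hat = fit_ar1_ols(y)
print(phi0_hat, phi1_hat, sigma2_hat)
```

With a long sample the estimates should land close to the values used in the simulation; with short samples the OLS estimate of $\phi_1$ is known to be biased towards zero.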
ARIMA Moving Average (MA) models
MA models
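• A time series $y_t \sim MA(q)$ if
$y_t = \theta_0 + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$ (2)
− $\varepsilon_t$ is an error term, i.i.d. across $t$, with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma^2$
− Cannot use (2) directly to forecast $y_{t+\tau}$ because both the parameters $\theta_0, \theta_1, \theta_2, \dots, \theta_q, \sigma^2$ and the current and lagged error terms are unknown ⇒ must be estimated
− Lag length $q$ is also unknown ⇒ must be estimated
− $q = 1$ → MA(1): $y_t = \theta_0 + \varepsilon_t + \theta_1 \varepsilon_{t-1}$; $q = 2$ → MA(2): $y_t = \theta_0 + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}$
− MA(q) ≡ linear regression of $y_t$ on a constant and the first $q$ lags of the error term: cannot run OLS because the independent (explanatory) variables are unobserved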
ARIMA Unit roots and stationarity
Unit roots
• A time series $y_t$ has a unit root if $|\phi_1| = 1$ in the regression
$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t$
- If $y_t$ contains exactly two unit roots, then the first difference $\Delta y_t = y_t - y_{t-1}$ contains one unit root
- A time series $y_t$ is integrated of order $d$ if it contains exactly $d$ unit roots. If so, we write $y_t \sim I(d)$
- A time series $y_t$ is weakly (or second-order) stationary if it does not contain a unit root, i.e., $y_t \sim I(0)$
- A unit root is usually referred to as a stochastic trend
- $y_t$ is a random walk if it contains a stochastic trend:
$y_t = \underbrace{\beta_0}_{\text{drift}} + \underbrace{\beta_1 t}_{\text{deterministic trend}} + y_{t-1} + \varepsilon_t$
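The link between unit roots and differencing can be checked mechanically. The sketch below builds a series with two unit roots by cumulating a short list of shocks twice, then shows that one difference leaves a series with one unit root and two differences recover the shocks. The shock values and the helper name `difference` are made up for illustration.

```python
from itertools import accumulate

def difference(y, d=1):
    """Apply the first-difference operator d times: (1 - L)^d y_t."""
    out = list(y)
    for _ in range(d):
        out = [b - a for a, b in zip(out[:-1], out[1:])]
    return out

shocks = [1, -1, 2, 0, 1]             # illustrative "epsilon_t" values
i1 = list(accumulate(shocks))         # one unit root:  [1, 0, 2, 2, 3]
i2 = list(accumulate(i1))             # two unit roots: [1, 1, 3, 5, 8]

print(difference(i2, 1))              # [0, 2, 2, 3] -> the I(1) series (first value lost)
print(difference(i2, 2))              # [2, 0, 1]    -> the original shocks (first two lost)
```

Each difference costs one observation at the start of the sample, which is why the differenced series are shorter than the original.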
ARIMA Testing for unit roots
Testing for unit roots
• The original Dickey-Fuller (DF) test for unit roots involves fitting the AR(1) model:
$y_t = \alpha + \delta t + \phi y_{t-1} + \varepsilon_t$ (3)
− Null hypothesis of a unit root: $H_0: \phi = 1$
− Regression (3) is likely to be plagued by serial correlation
− To control for that, the augmented Dickey-Fuller (ADF) test instead fits a model of the form
$\Delta y_t = \alpha + \delta t + \rho y_{t-1} + \zeta_1 \Delta y_{t-1} + \dots + \zeta_k \Delta y_{t-k} + u_t$ (4)
where $\Delta x_t = x_t - x_{t-1}$ for any variable $x_t$ and $\rho = \phi - 1$ → $H_0: \rho = 0$ vs. $H_1: \rho < 0$
− $H_1: \rho < 0$ is chosen because the case $\rho > 0$ (an explosive process) is unlikely
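To make the test regression concrete, the sketch below computes the t-statistic on $\rho$ in the simplest version of (4): constant only, no trend ($\delta = 0$) and no augmentation lags ($k = 0$). It is applied to a simulated random walk and to a simulated stationary AR(1). The simulated series, the seed and the function name `df_tstat` are assumptions made for illustration. Note that the statistic must be compared with Dickey-Fuller critical values (about −2.86 at the 5% level for this constant-only case in large samples), not with the usual normal or t tables.

```python
import math
import random

def df_tstat(y):
    """t-statistic on rho in: dy_t = alpha + rho*y_{t-1} + u_t (no trend, k = 0)."""
    x = y[:-1]                                         # y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]        # first differences
    m = len(dy)
    x_bar, dy_bar = sum(x) / m, sum(dy) / m
    sxx = sum((a - x_bar) ** 2 for a in x)
    rho = sum((a - x_bar) * (d - dy_bar) for a, d in zip(x, dy)) / sxx
    alpha = dy_bar - rho * x_bar
    ssr = sum((d - alpha - rho * a) ** 2 for a, d in zip(x, dy))
    se_rho = math.sqrt(ssr / (m - 2) / sxx)            # OLS standard error of rho
    return rho / se_rho

rng = random.Random(1)
shocks = [rng.gauss(0.0, 1.0) for _ in range(500)]

random_walk, stationary = [0.0], [0.0]
for e in shocks:
    random_walk.append(random_walk[-1] + e)            # phi = 1: unit root
    stationary.append(0.5 * stationary[-1] + e)        # phi = 0.5: no unit root

t_rw = df_tstat(random_walk)
t_ar = df_tstat(stationary)
print(t_rw, t_ar)
```

For the stationary series the statistic should be far below the critical value (rejecting the unit root), whereas for the random walk it should typically stay above it.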
- We must consider one of four cases for $H_0$:
Case | Process                          | Restriction                | DF option
1    | Random walk without drift        | $\alpha = 0$, $\delta = 0$ | noconstant
2    | Random walk without drift        | $\delta = 0$               | (default)
3    | Random walk with drift           | $\delta = 0$               | drift
4    | Random walk with drift & trend   | none                       | trend
- Requires an optimal choice of $k$ in (4) → command: ‘varsoc varname’ → select $k$ using SBIC/HQIC
ADF test: Application to airline data
• ADF test on airline passengers data
- Plot of data indicates both a drift and a deterministic trend
- Result indicates no evidence of a unit root
ADF test: Application to German log of consumption
- Plot of data indicates both a drift and a deterministic trend
- Result shows strong evidence for the presence of unit root
Other unit root tests in STATA
• Phillips-Perron (PP) test:
- modification of the ADF test statistic to account for potential serial correlation/heteroskedasticity in the errors
- command: pperron varname, lag(#) trend → lag(#) ≡ lag length of the Newey-West HAC estimator
• GLS-detrended ADF test:
- similar to the ADF test, but prior to fitting the model in (4), one first transforms the actual series via a generalized least-squares (GLS) regression
- More powerful than the ADF test
- command: dfgls varname, maxlag(#) trend
- maxlag(#) sets the value of $k$, the highest lag order for the first-differenced, detrended variable in the DF regression: by default, $k_{\max} = \mathrm{floor}\left[12\,\{(T+1)/100\}^{1/4}\right]$ → Schwert, G. W. (1989, JBES)
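The default maximum lag is easy to evaluate by hand. The sketch below simply codes the formula as stated above; the function name and the sample sizes are illustrative.

```python
import math

def schwert_max_lag(n_obs):
    """Default maximum lag: floor(12 * ((T + 1) / 100) ** (1/4))."""
    return math.floor(12.0 * ((n_obs + 1) / 100.0) ** 0.25)

print(schwert_max_lag(144))   # 13, e.g. for 144 monthly observations
print(schwert_max_lag(99))    # 12
```

Because of the fourth root, the rule grows slowly with the sample size: going from 99 to 144 observations adds only one lag.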
Properties of a Random Walk
ARIMA(p,d,q)
• A time series $y_t \sim ARIMA(p,d,q)$ means that its $d$-th difference $\Delta^d y_t \sim ARMA(p,q)$:
− $y_t$ contains $d$ unit roots
− $\Delta^d y_t \sim I(0)$ (second-order stationary): $d + 1$ successive ADF tests must be conducted to estimate $d$
− Once $d$ is estimated, we can identify $p$ and $q$ from the transformed series $\tilde{y}_t = \Delta^d y_t$, which is weakly stationary [$\tilde{y}_t \sim I(0)$]
− Then we can estimate the other parameters of the ARIMA specification, and use these estimates to forecast the level series
− $p$ and $q$ are identified using the Partial Autocorrelation (PAC) and Autocorrelation (AC) functions, respectively.
ARIMA Autocorrelation and Partial Autocorrelation functions
Autocorrelation function (ACF)
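• Autocovariance function:
$\gamma_k = \operatorname{cov}(y_t, y_{t-k}), \quad k = 0, 1, 2, \dots$ (5)
• Autocorrelation function:
$\rho_k = \dfrac{\operatorname{cov}(y_t, y_{t-k})}{\gamma_0} = \dfrac{\gamma_k}{\gamma_0}, \quad k = 0, 1, 2, \dots$ (6)
Both $\gamma_k$ and $\rho_k$ are symmetric functions of $k$, i.e., $\gamma_{-k} = \gamma_k$ and $\rho_{-k} = \rho_k$. Note that $\rho_0 = 1$ and $-1 \le \rho_k \le 1$.
- Stationary AR(1): $y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t$
$\mu := E[y_t] = \dfrac{\phi_0}{1 - \phi_1}, \quad \operatorname{var}(y_t) = \dfrac{\sigma^2}{1 - \phi_1^2},$
$\gamma_k = \phi_1^k \, \dfrac{\sigma^2}{1 - \phi_1^2}, \quad \rho_k = \phi_1^k, \quad k = 0, 1, 2, \dots$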
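The AR(1) moments follow from stationarity in a few lines. A short derivation is sketched below, using only the facts that the mean and variance do not depend on $t$ and that $\varepsilon_t$ is uncorrelated with $y_{t-k}$ for $k \ge 1$:

```latex
\begin{aligned}
\mu &= \phi_0 + \phi_1 \mu
  &&\Rightarrow\; \mu = \frac{\phi_0}{1-\phi_1}, \\
\gamma_0 &= \phi_1^2 \gamma_0 + \sigma^2
  &&\Rightarrow\; \gamma_0 = \frac{\sigma^2}{1-\phi_1^2}, \\
\gamma_k &= \operatorname{cov}(\phi_0 + \phi_1 y_{t-1} + \varepsilon_t,\; y_{t-k})
  = \phi_1 \gamma_{k-1}
  &&\Rightarrow\; \gamma_k = \phi_1^k \gamma_0, \quad k \ge 1, \\
\rho_k &= \frac{\gamma_k}{\gamma_0} = \phi_1^k .
\end{aligned}
```

Both divisions require $|\phi_1| < 1$, which is exactly the stationarity condition for the AR(1).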
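- MA(1): $y_t = \theta_0 + \varepsilon_t + \theta_1 \varepsilon_{t-1}$
$\mu := E[y_t] = \theta_0, \quad \operatorname{var}(y_t) = \sigma^2 (1 + \theta_1^2),$
$\gamma_1 = \theta_1 \sigma^2, \quad \rho_1 = \dfrac{\theta_1}{1 + \theta_1^2}, \quad \text{and } \rho_k = 0 \;\; \forall k > 1.$
• Stationarity is a property of the AR part of the process ⇒ MA processes are always stationary, and $\rho_k = 0$ for all $k > q$ for an MA(q).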
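The cut-off of the MA(1) autocorrelations after lag 1 can be seen in simulated data. The sketch below computes sample autocorrelations and compares $\hat\rho_1$ with $\theta_1 / (1 + \theta_1^2)$. The values $\theta_0 = 0$, $\theta_1 = 0.5$, $\sigma = 1$, the sample size and the seed are illustrative assumptions.

```python
import random

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag (same denominator for every lag)."""
    n = len(y)
    y_bar = sum(y) / n
    dev = [v - y_bar for v in y]
    gamma0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / gamma0
            for k in range(max_lag + 1)]

# Simulated MA(1): y_t = eps_t + theta1 * eps_{t-1}
rng = random.Random(42)
theta1 = 0.5
eps = [rng.gauss(0.0, 1.0) for _ in range(20001)]
ma1 = [eps[t] + theta1 * eps[t - 1] for t in range(1, len(eps))]

acf_ma1 = sample_acf(ma1, 5)
rho1_theory = theta1 / (1.0 + theta1 ** 2)     # 0.4
print(rho1_theory, [round(r, 3) for r in acf_ma1])
```

The lag-1 sample autocorrelation should sit near 0.4, while those at lags 2 to 5 should be close to zero up to sampling noise.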
Partial Autocorrelation function (PACF)
• Consider the AR(k) regression:
$y_t = \beta_0 + \beta_1 y_{t-1} + \dots + \beta_k y_{t-k} + u_t, \quad k = 1, 2, \dots$ (7)
• The $k$-th order PAC of $y_t$, for any $k = 1, 2, 3, \dots$, is:
$PAC_k = \beta_k$ (8)
• For an AR(p) process, $\rho_k$ is not zero after lag $p$ but $PAC_k = 0$ for $k > p$ ⇒ $PAC_k$ is used to identify $p$
• For an MA(q) process, $PAC_k$ is not zero after lag $q$ but $\rho_k = 0$ for $k > q$ ⇒ $\rho_k$ is used to identify $q$
• Estimating ARMA(p,q) models requires identifying both $p$ and $q$ → the properties discussed above are the key ingredients for this
• In STATA, the command corrgram plots the estimated ACs and PACs
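The identification rules for $p$ and $q$ can be verified on theoretical autocorrelations. Instead of running the sequence of regressions (7), the sketch below obtains the same population quantities $PAC_k$ from the autocorrelations via the Durbin-Levinson recursion (a swapped-in but standard shortcut). The function name and the parameter values ($\phi_1 = 0.6$ for the AR(1), $\rho_1 = 0.4$ for the MA(1)) are illustrative.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion: rho = [1, rho_1, ..., rho_K] -> [PAC_1, ..., PAC_K]."""
    pacf = []
    prev = []                       # coefficients of the AR(k-1) projection
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den          # last coefficient of the AR(k) projection
        # update the first k-1 coefficients, then append phi_kk
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# AR(1) with phi1 = 0.6: rho_k = 0.6**k, so the PACF cuts off after lag 1
pacf_ar1 = pacf_from_acf([0.6 ** k for k in range(5)])

# MA(1) with rho_1 = 0.4 and rho_k = 0 for k > 1: the PACF does not cut off
pacf_ma1 = pacf_from_acf([1.0, 0.4, 0.0, 0.0, 0.0])

print([round(v, 4) for v in pacf_ar1])
print([round(v, 4) for v in pacf_ma1])
```

The AR(1) case returns 0.6 followed by (numerical) zeros, while the MA(1) case returns a sequence that decays but never becomes exactly zero, which is the asymmetry the two identification rules exploit.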
Corrgram in STATA
• Stata syntax: ‘corrgram’ tabulates autocorrelations, partial autocorrelations, and portmanteau (Q) statistics
- Menu: Statistics → Time series → Graphs → Autocorrelations & partial autocorrelations
Command: corrgram varname [if] [in] [, corrgram_options]
- We can use ‘ac’ to produce a graph of the autocorrelations → Command: ac varname [if] [in] [, ac_options]
- We can use ‘pac’ to produce a graph of the partial autocorrelations → Command: pac varname [if] [in] [, pac_options]
• Application to the international airline passengers dataset
Application to international airline passengers
[Figure: (c) Autocorrelogram; (d) Partial autocorrelogram]
• From the PACF:
- Data probably have a trend component as well as a seasonal component
- First-differencing will mitigate the effects of the trend
- Seasonal differencing will help control for seasonality
• To accomplish this goal, we can use Stata’s time-series operators: command → pac DS12.air, lags(20) srv
- Here we graph the partial autocorrelations after controlling for trends and seasonality
- Use ‘srv’ to include the standardized residual variances
[Figure: (e) Partial autocorrelogram]
ARIMA Estimating and forecasting ARIMA models
Estimation & forecasting: steps
• Steps to estimate $y_t \sim ARIMA(p,d,q)$:
- Identify the order of integration $d$: must run $d + 1$ unit root tests
- Filter $\tilde{y}_t = \Delta^d y_t$ ≡ transformed series is $I(0)$
- Plot the ACF and PACF of the filtered series $\tilde{y}_t$
1 use PACF to identify $p$: last statistically significant lag of the autoregression
2 use ACF to identify $q$: last statistically significant autocorrelation
3 where there is no clear choice, keep all potential candidates for $p$ and $q$
- Estimate all your model candidates
- Use model selection criteria (AIC/SBIC/HQIC) to choose the ‘best’ model
- Proceed to forecasting
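The "estimate all candidates, then compare information criteria" step can be sketched for pure AR candidates as follows. This is only an illustration under stated assumptions: the AR(p) fits use Yule-Walker equations solved by the Levinson-Durbin recursion rather than the maximum-likelihood estimation that Stata's arima performs, the criteria drop constants common to all models, and the simulated AR(2) ($\phi_1 = 0.5$, $\phi_2 = -0.3$), the seed and all function names are made up.

```python
import math
import random

def sample_autocov(y, max_lag):
    """Sample autocovariances gamma_0..gamma_max_lag (divisor n)."""
    n = len(y)
    y_bar = sum(y) / n
    dev = [v - y_bar for v in y]
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]

def yule_walker_variances(gamma):
    """Innovation variance of the Yule-Walker AR(p) fit, for p = 0..len(gamma)-1."""
    variances = [gamma[0]]          # p = 0: variance of the series itself
    prev = []                       # AR(k-1) coefficients
    for k in range(1, len(gamma)):
        num = gamma[k] - sum(prev[j] * gamma[k - 1 - j] for j in range(k - 1))
        phi_kk = num / variances[-1]
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        variances.append(variances[-1] * (1.0 - phi_kk ** 2))
    return variances

def information_criteria(y, max_p):
    """Rows (p, AIC, BIC), up to an additive constant shared by all candidates."""
    n = len(y)
    rows = []
    for p, s2 in enumerate(yule_walker_variances(sample_autocov(y, max_p))):
        k = p + 2                   # p AR coefficients + constant + variance
        rows.append((p, n * math.log(s2) + 2 * k, n * math.log(s2) + k * math.log(n)))
    return rows

# Simulated stationary AR(2) used as the (already differenced) series
rng = random.Random(7)
y = [0.0, 0.0]
for _ in range(3200):
    y.append(0.5 * y[-1] - 0.3 * y[-2] + rng.gauss(0.0, 1.0))
y = y[202:]                         # drop start-up values, keep 3000 observations

table = information_criteria(y, max_p=6)
best_aic = min(table, key=lambda row: row[1])[0]
best_bic = min(table, key=lambda row: row[2])[0]
print(best_aic, best_bic)
```

Because the BIC penalty per parameter ($\ln n$) exceeds the AIC penalty (2) for any realistic sample size, the BIC never selects a larger order than the AIC on the same set of nested candidates, which is one reason the two criteria can disagree.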
Estimation & forecasting: command in STATA
• Syntax: arima depvar [indepvars] [if] [in] [weight] [, options]
Options include:
- arima(#p,#d,#q): specify ARIMA(p,d,q) model
- noconstant: suppress constant term
• Menu: Statistics → Time series → ARIMA and ARMAX models
• Application: log U.S. Wholesale Price Index (WPI)
- ADF tests indicate that ln_wpi ∼ I(1)
- First test is run with drift + trend, the second with only a drift
Application: correlogram
[Figure: (g) ACF; (h) PACF]
- Oscillations of both ACFs and PACFs may be due to seasonal variations
- The ACFs point towards pure AR(p) processes: the candidate pairs (p,q) lie in {2,4} × {0}, i.e., (p,q) ∈ {(2,0); (4,0)}
Application: estimation
• Estimation:
- Estimates are stored in a table with their standard errors
- Model selection statistics are also provided: AIC & BIC
- The AIC indicates that the ARIMA(4,1,0) model fits the data better
- Whereas the BIC indicates that it is ARIMA(2,1,0)
- As is often the case, different model-selection criteria have led to conflicting conclusions
- Both criteria select a pure AR process: no MA component is selected!
• Use comparative forecasting performance: which model forecasts better?
Forecasts: 1 step-ahead
- Static > Dynamic: static forecasts track the data more closely because, with dynamic forecasts, prior forecast errors accumulate over time
- Problem with static: cannot obtain out-of-sample forecasts at $T + 2, T + 3, \dots$ → the static forecast $\hat{y}^{(S)}_{T+1}$ can be generated using $y_T$, but generating $\hat{y}^{(S)}_{T+2}$ requires observing $y_{T+1}$, which we don’t ⇒ dynamic forecasting is more realistic
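The difference between the two forecast types is easiest to see for an AR(1) with known coefficients. In the sketch below, static forecasts always condition on the observed previous value, whereas dynamic forecasts start from one observed value and then feed each forecast into the next. The data, the coefficients $\phi_0 = \phi_1 = 0.5$ and the function names are illustrative assumptions.

```python
def static_forecasts(y, phi0, phi1):
    """One-step-ahead forecasts, each conditioning on the observed y_{t-1}."""
    return [phi0 + phi1 * y[t - 1] for t in range(1, len(y))]

def dynamic_forecasts(last_obs, phi0, phi1, horizon):
    """Multi-step forecasts from a fixed origin: each forecast feeds the next one."""
    out, prev = [], last_obs
    for _ in range(horizon):
        prev = phi0 + phi1 * prev
        out.append(prev)
    return out

y = [1.0, 2.0, 3.0, 4.0]
print(static_forecasts(y, 0.5, 0.5))          # [1.0, 1.5, 2.0]
print(dynamic_forecasts(y[0], 0.5, 0.5, 3))   # [1.0, 1.0, 1.0]
```

The static forecasts keep being corrected by new observations, while the dynamic forecasts ignore everything after the forecast origin and settle at the model's long-run mean $\phi_0 / (1 - \phi_1) = 1$. Only the dynamic scheme can be extended beyond the end of the sample.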
1-step ahead: ARIMA (2,1,0) vs. ARIMA (4,1,0)
- Static forecasts: ARIMA (2,1,0) & ARIMA (4,1,0) perform similarly: both do
well in mimicking the real data
- Dynamic forecasts: ARIMA(2,1,0) outperforms ARIMA(4,1,0). As this case is more realistic ⇒ ARIMA(2,1,0) should be retained ≡ same choice of model as the BIC
ARIMA Seasonal ARIMA models
Seasonal Adjustment techniques
• Seasonality in a time series:
- regular pattern of changes that repeats over $S$ time periods, where $S$ defines the number of time periods until the pattern repeats again
- Monthly data for which high values tend always to occur in some particular months & low values tend always to occur in other particular months ⇒ $S = 12$. If quarterly data ⇒ $S = 4$
• Seasonal ARIMA model:
- seasonal AR and MA terms predict the series using data values and errors at times with lags that are multiples of $S$
- with monthly data (and $S = 12$), a seasonal first-order autoregressive model would use $y_{t-12}$ to predict $y_t$. A seasonal second-order autoregressive model would use $y_{t-12}$ and $y_{t-24}$ to predict $y_t$
- a seasonal first-order MA model (with $S = 12$) would use $\varepsilon_{t-12}$ as a predictor. A seasonal second-order MA model would use $\varepsilon_{t-12}$ and $\varepsilon_{t-24}$.
• Seasonality usually causes non-stationarity:
- average values at some particular times within the seasonal span may differ from the average values at other times
- Seasonal differencing renders the series stationary: with $S = 12$, $(1 - L^{12}) y_t = y_t - y_{t-12}$ is purged of seasonal variations.
• Non-seasonal differencing:
- If a trend is present in the data, we may also need non-seasonal differencing
- Often (not always) a first (non-seasonal) difference will "detrend" the data, i.e., we use $(1 - L) y_t = y_t - y_{t-1}$ in the presence of a trend
• Differencing for Trend and Seasonality
- When both trend and seasonality are present → apply both a non-seasonal first difference and a seasonal difference ⇒ examine the ACF and PACF of
$(1 - L^{12})(1 - L) y_t = (y_t - y_{t-1}) - (y_{t-12} - y_{t-13})$
- Removing the trend doesn’t mean that we have removed the dependency: we may have removed the mean, $\mu_t$, part of which may include a periodic component
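The combined filter can be checked on a toy series. The sketch below uses a quarterly example ($S = 4$ instead of 12, to keep it short) made of a linear trend plus a fixed seasonal pattern, applies $(1 - L^S)(1 - L)$ in two steps, and compares the result with the expanded formula. The series, the pattern values and the helper name `lag_difference` are illustrative.

```python
def lag_difference(y, lag=1):
    """(1 - L^lag) y_t = y_t - y_{t-lag}; the first `lag` observations are lost."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

S = 4                                                 # quarterly; S = 12 for monthly data
pattern = [10.0, -5.0, 3.0, -8.0]                     # fixed seasonal effects
y = [2.0 * t + pattern[t % S] for t in range(16)]     # linear trend + seasonality

seasonal_only = lag_difference(y, S)                  # (1 - L^S) y_t
both = lag_difference(lag_difference(y, 1), S)        # (1 - L^S)(1 - L) y_t
direct = [(y[t] - y[t - 1]) - (y[t - S] - y[t - S - 1])
          for t in range(S + 1, len(y))]              # expanded formula

print(seasonal_only)   # constant 8.0: seasonality removed, trend slope times S remains
print(both)            # all zeros: trend and seasonality both removed
```

In this deterministic example the seasonal difference alone leaves a constant (the trend slope times $S$), and the additional first difference removes it. The combined filter costs $S + 1$ observations at the start of the sample.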
SARIMA Models
• SARIMA models incorporate both non-seasonal and seasonal factors, in two ways:
1 multiplicative: shorthand notation is $ARIMA(p,d,q) \times (P,D,Q)_S$, where
− $p$ ≡ AR order, $d$ ≡ order of integration, $q$ ≡ MA order
− $P$ ≡ seasonal AR order, $D$ ≡ seasonal differencing, $Q$ ≡ seasonal MA order
− $S$ ≡ time span of repeating seasonal pattern
− $y_t \sim ARIMA(p,d,q) \times (P,D,Q)_S \Leftrightarrow \Phi(L^S)\,\varphi(L)\,(1 - L)^d (1 - L^S)^D y_t = \Theta(L^S)\,\theta(L)\,\varepsilon_t$
• Non-seasonal: $\varphi(L) = 1 - \varphi_1 L - \dots - \varphi_p L^p$, $\theta(L) = 1 + \theta_1 L + \dots + \theta_q L^q$
• Seasonal: $\Phi(L^S) = 1 - \Phi_1 L^S - \dots - \Phi_P L^{SP}$, $\Theta(L^S) = 1 + \Theta_1 L^S + \dots + \Theta_Q L^{SQ}$
• Examples: $ARIMA(1,0,0) \times (1,0,0)_{12}$, $ARIMA(0,0,1) \times (0,0,1)_{12}$
2 additive: $y_t \sim ARIMA(p,d,q) + (P,D,Q)_S$
Application of SARIMA to Airline data
• After first- and seasonally differencing the data:
− No presence of a trend component in the transformed data
− Use the "noconstant" option with arima
− Stata command: arima lnair, arima(0,1,1) sarima(0,1,1,12) noconstant
• We can write the outcome of the regression as:
$(1 - L^{12})(1 - L)\,\mathrm{lnair}_t = \varepsilon_t - 0.402\,\varepsilon_{t-1} - 0.557\,\varepsilon_{t-12} + 0.224\,\varepsilon_{t-13}$
- The coefficient on $\varepsilon_{t-13}$ is the product of the coefficients on the $\varepsilon_{t-1}$ and $\varepsilon_{t-12}$ terms: $(-0.402) \times (-0.557) = 0.224$
- arima labeled the dependent variable DS12.lnair to indicate that it has applied the difference operator $\Delta$ and the lag-12 seasonal difference operator $\Delta_{12}$ to "lnair"
- This model could have been fit by typing the command:
arima DS12.lnair, ma(1) mma(1, 12) noconstant
- For simple multiplicative models, using the sarima() option is easier, though this second syntax allows us to incorporate more complicated seasonal terms
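The lag-13 coefficient comes from multiplying the non-seasonal and seasonal MA polynomials. The sketch below expands $(1 + \theta_1 L)(1 + \Theta_1 L^{12})$ with the estimates reported above; the helper name `poly_multiply` and the variable names are illustrative.

```python
def poly_multiply(a, b):
    """Multiply lag polynomials given as coefficient lists (index = power of L)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

theta1, seasonal_theta1, S = -0.402, -0.557, 12
nonseasonal = [1.0, theta1]                            # 1 + theta1 * L
seasonal = [1.0] + [0.0] * (S - 1) + [seasonal_theta1] # 1 + Theta1 * L^12
ma_poly = poly_multiply(nonseasonal, seasonal)

print(ma_poly[1], ma_poly[12], round(ma_poly[13], 3))  # -0.402 -0.557 0.224
```

The multiplicative structure therefore implies a restriction: the lag-13 coefficient is not a free parameter but the product of the other two, so only two MA parameters are estimated.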
Forecasting with SARIMA: Airline data
• SARIMA models have a good forecast performance:
- Static: close to the observed data
- Dynamic: not as good as static but shows an overall acceptable performance.
ARIMA X-12-ARIMA Seasonal Adjustment
X-12-ARIMA Seasonal Adjustment in STATA
• X-12-ARIMA was the U.S. Census Bureau’s software package for seasonal adjustment:
• can be used together with many statistical packages:
- Gretl or EViews, which provide a graphical user interface for X-12-ARIMA
- NumXL makes X-12-ARIMA functionality available in Microsoft Excel
• Many agencies presently are using X-12-ARIMA for seasonal adjustment:
- Statistics Canada
- U.S. Bureau of Labor Statistics
- Brazilian Institute of Geography and Statistics
• Menu-driven X-12-ARIMA seasonal adjustment in Stata by:
- Qunyong Wang, Institute of Statistics and Econometrics, Nankai University
- Na Wu, Economics School, Tianjin University of Finance and Economics