Rob J Hyndman
Forecasting without forecasters
Outline
1 Motivation
2 Exponential smoothing
3 ARIMA modelling
4 Time series with complex seasonality
5 Hierarchical and grouped time series
6 Functional time series
Forecasting without forecasters Motivation 2
Motivation
Motivation
1 Common in business to have over 1000products that need forecasting at least monthly.
2 Forecasts are often required by people who areuntrained in time series analysis.
3 Some types of data can be decomposed into alarge number of univariate time series thatneed to be forecast.
Specifications
Automatic forecasting algorithms must:
å determine an appropriate time series model;
å estimate the parameters;
å compute the forecasts with prediction intervals.Forecasting without forecasters Motivation 4
Example: Asian sheep

[Figure: Numbers of sheep in Asia, 1960-2010, millions of sheep]
[Figure: Automatic ETS forecasts for the Asian sheep data, millions of sheep, 1960-2010]
Example: Corticosteroid sales

[Figure: Monthly corticosteroid drug sales in Australia, total scripts (millions), 1995-2010]
[Figure: Automatic ARIMA forecasts of monthly corticosteroid drug sales, total scripts (millions), 1995-2010]
M3 competition

- 3003 time series.
- Early comparison of automatic forecasting algorithms.
- Best-performing methods undocumented.
- Limited subsequent research on general automatic forecasting algorithms.
Exponential smoothing

Classic Reference
Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, 3rd ed., Wiley: NY.

- "Unfortunately, exponential smoothing methods do not allow the easy calculation of prediction intervals." (MWH, p.177)
- No satisfactory way to select an exponential smoothing method.

Current Reference
Hyndman and Athanasopoulos (2013) Forecasting: principles and practice, OTexts: Australia. OTexts.com/fpp.
Exponential smoothing methods

                              Seasonal Component
Trend Component               N (None)    A (Additive)    M (Multiplicative)
N  (None)                     N,N         N,A             N,M
A  (Additive)                 A,N         A,A             A,M
Ad (Additive damped)          Ad,N        Ad,A            Ad,M
M  (Multiplicative)           M,N         M,A             M,M
Md (Multiplicative damped)    Md,N        Md,A            Md,M

N,N:  Simple exponential smoothing
A,N:  Holt's linear method
Ad,N: Additive damped trend method
M,N:  Exponential trend method
Md,N: Multiplicative damped trend method
A,A:  Additive Holt-Winters' method
A,M:  Multiplicative Holt-Winters' method

There are 15 separate exponential smoothing methods.
Each can have an additive or multiplicative error, giving 30 separate models.

General notation ETS (Error, Trend, Seasonal): ExponenTial Smoothing

Examples:
A,N,N: Simple exponential smoothing with additive errors
A,A,N: Holt's linear method with additive errors
M,A,M: Multiplicative Holt-Winters' method with multiplicative errors
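As a concrete illustration of the simplest of these methods, here is a minimal Python sketch of simple exponential smoothing (the N,N method). The level is updated as l_t = alpha*y_t + (1 - alpha)*l_{t-1}, and every h-step-ahead point forecast equals the final level. The data values and starting level are invented for illustration; this is not the ets() implementation.

```python
def ses(y, alpha, level0):
    """Simple exponential smoothing (the N,N method) point forecast.

    Level recursion: l_t = alpha * y_t + (1 - alpha) * l_{t-1}.
    All h-step-ahead point forecasts equal the final level.
    """
    level = level0
    for obs in y:
        level = alpha * obs + (1 - alpha) * level
    return level

# Made-up data: alpha = 0.5, initial level 10, observations 12 then 14
# give l1 = 11.0 and l2 = 12.5, so every future forecast is 12.5.
print(ses([12, 14], 0.5, 10))  # 12.5
```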
Innovations state space models

- All ETS models can be written in innovations state space form (IJF, 2002).
- Additive and multiplicative versions give the same point forecasts but different prediction intervals.
Automatic forecasting

From Hyndman et al. (IJF, 2002):

- Apply each of the 30 models that are appropriate to the data. Optimize parameters and initial values using MLE (or some other criterion).
- Select the best model using the AIC:

  AIC = -2 log(Likelihood) + 2p

  where p = number of parameters.
- Produce forecasts using the best model.
- Obtain prediction intervals from the underlying state space model.
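The AIC-based selection step can be sketched as follows. The candidate names and log-likelihood values here are hypothetical; in the real algorithm each likelihood comes from fitting the corresponding state space model by MLE.

```python
def aic(loglik, p):
    """AIC = -2 log(Likelihood) + 2p, where p = number of parameters."""
    return -2.0 * loglik + 2.0 * p

# Hypothetical fitted log-likelihoods for two candidate ETS models.
candidates = {
    "ETS(A,N,N)": (-100.2, 2),  # (log-likelihood, no. of parameters)
    "ETS(A,A,N)": (-98.9, 4),
}
best = min(candidates, key=lambda m: aic(*candidates[m]))
print(best)  # ETS(A,N,N): AIC 204.4 beats 205.8
```

Note how the extra parameters of ETS(A,A,N) outweigh its better fit here, which is exactly the trade-off the AIC penalty encodes.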
Exponential smoothing

[Figure: Forecasts from ETS(M,A,N) for the Asian sheep data, millions of sheep, 1960-2010]
Exponential smoothing

fit <- ets(livestock)
fcast <- forecast(fit)
plot(fcast)
Exponential smoothing

[Figure: Forecasts from ETS(M,Md,M) for corticosteroid sales, total scripts (millions), 1995-2010]
Exponential smoothing

fit <- ets(h02)
fcast <- forecast(fit)
plot(fcast)
M3 comparisons

Method         MAPE    sMAPE   MASE
Theta          17.83   12.86   1.40
ForecastPro    18.00   13.06   1.47
ETS additive   18.58   13.69   1.48
ETS            19.33   13.57   1.59
References

- RJ Hyndman, AB Koehler, RD Snyder, and S Grose (2002). "A state space framework for automatic forecasting using exponential smoothing methods". International Journal of Forecasting 18(3), 439-454.
- RJ Hyndman, AB Koehler, JK Ord, and RD Snyder (2008). Forecasting with exponential smoothing: the state space approach. Springer-Verlag.
- RJ Hyndman and G Athanasopoulos (2013). Forecasting: principles and practice. OTexts. OTexts.com/fpp/.
ARIMA modelling

Classic Reference
Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, 3rd ed., Wiley: NY.

- "There is such a bewildering variety of ARIMA models, it can be difficult to decide which model is most appropriate for a given set of data." (MWH, p.347)
Auto ARIMA

[Figure: Forecasts from ARIMA(0,1,0) with drift for the Asian sheep data, millions of sheep, 1960-2010]
Auto ARIMA

fit <- auto.arima(livestock)
fcast <- forecast(fit)
plot(fcast)
Auto ARIMA

[Figure: Forecasts from ARIMA(3,1,3)(0,1,1)[12] for corticosteroid sales, total scripts (millions), 1995-2010]
Auto ARIMA

fit <- auto.arima(h02)
fcast <- forecast(fit)
plot(fcast)
How does auto.arima() work?

A non-seasonal ARIMA process:

phi(B) (1 - B)^d y_t = c + theta(B) e_t

We need to select appropriate orders p, q, d, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:
- Select the number of differences d via the KPSS unit root test.
- Select p, q, c by minimising the AIC.
- Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.

Algorithm choices driven by forecast accuracy.
How does auto.arima() work?

A seasonal ARIMA process:

Phi(B^m) phi(B) (1 - B)^d (1 - B^m)^D y_t = c + Theta(B^m) theta(B) e_t

We need to select appropriate orders p, q, d, P, Q, D, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:
- Select the number of differences d via the KPSS unit root test.
- Select D using the OCSB unit root test.
- Select p, q, P, Q, c by minimising the AIC.
- Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.
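The stepwise-search idea can be sketched in a few lines of Python. This is a hedged illustration only (here over p and q, with a stand-in score function in place of the AIC); the real algorithm also handles c, seasonal orders, and model admissibility checks.

```python
def stepwise_search(score, start=(2, 2), max_order=5):
    """Move from the current (p, q) to the best-scoring neighbour
    (p or q changed by one) until no neighbour improves the score."""
    current = start
    best = score(current)
    improved = True
    while improved:
        improved = False
        p, q = current
        neighbours = [(p + dp, q + dq)
                      for dp, dq in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= p + dp <= max_order and 0 <= q + dq <= max_order]
        for cand in neighbours:
            s = score(cand)
            if s < best:
                current, best, improved = cand, s, True
    return current

# Stand-in score with a unique minimum at (1, 3); the search finds it
# without evaluating all 36 candidate (p, q) pairs.
toy_score = lambda pq: (pq[0] - 1) ** 2 + (pq[1] - 3) ** 2
print(stepwise_search(toy_score))  # (1, 3)
```

The pay-off is speed: only a local neighbourhood is evaluated at each step, which is why auto.arima() scales to thousands of series.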
M3 comparisons

Method         MAPE    sMAPE   MASE
Theta          17.83   12.86   1.40
ForecastPro    18.00   13.06   1.47
BJauto         19.14   13.73   1.55
AutoARIMA      18.98   13.75   1.47
ETS-additive   18.58   13.69   1.48
ETS            19.33   13.57   1.59
ETS-ARIMA      18.17   13.11   1.44
M3 conclusions

MYTHS
- Simple methods do better.
- Exponential smoothing is better than ARIMA.

FACTS
- The best methods are hybrid approaches.
- ETS-ARIMA (the simple average of ETS-additive and AutoARIMA) is the only fully documented method that is comparable to the M3 competition winners.
- I have an algorithm that does better than all of these, but it takes too long to be practical.
References

- RJ Hyndman and Y Khandakar (2008). "Automatic time series forecasting: the forecast package for R". Journal of Statistical Software 26(3).
- RJ Hyndman (2011). "Major changes to the forecast package". robjhyndman.com/hyndsight/forecast3/.
- RJ Hyndman and G Athanasopoulos (2013). Forecasting: principles and practice. OTexts. OTexts.com/fpp/.
Examples

[Figure: US finished motor gasoline products, weekly, thousands of barrels per day, 1992-2004]
[Figure: Number of calls to a large American bank (7am-9pm), 5-minute intervals, number of call arrivals, 3 March to 12 May]
[Figure: Turkish electricity demand, daily, GW, 2000-2008]
TBATS model

TBATS:
- Trigonometric terms for seasonality
- Box-Cox transformations for heterogeneity
- ARMA errors for short-term dynamics
- Trend (possibly damped)
- Seasonal (including multiple and non-integer periods)
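The trigonometric seasonal terms can be illustrated with a small Python helper that builds Fourier regressors for a possibly non-integer period. This shows the idea only, not the TBATS state space form; the function name and arguments are illustrative.

```python
import math

def fourier_terms(t, period, K):
    """Trigonometric seasonal regressors sin(2*pi*k*t/period) and
    cos(2*pi*k*t/period) for k = 1..K. The period need not be an
    integer, which is how trigonometric seasonality handles e.g. a
    365.25-day year or a 52.18-week year."""
    terms = []
    for k in range(1, K + 1):
        w = 2.0 * math.pi * k * t / period
        terms.extend([math.sin(w), math.cos(w)])
    return terms

# At t = 0 every sine is 0 and every cosine is 1:
print(fourier_terms(0, 7, 2))  # [0.0, 1.0, 0.0, 1.0]
```

Using K harmonic pairs instead of one dummy per season keeps the number of seasonal states small even for very long periods such as 845.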
Examples

fit <- tbats(gasoline)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from TBATS(0.999, {2,2}, 1, {<52.1785714285714,8>}), thousands of barrels per day, 1995-2005+]
Examples

fit <- tbats(callcentre)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from TBATS(1, {3,1}, 0.987, {<169,5>, <845,3>}), number of call arrivals, 3 March to 9 June]
Examples

fit <- tbats(turk)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from TBATS(0, {5,3}, 0.997, {<7,3>, <354.37,12>, <365.25,4>}), Turkish electricity demand (GW), 2000-2010]
References

- Automatic algorithm described in: AM De Livera, RJ Hyndman, and RD Snyder (2011). "Forecasting time series with complex seasonal patterns using exponential smoothing". Journal of the American Statistical Association 106(496), 1513-1527.
- Slightly improved algorithm implemented in: RJ Hyndman (2012). forecast: Forecasting functions for time series. cran.r-project.org/package=forecast.

More work required!
Introduction

Total
  A: AA, AB, AC
  B: BA, BB, BC
  C: CA, CB, CC

Examples
- Manufacturing product hierarchies
- Pharmaceutical sales
- Net labour turnover
Hierarchical/grouped time series

A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.
Example: pharmaceutical products are organized in a hierarchy under the Anatomical Therapeutic Chemical (ATC) Classification System.

A grouped time series is a collection of time series that are aggregated in a number of non-hierarchical ways.
Example: daily numbers of calls to HP call centres are grouped by product type and location of call centre.
Hierarchical data

Total
A   B   C

Yt : observed aggregate of all series at time t.
YX,t : observation on series X at time t.
Bt : vector of all series at the bottom level at time t.

\[
\boldsymbol{Y}_t = [Y_t,\, Y_{A,t},\, Y_{B,t},\, Y_{C,t}]' =
\underbrace{\begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{\boldsymbol{S}}
\underbrace{\begin{bmatrix} Y_{A,t} \\ Y_{B,t} \\ Y_{C,t} \end{bmatrix}}_{\boldsymbol{B}_t}
\qquad\Longrightarrow\qquad
\boldsymbol{Y}_t = \boldsymbol{S}\boldsymbol{B}_t
\]
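To make the summing matrix concrete, here is a minimal numeric sketch of Yt = S Bt for the small hierarchy above. The deck's examples use R; this illustration is in Python/NumPy, and the bottom-level values are hypothetical.

```python
import numpy as np

# Summing matrix S for the hierarchy Total -> A, B, C
S = np.array([
    [1, 1, 1],   # Total = A + B + C
    [1, 0, 0],   # A
    [0, 1, 0],   # B
    [0, 0, 1],   # C
])

# Hypothetical bottom-level observations B_t at one time point
b = np.array([10.0, 20.0, 30.0])

# Stacked vector Y_t = S B_t, ordered [Total, A, B, C]
y = S @ b   # -> [60., 10., 20., 30.]
```

Every row of S just records which bottom-level series contribute to that aggregate.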
Grouped data

Two alternative hierarchies over the same bottom-level series:

Total               Total
A: AX AY            X: AX BX
B: BX BY            Y: AY BY

\[
\boldsymbol{Y}_t =
\begin{bmatrix} Y_t \\ Y_{A,t} \\ Y_{B,t} \\ Y_{X,t} \\ Y_{Y,t} \\ Y_{AX,t} \\ Y_{AY,t} \\ Y_{BX,t} \\ Y_{BY,t} \end{bmatrix} =
\underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}}_{\boldsymbol{S}}
\underbrace{\begin{bmatrix} Y_{AX,t} \\ Y_{AY,t} \\ Y_{BX,t} \\ Y_{BY,t} \end{bmatrix}}_{\boldsymbol{B}_t}
\qquad\Longrightarrow\qquad
\boldsymbol{Y}_t = \boldsymbol{S}\boldsymbol{B}_t
\]
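The grouped case works the same way: one summing matrix encodes both grouping dimensions at once. A hedged Python/NumPy sketch with made-up bottom-level values (the talk's own code is in R):

```python
import numpy as np

# 9x4 summing matrix for the grouped structure.
# Rows: Total, A, B, X, Y, AX, AY, BX, BY; columns: AX, AY, BX, BY.
S = np.array([
    [1, 1, 1, 1],  # Total
    [1, 1, 0, 0],  # A = AX + AY
    [0, 0, 1, 1],  # B = BX + BY
    [1, 0, 1, 0],  # X = AX + BX
    [0, 1, 0, 1],  # Y = AY + BY
    [1, 0, 0, 0],  # AX
    [0, 1, 0, 0],  # AY
    [0, 0, 1, 0],  # BX
    [0, 0, 0, 1],  # BY
])

b = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical AX, AY, BX, BY
y = S @ b

# Both grouping dimensions aggregate consistently to the same total:
assert y[0] == y[1] + y[2]   # Total = A + B
assert y[0] == y[3] + y[4]   # Total = X + Y
```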
Forecasts

Key idea: forecast reconciliation
å Ignore structural constraints and forecast every series of interest independently.
å Adjust forecasts to impose constraints.

Let \(\hat{\boldsymbol{Y}}_n(h)\) be the vector of initial (base) forecasts for horizon h, made at time n, stacked in the same order as \(\boldsymbol{Y}_t\).

Optimal reconciled forecasts:
\[
\tilde{\boldsymbol{Y}}_n(h) = \boldsymbol{S}(\boldsymbol{S}'\boldsymbol{S})^{-1}\boldsymbol{S}'\,\hat{\boldsymbol{Y}}_n(h)
\]

Independent of the covariance structure of the hierarchy!
The optimal reconciliation weights are S(S′S)⁻¹S′, independent of the data.
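The reconciliation step is just an OLS projection onto the coherent subspace. A minimal Python/NumPy sketch (base forecast numbers are hypothetical; the deck itself uses the R hts package for this):

```python
import numpy as np

# Summing matrix for the hierarchy Total -> A, B, C
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# Hypothetical base forecasts, stacked [Total, A, B, C].
# They are incoherent: 65 != 12 + 19 + 31.
yhat = np.array([65.0, 12.0, 19.0, 31.0])

# Reconciliation weights S(S'S)^{-1}S' -- a projection matrix,
# independent of the data, as stated above.
P = S @ np.linalg.inv(S.T @ S) @ S.T
ytilde = P @ yhat   # -> [64.25, 12.75, 19.75, 31.75]

# The reconciled forecasts satisfy the aggregation constraint.
assert np.isclose(ytilde[0], ytilde[1:].sum())
```

Note that P is idempotent (P @ P == P), so already-coherent forecasts pass through unchanged.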
Features

Forget "bottom up" or "top down". This approach combines all forecasts optimally.
The method outperforms bottom-up and top-down, especially for middle levels.
Covariates can be included in base forecasts.
Adjustments can be made to base forecasts at any level.
Point forecasts are always aggregate consistent.
Very simple and flexible method. Works with any hierarchical or grouped time series.
Conceptually easy to implement: OLS on base forecasts.
Challenges

Computational difficulties in big hierarchies due to the size of the S matrix and near-singular behaviour of (S′S).
Need to estimate the covariance matrix to produce prediction intervals.
Example using R

Hierarchy: Total → A (AX, AY) and B (BX, BY)

library(hts)

# bts is a matrix containing the bottom-level time series
# g describes the grouping/hierarchical structure
y <- hts(bts, g = c(1, 1, 2, 2))

# Forecast 10 steps ahead using the optimal combination method
# (ETS used for each series by default)
fc <- forecast(y, h = 10)

# Select your own methods
ally <- allts(y)
allf <- matrix(NA, nrow = 10, ncol = ncol(ally))
for (i in 1:ncol(ally))
  allf[, i] <- mymethod(ally[, i], h = 10)
allf <- ts(allf, start = 2004)

# Reconcile forecasts so they add up
fc2 <- combinef(allf, Smatrix(y))
References

RJ Hyndman, RA Ahmed, G Athanasopoulos, and HL Shang (2011). "Optimal combination forecasts for hierarchical time series". Computational Statistics and Data Analysis 55(9), 2579–2589.
RJ Hyndman, RA Ahmed, and HL Shang (2013). hts: Hierarchical time series. cran.r-project.org/package=hts.
RJ Hyndman and G Athanasopoulos (2013). Forecasting: principles and practice. OTexts. OTexts.com/fpp/.
Outline
1 Motivation
2 Exponential smoothing
3 ARIMA modelling
4 Time series with complex seasonality
5 Hierarchical and grouped time series
6 Functional time series
Fertility rates

[Figure: "Australia fertility rates (1921−2006)" — fertility rate plotted against age.]
Functional data model

Let f_t(x) be the observed data in period t at age x, t = 1, ..., n.

\[
f_t(x) = \mu(x) + \sum_{k=1}^{K} \beta_{t,k}\,\phi_k(x) + e_t(x)
\]

The decomposition separates time and age to allow forecasting.
Estimate μ(x) as the mean of f_t(x) across years.
Estimate β_{t,k} and φ_k(x) using functional (weighted) principal components.
Univariate models are used for automatic forecasting of the scores {β_{t,k}}.
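The estimation steps above can be sketched numerically. This is an illustrative Python/NumPy version using plain (unweighted) principal components via the SVD on simulated curves — not the robust, weighted procedure of the demography package, and all data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, K = 40, 25, 2                  # years, ages, number of components

# Simulated smooth curves f_t(x) = mu(x) + beta_t * phi(x) + noise
x = np.linspace(0.0, 1.0, m)
mu = np.sin(np.pi * x)
F = mu + np.outer(rng.normal(size=n), np.cos(np.pi * x)) \
       + 0.01 * rng.normal(size=(n, m))

# Step 1: estimate mu(x) as the mean of f_t(x) across years
mu_hat = F.mean(axis=0)

# Step 2: principal components of the centred curves give the basis
# functions phi_k(x) and the score series beta_{t,k}
U, s, Vt = np.linalg.svd(F - mu_hat, full_matrices=False)
phi = Vt[:K]                          # phi_k(x), shape (K, m)
beta = (F - mu_hat) @ phi.T           # beta_{t,k}, one time series per k

# Step 3 (not shown): forecast each score series {beta_{t,k}} with a
# univariate method, then rebuild f_{n+h}(x) = mu_hat + sum_k beta_k phi_k(x)
recon = mu_hat + beta @ phi
rel_err = np.linalg.norm(recon - F) / np.linalg.norm(F - mu_hat)
assert rel_err < 0.1                  # K components capture the structure
```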
Fertility application

[Figure: "Australia fertility rates (1921−2006)" — fertility rate (0–250) against age (15–50), one curve per year.]
Fertility model

[Figure: fitted components of the model — μ(x) against age (15–50); φ1(x) against age with β_{t,1} against year (1920–2000s); φ2(x) against age with β_{t,2} against year.]
Forecasts of f_t(x)

[Figure: "Australia fertility rates (1921−2006)" — observed fertility rates against age (15–50), with forecast curves and 80% prediction intervals.]
R code

library(demography)
plot(aus.fert)
fit <- fdm(aus.fert)
fc <- forecast(fit)
References

RJ Hyndman and S Ullah (2007). "Robust forecasting of mortality and fertility rates: A functional data approach". Computational Statistics and Data Analysis 51(10), 4942–4956.
RJ Hyndman and HL Shang (2009). "Forecasting functional time series (with discussion)". Journal of the Korean Statistical Society 38(3), 199–221.
RJ Hyndman (2012). demography: Forecasting mortality, fertility, migration and population data. cran.r-project.org/package=demography.
For further information
robjhyndman.com
Slides and references for this talk.
Links to all papers and books.
Links to R packages.
A blog about forecasting research.