NATCOR: Forecasting & Predictive Analytics
Lecture 3: Exponential Smoothing
John Boylan
Lancaster Centre for Forecasting Department of Management Science
Methods and Models
Forecasting Method: A (numerical) procedure for generating a forecast. e.g. Take the average of all observations up to (and including) time t, as a forecast for time t+1.
Forecasting Model: A statistical description of the data generating process. e.g. All observations are centred around an unchanging mean (μ) with a normally distributed i.i.d. noise term (ε_t) with zero mean and constant variance (V).
Slide 2 NATCOR – Exponential Smoothing
Model: $$y_t = \mu + \varepsilon_t, \qquad \varepsilon_t \sim N(0, V)$$
Method: $$\hat{y}_{t+1|t} = \frac{1}{t} \sum_{i=1}^{t} y_i$$
Link between Models and Methods
Heuristic Methods (No Link): These are methods that have been designed without reference to statistical models and have no link to such models. e.g. Simple Moving Averages (see later slides).
Model-Based Methods (Linked): These are methods which do link to an explicit statistical model and give the ‘best’ forecast if the model holds.
e.g. Simple Exponential Smoothing (see later slides).
e.g. “Average of all observations”, which links to the model on the previous slide.
Slide 3 NATCOR – Exponential Smoothing
$$y_t = \mu + \varepsilon_t$$
Arithmetic Mean
Slide 4
$$\hat{y}_{t+1|t} = \frac{1}{t} \sum_{i=1}^{t} y_i$$
• Gives equal weight to all observations.
• Has longest possible ‘memory’.
• Reduces noise, as the random fluctuations tend to cancel out.
• The more data is available, the longer the average, and the better the estimation of the mean level.
• If the model has held in the past, and continues to hold over the forecast horizon, then the Arithmetic Mean is the ‘best’ forecast for time t+1.
What forecast should be used for time t+2?
Model: $$y_t = \mu + \varepsilon_t$$
Forecasting with the Arithmetic Mean
Slide 5
[Figure: monthly series with Arithmetic Mean forecasts; x-axis: Month, y-axis: Units. Annotations:]
1-step-ahead forecasts: $\hat{y}_{2|1}, \hat{y}_{3|2}, \ldots, \hat{y}_{84|83}$, where $\hat{y}_{2|1} = y_1$ and $\hat{y}_{3|2} = (y_1 + y_2)/2$
1- to 12-step-ahead forecasts: $\hat{y}_{85|84}, \hat{y}_{86|84}, \ldots, \hat{y}_{96|84}$
• Forecast becomes more stable as time progresses
• If model holds, forecast accuracy depends on level of “noise” in the model error term
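The forecasts annotated on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the data are hypothetical, and the multi-step function assumes the constant-mean model, under which the estimate of the mean is the forecast at every horizon.

```python
def mean_forecasts(y):
    """Return one-step-ahead forecasts from the arithmetic mean.

    forecasts[k] is the forecast for the period after y[k], computed as
    the mean of y[0], ..., y[k].
    """
    forecasts = []
    running_total = 0.0
    for count, value in enumerate(y, start=1):
        running_total += value
        forecasts.append(running_total / count)
    return forecasts


def mean_forecast_path(y, horizon):
    """Return forecasts for 1..horizon steps ahead, all made at the end of y.

    Assumes the constant-mean model, so every horizon gets the same value.
    """
    level = sum(y) / len(y)
    return [level] * horizon


history = [100.0, 104.0, 96.0, 102.0]  # hypothetical observations
one_step = mean_forecasts(history)
path = mean_forecast_path(history, horizon=3)
print(one_step)
print(path)
```

The first two entries of `one_step` correspond to the slide's $\hat{y}_{2|1} = y_1$ and $\hat{y}_{3|2} = (y_1 + y_2)/2$.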
Arithmetic Mean and Outliers
Slide 6
[Figure: Actuals and Arithmetic Mean forecasts for a series containing a single outlier (marked “Outlier”)]
• The Arithmetic Mean becomes more robust to outliers as the length of history grows.
• The “weight” given to the outlier is only 1/t, where t is the length of the history used in calculating the mean.
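The 1/t weight can be checked with a small sketch. The helper name and the numbers are hypothetical, and the series is idealised: every observation except the outlier sits exactly on the clean level.

```python
def mean_shift_from_outlier(clean_level, outlier_value, t):
    """Change in the arithmetic mean when one of t observations is an outlier.

    The other t - 1 observations are assumed to equal clean_level exactly.
    """
    mean_with_outlier = ((t - 1) * clean_level + outlier_value) / t
    return mean_with_outlier - clean_level


# An outlier 100 units above the level, seen in short and long histories.
shift_short = mean_shift_from_outlier(100.0, 200.0, t=10)
shift_long = mean_shift_from_outlier(100.0, 200.0, t=100)
print(shift_short, shift_long)
```

The shift in the mean is the outlier's deviation from the level multiplied by 1/t, so the longer history moves the forecast by a tenth as much.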
Arithmetic Mean and Level Shifts
Slide 7 NATCOR – Exponential Smoothing
• The Arithmetic Mean is poor at handling level shifts.
• The method has a long memory.
• It cannot “forget” the previous level and adjust to the new level within a reasonable period of time.
[Figure: Actuals and Arithmetic Mean forecasts for a series with a level shift (marked “Level Shift occurs here”)]
Random Walk Model
Slide 8 NATCOR – Exponential Smoothing
$$y_t = y_{t-1} + \varepsilon_t$$
• Mean level no longer constant (see graph)
• The next noise term ($\varepsilon_{t+1}$) is not forecastable at time $t$
• ‘Best’ forecast of $y_{t+1}$ is to use the latest observation ($y_t$)
Naïve Forecast
Slide 9 NATCOR – Exponential Smoothing
$$\hat{y}_{t+1|t} = y_t$$
• Naïve does not “filter” the noise; it copies the noise.
• Arithmetic Mean good at filtering noise but unresponsive to level shifts. The Naïve method is the opposite.
[Figure: monthly series with Naïve forecasts; x-axis: Month, y-axis: Units]
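The contrast between the two methods can be illustrated with a minimal sketch. Both series are hypothetical and deliberately artificial (one is pure noise around a fixed level, the other a noise-free level shift), and the helper names are not from the lecture.

```python
def one_step_mse(y, forecasts):
    """MSE of forecasts[k] used as the forecast of y[k + 1]."""
    errors = [y[k + 1] - forecasts[k] for k in range(len(y) - 1)]
    return sum(e * e for e in errors) / len(errors)


def naive_forecasts(y):
    """The forecast made at each period is that period's observation."""
    return list(y)


def mean_forecasts(y):
    """The forecast made at each period is the mean of all observations so far."""
    return [sum(y[:k + 1]) / (k + 1) for k in range(len(y))]


noisy = [100.0, 110.0, 90.0, 110.0, 90.0, 110.0]  # hypothetical: constant level plus noise
shifted = [100.0] * 4 + [200.0] * 4                # hypothetical: level shift, no noise

for name, series in [("noisy", noisy), ("shifted", shifted)]:
    print(name,
          "naive MSE:", round(one_step_mse(series, naive_forecasts(series)), 1),
          "mean MSE:", round(one_step_mse(series, mean_forecasts(series)), 1))
```

On the noisy series the mean should score better, and on the shifted series the Naïve method should score better, matching the trade-off described above.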
Alternative Approach: Simple Moving Averages
Slide 10 NATCOR – Exponential Smoothing
• Gives equal weight to all of the last N observations in the average:
$$\hat{y}_{t+1|t} = \frac{1}{N} \sum_{i=t-N+1}^{t} y_i$$
• ‘Memory’ depends on length of Simple Moving Average
• Unlike Arithmetic Mean and Naïve methods, the Simple Moving Average has a parameter (N) that needs to be determined.
• Higher N values filter noise better but respond more slowly to level shifts.
• Method is not model-based but may still perform more accurately than some model-based methods (e.g. Naïve).
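A minimal sketch of the Simple Moving Average forecast follows. The function name and data are hypothetical. The two extreme choices of N are included because they recover the Naïve method (N = 1) and the Arithmetic Mean (N equal to the full history).

```python
def sma_forecast(y, n):
    """Forecast for the next period: the mean of the last n observations."""
    if not 1 <= n <= len(y):
        raise ValueError("n must be between 1 and len(y)")
    return sum(y[-n:]) / n


history = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical observations
print(sma_forecast(history, 3))             # SMA of length 3
print(sma_forecast(history, 1))             # same as the Naive forecast
print(sma_forecast(history, len(history)))  # same as the Arithmetic Mean
```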
Difference between Simple and Centred Moving Average
Slide 11 NATCOR – Exponential Smoothing
• Simple Moving Average (SMA) of length 3 takes the average of the first three observations as a forecast for the fourth period.
• Centred Moving Average (CMA) of length 3 takes the average of the first three observations as an estimate of the underlying model at the second period.
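The distinction is only in which period the average is assigned to, as the following sketch tries to show. Periods are numbered from 1 as on the slide; the function names and data are hypothetical, and the centred version is written for odd lengths only.

```python
def sma_forecasts(y, n):
    """Map each target period (1-based) to the mean of the n periods before it."""
    return {t + 1: sum(y[t - n:t]) / n for t in range(n, len(y) + 1)}


def cma_estimates(y, n):
    """Map each period (1-based) to the mean of the n periods centred on it.

    n is assumed to be odd so that the window has a well-defined centre.
    """
    half = n // 2
    return {t: sum(y[t - 1 - half:t + half]) / n
            for t in range(half + 1, len(y) - half + 1)}


observations = [12.0, 15.0, 18.0, 30.0]  # hypothetical periods 1 to 4
sma = sma_forecasts(observations, 3)
cma = cma_estimates(observations, 3)
print(sma[4])  # average of periods 1-3, used as the forecast for period 4
print(cma[2])  # the same average, used as the estimate for period 2
```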
Simple Moving Average
Effect of Length of SMA
Slide 12 NATCOR – Exponential Smoothing
[Figure: Actuals with SMA(6), SMA(12) and SMA(24) forecasts; x-axis: Month, y-axis: Units]
• Different lengths of SMA may produce quite different forecasts.
• Best choice of length depends on whether it is more important to filter noise or respond to level shifts.
SMA and Outliers
Slide 13
[Figure: Actuals with SMA(6), SMA(12) and SMA(24) forecasts for a series containing an outlier]
• Robustness of SMA to outliers depends on length of SMA
• The longer the SMA, the more robust is the forecast to outlying observations.
SMA and Level Shifts
Slide 14
[Figure: Actuals with SMA(6), SMA(12) and SMA(24) forecasts for a series with a level shift]
• Adaptation of SMA to level shifts depends on length of SMA
• It will take N periods for an SMA to fully adapt to a new level (where N is the length of the SMA).
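The N-period adaptation can be checked on an idealised, noise-free level shift. This is a sketch under that assumption only; the function name, levels and series length are hypothetical.

```python
def periods_to_adapt(n, old_level=100.0, new_level=200.0, periods_each=30):
    """Count post-shift observations needed before SMA(n) equals the new level.

    The series is noise-free: periods_each values at old_level followed by
    periods_each values at new_level. Requires n <= periods_each.
    """
    y = [old_level] * periods_each + [new_level] * periods_each
    for k in range(periods_each, len(y)):
        window = y[k - n + 1:k + 1]  # the n most recent observations at index k
        if sum(window) / n == new_level:
            return k - periods_each + 1
    return None


for length in (6, 12, 24):
    print(length, periods_to_adapt(length))
```

With real (noisy) data the SMA never equals the new level exactly, but the old level still drops out of the window only after N post-shift observations.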
Choice of Length (Order) of SMA
Slide 15 NATCOR – Exponential Smoothing
• Best length of SMA not known in advance
• Time series graph may give some clues but cannot determine best length of SMA from this alone.
• Need to compare accuracy of SMA using different lengths.
We experiment on past data, but only using data that would have been available at the time to calculate our forecasts.
Issues to resolve:
1. What error measure?
2. How many steps ahead?
3. Over what time period?
Error Measures (h-step-ahead forecasts)
Slide 16
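Mean Squared Error (MSE)
$$\text{MSE} = \frac{1}{m} \sum_{j=0}^{m-1} e_{t+h+j}^{2} = \frac{1}{m} \sum_{j=0}^{m-1} \left( y_{t+h+j} - \hat{y}_{t+h+j|t+j} \right)^{2}$$
Mean Absolute Error (MAE)
$$\text{MAE} = \frac{1}{m} \sum_{j=0}^{m-1} \left| e_{t+h+j} \right| = \frac{1}{m} \sum_{j=0}^{m-1} \left| y_{t+h+j} - \hat{y}_{t+h+j|t+j} \right|$$
Mean Absolute Percentage Error (MAPE)
$$\text{MAPE} = \frac{100}{m} \sum_{j=0}^{m-1} \left| \frac{e_{t+h+j}}{y_{t+h+j}} \right| = \frac{100}{m} \sum_{j=0}^{m-1} \left| \frac{y_{t+h+j} - \hat{y}_{t+h+j|t+j}}{y_{t+h+j}} \right|$$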
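The three measures can be sketched directly from their definitions. The function names and the data are hypothetical. The lists are assumed to be already aligned, so that `forecasts[j]` is the h-step-ahead forecast of `actuals[j]`. MAPE is undefined when an actual value is zero, which this sketch does not guard against.

```python
def forecast_errors(actuals, forecasts):
    """Errors e = actual - forecast for already-aligned pairs."""
    return [a - f for a, f in zip(actuals, forecasts)]


def mse(errors):
    return sum(e * e for e in errors) / len(errors)


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def mape(actuals, errors):
    """Mean absolute percentage error; every actual must be non-zero."""
    return 100.0 / len(errors) * sum(abs(e / a) for a, e in zip(actuals, errors))


actuals = [100.0, 200.0, 50.0, 80.0]     # hypothetical observations
forecasts = [110.0, 190.0, 50.0, 100.0]  # hypothetical forecasts of the same periods
errors = forecast_errors(actuals, forecasts)
print(mse(errors), mae(errors), mape(actuals, errors))
```

In this example the single error of -20 contributes 400 of the 600 total squared error, which illustrates why MSE is more sensitive to large errors than MAE.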
1. Choice of Error Measure to determine length of SMA
Slide 17 NATCOR – Exponential Smoothing
• Most common choice is MSE.
• MSE is the error measure used in time series theory to link models to methods which are ‘optimal’ (Minimum Mean Square Error, MMSE) for that model.
• This is what was meant by ‘best’ forecast in earlier slides.
• MSE also links to the AIC measure for model selection (discussed later).
• However, results can be sensitive to outlying observations.
2. Choice of Forecast Horizon (h) to determine length of SMA
Slide 18 NATCOR – Exponential Smoothing
• Most common choice is one-step-ahead.
• If we are only interested in (say) 3-step-ahead errors, then we may minimise MSE for 3-step-ahead forecasts.
• Often, we are interested in 1-step, 2-step and 3-step-ahead errors (say). Then minimising MSE for 1-step-ahead forecasts ‘stands in’ for the other two horizons.
• Alternative approaches, taking into account all the relevant horizons, are currently being researched by the Lancaster Centre for Forecasting.
3. Choice of Time Period over which to determine length of SMA
Slide 19 NATCOR – Exponential Smoothing
[Figure: US exports of upper and lining leather, Jan77 to Jan87; x-axis: Month, y-axis: Units. The history is divided into In-sample and Out-of-sample periods. Legend: Data, In-sample forecast, Out-of-sample forecast, Forecast origin]
• Divide history into ‘in-sample’ (training set) and ‘out-of-sample’ (test set).
• Use in-sample to determine length of SMA
• Use out-of-sample to compare SMA with other methods
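The procedure on this slide might be sketched as follows, using 1-step-ahead MSE as the selection criterion. The function names, the candidate lengths and the series are hypothetical. One design choice is an assumption of this sketch: all candidate lengths are scored on the same in-sample targets, starting after the longest candidate window, so that their MSEs are comparable.

```python
def sma_one_step_mse(y, n, start, end):
    """MSE of 1-step-ahead SMA(n) forecasts for the targets y[start:end].

    Each forecast uses only the n observations before its target.
    """
    squared_errors = []
    for target in range(start, end):
        forecast = sum(y[target - n:target]) / n
        squared_errors.append((y[target] - forecast) ** 2)
    return sum(squared_errors) / len(squared_errors)


def choose_sma_length(y, in_sample_size, candidate_lengths):
    """Pick the candidate length with the lowest in-sample 1-step MSE."""
    first_target = max(candidate_lengths)  # score every candidate on the same targets
    scores = {n: sma_one_step_mse(y, n, first_target, in_sample_size)
              for n in candidate_lengths}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical series with a level shift part-way through.
series = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 150.0, 152.0,
          148.0, 151.0, 149.0, 150.0, 151.0, 149.0, 150.0, 152.0]
in_sample_size = 12
best_n, in_sample_scores = choose_sma_length(series, in_sample_size, [1, 2, 4])
out_of_sample_mse = sma_one_step_mse(series, best_n, in_sample_size, len(series))
print(best_n, in_sample_scores, out_of_sample_mse)
```

The out-of-sample MSE is computed only after the length has been fixed, so the test set plays no part in choosing N.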
Example Series
Slide 20 NATCOR – Exponential Smoothing
• Open the Exponential Smoothing Exercise spreadsheet at the first tab (Data Visualisation) for these series in Columns A and C.
• Two additional series, High Noise and High Noise with Level Shift, are in Columns B and D.
[Figure: two monthly series, Jan 2012 to Oct 2015: “Medium Noise” and “Medium Noise with Level Shift”]
Fixed Forecasts and Rolling Forecasts in Out-of-Sample
Slide 21 NATCOR – Exponential Smoothing
[Figure: US exports of upper and lining leather, Jan77 to Jan87; x-axis: Month, y-axis: Units. The history is divided into In-sample and Out-of-sample periods. Legend: Data, In-sample forecast, Out-of-sample forecast, Forecast origin]
• Graph shows fixed forecasts, made at the Forecast Origin (i.e. one 1-step-ahead forecast, one 2-step-ahead forecast, etc.).
• Rolling forecasts are made at the Origin, then at the Origin plus one period, Origin plus two periods, etc.
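The difference between the two schemes can be sketched with an SMA as the forecasting method. The names and data are hypothetical. Two simplifications are assumptions of this sketch: the SMA's multi-step forecast is taken to be flat at its latest level, and the rolling version produces only the 1-step-ahead forecast from each origin.

```python
def sma_level(history, n):
    """Mean of the last n observations in history."""
    return sum(history[-n:]) / n


def fixed_origin_forecasts(y, origin, horizon, n):
    """Forecasts for 1..horizon steps ahead, all made with data y[:origin]."""
    level = sma_level(y[:origin], n)
    return [level] * horizon


def rolling_one_step_forecasts(y, origin, horizon, n):
    """1-step-ahead forecasts made at origin, origin + 1, ..., origin + horizon - 1.

    Each forecast uses the actuals that have become available by its own origin.
    """
    return [sma_level(y[:origin + k], n) for k in range(horizon)]


series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]  # hypothetical; first 3 values are in-sample
fixed = fixed_origin_forecasts(series, origin=3, horizon=3, n=2)
rolling = rolling_one_step_forecasts(series, origin=3, horizon=3, n=2)
print(fixed)
print(rolling)
```

Both lists target the same three out-of-sample periods, but the rolling forecasts move with the new actuals while the fixed forecasts do not.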
Split between In-Sample and Out-of-Sample
Slide 22 NATCOR – Exponential Smoothing
Fixed Forecasts
• Accuracy will be assessed for all forecast horizons out-of-sample, with each forecast made at the Forecast Origin.
• So, out-of-sample length should be set to be equal to the longest forecast horizon.
Rolling Forecasts
Trade-off between:
1. Longer in-sample lengths allow more accurate assessment of the optimal parameter (length of SMA).
2. Longer out-of-sample lengths allow for more accurate comparisons of different methods if using Rolling Forecasts.
Data Splitting in EXCEL
Slide 23 NATCOR – Exponential Smoothing
Data Splitting Exercise
• Open spreadsheet at 2nd tab (2. Data Splitting)
• Experiment with different “In-sample sizes” (Cell K2, or use the slider bar below) for both:
  • Medium Noise
  • Medium Noise with Level Shift.
• What effect would changing the “In-sample size” have on estimation of length of SMA in the Training Set and evaluation of forecast accuracy in the Test Set?
Simple Exponential Smoothing (SES)
Slide 24 NATCOR – Exponential Smoothing
Suppose data does not have seasonality or systematic trend. Data may have outliers and/or level shifts.
Exponential Smoothing adjusts the last forecast by a fraction (α) of the last forecast error:
$$\hat{y}_{t+1|t} = \hat{y}_{t|t-1} + \alpha e_t$$
Example
• Previous Forecast = 100
• Previous Actual = 90
• Previous Error = -10
• Smoothing Constant (α) = 0.2
• New Forecast = 100 + (0.2 x (-10)) = 98
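The worked example corresponds to a single update step, sketched below. The function name is hypothetical; the numbers are those of the example.

```python
def ses_update(previous_forecast, previous_actual, alpha):
    """One SES step in error-correction form: forecast + alpha * error."""
    error = previous_actual - previous_forecast
    return previous_forecast + alpha * error


new_forecast = ses_update(previous_forecast=100.0, previous_actual=90.0, alpha=0.2)
print(new_forecast)
```

Setting alpha to 0 leaves the forecast unchanged, and setting it to 1 reproduces the Naïve forecast (the last actual).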
SES: Error Correction & Standard Forms
Slide 25 MSCI 523 – Exponential Smoothing
Error Correction Form
$$\hat{y}_{t+1|t} = \hat{y}_{t|t-1} + \alpha e_t$$
Standard Form
Substitute for the error expression in Error Correction Form:
$$e_t = y_t - \hat{y}_{t|t-1}$$
$$\hat{y}_{t+1|t} = \hat{y}_{t|t-1} + \alpha y_t - \alpha \hat{y}_{t|t-1}$$
$$\hat{y}_{t+1|t} = \alpha y_t + (1 - \alpha) \hat{y}_{t|t-1}$$
This is a weighted average of the last actual and last forecast.
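Applying the Standard Form repeatedly to the previous forecast shows where the name "exponential" comes from. The expansion below is a standard algebraic consequence of the Standard Form, with $k$ introduced here as the number of substitutions made:

$$\hat{y}_{t+1|t} = \alpha y_t + \alpha (1 - \alpha) y_{t-1} + (1 - \alpha)^{2} \hat{y}_{t-1|t-2}$$

$$\hat{y}_{t+1|t} = \alpha \sum_{j=0}^{k-1} (1 - \alpha)^{j} y_{t-j} + (1 - \alpha)^{k} \hat{y}_{t-k+1|t-k}$$

For 0 < α < 1 the weight on $y_{t-j}$ is $\alpha (1 - \alpha)^{j}$, which declines geometrically with the age of the observation, and the influence of the initial forecast fades as $k$ grows.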
Calculation of SES
Slide 26 NATCOR – Exponential Smoothing
• Initialise Forecast in period 2 by using Naïve method.
• Can then optimise α (0 ≤ α ≤ 1).
• Alternatively, can optimise both Initial Forecast and α.

Period  Actual  SES(0.3)  Sqd Error  SES(0.7)  Sqd Error
1       90
2       85      90.0      25.0       90.0      25.0
3       83      88.5      30.3       86.5      12.3
4       92      86.9      26.5       84.1      63.2
5       98      88.4      92.3       89.6      70.3
6       81      91.3      105.6      95.5      209.8
7       94      88.2      33.7       85.3      74.9
8       150     89.9      3607.7     91.4      3433.5
9       86      108.0     482.0      132.4     2154.9
10      90      101.4     129.2      99.9      98.5
11      104     98.0                 93.0
12      96      98.0                 93.0
Overall MSE               563.4                764.7
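The table can be recomputed with a short sketch. The function names are hypothetical. One point is an inference from the numbers rather than something the slide states: the Overall MSE values appear to be the average of the squared errors for periods 3 to 10, leaving out the period 2 error that comes from the Naïve initialisation. Periods 11 and 12 are treated as out-of-sample, which is why their forecast stays at the value produced after period 10.

```python
def ses_forecasts(y, alpha):
    """SES forecasts with naive initialisation.

    Returns a list f with len(y) + 1 entries: f[k] is the forecast for
    y[k] (f[0] is None because period 1 has no forecast) and f[-1] is
    the forecast for the first period after the data.
    """
    forecasts = [None, y[0]]  # period 2 forecast = period 1 actual
    for k in range(1, len(y)):
        previous = forecasts[k]
        forecasts.append(previous + alpha * (y[k] - previous))
    return forecasts


def mse_from(y, forecasts, first_index):
    """Mean squared 1-step error over y[first_index:]."""
    squared = [(y[k] - forecasts[k]) ** 2 for k in range(first_index, len(y))]
    return sum(squared) / len(squared)


in_sample = [90, 85, 83, 92, 98, 81, 94, 150, 86, 90]  # periods 1 to 10 from the table

for alpha in (0.3, 0.7):
    f = ses_forecasts(in_sample, alpha)
    # first_index=2 averages periods 3 to 10 (assumption: period 2 is excluded).
    print(alpha, round(f[-1], 1), round(mse_from(in_sample, f, 2), 1))
```

Under that assumption the sketch should reproduce, to one decimal place, the table's flat forecasts for periods 11 and 12 and its Overall MSE figures for both smoothing constants.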
SES in EXCEL
Slide 27 NATCOR – Exponential Smoothing
SES Exercise
• Make sure you have the Solver Add-In (File, Options, Add-Ins, Solver Add-In, OK)
• Open spreadsheet at 6th tab (6. Exponential Smoothing)
• Select “Medium Noise with Level Shift” at 2nd tab and then return to 6th tab. Input 24 to Cell X3 (In Sample Size).
• Initialise forecast (naïve) in Cell C3
• Calculate Training Set 1-step-ahead forecasts (C4:C26)
• Note that Test Set forecasts are all the same as C26.
• Experiment with different alpha values (Cell P3)
• Optimise alpha, and check Test Set accuracy
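Outside the spreadsheet, the same exercise could be sketched as below. This is not the Solver's algorithm: a plain grid search over alpha stands in for it, and the function names, grid size and training data are all hypothetical. Unlike the SES table earlier, this sketch includes the period 2 error in the MSE; that error does not depend on alpha, so it does not change which alpha is selected.

```python
def ses_in_sample_mse(y, alpha):
    """Return (1-step MSE over the training set, forecast for the next period).

    The first forecast is initialised naively as y[0].
    """
    forecast = y[0]
    squared_errors = []
    for actual in y[1:]:
        error = actual - forecast
        squared_errors.append(error * error)
        forecast = forecast + alpha * error
    return sum(squared_errors) / len(squared_errors), forecast


def optimise_alpha(y, steps=100):
    """Grid search for the alpha in [0, 1] with the lowest training MSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda a: ses_in_sample_mse(y, a)[0])


# Hypothetical training set with a level shift.
training = [100.0, 101.0, 99.0, 100.0, 150.0, 151.0, 149.0, 150.0, 151.0, 149.0]
best_alpha = optimise_alpha(training)
best_mse, test_forecast = ses_in_sample_mse(training, best_alpha)
print(best_alpha, best_mse, test_forecast)
```

`test_forecast` plays the role of the single value used for every Test Set period, since SES forecasts are flat beyond the last training observation.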
How SES addresses Noise
Slide 28 NATCOR – Exponential Smoothing
• Low smoothing constants (alpha values) filter noise.
• High smoothing constants have little filtering effect.
• BUT: high smoothing constants react more quickly to level shifts.
SES and Trended Series
Slide 29 NATCOR – Exponential Smoothing
[Figure: UK Gross Domestic Product: chained volume measures, 1948 to 2018, with SES forecasts; x-axis: Year, y-axis: GDP. Two panels: Alpha = 0.2 and Alpha = 0.7]
• With a low alpha, SES does not keep up with trend and produces a poor forecast.
• With a high alpha, SES keeps up better, but is not filtering the noise well and produces a forecast that could be improved.
SES and Seasonal Series
Slide 30 NATCOR – Exponential Smoothing
[Figure: UK Hourly Electricity Demand with SES forecasts; x-axis: Day, y-axis: Demand. Two panels: Alpha = 0.2 and Alpha = 0.7]
• With a low alpha, seasonality is not captured.
• With a high alpha, the noise is not smoothed AND the seasonal pattern is out by one period.
• In both cases, the forecasts are poor.
Is SES a Model-Based Method?
Slide 31 NATCOR – Exponential Smoothing
• It is sometimes stated that SES is an ‘ad hoc’ or ‘heuristic’ method, lacking a model-based foundation.
• This is wrong!
• It is true that when SES was first proposed, the method lacked a model foundation.
• Since then, two model forms have been found to underpin SES:
  • ARIMA(0,1,1) Model
  • State Space Local Level Model
• Model formulations become useful when looking at a whole family of Exponential Smoothing models (including trend and seasonality).
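For reference, the two model forms can be written out as follows. The notation ($\theta$, $\ell_t$) is introduced here and is not taken from the slides. The local level model is shown in its single-source-of-error (innovations) form, which is one common way of writing it.

ARIMA(0,1,1):

$$y_t = y_{t-1} + \varepsilon_t - \theta \varepsilon_{t-1}$$

The minimum mean square error one-step-ahead forecast for this model is SES with $\alpha = 1 - \theta$.

Local level model (innovations form):

$$y_t = \ell_{t-1} + \varepsilon_t, \qquad \ell_t = \ell_{t-1} + \alpha \varepsilon_t, \qquad \hat{y}_{t+1|t} = \ell_t$$

Substituting $\varepsilon_t = y_t - \ell_{t-1}$ into the level equation gives $\ell_t = \alpha y_t + (1 - \alpha) \ell_{t-1}$, which is the weighted average of the last actual and the last forecast used by SES.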
Summary
Slide 32 NATCOR – Exponential Smoothing
• Arithmetic Mean robust to outliers but very slow to respond to level shifts.
• Naïve responds immediately to level shifts but does not filter noise.
• Simple Moving Average (SMA) may be a good compromise but is not part of a wider family of model-based methods.
• Simple Exponential Smoothing (SES) allows suitable weights to be identified for past data and is part of a wider family of model-based methods.