Frequency independent automatic input variable selection
TRANSCRIPT
![Page 1: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/1.jpg)
Universität Hamburg Institut für Wirtschaftsinformatik
Prof. Dr. D.B. Preßmar
Frequency independent automatic input variable
selection for neural networks for forecasting
www.lancs.ac.uk
Nikolaos Kourentzes, Sven F. Crone
LUMS – Department of Management Science
![Page 2: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/2.jpg)
Motivation
► Large numbers of univariate time series often need to be forecast automatically in business and other contexts [Hyndman & Khandakar, 08]

Large Scale Automatic Forecasting Problems
10,000+ products → Forecast daily

► Questions: Appropriate forecasting method? Correct specification? [Goodrich, 00]
► Typically the time series periodicity is provided by experts → Fully automatic?

Automatic forecasting → Necessary!
![Page 3: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/3.jpg)
Motivation
Why Neural Networks?
NN in Business Time Series Forecasting
► Promising performance: 64% (out of 126) of articles found ANNs outperforming benchmarks [Kourentzes, 10] (73% according to Adya & Collopy, 98)
► Large scale studies (100+ time series): NNs at least as good as benchmarks [Hill et al., 96, Liao & Fildes, 05] → Evidence of automatic forecasting with NNs [Crone & Kourentzes, 10]
► Forecasting Competitions (M3) → NNs lower accuracy than statistical models [Makridakis & Hibon, 00]
► NNs produce unreliable forecasts → criticised to “offer little promise even after much research“ [Armstrong, 06]

Why? What does NN research suggest?
![Page 4: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/4.jpg)
Motivation
Focus on the input vector
Modelling complexity gives rise to the problems
► Problem caused by inconsistent trial and error modelling approaches [Zhang et al., 98]
► Input variable selection:
• The most important issue in forecasting with NNs [Zhang, 01, Zhang et al., 01, Zhang et al., 98, Darbellay and Slama, 00]
• No widely accepted methodology on how to select the input variables [Anders and Korn, 99, Zhang et al., 98]
► Fully automatic input selection implies knowledge of the time series frequency/periodicities → Ignored in “automated forecasting applications”

Focus on frequency identification & input variable selection
![Page 5: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/5.jpg)
Iterative Neural Filter
A methodology to identify seasonal frequencies and inputs
► Step 1. Identify seasonal frequencies using the Iterative Neural Filter (INF)
► Step 2. Identify lagged inputs
► Step 3. Fit model & produce forecasts
![Page 6: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/6.jpg)
Iterative Neural Filter
Step 1. Identify seasonal frequencies
Euclidean Distance to Identify Seasonality

► Split the time series into segments of each candidate seasonality s → find the mean euclidean distance between segments

[Figure: Y = sin(2πt/12) split at candidate seasonalities — s = 5: distance 0.847; s = 12: distance 0; s = 19: distance 0.962; s = 24: distance 0]

Multiple seasonalities (12, 24, 36, ...) → Identification problem
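The segment-distance idea above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the function name and implementation details are assumptions:

```python
import numpy as np

def mean_euclidean_distance(y, s):
    """Mean Euclidean distance between consecutive segments of length s."""
    n = len(y) // s                      # number of complete segments
    if n < 2:
        return float("inf")
    segs = np.reshape(y[: n * s], (n, s))
    # distance between each segment and the next one
    d = np.linalg.norm(segs[1:] - segs[:-1], axis=1)
    return float(d.mean())

t = np.arange(120)
y = np.sin(2 * np.pi * t / 12)           # Y = sin(2*pi*t/12), as on the slide
d5, d12, d24 = (mean_euclidean_distance(y, s) for s in (5, 12, 24))
# Every multiple of the true period (12, 24, 36, ...) gives ~zero distance,
# so the raw distance alone cannot single out s = 12 -- the identification problem.
```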
![Page 7: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/7.jpg)
Iterative Neural Filter
Step 1. Identify seasonal frequencies
Penalised Euclidean Distance

► Split the time series into segments of each candidate seasonality (periodicity) → find the mean euclidean distance

[Figure: mean Euclidean distance per candidate season — Season 4: 391.6, Season 7: 171.2 — and the penalised distance plotted over candidate seasons]

Penalised Euclidean Distance: D_p(s) = log(1 + D_s) − τ·log(s)

Identify seasonality avoiding multiples
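The penalised distance, as reconstructed from the slide, can be sketched as follows. The value τ = 0.5 is an assumed tuning constant; the paper's value (and exact sign convention) may differ:

```python
import math

def penalised_distance(d_s, s, tau=0.5):
    """Penalised Euclidean distance D_p(s) = log(1 + D_s) - tau*log(s),
    as reconstructed from the slide; tau is an assumed tuning constant."""
    return math.log(1.0 + d_s) - tau * math.log(s)

# Raw mean distances from the slide's example:
dp4 = penalised_distance(391.6, 4)   # Season 4, distance 391.6
dp7 = penalised_distance(171.2, 7)   # Season 7, distance 171.2
# The candidate season with the lowest penalised distance is selected,
# here season 7, matching the slide's example.
```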
![Page 8: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/8.jpg)
Iterative Neural Filter
Step 1. Identify seasonal frequencies

[Diagram: MLP with deterministic inputs I1–I4, hidden units H1–Hn and a single output Y]

Deterministic inputs for an identified seasonality S:
ψ1(t) = sin(2πt/S)
ψ2(t) = cos(2πt/S)
ψ3(t) = t
ψ4(t) = N − t + 1
(ψ3, ψ4 capture trends, level shifts, etc.)

The iterative neural filter removes each identified seasonality → explore the remaining information for additional seasonalities.
If more seasonalities are identified, add more inputs in each iteration...
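Building the deterministic input matrix for one identified seasonality can be sketched as below; the helper name and the column layout are assumptions for illustration:

```python
import numpy as np

def inf_inputs(N, S):
    """Deterministic inputs for an identified seasonality S over a series
    of length N: a sin/cos pair at the seasonal frequency plus two ramps
    for trends and level shifts (psi_1..psi_4 from the slide)."""
    t = np.arange(1, N + 1)
    psi1 = np.sin(2 * np.pi * t / S)     # psi_1(t) = sin(2*pi*t/S)
    psi2 = np.cos(2 * np.pi * t / S)     # psi_2(t) = cos(2*pi*t/S)
    psi3 = t.astype(float)               # psi_3(t) = t
    psi4 = (N - t + 1).astype(float)     # psi_4(t) = N - t + 1
    return np.column_stack([psi1, psi2, psi3, psi4])

X = inf_inputs(N=200, S=12)              # one column per deterministic input
# Each further identified seasonality contributes another sin/cos pair.
```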
![Page 9: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/9.jpg)
Iterative Neural Filter
Step 1. Identify seasonal frequencies

Iteration 1: [Figure: input time series → penalised Euclidean distance over candidate seasons → INF output]

Subtract the INF output from the input time series and repeat.

Iteration 2: [Figure: residual time series → penalised distance]

When the identified season = 1 → Stop!
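The subtract-and-repeat structure can be illustrated with ordinary least squares standing in for the neural filter (an assumption made for brevity; the INF itself is an MLP). Each pass fits and removes one identified seasonal component, and the residual is examined for further seasonality:

```python
import numpy as np

def remove_season(y, S):
    """Fit and subtract a sin/cos pair (plus level) at seasonality S.
    Least squares stands in here for the neural filter."""
    t = np.arange(len(y))
    X = np.column_stack([np.sin(2 * np.pi * t / S),
                         np.cos(2 * np.pi * t / S),
                         np.ones(len(y))])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coefs                  # residual after removing this season

t = np.arange(420)                        # 420 is a multiple of both periods
y = 500 + 30 * np.sin(2 * np.pi * t / 12) + 10 * np.cos(2 * np.pi * t / 7)
r1 = remove_season(y, 12)                 # iteration 1: remove period-12 component
r2 = remove_season(r1, 7)                 # iteration 2: remove period-7 component
# r2 is near zero: no seasonality remains, so the iteration would stop.
```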
![Page 10: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/10.jpg)
Iterative Neural Filter
Step 1. Identify seasonal frequencies
![Page 11: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/11.jpg)
Iterative Neural Filter
Step 2. Identify inputs
Fit two competing regressions:

Deterministic: Ŷ_t = a + M_t + Σ_{j=1}^{N_s} Σ_{i=1}^{S_j} b_ji · d_jit + ε_t
(d_jit: seasonal dummies for each identified seasonality S_j; M_t: moving average of order max(S_j))

Stochastic: Ŷ_t = a + Σ_{j=1}^{N_s} b_j · Y_{t−S_j} + ε_t

→ Compare using AIC
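The comparison of the two specifications can be sketched as below, with seasonal dummies against a seasonal lag. The AIC form n·ln(RSS/n) + 2k is a standard one; the paper's exact variant, and all names and data here, are assumptions:

```python
import numpy as np

def ols_aic(X, y):
    """OLS fit and AIC = n*ln(RSS/n) + 2k."""
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coefs) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(0)
S = 12
t = np.arange(20 * S)
y = 10 * np.sin(2 * np.pi * t / S) + rng.normal(0, 1, len(t))  # deterministic seasonality

# Deterministic model: one dummy per position within the season
D = ((t % S)[:, None] == np.arange(S)[None, :]).astype(float)
aic_det = ols_aic(D, y)

# Stochastic model: intercept plus the seasonal lag Y_{t-S}
Xs = np.column_stack([np.ones(len(t) - S), y[:-S]])
aic_sto = ols_aic(Xs, y[S:])

# Lower AIC wins; for this deterministic series the dummy model is preferred.
```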
![Page 12: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/12.jpg)
Iterative Neural Filter
Step 2. Identify inputs
► Use stepwise regression
• Force the pre-identified inputs (deterministic/stochastic) as initial inputs
► Input vector identified
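Forward stepwise selection with forced initial inputs might look like the sketch below. The paper's exact stepwise criterion is not given on the slide; an AIC-based forward search, and all names and data here, are assumptions:

```python
import numpy as np

def forward_stepwise(X, y, forced, candidates):
    """Forward stepwise selection by AIC, keeping the pre-identified
    (forced) columns in the model throughout."""
    def aic(cols):
        Xc = X[:, cols]
        coefs, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        rss = float(np.sum((y - Xc @ coefs) ** 2))
        return len(y) * np.log(rss / len(y)) + 2 * len(cols)

    selected = list(forced)               # forced inputs are never dropped
    best = aic(selected)
    remaining = [c for c in candidates if c not in selected]
    improved = True
    while improved and remaining:
        improved = False
        score, c = min((aic(selected + [c]), c) for c in remaining)
        if score < best:                  # add the best candidate if AIC improves
            selected.append(c)
            remaining.remove(c)
            best, improved = score, True
    return selected

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = 2.0 + 3.0 * X[:, 1] + 0.5 * rng.normal(size=n)   # only column 1 is relevant
selected = forward_stepwise(X, y, forced=[0], candidates=[1, 2, 3])
```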
![Page 13: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/13.jpg)
Experimental Setup
Time series
► Synthetic time series:
• Deterministic / Stochastic
• Four different levels of noise (None, Low, Medium, High)
• Quarterly and monthly seasonality & day-of-the-week and yearly double seasonality
• Total time series: 520
► Real time series:
• US air passenger miles
• Average bus ridership for Portland Oregon
• Total number of room nights and takings in Victoria
• Number of serious injuries and deaths in UK road accidents
![Page 14: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/14.jpg)
Results
► Use INF to identify inputs for neural networks (Primed_NN)
► Use NNs with automatically identified inputs as benchmark (Stoch_NN)
• Inputs identified using regression [Swanson & White, 98, Kourentzes, 10]
► Use exponential smoothing as statistical benchmark (EXSM)
• Robust and accurate benchmark [Makridakis & Hibon, 00]
• Use the INF seasonality output to set up the seasonal models

Synthetic Data
Subset  Primed_NN  Stoch_NN  EXSM
Train   7.25%      7.45%     7.68%
Valid   7.16%      7.47%     7.52%
Test    7.37%      7.70%     7.47%

Real Data
Subset  Primed_NN  Stoch_NN  EXSM
Train   8.52%      7.86%     7.82%
Valid   5.06%      5.91%     6.83%
Test    7.86%      11.95%    10.72%
![Page 15: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/15.jpg)
Conclusions
► Proposed methodology identifies seasonal frequencies and inputs for neural networks automatically
► Outperforms statistical and neural network benchmarks
► INF is useful to fully automate other forecasting methods → seasonal frequency identification without the need for human experts
► Future work: Introduce stochastic elements in the INF to separate the seasonal components more accurately
![Page 16: Frequency independent automatic input variable selection](https://reader030.vdocuments.us/reader030/viewer/2022040720/624d1af86eb7105f793d5a4b/html5/thumbnails/16.jpg)
Nikolaos Kourentzes
Lancaster University Management School
Centre for Forecasting
Lancaster, LA1 4YX, UK
Tel. +44 (0) 7960271368
email [email protected]