mind the gap? - willkommen an der zhaw · arima sarimax multiple regression forecast combination...
TRANSCRIPT
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 1
Mind the Gap?
Hype-Cycle versus Business Reality in Data Science (a forecasting perspective)
Sven F. Crone
16/09/2016
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 2
• The Hype?• Gartner’s Hype Cycle over time
• The Gap?• Is there a Gap?• If so, how large is it?
• How to close the Gap?• More research?• More applications?
Mind the Gap?
Agenda
Hype-Cycle for Emerging Technologies
New Technologies make bold promises
how to discern the hype
from what's commercially viable?Should you make an early move?
Take a moderate approach?
Wait for further maturation?
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 3
Hype-Cycle for Emerging Technologies
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 4
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 5
We are on
the plateau
of productivity!
Are we all on
the plateau
of productivity?
In 2015 Gartner dropped Big Data,
Analytics, and Data Science!
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 6
Hype Cycle for Advanced Analytics
… looks like we are
back in the hype!
(and have 5 more years)
http://www.kdnuggets.com/2016/09/poll-algorithms-used-data-scientists.html/2
The average 2016 respondent used 8.1 algorithms, a big increase vs a
similar poll in 2011.
unclear which TS algorithms are used?
844 voters
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 7
Brown (1956)
Box & Jenkins (1970)
Rumelhart & McClelland (1986).
“the number of components in integrated circuits (IC) has doubled
every year from its invention in 1958 until 1965” predicted that the trend would continue "for at least ten years"
Moore (1965)
Breiman (1996)
Breiman
(2001)
Freund & Schapire
(1997)
Boser, Guyon, and Vapnik (1992)
Harvey (1989)
Quinlan (1986)
Hochreiter & Schmidhuber
(1997)
“Data Science” 1997
“Data Mining” 1995
“Analytics” 200s
Cover & Hart (1967)
• The Hype?– Gartner’s Hype Cycle over time
• The Gap?• Is there a Gap?• If so, how large is it?
• How to close the Gap?• More research?• More applications?
Mind the Gap?
Agenda
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 8
Evidence from Forecasting Practice?
Practitioner Survey• Target group: demand planning & forecasting professionals (in manufacturing)• LCF Mailing list, forecasting lists / blogs (ISF, SAS), 100s of
LinkedIn Groups, 2000+ personalised LinkedIn invites
540 responses (representative) 200 valid surveys in manufacturing
Robust Questionaire Design• Pilot study in 2011 (to ensure validity)• Final pre-version pre-tested with 18 FMCG forecasters• Conducted January 2012-August 2012• Validity for large manufacturers active in the FMCG / CPG industry
Evidence from Forecasting Practice?
Use of judgment introduces (cognitive) biases
Use of judgment limits automation!
(i.e. “Statistics” aka Data Science or “Gut-Feel”?)
70% of industry forecasters
employ some form of judgment
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 9
Evidence from Forecasting Practice?
43% use simplest methods only for level time series
(28% MA+15% Naive)
43% use simplest methods only for level time series
(28% MA+15% Naive)
75% use simplest methods only for level time series (32% ETS + 28% MA+15% Naive)
very limited use of post 1980s
methods
Use of methods developed & implemented in the 1950s-1960s
Only very few methods use explanatory
variables
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 10
Does your vehicle fleet look like this?
BMW Isetta (1956-1962)
Austin Healey 3000 (1959)
Working with IBM type 704 electronic data
processing machine used for making computations for aeronautical research.
21 March 1957, NASA
Does your hardware look like this?
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 11
Do your employees look like this?
Then why do your
algorithms look like this?
𝑳𝒕 = 𝜶 ∙𝒀𝒕𝑺𝒕−𝒔
+ 𝟏 − 𝜶 ∙ 𝑳𝒕−𝟏 + 𝑻𝒕−𝟏
𝑻𝒕 = 𝜷 ∙ 𝑳𝒕 + 𝑳𝒕−𝟏 + 𝟏− 𝜷 ∙ 𝑻𝒕−𝟏
𝑺𝒕 = 𝜸 ∙𝒀𝒕𝑳𝒕+ 𝟏− 𝜸 ∙ 𝑺𝒕−𝒔
𝑭𝒕+𝒉 = 𝑳𝒕 + 𝑻𝒕 ∙ 𝒉 ∙ 𝑺𝒕−𝒔+𝒉
Enos @ Mercury-Atlas 5
Juri Gararin
John F. Kennedy is 35th President of the USA
Pigs‘s bay invasion
Then why do your
algorithms look like this?
𝑳𝒕 = 𝜶 ∙𝒀𝒕𝑺𝒕−𝒔
+ 𝟏− 𝜶 ∙ 𝑳𝒕−𝟏 + 𝑻𝒕−𝟏
𝑻𝒕 = 𝜷 ∙ 𝑳𝒕 + 𝑳𝒕−𝟏 + 𝟏− 𝜷 ∙ 𝑻𝒕−𝟏
𝑺𝒕 = 𝜸 ∙𝒀𝒕𝑳𝒕+ 𝟏− 𝜸 ∙ 𝑺𝒕−𝒔
𝑭𝒕+𝒉 = 𝑳𝒕 + 𝑻𝒕 ∙ 𝒉 ∙ 𝑺𝒕−𝒔+𝒉
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 12
𝑳𝒕 = 𝜶 ∙𝒀𝒕𝑺𝒕−𝒔
+ 𝟏− 𝜶 ∙ 𝑳𝒕−𝟏 + 𝑻𝒕−𝟏
𝑻𝒕 = 𝜷 ∙ 𝑳𝒕 + 𝑳𝒕−𝟏 + 𝟏− 𝜷 ∙ 𝑻𝒕−𝟏
𝑺𝒕 = 𝜸 ∙𝒀𝒕𝑳𝒕+ 𝟏− 𝜸 ∙ 𝑺𝒕−𝒔
𝑭𝒕+𝒉 = 𝑳𝒕 + 𝑻𝒕 ∙ 𝒉 ∙ 𝑺𝒕−𝒔+𝒉
Enos in Mercury-Atlas 5
Juri Gararin in space
John F. Kennedy is 35th President of the USA
US in Pigs‘s bay
New Tools!
“It can drive a 6-D nail thru a
2 X 4 at 200 yards”
New Tools?
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 13
Brown (1956)
Box & Jenkins (1970)
Rumelhart & McClelland (1986).
Breiman (1996)
Breiman
(2001)
Freund & Schapire
(1997)
Boser, Guyon, and Vapnik (1992)
Pearl & Reed (1920)!
Harvey (1989)
„Maturity“
Today
Quinlan (1986)
Hochreiter & Schmidhuber
(1997)
Gap?
How to measure
„algorithm maturity“?
dog has ø10 years life expectancy
= 5.6 generations
= 403 dog years
IT-Hardware §253 Abs. 3 Satz 1 HGB
Mainframes: ø7 years depreciation
Workstations: ø3 years
=11.2 generations
= 784 IT years
1:1 in human years?
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 14
• The Hype?– Gartner’s Hype Cycle over time
• The Gap?• Is there a Gap?• If so, how large is it?
• How to close the Gap?• More research?• More applications?
Mind the Gap?
Agenda
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 15
ISI Web of Science ti=((neural netw*) or (perceptron*) or (deep learn*)) and (forecast* or (time series*))
3900 papers on Neural Networks (only in forecasting, 117,923 in total)
377 papers on Exponential Smoothing
ISI Web of Science ti=(exponential smoothing)
5,657 in topic
327,693 in topic
Upgrade to 1970s technology … ?
Siemens System 4004
Intel 4004 (4-bit bus,
108-KHz CPU chip, 1971)
OCC1 Osborne 1
Personal Business Computer
busicom calculator 141P
(Intel 4004 powered Datapoint 2200 (1972)
ARIMA
SARIMAX
Multiple Regression
Forecast Combination (Ensembles)
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 16
Upgrade to 1980s technology … ?
…
Advanced Exponential Smoothing
GETS Multiple Regression, SCANPRO
Double & Triple Seasonality ETS
Bootstrapping, zero-inflated demand
Upgrade to 1990s technology … ?
State space models
Combination (Ensembles)
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 17
Upgrade to 2000s technology … ?
Artificial
Neural
Networks
A Network of NeuronsNew Tools
Each MLP calculates a NARX(p)-process
0
50
100
150
200
... ...
yt
yt-1
u1
un+m+1
un+3
un+2
un+1
u2
yt-2
yt-n-1
u3
un
...
un+m
un+m+2
un+m+h
θ n+1
q n+2
q n+3
q n+m
q n+m+1
q n+m+2
q n+m+h...
...
1ˆ
ty
2ˆ
ty
ˆt hy
fu(·)
fu(·)
fu(·)
fu(·)
fu(·)
fu(·)
fu(·)
1ˆ , ,...,t h t t t n t hy f x x x
Multilayer Perceptrons Class of statistical methods for (nonlinear auto-)regression Combination of simple processing units complex system behaviour
Squared Error Loss SE(et)
2 1 1 2ep
2
1.5
1
0.5
0.5
1
1.5
2
epR
R 4
R 2 SE e
R 1 AE e
Verlustfunktion
Multilayer
Perceptron
(MLP)
NARX(p)
process
NARX(p)
process
NARX(p)
process
NARX(p)
process
Weighted
average of
4 NARX(p)
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 18
Bootstrap and Aggregating
1)
01234
1 4 7 10 13 16 19 22
Fre
qu
en
cy
0
5000
10000
Val
ue
M1(x)S’1(x,y)
0
5000
10000
1 4 7 10 13 16 19 22
0
5000
10000
Val
ue
0
1
2
3
4
1 4 7 10 13 16 19 22
Fre
qu
en
cy
2)
M2(x)S’2(x,y)
0
5000
10000
1 4 7 10 13 16 19 22
3)
01234
1 4 7 10 13 16 19 22
Fre
qu
en
cy
0
5000
10000
Val
ue
M3(x)S’3(x,y)
0
5000
10000
1 4 7 10 13 16 19 22
Combine
0
2000
4000
6000
8000
1 4 7 10 13 16 19 22
Bagged output:
Me(x) = M1(x) + M2(x) + ... + Mk(x)
k
18-Aug-2007 01-Dec-2007 15-Mar-2008 28-Jun-2008 11-Oct-2008 24-Jan-2009 09-May-2009 22-Aug-2009 05-Dec-20090
500
1000
1500
2000
2500
3000
3500Training Out-of-Sample
Time Series: 5 | Product: 1080215 (Trend: 0, Season13: 0, Season52: 1)
f(X) = +125.37*Constant +0.71*Lag1 +0.07*Lag2 +610.76*Adcode: 401 +617.24*Discount -1569.46*Xmas -696.48*Xmas+1 +1209.29*Xmas-2 -1581.59*Easter-1 -894.91*Labour
Data
NN-R
Regression
Fit appropriate forecasting models to:
• Predict future value
• Understand effect (elasticity) of each exogenous factor affect forecasts
Retailing and promotional modelling
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 19
Post-regression
Simulated time series of AR(3)X with weather & price variables
Models nonlinear interactive Demand-Price Elasticity
Reality is highly nonlinear, especially when considering multiple variables.
Neural networks (and other advanced nonlinear causal models) are a way forward
20 40 60 80 1000
0.5
1
1.5
2
2.5
3
3.5x 10
4
Xt
Ft
Actual
Regression
NN
0 100 200 300 400 500 600 700 800 900500
1000
1500
Time
Units
0 100 200 300 400 500 600 700 800 9000
100
200
300
Time
Price
20 40 60 80 100 120 140 160 180 2000
2000
4000
6000
Time
Units
20 40 60 80 100 120 140 160 180 2000
10
20
30
40
Time
Tem
pera
ture
0 10 20 30-2000
0
2000
4000
6000
8000
Xt
Ft
Actual
Regression
NN
4060
80100
120140
-10
010
2030
40-2
0
2
4
6
x 104
Xt2X
t3
Ft5
Actual
Regression
NN
Improved Forecasting Algorithms
Median sMAPE Train Valid Test
Seasonal Linear Regression (35B) 38.41609 16.61117 18.20583
MLP AR, SinCos + Selection 17.82759 11.73440 9.13354
Improvement -20.5885 -4.87677 -9.07229
Improvement in % -53.59% -29.36% -49.83%
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 20
Improved Model Selection: Human expert vs. system?
sMAPE Test Errors in %Prior
ModelIF
ModelChange %
pointsChange in %
# items
Canada 40,7 33,8 -6,9 -16,9% 47Germany 55,4 51,7 -3,7 -6.8% 155France 43,7 42,6 -1,2 -2.7% 262Greece 50,9 49,4 -1,7 -3.3% 196Italy 42,7 39,9 -2,8 -6.5% 175Netherlands 41,0 38,9 -2,1 -5.1% 154Poland 55,2 47,1 -8,1 -14.7% 78South Africa 37,3 35,9 -1,4 -3.7% 36Average -7.5%
• Automatic Model Selection able to improve slightly on Expert
• Automatic Model Selectio n reduces work significantly
How to define if things “look” similar?
5 10 15 20 250
200
400
600
800
1000
1200
All time series with up to 27 months of observations
Hard to extract useful information
1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 30
200
400
600
800
1000
1200
1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 30
200
400
600
800
1000
1200
No additional information
Find time series that
behave similarly
1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 30
200
400
600
800
1000
1200
Self-Organising Maps
clustering can do it
1 2 30
500
1000
1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 30
200
400
600
800
1000
1200
1 2 30
500
1000
1 2 30
500
1000
1 2 30
500
1000
Focus on the first three months (new products)
New Product Clustering
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 21
Outlier Identification
02/04/01 04/04/01 06/04/01 08/04/01 10/04/01 12/04/01 14/04/01 16/04/01 18/04/01 20/04/01 22/04/01 24/04/01 26/04/01 28/04/01 29/04/015
6
7
8
9
10x 10
4
Date
Load
Week 1 Week 2 Week 3 Week 4
2 4 6 8 10 12 14 16 18 20 22 24
4
6
8
10
12
x 104
Hour
Normal
Outlier
1. Outlying daily patterns• Functional outliers - in level & shape
• Predictable reasons: Bank Holidays
• Unpredictable: Natural disasters, hurricanes, Industrial
actions, etc
2. Outlying Spikes within daily pattern• Special occurrences during normal day
Newer (& better?)
algorithms exits!
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 22
Extend your
Tool Box!
Top 10 algorithms in data mining
Wu, Kumar, Quinlan, Ghosh, Yang, Motoda, Hand, Steinberg (2008)
Artificial Neural Networks,
Support Vector Regression
C4.5, k-Means, SVM, Apriori, EM,
PageRank, AdaBoost, kNN,
Naive Bayes, and CART
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 23
Ok for some tasks!
Old Tools!
Not most efficient
for all tasks!
Worlds largest tree-house – 10 stories tall
Old Tools!
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 24
New Tools!
Useless …
for the task!
New Tools!
Useless.
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 25
Beware!
Exponential
Smoothing in
the cloud … is
still Exponential
Smoothing!
New Old Tools!
“Buying the most expensive
Hammer …
…does not make you
the best Carpenter!”
Beware!
Crafstman
or Novice?
(analyst &
IT team … )
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 26
APO DP: 78% (45% ETS + 22% Av+ 11% Naïve) use simple methods!
APO drives use of simpler methods than other software packages
Study Results (all industries)Use of Forecasting Methods
Exp.Smoothing is the single most used method
family in APO DP
Only marginal use of Linear Regression in SAP APO DP
Excel drives useof even simpler
methods
(low-cost) specialist software
drives use ofcomplex methds
Ask yourself …
Can you afford
to ignore 60 years
of progress?
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 27
Artificial Intelligencemeets Forecasting.
Enhanced alorithms & model selectionfor
Mind the Gap in Data Science©2016 Sven F. Crone | All rights reserved. page 28
Take aways
• The hype is real– … but it‘s just a hype you can use!
• The gap is substantial. – Companies are slow adopters to algorithms– (not all) Software companies are innovators
• The potential is substantial– High opportunities from low-cost pilot studies– Try new algorithms!
• Neural Networks• Support Vector Regression• Decision Trees• K-Nearest Neighbours• …