Forecast, Detect, Intervene: Anomaly Detection for Time Series
Deepak Agarwal
Yahoo! Research
Outline
• Approach
  – Forecast
  – Detect
  – Intervene
• Monitoring multiple series
  – Multiple testing, a Bayesian solution
• Application
Issues
• $\{y_t\}$: univariate, regularly spaced time series to be monitored prospectively for anomalies, “novel events”, surprises.
  – E.g. query volume, hang-ups, ER admissions
• Goal: a semi-automated statistical approach
  – Forecast accurately: good baseline model.
  – Detect deviations from baseline:
    • sensitivity/specificity/timeliness
  – Baseline model adaptive: learn changes automatically
    • Important in applications: better forecasts → fewer false positives
Approach
• Three components (West and Harrison, 1976):
  – Forecast: Bayesian version of Kalman filter
  – Detection: a new sequential algorithm
  – Intervention: correct baseline model
Forecasting
Kalman filter
• Observation equation
  – Conditional distribution of data given parameters
• State equation
  – Evolution of parameters (states) through time
• Posterior of states, predictive distribution
  – Estimated online by recursive algorithm
OBSERVATION EQUATION
$$y_t = \mu_t + v_t, \qquad v_t \sim N(0, V_t)$$
• $y_t$: observed value
• $\mu_t$: true mean (unknown and estimated from data)
• $v_t$: noise with variance $V_t$ (usually unknown and estimated from data)
The $y_s$'s are independent conditional on the $\mu_s$'s, i.e., if the truth $\mu_t$ is known, the $y_s$'s ($s \le t-1$) provide no information to predict the future.
This model severely overfits (more unknowns than knowns). Need to make simplifying assumptions to estimate the true mean surface $\{\mu_t : t \le T\}$.
STATE EQUATION: What assumptions are appropriate for the “Truth”?
• Simple: $\mu_t = \mu$ for all $t$.
  – Too simple, performs poorly unless mean stationary. (Rarely works on real data.)
• Determined by system dynamics: e.g. position of a particle at time $t$ described by some differential equation.
  – (Wikle, 1998) studies ecological processes using reaction-diffusion equations.
• Simple Markovian assumptions: works well for empirical data analysis.
  – Assumes $\mu_t$ is a function of the true mean at previous time points.
  – E.g. simple random walk: $\mu_t = \mu_{t-1} + w_t$, $w_t \sim N(0, W_t)$.
  – NB: $W_t = 0$ gives back the constant model.
More general models
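• Observation equation: $y_t = x_t^T \theta_t + v_t$, $v_t \sim N(0, V_t)$
• State equation: $\theta_t = G_t \theta_{t-1} + w_t$, $w_t \sim N(0, W_t)$
• $\theta_t$ is updated using the Kalman filter.
• One can take almost any static model and make it dynamic.
  – E.g., $\theta_t$ is a 7-dim vector corresponding to day-of-week effects.
  – Covariates whose coefficients evolve dynamically.
[Figure: state $\theta_{t-1}$ evolves to $\theta_t$ through $G_t$; observations $Y_{t-1}$, $Y_t$ depend on the states through $x_{t-1}$, $x_t$.]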
Kalman Filter update at time t
(a) Posterior for $\mu_{t-1}$: $(\mu_{t-1} \mid D_{t-1}) \sim N(m_{t-1}, C_{t-1})$
(b) Prior for $\mu_t$: $(\mu_t \mid D_{t-1}) \sim N(a_t, R_t)$
    Prior parameters derived by plugging posterior estimates in state equation: $a_t = m_{t-1}$; $R_t = C_{t-1} + W_t$
(c) Likelihood of $y_t$: $(y_t \mid \mu_t) \sim N(\mu_t, V_t)$
(d) Posterior for $\mu_t$: $(\mu_t \mid D_t) \sim N(m_t, C_t)$
    $m_t = m_{t-1} + A_t (y_t - m_{t-1})$; $C_t = A_t V_t$; $A_t = R_t / (R_t + V_t)$
    $m_t = A_t y_t + (1 - A_t) m_{t-1}$
If $A_t \to A$, this gives the well known exponentially weighted moving average (EWMA). Under mild conditions, this is true.
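The update for the random-walk model can be sketched in a few lines of Python. This is a minimal illustration, assuming the observation variance `V` and evolution variance `W` are known numbers at each step; the function name `local_level_update` is hypothetical, and the standardized residual `u` it returns is the one-step forecast error divided by its predictive standard deviation.

```python
import math

def local_level_update(m_prev, C_prev, y, V, W):
    """One Kalman step for y_t = mu_t + v_t, mu_t = mu_{t-1} + w_t."""
    a = m_prev                    # prior mean a_t = m_{t-1}
    R = C_prev + W                # prior variance R_t = C_{t-1} + W_t
    Q = R + V                     # one-step predictive variance of y_t
    u = (y - a) / math.sqrt(Q)    # standardized forecast residual
    A = R / Q                     # adaptive coefficient A_t
    m = a + A * (y - a)           # posterior mean m_t
    C = A * V                     # posterior variance C_t
    return m, C, A, u

# Filter a short series, carrying (m, C) forward from step to step.
m, C = 0.0, 1.0
for y in [1.0, 1.2, 0.9, 1.1]:
    m, C, A, u = local_level_update(m, C, y, V=1.0, W=0.1)
```

Because `m` is a weighted average of the new observation and the previous estimate, a constant `A` reduces the loop to an EWMA.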
Estimating Variance components
• Assume $W_t \propto C_{t-1}$: $W_t = \frac{1-\delta_w}{\delta_w} C_{t-1}$, so $R_t = C_{t-1}/\delta_w$ $(0 < \delta_w < 1)$.
• $\delta_w$ is called the "discount" factor; it makes the prior at time $t$ vague so that current estimates are influenced more by the recent past.
  – $\delta_w \to 0$: prior too vague.
  – $\delta_w \to 1$: prior too tight, close to a static model.
• $\delta_w$ plays a role similar to window size in streaming algorithms.
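• In practice, it is better to be conservative and choose a smaller $\delta_w$. Reason: $\mathrm{Var}(e_t \mid D_0) = Q_0 \left(1 + (\delta_w - \delta_0)^2 / (1 - \delta_w^2)\right)$, where $\delta_0$ is the true discount factor.
• We select $\delta_w$ using initial data and re-adjust the value later if needed.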
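To illustrate the window-size analogy, the sketch below iterates the discounted recursion with a constant observation variance and checks where the adaptive coefficient settles. It assumes the discount enters as $R_t = C_{t-1}/\delta_w$; the function name `discounted_gain` and the starting value `C0` are hypothetical.

```python
def discounted_gain(delta_w, V=1.0, C0=10.0, steps=2000):
    """Iterate R_t = C_{t-1} / delta_w and return the final gain A_t."""
    C = C0
    A = None
    for _ in range(steps):
        R = C / delta_w      # discounting inflates the prior variance
        A = R / (R + V)      # adaptive coefficient A_t
        C = A * V            # posterior variance C_t
    return A

for delta_w in (0.7, 0.9, 0.99):
    print(delta_w, round(discounted_gain(delta_w), 4))
```

Under these assumptions the gain converges to $1 - \delta_w$, so a discount of 0.9 behaves like an EWMA that puts weight 0.1 on the newest observation.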
Detection
An existing method
• Related work (Gargallo and Salvador, '05)
• Residual when predictive distribution is Gaussian: $u_t = (y_t - E(y_t \mid D_{t-1})) / \sqrt{V(y_t \mid D_{t-1})}$
• Under null, $u_t$'s are iid $N(0, 1)$.
• A sequential algorithm based on Bayes factor for $u_t$'s.
• Detects: a) outlier b) mean shifts c) variance changes d) auto-correlated errors.
Pitfalls of GS
• What if predictive not Gaussian?
  – Mixtures of Gaussians, Poisson etc.
• Bayes factor: specify alternative explicitly
  – Large number of unspecified parameters
  – Require explicit model for each alternative
Our approach
• Normal scores derived from p-values
  – Good for continuous, approximately good for discrete, especially for large means.
• A sequential procedure with far fewer tuning parameters.
• Our method has more power, but we sacrifice some timeliness.
Sequential detection procedure
At time t, we are in one of these regions:
• Acceptance region (A): The null model is true, the system is behaving as expected, no anomalies. Start a new run.
• Rejection region (R): The null model is not true; an anomaly is generated, which is reported to the user and/or the forecasting model is reset. Start a new run.
• Continue (C): Don’t have enough data to reach a decision; keep accumulating evidence by taking another sample.
Detecting outliers and mean shifts
• Outlier: $|u_t| > a$.
• All other changes are based on the most recent run of $r$ $(> 1)$ points in C.
• Mean shifts, $M_h$: $u_{t-i}$'s are iid $N(h, 1)$ $(|h| > 0)$.
  – Test based on $\bar{u}_r = \sum_{i=0}^{r-1} u_{t-i} / r \sim N(0, 1/r)$ under null.
  – Test statistics: $Pmean_{inc} = P_{h=0}(\bar{u}_r \ge \bar{u}_r^{obs})$ (to detect mean increase); $Pmean_{dec} = P_{h=0}(\bar{u}_r \le \bar{u}_r^{obs})$ (to detect mean decrease).
• Large shifts would be identified as outliers. Moderate shifts, when identified, can provide important information about the system.
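The outlier check and the two mean-shift p-values can be sketched as follows. This is only an illustration, assuming the run holds the r most recent standardized residuals and that their average is N(0, 1/r) under the null; the function names are hypothetical and the default threshold `a=3.5` is the value quoted later in the slides.

```python
from statistics import NormalDist, fmean

def is_outlier(u_t, a=3.5):
    """Flag a single standardized residual that exceeds the threshold a."""
    return abs(u_t) > a

def mean_shift_pvalues(run):
    """run: the r most recent standardized residuals (null: iid N(0, 1))."""
    r = len(run)
    u_bar = fmean(run)
    null = NormalDist(0.0, (1.0 / r) ** 0.5)   # u_bar ~ N(0, 1/r) under null
    p_inc = 1.0 - null.cdf(u_bar)              # small when the mean has increased
    p_dec = null.cdf(u_bar)                    # small when the mean has decreased
    return p_inc, p_dec

p_inc, p_dec = mean_shift_pvalues([0.8, 1.1, 0.9, 1.3])
```

A run of moderately positive residuals gives a small `p_inc` even though no single point would be flagged by `is_outlier`.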
Detecting variance shifts
• $M_k$: $u_{t-i}$'s are iid $N(0, k^2)$.
• Test based on $U_r^2 = \sum_{i=0}^{r-1} u_{t-i}^2 \sim \chi^2_r$ (df $= r$) under null.
• Test statistics: $Pvar_{inc} = P_{k=1}(U_r^2 \ge U_r^{2(obs)})$ (to detect variance increase); $Pvar_{dec} = P_{k=1}(U_r^2 \le U_r^{2(obs)})$ (to detect variance decrease).
• A large variance increase will be identified as an outlier; a moderate increase, when detected, can be important in applications.
• Variance decrease indicates the system getting stable or the discount parameters being set too low. Helps in improving the forecasting model; may not be that important to the user.
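A sketch of the variance-shift p-values, assuming the test statistic is the sum of squared residuals over the run and is chi-square with r degrees of freedom under the null. The helper `chi2_sf` is a hypothetical stdlib-only implementation of the chi-square upper tail for integer degrees of freedom (built from the standard incomplete-gamma recurrence); a statistics library would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with integer df exceeds x)."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # df = 2 closed form
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # df = 1 closed form
    while a < df / 2.0:                       # step up two degrees of freedom at a time
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def variance_shift_pvalues(run):
    """run: the r most recent standardized residuals (null: iid N(0, 1))."""
    r = len(run)
    U2 = sum(u * u for u in run)
    p_inc = chi2_sf(U2, r)     # small when the residuals are too spread out
    p_dec = 1.0 - p_inc        # small when the residuals are too concentrated
    return p_inc, p_dec
```

For example, `variance_shift_pvalues([2.5, -2.8, 2.2, -2.6])` gives a small `p_inc` although every point is below an outlier threshold of 3.5.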
Gradual changes, auto correlated errors.
• Local patches of auto-correlated errors (Gargallo and Salvador, 2003).
• Roughly, the residuals follow an AR(1) process.
  – $M_g$: $u_i = g\,u_{i-1} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$ $(0 < g < 1)$.
• Residuals tend to be positive (negative) more often than negative (positive). Gradual ramp up (down) in the residuals, short run of consecutive moderate changes.
• Report the change, relax the discount factors to learn the gradual shifts.
• Use two test statistics:
  – $\hat{g} = \sum_{i=0}^{r-2} u_{t-i} u_{t-i-1} / \sum_{i=0}^{r-2} u_{t-i-1}^2 \sim N(0, 1/r)$ under null for large $r$. For small $r$, the exact distribution is not available in closed form.
  – [Second statistic, a function of $u_t$ that is normal with mean 0 under null; its formula is garbled in the source.]
• Test statistics: $AC_1 = P_{g=0}(\hat{g} > \hat{g}^{(obs)})$; $AC_2 = \min\left(P_{g=0}(\hat{g} > \hat{g}^{(obs)}),\ P_{null}(|u_t| > |u_t^{obs}|)\right)$.
Distribution of $\hat{g}$ under $g = 0$
• $r = 2$: Cauchy with scale 1 and center 0.
• $r > 2$: analytical form not known, well approximated by a mixture of two normals: $f(\sqrt{r}\,\hat{g}_r) = P\,N(0, 1) + (1 - P)\,N(0, 1 + \tau)$.
• $(P, \tau)$ estimated using simulations from the distribution.
• For $r > 7$, the normal approximation is good.
r P τ
3 .84 14.91
4 .87 4.40
5 .96 3.77
6 .98 3.98
7 .99 3.66
8 .98 0.10
9 .98 0.10
10 .85 0.09
11 .85 0.09
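The null distribution of the autocorrelation statistic is easy to explore by simulation. The sketch below assumes the statistic is the ratio of lagged cross-products to lagged sums of squares over a run of r residuals; the function names and the fixed seed are hypothetical choices. For r = 2 the ratio is a quotient of two independent standard normals, so about half of the draws should exceed 1 in absolute value, as expected for a standard Cauchy.

```python
import random

def g_hat(run):
    """Lag-one autocorrelation statistic for a run listed oldest to newest."""
    num = sum(run[i] * run[i - 1] for i in range(1, len(run)))
    den = sum(run[i - 1] ** 2 for i in range(1, len(run)))
    return num / den

def simulate_null(r, n_sims=20000, seed=0):
    """Draw n_sims values of g_hat from runs of r iid N(0, 1) residuals."""
    rng = random.Random(seed)
    return [g_hat([rng.gauss(0.0, 1.0) for _ in range(r)])
            for _ in range(n_sims)]

draws = simulate_null(r=2)
frac = sum(abs(g) > 1.0 for g in draws) / len(draws)
```

Repeating the simulation for larger r and comparing the empirical quantiles with those of N(0, 1/r) shows how quickly the normal approximation becomes usable.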
The sequential algorithm at time t
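• $r = 1$: if $|u_t| > a$, declare an outlier; else continue.
• $r > 1$: if $|u_t| > a$, declare an outlier; else compute
  – $S_{max} = \max\left(\Phi^{-1}(1 - (Pmean_{inc}, Pmean_{dec}, Pvar_{inc}, Pvar_{dec}, AC_2))\right)$
  – $i_{max} = \arg\max\left(\Phi^{-1}(1 - (Pmean_{inc}, Pmean_{dec}, Pvar_{inc}, Pvar_{dec}, AC_2))\right)$
  – $S_{max} < c_1$: accept null; $S_{max} > c_2$: declare anomaly of type $i_{max}$; $c_1 \le S_{max} \le c_2$: continue.
• $a = 3.5$, [symbol lost in source] $= .25$, $c_1 = 1.2$, $c_2 = 3.0$ for 5% false positive rate.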
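The decision rule can be sketched as a single function. This is an illustration under stated assumptions, not the exact procedure from the slides: p-values are converted to normal scores with $\Phi^{-1}(1 - p)$ so that small p-values give large scores, the thresholds default to the quoted values a = 3.5, c1 = 1.2, c2 = 3.0, and the names `sequential_step` and `LABELS` are hypothetical.

```python
from statistics import NormalDist

LABELS = ("mean increase", "mean decrease", "variance increase",
          "variance decrease", "autocorrelation")

def sequential_step(u_t, pvalues, r, a=3.5, c1=1.2, c2=3.0):
    """Return (decision, label) for the current run of length r."""
    if abs(u_t) > a:
        return "anomaly", "outlier"
    if r == 1:
        return "continue", None
    eps = 1e-12  # keep probabilities strictly inside (0, 1) for inv_cdf
    scores = [NormalDist().inv_cdf(min(max(1.0 - p, eps), 1.0 - eps))
              for p in pvalues]
    s_max = max(scores)
    i_max = scores.index(s_max)
    if s_max < c1:
        return "accept", None          # null accepted, start a new run
    if s_max > c2:
        return "anomaly", LABELS[i_max]
    return "continue", None            # not enough evidence yet
```

`pvalues` is expected in the order given by `LABELS`; the caller resets the run after an "accept" or "anomaly" decision and extends it after "continue".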
Blue: ours; red: Gargallo and Salvador (GS)
Intervention
Intervention to adjust the baseline.
• Outlier → a tail or rare event has occurred
  – Ignore points → short tail; more false positives
  – Use points → elongated tail; more false negatives
• A robust solution: ignore points but elongate tail
  – Retain same prior mean, increase prior variance.
  – System adapts, re-initializing the monitor.
• Use the above for mean shifts and variance increase.
• Variance decrease: system stable, make prior tight.
• Slow changes: system under-adaptive, make prior vague.
Intervention strategy
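• Outlier, mean shift, variance increase: $(\mu_t \mid D_{t-1}) \sim N(m_{t-1}, m^2 R_t)$ $(m = 2, 3)$; $\delta_v^{new} = \max(.9, .95\,\delta_v)$.
• Variance decrease: $\delta_w^{new} = \min(1, 1.025\,\delta_w)$; $\delta_v^{new} = \min(1, 1.025\,\delta_v)$.
• Slow change, positive autocorrelation, persistent blips: $\delta_w^{new} = \max(.7, .95\,\delta_w)$; $\delta_v^{new} = \max(.9, .95\,\delta_v)$.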
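The intervention rules amount to a small lookup on the type of anomaly. The sketch below is hedged in two ways: it assumes the prior variance is inflated to m squared times R_t after an outlier, mean shift or variance increase (the exact power of m is not certain from the slides), and it assumes two discount factors, here called `delta_w` (state) and `delta_v` (variance). The function name `intervene` and the string labels are hypothetical.

```python
def intervene(kind, delta_w, delta_v, R_t, m=2.0):
    """Return (delta_w, delta_v, R_t) after an intervention of the given kind."""
    if kind in ("outlier", "mean shift", "variance increase"):
        # Keep the prior mean, widen the prior, relax the variance discount.
        return delta_w, max(0.9, 0.95 * delta_v), m * m * R_t
    if kind == "variance decrease":
        # System is stable: tighten both discounts, capped at 1.
        return min(1.0, 1.025 * delta_w), min(1.0, 1.025 * delta_v), R_t
    if kind == "slow change":
        # System is under-adaptive: make the prior vaguer, with floors.
        return max(0.7, 0.95 * delta_w), max(0.9, 0.95 * delta_v), R_t
    return delta_w, delta_v, R_t   # no intervention

delta_w, delta_v, R_t = intervene("slow change", 0.95, 0.98, R_t=1.0)
```

The floors (0.7 and 0.9) and the cap (1) keep repeated interventions from driving the discounts to extreme values.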
No intervention, m=1
Strong intervention, m=3
Example: Blue is data, yellow is forecast.
Multiple testing
Multiple testing: A Bayesian Approach.
• Monitoring a large number of independent streams
  – Testing multiple hypotheses at each time point
  – Need correction for multiple testing.
• Main idea:
  – Derive an empirical null based on observed deviations.
  – Present analyst with interesting cases, adjusting for global characteristics of the system.
  – We use a Bayesian approach to derive shrinkage estimates of deviations.
  – The “shrunk” deviations automatically build in a penalty for conducting multiple tests.
Bayesian procedure.
• $u_{st}$: normal score based on p-value of predictive distribution.
• $u_{st}$ is modeled as normal around an unknown deviation, which has a two-component mixture prior. [Exact model equations are garbled in the source.]
• Detect anomalies by monitoring posterior means of the deviations. [Formulas for the posterior mean, the posterior odds $q_{st}/(1 - q_{st})$ and the shrinkage factor $B_t$ are garbled in the source.]
• Two component: prevents over-shrinkage of large residuals.
• Hyperparameters: estimated using EB (large number of time series).
• Fully Bayesian: Gauss-Hermite (moderate number of time series).
Experiment comparing multiple testing versus naïve procedure (threshold raw standardized residuals)
• Simulate K noise points N(0,1) (K = 500, 1000, ...), 100 signal points from [2, 11] ∪ [−11, −2].
• Adjust threshold of Bayesian residuals to match sensitivity of naive procedure.
• Compute False Discovery Rate (FDR) for both procedures.
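The naive half of this experiment is simple to reproduce. The sketch below assumes the 100 signal points are drawn uniformly from the two intervals and that the naive procedure flags any point whose absolute value reaches a fixed threshold; the threshold of 2.0, the seed and the function name `naive_fdr` are hypothetical choices, and the Bayesian procedure is not reproduced here.

```python
import random

def naive_fdr(K, n_signal=100, threshold=2.0, seed=0):
    """FDR of thresholding |x| for K N(0, 1) noise points plus signal points."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(K)]
    signal = [rng.choice((-1, 1)) * rng.uniform(2.0, 11.0)
              for _ in range(n_signal)]
    false_pos = sum(abs(x) >= threshold for x in noise)
    true_pos = sum(abs(x) >= threshold for x in signal)
    flagged = false_pos + true_pos
    return false_pos / flagged if flagged else 0.0

for K in (500, 1000, 2000):
    print(K, round(naive_fdr(K), 3))
```

With a fixed threshold the number of false positives grows with K while the number of true signals stays at 100, so the naive FDR deteriorates as more series are monitored.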
FDR of naive and Bayesian procedures. The Bayesian method gets better as the number of time series increases.
Calculations based on 100 replications. The differences are statistically significant.
Application
Motivating Application (bio-surveillance).
• Goal: To find leading indicators of social disruption events in China before they get reported in the mainstream media.
• Approach: Monitor the occurrence of a set of pre-defined patterns on a collection of Chinese websites (mainly news sites, government sites and portals similar to Yahoo located in eastern China).
English translation of some Chinese patterns being monitored
Notations and transformation.
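• $S_{ijt}$: frequency of the $i$-th pattern on the $j$-th website on day $t$.
• $n_{ijt}$: number of pages downloaded.
• $r_{ijt}$: occurrence rate.
• Under a Poisson model, $E(S_{ijt}/n_{ijt}) = r_{ijt}$; $\mathrm{Var}(S_{ijt}/n_{ijt}) = r_{ijt}/n_{ijt}$.
• Freeman-Tukey transformation to remove mean-variance dependence and symmetrize the distribution of rates:
  $Z_{ijt} = .5\left(\sqrt{1000\,S_{ijt}/n_{ijt}} + \sqrt{1000\,(S_{ijt}+1)/n_{ijt}}\right)$
• $\mathrm{Var}(Z_{ijt}) \propto 1/n_{ijt}$.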
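The transformation itself is a one-liner. The sketch below assumes the form with the factor 0.5 and the scaling constant 1000 (occurrences per thousand pages) that appears on the slide; the function name `freeman_tukey` is hypothetical.

```python
import math

def freeman_tukey(S, n, scale=1000.0):
    """0.5 * (sqrt(scale * S / n) + sqrt(scale * (S + 1) / n))."""
    return 0.5 * (math.sqrt(scale * S / n) + math.sqrt(scale * (S + 1) / n))

# A pattern seen 4 times in 1000 downloaded pages.
z = freeman_tukey(S=4, n=1000)
```

Averaging the square roots at S and S + 1 keeps the transform well behaved when the count is zero, which is common for rare patterns.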
Dotted solid lines: Days when reports appeared in mainstream media
Dotted gray lines: Days when our system found spikes related to the reports that appeared later.
Rough validation using actual media reports.
• July 24th: mystery illness kills 17 people in China; we noticed several spikes on July 17th and 18th alerting us to this.
• Sept 29th and Dec 7th: On Sept 29th, news reports of China carving out emergency plans to fight bird flu and prevent it from spreading to humans. On Dec 7th, a confirmed case of bird flu in humans was reported.
• We reported several spikes on Sept 12th and 14th, Nov 2nd, 7th, 11th, and 16th, mostly for the pattern influenza, flu, pneumonia, meningitis. On Nov 21st, four big spikes on bf3.syd.com.cn on influenza, flu, pneumonia, meningitis; emergency, disaster, crisis; prevention and quarantine.
Questions?