![Page 1: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/1.jpg)
© 2014 MapR Technologies 1
© MapR Technologies, confidential
How to Find What You Didn’t Know to Look For
Anomaly Detection
October 14, 2014
![Page 2: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/2.jpg)
Anomaly Detection: How To Find What You Didn’t Know to Look For
Ted Dunning, Chief Applications Architect MapR Technologies
Email [email protected] [email protected]
Twitter @Ted_Dunning
Ellen Friedman, Consultant and Commentator
Email [email protected]
Twitter @Ellen_Friedman
![Page 3: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/3.jpg)
e-book available courtesy of MapR
http://bit.ly/1jQ9QuL
A New Look at Anomaly Detection
by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)
![Page 4: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/4.jpg)
Practical Machine Learning series (O’Reilly)
• Machine learning is becoming mainstream
• Need pragmatic approaches that take into account real-world business settings:
– Time to value
– Limited resources
– Availability of data
– Expertise and cost of team to develop and to maintain system
• Look for approaches with big benefits for the effort expended
![Page 5: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/5.jpg)
Anomaly Detection
![Page 6: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/6.jpg)
Who Needs Anomaly Detection?
Utility providers using smart meters
![Page 7: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/7.jpg)
Who Needs Anomaly Detection?
Feedback from manufacturing assembly lines
![Page 8: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/8.jpg)
Who Needs Anomaly Detection?
Monitoring data traffic on communication networks
![Page 9: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/9.jpg)
What is Anomaly Detection?
• The goal is to discover rare events
– especially those that shouldn’t have happened
• Find a problem before other people see it
– especially before it causes a problem for customers
• Why is this a challenge?
– I don’t know what an anomaly looks like (yet)
![Page 10: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/10.jpg)
Spot the Anomaly
![Page 11: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/11.jpg)
Spot the Anomaly
Looks pretty anomalous to me
![Page 12: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/12.jpg)
Spot the Anomaly
Will the real anomaly please stand up?
![Page 13: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/13.jpg)
Basic idea: Find “normal” first
![Page 14: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/14.jpg)
Steps in Anomaly Detection
• Build a model: Collect and process data for training a model
• Use the machine learning model to determine what the normal pattern is
• Decide how far from this normal pattern something must be before you consider it anomalous
• Use the AD model to detect anomalies in new data
– Methods such as clustering for discovery can be helpful
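The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "normal" is summarized by a median and a median absolute deviation, and the cutoff is a high quantile of the training scores. The function names (`fit_normal`, `anomaly_score`, `choose_threshold`, `detect`) are hypothetical and not from the talk.

```python
import statistics

def fit_normal(training):
    """Summarize 'normal' as a center and a typical spread (median and MAD)."""
    center = statistics.median(training)
    spread = statistics.median([abs(x - center) for x in training])
    return center, spread or 1.0  # fall back to 1.0 if the data is constant

def anomaly_score(x, model):
    """Distance from the normal pattern, in units of typical spread."""
    center, spread = model
    return abs(x - center) / spread

def choose_threshold(training, model, quantile=0.999):
    """Decide how far from normal counts as anomalous: a high training quantile."""
    scores = sorted(anomaly_score(x, model) for x in training)
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]

def detect(new_data, model, threshold):
    """Return the new observations whose score exceeds the threshold."""
    return [x for x in new_data if anomaly_score(x, model) > threshold]
```

A real system would replace the median/MAD summary with whatever model fits the signal; the four-step shape stays the same.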
![Page 15: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/15.jpg)
How hard is it to set an alert for anomalies?
Grey data is from normal events; x’s are anomalies.
Where would you set the threshold?
![Page 16: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/16.jpg)
Basic idea: Set adaptive thresholds
![Page 17: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/17.jpg)
What Are We Really Doing
• We want action when something breaks (dies/falls over/otherwise gets in trouble)
• But action is expensive
• So we don’t want too many false alarms
• And we don’t want too many false negatives
• What’s the right threshold to set for alerts?
– We need to trade off costs
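One way to make the cost trade-off concrete, assuming a small labeled history of past scores is available (an assumption; the talk does not say labels exist). The names `total_cost` and `best_threshold` and the two cost parameters are hypothetical.

```python
def total_cost(threshold, scores, labels, cost_false_alarm, cost_miss):
    """Cost of alerting whenever score > threshold on labeled history.

    labels[i] is True when scores[i] came from a real problem.
    """
    false_alarms = sum(1 for s, bad in zip(scores, labels) if s > threshold and not bad)
    misses = sum(1 for s, bad in zip(scores, labels) if s <= threshold and bad)
    return cost_false_alarm * false_alarms + cost_miss * misses

def best_threshold(scores, labels, cost_false_alarm, cost_miss):
    """Pick the observed score that minimizes total cost when used as threshold."""
    candidates = sorted(set(scores))
    return min(
        candidates,
        key=lambda t: total_cost(t, scores, labels, cost_false_alarm, cost_miss),
    )
```

When misses are expensive the chosen threshold drops, and when false alarms are expensive it rises, which is the trade-off the slide describes.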
![Page 18: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/18.jpg)
A Second Look
![Page 19: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/19.jpg)
A Second Look
99.9%-ile
![Page 20: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/20.jpg)
New algorithm: t-digest
![Page 21: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/21.jpg)
How Hard Can it Be?
[Diagram: each input x goes to an online summarizer that tracks the 99.9%-ile as threshold t; if x > t, raise an alarm]
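A minimal sketch of the online-summarizer idea. To avoid guessing at any t-digest API, this stand-in keeps every value in a sorted list, so the quantile is exact but memory grows without bound; t-digest exists precisely to do this in bounded memory. The class name `OnlineThreshold` and its `warmup` parameter are hypothetical.

```python
import bisect

class OnlineThreshold:
    """Stand-in for an online quantile summarizer (not t-digest itself)."""

    def __init__(self, quantile=0.999, warmup=100):
        self.quantile = quantile
        self.warmup = warmup          # observations needed before alarming
        self.sorted_values = []

    def threshold(self):
        """Current estimate of the chosen quantile, or None during warm-up."""
        n = len(self.sorted_values)
        if n < self.warmup:
            return None
        return self.sorted_values[min(n - 1, int(self.quantile * n))]

    def update(self, x):
        """Compare x with the current threshold, then absorb x into the summary."""
        t = self.threshold()
        alarm = t is not None and x > t
        bisect.insort(self.sorted_values, x)
        return alarm
```

Checking before absorbing matters: otherwise a new extreme value would raise the threshold it is being compared against.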
![Page 22: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/22.jpg)
Detecting Anomalies in Sporadic Events
![Page 23: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/23.jpg)
Using t-Digest
• Apache Mahout uses t-digest as an on-line percentile estimator
– very high accuracy for extreme tails
– new in Mahout v0.9
• t-digest also available elsewhere
– in streamlib (open source library on GitHub)
– standalone (GitHub and Maven Central)
• What’s the big deal with anomaly detection?
• This looks like a solved problem
![Page 24: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/24.jpg)
Already Done? Etsy Skyline?
![Page 25: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/25.jpg)
What About This?
![Page 26: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/26.jpg)
Model Delta Anomaly Detection
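[Diagram: the model's prediction is subtracted from the input to give δ; an online summarizer tracks the 99.9%-ile of δ as threshold t; if δ > t, raise an alarm]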
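The model-delta arrangement can be sketched as follows. This is an illustration only: `model` is any hypothetical callable that predicts the signal at index i, the delta is taken as an absolute difference, and the threshold comes from an exact sorted history rather than t-digest.

```python
import bisect

def model_delta_alarms(signal, model, quantile=0.999, warmup=50):
    """Return indices where |signal - model prediction| exceeds an adaptive threshold.

    model: callable i -> predicted value (an assumed interface).
    """
    deltas = []   # sorted history of past deltas (stand-in for an online summarizer)
    alarms = []
    for i, x in enumerate(signal):
        delta = abs(x - model(i))
        if len(deltas) >= warmup:
            t = deltas[min(len(deltas) - 1, int(quantile * len(deltas)))]
            if delta > t:
                alarms.append(i)
        bisect.insort(deltas, delta)
    return alarms
```

The detector never looks at the raw signal level, only at how far the signal strays from what the model expected, so a large but predicted swing raises no alarm.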
![Page 27: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/27.jpg)
The Real Inside Scoop
• The model-delta anomaly detector is really just a sum of random variables
– the model we know about already
– and a normally distributed error
• The output (delta) is (roughly) the log probability of the sum distribution (really δ²)
• Thinking about probability distributions is good
• But how do you handle AD in systems with sporadic events?
![Page 28: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/28.jpg)
Spot the Anomaly
Anomaly?
![Page 29: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/29.jpg)
Maybe not!
![Page 30: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/30.jpg)
Where’s Waldo?
This is the real anomaly
![Page 31: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/31.jpg)
Normal Isn’t Just Normal
• What we want is a model of what is normal
• What doesn’t fit the model is the anomaly
• For simple signals, the model can be simple …
• The real world is rarely so accommodating
![Page 32: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/32.jpg)
We Do Windows
![Page 33: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/33.jpg)
We Do Windows
![Page 34: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/34.jpg)
We Do Windows
![Page 35: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/35.jpg)
We Do Windows
![Page 36: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/36.jpg)
We Do Windows
![Page 37: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/37.jpg)
We Do Windows
![Page 38: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/38.jpg)
We Do Windows
![Page 39: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/39.jpg)
We Do Windows
![Page 40: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/40.jpg)
We Do Windows
![Page 41: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/41.jpg)
We Do Windows
![Page 42: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/42.jpg)
We Do Windows
![Page 43: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/43.jpg)
We Do Windows
![Page 44: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/44.jpg)
We Do Windows
![Page 45: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/45.jpg)
We Do Windows
![Page 46: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/46.jpg)
We Do Windows
![Page 47: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/47.jpg)
Windows on the World
• The set of windowed signals is a nice model of our original signal
• Clustering can find the prototypes
– Fancier techniques available using sparse coding
• The result is a dictionary of shapes
• New signals can be encoded by shifting, scaling and adding shapes from the dictionary
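A rough sketch of the windowing-and-clustering idea, with deliberate simplifications: plain k-means stands in for the clustering step, and a new window is encoded by its single nearest dictionary shape rather than by shifting, scaling and adding several shapes. All function names here are hypothetical.

```python
import math
import random

def windows(signal, width, step):
    """Cut the signal into fixed-width windows, advancing by `step` samples."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]

def distance2(a, b):
    """Squared Euclidean distance between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; the returned centroids act as the dictionary of shapes."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: distance2(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # leave a centroid unchanged if nothing was assigned to it
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def reconstruction_error(window, dictionary):
    """Distance from a window to its best-matching dictionary shape."""
    return math.sqrt(min(distance2(window, shape) for shape in dictionary))
```

Windows that resemble the training signal reconstruct with small error; a window the dictionary has never seen does not, and that error is the quantity a threshold can then be applied to.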
![Page 48: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/48.jpg)
Most Common Shapes (for EKG)
![Page 49: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/49.jpg)
Reconstructed signal
[Figure: original signal, reconstructed signal, and reconstruction error; the error is < 1 bit / sample]
![Page 50: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/50.jpg)
An Anomaly
The original technique for finding 1-d anomalies works when applied to the reconstruction error
![Page 51: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/51.jpg)
Close-up of anomaly
Not what you want your heart to do.
And not what the model expects it to do.
![Page 52: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/52.jpg)
A Different Kind of Anomaly
![Page 53: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/53.jpg)
Model Delta Anomaly Detection
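[Diagram: the model's prediction is subtracted from the input to give δ; an online summarizer tracks the 99.9%-ile of δ as threshold t; if δ > t, raise an alarm]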
![Page 54: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/54.jpg)
The Real Inside Scoop
• The model-delta anomaly detector is really just a sum of random variables
– the model we know about already
– and a normally distributed error
• The output (delta) is (roughly) the log probability of the sum distribution (really δ²)
• Thinking about probability distributions is good
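A short worked version of the "really δ²" remark, assuming the observation x is the model's prediction m plus an error ε that is normal with mean 0 and variance σ² (the symbols x, m, ε and σ are introduced here for illustration and do not appear on the slide):

$$
\delta = x - m = \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)
$$

$$
-\log p(\delta) = \frac{\delta^2}{2\sigma^2} + \frac{1}{2}\log\left(2\pi\sigma^2\right)
$$

The second term does not depend on δ, so under this assumption ranking observations by δ² gives the same order as ranking them by −log p.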
![Page 55: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/55.jpg)
Anomalies among sporadic events
![Page 56: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/56.jpg)
Sporadic Web Traffic to an e-Business Site
It’s important to know if traffic is stopped or delayed because of a problem…
But visits to site normally come at varying intervals.
How long after the last event should you begin to worry?
![Page 57: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/57.jpg)
Sporadic Web Traffic to an e-Business Site
It’s important to know if traffic is stopped or delayed because of a problem…
But visits to site normally come at varying intervals.
And how do you let your CEO sleep through the night?
![Page 58: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/58.jpg)
Basic idea: The time interval between events converts sporadic events into something useful you can measure
![Page 59: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/59.jpg)
Sporadic Events: Finding Normal and Anomalous Patterns
• Time between events is much more usable than absolute times
• Counts don’t link as directly to probability models
• Time interval is proportional to −log p
• This is a big deal
![Page 60: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/60.jpg)
Event Stream (timing)
• Events of various types arrive at irregular intervals
– we can assume Poisson distribution
• The key question is whether frequency has changed relative to expected values
– This shows up as a change in interval
• Want alert as soon as possible
![Page 61: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/61.jpg)
Converting Event Times to Anomaly
[Figure: thresholds marked at the 99.9%-ile and the 99.99%-ile]
![Page 62: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/62.jpg)
But in the real world, event rates often change
![Page 63: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/63.jpg)
Time Intervals Are Key to Modeling Sporadic Events
![Page 64: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/64.jpg)
Model-Scaled Intervals Solve the Problem
![Page 65: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/65.jpg)
Model Delta Anomaly Detection
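[Diagram: model-delta pipeline labeled with log p: the model's prediction is subtracted from the input to give δ; an online summarizer tracks the 99.9%-ile of δ as threshold t; if δ > t, raise an alarm]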
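For sporadic events, one way to read this pipeline is to scale each gap between events by the rate the model predicts for that moment. Under a Poisson assumption, gap × rate equals −log of the probability of waiting at least that long, so the scaled gap is directly an anomaly score. The sketch below assumes that reading; `hourly_rate` is a made-up rate model, and the other names are hypothetical too.

```python
import math

def hourly_rate(hour):
    """Hypothetical rate model: events per minute, busy by day and quiet at night."""
    return 5.0 if 8 <= hour < 20 else 0.1

def anomaly_score(gap_minutes, predicted_rate):
    """Model-scaled interval: equals -log P(gap at least this long) under the model."""
    return gap_minutes * predicted_rate

def should_alarm(gap_minutes, predicted_rate, quantile=0.999):
    """Alarm when the scaled gap exceeds the chosen quantile of a unit exponential."""
    return anomaly_score(gap_minutes, predicted_rate) > -math.log(1.0 - quantile)
```

The same ten-minute silence is unremarkable at 3 a.m. and alarming at noon, because the score depends on the expected rate and not on the raw gap.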
![Page 66: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/66.jpg)
Detecting Anomalies in Sporadic Events
![Page 67: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/67.jpg)
Detecting Anomalies in Sporadic Events
![Page 68: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/68.jpg)
Slipped Week: Simple Rate Predictor
![Page 69: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/69.jpg)
Poisson Distribution
• Time between events is exponentially distributed
• This means that long delays are exponentially rare
• If we know λ we can select a good threshold
– or we can pick a threshold empirically
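Both options can be written down directly. For a Poisson process with rate λ, the gap between events exceeds t with probability exp(−λt), so the gap at quantile q is −ln(1 − q) / λ. The empirical version below simply reads the quantile off observed gaps. Function names are hypothetical.

```python
import math
import random

def theoretical_threshold(rate, quantile=0.999):
    """Gap length exceeded with probability 1 - quantile when gaps are Exp(rate)."""
    return -math.log(1.0 - quantile) / rate

def empirical_threshold(gaps, quantile=0.999):
    """The same quantile, read from observed gaps instead of a known rate."""
    ordered = sorted(gaps)
    return ordered[min(len(ordered) - 1, int(quantile * len(ordered)))]

if __name__ == "__main__":
    rng = random.Random(1)
    gaps = [rng.expovariate(2.0) for _ in range(100_000)]  # simulated gaps, rate 2
    print(theoretical_threshold(2.0, 0.99), empirical_threshold(gaps, 0.99))
```

With enough data the two numbers should land close together; the empirical route is the one that still works when λ is unknown or drifting.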
![Page 70: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/70.jpg)
Seasonality Poses a Challenge
![Page 71: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/71.jpg)
Something more is needed …
![Page 72: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/72.jpg)
We need a better rate predictor…
![Page 73: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/73.jpg)
A New Rate Predictor for Sporadic Events
![Page 74: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/74.jpg)
Improved Prediction with Adaptive Modeling
![Page 75: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/75.jpg)
Anomaly Detection + Classification Useful Pair
• Use the AD model to detect anomalies in new data
– Methods such as clustering for discovery can be helpful
• Once you have well-defined models in your system, you may also want to use classification to tag those
• Continue to use the AD model to find new anomalies
![Page 76: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/76.jpg)
Recap (out of order)
• Anomaly detection is best done with a probability model
• -log p is a good way to convert to anomaly measure
• Adaptive quantile estimation (t-digest) works for auto-setting thresholds
![Page 77: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/77.jpg)
Recap
• Different systems require different models
• Continuous time-series
– sparse coding to build signal model
• Events in time
– rate model based on variable rate Poisson
– segregated rate model
• Events with labels
– language modeling
– hidden Markov models
![Page 78: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/78.jpg)
Why Use Anomaly Detection?
![Page 79: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/79.jpg)
Keep in mind…
• Model normal, then find anomalies
• t-digest for adaptive threshold
• Probabilistic models for complex patterns
![Page 80: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/80.jpg)
Keep in mind…
• Time intervals are key for sporadic events
• Complex time shift to predict rate with seasonality
• Sequence of events reveals phishing attack
![Page 81: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/81.jpg)
e-book available courtesy of MapR
http://bit.ly/1jQ9QuL
A New Look at Anomaly Detection
by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)
![Page 82: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/82.jpg)
Coming in October: Time Series Databases
by Ted Dunning and Ellen Friedman © Oct 2014 (published by O’Reilly)
![Page 83: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/83.jpg)
Thank you for coming today!
![Page 84: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/84.jpg)
![Page 85: Anomaly Detection - New York Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022061200/5476d907b4af9fae028b45da/html5/thumbnails/85.jpg)
Sandbox