Simple Math for Anomaly Detection - Toufic Boubez - Metafor Software - Monitorama PDX 2014-05-05



1

Some Simple Math for Anomaly Detection

#Monitorama PDX2014.05.05

Toufic Boubez, Ph.D.
Co-Founder, CTO, Metafor Software
toufic@metaforsoftware.com
@tboubez

3

Preamble

• I lied!
– There are no “simple” tricks
– If it’s too good to be true, it probably is

• I usually beat up on parametric, Gaussian, supervised techniques
– This talk is to show some alternatives
– Only enough time to cover a couple of relatively simple but very useful techniques
– Oh, and I will still beat up on the usual suspects

• Adrian and James are right! Listen to them!
– What’s the point of collecting all that data if you can’t get useful information out of it!?

• Note: real data
• Note: no y-axis labels on charts – on purpose!!
• Note to self: remember to SLOW DOWN!
• Note to self: mention the cats!! Everybody loves cats!!

4

Toufic intro – who I am

• Co-Founder/CTO Metafor Software
• Co-Founder/CTO Layer 7 Technologies
– Acquired by Computer Associates in 2013
– I escaped
• Co-Founder/CTO Saffron Technology
• IBM Chief Architect for SOA
• Co-Author, Co-Editor: WS-Trust, WS-SecureConversation, WS-Federation, WS-Policy
• Building large scale software systems for >20 years (I’m older than I look, I know!)

5

Wall of Charts™

6

The WoC side-effects: alert fatigue

“Alert fatigue is the single biggest problem we have right now … We need to be more intelligent about our alerts or we’ll all go insane.”

- John Vincent (@lusis)

(#monitoringsucks)

7

Watching screens cannot scale + it’s useless

8

Gotta turn things over to the machines

9

TO THE RESCUE: Anomaly Detection!!

• Anomaly detection (also known as outlier detection) is the search for items or events which do not conform to an expected pattern. [Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A survey". ACM Computing Surveys 41 (3): 1]

• For devops: Need to know when one or more of our metrics is going wonky

10

Attempt #1: thresholds …

• Roots in manufacturing process QC

11

… are based on Gaussian distributions

• Make assumptions about probability distributions and process behaviour
– Data is normally distributed with a useful and usable mean and standard deviation
– Data is probabilistically “stationary”

12

Three-Sigma Rule

• Three-sigma rule
– ~68% of the values lie within 1 std deviation of the mean
– ~95% of the values lie within 2 std deviations
– 99.73% of the values lie within 3 std deviations: anything else is an outlier
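Not from the slides: a minimal Python sketch of the three-sigma check, assuming the metric lives in a NumPy array (names and numbers are illustrative only):

    import numpy as np

    def three_sigma_outliers(values):
        """Flag points more than 3 standard deviations from the sample mean."""
        values = np.asarray(values, dtype=float)
        mu, sigma = values.mean(), values.std()
        return np.abs(values - mu) > 3 * sigma

    # Illustrative data: a quiet baseline around 10 plus one large spike.
    rng = np.random.default_rng(0)
    series = np.append(rng.normal(loc=10.0, scale=1.0, size=200), 60.0)
    print(np.where(three_sigma_outliers(series))[0])   # the spike at index 200 should be flagged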

13

Aaahhhh

• The mysterious red lines explained

14

Stationary Gaussian distributions are powerful

• Because far far in the future, in a galaxy far far away:
– I can make the same predictions because the statistical properties of the data haven’t changed
– I can easily compare different metrics since they have similar statistical properties
• Let’s do this!!
• BUT…
• Cue in DRAMATIC MUSIC

15

Then THIS happens

16

3-sigma rule alerts

17

Or worse, THIS happens!

18

3-sigma rule alerts

19

WTF!? So what gives!?

• Remember this?

20

Histogram – probability distribution

21

Histogram – probability distribution

22

Attempts #2, #3, etc: mo’ better thresholds

• Static thresholds ineffective on dynamic data
– Thresholds use the mean as predictor and alert if data falls more than 3 sigma outside the mean
• Need “moving” or “adaptive” thresholds:
– Value of mean changes with time to accommodate new data values/trends

23

Moving Averages “big idea”

• At any point in time in a well-behaved time series, your next value should not significantly deviate from the general trend of your data

• Mean as a predictor is too static, relies on too much past data (ALL of the data!)

• Instead of the overall mean, use a finite window of past values to predict the most likely next value

• Alert if actual value “significantly” (3 sigmas?) deviates from predicted value

24

Moving Averages typical method

• Generate a “smoothed” version of the time series
– Average over a sliding (moving) window

• Compute the squared error between raw series and its smoothed version

• Compute a new effective standard deviation by smoothing the squared error

• Generate a moving threshold:
– Outliers are 3-sigma outside the new, smoothed data!

• Ta-da!
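Not from the slides: one way to read that recipe as Python, hedged as a sketch rather than the exact method behind the charts (the window size, names, and the use of a centered window are my assumptions):

    import numpy as np

    def moving_threshold_outliers(x, window=20, k=3.0):
        """Smooth with a sliding-window average, smooth the squared error to get an
        adaptive sigma, and flag points more than k 'sigmas' from the smoothed series."""
        x = np.asarray(x, dtype=float)
        kernel = np.ones(window) / window
        smoothed = np.convolve(x, kernel, mode="same")                         # moving average of the raw series
        smoothed_var = np.convolve((x - smoothed) ** 2, kernel, mode="same")   # smoothed squared error
        return np.abs(x - smoothed) > k * np.sqrt(smoothed_var)

A centered window is the simplest to write; a trailing (past-only) window is closer to the “predict the next value from recent history” idea on the previous slide.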

25

Simple and Weighted Moving Averages

• Simple Moving Average
– Average of last N values in your time series: S[t] <- sum(X[t-(N-1):t]) / N
– Each value in the window contributes equally to the prediction
– …INCLUDING spikes and outliers
• Weighted Moving Average
– Similar to SMA but assigns linearly (arithmetically) decreasing weights to every value in the window
– Older values contribute less to the prediction
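Not from the slides: a small sketch of the two predictors, with the WMA weights assumed to be 1..N (so older values count linearly less):

    import numpy as np

    def sma_predict(x, n):
        """Simple moving average: mean of the last n values, i.e. S[t] = sum(X[t-(n-1):t]) / n."""
        return float(np.mean(x[-n:]))

    def wma_predict(x, n):
        """Weighted moving average: linearly increasing weights, so the newest value counts most."""
        window = np.asarray(x[-n:], dtype=float)
        weights = np.arange(1, n + 1)            # 1 for the oldest value ... n for the newest
        return float(np.dot(window, weights) / weights.sum())

    history = [10, 11, 9, 10, 12, 11]
    print(sma_predict(history, 5), wma_predict(history, 5))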

26

Exponential Smoothing techniques

• Exponential Smoothing
– Similar to weighted average, but the weights decay exponentially over the whole set of historic samples: S[t] = αX[t-1] + (1-α)S[t-1]
– Does not deal with trends in data
• DES (Double Exponential Smoothing)
– In addition to the data smoothing factor (α), introduces a trend smoothing factor (β)
– Better at dealing with trending
– Does not deal with seasonality in data
• TES, Holt-Winters
– Introduces an additional seasonality factor
– … and so on
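Not from the slides: a hand-rolled sketch of single and double exponential smoothing following the update formulas above. The α and β values are arbitrary, and for real use a library implementation of Holt-Winters would be the usual choice.

    def single_exponential_smoothing(x, alpha=0.3):
        """S[t] = alpha * X[t-1] + (1 - alpha) * S[t-1]:
        exponentially decaying weights over the whole history."""
        s = [x[0]]
        for t in range(1, len(x)):
            s.append(alpha * x[t - 1] + (1 - alpha) * s[t - 1])
        return s

    def double_exponential_smoothing(x, alpha=0.3, beta=0.1):
        """Adds a smoothed trend term (weighted by beta) so the prediction can follow a drifting series."""
        level, trend = x[0], x[1] - x[0]
        preds = [x[0]]
        for t in range(1, len(x)):
            preds.append(level + trend)                              # forecast for time t
            new_level = alpha * x[t] + (1 - alpha) * (level + trend)
            trend = beta * (new_level - level) + (1 - beta) * trend
            level = new_level
        return preds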

27

Let’s look at an example

28

Holt-Winters predictions

29

A harder example

30

Exponential smoothing predictions

31

Hmmmm, so are we doomed?

• No!
• ALL smoothing predictive methods work best with normally distributed data!
• But there are lots of other non-Gaussian-based techniques
– We can only scratch the surface in this talk

32

Trick #1: Histogram!

33

THIS is normal

34

This isn’t

35

Neither is this
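Not from the slides: the charts here are visual checks, but a quick programmatic stand-in is to run a normality test on a window of the metric before trusting Gaussian thresholds (scipy's D'Agostino-Pearson test is used here; the 0.05 cutoff is an assumption):

    import numpy as np
    from scipy import stats

    def looks_gaussian(values, alpha=0.05):
        """Return True if a normality test does NOT reject the Gaussian assumption."""
        statistic, p_value = stats.normaltest(values)
        return p_value > alpha

    rng = np.random.default_rng(1)
    print(looks_gaussian(rng.normal(10, 2, size=500)))      # roughly Gaussian window: likely True
    print(looks_gaussian(rng.exponential(2.0, size=500)))   # heavily skewed window: False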

36

Trick #2: Kolmogorov-Smirnov test

• Non-parametric test
– Compares two probability distributions
– Makes no assumptions (e.g. Gaussian) about the distributions of the samples
– Measures maximum distance between cumulative distributions
– Can be used to compare periodic/seasonal metric periods (e.g. day-to-day or week-to-week)

http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
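Not from the slides: scipy ships a two-sample KS test, so a windowed comparison like the one described above might look like the sketch below (the window length, alert threshold, and day-over-day framing are assumptions):

    import numpy as np
    from scipy import stats

    def ks_window_alarm(today, yesterday, alpha=0.01):
        """Two-sample Kolmogorov-Smirnov test between two windows of the same metric.
        The statistic is the max distance between the two empirical CDFs; no Gaussian assumption."""
        statistic, p_value = stats.ks_2samp(today, yesterday)
        return p_value < alpha, statistic

    rng = np.random.default_rng(2)
    yesterday = rng.normal(100, 5, size=288)     # e.g. one day of 5-minute samples
    today = rng.normal(130, 5, size=288)         # same metric after a level shift
    print(ks_window_alarm(today, yesterday))     # expect (True, statistic close to 1)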

37

KS with windowing

38

39

40

41

42

43

KS Test on difficult data

44

Trick #3: Diffing/Derivatives

• Often, even when the data itself is not stationary, its derivatives tend to be!
• Most frequently, the first difference is sufficient: dS(t) <- S(t+1) – S(t)
• Can then perform some analytics on the first difference
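Not from the slides: the first difference is one line with NumPy; the example data is made up.

    import numpy as np

    def first_difference(x):
        """dS[t] = S[t+1] - S[t]; often closer to stationary than the raw series."""
        return np.diff(np.asarray(x, dtype=float))

    series = [100, 102, 104, 107, 109, 111, 140, 142]   # steady trend, then a jump
    print(first_difference(series))   # the jump shows up as a single large difference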

45

CPU time series

46

Its first difference – possible random walk?

47

We’re not doomed, but: Know your data!!

• You need to understand the statistical properties of your data, and where it comes from, in order to determine what kind of analytics to use.
– Your data is very important!
– You spend time collecting it, so spend time analyzing it!
• A large amount of data center data is non-Gaussian
– Gaussian statistics won’t work
– Use appropriate techniques

48

More?

• Only scratched the surface
• I want to talk more about algorithms, analytics, current issues, etc., in more depth, but time’s up!!
– Come talk to me or email me if interested.

• Thank you!

toufic@metaforsoftware.com
@tboubez

49

Oh yeah, and we’re hiring!

In Vancouver, BC
