Post on 10-May-2015


Anomaly Detection Using The Cortical Learning Algorithm

Subutai Ahmad, subutai@numenta.org

[Image slide; source: mimobaby.com]

[Image slide: Lindsay Lohan]

Three Topics

• What is “Anomaly Detection”?

• How is the anomaly score computed in NuPIC/CLA today?

• How is the anomaly score used in the product Grok?

+ sample code!

Spatial (Static) Anomalies

Temporal Anomalies

Windmill Gear Bearing Temperature

Anomalies In Random Behavior

“Temporary” Anomalies

Anomaly Detection

• Anomalies are any significant deviation from normal behavior

• Anomaly detection is valuable

• Anomaly detection is hard: there are many flavors

– Spatial anomalies

– Temporal anomalies

– Anomalies in random data

– “Temporary” anomalies

– Etc.

The Anomaly Score In NuPIC

• NuPIC implements anomaly scoring for streaming datasets

• Core feature of the OPF (Online Prediction Framework)

– Use inferenceType = TemporalAnomaly

– Outputs an anomaly score between 0 and 1 for every data point

• Detects spatial and temporal anomalies

• Continuously learning online system

• Works for numerical and categorical data
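Handling both numerical and categorical data rests on encoders, which turn each raw value into a set of active bits before the CLA sees it. The sketch below illustrates that idea only and is not NuPIC's encoder API: `encode_scalar`, `encode_category`, `overlap`, and the parameter values (`n=50`, `w=5`, a 0 to 100 range) are hypothetical. Nearby numbers share bits while distinct categories share none, which is what makes two inputs "similar" from the CLA's point of view.

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n=50, w=5):
    """Hypothetical scalar encoder: w contiguous active bits out of n.

    Values that are close together get overlapping blocks of bits.
    """
    value = max(min_val, min(max_val, value))  # clip out-of-range values
    num_positions = n - w + 1                   # possible start positions for the block
    fraction = (value - min_val) / (max_val - min_val)
    start = int(round(fraction * (num_positions - 1)))
    return set(range(start, start + w))


def encode_category(label, categories, w=5):
    """Hypothetical category encoder: each label gets its own disjoint block of w bits."""
    index = categories.index(label)
    return set(range(index * w, (index + 1) * w))


def overlap(a, b):
    """Number of active bits two encodings share."""
    return len(a & b)


print(overlap(encode_scalar(50), encode_scalar(52)))  # nearby values share bits
print(overlap(encode_scalar(50), encode_scalar(90)))  # distant values share none
```

Real encoders add details this sketch leaves out (for example, handling dates and times), but the overlap property shown here is what lets a partially matching value receive a partial anomaly score.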

Computing Anomaly Score

[Diagram: Data (time of day, sensor value) → Encoders → Spatial Pooler → Temporal Pooler → Predictions; the Spatial Pooler and Temporal Pooler together form the CLA]

CLA constantly learns common spatial patterns and temporal sequences in the stream of inputs

Anomaly Score =

– 0 if current value was predicted

– 1 if value was totally unpredicted

– between 0 and 1 if similar to predicted value

At each time step the Temporal Pooler makes multiple predictions about what might come next
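A minimal sketch of the scoring rule above, assuming that both the current input and the Temporal Pooler's predictions are represented as sets of column indices, with `predicted` being the union of all predictions made at the previous step. The score is the fraction of active columns that were not predicted. `raw_anomaly_score` is a hypothetical name and the set representation is our assumption; this is not the NuPIC source.

```python
def raw_anomaly_score(active, predicted):
    """Fraction of active columns that were not predicted at the previous step.

    active: set of column indices active for the current input.
    predicted: union of columns predicted at the previous step (one set can
    cover several alternative predictions).
    """
    if not active:
        return 0.0  # sketch choice: an empty input is not treated as anomalous
    return 1.0 - len(active & predicted) / len(active)


current = {1, 2, 3, 4}
print(raw_anomaly_score(current, {1, 2, 3, 4, 9, 10}))  # fully predicted -> 0.0
print(raw_anomaly_score(current, {7, 8}))               # totally unpredicted -> 1.0
print(raw_anomaly_score(current, {1, 2, 7, 8}))         # partially predicted -> 0.5
```

Because `predicted` is a union, an input that matches any one of several simultaneous predictions still scores 0; extra predicted columns that never become active do not raise the score.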

Artificial Example

A B A B A C A B A D A _

• If B, C, or D occurs:

– Anomaly score = 0

• If E occurs:

– Completely different from B, C, or D --> anomaly score = 1

– Similar to B, C, or D --> score will be between 0 and 1

– “Similar” means “similar after encoding”

• If A -> E repeats:

– Anomaly score will drop to 0
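The artificial example can be reproduced with a toy model. The sketch below replaces the Temporal Pooler with a much simpler first-order transition memory, which only remembers which symbols have followed each symbol, and uses hand-made encodings, including a hypothetical `E_SIMILAR` that shares half its bits with B. `ENCODINGS`, `run`, and `E_SIMILAR` are illustrative names. The goal is to show the scoring behavior on this sequence, not how the CLA itself learns.

```python
from collections import defaultdict

# Hypothetical hand-made encodings: each symbol is a set of active bits.
# E_SIMILAR shares half of its bits with B ("similar after encoding").
ENCODINGS = {
    "A": {0, 1, 2, 3},
    "B": {4, 5, 6, 7},
    "C": {8, 9, 10, 11},
    "D": {12, 13, 14, 15},
    "E": {16, 17, 18, 19},
    "E_SIMILAR": {4, 5, 20, 21},
}


def raw_anomaly_score(active, predicted):
    """Fraction of active bits that were not predicted."""
    if not active:
        return 0.0
    return 1.0 - len(active & predicted) / len(active)


def run(symbols):
    """Feed symbols one at a time and return one anomaly score per symbol.

    A first-order transition memory stands in for the Temporal Pooler: after
    each symbol it predicts the union of every symbol seen to follow it before.
    """
    successors = defaultdict(set)
    scores = []
    prev = None
    for symbol in symbols:
        predicted = set()
        if prev is not None:
            for nxt in successors[prev]:
                predicted |= ENCODINGS[nxt]
            successors[prev].add(symbol)  # learn the transition online
        scores.append(raw_anomaly_score(ENCODINGS[symbol], predicted))
        prev = symbol
    return scores


history = "A B A B A C A B A D A".split()
print(run(history + ["B"])[-1])            # seen before after A -> 0.0
print(run(history + ["E"])[-1])            # never seen, no shared bits -> 1.0
print(run(history + ["E_SIMILAR"])[-1])    # shares bits with B -> 0.5
print(run(history + ["E", "A", "E"])[-1])  # A -> E has been learned -> 0.0
```

A real CLA uses high-order sequence memory, so it can also flag a symbol that is normal in general but unexpected in its current context; a first-order memory like this one cannot.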

Example: Anomalous CPU Usage

Example: Heater Temperature

[Chart: heater temperature with anomaly score; annotations mark unusual temporal behavior and unusually low readings]

Example: Change In Randomness

Sample Code

• Sample code and datasets for running anomaly detection are available at:

• https://github.com/subutai/nupic.subutai/run_anomaly

|-- README.md
|-- data
|   |-- art_load_balancer_spikes.csv
|   |-- cpu_5f553.csv
|   |-- cpu_825cc.csv
|   |-- cpu_cc0c5.csv
|   `-- rds_connections.csv
|-- model_params.py
|-- run_all.sh
`-- run_anomaly.py

Grok

• Define what to monitor

• Grok ingests streaming data

• Builds models automatically

• Continuously learns

• Adapts to changes

• Visualize likelihood of unusual behavior

• See metrics and data

• Prevent downtime

Use Case: Sudden Changes, Slow Changes

Use Case: Subtle Changes

What Have We Learned From Grok?

• Anomaly detection is extremely useful

• Real-world data is really, really noisy!

– We will never build a perfect predictive model

• There’s no good way to set a threshold on the anomaly score

– A high anomaly score is not necessarily bad

– Random things happen as part of normal behavior

• Visually you can see a qualitative change in the anomaly scores

• In Grok we detect the change in the anomaly score itself

– Compute a likelihood that the predictability of the data has changed

Anomaly Likelihood In Grok

1. For each new data point, compute the anomaly score using the OPF

2. Estimate the probability distribution of historical anomaly scores

3. Compute the likelihood that the recent anomaly scores come from the same distribution as the historical anomaly scores
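The three steps can be sketched in a few lines. This is not the Grok or NuPIC code. For illustration it assumes that the distribution in step 2 is modeled as a Gaussian fitted to short moving averages of past scores, and that only the upper tail matters (recent scores being worse than usual). `moving_averages`, `anomaly_likelihood`, `history_window`, and `recent_window` are hypothetical names, and the window sizes are illustrative.

```python
import math
import statistics


def moving_averages(values, window):
    """Average of each run of `window` consecutive values."""
    return [statistics.fmean(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]


def anomaly_likelihood(scores, history_window=500, recent_window=10):
    """Likelihood in [0, 1] that recent anomaly scores are higher than the history suggests.

    scores: raw anomaly scores from step 1, oldest first.
    """
    if len(scores) < recent_window + 1:
        return 0.5  # not enough data yet: stay neutral
    history = scores[-history_window:]
    averages = moving_averages(history, recent_window)
    # Step 2: fit a Gaussian to the smoothed historical scores.
    mean = statistics.fmean(averages)
    std = max(statistics.pstdev(averages), 1e-6)  # guard against a perfectly flat history
    # Step 3: how far into the upper tail is the most recent average?
    z = (averages[-1] - mean) / std
    tail_probability = 0.5 * math.erfc(z / math.sqrt(2))  # P(X >= latest) under the fit
    return 1.0 - tail_probability


steady = [0.0, 0.2] * 100
print(anomaly_likelihood(steady))               # about 0.5: nothing has changed
print(anomaly_likelihood(steady + [1.0] * 10))  # close to 1.0: scores jumped
```

Smoothing before fitting is the key design choice: a single surprising point in noisy data barely moves the moving average, while a sustained change in predictability pushes it far into the tail.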

Example: Anomaly Score

Example: Likelihood Score

Example: Change In Randomness

Use Case: Changes in Randomness

Windmill Gear Bearing Temperature

Anomaly Likelihood Code

• The anomaly likelihood scheme has proven critical to making the anomaly score useful in a practical application

• We are making the Anomaly Likelihood code available:

https://github.com/subutai/nupic.subutai/run_anomaly

• Self-contained function right now

– It might be useful to look at, but it is not in an easy-to-use form yet!

– We plan to create better sample code and then perhaps integrate it into the OPF.

What About Swarming?

• Swarming is an automated parameter selection scheme in NuPIC

– Runs hundreds of models with unique parameter combinations

– Selects the best field combinations and parameters

• In Grok we use a single pre-swarmed parameter set

– Fixed set of fields (timestamp + value)

– Data fed in every 5 minutes

– Works very well across different data streams with above characteristics

• In general you will still need to swarm

– There is a great set of online tutorials put together by Matt

– But the system is relatively insensitive to small parameter changes, so you may not need to swarm too often

Where Do We Go Next?

• The CLA is proving to be excellent at detecting anomalies in datasets we’ve tried so far

– Fully automated: no parameter tuning in Grok!

• We’ve learned a lot in the process of creating the product

– We’d like to spread the ideas to the community

• It’s clear we’re just scratching the surface

Benchmark For Streaming Anomaly Detection

• Hard to find good anomaly detection benchmarks for streaming data

• We’ve decided to create a dataset and testing methodology focused on streaming data and anomaly detection

– Model real-time online streaming data sources

– Emphasis will be on temporal streaming data, automation, and continuous learning

– Well defined methodology for evaluating algorithms

– Baseline results using CLA

• We’re hoping it will be useful to the NuPIC community as we continue to push the boundaries

– Please see Ian Danforth or me if you’re interested

Resources

• Read “The Science of Anomaly Detection” whitepaper on numenta.com

• Github repository containing sample code, anomaly likelihood algorithm, and data:

– https://github.com/subutai/nupic.subutai/run_anomaly

• Survey of machine learning techniques:

– Chandola, Varun, Arindam Banerjee, and Vipin Kumar. "Anomaly detection: A survey." ACM Computing Surveys (CSUR) 41.3 (2009): 15.
