large scale anomaly detection in data center logs and metrics · large scale anomaly detection in...

24
Large Scale Anomaly Detection in Data Center Logs and Metrics Rafael Martínez

Upload: others

Post on 29-May-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

Large Scale Anomaly Detection in

Data Center Logs and Metrics

Rafael Martínez

Page 2: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Gradiant, ICT technology centre in Spain

Since 2008, focused on technological development and knowledge transfer to industry

+100

5,2M€ 54% 46%

professionals

revenue in 2017contracted companies competitive public funding

12european projects

Page 3: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Focus on…

Page 4: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Focus on… Connectivity · Intelligence · Security

Page 5: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

• Large amounts of data

• Data flowing 24×7×365

• Logs/events & metrics

• Goal: Keep systems OK

Problem statement

Page 6: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

ML to the rescue!

Anomaly Detection

Page 7: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

1. Anomaly Detection in Metrics

2. Anomaly Detection in Logs

Anomaly Detection

Page 8: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

1. Anomaly Detection in Metrics

2. Anomaly Detection in Logs

Anomaly Detection

Requirements

• Run on endless data streams

• Provide results in real-time

Page 9: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

1. Anomaly Detection in Metrics

• Discord Discovery

2. Anomaly Detection in Logs

• LogScore

Anomaly Detection

Requirements

• Run on endless data streams

• Provide results in real-time

Page 10: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Anomaly Detection in Metrics

Discord Discovery

Page 11: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Anomaly Detection in Metrics

Discord Discovery

Improvements on the original batch algorithm

• Heuristics to reduce O(n2) towards O(n)• Early abandon

• Randomization

• Denoise filtering

Page 12: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Anomaly Detection in Metrics

Discord Discovery

Improvements on the original batch algorithm

• Heuristics to reduce O(n2) towards O(n)

• Fully streaming operation

• Supports input chunks of any size

Page 13: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Anomaly Detection in Logs

LogScore

Page 14: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Anomaly Detection in Logs

LogScore

Page 15: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Anomaly Detection in Logs

LogScore

No satisfactory results from single runs

• Too little representative clusters

• Too many outliers

Page 16: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Anomaly Detection in Logs

LogScore

spec = freq_words / (freq_words + wildcards)

high spec => more informative patterns

low spec => less outliers (bigger clusters)

spec = 0.6

Page 17: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Anomaly Detection in Logs

LogScore

Page 18: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Anomaly Detection in Logs

LogScore

Page 19: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

1. Anomaly Detection in Metrics

• Discord Discovery

2. Anomaly Detection in Logs

• LogScore

Anomaly Detection

Both algorithms

• Learn normality from past data

• Unsupervised operation

Page 20: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Functional architecture

Page 21: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Deployment

Page 22: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Tests and results

• 4 months of data (52 GB, 121 machines, 306M log events)

• Not only detect strange events

• But also provide a summary suitable for human supervision

• Good perceived performance

Page 23: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

SACBD@ECSA2018

Large Scale Anomaly Detection in Data Center Logs and Metrics

Conclusions

• High-value solution with hard requirements

• Fully scalable in large Big Data environments

• One company already pushing these technologies into the market

Page 24: Large Scale Anomaly Detection in Data Center Logs and Metrics · Large Scale Anomaly Detection in Data Center Logs and Metrics Anomaly Detection in Metrics Discord Discovery Improvements

Thank you!

(+34) 986 120 430 | [email protected] | www.gradiant.org

Rafael P. Martínez-Álvarez, PhD

[email protected]