URCA: Pulling out Anomalies by their Root Causes
Fernando Silveira and Christophe Diot
Presenter: Fernando Silveira
UPMC and Technicolor
Joint work with Christophe Diot
Presented at INFOCOM 2010 – San Diego, USA
[Figure: packet counts over time]
Traffic Anomaly Detection
3 Friday, February 19, 2010
[Diagram: traffic data is fed to an anomaly detector, which raises an alarm for the anomaly; the plot highlights the anomalous traffic]
Obtaining information about an anomaly’s cause.
Automating root cause analysis is important…
  Manual analysis is tedious and error prone
  Study from Arbor Networks with 67 ISPs: the average ISP observes ~19 anomalies/day
… but it is also a hard problem.
  Most detectors do not provide any information beyond an alarm
Root Cause Analysis of Traffic Anomalies
Anomaly detection methods with properties that facilitate root cause analysis tasks
Anomaly classification: Lakhina et al. - SIGCOMM’05
  Based on clustering entropy residuals
  Limited to anomalies found in entropy
Anomalous flow identification: Schweller et al. - IMC’04, Li et al. - IMC’06
  Based on reversible sketches
  Complexity of choosing and computing sketches
  Limited to anomalies found in sketches
Related Work
Our Contribution
URCA (Unsupervised Root Cause Analysis)
  a tool that finds an anomaly’s root cause
  can be used with different anomaly detectors
It provides accurate and fast results:
  anomalies are analyzed as fast as they are detected (1-5 minutes)
Outline
Algorithms for URCA
Performance Evaluation
Source IP Destination IP
Source Port Destination Port
Source AS Destination AS
Previous Hop AS Next Hop AS
Incoming Router Interface
Outgoing Router Interface
Our Approach
URCA has two steps:
  anomalous flow identification
  root cause classification
Our methods rely on flow features
Step 1: Anomalous Flow Identification
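[Diagram: when the anomaly detector raises an alarm on the traffic data, a filter splits the traffic by the values of one flow feature (here, destination port: 80/TCP, 443/TCP, 1614/TCP, 22/TCP, 53/UDP, 25/TCP, …) to find the candidate anomalous flows]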
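The filtering idea on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm from the paper: `still_alarms` is a hypothetical callback that stands in for re-running whichever anomaly detector raised the alarm, and the flow records, feature names, and `toy_alarm` detector are invented for the example.

```python
from collections import defaultdict


def identify_anomalous_flows(flows, features, still_alarms):
    """Narrow `flows` down to candidate anomalous flows.

    flows: list of dicts mapping a feature name to its value.
    features: feature names to filter on, in order.
    still_alarms: hypothetical callback; returns True if the detector
        would still raise the alarm on the given flows alone.
    """
    candidates = list(flows)
    for feature in features:
        # Group the current candidates by their value for this feature.
        groups = defaultdict(list)
        for flow in candidates:
            groups[flow[feature]].append(flow)
        for value in groups:
            # If the alarm persists without this group, the anomaly does
            # not depend on it, so treat the group as normal and drop it.
            rest = [f for f in candidates if f[feature] != value]
            if rest and still_alarms(rest):
                candidates = rest
    return candidates


def toy_alarm(flows, threshold=1000):
    """Toy detector (assumption): alarm on a large jump in total packets."""
    jump = sum(f["pkts_after"] - f["pkts_before"] for f in flows)
    return jump > threshold


# Hypothetical flow records for two consecutive time bins.
example_flows = [
    {"out_if": "eth0", "dst_port": "80/TCP", "pkts_before": 500, "pkts_after": 520},
    {"out_if": "eth0", "dst_port": "443/TCP", "pkts_before": 300, "pkts_after": 310},
    {"out_if": "eth1", "dst_port": "22/TCP", "pkts_before": 10, "pkts_after": 5000},
    {"out_if": "eth1", "dst_port": "22/TCP", "pkts_before": 5, "pkts_after": 4000},
    {"out_if": "eth1", "dst_port": "53/UDP", "pkts_before": 100, "pkts_after": 90},
]

anomalous = identify_anomalous_flows(example_flows, ["out_if", "dst_port"], toy_alarm)
print(anomalous)
```

With a real detector, removing flows also changes the baseline the detector compares against; the sketch ignores that and only shows the order of operations (split by feature value, re-test, discard what the alarm does not need).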
Flow Identification - Example
[Figure: packet counts over time around the anomaly. The candidate flows are split by output interface (2 values: eth0, eth1) and by destination AS (3 values: AS 3354, AS 1277, AS 2108); each split separates normal flows from anomalous flows]
Visualizing Root Cause Flows
[Figures: root cause flows for a network scan and for a routing change]
Step 2: Root Cause Classification
We compute metrics from each anomaly
  number of source IPs, ASNs, flow sizes, packet sizes, etc.
Hierarchical clustering
  known anomalies + 1 unknown
Bootstrapping labels
  helped by visualization
[Figure: dendrogram whose leaves are labeled a, b, and c, plus one unlabeled anomaly marked "?"]
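One way to picture this step: each anomaly becomes a vector of metrics, the labeled anomalies and the new one are clustered together, and the new anomaly takes the label of the cluster it joins first. The sketch below is an illustration under assumptions, not the paper's implementation: the single-linkage rule, the Euclidean distance, the two-metric vectors, and their values are all hypothetical choices.

```python
import math
from collections import Counter


def classify_by_clustering(known, unknown):
    """Label one unknown anomaly by agglomerative clustering.

    known: list of (metrics, label) pairs for already labeled anomalies.
    unknown: metrics vector of the new anomaly.
    Returns the majority label of the cluster that the unknown anomaly
    merges with first, or None if there are no known anomalies.
    """
    points = [metrics for metrics, _ in known] + [unknown]
    labels = [label for _, label in known] + [None]
    unknown_idx = len(points) - 1
    clusters = [{i} for i in range(len(points))]

    def linkage(a, b):
        # Single linkage (assumption): distance between closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Pick the closest pair of clusters.
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda pair: linkage(*pair),
        )
        if unknown_idx in a or unknown_idx in b:
            other = b if unknown_idx in a else a
            votes = Counter(labels[i] for i in other)
            return votes.most_common(1)[0][0]
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return None


# Hypothetical, already normalized metrics: (source IP spread, destination IP spread).
known_anomalies = [
    ((1.0, 9.0), "network scan"),
    ((1.2, 8.5), "network scan"),
    ((8.0, 1.0), "distributed DoS attack"),
    ((7.5, 1.3), "distributed DoS attack"),
    ((4.0, 4.0), "routing change"),
]

print(classify_by_clustering(known_anomalies, (1.1, 8.8)))
```

In practice the metrics would need to be put on comparable scales before computing distances, and an anomaly that only joins a cluster at a large distance is a natural candidate for the manual, visualization-assisted labeling mentioned on the slide.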
Outline
Algorithms for URCA
Performance Evaluation
Traces from links in GEANT2
Anomalies obtained with the ASTUTE anomaly detector
Experimental Methodology
Trace   Duration   Anomalies   Ground Truth
A       1 month    93          Manual labeling
B       3 months   519         Anomaly injection
C       "          329         "
D       "          386         "
E       "          1099        "
F       "          680         "
Identification Accuracy - Trace A
Identification Accuracy - Traces B-F
Anomaly type                Missed *   Extra *
Distributed DoS attack      10.3%      4.1%
Network scan                4.9%       0%
Port scan                   0%         0%
Link failure                0%         0%
Upstream routing change     14.7%      0%
Downstream routing change   2.9%       0%
* 90th percentile averaged across traces
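The slide does not define the two columns, so the sketch below assumes one plausible reading: "missed" is the fraction of injected (ground-truth) anomalous flows absent from URCA's output, and "extra" is the fraction of output flows that are not in the ground truth. The summary step likewise assumes that the footnote means taking the 90th percentile of the per-anomaly rates within each trace and then averaging over traces. All function names and the nearest-rank percentile are hypothetical choices.

```python
import math


def missed_and_extra(truth, identified):
    """Return (missed, extra) fractions for one anomaly (assumed definitions)."""
    truth, identified = set(truth), set(identified)
    missed = len(truth - identified) / len(truth) if truth else 0.0
    extra = len(identified - truth) / len(identified) if identified else 0.0
    return missed, extra


def percentile_nearest_rank(values, q):
    """q-th percentile of a non-empty list, using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(per_trace_rates, q=90):
    """Average the per-trace q-th percentiles (assumed reading of the footnote)."""
    percentiles = [percentile_nearest_rank(rates, q) for rates in per_trace_rates]
    return sum(percentiles) / len(percentiles)


# Hypothetical flow identifiers for one injected anomaly.
print(missed_and_extra({"f1", "f2", "f3", "f4"}, {"f1", "f2", "f3", "f9"}))
```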
Classification Accuracy - Trace A
80% correct
15% misclassified =
  5% first occurrences of an event type
  + 10% routing changes mistaken for link failures
5% require visualization
What you’ll find in the paper:
  Algorithms for both identification and classification
  Experimental evaluation with 6 traces
  URCA can be applied to other anomaly detectors
Ongoing and Future Work:
  URCA with an EWMA-based detector
  Using other sources of data (e.g., routing data)
Wrapping Up
Special thanks to:
  DANTE / GEANT2 - http://www.geant2.net/
  Ricardo Oliveira @ UCLA - http://irl.cs.ucla.edu/~rveloso/
More information at:
  http://www.thlab.net/~fernando/papers/urca.pdf
  http://www.thlab.net/~fernando/papers/astute.pdf
The End
Backup Slides
Classification results for ASTUTE
Classifying the Unknown ASTUTE Anomalies
Results with EWMA