ASTUTE: Detecting a Different Class of Traffic Anomalies

Fernando Silveira 1,2, Christophe Diot 1, Nina Taft 3, Ramesh Govindan 4

1 Technicolor, 2 UPMC Paris Universitas, 3 Intel Labs Berkeley, 4 University of Southern California

ACM SIGCOMM 2010



A Short-Timescale Uncorrelated-Traffic Equilibrium

Compared with the Kalman filter and wavelet analysis, ASTUTE finds anomalies with different characteristics:

• Kalman & Wavelet can detect: anomalies made of a few large flows

• ASTUTE can detect: anomalies made of many small flows

2010/11/2 Speaker: Li-Ming Chen 3

Outline

Motivation & Goal

ASTUTE – An Equilibrium Model

ASTUTE-based Anomaly Detection

Experimental Methodology

Performance Evaluation

Conclusion & My Comments


Anomaly Detection

Traffic anomalies (in large ISPs & enterprise networks) come from:
• Malicious activities (e.g., DoS, port scans)
• Misconfigurations/failures of network components (e.g., link failures, routing problems)
• Legitimate events (e.g., large file transfers, flash crowds)

Anomaly detection:
• Build a statistical model of normal traffic
• An anomaly is defined as a deviation from the normal model


Motivation: Challenges in Anomaly Detection

Anomaly detection:

Pros:
• Can detect new anomalies!

Cons:
• Training takes time
• Training data is never guaranteed to be clean
• Periodic (re)training is required
• False alarms

Can we detect anomalies without having to learn what is normal?


Observation

Network traffic shows equilibrium:
• When many flows are multiplexed on a non-saturated link, their volume changes over short timescales tend to cancel each other out, making the average change across flows close to ZERO

The equilibrium property:
• Holds if the flows are independent
• Is violated by traffic changes caused by several, potentially small, correlated flows ~ traffic anomalies


Goal

Propose a new approach to anomaly detection based on ASTUTE:
• A mathematical model to describe “A Short-Timescale Uncorrelated-Traffic Equilibrium”

Advantages:
• No training: computationally simple and immune to data poisoning
• Accurately detects a well-defined class of traffic anomalies
• Theoretical guarantees on the false positive rate

Evaluate the performance against the Kalman filter and wavelet analysis


Equilibrium Model

Flow: a set of packets that share the same values for a given set of traffic features (e.g., 5-tuple)

Binning: use time bins to study the evolution of a flow
• Flow volume: number of packets in the flow during the corresponding bin

Measure flow volume on a link for each time bin

[Figure: timeline split into bins i, i+1, …; flow f starts at time bin $s_f$ and continues for $d_f$ bins, with volumes $x_{f,i}$ and $x_{f,i+1}$ in bins i and i+1]

Flow f’s volume in each time bin can be represented as a vector:

$x_f = (x_{f,s_f}, x_{f,s_f+1}, \ldots, x_{f,s_f+d_f-1})$
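The binning step on this slide can be sketched in Python. This is only an illustration, not the authors' code: the packet record layout `(timestamp, flow_key)`, the `bin_size` parameter, and the function name `flow_volumes` are assumptions made for the example.

```python
from collections import defaultdict

def flow_volumes(packets, bin_size):
    """Count packets per flow and per time bin.

    packets: iterable of (timestamp_seconds, flow_key) pairs, where flow_key
             is any hashable value such as a 5-tuple (assumed record layout).
    Returns {flow_key: {bin_index: packet_count}}.
    """
    volumes = defaultdict(lambda: defaultdict(int))
    for timestamp, flow_key in packets:
        bin_index = int(timestamp // bin_size)  # which time bin the packet falls in
        volumes[flow_key][bin_index] += 1
    return {flow: dict(bins) for flow, bins in volumes.items()}

# Hypothetical packets: flow "a" sends 2 packets in bin 0 and 1 packet in bin 1.
packets = [(0.5, "a"), (1.2, "a"), (6.0, "a"), (7.5, "b")]
volumes = flow_volumes(packets, bin_size=5)
```

Each inner dictionary plays the role of the vector $x_f$: the smallest key is the start bin and missing bins mean the flow sent nothing there.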


Equilibrium Model: Focus on Volume Changes of Flows

[Figure: timeline with bins i and i+1 and the volumes $x_{f,i}$, $x_{f,i+1}$ of flow f]

Flow f’s volume in each time bin can be represented as a vector:

$x_f = (x_{f,s_f}, x_{f,s_f+1}, \ldots, x_{f,s_f+d_f-1})$

F: set of flows that are active in i or i+1

$\delta_{f,i} = x_{f,i+1} - x_{f,i}$ (volume change of f from i to i+1)
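As a small sketch of this definition, the code below computes the volume change of every flow that is active in bin i or bin i+1. The input layout `{flow: {bin: packets}}` and the name `volume_changes` are assumptions for illustration; a bin with no entry is treated as zero packets.

```python
def volume_changes(volumes, i):
    """Return {flow: delta} for the flows active in bin i or bin i+1.

    volumes: {flow_key: {bin_index: packet_count}} (assumed layout);
             a missing bin counts as 0 packets.
    """
    deltas = {}
    for flow, bins in volumes.items():
        before = bins.get(i, 0)
        after = bins.get(i + 1, 0)
        if before or after:              # the flow belongs to the set F
            deltas[flow] = after - before
    return deltas

# Hypothetical volumes: flow "d" is only active much later and is excluded.
volumes = {"a": {0: 2, 1: 2}, "b": {0: 1, 1: 3}, "c": {0: 1}, "d": {5: 4}}
deltas = volume_changes(volumes, 0)
```

Flow "c" ends after bin 0, so its change is negative; a flow that starts in bin i+1 would contribute its full volume as a positive change.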


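Consequences of the ASTUTE Model

Assumptions:
• (A1) Flow independence
• (A2) Stationarity

Theorem 1 (consequences of ASTUTE): under (A1) and (A2), the volume changes $\delta_{f,i}$ of the flows in F have zero mean and are uncorrelated with each other

Intuition: independent flows cancel each other out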


ASTUTE-based Anomaly Detection Method

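Given:
• A detection threshold K(p)
• A pair of consecutive time bins (i, i+1)

Measure:
• Set of active flows, F (F also denotes the number of flows in the set)
• Mean volume change, $\hat{\delta}_i$
• Variance of volume changes, $\hat{\sigma}_i^2$

Compute the AAV (ASTUTE Assessment Value):

$K' = \frac{\hat{\delta}_i}{\hat{\sigma}_i} \sqrt{F}$

Flag an alarm if:

$|K'| > K(p)$

A toy example (copied from the authors’ slides): three flows with volume changes 0, +2, and −1 between bins i and i+1

$\hat{\delta}_i = 1/3$, $\hat{\sigma}_i^2 = 7/3$

$K' = 0.378 < K(p)$, with $K(p) = 2$

No alarm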
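The detection rule can be checked against the toy example with a few lines of Python. This is a minimal sketch rather than the authors' implementation: the function name `astute_aav` is made up here, and the sample variance uses the F−1 denominator because that is what reproduces the 7/3 on the slide.

```python
import math
import statistics

def astute_aav(deltas):
    """ASTUTE assessment value: (mean / std) * sqrt(number of flows).

    deltas: volume changes of the active flows (at least two values).
    """
    mean = statistics.fmean(deltas)
    std = statistics.stdev(deltas)       # sample standard deviation (n - 1)
    if std == 0:
        # All flows changed by the same amount: no spread to normalise by.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / std * math.sqrt(len(deltas))

K = 2.0                                  # detection threshold K(p) from the toy example
aav = astute_aav([0, 2, -1])             # volume changes of the three flows
alarm = abs(aav) > K
```

With these three flows the value is about 0.378, below the threshold, so no alarm is raised.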


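Note: About Volume Changes

Requirement:
• Only consider traffic on non-saturated links, using short-timescale bins

Volume change (for the F flows that are active at bin i): $\delta_{f,i} = x_{f,i+1} - x_{f,i}$

Mean: $\hat{\delta}_i = \frac{1}{F} \sum_{f=1}^{F} \delta_{f,i}$

Standard deviation: $\hat{\sigma}_i = \sqrt{\frac{1}{F-1} \sum_{f=1}^{F} (\delta_{f,i} - \hat{\delta}_i)^2}$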


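Note: About Detection Threshold

For large F, $\hat{\delta}_i$ has a (1−p) confidence interval $I_i$ given by the central limit theorem:

$I_i = \left[ \hat{\delta}_i - K(p) \frac{\hat{\sigma}_i}{\sqrt{F}},\ \hat{\delta}_i + K(p) \frac{\hat{\sigma}_i}{\sqrt{F}} \right]$

• If $I_i$ contains zero, then F satisfies ASTUTE
• Otherwise, there is an ASTUTE anomaly at time bin i

$I_i$ contains zero when $\hat{\delta}_i - K(p) \frac{\hat{\sigma}_i}{\sqrt{F}} < 0$ and $\hat{\delta}_i + K(p) \frac{\hat{\sigma}_i}{\sqrt{F}} > 0$, so the smallest value of K(p) for which this holds is $|K'|$, where

$K' = \frac{\hat{\delta}_i}{\hat{\sigma}_i} \sqrt{F}$ (defined as the AAV)

[Figure: standard Gaussian density with the (1−p) confidence interval between −K(p) and K(p), and probability p/2 in each tail]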
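To make the link between p and K(p) concrete, the sketch below reads K(p) as the (1 − p/2) quantile of the standard Gaussian, which is how the two-tailed figure on this slide is interpreted here. The function names are hypothetical; `NormalDist` comes from the Python standard library.

```python
from statistics import NormalDist

def detection_threshold(p):
    """K(p): the (1 - p/2) quantile of the standard Gaussian (two-tailed)."""
    return NormalDist().inv_cdf(1 - p / 2)

def false_positive_rate(k):
    """Inverse mapping: probability mass outside [-k, k] under a standard Gaussian."""
    return 2 * (1 - NormalDist().cdf(k))

k_for_5_percent = detection_threshold(0.05)   # close to 1.96
p_for_k_equal_2 = false_positive_rate(2.0)    # roughly 4.6 percent
```

Under this reading, the threshold K(p) = 2 used in the toy example corresponds to a target false positive rate a little below 5 percent.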


Note: Situations in Which ASTUTE Is Violated

There are 2 ways ASTUTE can be violated:

(1) False positive
• Controlled by the false positive rate p
• In a fraction p of the time bins, ASTUTE may be violated by normal traffic

(2) Flows violate the model’s assumptions: independence & stationarity
• Stationarity:
  - Only over the timescale of a typical flow duration
  - Authors study which bin sizes show stationary behavior
• Independence:
  - Violated when many flows increase/decrease their volumes at the same time!
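Both situations can be illustrated with a small simulation. The sketch below is not from the paper: it assumes Gaussian volume changes (real packet counts are integers), a fixed random seed, and made-up sizes (1000 bins, 200 flows). It compares independent flows with flows that all shift by the same small amount.

```python
import math
import random
import statistics

def aav(deltas):
    """ASTUTE assessment value for one pair of bins."""
    return statistics.fmean(deltas) / statistics.stdev(deltas) * math.sqrt(len(deltas))

def alarm_rate(n_bins, n_flows, shift, threshold, rng):
    """Fraction of simulated bins in which |AAV| exceeds the threshold.

    shift: common change added to every flow; 0 models independent flows,
           a non-zero value models many flows moving together (assumption).
    """
    alarms = 0
    for _ in range(n_bins):
        deltas = [rng.gauss(0.0, 1.0) + shift for _ in range(n_flows)]
        if abs(aav(deltas)) > threshold:
            alarms += 1
    return alarms / n_bins

rng = random.Random(0)          # fixed seed so the run is repeatable
K = 1.96                        # threshold for p of about 0.05
independent_rate = alarm_rate(1000, 200, 0.0, K, rng)
correlated_rate = alarm_rate(1000, 200, 0.3, K, rng)
```

With independent flows the alarm rate should stay near the chosen p, while a shift of only 0.3 per flow, small next to the unit spread, should trigger an alarm in almost every bin.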


Note: Validate Stationarity Assumption

(A2) Stationarity:
• Depends on the timescale (bin size)
• In the trace:
  - Long timescales: daily usage bias
  - Small timescales: no bias!

We use short timescales to factor out violations of stationarity


Note: Validate “Gaussianity” of AAVs

Check distribution similarity

Study the impact of packet sampling rate


Outline

Motivation & Goal

ASTUTE – An Equilibrium Model

ASTUTE-based Anomaly Detection

Experimental Methodology
• Competitors (or collaborators!?): Kalman & Wavelet
• Inspect anomalies from traffic data and identify their root causes
• Simulation through anomaly injection

Performance Evaluation

Conclusion & My Comments


Kalman & Wavelet (alternative anomaly detectors for comparison purposes)

Kalman: a spatio-temporal detector
• Learns spatial and temporal correlations to predict the next values
• Its threshold parameter has similar semantics to that of ASTUTE (allowing a direct comparison)
• [26] A. Soule, K. Salamatian, and N. Taft, “Combining Filtering and Statistical Methods for Anomaly Detection,” in Proc. IMC, 2005.

Wavelet: a frequency-based detector
• Decomposes signals into low/medium/high frequency bands
• The variance of the combined signal (medium & high frequency bands) represents anomalies
• [2] P. Barford, J. Kline, D. Plonka, and A. Ron, “A Signal Analysis of Network Traffic Anomalies,” in Proc. IMW, 2002.


Kalman & Wavelet (cont’d)

Targets of these two detectors:
(1) packet volume time series
(2) entropy time series of Src. IP
(3) entropy time series of Dst. IP
(4) entropy time series of Src. port
(5) entropy time series of Dst. port
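One point of an entropy time series can be computed as below. The slide does not say how the entropy is defined, so this sketch assumes the usual Shannon entropy of the empirical distribution of a feature (here, source IPs) over the packets seen in one time bin; the function name and sample addresses are made up.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical source IPs observed in two different time bins.
concentrated_bin = ["10.0.0.1"] * 4                             # one source dominates
dispersed_bin = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

low_entropy = shannon_entropy(concentrated_bin)
high_entropy = shannon_entropy(dispersed_bin)
```

Repeating this for every bin and every feature gives the four entropy series listed above; a sudden jump or drop in one of them is what the Kalman and wavelet detectors would then look for.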


Dataset

Flow traces from 3 different networks:
• between research institutions
• public Internet, European NRENs
• inside the enterprise network

Flow sampling (one value per trace, in the order listed on the slide): 0.1, 0.01, none


Manual Classification of Anomalies for Root Cause Analysis

Goal:
• Perform “root cause” analysis for the anomalies found by ASTUTE, Kalman, and Wavelet
• This requires knowing the root causes first

Approach:
• Use information provided by ASTUTE to help the process of manual classification of anomalies in the traffic trace

Steps:
(1) correlated anomalous flows
(2) anomalous flow identification
(3) anomalous flow classification (by hand)


Results of Anomalous Flow Classification

Take these as the criteria for labeling the anomalies found in the three traces


Simulation through Anomaly Injection

Benefit:
• Simulation helps understand how methods trade off detection rates for false positives (ROC curves)
• PS: for comparing Kalman and ASTUTE only

Approach:
• For end-host activity: build a set of benchmark anomalies and inject them (recreate identified anomalies)
• For outages: remove the related traffic
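The injection idea can be shown on a tiny deterministic example. Everything here is hypothetical: the background is 100 flows whose changes cancel exactly, and the injected anomaly is 30 new flows of one packet each, standing in for something like a scan. It is a sketch of the approach, not the benchmark anomalies used by the authors.

```python
import math
import statistics

def aav(deltas):
    """ASTUTE assessment value for one pair of bins."""
    std = statistics.stdev(deltas)
    return statistics.fmean(deltas) / std * math.sqrt(len(deltas))

K = 2.0                          # detection threshold, as in the toy example

# Background traffic: 100 flows whose volume changes cancel out exactly.
background = [1, -1] * 50

# Injected anomaly: 30 new one-packet flows that all appear in bin i+1.
injected = [1] * 30

aav_before = aav(background)
aav_after = aav(background + injected)
detected = abs(aav_after) > K
```

The injected traffic adds only 30 packets in total, yet it moves the assessment value from zero to above the threshold, because every injected flow changes in the same direction.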


Number of Anomalies and Anomaly Overlap

Small overlap

Kalman & Wavelet have more overlap with each other

• what are these anomalies??


Anomaly Types (Internet2)

Detection capabilities are different


Anomaly Types (GEANT2 & Corporate)

User characteristics differ across networks


Small Detector Overlap

(map the qualitative properties (types) of the anomalies to their quantitative properties (# flows and packets))

Kalman/Wavelet (few large flows)

ASTUTE (several small flows)

Less total volume


Detection Performance

Type 1 Type 2

Type 2 Type 3


Complementarity of ASTUTE & Kalman

After combination, the performance is better!


Conclusion

ASTUTE detects anomalies w/o learning the normal behavior
• Computationally simple and immune to data poisoning
• Specializes in strongly correlated flows (several small flows)
• Limitation: cannot find anomalies involving a few large flows
  - But those are easy to find!

ASTUTE and Kalman complement each other nicely

ASTUTE also provides information that is useful to perform root cause analysis