automated signature extraction for high volume attacks

Post on 23-Feb-2016

Automated Signature Extraction for High Volume Attacks

Yehuda Afek, Anat Bremler-Barr, Shir Landau Feibish

This work is part of the Kabarnit–Cyber Consortium (2012-2014) under the Magnet program, funded by the Chief Scientist in the Israeli Ministry of Industry, Trade and Labor. This research was also partly supported by European Research Council (ERC) Starting Grant no. 259085.

2

Current DDoS Attack

Zombies on innocent computers

Server-level DDoS attacks

Infrastructure-level DDoS attacks

Bandwidth-level DDoS attacks

3

High volume attacks - Current Defense

Many different types of attackers

Defense Line 1, Defense Line 2, Defense Line 3, ..., Defense Line n

Defense techniques in use:
access control list filtering
behavioral analysis
SYN cookies, challenge-response

Remaining attacks:
Botnets (millions of computers)
Hard to identify behaviorally, under the radar
Zero-day: no known signatures

... Call for HELP!!

4

Signature-based DDoS Attack Detection

Unknown (zero-day) attacks:

Some hope: attack tools usually leave some unique footprint (a repeating pattern)

Example in packet: Connection: KEEP-ALIVE

Today: find signatures manually (human eye)

Our goal: find them automatically

Signatures are used by anti-DDoS devices and firewalls to stop the attack

Mitigation in minutes is good enough for these types of attacks

5

Signatures are also used in:
NIDS/IPS (Snort, Bro, etc.)
Worm detection (automated extraction)

Previous work:
Worm behavior (address dispersion, suspicious code, etc.)
Fixed-length signatures
Non-scalable

Notable works:
Kephart et al. '94
Honeycomb [Kreibich et al. '04]
Earlybird [Singh et al. '04]
Autograph [Kim et al. '04]
Hancock [Griffin et al. '09]

6

System Overview

Our challenge: automatically find signatures that appear frequently only during an attack

Where:
In the mitigation box (DDoS guard, firewall, anti-DDoS, etc.)
In the cloud: collect data from several collectors

Input collection:
Attack-time traffic sample
Peace-time traffic sample

Signature Extraction

Output: attack signatures, e.g. Connection: KEEP-ALIVE

7

Signature Extraction - High Level

Input:
Attack-time traffic sample
Peace-time traffic sample

Signature Extraction:
Find frequent strings in attack-time traffic
Find frequent strings in peace-time traffic
Take only strings found in attack and not in peace

Output: attack signatures, e.g. Connection: KEEP-ALIVE

8

Our Goal

Automatically find signatures that appear frequently only during an attack

Requirements:
1. Find a minimal set of signatures
   Some filtering devices have limited capacity
2. Allow signatures of varying lengths
3. Don't include signatures found in legitimate traffic
   Minimum false positives
4. Minimize space and time usage
   Large amounts of data
   Quick response

9

Finding Frequent Strings in Traffic

Input: sequence of packets
Output: strings that appear frequently in packets

Common stringology solution: use suffix trees/arrays, which take too much space

Our solution uses heavy hitters


10

Heavy Hitters (Frequent Items)

Input: N values, integer v
Output: the (at most v) values each appearing at least N/v times

Approximate solution:
Uses O(v) space!
One pass over the input!

Known counter-based HH algorithms:
Misra & Gries 1982
Lossy Counting - Manku and Motwani 2002
Space Saving - Metwally et al. 2005 (currently using)

11

Space Saving Heavy Hitters [Metwally et al. 2005]

Algorithm:
Maintain v values, and their counters.

counter  value
1        10
1        22
1        30

Input: 10 22 30 10 35 50

12

Space Saving Heavy Hitters [Metwally et al. 2005]

Algorithm:
Maintain v values, and their counters.
If the next value x is one of the v, increment its counter.

counter  value
2        10
1        22
1        30

Input: 10 22 30 10 35 50

13

Space Saving Heavy Hitters [Metwally et al. 2005]

Algorithm:
Maintain v values, and their counters.
If the next value x is one of the v, increment its counter.
Else take the item with minimal counter c:
  Replace its value with x
  New counter is c+1

Error rate: N/v

counter  value
2        10
2        35
1        30

Input: 10 22 30 10 35 50
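The update rule on these slides can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `space_saving` and the dictionary-based storage are assumptions, and ties on the minimal counter are broken by insertion order, which happens to match the slide (22 is evicted before 30).

```python
def space_saving(stream, v):
    """Keep at most v (value, counter) pairs while scanning the stream once."""
    counters = {}  # value -> counter (assumed storage; a real system would use a faster structure)
    for x in stream:
        if x in counters:
            # x is one of the v monitored values: increment its counter
            counters[x] += 1
        elif len(counters) < v:
            # still room: start monitoring x
            counters[x] = 1
        else:
            # evict the item with the minimal counter c; x inherits c + 1
            victim = min(counters, key=counters.get)
            c = counters.pop(victim)
            counters[x] = c + 1
    return counters


after_35 = space_saving([10, 22, 30, 10, 35], v=3)
print(after_35)  # {10: 2, 30: 1, 35: 2}, the table shown on the slide
after_50 = space_saving([10, 22, 30, 10, 35, 50], v=3)
print(after_50)  # {10: 2, 35: 2, 50: 2}
```

Note that 35 and 50 each get a counter of 2 although each appeared once; this overestimation is what the N/v error rate bounds.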

14

Our Solution

Heavy hitters is usually done on numbers... how do we use it for text?

k-grams: strings of length exactly k

Trivial idea: for each packet:
Take all k-grams (sliding window)
Do heavy hitters on them

Fixed length is not good enough:
Either too short: cuts up longer signatures
  Substring pollution: too many heavy hitters for one signature
Or too long: noisy signatures

k-grams (k = 4) of abcabcadefgfsdghjghnfdghfgsdhfjs:
b1 = abca
b2 = bcab
b3 = cabc
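The sliding window that produces the k-grams can be sketched as follows. The helper name `kgrams` is hypothetical, and the payload is treated as a plain Python string for simplicity; real packet payloads would be bytes.

```python
def kgrams(payload, k):
    """Return every substring of length exactly k, in order (sliding window)."""
    return [payload[i:i + k] for i in range(len(payload) - k + 1)]


grams = kgrams("abcabcadefgfsdghjghnfdghfgsdhfjs", 4)
print(grams[:3])  # ['abca', 'bcab', 'cabc']
```

A signature of length 6 such as abcabc already yields three different 4-grams, which is the substring pollution the slide mentions.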

15

Our Solution: Double Heavy Hitters

Double Heavy Hitters algorithm: two separate instances of heavy hitters
Heavy Hitters 1: find heavy hitters of k-grams
Heavy Hitters 2: find heavy hitters of varying-length strings created during the run of Heavy Hitters 1

Input to Heavy Hitters 1: k-grams
Input to Heavy Hitters 2: strings
Output is the output of Heavy Hitters 2

16

Double Heavy Hitters Algorithm

While processing k-grams in Heavy Hitters 1, find a maximal run of k-grams that are:
Already in Heavy Hitters 1
Consecutive, with counters that maintain a predefined ratio

Create a string from the run
Insert it into Heavy Hitters 2

[Figure: for each k-gram the algorithm asks "Is it already in Heavy Hitters 1?" (Y/N). The consecutive hits abca, bcab, cabc pass the ratio check and are merged into the string abcabc.]

17

Double Heavy Hitters Algorithm Example:

Input: abcabcabcd

bi  k-gram
b1  abca
b2  bcab
b3  cabc
b4  abca
b5  bcab
b6  cabc
b7  abcd

Heavy Hitters 1
counter  k-gram
1        abca
1        bcab
1        cabc

Heavy Hitters 2
counter  string
0        NULL
0        NULL
0        NULL

18

Double Heavy Hitters Algorithm Example:

Input: abcabcabcd

k-grams: b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd

Heavy Hitters 1
counter  k-gram
2        abca
1        bcab
1        cabc

String = abca

Heavy Hitters 2
counter  string
0        NULL
0        NULL
0        NULL

19

Double Heavy Hitters Algorithm Example:

Input: abcabcabcd

k-grams: b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd

Heavy Hitters 1
counter  k-gram
2        abca
2        bcab
1        cabc

String = abcab

Heavy Hitters 2
counter  string
0        NULL
0        NULL
0        NULL

20

Double Heavy Hitters Algorithm Example:

Input: abcabcabcd

k-grams: b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd

Heavy Hitters 1
counter  k-gram
2        abca
2        bcab
2        cabc

String = abcabc

Heavy Hitters 2
counter  string
0        NULL
0        NULL
0        NULL

21

Double Heavy Hitters Algorithm Example:

Input: abcabcabcd

k-grams: b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd

Heavy Hitters 1
counter  k-gram
3        abcd
2        bcab
2        cabc

String = abcabc

Heavy Hitters 2
counter  string
1        abcabc
0        NULL
0        NULL
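The worked example can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the authors' code: the names `SpaceSaving`, `ratio_ok` and `double_heavy_hitters` are hypothetical, the slides do not define the "predefined ratio" so the check used here (smaller counter at least `ratio` times the larger one, default 0.5) is an assumption, and runs are not allowed to span packets.

```python
class SpaceSaving:
    """Space Saving summary holding at most v counters."""

    def __init__(self, v):
        self.v = v
        self.counters = {}

    def __contains__(self, item):
        return item in self.counters

    def add(self, item):
        """Count one occurrence of item and return its (estimated) counter."""
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.v:
            self.counters[item] = 1
        else:
            # replace the item with the minimal counter c; new counter is c + 1
            victim = min(self.counters, key=self.counters.get)
            self.counters[item] = self.counters.pop(victim) + 1
        return self.counters[item]


def ratio_ok(a, b, ratio):
    """Assumed ratio test between counters of consecutive k-grams."""
    return min(a, b) >= ratio * max(a, b)


def double_heavy_hitters(payloads, k, v, ratio=0.5):
    hh1, hh2 = SpaceSaving(v), SpaceSaving(v)
    for payload in payloads:
        run, prev = "", 0
        for i in range(len(payload) - k + 1):
            gram = payload[i:i + k]
            seen = gram in hh1          # was it in Heavy Hitters 1 before this update?
            count = hh1.add(gram)
            if seen and run and ratio_ok(prev, count, ratio):
                run += gram[-1]         # extend the run by one character
            else:
                if run:
                    hh2.add(run)        # the run ended: insert the string into HH2
                run = gram if seen else ""
            prev = count
        if run:
            hh2.add(run)                # flush a run that reaches the end of the packet
    return hh1.counters, hh2.counters


hh1, hh2 = double_heavy_hitters(["abcabcabcd"], k=4, v=3)
print(hh1)  # {'bcab': 2, 'cabc': 2, 'abcd': 3}
print(hh2)  # {'abcabc': 1}
```

With k = 4 and v = 3 the final state matches the last slide of the example: abcd evicts abca with counter 3, and the run abcabc is inserted into Heavy Hitters 2.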

22

Heavy Hitters on text - improving the estimation

Problem: substrings in heavy hitters
Only the longest run is in the input to HH2

Correct the count after the run of the algorithm:
For all strings s in Heavy Hitters 2, find the other strings which contain s and add their counters to s's counter

Heavy Hitters 2 (before correction)
counter  string
200      wonder
300      woman
100      wonderwoman

Heavy Hitters 2 (after correction)
real counter  counter  string
300           200      wonder
400           300      woman
100           100      wonderwoman
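The correction step can be sketched as a single pass over the contents of Heavy Hitters 2. The function name `correct_counts` is hypothetical, and the quadratic scan over all pairs is a simplification that is reasonable only because the table holds at most v strings.

```python
def correct_counts(hh2):
    """For each string s, add the counters of the other strings that contain s."""
    corrected = {}
    for s, count in hh2.items():
        extra = sum(c for t, c in hh2.items() if t != s and s in t)
        corrected[s] = count + extra
    return corrected


table = {"wonder": 200, "woman": 300, "wonderwoman": 100}
print(correct_counts(table))
# {'wonder': 300, 'woman': 400, 'wonderwoman': 100}
```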

23

Double Heavy Hitters Algorithm Analysis

Input:
Input to HH1: N k-grams
Input to HH2: C consecutive grams

Error bounds:
For HH1 with v items: N/v
For HH2 with v items: C/v

We prove: C ≤ N/(k + 1)

Overall: error bound of the Double Heavy Hitters algorithm

24

Signature Extraction - High Level

Formalize with thresholds

Input:
Attack-time traffic sample
Peace-time traffic sample

Signature Extraction:
Find frequent strings in attack-time traffic
Find frequent strings in peace-time traffic
Take only strings found in attack and not in peace

Output: attack signatures, e.g. Connection: keep-ALIVE

25

Choose Signatures

Create signatures that never appear in legitimate traffic

Strings in attack with frequency > Attack-high

Thresholds: Attack-high, Peace-low, Peace-high, Delta

26

Choose Signatures

Create signatures that never appear in legitimate traffic

Diagram regions:
Strings in attack with frequency > Attack-high
Strings in peace time
Signatures
False positives

Thresholds: Attack-high, Peace-low, Peace-high, Delta

27

Choose Signatures

Create signatures that rarely appear in legitimate traffic

Diagram regions:
Strings in attack with frequency > Attack-high
Strings in peace with frequency > Peace-low
Signatures
False positives

Thresholds: Attack-high, Peace-low, Peace-high, Delta

28

Choose Signatures

Create signatures that may appear in legitimate traffic, but appear in attack traffic much more

Strings in attack with frequency > Attack-high

Strings in peace with frequency > Peace-high: false positives, not used as signatures
Strings in peace with frequency > Peace-low: signatures only if the attack frequency is at least Delta more than the peace frequency

Thresholds: Attack-high, Peace-low, Peace-high, Delta

29

Use peace traffic to create filters

Use our Double Heavy Hitters algorithm on peace-time traffic:

Peace-time traffic packet payload: abcabcadefgfsdghjghnfdghfg......
k-grams: b1 = abca, b2 = bcab, b3 = cabc, ...

Output values are classified by frequency (scale from 0% to 100%):
White list: frequency > Peace-high
Maybe white list: frequency > Peace-low
Not white list

30

Extracting Attack Signatures

Now use the Double Heavy Hitters algorithm on attack-time traffic, with filters (modified DHH)

Attack traffic packet payload: hagdhdadjashdklahdjkasfjasbfjabfhfgahfvhsbdfjkasnkiaywtqyeffcgfacsdxasdbas
k-grams: b1 = hagd, b2 = agdh, b3 = gdhd, ...

Heavy Hitters 1 -> strings -> Heavy Hitters 2 -> output values

Filters:
White list: discard if contained in a white-list string
Maybe white list:

Signatures: output values with frequency > Attack-high
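The four thresholds can be combined into one selection step, sketched below. This is an illustration under assumptions, not the modified DHH from the talk: the function name `choose_signatures` is hypothetical, frequencies are assumed to be fractions of packets in each sample, "maybe white list" strings are matched exactly (the slides do not say how), and the filtering is applied after extraction rather than during the run.

```python
def choose_signatures(attack_freq, peace_freq, attack_high, peace_low, peace_high, delta):
    """Pick attack strings that pass the white-list and maybe-white-list filters."""
    # White list: strings that are very frequent in peace-time traffic
    white = [s for s, f in peace_freq.items() if f > peace_high]
    # Maybe white list: moderately frequent in peace-time traffic
    maybe = {s: f for s, f in peace_freq.items() if peace_low < f <= peace_high}

    signatures = []
    for s, f in attack_freq.items():
        if f <= attack_high:
            continue  # not frequent enough during the attack
        if any(s in w for w in white):
            continue  # contained in a white-list string: discard
        if s in maybe and f - maybe[s] < delta:
            continue  # not at least delta more frequent than in peace time
        signatures.append(s)
    return signatures


# Made-up frequencies, for illustration only
attack = {"Connection: KEEP-ALIVE": 0.9, "GET / HTTP/1.1": 0.95,
          "Accept: */*": 0.6, "rare": 0.05, "User-Agent: X": 0.7}
peace = {"GET / HTTP/1.1": 0.8, "Accept: */*": 0.3, "User-Agent: X": 0.25}

result = choose_signatures(attack, peace, attack_high=0.5,
                           peace_low=0.2, peace_high=0.5, delta=0.4)
print(result)  # ['Connection: KEEP-ALIVE', 'User-Agent: X']
```

In this made-up data, "GET / HTTP/1.1" is dropped by the white list and "Accept: */*" fails the Delta test, while "User-Agent: X" survives because it is 0.45 more frequent under attack.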

31

Evaluations

Overall eleven tests:
Ten real attack captures
  5 captures of peacetime traffic
  5 synthetic peacetime captures
One synthetic attack in real peace-time traffic

Compare to a human expert

32

Sample Signatures

Extra newline between header fields
Use of upper-case characters, where usually lower
Use of a rarely used HTTP field
Use of a rare user agent

Could not be identified manually

33

Results - Accuracy of Double Heavy Hitters estimation

[Graph: frequency of signatures. x-axis: signatures (1 to 37); y-axis: percent (0 to 100).
RED: actual count (frequency) in attack traffic.
BLUE: algorithm (DHH) estimation of frequency of signatures.]

34

Results - Attack Rate Estimation

[Graph: attack rate (0 to 100) by test number, in two panels: tests with real peace-time traffic and tests with synthetic peace-time traffic. Legend (truncated): Human Ex...]

35

Results - Recall and Precision Estimation

[Graph: percent (0 to 100) by test number (1 to 11), in two panels: tests with real peace-time traffic and tests with synthetic peace-time traffic. Legend (truncated): Peacetime ba...]

Precision: relevant packets from all identified

Recall: identified packets from all relevant

Average: 99.96
Worst case: 99.8
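The two measures defined on this slide can be computed as below. This is a generic sketch of the standard definitions; the function name and the use of packet identifiers in sets are assumptions, and the numbers in the example are made up rather than taken from the evaluation.

```python
def precision_recall(identified, relevant):
    """identified: packets matched by the signatures; relevant: true attack packets."""
    identified, relevant = set(identified), set(relevant)
    hits = len(identified & relevant)
    precision = hits / len(identified) if identified else 0.0  # relevant among identified
    recall = hits / len(relevant) if relevant else 0.0         # identified among relevant
    return precision, recall


p, r = precision_recall(identified={1, 2, 3, 4}, relevant={2, 3, 4, 5, 6})
print(p, r)  # 0.75 0.6
```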

36

Future Work

Identify signatures always found in the same packets

Good synthetic peace-time traffic, global white-list

Support regular expression signatures

37
