exact pattern matching on resource-limited network devices


Chien-Chung Su

2002/12/10


Outline

• Problem definition

• Resource-limited network devices

• Introduction of SEBMH

• Disadvantages of SEBMH

• Adaptive bucket management

• Conclusion


Problem definition

• Given
  – P : pattern(s)
  – T : text

• General action
  – Find all occurrences of P in T
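As a baseline for the problem statement, the following minimal sketch reports every occurrence of each pattern in a text with a naive scan. The function name `find_all_occurrences` and the sample strings are illustrative assumptions, not part of the slides, and the scan makes no attempt at the speed or space savings discussed later.

```python
def find_all_occurrences(patterns, text):
    """Return sorted (position, pattern) pairs for every exact occurrence."""
    hits = []
    for pattern in patterns:
        start = text.find(pattern)
        while start != -1:
            hits.append((start, pattern))
            # Advance by one so that overlapping occurrences are reported too.
            start = text.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    print(find_all_occurrences(["ab", "aba"], "ababa"))
```

The cost of this baseline grows with the number of patterns, which is the pressure that the multi-pattern structures in the following slides try to relieve.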


Research for exact pattern matching

• The exact matching problem is solved for typical word-processing applications.

• The story changes radically for other, more specialized applications.
  – DNA and protein search
  – Relation between search performance and database size
  – Network intrusion detection


Resource-limited network devices

• Special issues
  – Security issues
    • Check whether P occurs in T
  – Resource-limited
    • Try to break the tradeoff between speed and space

• Characteristics
  – Network-related pattern matching
    • Patterns change occasionally
    • Texts change constantly
  – Solutions
    • Dynamic hash function
    • Adaptive bucket management


SEBMH

[Figure: SEBMH structure. Components shown: Input Mask, Global Shift Table, Hash-Link-List Structure of ASCII Patterns, Hash-Link-List Structure of non-ASCII Patterns]


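Set-Exclusive table

[Figure: one matching step on the text. Labels shown: text characters e and el; Hash Table (entry e); Global Shift Table (shift Sg); Set-Exclusive Table (entry el, shift Sl); link list L; sp, sp1 and ep; captions "The shortest pattern of Link-List L", "The shortest pattern of all" and "Hash Matching failed"]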


Disadvantages of SEBMH

• Because the hash function is static, performance still depends on the pattern set.
  – Remedy: dynamic hash function

• In the general pattern matching problem, the global shift values approach 1 as the number of patterns grows.
  – Remedy: classify the patterns to ease this effect
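The second disadvantage can be illustrated with a small sketch. It assumes a Horspool-style bad-character table shared by all patterns, where each byte's shift is the minimum over the patterns; SEBMH's actual Global Shift Table may be built differently, and the names `global_shift_table` and `average_shift` are hypothetical.

```python
def global_shift_table(patterns, alphabet_size=256):
    """Build one Horspool-style shift table shared by all patterns (bytes)."""
    m = min(len(p) for p in patterns)      # window = shortest pattern length
    table = [m] * alphabet_size            # default: skip a whole window
    for p in patterns:
        for i in range(m - 1):             # the last window position is excluded
            # Keep the smallest (safest) shift seen for this byte so far.
            table[p[i]] = min(table[p[i]], m - 1 - i)
    return table


def average_shift(patterns):
    """Mean shift over the whole alphabet for a pattern set."""
    table = global_shift_table(patterns)
    return sum(table) / len(table)


if __name__ == "__main__":
    one = [b"abcd"]
    many = [b"abcd", b"dcba", b"bdac", b"cadb"]
    print(average_shift(one), average_shift(many))
```

With four patterns over the same letters, every letter that occurs in the set ends up with a shift of 1, which is the degradation that classifying the patterns is meant to limit.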


How to improve

• Pattern classifier

• Approximate perfect hash function

• Adaptive bucket management


Approximate hash function (1)

• Step 1. Sort the class's target patterns by KEY

• Step 2. Distribute the class's target patterns equally among the buckets

    n = 0;
    while (patterns remain) {
        for (i = 0; i < AVG_P && patterns remain; i++) {
            dispatch the current pattern into bucket(n);
            get the next pattern;
        }
        n++;
    }

• Step 3. Handle the exception condition

    for (i = 1; i < BUCKET_NUM; i++) {
        if (bucket(i-1) and bucket(i) both hold patterns with the same key)
            group these patterns into bucket(i-1) or bucket(i);
    }
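The three steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the key is taken to be the first character of a pattern, same-key patterns that would straddle a boundary are always grouped into the earlier bucket (the slide allows either side), and `build_buckets` is a hypothetical name.

```python
def build_buckets(patterns, bucket_num, key=lambda p: p[0]):
    """Sort patterns by key, fill buckets evenly, keep equal keys together."""
    ordered = sorted(patterns, key=key)                 # Step 1
    avg_p = max(1, -(-len(ordered) // bucket_num))      # ceil(len / bucket_num)
    buckets = []
    for p in ordered:
        same_key = bool(buckets) and key(buckets[-1][-1]) == key(p)
        # Step 2: fill the current bucket up to avg_p patterns.
        # Step 3: a pattern sharing the previous key stays in that bucket.
        if buckets and (len(buckets[-1]) < avg_p or same_key):
            buckets[-1].append(p)
        else:
            buckets.append([p])
    return buckets


if __name__ == "__main__":
    sample = ["GET", "GIF", "HEAD", "HTTP", "POST", "PUT", "PASS", "USER"]
    for bucket in build_buckets(sample, 4):
        print(bucket)
```

Because every key lands in exactly one bucket, a lookup only has to compare the input key against the bucket boundaries, at the price of slightly uneven bucket sizes.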


Approximate hash function (2)


Adaptive bucket management

• Assumption
  – Resource is limited
  – Total bucket number is fixed

• Step 1 : classify the patterns
  – For example (a pattern feature is the classifying factor)
    • Class A
    • Class B
    • Class C


Adaptive bucket management

• Step 2 : allocate buckets
  – For example
    • Traffic distribution
      – Class A : 50%
      – Class B : 30%
      – Class C : 20%
    • Policy
      – SEBMH(Class A) could get more buckets at this time
      – Set-Exclusive table will be more effective
        » bucket ↑, patterns per bucket ↓, efficacy of set-exclusive table ↑
        » bucket ↓, set-exclusive utilization ↑


How to allocate buckets

• Communism
• Fair
• Greedy


Basic assumption

• Assumption
  – Φ : matching time for one pattern
  – B : total number of buckets
  – P : total number of patterns
  – C : number of classes
  – B_i : number of buckets for class i
  – P_i : number of patterns for class i
  – D_i : traffic distribution of class i

• Known
  – P_1 + P_2 + … + P_C = P
  – D_1 + D_2 + … + D_C = 1

• Problem
  – Find a sequence (B_1, B_2, …, B_C) such that
    • B_1 + B_2 + … + B_C = B
    • $D_1 \cdot \frac{P_1}{B_1} + D_2 \cdot \frac{P_2}{B_2} + \dots + D_C \cdot \frac{P_C}{B_C}$ is small enough


Communism Method
ABM is not applied

• Without ABM
  – No classifier is needed
  – Average matching time : $AMT_{CM} = \Phi \cdot \frac{P}{B}$
  – Other overheads
    • Overhead of approximate perfect hashing
    • Efficacy of Global-Shift table is not obvious
    • Efficacy of Set-Exclusive table is not obvious


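Fair Method
At least one solution

• For example
  – Traffic distribution
    • Class A : 50%
    • Class B : 30%
    • Class C : 20%

• With ABM in Fair Method (B_i = D_i · B)
  – Average matching time : $AMT_{FM} = \Phi \sum_{i=1}^{C} D_i \cdot \frac{P_i}{D_i \cdot B} = \Phi \cdot \frac{P}{B}$
  – Example: $\Phi \left( 50\% \cdot \frac{P_A}{50\% \cdot B} + 30\% \cdot \frac{P_B}{30\% \cdot B} + 20\% \cdot \frac{P_C}{20\% \cdot B} \right) = \Phi \cdot \frac{P}{B}$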
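The cancellation in the Fair Method can be checked numerically. The sketch below assumes that the average matching time is the traffic-weighted sum of P_i / B_i (reported in units of Φ) and that the fair share B_i = D_i · B may be fractional; `weighted_cost` and `fair_allocation` are hypothetical names, and traffic shares are passed as integer percentages to keep the arithmetic exact.

```python
from fractions import Fraction


def weighted_cost(patterns, traffic_pct, buckets):
    """Sum of D_i * P_i / B_i: average matching time in units of Phi."""
    return sum(Fraction(d, 100) * p / b
               for p, d, b in zip(patterns, traffic_pct, buckets))


def fair_allocation(traffic_pct, total_buckets):
    """Fair Method: B_i = D_i * B, kept as exact (possibly fractional) values."""
    return [Fraction(d, 100) * total_buckets for d in traffic_pct]


if __name__ == "__main__":
    traffic = [50, 30, 20]                      # classes A, B, C
    for pats in ([5, 5, 20], [20, 5, 5], [10, 10, 10]):
        fair = fair_allocation(traffic, 10)
        print(pats, weighted_cost(pats, traffic, fair))
```

Whatever the pattern distribution, the result equals P / B, so under this model the Fair Method never does better than running without ABM.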


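Greedy Method
We can find better solutions

• For example

  Class   Traffic distribution   Pattern distribution
  A       50%                    5
  B       30%                    5
  C       20%                    20

• With ABM in Greedy Method
  – Average matching time : $AMT_{GM} = \min \left\{ x \;\middle|\; x = \Phi \sum_{i=1}^{C} D_i \cdot \frac{P_i}{B_i},\ \text{for any } B \text{ sequence} \right\}$
  – Example: $\Phi \left( 50\% \cdot \frac{5}{4} + 30\% \cdot \frac{5}{3} + 20\% \cdot \frac{20}{3} \right) \approx 2.46\,\Phi < \Phi \cdot \frac{5 + 5 + 20}{4 + 3 + 3} = 3\,\Phi$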
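For bucket counts this small, the minimum over all B sequences can be found by exhaustive search. The sketch below assumes integer allocations with at least one bucket per class and reports costs in units of Φ; `compositions`, `weighted_cost` and `best_allocation` are hypothetical names, and the slide's example allocation is read here as (4, 3, 3).

```python
from fractions import Fraction
from itertools import combinations


def weighted_cost(patterns, traffic_pct, buckets):
    """Sum of D_i * P_i / B_i: average matching time in units of Phi."""
    return sum(Fraction(d, 100) * p / b
               for p, d, b in zip(patterns, traffic_pct, buckets))


def compositions(total, parts):
    """Yield every way to split `total` into `parts` positive integers."""
    for cuts in combinations(range(1, total), parts - 1):
        bounds = (0,) + cuts + (total,)
        yield tuple(bounds[k + 1] - bounds[k] for k in range(parts))


def best_allocation(patterns, traffic_pct, total_buckets):
    """Exhaustive search for the allocation with the lowest weighted cost."""
    return min(compositions(total_buckets, len(patterns)),
               key=lambda b: weighted_cost(patterns, traffic_pct, b))


if __name__ == "__main__":
    pats, traffic = [5, 5, 20], [50, 30, 20]
    best = best_allocation(pats, traffic, 10)
    print(best, weighted_cost(pats, traffic, best))
    print(weighted_cost(pats, traffic, (4, 3, 3)))
```

Under these assumptions the search returns (3, 3, 4), slightly below the (4, 3, 3) allocation, and both beat the value of 3 obtained without ABM. The search space grows combinatorially with B and C, which motivates a cheaper heuristic.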


20021112 Experiment report


Objective

• Observe how the optimal solutions are distributed
• Try to derive an algorithm for solving the problem from these observations


Traffic dist. directly proportional to pattern dist.

[Figure panels: Bucket = 10, Bucket = 30]


Traffic dist. inversely proportional to pattern dist.

[Figure panels: Bucket = 10, Bucket = 30]


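Conclusion

• Effective only when the pattern distribution and the traffic distribution are inversely proportional; this can serve as a reference when training the classifier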


Greedy Algorithm (temp)

• Step 1 : get the B_i from the fair method

• Step 2 : borrow 1 bucket from each class
  – bonus_bucket = # of classes

• Step 3 : dispatch the bonus buckets
  – Bonus_i = floor(bonus_bucket * (P_i / P))

• Step 4 : dispatch the remainder buckets
  – Add a bucket to each class in turn and keep the best solution, one bucket at a time
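The four steps can be sketched as follows. Several details are assumptions rather than part of the slides: fair shares are floored to integers with a minimum of one bucket, a class left with zero buckets is given infinite cost, "best" in Step 4 means the lowest traffic-weighted sum of P_i / B_i (in units of Φ), and the function names are hypothetical.

```python
from fractions import Fraction
from math import inf


def weighted_cost(patterns, traffic_pct, buckets):
    """Sum of D_i * P_i / B_i; infinite if a class in use has no bucket."""
    total = Fraction(0)
    for p, d, b in zip(patterns, traffic_pct, buckets):
        if p == 0 or d == 0:
            continue
        if b == 0:
            return inf
        total += Fraction(d, 100) * p / b
    return total


def greedy_allocate(patterns, traffic_pct, total_buckets):
    classes = len(patterns)
    total_p = sum(patterns)
    # Step 1: fair shares B_i = D_i * B, floored (at least one bucket each).
    alloc = [max(1, d * total_buckets // 100) for d in traffic_pct]
    # Step 2: borrow one bucket from each class.
    alloc = [b - 1 for b in alloc]
    bonus_bucket = classes
    # Step 3: hand the bonus buckets out in proportion to the pattern counts.
    alloc = [b + bonus_bucket * p // total_p for b, p in zip(alloc, patterns)]
    # Step 4: place each remaining bucket where it lowers the cost the most.
    for _ in range(total_buckets - sum(alloc)):
        best = min(range(classes), key=lambda i: weighted_cost(
            patterns, traffic_pct, alloc[:i] + [alloc[i] + 1] + alloc[i + 1:]))
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    pats, traffic = [5, 5, 20], [50, 30, 20]
    alloc = greedy_allocate(pats, traffic, 10)
    print(alloc, weighted_cost(pats, traffic, alloc))
```

On the example distribution this sketch moves buckets from the high-traffic, low-pattern classes toward class C and ends below the fair-method value of 3, though a heuristic of this kind is not guaranteed to reach the true minimum.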


How to classify patterns (1)

• The goals the classifier should achieve
  – High priority
    • Reduce how often ABM is performed
  – Low priority
    • Enhance the efficacy of ABM


How to classify patterns (2)

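• Reduce how often ABM is performed
  – ABM should not be performed for specific classes when
    • condition (1) holds
    • condition (2) holds

[Equations (1) and (2) are garbled in the transcript. Recoverable terms: both are stated for each c ∈ C over N values of D_i and D_i'; (1) appears to involve the sum of |D_i − D_i'| for i = 1..N divided by N, and (2) appears to involve squared terms of D_i and D_i' with a denominator N(N − 1).]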


How to classify patterns (3)

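• Expected effect of the two parameters (their symbols and the effect descriptions are missing from the transcript)
  – first parameter ↑
  – first parameter ↓
  – second parameter ↑
  – second parameter ↓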


How to classify patterns (4)

• Enhance the efficacy of ABM
  – Try to form classes such that
    • P_i is increasing
    • D_i is decreasing


How to classify patterns (5)

• Operators
  – Combination
    • Directly combine two classes in the same domain
  – Sibling aggregation
    • Combine two classes in different domains

[Figure: classification tree. Root: patterns; second level: TCP, UDP, Other; third level: HTTP, FTP, …, TFTP, ICMP, …]

• Objective
  – Build a tree whose traffic is stable

• Constraint
  – A large number of patterns that share the same prefix within one class should form an independent class


How to classify patterns (6)

• Mathematical model for training the classifier
  – Merge two classes when
    • Conditions of means hold
    • Conditions of variances hold
  – The threshold symbols (missing from the transcript) have the same meanings as before
  – k (>= 1) is a coefficient that balances
    • Resource [k ↑]
    • Performance [k ↓]


How to classify patterns (7)

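• Conditions of means
  – $M = \mathrm{mean}(D_x)$ and $N = \mathrm{mean}(D_y)$

[The inequalities are garbled in the transcript. Recoverable terms: sums over S samples of |D_x − M| and of |D_y − N|, and a sum over i = 1..S of squared terms in D_{x_i}, D_{y_i}, M and N, each divided by S and bounded using k.]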


How to classify patterns (7)

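• Conditions of variances
  – $P = \mathrm{Var}(D_x)$ and $Q = \mathrm{Var}(D_y)$

[The inequalities are garbled in the transcript. Recoverable terms: bounds on P and on Q using k, and a bound using k on a combination of M·P and N·Q whose denominators contain M and N.]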


Classifier

• Advantages
  – Reduce the impact of the complex approximate perfect hash function
  – Eliminate pattern matching that is not required


Classifier behavior

• Input packet
• Does it belong to any class?
  – NO : bypass
  – YES : dispatch the input packet to the corresponding handler
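The decision above amounts to a table lookup followed by a call. The sketch below is illustrative only: packets are modelled as dictionaries with hypothetical `cls` and `payload` fields, and `http_handler` stands in for the per-class matcher with a plain substring check.

```python
def dispatch(packet, handlers):
    """Send a packet to its class handler, or bypass it (return None)."""
    handler = handlers.get(packet.get("cls"))
    if handler is None:
        return None          # NO: the packet belongs to no class, so bypass
    return handler(packet)   # YES: hand it to the corresponding handler


def http_handler(packet):
    # Stand-in for one class's pattern matcher: a naive substring check.
    return b"/etc/passwd" in packet["payload"]


if __name__ == "__main__":
    handlers = {"HTTP": http_handler}
    print(dispatch({"cls": "HTTP", "payload": b"GET /etc/passwd"}, handlers))
    print(dispatch({"cls": "ICMP", "payload": b"ping"}, handlers))
```

Packets that fall outside every class skip pattern matching entirely, which is the second advantage listed for the classifier.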


Next Experiments


Conclusion