
Scalable Pattern-Matching via Dynamic Differentiated Distributed Detection (D4)

Author: Kai Zheng, Hongbin Lu
Publisher: GLOBECOM 2008
Presenter: Han-Chen Chen
Date: 2009/12/23


Introduction

Because network flow sizes are unbalanced, the traditional flow-based data-parallel processing/programming model cannot fully exploit the computing power of multicore platforms, which results in poor performance scalability.

Approach: pre-partition the pattern set, let multiple candidate pattern-matching (PM) methods handle the subsets, and select a Detection Mode specifically for each incoming flow at run time.


Primitive idea of Distributed Detection

Traditional flow-based load balancing. Reallocating/balancing the workload via Distributed Detection (D2).


Overhead of Distributed Detection

1. Overhead from the OS/system, caused by the increased number of memory references needed to address the data structures of the subsets.

2. The higher the mode used, the higher the overhead may be.


Architecture of Differentiated Distributed Detection

Task-info Queue: stores the information denoting which flow to inspect and which pattern set/subset to detect against.
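As a rough illustration of what a Task-info Queue entry might carry, the sketch below pairs a flow with the pattern subset it should be checked against. The names `TaskInfo` and `enqueue_flow` are hypothetical, and the sketch assumes that Mode m means one task per pattern subset (m tasks per flow); neither is taken from the paper's implementation.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass(frozen=True)
class TaskInfo:
    """One queue entry: which flow to inspect, against which pattern subset."""
    flow_id: int     # hypothetical flow identifier
    subset_id: int   # index of the pattern subset to detect against
    mode: int        # Detection Mode chosen for this flow


def enqueue_flow(task_queue: Queue, flow_id: int, mode: int) -> None:
    """Assumption: Mode m splits the work into m tasks, one per subset."""
    for subset_id in range(mode):
        task_queue.put(TaskInfo(flow_id, subset_id, mode))


if __name__ == "__main__":
    q = Queue()
    enqueue_flow(q, flow_id=7, mode=3)
    while not q.empty():
        print(q.get())
```

Matching threads would then pull entries from the queue, so a large flow in a high mode can be served by several threads at once.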


Methods of Differentiated Distributed Detection

Aho-Corasick (AC) algorithm: consumes much more memory and has relatively lower average performance, especially when dealing with huge pattern sets.

Modified-Wu-Manber (MWM) algorithm: has a much lower memory requirement, but it handles short patterns poorly. Its performance becomes non-deterministic when dealing with short patterns (since the Bad-Character shifts are bounded by the minimum pattern length of the set) and when hash collisions occur heavily.


Wu-Manber Algorithm

Builds on the basic idea of the Boyer-Moore algorithm. It contains a SHIFT table, a HASH table, and a PREFIX table.

We impose a requirement that all patterns have the same length.

Check B characters at a time. Each string of size B is mapped (using a hash function) to an integer used as an index to the SHIFT table.

We use the exact same integer to index into another table, called HASH. The i'th entry of the HASH table, HASH[i], contains a pointer to a list of patterns whose last B characters hash into i.

Because suffixes such as 'ion' or 'ing' are very common in English texts, many patterns may share the same HASH entry. We therefore also map the first B' characters of all patterns into the PREFIX table.

It is much less common to have different patterns that share the same prefix and the same suffix.
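The three tables described above can be sketched as follows. This is a minimal illustration, not the presenters' code: `block_hash`, `build_tables`, and the table size of 4096 are assumptions made for the example, and the PREFIX table is represented as a simple per-pattern dictionary.

```python
from collections import defaultdict


def block_hash(block: str, table_size: int = 4096) -> int:
    """Hypothetical hash mapping a short block of characters to a table index."""
    h = 0
    for ch in block:
        h = (h * 31 + ord(ch)) % table_size
    return h


def build_tables(patterns, b=3, b_prime=2):
    """Build SHIFT, HASH and PREFIX for equal-length patterns."""
    m = len(patterns[0])
    if any(len(p) != m for p in patterns) or m < b or m < b_prime:
        raise ValueError("patterns must share one length >= B and >= B'")
    default_shift = m - b + 1            # block occurs in no pattern
    shift = defaultdict(lambda: default_shift)
    hash_table = defaultdict(list)       # HASH[i] -> patterns whose suffix hashes to i
    prefix = {}                          # pattern -> hash of its first B' characters
    for p in patterns:
        for end in range(b, m + 1):      # every B-character block of the pattern
            h = block_hash(p[end - b:end])
            shift[h] = min(shift[h], m - end)
        hash_table[block_hash(p[m - b:])].append(p)
        prefix[p] = block_hash(p[:b_prime])
    return shift, hash_table, prefix


if __name__ == "__main__":
    shift, hash_table, prefix = build_tables(["working", "talking"])
    i = block_hash("ing")
    print(shift[i], hash_table[i], prefix)
```

Both patterns end in "ing", so they land in the same HASH bucket with a SHIFT value of 0, while their prefix hashes ("wo" and "ta") differ; this is the situation the PREFIX table is meant to resolve cheaply.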


Wu-Manber Algorithm

Example: pattern set: working, talking; input string: abcding; B = 3; B' = 2.

i = hash("ing");
if (SHIFT[i] > 0) {
    shift the window by SHIFT[i];
} else {
    calculate the hash value k of the prefix "ab";
    find the patterns in the i-th bucket of the HASH table whose prefix hash value is k;
    check whether those patterns actually match;
}

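[Figure: the last B characters of each pattern are hashed to an index i into the SHIFT table and the HASH table; bucket i of the HASH table lists the patterns "talking" and "working".]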
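The example on this slide can be traced with a small runnable sketch. It assumes equal-length patterns, as the slides require; the hash function, the table size, and the name `wu_manber_search` are illustrative choices rather than details from the paper.

```python
from collections import defaultdict


def block_hash(block: str, table_size: int = 4096) -> int:
    """Hypothetical hash mapping a short block of characters to a table index."""
    h = 0
    for ch in block:
        h = (h * 31 + ord(ch)) % table_size
    return h


def wu_manber_search(text, patterns, b=3, b_prime=2):
    """Return (offset, pattern) pairs; all patterns must have the same length."""
    m = len(patterns[0])
    if any(len(p) != m for p in patterns) or m < b or m < b_prime:
        raise ValueError("patterns must share one length >= B and >= B'")
    default_shift = m - b + 1
    shift = {}
    buckets = defaultdict(list)          # HASH[i] -> [(prefix hash, pattern), ...]
    for p in patterns:
        for end in range(b, m + 1):
            h = block_hash(p[end - b:end])
            shift[h] = min(shift.get(h, default_shift), m - end)
        buckets[block_hash(p[m - b:])].append((block_hash(p[:b_prime]), p))

    matches = []
    pos = m                              # exclusive end of the current window
    while pos <= len(text):
        i = block_hash(text[pos - b:pos])            # hash of the last B characters
        s = shift.get(i, default_shift)
        if s > 0:
            pos += s                                 # safe to skip ahead
            continue
        k = block_hash(text[pos - m:pos - m + b_prime])  # hash of the window prefix
        for prefix_hash, p in buckets[i]:
            # Compare characters only when the prefix hash also agrees.
            if prefix_hash == k and text[pos - m:pos] == p:
                matches.append((pos - m, p))
        pos += 1
    return matches


if __name__ == "__main__":
    print(wu_manber_search("abcding", ["working", "talking"]))
    print(wu_manber_search("we are talking and working", ["working", "talking"]))
```

For the input "abcding", the suffix "ing" yields a SHIFT value of 0, but the hash of the prefix "ab" matches neither "wo" nor "ta", so both candidates are rejected without a full string comparison.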


Pseudo-code of the prototyped PSP algorithm

[Figure labels: intermediate sets IS1, IS2, …, ISNint; temp bucket; pattern subsets PS1, PS2, PS3, …, PSm-1, PSm; AC.]


Step 2 example

Pattern 1: talking, Pattern 2: working
k = Hash["ing"] = 15, Nint = 5

For "ring": calculate hash key k = 15; I[15] = (I[15] + 1) % 5 = 1; add "ring" to IS1.

For "working": calculate hash key k = 15; I[15] = (I[15] + 1) % 5 = 2; add "working" to IS2.

[Figure: bucket 1 holding "talking", "working", …; label: PSorig – PSmode-m(1).]
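The counter update in the Step 2 example can be sketched as a round-robin spread of patterns that share a suffix hash. The function names, the hash function, and the table size are hypothetical; only the update rule I[k] = (I[k] + 1) % Nint comes from the slide.

```python
from collections import defaultdict


def suffix_hash(pattern: str, b: int = 3, table_size: int = 64) -> int:
    """Hypothetical hash of the last B characters (the slide's value 15 is not reproduced)."""
    h = 0
    for ch in pattern[-b:]:
        h = (h * 31 + ord(ch)) % table_size
    return h


def distribute(patterns, n_int: int = 5, b: int = 3):
    """Spread patterns with the same suffix hash over n_int intermediate sets."""
    counters = defaultdict(int)                  # I[k], initially 0 for every key
    intermediate = {j: [] for j in range(n_int)} # IS indexed 0 .. n_int - 1
    for p in patterns:
        k = suffix_hash(p, b)
        counters[k] = (counters[k] + 1) % n_int  # I[k] = (I[k] + 1) % Nint
        intermediate[counters[k]].append(p)
    return intermediate


if __name__ == "__main__":
    sets = distribute(["ring", "working"])
    print(sets[1], sets[2])
```

Because the counter advances on every pattern with the same key, patterns that would collide in one HASH bucket end up in different intermediate sets. Note that the modulo makes the counter wrap to index 0, so this sketch numbers the sets from 0 rather than from 1.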


Implementation of Mode Selector & Scheduler

Applying D2 to small flows tends not to be worthwhile, since small flows are easy to schedule and are less likely to incur "out-of-balance" issues. (Small flows: tens of KBs.)

The system may not always be ready for D2, even for large flows. D2 only provides a way to raise CPU utilization; if the system is already very busy and will remain busy for a while, applying D2 would merely tire the system out.

The Mode Selector & Scheduler (MSS) should also take the characteristics of the system into account, or try to "adapt" to the system; e.g., a pre-test on the system (using certain sample traces) may be necessary when determining the parameters for dynamic mode selection.
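The selection rules above could be combined roughly as in the sketch below. Everything numeric here is a placeholder: the 64 KiB small-flow cutoff, the 0.9 busy threshold, the `max_mode` cap, and the idea of sizing the mode from the number of idle threads are assumptions for illustration, and in practice such parameters would come from the pre-test on the target system.

```python
def select_mode(flow_bytes: int,
                cpu_utilization: float,
                idle_threads: int,
                small_flow_bytes: int = 64 * 1024,   # assumed cutoff ("tens of KBs")
                busy_threshold: float = 0.9,         # assumed "very busy" level
                max_mode: int = 4) -> int:           # assumed highest mode
    """Return a Detection Mode; Mode 1 means plain per-flow processing (no D2)."""
    if flow_bytes <= small_flow_bytes:
        return 1          # small flows are easy to schedule; D2 is not worthwhile
    if cpu_utilization >= busy_threshold:
        return 1          # system already busy; D2 would only add overhead
    # Assumption: use the flow's own thread plus whatever threads are idle.
    return max(1, min(max_mode, idle_threads + 1))


if __name__ == "__main__":
    print(select_mode(10 * 1024, 0.2, 3))          # small flow
    print(select_mode(10 * 1024 * 1024, 0.95, 3))  # large flow, busy system
    print(select_mode(10 * 1024 * 1024, 0.2, 3))   # large flow, spare capacity
```

Keeping the thresholds as parameters reflects the point that the selector should adapt to the system rather than hard-code one setting.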


Schematic of Mode Selector & Scheduler


Performance

1. The straightforward per-flow-based load-balancing scheme (i.e., the non-D2 scheme using only Mode 1).

2. The brute-force D2 scheme, in which the Detection Mode equals the number of PME threads used.

3. The dynamic D2 scheme, in which Detection Modes are selected at run time.

4. D4, which is similar to the dynamic D2 scheme except that patterns no larger than 9 bytes are processed by the AC algorithm when Mode > 1.

Throughput scalability comparison among different MWM-based parallel PM schemes.
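The differentiation step that separates D4 from the dynamic D2 scheme can be sketched as a split of the pattern set by length. The function name `split_by_length` and the sample patterns are hypothetical; the 9-byte limit and the Mode > 1 condition are the ones stated in the scheme list above.

```python
def split_by_length(patterns, mode: int, short_limit: int = 9):
    """Return (ac_patterns, mwm_patterns) for a given Detection Mode."""
    if mode <= 1:
        # Mode 1: no differentiation; everything stays with MWM.
        return [], list(patterns)
    ac_patterns = [p for p in patterns if len(p) <= short_limit]   # short -> AC
    mwm_patterns = [p for p in patterns if len(p) > short_limit]   # long  -> MWM
    return ac_patterns, mwm_patterns


if __name__ == "__main__":
    sample = [b"GET", b"working", b"/cgi-bin/phf?"]   # made-up byte patterns
    print(split_by_length(sample, mode=2))
    print(split_by_length(sample, mode=1))
```

Routing short patterns to AC keeps the minimum pattern length seen by MWM larger, which is what bounds its shift distances.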


Thank you for listening.
