an improved algorithm to accelerate regular expression evaluation

An Improved Algorithm to Accelerate Regular Expression Evaluation Author: Michela Becchi, Patrick Crowley Publisher: 3rd ACM/IEEE Symposium on Architecture for networking and communications systems, 2007 Presenter: Ching Hsuan Shih Date: 2014/02/26

Page 1: An Improved Algorithm to Accelerate Regular Expression Evaluation

An Improved Algorithm to Accelerate Regular Expression Evaluation

Author: Michela Becchi, Patrick Crowley

Publisher: 3rd ACM/IEEE Symposium on Architecture for networking and communications systems, 2007

Presenter: Ching Hsuan Shih

Date: 2014/02/26


Outline

I. Introduction

II. Motivation

III. The Proposal

IV. Reducing the Alphabet

V. Encoding

VI. Experimental Evaluation


I. Introduction

• Signature-based deep packet inspection has taken root as a dominant security mechanism in networking devices and computer systems.

• Regular expressions are more expressive than simple patterns of strings and therefore able to describe a wider variety of payload signatures.

• There has been a considerable amount of recent work on implementing regular expressions, particularly with representations based on deterministic finite automata (DFA).


I. Introduction (Cont.)

• DFAs have attractive properties that explain the attention they have received:

• They have predictable and acceptable memory bandwidth requirements.

• For any given regular expression, a DFA with a minimum number of states can be found [3].


I. Introduction (Cont.)

• DFAs corresponding to large sets of regular expressions containing complex patterns can be prohibitively large in terms of numbers of states and transitions.

• Yu et al. [15] have proposed segregating rules into multiple groups and evaluating the corresponding DFAs concurrently.

• Delayed Input DFA (D2FA) [9] replaces redundant transitions common to a pair of states with a single default transition.


I. Introduction (Cont.)

• D2FA has three weaknesses:

• It requires a user-provided parameter value which can only be determined experimentally for a given rule-set.

• It creates a data structure whose worst-case paths may be traversed for each input character processed.

• It requires multiple passes over large support data structures during the construction phase.

• We propose an improved, simplified algorithm for building default transitions that addresses the problems above.


II. Motivation

In this section, we describe the D2FA approach [9].

• The basic goal of the D2FA is to reduce the amount of memory needed to represent all the state transitions in a DFA.


II. Motivation (Cont.)

• During the string matching operation, the traversal of D2FA will be performed according to the Aho-Corasick algorithm [1], treating default transitions as failure pointers.

• The heuristic proposed in [9] to build a D2FA can be explored systematically as a maximum spanning tree problem on an undirected graph.

• This maximum spanning tree problem can be solved with Kruskal’s algorithm [5].
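The traversal described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the dictionary layout (`labeled`, `default`), the function names, and the three-state example automaton (which recognizes inputs ending in "ab" over the alphabet {a, b}) are all assumptions made for this sketch.

```python
# Hypothetical example D2FA: state 0 keeps all its labeled transitions,
# states 1 and 2 store only what differs from state 0 and default to it.
EXAMPLE_LABELED = {0: {"a": 1, "b": 0}, 1: {"b": 2}, 2: {}}
EXAMPLE_DEFAULT = {1: 0, 2: 0}


def d2fa_step(labeled, default, state, symbol):
    """Consume one symbol; return (next_state, state_traversals)."""
    hops = 0
    # Follow default transitions, much like Aho-Corasick failure pointers,
    # until a state with a labeled transition on `symbol` is reached.
    while symbol not in labeled[state]:
        state = default[state]
        hops += 1
    # The final labeled transition counts as one more traversal.
    return labeled[state][symbol], hops + 1


def d2fa_run(labeled, default, start, text):
    """Process `text`; return (final_state, total_state_traversals)."""
    state, traversals = start, 0
    for symbol in text:
        state, cost = d2fa_step(labeled, default, state, symbol)
        traversals += cost
    return state, traversals


final_state, traversals = d2fa_run(EXAMPLE_LABELED, EXAMPLE_DEFAULT, 0, "aab")
```

On the input "aab" the second character forces one default transition from state 1 back to state 0, so three characters cost four state traversals in this example.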


II. Motivation (Cont.)

• After Kruskal's algorithm has run, the root of each tree can be selected.

• The node having the smallest maximum distance to any other vertex within the same tree is chosen.

• All default transitions are directed towards the root of the default transition tree.

• In order to limit the maximum default path length, a heuristic is proposed that determines a maximum spanning tree forest with bounded diameter.
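As a rough illustration of the spanning-tree step, the sketch below runs plain Kruskal on an undirected graph whose vertices are DFA states and whose edge weights count the transitions two states have in common. It deliberately omits the bounded-diameter heuristic and the root selection, and the DFA layout, the function names, and the choice to drop zero-weight edges are assumptions of this sketch rather than details taken from [9].

```python
from itertools import combinations

# Hypothetical DFA: state -> {symbol: next_state}.
EXAMPLE_DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}


def shared_transitions(dfa, s, t):
    """Number of symbols on which s and t move to the same next state."""
    return sum(1 for c in dfa[s] if dfa[s][c] == dfa[t].get(c))


def max_spanning_forest(dfa):
    """Kruskal's algorithm with edges visited in decreasing weight order."""
    parent = {s: s for s in dfa}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    edges = [(shared_transitions(dfa, s, t), s, t)
             for s, t in combinations(sorted(dfa), 2)]
    edges = [e for e in edges if e[0] > 0]   # no saving, no edge
    edges.sort(key=lambda e: -e[0])          # heaviest edges first

    forest = []
    for weight, s, t in edges:
        root_s, root_t = find(s), find(t)
        if root_s != root_t:                 # edge joins two components
            parent[root_s] = root_t
            forest.append((s, t, weight))
    return forest


forest = max_spanning_forest(EXAMPLE_DFA)
```

Each edge kept in the forest is a candidate default transition; its weight is the number of labeled transitions that the default transition would make redundant.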



III. The Proposal

• We now take advantage of a simple fact:

• DFA traversal always starts at a single initial state s0.

• We propose a more general compression algorithm which leads to a traversal time bound independent of the maximum default transition path length.


III. The Proposal (Cont.)

• Definition: For each state s, we define its depth as the minimum number of states visited when moving from s0 to s in the DFA.
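The depth in this definition can be computed with a breadth-first search from the initial state. The sketch below assumes a DFA stored as a dictionary `state -> {symbol: next_state}` and the convention depth(s0) = 0; both the layout and the names are illustrative assumptions.

```python
from collections import deque

# Hypothetical DFA: state -> {symbol: next_state}.
EXAMPLE_DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}


def state_depths(dfa, start):
    """Breadth-first search: depth[s] is the length of a shortest path
    of transitions from `start` to s (assumed convention: depth 0 at start)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in dfa[s].values():
            if t not in depth:          # first visit gives the minimum depth
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth


depths = state_depths(EXAMPLE_DFA, 0)
```

States that are unreachable from the start state do not appear in the result.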


III. The Proposal (Cont.)

• Lemma: For any string of length N, a 2N time bound is guaranteed on all D2FA having only "backwards" default transitions, that is, default transitions directed towards states of lower depth.

• A string of length N implies N labeled transitions to be followed, and the number of default transitions is always at least one less than the number of labeled transitions taken.

• Therefore, for a string of length N, the total number of state traversals cannot be higher than 2N-1.
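The counting argument behind the lemma can be spelled out as follows. It rests on three assumptions that are consistent with the depth definition but are not stated explicitly on the slides: depth(s0) = 0, a labeled transition raises the depth by at most 1, and a "backwards" default transition lowers it by at least 1. The symbol $d_i$ is introduced here only for illustration.

```latex
Let $d_i$ be the number of default transitions followed while processing
the $i$-th input character, $1 \le i \le N$. Just before the $i$-th labeled
transition is taken, exactly $i-1$ labeled transitions and
$\sum_{j=1}^{i} d_j$ default transitions have been followed, so the depth
of the current state satisfies
\[
  0 \;\le\; \mathrm{depth} \;\le\; (i-1) - \sum_{j=1}^{i} d_j
  \quad\Longrightarrow\quad
  \sum_{j=1}^{i} d_j \;\le\; i-1 .
\]
Taking $i = N$ gives at most $N-1$ default transitions in total, hence
\[
  \underbrace{N}_{\text{labeled}} \;+\; \underbrace{\sum_{j=1}^{N} d_j}_{\text{default}}
  \;\le\; N + (N-1) \;=\; 2N-1 .
\]
```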


III. The Proposal (Cont.)

3.1 Problem Formulation

• The problem can now be formulated as an instance of maximum spanning tree on a directed graph.


III. The Proposal (Cont.)

3.2 An example


III. The Proposal (Cont.)

3.3 Algorithm

• The whole problem is reduced to having each state select, among the states with lower depth, the one having the largest number of outgoing transitions in common with it.
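The selection rule above can be sketched as follows. This is a quadratic-time illustration under stated assumptions, not the authors' code: the DFA is a complete dictionary `state -> {symbol: next_state}`, every state is reachable from the start state, depth(s0) = 0, and ties are broken by iteration order. All names are hypothetical.

```python
from collections import deque

# Hypothetical DFA: state -> {symbol: next_state}.
EXAMPLE_DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}


def state_depths(dfa, start):
    """Breadth-first search depth of every reachable state."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in dfa[s].values():
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth


def build_default_transitions(dfa, start):
    """Give each state a default transition to the shallower state that
    shares the most transitions with it; keep only the differing ones."""
    depth = state_depths(dfa, start)
    labeled, default = {}, {}
    for s in dfa:
        best, best_shared = None, 0
        for t in dfa:
            if depth[t] < depth[s]:      # only "backwards" candidates
                shared = sum(1 for c in dfa[s] if dfa[s][c] == dfa[t][c])
                if shared > best_shared:
                    best, best_shared = t, shared
        if best is None:
            # No shallower state shares a transition: store the state in full.
            labeled[s] = dict(dfa[s])
        else:
            default[s] = best
            labeled[s] = {c: n for c, n in dfa[s].items() if dfa[best][c] != n}
    return labeled, default


labeled, default = build_default_transitions(EXAMPLE_DFA, 0)
```

In the example, six DFA transitions shrink to three labeled transitions plus two default transitions, and every default transition points to a state of strictly lower depth.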



IV. Reducing the Alphabet

• The basic idea is the following: in an alphabet Σ, two symbols ci and cj fall into the same class if they are treated the same way in all DFA states.

• In other words, given the transition function δ(states, Σ) → states, δ(s,ci) = δ(s,cj) for each state s belonging to the DFA.

• In practical scenarios (ASCII alphabet) the table translating each symbol to its class will contain 256 entries, with a maximum width of 1 byte (for heavily compressed alphabets 5-6 bits per character may suffice).
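A minimal sketch of this grouping: two symbols receive the same class number exactly when their columns of the transition table are identical. The DFA layout, the function name, and the three-symbol example alphabet are assumptions made for illustration.

```python
# Hypothetical DFA in which "b" and "c" behave identically in every state.
EXAMPLE_DFA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 2},
    2: {"a": 1, "b": 0, "c": 0},
}


def symbol_classes(dfa, alphabet):
    """Return a translation table symbol -> class index.

    Symbols ci and cj share a class iff delta(s, ci) == delta(s, cj)
    for every state s, i.e. iff their transition columns are equal.
    """
    states = sorted(dfa)
    class_of_column = {}
    table = {}
    for c in alphabet:
        column = tuple(dfa[s][c] for s in states)
        # A column seen for the first time opens a new class.
        table[c] = class_of_column.setdefault(column, len(class_of_column))
    return table


translation = symbol_classes(EXAMPLE_DFA, "abc")
num_classes = len(set(translation.values()))
```

With the translation table in place, each state needs one transition per class rather than one per symbol; here three symbols collapse to two classes.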


V. Encoding

5.1 Bitmaps

• A scheme [18] consists of associating a bitmap as large as the alphabet size to each DFA state.

• Bits corresponding to uncompressed labeled transitions present in the current state can be set to 1; the remaining bits are set to 0.

• State identifiers can be simply represented through their base address in memory.

• The length of the necessary bitmaps can substantially decrease after alphabet reduction.
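One way such a bitmap could be used is sketched below. The bitmap itself follows the description above; storing the next states in a packed list and locating an entry by counting the set bits below the symbol's position is a common companion technique and is an assumption of this sketch, not a detail confirmed by the slides. Symbols are taken to be integers in `range(alphabet_size)`, and all names are hypothetical.

```python
def encode_state(labeled, alphabet_size):
    """Return (bitmap, targets) for one state.

    Bit i of `bitmap` is 1 iff the state has a labeled transition on
    symbol i; `targets` lists the next states in increasing symbol order.
    """
    bitmap = 0
    targets = []
    for symbol in range(alphabet_size):
        if symbol in labeled:
            bitmap |= 1 << symbol
            targets.append(labeled[symbol])
    return bitmap, targets


def lookup(bitmap, targets, symbol):
    """Return the next state on `symbol`, or None when the bit is 0
    (the caller would then follow the default transition)."""
    if not (bitmap >> symbol) & 1:
        return None
    below = bitmap & ((1 << symbol) - 1)     # bits for smaller symbols
    return targets[bin(below).count("1")]    # population count = index


# Hypothetical state with labeled transitions on symbols 1 and 4 only.
bitmap, targets = encode_state({1: 7, 4: 9}, alphabet_size=8)
```

After alphabet reduction, `alphabet_size` becomes the number of symbol classes, which is why the bitmaps get shorter.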


V. Encoding (Cont.)

5.2 Content addressing

• A technique [16] consists of representing state identifiers with content labels, which are stored in memory as next state transitions.

• A state content label contains several fields:

• A state discriminator

• The list of characters for which a labeled transition is defined

• An identifier for the default transition state

• The size of a content label depends on the number of labeled transitions defined for the corresponding state.


VI. Experimental Evaluation
