An Improved Algorithm to Accelerate Regular Expression Evaluation
Author: Michela Becchi, Patrick Crowley
Publisher: 3rd ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2007
Presenter: Ching Hsuan Shih
Date: 2014/02/26
Outline
I. Introduction
II. Motivation
III. The Proposal
IV. Reducing the Alphabet
V. Encoding
VI. Experimental Evaluation
I. Introduction
• Signature-based deep packet inspection has taken root as a dominant security mechanism in networking devices and computer systems.
• Regular expressions are more expressive than simple patterns of strings and therefore able to describe a wider variety of payload signatures.
• There has been a considerable amount of recent work on implementing regular expressions, particularly with representations based on deterministic finite automata (DFA).
I. Introduction (Cont.)
• DFAs have attractive properties that explain the attention they have received.
• They have predictable and acceptable memory bandwidth requirements.
• For any given regular expression, a DFA with a minimum number of states can be found [3].
I. Introduction (Cont.)
• DFAs corresponding to large sets of regular expressions containing complex patterns can be prohibitively large in terms of numbers of states and transitions.
• Yu et al. [15] have proposed segregating rules into multiple groups and evaluating the corresponding DFAs concurrently.
• The Delayed Input DFA (D2FA) [9] replaces redundant transitions common to a pair of states with a single default transition.
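A minimal sketch of how a lookup proceeds once default transitions are in place. It assumes, as in D2FA, that a default transition is followed without consuming input whenever the current state has no labeled transition for the character. The function name `d2fa_match` and the three-state automaton are hypothetical and serve only as an illustration.

```python
def d2fa_match(labeled, default, start, text):
    """Walk a D2FA and return (final_state, number_of_state_traversals)."""
    state, hops = start, 0
    for ch in text:
        # Follow default transitions (no input consumed) until the
        # current state has a labeled transition on ch.
        while ch not in labeled[state]:
            state = default[state]
            hops += 1
        state = labeled[state][ch]
        hops += 1
    return state, hops


# Hypothetical automaton over {a, b}. State 0 keeps its full row, so
# every default path ends in a state that can consume the character.
labeled = {
    0: {"a": 1, "b": 0},
    1: {"b": 2},   # 'a' is inherited from state 0 through the default
    2: {},         # both symbols are inherited from state 0
}
default = {1: 0, 2: 0}

final_state, hops = d2fa_match(labeled, default, 0, "abab")
print(final_state, hops)
```

The extra hops are the price paid for the removed transitions: four input characters cost five state traversals here instead of four.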
I. Introduction (Cont.)
• D2FA has three weaknesses:
• It requires a user-provided parameter value which can only be determined experimentally for a given rule-set.
• It creates a data-structure whose worst-case paths may be traversed for each input character processed.
• It requires multiple passes over large support data structures during the construction phase.
• We propose an improved simplified algorithm for building default transitions that addresses the problems above.
II. Motivation
In this section, we describe the D2FA approach [9].
• The basic goal of the D2FA is to reduce the amount of memory needed to represent all the state transitions in a DFA.
II. Motivation (Cont.)
• During the string matching operation, the traversal of D2FA will be performed according to the Aho-Corasick algorithm [1], treating default transitions as failure pointers.
• The heuristic proposed in [9] to build a D2FA can be explored systematically as a maximum spanning tree problem on an undirected graph.
• This maximum spanning tree problem can be solved with Kruskal’s algorithm [5].
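The spanning-tree view can be sketched as follows, assuming the edge weight between two states is the number of input symbols on which both move to the same next state. The helper names and the four-state transition table are hypothetical. Kruskal's algorithm is run on edges sorted by decreasing weight to obtain a maximum rather than a minimum spanning forest.

```python
from itertools import combinations


def common_transitions(row_a, row_b):
    """Number of symbols on which two states share the same next state."""
    return sum(1 for x, y in zip(row_a, row_b) if x == y)


def max_spanning_forest(table):
    """Kruskal's algorithm on edges sorted by decreasing weight."""
    n = len(table)
    edges = sorted(
        ((common_transitions(table[u], table[v]), u, v)
         for u, v in combinations(range(n), 2)),
        reverse=True,
    )
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    chosen = []
    for weight, u, v in edges:
        if weight == 0:
            break  # a default transition here would save nothing
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            chosen.append((u, v, weight))
    return chosen


# Hypothetical DFA: table[s][c] is the next state of s on symbol c.
table = [
    [1, 0, 0],
    [1, 2, 0],
    [1, 0, 3],
    [1, 2, 3],
]
forest = max_spanning_forest(table)
print(forest)
```

Each chosen edge of weight w stands for w labeled transitions that can be dropped once the edge is turned into a default transition.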
II. Motivation (Cont.)
• After Kruskal’s algorithm has run, a root is selected for each tree.
• The node with the smallest maximum distance to any other vertex in the same tree is chosen.
• All default transitions are directed towards the root of the default transition tree.
• To limit the maximum default path length, a heuristic determines a maximum spanning tree forest with bounded diameter.
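The root selection and orientation steps above can be sketched as below for a single tree. The function name `orient_towards_center` and the five-node path are hypothetical, and the bounded-diameter heuristic itself is not shown. The root is the vertex with the smallest eccentricity, found here by a breadth-first search from every vertex.

```python
from collections import deque


def orient_towards_center(n, tree_edges):
    """Pick the tree's center as root and point every node at its parent."""
    adj = {v: [] for v in range(n)}
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)

    def bfs(src):
        dist, parent = {src: 0}, {}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    parent[v] = u
                    queue.append(v)
        return dist, parent

    # Root: smallest maximum distance to any other vertex of the tree.
    root = min(range(n), key=lambda v: max(bfs(v)[0].values()))
    # The BFS parent of each node is its default-transition target.
    _, default = bfs(root)
    return root, default


# Hypothetical tree: the path 0 - 1 - 2 - 3 - 4.
root, default = orient_towards_center(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
print(root, default)
```

With the center as root, the longest default path in this example has length 2, compared with 4 if an endpoint had been chosen.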
II. Motivation (Cont.)
III. The Proposal
• We now take advantage of a simple fact:
• DFA traversal always starts at a single initial state s0.
• We propose a more general compression algorithm which leads to a traversal time bound independent of the maximum default transition path length.
III. The Proposal (Cont.)
• Definition: For each state s, we define its depth as the minimum number of states visited when moving from s0 to s in the DFA.
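A breadth-first search from s0 yields this depth for every reachable state. The sketch below assumes that s0 itself has depth 0, a counting convention chosen here; only the relative order of depths matters. The function name and the example table are hypothetical.

```python
from collections import deque


def state_depths(table, start=0):
    """Depth of each reachable state: BFS distance from the start state."""
    depth = {start: 0}  # assumption: the initial state has depth 0
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in table[s]:
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth


# Hypothetical DFA: table[s][c] is the next state of s on symbol c.
table = [
    [1, 0],
    [1, 2],
    [3, 0],
    [1, 0],
]
print(state_depths(table))
```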
III. The Proposal (Cont.)
• Lemma: For any input string of length N, a 2N time bound is guaranteed on every D2FA whose default transitions are all “backwards”, that is, directed towards states of smaller depth.
• A string of length N requires N labeled transitions. Each labeled transition increases the depth by at most one and each backward default transition decreases it by at least one, so at most N-1 default transitions can be taken.
• For a string of length N, the total number of state traversals therefore cannot exceed 2N-1.
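The counting argument can be written out explicitly. The symbols $d_k$ and $f_k$ are notation introduced here for illustration, and s0 is assumed to have depth 0.

```latex
% d_k : depth of the current state after k input characters, with d_0 = 0
% f_k : number of default transitions taken while processing character k
\begin{gather*}
  % depth never drops below 0, and each default transition lowers it by >= 1
  f_k \le d_{k-1} \\
  % a labeled transition raises the depth by at most 1
  d_k \le d_{k-1} - f_k + 1
  \quad\Longrightarrow\quad
  d_{N-1} \le (N-1) - \sum_{k=1}^{N-1} f_k \\
  \sum_{k=1}^{N} f_k \;\le\; \sum_{k=1}^{N-1} f_k + d_{N-1} \;\le\; N-1 \\
  \underbrace{N}_{\text{labeled}} + \underbrace{\sum_{k=1}^{N} f_k}_{\text{default}} \;\le\; 2N-1
\end{gather*}
```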
III. The Proposal (Cont.)
3.1 Problem Formulation
• The problem can now be formulated as an instance of maximum spanning tree on a directed graph.
III. The Proposal (Cont.)
3.2 An example
III. The Proposal (Cont.)
3.3 Algorithm
• The whole problem is reduced to having each state select, among the states with lower depth, the one with the largest number of outgoing transitions in common with it.
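A minimal sketch of this selection rule, not the authors' implementation. It assumes every state is reachable from s0, that s0 has depth 0, and that ties are broken in favor of the lower state index. All names and the example table are hypothetical.

```python
from collections import deque


def build_default_transitions(table, start=0):
    """Give each state a default transition to its best lower-depth state."""
    # Depth of every state by BFS (assumes all states are reachable).
    depth = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in table[s]:
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)

    default, labeled = {}, {}
    for s, row in enumerate(table):
        best, best_common = None, 0
        for t, other in enumerate(table):
            if depth[t] < depth[s]:  # only "backwards" targets qualify
                common = sum(a == b for a, b in zip(row, other))
                if common > best_common:
                    best, best_common = t, common
        if best is None:
            labeled[s] = dict(enumerate(row))  # keep the full row
        else:
            default[s] = best
            # Keep only the transitions that differ from the target's.
            labeled[s] = {c: nxt for c, nxt in enumerate(row)
                          if nxt != table[best][c]}
    return default, labeled


# Hypothetical DFA: table[s][c] is the next state of s on symbol c.
table = [
    [1, 0, 0],
    [1, 2, 0],
    [1, 0, 3],
    [1, 2, 0],
]
default, labeled = build_default_transitions(table)
print(default)
print(labeled)
```

In this example 5 labeled transitions remain out of the original 12, and every default transition points to a state of strictly smaller depth.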
III. The Proposal (Cont.)
IV. Reducing the Alphabet
• The basic idea is the following: in an alphabet Σ, two symbols ci and cj fall into the same class if they are treated the same way in all DFA states.
• In other words, given the transition function δ: states × Σ → states, δ(s, ci) = δ(s, cj) for every state s of the DFA.
• In practical scenarios (ASCII alphabet), the table that maps each input symbol to its class contains 256 entries, each at most 1 byte wide (for heavily compressed alphabets, 5-6 bits per character may suffice).
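One way to compute such classes is to group symbols whose columns in the transition table are identical. The sketch below uses that idea. The function name `reduce_alphabet` and the small four-symbol table are hypothetical.

```python
def reduce_alphabet(table, alphabet_size):
    """Group symbols with identical transition columns into classes."""
    classes = {}      # column of next states -> class index
    translation = []  # symbol -> class index
    for c in range(alphabet_size):
        column = tuple(row[c] for row in table)
        # A column seen before reuses its class; a new one gets the next index.
        translation.append(classes.setdefault(column, len(classes)))

    # Rebuild the transition table over the reduced alphabet.
    reduced = [[None] * len(classes) for _ in table]
    for c, k in enumerate(translation):
        for s, row in enumerate(table):
            reduced[s][k] = row[c]
    return translation, reduced


# Hypothetical DFA with 3 states and 4 symbols; symbols 0 and 1 behave alike.
table = [
    [1, 1, 0, 0],
    [2, 2, 0, 1],
    [2, 2, 0, 0],
]
translation, reduced = reduce_alphabet(table, 4)
print(translation)
print(reduced)
```

At match time each input symbol is first mapped through `translation`, then the reduced table is indexed with the resulting class.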
V. Encoding
5.1 Bitmaps
• A scheme [18] consists of associating a bitmap as large as the alphabet size to each DFA state.
• Bits corresponding to uncompressed labeled transitions present in the current state can be set to 1; the remaining bits are set to 0.
• State identifiers can be simply represented through their base address in memory.
• The length of the necessary bitmaps can substantially decrease after alphabet reduction.
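A sketch of one common way to use such a bitmap: the targets of the labeled transitions are stored in a packed list, and the position of a target is the number of 1 bits below the symbol's bit. This indexing scheme and the names `encode_state` and `lookup` are assumptions made for illustration; the slides do not spell out the layout.

```python
def encode_state(labeled, alphabet_size):
    """Return (bitmap, packed list of targets) for one state."""
    bitmap, targets = 0, []
    for c in range(alphabet_size):
        if c in labeled:
            bitmap |= 1 << c           # bit c = 1: labeled transition on c
            targets.append(labeled[c])
    return bitmap, targets


def lookup(bitmap, targets, c):
    """Next state on symbol c, or None if the default must be followed."""
    if not (bitmap >> c) & 1:
        return None
    # Count the set bits below position c to index the packed list.
    index = bin(bitmap & ((1 << c) - 1)).count("1")
    return targets[index]


# Hypothetical state with labeled transitions on symbols 1 and 4 only.
bitmap, targets = encode_state({1: 7, 4: 9}, alphabet_size=8)
print(bin(bitmap), targets)
print(lookup(bitmap, targets, 4), lookup(bitmap, targets, 2))
```

After alphabet reduction, `alphabet_size` becomes the number of symbol classes, so the bitmap shrinks accordingly.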
V. Encoding (Cont.)
5.2 Content addressing
• A technique [16] consists in representing state identifiers with content labels, which are stored in memory as next state transitions.
• A state content label contains several fields:
• A state discriminator
• The list of characters for which a labeled transition is defined
• An identifier for the default transition state
• The size of a content label depends on the number of labeled transitions defined for the corresponding state.
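A rough sketch of a content label with the three fields listed above. The class name, the field widths (8-bit discriminator, 8 bits per character, 16-bit default identifier) and both helper functions are hypothetical, chosen only to show why the label size grows with the number of labeled transitions and how the character list can be inspected without fetching the state from memory.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ContentLabel:
    discriminator: int      # state discriminator
    chars: Tuple[str, ...]  # characters that have a labeled transition
    default: Optional[int]  # identifier of the default transition state


def has_labeled_transition(label, ch):
    """Decide from the label alone whether ch has a labeled transition."""
    return ch in label.chars


def label_size_bits(label, disc_bits=8, char_bits=8, id_bits=16):
    """Label size under the assumed (hypothetical) field widths."""
    return disc_bits + char_bits * len(label.chars) + id_bits


label = ContentLabel(discriminator=3, chars=("a", "b"), default=0)
print(has_labeled_transition(label, "a"), has_labeled_transition(label, "c"))
print(label_size_bits(label))
```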
VI. Experimental Evaluation
VI. Experimental Evaluation (Cont.)
VI. Experimental Evaluation (Cont.)
VI. Experimental Evaluation (Cont.)