Fast and Memory-Efficient Regular Expression
Matching for Deep Packet Inspection
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.
Authors: Fang Yu, Zhifeng Chen, Yanlei Diao, T.V. Lakshman and Randy H. Katz
Publisher: ANCS'06, December 3–5, 2006
Presenter: Yu-Tso Chen
Date: November 6, 2007
Outline
1. Introduction
2. Definitions and problem description
3. Matching of Individual Patterns
4. Selective Grouping of Multiple Patterns
5. Evaluation Result
6. Conclusion
Introduction
Regular expressions used in packet scanning have three unique, complex features:
• 1) Large numbers of wildcards can cause the DFA to grow exponentially
• 2) Wildcards used with length restrictions (‘?’, ‘+’) increase the resource requirements
• 3) Groups of characters are also commonly used; their interaction with the rest of the pattern can result in a highly complex state machine (e.g., “^220[\x09-]*ftp”)
Introduction (cont.)
The paper makes the following contributions:
• 1) Analyze the computational and storage cost of building individual DFAs
• 2) Two rewrite rules for specific regular expressions
• 3) Combine multiple DFAs into a small number of groups
Regular Expression Patterns
Compares the regular expressions used in two types of networking applications: packet scanning (Snort, Linux L7-filter) and XML filtering
• 1) Both types of applications use wildcards (‘.’, ‘?’, ‘+’, ‘*’), but the packet scanning patterns contain larger numbers of them
• 2) Classes of characters (“[ ]”) are used only in packet scanning applications
• 3) A high percentage of the patterns in packet scanning applications have length restrictions on some of the classes or wildcards
Solution Space for Regular Expression Matching
A single regular expression of length n can be expressed as an NFA with O(n) states
When the NFA is converted into a DFA, it may generate $O(\Sigma^n)$ states, where $\Sigma$ denotes the input alphabet
The processing complexity for each character in the input is O(1) in a DFA, but is $O(n^2)$ for an NFA when all n states are active at the same time
Solution Space for Regular Expression Matching (cont.)
To handle m regular expressions, two choices are possible:
• Processing them individually in m automata
• Compiling them into a single automaton
Problem Statement
This paper focuses on DFA-based approaches
• Our goal is to achieve O(1) computation cost per input character
• The focus of the study is to reduce the memory overhead of DFAs
There are two sources of memory usage in DFAs: states and transitions
• We consider the number of states as the primary factor
Design Considerations
Define completeness of matching results:
• Exhaustive Matching: $M(P, S) = \{\text{substring } S' \text{ of } S \mid S' \text{ is accepted by the DFA of } P\}$
• It is expensive and often unnecessary to report all matching substrings
• We propose a new concept, Non-overlapping Matching, that relaxes the requirements of exhaustive matching
• Non-overlapping Matching: $M(P, S) = \{\text{substring } S_i \text{ of } S \mid \forall S_i, S_j \text{ accepted by the DFA of } P,\ S_i \cap S_j = \emptyset\}$
• Ex: for the pattern ab+ and the input abbb, non-overlapping matching reports one match instead of three
• Exhaustive matching reports ab, abb, and abbb
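The difference between the two result sets can be illustrated with a minimal Python sketch. It uses the pattern ab+ so that the accepted substrings of abbb are exactly ab, abb, and abbb (with ab*, the single character a would be accepted as well). The helper names are hypothetical, and Python's `re.finditer` is only a stand-in for non-overlapping matching: it reports the longest leftmost match, which may not be the same match a DFA-based engine would choose to report.

```python
import re

def exhaustive_matches(pattern, text):
    """Every substring of `text` that the pattern accepts in full."""
    compiled = re.compile(pattern)
    return [
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
        if compiled.fullmatch(text[i:j])
    ]

def non_overlapping_matches(pattern, text):
    """Matches that share no characters (stand-in: re.finditer)."""
    return [m.group() for m in re.finditer(pattern, text)]

if __name__ == "__main__":
    print(exhaustive_matches(r"ab+", "abbb"))
    print(non_overlapping_matches(r"ab+", "abbb"))
```

The exhaustive variant inspects every substring, which is why reporting all matches is described as expensive.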
Design Considerations (cont.)
Define the DFA execution model for substring matching. We focus on patterns without ‘^’ attached at the beginning:
• Repeated searches
• One-pass search: this approach can truly achieve O(1) computation cost per character
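The two execution models can be contrasted with a small sketch. It assumes a hypothetical literal pattern "ab" and two hand-built transition tables: one for the anchored pattern, restarted at every input offset (repeated searches), and one for the same pattern with a leading `.*`, run once over the input (one-pass search). The table layout and function names are illustrative, not taken from the paper.

```python
# Hand-built DFAs for the hypothetical literal pattern "ab".
# ANCHORED recognises ^ab; a missing entry means the dead state.
ANCHORED = {(0, "a"): 1, (1, "b"): 2}
# ONE_PASS recognises .*ab; a missing entry means "fall back to state 0".
ONE_PASS = {(0, "a"): 1, (1, "a"): 1, (1, "b"): 2, (2, "a"): 1}
ACCEPT = 2

def repeated_search(text):
    """Restart the anchored DFA at every offset.
    Returns (match end offsets, number of transitions taken)."""
    ends, steps = [], 0
    for start in range(len(text)):
        state = 0
        for pos in range(start, len(text)):
            steps += 1
            state = ANCHORED.get((state, text[pos]))
            if state is None:          # dead state: give up on this offset
                break
            if state == ACCEPT:
                ends.append(pos + 1)
                break
    return ends, steps

def one_pass_search(text):
    """Run the .*ab DFA once: exactly one transition per input character."""
    ends, state = [], 0
    for pos, ch in enumerate(text):
        state = ONE_PASS.get((state, ch), 0)
        if state == ACCEPT:
            ends.append(pos + 1)
    return ends, len(text)

if __name__ == "__main__":
    print(repeated_search("aaaab"))
    print(one_pass_search("aaaab"))
```

Both functions find the same match positions, but only the one-pass variant is guaranteed to take a single table lookup per input character.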
DFA Analysis for Individual Regular Expressions
The study is based on the use of exhaustive matching & one-pass search
Case 4: DFA of Quadratic Size
The DFA needs to remember the number of Bs it has seen and their locations
Case 5: DFA of Exponential Size
An exponential number of states ($2^{2+1}$) is needed to represent these two wildcard characters
AAB (as in AABBCD) must be distinguished from ABA (as in ABABCD), because the subsequent input BCD completes a match after AAB but not after ABA
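The exponential growth can be observed with a small subset-construction sketch. It assumes a pattern of the form `.*A.{j}CD` over the four-letter alphabet ABCD; this form is consistent with the AAB/ABA example but is an assumption, since the pattern itself is not spelled out in the text. The function counts the reachable states of the unminimized DFA, so the numbers are an upper bound on the minimal DFA size.

```python
ALPHABET = "ABCD"  # hypothetical small alphabet for the demo

def dfa_state_count(j):
    """Count subset-construction states for the assumed pattern .*A.{j}CD.

    NFA states: 0 (leading .*), 1..j+1 (after A and after each wildcard),
    j+2 (after C), j+3 (after D, accepting).
    """
    accept = j + 3

    def step(states, ch):
        nxt = {0}                          # the leading .* keeps state 0 active
        for s in states:
            if s == 0 and ch == "A":
                nxt.add(1)
            elif 1 <= s <= j:              # inside .{j}: any character advances
                nxt.add(s + 1)
            elif s == j + 1 and ch == "C":
                nxt.add(j + 2)
            elif s == j + 2 and ch == "D":
                nxt.add(accept)
        return frozenset(nxt)

    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        current = todo.pop()
        for ch in ALPHABET:
            nxt = step(current, ch)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)

if __name__ == "__main__":
    for j in range(6):
        print(j, dfa_state_count(j))
```

Each additional wildcard position forces the DFA to remember whether one more recent character was an A, so the count at least doubles from one value of j to the next.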
Regular Expression Rewrites
Rewrite Rule (1)
• Rewrite “^SEARCH\s+[^\n]{1024}” to “^SEARCH\s[^\n]{1024}”
• More generally, rewrite “^A+[A-Z]{j}” to “^A[A-Z]{j}”
• We can prove that any input matching “^A+[A-Z]{j}” also matches “^A[A-Z]{j}”
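The claim for the generic form can be checked by brute force on short inputs. The sketch below assumes j = 3 and a three-letter test alphabet (both hypothetical choices) and compares whether each pattern matches at the start of the input; it is a sanity check on small strings, not a proof.

```python
import re
from itertools import product

J = 3  # hypothetical repetition count for the demo
ORIGINAL = re.compile(r"^A+[A-Z]{%d}" % J)
REWRITTEN = re.compile(r"^A[A-Z]{%d}" % J)

def same_verdict(text):
    """True if both patterns agree on whether `text` contains a match."""
    return bool(ORIGINAL.match(text)) == bool(REWRITTEN.match(text))

def check_all(max_len=6, alphabet="ABz"):
    """Compare the two patterns on every string up to `max_len` characters."""
    return all(
        same_verdict("".join(chars))
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
    )

if __name__ == "__main__":
    print(check_all())
```

The patterns agree because A is itself a member of [A-Z]: any run of extra As absorbed by `A+` can equally be absorbed by the character class that follows.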
Regular Expression Rewrites (cont.)
Rewrite Rule (2)
• After seeing the first AUTH\s, we don’t need to keep track of a second AUTH\s
• If there is a ‘\n’ within the next 100 bytes, that return character must also be within 100 bytes of the second AUTH\s
• If there is no ‘\n’ within the next 100 bytes, the first AUTH\s has already matched the pattern
• Rewritten pattern: “([^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,99}\n)*AUTH\s[^\n]{100}”
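The rewritten expression can be spot-checked against a few hand-picked inputs. The sketch assumes that the pattern being rewritten is the unanchored `AUTH\s[^\n]{100}` (the tail of the rewritten expression) and that the rewritten expression is matched from the start of the input. It only compares the two on the samples listed, so it illustrates the idea rather than establishing equivalence for all inputs.

```python
import re

# Assumed original pattern, searched anywhere in the input.
ORIGINAL = re.compile(r"AUTH\s[^\n]{100}")
# Rewritten pattern from the slide, matched from the start of the input.
REWRITTEN = re.compile(
    r"([^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,99}\n)*"
    r"AUTH\s[^\n]{100}"
)

def original_hits(data):
    return ORIGINAL.search(data) is not None

def rewritten_hits(data):
    return REWRITTEN.match(data) is not None

# Hand-picked samples: input text -> whether a match is expected.
SAMPLES = {
    "AUTH " + "x" * 100: True,                    # long argument, no newline
    "AUTH " + "x" * 50 + "\n" + "x" * 100: False,  # newline arrives too early
    "AUTH xx\nAUTH " + "x" * 100: True,           # second AUTH carries the overflow
    "noise AUTH " + "y" * 100: True,              # leading unrelated bytes
}

if __name__ == "__main__":
    for data, expected in SAMPLES.items():
        print(expected, original_hits(data), rewritten_hits(data))
```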
Selective Grouping of Multiple Patterns
The composite DFA may experience exponential growth in size, even though none of the individual DFAs has an exponential component
Regular Expressions Grouping Algorithm
Definition of interaction: two patterns interact with each other if their composite DFA contains more states than the sum of the two individual DFAs
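This definition can be tested directly on small patterns. The sketch below is restricted to hypothetical patterns of the form `.*seg1.*seg2...` made of literal segments, builds their NFAs by hand, and counts the reachable states of the unminimized subset construction for single and composite automata. The alphabet, the pattern encoding, and the function names are all assumptions made for the demo.

```python
ALPHABET = "abcdefghx"  # hypothetical small alphabet for the demo

def build_nfa(segments, base):
    """NFA for '.*seg1.*seg2...' with states numbered from `base`.
    Returns (start state, transitions, next unused state number)."""
    trans = {}

    def add(src, ch, dst):
        trans.setdefault((src, ch), set()).add(dst)

    state = base
    for seg in segments:
        for ch in ALPHABET:            # the '.*' in front of this segment
            add(state, ch, state)
        for ch in seg:                 # the literal characters of the segment
            add(state, ch, state + 1)
            state += 1
    return base, trans, state + 1

def dfa_size(patterns):
    """Reachable subset-construction states for the union of the patterns."""
    trans, starts, base = {}, set(), 0
    for segments in patterns:
        start, t, base = build_nfa(segments, base)
        starts.add(start)
        trans.update(t)                # state numbers are disjoint per pattern
    start_set = frozenset(starts)
    seen, todo = {start_set}, [start_set]
    while todo:
        current = todo.pop()
        for ch in ALPHABET:
            nxt = frozenset(d for s in current for d in trans.get((s, ch), ()))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)

def interacts(p, q):
    """True if the composite DFA is larger than the two individual DFAs combined."""
    return dfa_size([p, q]) > dfa_size([p]) + dfa_size([q])

if __name__ == "__main__":
    print(interacts(["ab", "cd"], ["ef", "gh"]))   # .*ab.*cd vs .*ef.*gh
    print(interacts(["abcd"], ["efgh"]))           # .*abcd vs .*efgh
```

In this sketch the two patterns with an inner `.*` interact, because the composite automaton has to track the progress of both patterns independently, while the two plain strings do not.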
Grouping Algorithm
Evaluation Result
Effect of Rule Rewriting
Evaluation Result (cont.)
Conclusion
Rewriting techniques make memory-efficient DFA-based approaches possible
Selectively grouping patterns together speeds up the matching process