Department of Electrical and Computer Engineering
September 14, 2016
Picking Pesky Parameters: Optimizing Regular Expression Matching in Practice
Outline
§ Introduction to regular expressions
§ Design space exploration
§ Results
§ Optimal Regular Expression Matching Configuration
§ Conclusion
What is regular expression matching?
§ A regular expression (abbreviated regex) is a pattern that describes a set of strings; matching tests whether an input string belongs to that set.
• E.g., this regex matches a valid IPv4 address:
• (([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])
§ Applications of regular expression matching:
• Bibliographic search
• Intrusion detection systems
• Protocol identification
• Content filtering
§ Many network security tools, such as Snort and Bro, use rule sets of regular expressions that match attacks.
§ These tools need to operate at link rates from multiple to tens of gigabits per second to meet the performance requirements of the network.
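As a quick check of the IP address pattern above, the following Python sketch compiles it with the standard `re` module. The helper name `is_ipv4` is an assumption made for illustration, and `re.fullmatch` is used so that the entire string has to match rather than a substring.

```python
import re

# One octet of the slide's pattern: 0-255 without leading zeros.
OCTET = r"([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])"

# Three "octet." groups followed by a final octet, as on the slide.
IPV4 = re.compile(r"(" + OCTET + r"\.){3}" + OCTET)


def is_ipv4(text: str) -> bool:
    """Return True if the whole string is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("192.168.1.1", "256.0.0.1", "1.2.3"):
        print(candidate, is_ipv4(candidate))
```

A general-purpose backtracking engine such as `re` is convenient for a check like this; the automaton-based engines discussed in the rest of the talk target fixed, per-byte processing cost instead.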
How to implement a regex lookup engine?
1. Transform the rule set into a state machine (finite automaton).
2. Scan packet payloads by traversing the state machine.
§ The automaton can be non-deterministic (NFA) or deterministic (DFA).
§ Example: NFA and DFA of .*ab+[cd]e
[Figure: NFA and DFA for .*ab+[cd]e, each with states 0 to 4; the accepting state is marked.]
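The two traversal styles for the example pattern .*ab+[cd]e can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' code: the edge list, the function names, and the choice to stop at the first match are assumptions. The NFA keeps a set of active states, while the DFA keeps exactly one.

```python
ACCEPT = 4

# NFA for .*ab+[cd]e as (source, predicate on the input character, target).
NFA_EDGES = [
    (0, lambda c: True, 0),       # .* self-loop on the start state
    (0, lambda c: c == "a", 1),
    (1, lambda c: c == "b", 2),
    (2, lambda c: c == "b", 2),   # b+ loop
    (2, lambda c: c in "cd", 3),
    (3, lambda c: c == "e", 4),
]


def nfa_match(text):
    """Return (matched, largest number of simultaneously active states)."""
    active = {0}
    max_active = 1
    for ch in text:
        # Every active state may fire several edges: the active set can grow.
        active = {dst for src, ok, dst in NFA_EDGES if src in active and ok(ch)}
        max_active = max(max_active, len(active))
        if ACCEPT in active:
            return True, max_active
    return False, max_active


def dfa_step(state, ch):
    """Single deterministic transition of the equivalent DFA."""
    if ch == "a":
        return 1                  # an 'a' always restarts a potential match
    if state in (1, 2) and ch == "b":
        return 2
    if state == 2 and ch in "cd":
        return 3
    if state == 3 and ch == "e":
        return ACCEPT
    return 0


def dfa_match(text):
    state = 0
    for ch in text:
        state = dfa_step(state, ch)   # exactly one active state at any time
        if state == ACCEPT:
            return True
    return False
```

The DFA does a constant amount of work per input character, while the NFA's work per character depends on how many states are active.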
What is the problem?
§ Too many algorithms have been proposed to tune regex matching.
§ There are too many different system implementations for regex matching:
• Different hardware;
• Different types of processors;
• Different memory configurations.
§ The performance metrics used in previous publications differ:
• reduce memory requirements;
• improve the average and worst-case throughput;
• reduce power and energy consumption.
§ It is very difficult to determine which technique or system implementation to use.
What does our work do?
§ Our work addresses the problem of choosing which regular expression technique to use for a given system, rule set, and traffic configuration.
§ We present a systematic evaluation of many widely used regular expression techniques using real-world rule sets.
§ We evaluate the throughput, memory size, energy consumption, and estimated chip area of each configuration.
§ We provide a method for choosing the right configuration based on the results from our experiments.
Two types of solutions
§ Memory based solution
§ Logic based solution
[Figure: block diagram labeled Automaton, Caches, Processing units, Memory, Bus, and Input / Match.]
Design space
[Figure: design space overview.
Inputs: regular expression ruleset (partitioned ruleset), traffic traces.
Automaton: NFA, 2-NFA, DFA, 2-DFA, A-DFA, 2-A-DFA.
Implementation: memory-based (non-compressed layout, linear encoding, bitmapped encoding; 9 configurations) and logic-based (stride-1, SW-based multi-stride, HW-based multi-stride; 4 configurations).
System: cache size, memory bandwidth, number of cores; FPGA clock rate.
Evaluation: synthesis tool, processor simulator, real processor.
Result: throughput speed, memory & area cost, power consumption.]
Automaton domain
§ NFA (Nondeterministic Finite Automaton)
• Generated from the regex ruleset.
• The number of states is small, but multiple states can be active at the same time.
§ DFA (Deterministic Finite Automaton)
• Generated from the NFA.
• Only one state is active at any time: stable performance.
• Size can grow exponentially if some complex patterns exist (called state explosion).
• Large rulesets need to be partitioned into several parts, generating multiple DFAs.
• A-DFA: a compression technique that allows a DFA state to use fewer than 256 transitions. It should be used with a compressed memory layout.
§ Multi-stride NFA/DFA (or k-NFA/k-DFA)
• Processes k input characters at a time.
• If the initial alphabet is Σ, a k-NFA/k-DFA is equivalent to an FA defined on the alphabet Σ^k.
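To make the multi-stride idea concrete, the sketch below derives a stride-2 transition table from a stride-1 DFA by composing two single-character steps, which corresponds to defining the automaton on the alphabet of character pairs. It reuses the .*ab+[cd]e example; the six-character `ALPHABET`, the function names, and the handling of an odd trailing character are assumptions made to keep the example small (a real engine would use all 256 byte values).

```python
from itertools import product

ALPHABET = "abcdex"   # illustrative alphabet; 'x' stands for "any other byte"
NUM_STATES = 5
ACCEPT = 4


def step(state, ch):
    """Stride-1 DFA transition for .*ab+[cd]e."""
    if ch == "a":
        return 1
    if state in (1, 2) and ch == "b":
        return 2
    if state == 2 and ch in "cd":
        return 3
    if state == 3 and ch == "e":
        return ACCEPT
    return 0


def build_stride2_table():
    """Map (state, two-character string) -> (next state, match seen)."""
    table = {}
    for state in range(NUM_STATES):
        for c1, c2 in product(ALPHABET, repeat=2):
            mid = step(state, c1)
            end = step(mid, c2)
            # A match may complete on the first character of the pair.
            table[(state, c1 + c2)] = (end, mid == ACCEPT or end == ACCEPT)
    return table


def match_stride1(text):
    state = 0
    for ch in text:
        state = step(state, ch)
        if state == ACCEPT:
            return True
    return False


def match_stride2(text, table):
    state = 0
    for i in range(0, len(text) - 1, 2):      # consume two characters per step
        state, hit = table[(state, text[i:i + 2])]
        if hit:
            return True
    if len(text) % 2 == 1:                    # odd length: one stride-1 step
        return step(state, text[-1]) == ACCEPT
    return False
```

Each state now has |Σ|² outgoing transitions instead of |Σ| (36 instead of 6 here), which is the memory cost paid for halving the number of lookups per input.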
Implementation domain -- Memory based solution
§ Three memory layouts:
1. Non-compressed layout
• Uses all |Σ| transitions in a state.
2. Linear encoding
• Only encodes the existing transitions in an NFA, or the default transition and other transitions in an A-DFA.
• Linear search is performed until a transition matching the input character is found or its absence is verified.
3. Bitmapped encoding
• Similar to linear encoding, but uses a bitmap to avoid linear search.
• Only applies to stride-1 DFA.
§ 9 configurations in total
• Non-compressed – NFA, DFA, 2-NFA, 2-DFA
• Linear encoding – NFA, A-DFA, 2-NFA, 2-A-DFA
• Bitmapped encoding – A-DFA
[Figure: DFA non-compressed layout. Each state holds 256 words of 32 bits, one transition (Tx) per input byte from 0x00 to 0xFF.]
[Figure: DFA linear encoding. A Tx address map holds the address of each state (state 0 to state n); each state stores a default Tx and its labeled transitions (e.g. Tx for 0x00, 0x01, 0x03, 0xFF) as 32-bit words.]
[Figure: DFA bitmapped encoding. Each state stores a level-1 bitmap (1 word), a level-2 bitmap (8 words), a default Tx, and its labeled transitions (e.g. Tx for 0x00, 0x01, 0x03, 0xFF) as 32-bit words.]
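The per-state lookup in the three layouts can be sketched as follows. This is a simplified illustration under stated assumptions: the class and function names are hypothetical, the bitmap is a single 256-bit integer rather than the two-level bitmap in the figure, and the sketch only returns the default transition's target without modeling what the engine does after following it.

```python
ALPHABET_SIZE = 256
WORD_BYTES = 4  # 32-bit transition words, as in the figures


def noncompressed_bytes(num_states):
    """Memory for the non-compressed layout: 256 words per state."""
    return num_states * ALPHABET_SIZE * WORD_BYTES


class LinearState:
    """Stores only labeled transitions plus a default; lookup scans linearly."""

    def __init__(self, labeled, default):
        self.labeled = sorted(labeled.items())
        self.default = default

    def lookup(self, byte):
        for ch, target in self.labeled:
            if ch == byte:
                return target
            if ch > byte:          # sorted, so the transition is absent
                break
        return self.default


class BitmapState:
    """Same contents, but a bitmap locates the transition without a scan."""

    def __init__(self, labeled, default):
        self.bitmap = 0
        self.targets = []
        for ch, target in sorted(labeled.items()):
            self.bitmap |= 1 << ch
            self.targets.append(target)
        self.default = default

    def lookup(self, byte):
        if not (self.bitmap >> byte) & 1:
            return self.default
        # The number of set bits below `byte` is the index into `targets`.
        below = self.bitmap & ((1 << byte) - 1)
        return self.targets[bin(below).count("1")]
```

With the four labeled transitions shown in the figures (0x00, 0x01, 0x03, 0xFF), both compressed states store four targets plus a default instead of 256 words, at the cost of a search or a population count per lookup.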
Implementation domain -- Logic based solution
§ Logic based solutions only use NFA
• Stride-1 implementation
• Software-based multi-stride approach
  • First generate a k-NFA, then encode it in logic.
  • Resource costly; can only support stride-2.
• Hardware-based multi-stride approach
  • Uses a stride-1 NFA and the corresponding alphabet translation table.
  • Resource efficient; can support up to stride-4.
§ 4 configurations in total
• Stride-1 implementation
• Software-based -- 2-NFA
• Hardware-based -- 2-NFA, 4-NFA
System domain
§ Memory based solution
• Different cache sizes for level-1 and level-2 cache
• Memory bandwidth
• Different number of cores
§ Logic based solution
• Different FPGA clock rates
Evaluation Methodology
§ Real hardware
• TI OMAP 4460 ARM processor
• Xilinx Virtex 5 FPGA (XC5VLX50)
• Speed, memory usage/slice usage, and power are measured
§ Simulator
• SimpleScalar simulator, calibrated with real hardware.
• Used to study the parameters that cannot be changed on real hardware:
  • Cache size
  • Memory bandwidth
§ Inputs
• We use both real rulesets (from Snort, L7-filter, and Bro) and some synthetic rulesets with different characteristics.
• Traffic traces are generated by the traffic generator written by Becchi et al.
Results from real hardware – Memory based solutions
§ TI OMAP 4460 ARM processor
§ Rulesets with very high mNFA and very low mDFA should use DFA, and rulesets with very high mDFA and very low mNFA should use NFA.
• mNFA: the average number of active states in the NFA
• mDFA: the number of DFAs
[Figure: speed (Mbps), memory (MB, log scale), and power (mW) for each ruleset (snort, l7-filter, bro, exact-match, dotstar 0.1, dotstar 0.2, dotstar 0.3, dotstar 0.6). Series: NFA NC, NFA LE, 2-NFA NC, 2-NFA LE, DFA NC, A-DFA LE, A-DFA BM, 2-DFA NC, 2-A-DFA LE.]

| Ruleset | #reg-ex | Length min | Length max | Length avg | mDFA | mNFA |
|---|---|---|---|---|---|---|
| snort | 462 | 10 | 202 | 44.1 | 12 | 2.76 |
| l7-filter | 111 | 6 | 438 | 63.2 | 7 | 6.02 |
| bro | 782 | 5 | 211 | 34.8 | 8 | 20.34 |
| exact-match | 500 | 10 | 256 | 49.2 | 2 | 1.76 |
| dotstar 0.1 | 500 | 10 | 243 | 49.6 | 11 | 8.42 |
| dotstar 0.2 | 500 | 11 | 212 | 49.0 | 24 | 15.64 |
| dotstar 0.3 | 500 | 11 | 251 | 47.1 | 33 | 12.76 |
| dotstar 0.6 | 500 | 11 | 274 | 50.3 | 49 | 26.76 |
Results from real hardware – Logic based solutions
§ Xilinx Virtex 5 (XC5VLX50)
§ $speed = clk_{freq} \times stride \times 8\ \text{bits}$
§ A smaller circuit can operate at a higher frequency.
§ The hardware-based stride-4 implementation leads to the best results.
[Figure: speed (Mbps), slice usage, and power (mW) for each ruleset (snort, l7-filter, bro, exactmatch, dotstar0.1, dotstar0.2, dotstar0.3, dotstar0.6). Series: stride-1, SW stride-2, HW stride-2, HW stride-4. Some bars are marked "missing".]
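The throughput relation for the logic-based engines (clock frequency times stride times 8 bits per character) is simple enough to check with a few lines of Python. The function name and the 100 MHz clock used in the example are hypothetical and are not measurements from the slides.

```python
def throughput_mbps(clk_mhz: float, stride: int) -> float:
    """Throughput of a logic-based engine that consumes `stride` bytes per clock.

    MHz * bytes/cycle * 8 bits/byte gives megabits per second.
    """
    return clk_mhz * stride * 8


if __name__ == "__main__":
    # Hypothetical clock rate, chosen only to illustrate the arithmetic.
    for stride in (1, 2, 4):
        print(f"stride-{stride} at 100 MHz: {throughput_mbps(100, stride)} Mbps")
```

A larger stride multiplies throughput only if the wider circuit still closes timing at a similar clock rate, which is why the slide notes that smaller circuits can run at higher frequency.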
Results from real hardware – Logic based solutions
§ Different frequency: power vs. speed trade-off.
§ $P = P_{static} + P_{dynamic} = P_{static} + \alpha C V^2 clk_{freq}$
§ Choose the highest achievable $clk_{freq}$ to get the highest speed/power ratio.
[Figure: power (mW, 400 to 1000) vs. speed (Mbps, 0 to 4000). Series: stride-1, SW stride-2, HW stride-2, HW stride-4.]
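Why the highest achievable clock gives the best speed/power ratio follows from the power model: speed grows linearly with the clock, while power has a fixed static term in addition to the linear dynamic term. The sketch below illustrates this with hypothetical parameter values; `dyn_mw_per_mhz` stands in for the product αCV² and none of the numbers are taken from the measurements.

```python
def power_mw(clk_mhz, p_static_mw, dyn_mw_per_mhz):
    """P = P_static + (alpha * C * V^2) * clk, with the product given per MHz."""
    return p_static_mw + dyn_mw_per_mhz * clk_mhz


def speed_per_power(clk_mhz, stride, p_static_mw, dyn_mw_per_mhz):
    """Mbps per mW for a logic-based engine at a given clock rate."""
    speed_mbps = clk_mhz * stride * 8
    return speed_mbps / power_mw(clk_mhz, p_static_mw, dyn_mw_per_mhz)


# Hypothetical values: 400 mW static power, 2 mW of dynamic power per MHz.
ratios = [
    speed_per_power(f, stride=4, p_static_mw=400.0, dyn_mw_per_mhz=2.0)
    for f in (25, 50, 100)
]
```

Because the static power is amortized over more bits as the clock rises, the ratio increases monotonically with frequency whenever the static term is positive.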
Results from Processor Simulation – Cache
§ SimpleScalar simulator
§ We select the best cache size based on speed/area.
[Figure: speed/area (Mbps/mm²) as a function of L1 data cache size (1 to 512 KB) and L2 data cache size (32 to 16384 KB).]

Best cache size for different configurations, selected by maximum speed/area:

| Configuration | L1 size (KB) | L2 size (KB) |
|---|---|---|
| NFA | 16 | 64 |
| NFA linear | 16 | 32 |
| 2-NFA | 64 | 1024 |
| 2-NFA linear | 64 | 512 |
| DFA | 64 | 128 |
| D2FA linear | 64 | 64 |
| D2FA bitmap | 32 | 64 |
| 2-DFA | 128 | 4096 |
| 2-D2FA linear | 128 | 4096 |
Results from Processor Simulation – Memory bandwidth
§ Most cache miss rates are below 1%
§ Low memory bandwidth utilization
§ High parallelism is possible

| Configuration | Utilization of memory bandwidth (%) | Max threads supported |
|---|---|---|
| NFA | 0.25 | 81 |
| NFA linear | 0.17 | 120 |
| 2-NFA | 0.38 | 52 |
| 2-NFA linear | 0.23 | 88 |
| DFA | 0.17 | 118 |
| D2FA linear | 0.04 | 454 |
| D2FA bitmap | 0.04 | 480 |
| 2-DFA | 0.26 | 76 |
| 2-D2FA linear | 0.20 | 101 |

Demonstration of scalability on Intel x86 CPU.
Optimal Memory-Based Configurations
§ Select the optimal configuration by speed/area.
§ Parallel processing is allowed.
§ When mNFA/mDFA < 0.35, an NFA-based implementation is preferable;
§ Otherwise, DFA-based implementations are preferable.
§ For some simple rulesets, 2-DFA is faster than DFA.
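The mNFA/mDFA threshold rule can be applied mechanically, as in the Python sketch below. The (mDFA, mNFA) pairs are copied from the ruleset table in the memory-based results; the function name is an assumption, and the output is only what the stated 0.35 threshold yields, since the slides do not list a per-ruleset recommendation.

```python
THRESHOLD = 0.35  # mNFA/mDFA cut-off stated on the slide

# (mDFA, mNFA) per ruleset, taken from the ruleset table in the results.
RULESETS = {
    "snort": (12, 2.76),
    "l7-filter": (7, 6.02),
    "bro": (8, 20.34),
    "exact-match": (2, 1.76),
    "dotstar 0.1": (11, 8.42),
    "dotstar 0.2": (24, 15.64),
    "dotstar 0.3": (33, 12.76),
    "dotstar 0.6": (49, 26.76),
}


def preferred_automaton(m_nfa, m_dfa, threshold=THRESHOLD):
    """Return "NFA" when mNFA/mDFA is below the threshold, otherwise "DFA"."""
    return "NFA" if m_nfa / m_dfa < threshold else "DFA"


if __name__ == "__main__":
    for name, (m_dfa, m_nfa) in RULESETS.items():
        ratio = m_nfa / m_dfa
        print(f"{name:12s} ratio={ratio:.2f} -> {preferred_automaton(m_nfa, m_dfa)}")
```

The intuition behind the ratio: NFA cost scales with the number of active states per input character, while partitioned-DFA cost scales with the number of DFAs that must each be traversed.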
Optimal Logic-Based Configurations
§ Hardware-based multi-stride is the best.
§ There seems to be a peak speed/slice value at a higher stride, but validating this is beyond the chip's resources.
[Figure: speed/slice (Mbps/K slices) for different hardware-based strides, 1 to 8.]
Conclusion
§ The key problem in regular expression matching is not the lack of innovative techniques, but the difficulty of deciding which technique actually works best in a given system setting.
§ In this work, we:
• define the regular expression matching design space;
• propose a benchmark of configurations that evaluates the design space both on a simulator and on real hardware;
• present an analysis of rulesets to obtain the optimal configuration.
Thank you!
Calibration
[Figure: real speed (Mbps) vs. simulator speed (Mbps), both axes from 0 to 120.]