Series DFA for Memory-Efficient Regular Expression Matching




Page 1: Series DFA for Memory-Efficient Regular Expression Matching

Series DFA for Memory-Efficient Regular Expression Matching

Author: Tingwen Liu, Yong Sun, Li Guo, and Binxing Fang

Publisher: CIAA 2012 (International Conference on Implementation and Application of Automata)

Presenter: Sih-An Pan

Date: 2014/5/7

Department of Computer Science and Information Engineering

National Cheng Kung University, Taiwan R.O.C.

Page 2

Introduction

We focus on reducing the number of DFA states by cutting complex RegExes into well-designed, ordered RegEx fragments that can be compiled into compact DFAs.

We propose the Series DFA (SDFA), which concatenates these compact DFAs with epsilon transitions in the order in which their fragments appear.

Computer & Internet Architecture Lab 2

Page 3

State Complexity for RegExes

Computer & Internet Architecture Lab 3

Page 4

Main Idea of SDFA

RegEx1: ba[^a]*bad.{2}cd
RegEx2: de[^e]{3}

SDFA first locates all unconstrained repetitions (such as [^a]*) and constrained repetitions (such as .{2} and [^e]{3}) in the two RegExes, and then cuts them into five fragments:

Fragment1: ba
Fragment2: ^[^a]*bad
Fragment3: ^.{2}cd
Fragment4: de
Fragment5: ^[^e]{3}
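The cutting step can be illustrated with a minimal Python sketch. It assumes a tiny RegEx subset (literal characters, `.`, bracket classes such as `[^a]`, and the quantifiers `*`, `+`, `?`, `{n}`, `{n,m}`) and cuts before every repetition of `.` or a bracket class, with no size threshold; repetitions of single literal characters are not treated as cut points. Reading the leading `^` as "starts where the previous fragment ended" is this sketch's interpretation, and the name `cut_regex` is hypothetical, not from the paper.

```python
import re

# An atom is a bracket class, an escaped character, or any single character.
ATOM = re.compile(r"\[\^?\]?[^\]]*\]|\\.|.", re.DOTALL)
# A quantifier: *, +, ?, or a bounded repetition such as {2}, {2,5}, {2,}.
QUANT = re.compile(r"\*|\+|\?|\{\d+(?:,\d*)?\}")


def cut_regex(regex):
    """Cut a simple RegEx before each repetition of '.' or a bracket class.

    Assumption: every fragment after the first is prefixed with '^' to mark
    that it is anchored where the previous fragment ended.
    """
    fragments = [""]
    pos = 0
    while pos < len(regex):
        atom = ATOM.match(regex, pos).group()
        pos += len(atom)
        quant_match = QUANT.match(regex, pos)
        quant = quant_match.group() if quant_match else ""
        pos += len(quant)
        is_range = atom == "." or atom.startswith("[")
        # Cut point: a repeated range, unless nothing has been collected yet.
        if quant and is_range and fragments[-1]:
            fragments.append("^")
        fragments[-1] += atom + quant
    return fragments
```

Applied to the two example RegExes, `cut_regex("ba[^a]*bad.{2}cd")` gives `["ba", "^[^a]*bad", "^.{2}cd"]` and `cut_regex("de[^e]{3}")` gives `["de", "^[^e]{3}"]`, i.e. the five fragments listed on the slide.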

Computer & Internet Architecture Lab 4

Page 5

Main Idea of SDFA

We call a RegEx the father of its fragments, and each fragment a son of that RegEx.

For a given RegEx, the first (last) fragment is called its eldestson (youngestson); correspondingly, the other fragments are its non-eldestsons (non-youngestsons).

Fragments ba and de, which are the eldestsons of the two RegExes, are compiled into a composite DFA.
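A composite DFA over several eldestsons can be sketched in Python under one simplifying assumption: every eldestson is a plain literal, as `ba` and `de` are here. Each DFA state is the set of (RegEx id, characters matched so far) pairs that are still alive; states are created lazily and memoized, which is subset construction performed on demand. The class `CompositeLiteralDFA` is hypothetical, and the paper's own construction may differ.

```python
class CompositeLiteralDFA:
    """Lazily built DFA that matches several literal eldest sons at once."""

    def __init__(self, eldest_sons):
        # Assumption: RegEx id -> non-empty literal string, e.g. {"RegEx1": "ba"}
        self.patterns = dict(eldest_sons)
        self.start = frozenset()  # no partial match in progress
        self.transitions = {}     # memoized (state, char) -> (next_state, fired ids)

    def step(self, state, ch):
        key = (state, ch)
        if key not in self.transitions:
            alive, fired = set(), []
            # Unanchored search: every literal may also start at this character.
            candidates = set(state) | {(rid, 0) for rid in self.patterns}
            for rid, k in candidates:
                literal = self.patterns[rid]
                if literal[k] == ch:
                    if k + 1 == len(literal):
                        fired.append(rid)        # eldest son of rid matched
                    else:
                        alive.add((rid, k + 1))  # partial match survives
            self.transitions[key] = (frozenset(alive), tuple(sorted(fired)))
        return self.transitions[key]

    def scan(self, text):
        """Yield (end offset, RegEx id) for every eldest-son match in text."""
        state = self.start
        for i, ch in enumerate(text):
            state, fired = self.step(state, ch)
            for rid in fired:
                yield i + 1, rid
```

For example, `list(CompositeLiteralDFA({"RegEx1": "ba", "RegEx2": "de"}).scan("xbadex"))` yields `[(3, "RegEx1"), (5, "RegEx2")]`; each end offset is where the next fragment DFA of that RegEx would be activated.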

Computer & Internet Architecture Lab 5

Page 6

Main Idea of SDFA

Computer & Internet Architecture Lab 6

Page 7

Main Idea of SDFA

RegEx1: ba[^a]*bad.{2}cd
RegEx2: de[^e]{3}

Computer & Internet Architecture Lab 7

Page 8

Optimization in Cutting Process

Cutting at the repetitions of every character range yields a small memory size but high memory bandwidth, because the resulting fragments are too short.

In contrast, cutting only at the repetitions of wildcards yields low memory bandwidth but a large memory size.

We introduce a threshold μ: if a character range contains more than μ characters, it is considered large enough for the RegEx to be cut at the positions of its repetitions.
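One possible way to apply the threshold μ is to count how many byte values a range covers and cut only when that count exceeds μ. This sketch assumes an 8-bit alphabet, treats `.` as covering every byte, and handles only simple classes made of literal characters and `x-y` spans (no escape sequences); `range_size` and `should_cut` are hypothetical helpers, and the μ value used below is illustrative only.

```python
def range_size(char_class, alphabet_size=256):
    """Count the byte values covered by '.', '[...]' or '[^...]'.

    Assumes an 8-bit alphabet and classes built only from literal characters
    and x-y spans (no escape sequences).
    """
    if char_class == ".":
        return alphabet_size  # assumption: '.' covers every byte
    body = char_class[1:-1]   # drop the surrounding brackets
    negated = body.startswith("^")
    if negated:
        body = body[1:]
    members = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":   # an x-y span
            members.update(range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:                                          # a single character
            members.add(ord(body[i]))
            i += 1
    return alphabet_size - len(members) if negated else len(members)


def should_cut(char_class, mu):
    """Cut at repetitions of this range only if it covers more than mu values."""
    return range_size(char_class) > mu
```

With an illustrative μ = 128, `[^a]` (255 values) and `.` (256 values) would be cut at their repetitions, whereas `[a-z]` (26 values) would stay inside its fragment.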

Computer & Internet Architecture Lab 8

Page 9

Optimization in Matching Process

SDFA ensures that the fragment DFAs of one RegEx are never accessed by other RegExes. This property can be exploited to decrease memory bandwidth.

Since left-most matching is enough to know which RegExes have fired, once a RegEx is reported it is safe to set all of its non-eldestson DFAs inactive forever.
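The series semantics and the deactivation rule can be simulated in a few lines of Python. This is a sketch under stated assumptions, not the paper's implementation: Python's `re` module stands in for each compiled fragment DFA, the eldestson is assumed never to match the empty string, and the hypothetical function `sdfa_match` counts a "probe" each time an active non-eldestson instance is consulted. Because the stand-in never retires an instance early (a real DFA would stop at a dead state), only the relative probe counts are meaningful.

```python
import re


def sdfa_match(fragments_by_regex, text, deactivate=True):
    """Simulate series matching over cut fragments; return (fired ids, probes).

    fragments_by_regex maps a RegEx id to its ordered fragments, for example
    {"RegEx1": ["ba", "^[^a]*bad", "^.{2}cd"]}.
    Assumptions: re stands in for each fragment DFA, and eldest sons never
    match the empty string.
    """
    # Eldest sons are unanchored: look for a match that ends exactly at endpos.
    eldest = {rid: re.compile("(?:%s)\\Z" % frags[0])
              for rid, frags in fragments_by_regex.items()}
    # Non-eldest sons start where the previous fragment ended, so drop the
    # leading '^' and anchor them with fullmatch(text, pos, endpos) instead.
    later = {rid: [re.compile(f[1:] if f.startswith("^") else f) for f in frags[1:]]
             for rid, frags in fragments_by_regex.items()}
    fired = []

    def advance(rid, k, pos, active):
        """Epsilon transition: fragment k of later[rid] becomes active at pos."""
        worklist = [(k, pos)]
        while worklist:
            k, pos = worklist.pop()
            if k == len(later[rid]):  # the youngest son has matched
                if rid not in fired:
                    fired.append(rid)
            elif (rid, k, pos) not in active:
                active.add((rid, k, pos))
                # A fragment that can match the empty string fires immediately.
                if later[rid][k].fullmatch(text, pos, pos):
                    worklist.append((k + 1, pos))

    active = set()  # running instances: (rid, fragment index, start position)
    probes = 0
    for end in range(1, len(text) + 1):
        hits = []
        for rid, k, start in active:
            probes += 1
            if later[rid][k].fullmatch(text, start, end):
                hits.append((rid, k + 1))
        for rid, pattern in eldest.items():
            if not (deactivate and rid in fired) and pattern.search(text, 0, end):
                hits.append((rid, 0))
        for rid, k in hits:
            if not (deactivate and rid in fired):
                advance(rid, k, end, active)
        if deactivate:
            # Once reported, a RegEx's non-eldestson DFAs stay inactive forever.
            active = {inst for inst in active if inst[0] not in fired}
    return fired, probes
```

On an input such as `"xbaxxbadyycd" + "ba" * 5`, both settings report `["RegEx1"]`, but with `deactivate=True` no RegEx1 fragment is probed after the report, so the probe count is lower.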

Computer & Internet Architecture Lab 9

Page 10

Experimental Results

Computer & Internet Architecture Lab 10

Page 11

Experimental Results

Computer & Internet Architecture Lab 11

Page 12

Experimental Results

Computer & Internet Architecture Lab 12