13mcs1043

MEMORY EFFICIENT REGULAR EXPRESSION BASED SEARCHING IN NETWORKS

Prachi Sharma

School of Computer Science and Engineering,

VIT University, Chennai- 48

Abstract: Regular expression based searching has become a highly preferred searching technique. Traditional string matching techniques are used for deep packet inspection in networks, but they are all limited to a finite set of strings. Regular expressions are well known for their expressive power. A regular expression describes a language by means of symbols and three operators: union, concatenation and Kleene star. Pattern matching plays a vital role in networks, but the memory capacity of network components, especially routers, is small compared to other components. Deterministic finite automata (DFA) are used to implement regular expressions. The DFA representation of a regular expression requires a large amount of memory, which limits the capability of network applications, so we need to reduce the memory requirement of the DFA. Delayed input DFA and delta finite automata are existing methods for implementing DFAs in networks, but they have a number of limitations. This paper proposes a new method which combines the advantages of delayed input DFA and delta finite automata to reduce the memory requirement of DFAs in network components.

Index terms: Delayed input DFA, novel technique, delta DFA.

1 INTRODUCTION

Regular expression based searching is widely used in deep packet inspection and in network intrusion detection systems. In addition, layer 7 filters, firewalls and switches also use regular expression based matching.

We can construct a minimum-state DFA for a regular expression, but it still requires a large amount of memory. A non-deterministic finite automaton (NDFA) requires less memory than a deterministic finite automaton (DFA), but its disadvantage is that it may have to follow several transitions for each input symbol. We use a DFA because it is deterministic and therefore faster than an NDFA.
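The difference in per-symbol work can be sketched in a few lines of Python. The two toy automata below are our own hypothetical example (strings over {a, b} that contain "ab"), not automata taken from this paper. The sketch only illustrates that a DFA follows exactly one transition per input symbol while an NDFA has to track a set of active states.

```python
# Hypothetical toy automata; all names and tables are illustrative assumptions.

def run_dfa(table, start, accepting, text):
    """Follow exactly one transition per input symbol."""
    state = start
    for ch in text:
        state = table[state][ch]
    return state in accepting


def run_ndfa(table, start, accepting, text):
    """Track every state the NDFA could be in after each symbol."""
    active = {start}
    for ch in text:
        # Several transitions may be followed for a single symbol.
        active = {nxt for s in active for nxt in table[s].get(ch, ())}
    return bool(active & accepting)


# DFA over {a, b} accepting strings that contain "ab".
TOY_DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 2, "b": 2},
}

# NDFA for the same language: state 0 guesses where "ab" starts.
TOY_NDFA = {
    0: {"a": (0, 1), "b": (0,)},
    1: {"b": (2,)},
    2: {"a": (2,), "b": (2,)},
}

if __name__ == "__main__":
    for text in ["", "ba", "aab", "bbab"]:
        print(repr(text), run_dfa(TOY_DFA, 0, {2}, text),
              run_ndfa(TOY_NDFA, 0, {2}, text))
```

Both functions agree on every input; the NDFA version simply does more work per symbol because `active` can hold more than one state.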

We can improve the memory requirement of a DFA by reducing its size. There are two ways to do this: we can reduce the number of transitions, or we can reduce the number of states. The techniques proposed in the past are the delayed input DFA (D2FA), which is based on reducing the number of transitions, and the novel state merging technique, in which states with common destination states are combined, so that the number of states is reduced instead of the number of transitions. Delta finite automata (δFA) take the advantages of both the delayed input DFA and the ordinary DFA, and are an improvement over D2FA. By reducing the memory requirement of DFAs, they become more flexible and faster in software and hardware search engines.

2 RELATED WORK

This paper focuses on reducing the memory of the DFAs used to implement regular expressions. Regular expression based searching is widely used in networking, and regular expressions are implemented with DFAs. Because DFAs require a large amount of memory, we use memory reduction techniques. We referred to the paper by Sailesh Kumar, Sarang Dharmapurikar et al., Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection [3], which introduced the delayed input DFA, a technique that reduces the number of transitions in order to reduce the memory requirement of a DFA. The memory of a DFA can also be reduced by reducing the number of states. In Memory-Efficient Regular Expression Search Using State Merging [1], Michela Becchi and Srihari Cadambi proposed a novel technique for reducing memory by state merging.

3 DELAYED INPUT DFA (D2FA)

In D2FA we reduce the memory requirement by reducing the number of transitions. When two states of a DFA have a common transition set, that is, transitions to the same destinations on the same input symbols, we can eliminate these transitions from one of the states by introducing a default transition between them. In a DFA a single input character leads to a single transition, but in a delayed input DFA (D2FA) a single input character may require more than one transition, because default transitions are followed before the character is consumed. This is the disadvantage of D2FA. To overcome it, a new technique has been proposed: delta finite automata (δFA). A δFA keeps the advantage of D2FA but takes only one transition per input character, like a normal DFA. For example, consider the DFA in figure 1.0.

Figure 1.0

This DFA, with 6 states and 13 transitions, accepts the regular expressions a+ and (b*(c+(a(b+c)*a+b(a+c)*b)(a+b+c)*)). For example, this DFA accepts the strings bccbbc and caab.

In this DFA, state 1 and state 2 have a common transition set on the same input symbols, so we can replace all the outgoing transitions of state 2 with a default transition from state 2 to state 1, as shown in figure 1.1.

Figure 1.1

This D2FA also accepts a+, bccbbc and caab. It accepts the same language as the previous DFA, but for each transition state 2 first takes the default transition to state 1. To overcome this problem of the delayed input DFA, a new technique has been introduced: δFA.
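The cost of default transitions can be made concrete with a minimal Python sketch. The 4-state DFA, the choice of default transition and all function names below are hypothetical and are not the automaton of figure 1.0. A state that has a default transition stores only the transitions that differ from those of its default target, and a lookup follows default transitions until it finds a stored one.

```python
# Hypothetical example; tables and names are illustrative assumptions.

def build_d2fa(dfa, default):
    """Keep only the transitions that differ from the default target's row."""
    labeled = {}
    for state, row in dfa.items():
        parent = default.get(state)
        if parent is None:
            labeled[state] = dict(row)  # no default: keep the full row
        else:
            labeled[state] = {ch: t for ch, t in row.items()
                              if dfa[parent][ch] != t}
    return labeled


def d2fa_step(labeled, default, state, ch):
    """Return (next_state, states_visited) for one input character."""
    visited = 1
    while ch not in labeled[state]:
        state = default[state]  # default transition: consumes no input
        visited += 1
    return labeled[state][ch], visited


# Full DFA: states 1 and 2 agree on 'a' and 'b' and differ only on 'c'.
D2_DFA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 2, "c": 3},
    3: {"a": 3, "b": 3, "c": 3},
}
D2_DEFAULT = {2: 1}  # one default transition, from state 2 to state 1
D2_LABELED = build_d2fa(D2_DFA, D2_DEFAULT)

if __name__ == "__main__":
    stored = sum(len(r) for r in D2_LABELED.values()) + len(D2_DEFAULT)
    print("transitions stored:", stored)
    print(d2fa_step(D2_LABELED, D2_DEFAULT, 2, "a"))
```

In this sketch, reading `a` in state 2 visits two states (2, then 1) before the next state is known, which is the extra traversal described above.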

5 DELTA FINITE AUTOMATA (δFA)

As already discussed, δFA improves on D2FA by introducing a label on the default transition. We can apply the δFA technique to a D2FA to reduce its memory access requirement, as shown in figure 1.2. Here we introduce a new label on b because state 1 has a loop on input symbol b. This δFA looks like a normal DFA.

Figure 1.2

This delta finite automaton (δFA) also accepts a+, bccbbc and caab. It accepts the same language as the DFA shown in figure 1.0. We have now reduced the number of transitions to 11. In this diagram, state 4 and state 5 have a common destination, state 6. According to the merging technique, we merge them by introducing new labels on their transitions, as shown in figure 1.4.
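One way to obtain a single state traversal per input character is sketched below in Python. This is our reading of the general delta idea (each state stores only the transitions that differ from those of the states leading into it, and the matcher carries a local transition table that it patches on entering a state). It is an assumed formulation for illustration, on a hypothetical 4-state DFA, and it is not the labelled construction of figure 1.2.

```python
# Hypothetical example; tables and names are illustrative assumptions.

def build_delta_fa(dfa, start):
    """For each state keep only transitions that differ from some parent's."""
    stored = {s: {} for s in dfa}
    stored[start] = dict(dfa[start])  # nothing to inherit at the start
    for parent, row in dfa.items():
        for child in row.values():
            for ch, target in dfa[child].items():
                if dfa[parent][ch] != target:
                    stored[child][ch] = target
    return stored


def run_delta_fa(stored, start, text):
    """One state traversal per character, using a patched local table."""
    local = {}  # local transition set carried from state to state
    state = start
    for ch in text:
        local.update(stored[state])  # apply this state's differences
        state = local[ch]
    return state


def run_dfa(dfa, start, text):
    """Reference walk over the uncompressed DFA."""
    state = start
    for ch in text:
        state = dfa[state][ch]
    return state


# Complete DFA: every state has a transition on every symbol.
DELTA_DFA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 2, "c": 3},
    3: {"a": 3, "b": 3, "c": 3},
}
DELTA_STORED = build_delta_fa(DELTA_DFA, 0)

if __name__ == "__main__":
    print("stored:", sum(len(r) for r in DELTA_STORED.values()),
          "of", sum(len(r) for r in DELTA_DFA.values()))
```

The sketch assumes a complete DFA. Any transition that a state does not store is, by construction, identical in every state that can lead into it, so the local table is always correct when it is consulted.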

6 PROPOSED WORK

This paper proposes a new technique for representing deterministic finite automata (DFA) that builds on the existing D2FA and δFA methods. We focus on reducing the memory of a DFA by introducing a new memory reduction technique based on merging two non-equivalent states. Merging states creates opportunities for other states to be merged. We merge states by introducing new labels on transitions. Unlike other memory reduction techniques, we place no restrictions on the transitions by which states lead to their common destination.

After applying the merging technique, we use the terms original state and merged state to refer to the states of the new DFA. After these techniques are applied, some states are affected and some states remain unchanged.
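The idea of merging two non-equivalent states by labelling transitions can be sketched as follows. The encoding is our own assumption for illustration: transitions entering the merged state carry label 0 or 1 to record which original state was meant, transitions on which the two original states agree are stored once, and the remaining ones are stored per label. The 4-state DFA is hypothetical and is not the automaton of figure 1.4.

```python
# Hypothetical example; encoding, tables and names are illustrative assumptions.

def merge_states(dfa, p, q):
    """Merge state q into state p.

    Returns table[state][ch] -> {label: (next_state, next_label)}, where the
    key None marks an entry shared by both labels of the source state.
    Assumes q is not the start state.
    """
    def target(t):
        # Transitions that entered q now enter p carrying label 1.
        return (p, 1) if t == q else (t, 0)

    merged = {}
    for s, row in dfa.items():
        if s == q:
            continue
        merged[s] = {}
        for ch, t in row.items():
            if s == p and dfa[q][ch] != t:
                # The original states disagree: keep one entry per label.
                merged[s][ch] = {0: target(t), 1: target(dfa[q][ch])}
            else:
                merged[s][ch] = {None: target(t)}
    return merged


def run_merged(merged, start, text):
    """Walk the merged automaton; returns (state, label)."""
    state, label = start, 0
    for ch in text:
        entry = merged[state][ch]
        state, label = entry[None] if None in entry else entry[label]
    return state, label


# States 1 and 2 share the destination 3 on 'a' but differ on 'b'.
MERGE_DFA = {
    0: {"a": 1, "b": 2},
    1: {"a": 3, "b": 1},
    2: {"a": 3, "b": 0},
    3: {"a": 3, "b": 3},
}
MERGED = merge_states(MERGE_DFA, 1, 2)

if __name__ == "__main__":
    entries = sum(len(e) for row in MERGED.values() for e in row.values())
    print("states:", len(MERGED), "stored transitions:", entries)
```

In this sketch the pair (state 1, label 1) stands for the original state 2, so the merged automaton has one state and one stored transition fewer than the original while following the same paths.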

Figure 2.0 Normal DFA

Figure 2.1 Delayed DFA

Figure 2.3 Delta Finite Automata

Figure 1.4

In this DFA we merge states 4 and 5, introducing a new label 0 on all the transitions of state 4 and a label 1 on all the transitions of state 5. We have now reduced the number of states from 6 to 5 and the number of transitions from 13 to 9, and this DFA accepts the same language as the DFA shown in figure 1.0. We introduce a new combined algorithm for reducing the transitions and states of a DFA.

Algorithm for N states

1. for (i = 1 to N)
2. compare the i-th state with each other state
   if they have the same transition set
   {
      I. Remove the transitions of one state and make one default transition between them
      II. Introduce a new label on this default transition
   }
   else if they have a common destination
   {
      Merge them and introduce new transition labels
   }
   else
   {
      continue with the next state
   }
3. end.
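The comparison step of this algorithm can be sketched in Python as shown below. Only the classification of state pairs is shown, not the rewriting of the automaton. We assume here that "same transition set" means the two states reach the same destination on at least one common input symbol, and that "common destination" means they share a destination state on any symbols. Both readings, the function name and the 4-state DFA are our assumptions for illustration.

```python
# Hypothetical example; the interpretation of the two tests is an assumption.

def classify_pairs(dfa):
    """Map each pair of states to 'default', 'merge' or 'none'."""
    plan = {}
    states = sorted(dfa)
    for i, s in enumerate(states):
        for t in states[i + 1:]:
            # Same destination on the same input symbol.
            shares_transition = any(
                dfa[s][ch] == dfa[t].get(ch) for ch in dfa[s])
            # Same destination, regardless of the input symbol.
            shares_destination = bool(
                set(dfa[s].values()) & set(dfa[t].values()))
            if shares_transition:
                plan[(s, t)] = "default"  # candidate for a default transition
            elif shares_destination:
                plan[(s, t)] = "merge"    # candidate for state merging
            else:
                plan[(s, t)] = "none"     # leave both states unchanged
    return plan


PAIR_DFA = {
    0: {"a": 1, "b": 2},
    1: {"a": 3, "b": 1},
    2: {"a": 3, "b": 0},
    3: {"a": 0, "b": 3},
}

if __name__ == "__main__":
    for pair, action in classify_pairs(PAIR_DFA).items():
        print(pair, action)
```

A full implementation would apply the chosen reduction to each pair and then re-examine the remaining states, since one merge can create opportunities for further merges.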

CONCLUSION

Regular expressions are widely used in networking to perform matching at high speed. A deterministic finite automaton (DFA) is the simplest way of implementing a regular language, but it requires a large amount of memory, while only a small amount of memory is available for this purpose. In this paper we have discussed several representations of deterministic finite automata. In the delayed input DFA (D2FA), the number of transitions is reduced by introducing default transitions between states that have a common transition set on the same input symbols, but it has the disadvantage that it may traverse several states for a single input character. To address this problem we discussed delta finite automata (δFA), which keep the advantage of D2FA and take only one state traversal per input character. After this we discussed the novel technique in which two non-equivalent states with a common destination are merged, regardless of the input symbol. We applied all of these techniques to a single DFA and reduced the memory requirement by 70%. All of these techniques can be used for faster matching in networks.

ACKNOWLEDGEMENT

I am grateful to my guide Nisha V.M. for her enormous help and extensive support. I would like to thank our dean Dr. L. Jeganathan and our program manager Dr. Rajesh Kanna for giving us this opportunity to present this paper.

REFERENCES

[1] Srihari Cadambi and Michela Becchi, Memory-Efficient Regular Expression Search Using State Merging.

[2] J. Hopcroft and J. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison Wesley.

[3] Sailesh Kumar, Sarang Dharmapurikar, Fang Yu, Patrick Crowley, Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection.

[4] M. Becchi and P. Crowley, An improved algorithm to accelerate regular expression evaluation, In Proc. of ANCS '07, pages 145-154, 2007.

[5] Gregorio Procissi, Fabio Vitucci, Gianni Antichi, Andrea Di Pietro, An Improved DFA for Fast Regular Expression Matching.
