MEMORY EFFICIENT REGULAR
EXPRESSION BASED SEARCHING IN
NETWORKS
Prachi Sharma
School of Computer Science and Engineering,
VIT University, Chennai- 48
Abstract - Regular expression based searching has become a
highly preferred searching technique. Traditional string
matching techniques are used for deep packet inspection in
networks, but they are all limited to a finite set of strings.
Regular expressions are well known for their expressive power.
A regular expression describes a language by means of symbols
and three operators: union, concatenation and Kleene star.
Pattern matching plays a vital role in networks, but the memory
capacity of network components, especially routers, is small
compared to other components. Deterministic finite automata
(DFA) are used to implement regular expressions. The DFA
representation of a regular expression requires a large amount
of memory, which limits the capability of network applications,
so we need to reduce the memory requirement of the DFA.
Delayed input DFA and delta finite automata are existing
methods for implementing DFAs in networks, but they have
several limitations. This paper proposes a new method which
combines the advantages of delayed input DFA and delta finite
automata to reduce the memory requirement of DFAs in network
components.
Index terms - Delayed input DFA, novel technique, delta DFA.
1 INTRODUCTION
Regular expression based searching is widely used in deep
packet inspection and in network intrusion detection systems.
In addition, layer 7 filters, firewalls and switches also use
regular expression based matching.
We can construct a minimum-state DFA for a regular expression,
but it still requires a large amount of memory. A
non-deterministic finite automaton (NDFA) requires less memory
than a DFA, but its disadvantage is that it can have more than
one transition for each input symbol, so several states may
have to be tracked at once. We use the DFA because of its
deterministic nature: it follows exactly one transition per
input symbol and is therefore faster than an NDFA.
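
To make the memory cost concrete, the following minimal sketch shows a table-driven DFA in Python. The three-state automaton for a+ over the alphabet {a, b, c}, and the names TABLE, dfa_accepts and table_entries, are our own illustrative assumptions and are not taken from the figures of this paper. The point is that matching needs exactly one table lookup per input character, while the table holds one entry per state and symbol.

```python
ALPHABET = "abc"

# Dense transition table: one row per state, one column per symbol.
# State 0 is the start state, state 1 is accepting, state 2 is a dead state.
TABLE = [
    [1, 2, 2],
    [1, 2, 2],
    [2, 2, 2],
]
ACCEPTING = {1}


def dfa_accepts(text):
    """Run the DFA with exactly one table lookup per input character.

    Assumes every character of text belongs to ALPHABET.
    """
    state = 0
    for ch in text:
        state = TABLE[state][ALPHABET.index(ch)]
    return state in ACCEPTING


def table_entries():
    """Number of stored transitions: states times alphabet size."""
    return len(TABLE) * len(ALPHABET)
```

With a byte alphabet of 256 symbols the same layout needs 256 entries per state, which is why the techniques discussed next try to remove transitions or states.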
We can reduce the memory requirement of a DFA by reducing its
size. There are two ways to do this: one reduces the number of
transitions and the other reduces the number of states. Among
the techniques proposed in the past, delayed input DFA (D2FA)
is based on reducing the number of transitions. The other is
the novel state merging technique, in which states with common
destination states are combined; instead of reducing the number
of transitions, it reduces the number of states. Delta finite
automata (δFA) combine the advantages of D2FA and the ordinary
DFA and are an improvement over D2FA. With a reduced memory
requirement, DFAs become more flexible and faster in software
and hardware search engines.
2 RELATED WORK
This paper focuses on memory reduction of the DFA, which is
used for implementing regular expressions. Regular expression
based searching is widely used in networking, and DFAs are used
to implement it. A DFA requires a large amount of memory, so
memory reduction techniques are applied to it. We referred to
the paper by Sailesh Kumar, Sarang Dharmapurikar et al.,
Algorithms to Accelerate Multiple Regular Expressions Matching
for Deep Packet Inspection [3], which introduces the delayed
input DFA, a technique that reduces the memory requirement of
a DFA by reducing the number of transitions. The memory of a
DFA can also be reduced by reducing the number of states. In
Memory-Efficient Regular Expression Search Using State Merging
[1], Michela Becchi and Srihari Cadambi propose a novel
technique that reduces memory by merging states.
3 DELAYED INPUT DFA (D2FA)
In D2FA we reduce the memory requirement by reducing the
number of transitions. When two states of a DFA have a common
set of transitions, that is, the same next state on the same
input symbols, we can eliminate these transitions from one of
the states by introducing a new default transition between the
two states. In a DFA a single input character causes a single
transition between two states, but in a delayed input DFA
(D2FA) a single input character may require more than one
transition, because default transitions are followed before
the character is consumed. This is the disadvantage of D2FA.
To overcome it, a new technique called delta finite automata
(δFA) has been proposed. δFA keeps the advantage of D2FA but
takes only one transition per input character, like a normal
DFA. As an example, consider the DFA in figure 1.0.
Figure 1.0
This DFA with 6 states and 13 transitions can accept the
regular expressions a+ and
(b*(c+(a(b+c)*a+b(a+c)*b)(a+b+c)*)). For example, this DFA
accepts the strings bccbbc and caab.
In this DFA, state 1 and state 2 have a common transition set
on the same input symbols, so we can replace all the outgoing
transitions of state 2 with a default transition from state 2
to state 1, as shown in figure 1.1.
Figure 1.1
This D2FA also accepts a+, bccbbc and caab.
This D2FA accepts the same language as the previous DFA, but
for every input symbol state 2 must first follow the default
transition to state 1. To overcome this problem of the delayed
input DFA, a new technique has been introduced: δFA.
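
The trade-off described in this section can be sketched in a few lines of Python. The three-state DFA below, the choice of default transition and the helper names compress and d2fa_step are hypothetical and do not reproduce figure 1.0 or figure 1.1. The sketch assumes that the default transitions contain no cycles.

```python
# Hypothetical complete DFA over {a, b, c}: state -> symbol -> next state.
DFA = {
    0: {"a": 1, "b": 2, "c": 0},
    1: {"a": 1, "b": 2, "c": 1},
    2: {"a": 1, "b": 0, "c": 0},
}

# Assumed default transitions: state 1 falls back to state 0.
DEFAULTS = {1: 0}


def compress(dfa, defaults):
    """Drop every transition that the default target of a state also has."""
    labeled = {}
    for state, row in dfa.items():
        target = defaults.get(state)
        if target is None:
            labeled[state] = dict(row)
        else:
            labeled[state] = {
                ch: nxt for ch, nxt in row.items() if dfa[target].get(ch) != nxt
            }
    return labeled


def d2fa_step(labeled, defaults, state, ch):
    """Return (next_state, states_visited) for one input character."""
    visited = 1
    while ch not in labeled[state]:
        state = defaults[state]  # follow the default transition
        visited += 1
    return labeled[state][ch], visited


LABELED = compress(DFA, DEFAULTS)
```

In this example state 1 keeps only its transition on c, so 7 labelled transitions plus 1 default transition replace the original 9. The price is that d2fa_step(LABELED, DEFAULTS, 1, "a") has to visit two states before it can consume the character.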
5 DELTA FINITE AUTOMATA (δFA)
As already discussed, δFA improves on D2FA by introducing a
label on the default transition. We can apply the δFA
technique to the D2FA to reduce the number of memory accesses,
as can be seen in figure 1.2.
Here we introduce a new label on b, because state 1 has a loop
on input symbol b. This δFA looks like a normal DFA.
Figure 1.2
This delta finite automaton (δFA) also accepts a+, bccbbc and
caab.
This δFA accepts the same language as the DFA shown in figure
1.0. We have now reduced the number of transitions to 11. In
this diagram, state 4 and state 5 have a common destination,
state 6. According to the merging technique, we merge them by
introducing new labels on them, as shown in figure 1.4.
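
The one-traversal-per-character property of δFA can be illustrated with the following sketch. It follows the difference-based formulation of δFA in [5] as we understand it, which is an assumption on our part and may differ in detail from the labelled default transition view used in this section. Each state stores only the transitions that differ from those of at least one of its parent states, and the matcher keeps a local copy of the current transition set. The example DFA and the function names are hypothetical.

```python
# Hypothetical complete DFA over {a, b, c}: state -> symbol -> next state.
DFA = {
    0: {"a": 1, "b": 2, "c": 0},
    1: {"a": 1, "b": 2, "c": 1},
    2: {"a": 1, "b": 0, "c": 0},
}


def build_delta(dfa):
    """Keep a transition of state t only if some parent of t disagrees on it."""
    parents = {s: set() for s in dfa}
    for s, row in dfa.items():
        for nxt in row.values():
            parents[nxt].add(s)
    return {
        t: {ch: nxt for ch, nxt in row.items()
            if any(dfa[p][ch] != nxt for p in parents[t])}
        for t, row in dfa.items()
    }


def delta_run(dfa, stored, start, text):
    """One state traversal per character, using a local transition set."""
    local = dict(dfa[start])  # the start state is loaded in full
    state = start
    for ch in text:
        state = local[ch]
        local.update(stored[state])  # apply only the stored differences
    return state


def plain_run(dfa, start, text):
    """Reference run on the uncompressed DFA."""
    state = start
    for ch in text:
        state = dfa[state][ch]
    return state


STORED = build_delta(DFA)
```

For this example 5 stored transitions replace the original 9, and delta_run reaches the same state as plain_run on every input.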
6 PROPOSED WORK
This paper proposes a new technique for representing
deterministic finite automata (DFA) that builds on the existing
D2FA and δFA methods. We focus on reducing the memory of the
DFA by introducing a new memory reduction technique based on
merging two non-equivalent states. Merging states creates
opportunities for other states to be merged. We merge states
by introducing new labels on transitions. Unlike other memory
reduction techniques, we place no restrictions on the
transitions by which states lead to their common destination.
After applying the merging technique, we use the terms original
state and merged state to refer to the states of the new DFA.
After applying these techniques, some states are affected while
others remain in the DFA unchanged.
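
The label-based merging described above can be sketched as follows. This is one possible reading, in which the label is attached to the transitions that enter the merged state (label 0 for the first original state, label 1 for the second), in the spirit of [1]. The four-state DFA and the names merge_pair and run are hypothetical and do not reproduce figure 1.4.

```python
# Hypothetical complete DFA over {a, b}: state -> symbol -> next state.
DFA = {
    1: {"a": 4, "b": 5},
    4: {"a": 6, "b": 4},
    5: {"a": 6, "b": 1},
    6: {"a": 6, "b": 6},
}


def merge_pair(dfa, s, t):
    """Merge states s and t into a single state named (s, t).

    Each entry is either a (next_state, label) tuple shared by both
    original states, or a dict {label: (next_state, label)} where they differ.
    """
    merged = (s, t)

    def target(nxt):
        if nxt == s:
            return (merged, 0)
        if nxt == t:
            return (merged, 1)
        return (nxt, 0)  # the label is unused for ordinary states

    table = {
        state: {ch: target(nxt) for ch, nxt in row.items()}
        for state, row in dfa.items() if state not in (s, t)
    }
    table[merged] = {}
    for ch in dfa[s]:
        first, second = target(dfa[s][ch]), target(dfa[t][ch])
        table[merged][ch] = first if first == second else {0: first, 1: second}
    return table


def run(table, start, text):
    """Track (state, label); the label selects among differing transitions."""
    state, label = start
    for ch in text:
        entry = table[state][ch]
        if isinstance(entry, dict):
            entry = entry[label]
        state, label = entry
    return state, label


TABLE = merge_pair(DFA, 4, 5)
```

Here the transition on a, which leads both original states to the common destination 6, is stored once, while the transition on b is stored once per label. The automaton shrinks from 4 states to 3 and from 8 stored transitions to 7.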
Figure 2.0 Normal DFA
Figure 2.1 Delayed DFA
Figure 2.3 Delta Finite Automata
Figure 1.4
In this DFA we merge states 4 and 5, introducing a new label 0
on all the transitions of state 4 and label 1 on all the
transitions of state 5. We have now reduced the number of
states from 6 to 5 and the number of transitions from 13 to 9,
and this DFA accepts the same language as the DFA shown in
figure 1.0. We introduce a new combined algorithm for reducing
both transitions and states in a DFA.
Algorithm for N states
1. for (i = 1 to N)
2. compare the ith state with each other state
if they have the same transition set
{
I. Remove the transitions of one state and make
one default transition between them
II. Introduce a new label on this default
transition
}
else if they have a common destination
{
Merge them and introduce new transition labels
}
else
{
continue with the next state
}
3. end.
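
The decision step of this algorithm can be sketched in Python as below. We assume that "same transition set" means the two states share at least one transition on the same input symbol, that "common destination" means they reach the same next state on any symbols, and that each state takes part in at most one rewrite. These readings, the example DFA and the function names are our own assumptions. The sketch only plans which rule applies to which pair; it does not rewrite the automaton.

```python
# Hypothetical complete DFA over {a, b}: state -> symbol -> next state.
DFA = {
    1: {"a": 2, "b": 3},
    2: {"a": 2, "b": 4},
    3: {"a": 4, "b": 1},
    4: {"a": 1, "b": 4},
}


def shared_symbols(dfa, s, t):
    """Symbols on which s and t move to the same next state."""
    return {ch for ch, nxt in dfa[s].items() if dfa[t].get(ch) == nxt}


def common_destinations(dfa, s, t):
    """Next states reachable from both s and t, on any symbols."""
    return set(dfa[s].values()) & set(dfa[t].values())


def plan_reductions(dfa):
    """Pair up states and record which rule of the algorithm applies."""
    plan, used = [], set()
    states = sorted(dfa)
    for i, s in enumerate(states):
        if s in used:
            continue
        for t in states[i + 1:]:
            if t in used:
                continue
            if shared_symbols(dfa, s, t):
                plan.append(("default", s, t))  # add a labelled default transition
            elif common_destinations(dfa, s, t):
                plan.append(("merge", s, t))    # merge and label the transitions
            else:
                continue                        # try the next candidate state
            used.update((s, t))
            break
    return plan
```

For the example automaton, states 1 and 2 share the transition on a and are paired by a default transition, while states 3 and 4 only share destinations and are therefore merged.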
CONCLUSION
Regular expressions are widely used in the networking field to
perform matching at high speed. The deterministic finite
automaton (DFA) is the simplest way of implementing a regular
language, but it requires a large amount of memory, whereas
network components have little memory available for this
purpose. In this paper we have discussed several
representations of the DFA. In the delayed input DFA (D2FA),
we reduced the number of transitions by introducing a new
default transition for states which have a common transition
set on the same input symbols, but it has the disadvantage
that it may traverse several states for a single input
character. To address this problem we discussed delta finite
automata (δFA), which keep the advantage of D2FA and require
only one state traversal per input character. After this we
discussed the novel technique in which we merged two
non-equivalent states with a common destination, regardless of
the input symbol. We applied all these techniques to a single
DFA and reduced the memory requirement by 70%. All these
techniques are useful for faster matching in networks.
ACKNOWLEDGEMENT
I am grateful to my guide Nisha V.M. for her enormous help
and extensive support. I would like to thank our dean Dr. L.
Jeganathan and our program manager Dr. Rajesh Kanna, for
giving us this opportunity to present this paper.
REFERENCES
[1] Srihari Cadambi and Michela Becchi, Memory-Efficient
Regular Expression Search Using State Merging.
[2] J. Hopcroft and J. Ullman, Introduction to Automata
Theory, Languages, and Computation, Addison Wesley.
[3] Sailesh Kumar, Sarang Dharmapurikar, Fang Yu, Patrick
Crowley, Algorithms to Accelerate Multiple Regular
Expressions Matching for Deep Packet Inspection.
[4] M. Becchi and P. Crowley, An improved algorithm to
accelerate regular expression evaluation, In Proc. of ANCS
'07, pages 145-154, 2007.
[5] Gregorio Procissi, Fabio Vitucci, Gianni Antichi, Andrea
Di Pietro, An Improved DFA for Fast Regular Expression
Matching.