13mcs1043



  • MEMORY EFFICIENT REGULAR EXPRESSION BASED SEARCHING IN NETWORKS

    Prachi Sharma

    School of Computer Science and Engineering, VIT University, Chennai - 48

    Abstract: Regular expression based searching has become a highly preferred searching technique. Traditional string matching techniques are used for deep packet inspection in networks, but they are limited to a finite set of strings. Regular expressions are well known for their expressive power: a regular expression describes a language by means of symbols and three operators, namely union, concatenation and Kleene star. Pattern matching plays a vital role in networks, but the memory available in network devices, especially routers, is small compared to other components. Deterministic finite automata (DFA) are used to implement regular expressions. The DFA representation of a regular expression requires a large amount of memory, which limits the capability of network applications, so the memory requirement of the DFA needs to be reduced. Delayed input DFA and delta finite automata are existing methods for implementing DFAs in networks, but they have several limitations. This paper proposes a new method that combines the advantages of delayed input DFA and delta finite automata to reduce the memory requirement of DFAs in network components.

    Index terms: Delayed input DFA, novel technique, delta DFA.

    1. INTRODUCTION

    Regular expression based searching is widely used in deep packet inspection and in network intrusion detection systems. In addition, layer 7 filters, firewalls and switches also use regular expression based matching.

    We can construct a minimum-state DFA for a regular expression, but it still requires a large amount of memory. A non-deterministic finite automaton (NDFA) for the same expression requires less memory than a DFA, but its disadvantage is that it may have several transitions for each input symbol, so several states have to be followed at once. We use a DFA because it is deterministic: it follows exactly one transition per input symbol, which makes it faster than an NDFA.
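The trade-off described above can be illustrated with a minimal sketch. The two automata below are hypothetical (they are not taken from the paper) and both recognise strings over {a, b} that end in "ab". The NDFA stores fewer transitions but has to track a set of active states, while the DFA stores more transitions and follows exactly one per input symbol.

```python
# Hypothetical automata for "strings over {a, b} that end in 'ab'".
# NDFA: each (state, symbol) maps to a SET of possible next states.
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
NFA_ACCEPT = {2}

# Equivalent DFA: each (state, symbol) maps to exactly ONE next state.
DFA = {
    ("A", "a"): "B", ("A", "b"): "A",
    ("B", "a"): "B", ("B", "b"): "C",
    ("C", "a"): "B", ("C", "b"): "A",
}
DFA_ACCEPT = {"C"}


def run_nfa(text):
    """Return (accepted, largest number of states active at once)."""
    active = {0}
    widest = 1
    for ch in text:
        # Every active state may contribute zero or more successors.
        active = set().union(*(NFA.get((s, ch), set()) for s in active))
        widest = max(widest, len(active))
    return bool(active & NFA_ACCEPT), widest


def run_dfa(text):
    """Return True if the DFA accepts; one table lookup per symbol."""
    state = "A"
    for ch in text:
        state = DFA[(state, ch)]
    return state in DFA_ACCEPT
```

Here the NDFA needs 3 table entries against 6 for the DFA, but `run_nfa` does work proportional to the number of active states for every symbol.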

    We can reduce the memory requirement of a DFA by reducing its size. There are two ways to do this: reduce the number of transitions, or reduce the number of states. Of the techniques proposed in the past, the delayed input DFA (D2FA) is based on reducing the number of transitions. The other is the novel technique, which reduces the number of states, rather than the number of transitions, by combining states that have common destination states. Delta finite automata (δFA) take the advantages of both the delayed input DFA and the plain DFA, and are an improvement over D2FA. With a reduced memory requirement, DFAs become more flexible and faster in software and hardware search engines.

    2. RELATED WORK

    This paper focuses on reducing the memory of the DFA used to implement regular expressions. Regular expression based searching is widely used in networking, and regular expressions are implemented with DFAs. Because a DFA requires a large amount of memory, memory reduction techniques are applied to it. We referred to the paper by Sailesh Kumar, Sarang Dharmapurikar and co-authors, "Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection" [3], which introduced the delayed input DFA, a technique that reduces the memory requirement of a DFA by reducing its number of transitions. The memory of a DFA can also be reduced by reducing its number of states. In "Memory-Efficient Regular Expression Search Using State Merging" [1], Michela Becchi and Srihari Cadambi proposed a novel technique that reduces memory by merging states.

    3. DELAYED INPUT DFA (D2FA)

    In D2FA we reduce the memory requirement by reducing the number of transitions. When two states of a DFA have a common set of transitions on the same input symbols, we can eliminate these transitions from one of the states by introducing a new default transition between the two states. In a DFA, a single input character causes exactly one transition between two states, but in a D2FA a single input character may cause more than one transition, because default transitions have to be followed first. This is the disadvantage of D2FA. To overcome it, a new technique called delta finite automata (δFA) has been proposed. δFA keeps the advantage of D2FA but takes only one transition per input character, like a normal DFA. As an example, consider the DFA in figure 1.0.

    Figure 1.0

    This DFA, with 6 states and 13 transitions, accepts the regular expressions a+ and (b*(c+(a(b+c)*a+b(a+c)*b)(a+b+c)*)). For example, it accepts the strings bccbbc and caab.

    In this DFA, state 1 and state 2 have a common transition set on the same input symbols, so we can replace all the outgoing transitions of state 2 with a single default transition from state 2 to state 1, as shown in figure 1.1.

    Figure 1.1

    This D2FA accepts the same language as the previous DFA, including a+, bccbbc and caab. However, for every input character read in state 2, the automaton first has to follow the default transition to state 1. To overcome this problem of the delayed input DFA, a new technique, δFA, has been introduced.
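The default-transition idea of this section can be sketched as follows. Since the figures are not reproduced here, the three-state DFA below is hypothetical; the only property borrowed from the text is that states 1 and 2 have the same transition set, so state 2 keeps just a default transition to state 1. The names `LABELED`, `DEFAULT` and `d2fa_step` are illustrative.

```python
ALPHABET = "abc"

# Hypothetical full DFA: states 1 and 2 have identical transition sets.
DFA = {
    1: {"a": 2, "b": 1, "c": 3},
    2: {"a": 2, "b": 1, "c": 3},
    3: {"a": 3, "b": 3, "c": 3},
}

# D2FA form: state 2 stores no labeled transitions, only a default to state 1.
LABELED = {1: {"a": 2, "b": 1, "c": 3}, 2: {}, 3: {"a": 3, "b": 3, "c": 3}}
DEFAULT = {2: 1}


def d2fa_step(state, ch):
    """Consume one character; return (next_state, default_hops)."""
    hops = 0
    while ch not in LABELED[state]:
        state = DEFAULT[state]  # default transition: no input is consumed
        hops += 1
    return LABELED[state][ch], hops


def run_d2fa(text, start=1):
    """Return (final_state, total default transitions followed)."""
    state, total_hops = start, 0
    for ch in text:
        state, hops = d2fa_step(state, ch)
        total_hops += hops
    return state, total_hops


def run_dfa(text, start=1):
    """Reference run on the uncompressed DFA."""
    state = start
    for ch in text:
        state = DFA[state][ch]
    return state


def stored_transitions():
    """Labeled transitions plus default transitions kept by the D2FA."""
    return sum(len(row) for row in LABELED.values()) + len(DEFAULT)
```

In this sketch the D2FA stores 7 transitions instead of 9, but every character read in state 2 costs one extra hop, which is the drawback the text describes.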

    5. DELTA FINITE AUTOMATA (δFA)

    As already discussed, δFA improves on D2FA by introducing a label on the default transition. We can apply the δFA technique to a D2FA to reduce the number of memory accesses, as shown in figure 1.2. Here we introduce a new label on b, because state 1 has a loop on input symbol b. The resulting δFA looks like a normal DFA.

    Figure 1.2

    This δFA accepts the same language as the DFA shown in figure 1.0, including a+, bccbbc and caab. The number of transitions has now been reduced to 11. In this diagram, state 4 and state 5 have a common destination, state 6, so according to the merging technique we merge them by introducing new labels on their transitions, as shown in figure 1.4.
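The property claimed for δFA in this section, one state traversal per input character with fewer stored transitions, can be sketched under an assumption. The formulation below is the difference-based one commonly associated with [5]: each state stores only the transitions that differ from those of its predecessor states, and the matcher keeps a local copy of the current transition set. This is not necessarily the labeled construction of figure 1.2, and the DFA and all function names are hypothetical.

```python
ALPHABET = "abc"

# Hypothetical complete DFA used only for illustration.
DFA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 3},
    3: {"a": 3, "b": 3, "c": 3},
}
START = 0


def build_deltas(dfa, start):
    """Keep, per state, only the transitions that differ from some parent's."""
    parents = {s: set() for s in dfa}
    for p, row in dfa.items():
        for target in row.values():
            parents[target].add(p)
    deltas = {}
    for s, row in dfa.items():
        if s == start:
            deltas[s] = dict(row)  # the start state stores its full row
        else:
            deltas[s] = {c: t for c, t in row.items()
                         if any(dfa[p][c] != t for p in parents[s])}
    return deltas


def run_delta(deltas, start, text):
    """One state traversal per character, using a local transition set."""
    local = dict(deltas[start])
    state = start
    for ch in text:
        state = local[ch]            # single lookup, no default hops
        local.update(deltas[state])  # patch in what differs for the new state
    return state


def run_dfa(dfa, start, text):
    """Reference run on the full DFA."""
    state = start
    for ch in text:
        state = dfa[state][ch]
    return state
```

The local set always equals the full row of the current state, because any transition that differs from a predecessor's row is stored in the state itself. The saving is in the stored table; the cost is the small local copy that is updated on every step.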

    6. PROPOSED WORK

    This paper proposes a new technique for representing a DFA that builds on the existing D2FA and δFA methods. We focus on reducing the memory of the DFA by introducing a new memory reduction technique based on merging two non-equivalent states. Merging two states creates opportunities for other states to be merged. We merge states by introducing new labels on transitions. Unlike other memory reduction techniques, we place no restrictions on the transitions by which the states reach their common destination.

    After applying the merging technique, we use the terms original state and merged state for the states of the new DFA. Some states are affected by the merging, while others remain in the DFA unchanged.
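Merging two non-equivalent states with labels can be sketched as below. This is one possible reading of the description, not the paper's exact construction: transitions entering the merged state carry label 0 or 1 to record which original state was meant, transitions on which both original states agree are stored once, and only the differing ones are stored per label. The DFA, the state name `M` and the function names are hypothetical.

```python
MERGED = "M"  # hypothetical name for the merged state

# Hypothetical DFA: states 1 and 2 share destination 3 on 'a' but differ on 'b'.
DFA = {
    0: {"a": 1, "b": 2},
    1: {"a": 3, "b": 1},
    2: {"a": 3, "b": 0},
    3: {"a": 3, "b": 3},
}


def merge_states(dfa, s1, s2):
    """Merge s1 and s2; return (table, shared, labeled) transition maps."""
    def redirect(target):
        if target == s1:
            return (MERGED, 0)  # label 0 stands for original state s1
        if target == s2:
            return (MERGED, 1)  # label 1 stands for original state s2
        return (target, None)

    table = {s: {c: redirect(t) for c, t in row.items()}
             for s, row in dfa.items() if s not in (s1, s2)}
    shared, labeled = {}, {}
    for c in dfa[s1]:
        t1, t2 = redirect(dfa[s1][c]), redirect(dfa[s2][c])
        if t1 == t2:
            shared[c] = t1  # common destination: stored once
        else:
            labeled[(0, c)] = t1
            labeled[(1, c)] = t2
    return table, shared, labeled


def run_merged(machine, text, start):
    """Run from an unmerged start state; return (state, label)."""
    table, shared, labeled = machine
    state, label = start, None
    for ch in text:
        if state == MERGED:
            state, label = shared[ch] if ch in shared else labeled[(label, ch)]
        else:
            state, label = table[state][ch]
    return state, label


def run_dfa(dfa, text, start):
    """Reference run on the original DFA."""
    state = start
    for ch in text:
        state = dfa[state][ch]
    return state
```

For this example the merged automaton has 3 states and 7 stored transitions instead of 4 states and 8 transitions, and the pair (state, label) always identifies the original state.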

    Figure 2.0 Normal DFA

    Figure 2.1 Delayed DFA

    Figure 2.3 Delta Finite Automata

    Figure 1.4

    In this DFA we merge states 4 and 5, introducing a new label 0 on all the transitions of state 4 and a label 1 on all the transitions of state 5. The number of states is reduced from 6 to 5 and the number of transitions from 13 to 9, and this DFA accepts the same language as the DFA shown in figure 1.0. We introduce a new combined algorithm for reducing both the transitions and the states of a DFA.

    Algorithm for N states

    1. for i = 1 to N

    2.     compare the i-th state with each other state

           if they have the same transition set

           {

               I.  Remove the transitions of one state and add one default transition between the two states

               II. Introduce a new label on this default transition

           }

           else if they have a common destination

           {

               Merge them and introduce a new transition label

           }

           else

           {

               continue with the next state

           }

    3. end
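The decision step of the algorithm above can be sketched as a pairwise classification. This is a partial, illustrative reading: it only reports which rule would apply to each pair of states and does not rewrite the automaton. The DFA and the function name `classify_pairs` are hypothetical.

```python
from itertools import combinations

# Hypothetical DFA: states 1 and 2 have the same transition set,
# states 3 and 4 only share a destination (state 4).
DFA = {
    1: {"a": 2, "b": 1},
    2: {"a": 2, "b": 1},
    3: {"a": 4, "b": 3},
    4: {"a": 4, "b": 4},
}


def classify_pairs(dfa):
    """Map each pair of states to the rule that would apply to it."""
    result = {}
    for s, t in combinations(sorted(dfa), 2):
        if dfa[s] == dfa[t]:
            # Same transition set: candidate for a labeled default transition.
            result[(s, t)] = "default"
        elif set(dfa[s].values()) & set(dfa[t].values()):
            # Common destination on any symbols: candidate for merging.
            result[(s, t)] = "merge"
        # Otherwise neither rule applies and the pair is skipped.
    return result
```

For this DFA, `classify_pairs(DFA)` reports the pair (1, 2) as a default-transition candidate and the pair (3, 4) as a merge candidate. The common-destination test ignores the input symbols, in line with the statement that no restriction is placed on the transitions leading to the common destination.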

    CONCLUSION

    Regular expressions are widely used in networking to perform matching at high speed. A DFA is the simplest way of implementing a regular language, but it requires a large amount of memory, whereas network devices need the memory footprint to be small. In this paper we discussed several representations of the DFA. In the delayed input DFA (D2FA), the number of transitions is reduced by introducing a default transition for states that have a common transition set on the same input symbols; its disadvantage is that it may traverse several states for a single input character. To address this problem we discussed delta finite automata (δFA), which keep the advantage of D2FA and need only one state traversal per input character. We then discussed the novel technique, in which two non-equivalent states with a common destination are merged regardless of the input symbols on which they reach it. We applied all of these techniques to a single DFA and reduced its memory requirement by 70%. All of these techniques can be used for faster matching in networks.

    ACKNOWLEDGEMENT

    I am grateful to my guide, Nisha V.M., for her enormous help and extensive support. I would like to thank our dean, Dr. L. Jeganathan, and our program manager, Dr. Rajesh Kanna, for giving us the opportunity to present this paper.

    REFERENCES

    [1] Srihari Cadambi and Michela Becchi, "Memory-Efficient Regular Expression Search Using State Merging".

    [2] J. Hopcroft and J. Ullman, "Introduction to Automata Theory, Languages, and Computation", Addison Wesley.

    [3] Sailesh Kumar, Sarang Dharmapurikar, Fang Yu, Patrick Crowley, "Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection".

    [4] M. Becchi and P. Crowley, "An improved algorithm to accelerate regular expression evaluation", in Proc. of ANCS '07, pages 145-154, 2007.

    [5] Gregorio Procissi, Fabio Vitucci, Gianni Antichi, Andrea Di Pietro, "An Improved DFA for Fast Regular Expression Matching".