Text Search

Upload: gonzalo-vasquez

Posted on 21-Mar-2016



DESCRIPTION

Text search methods. Analysis of algorithms.

TRANSCRIPT

  • 7/21/2019 Busqueda de texto


    2004 Goodrich, Tamassia Pattern Matching 1

    Pattern Matching

    [Figure: the pattern a b a c a b aligned against the text a b a c a a b, with character comparisons numbered 1 to 4.]

  • Strings

    A string is a sequence of characters.

    Examples of strings: Java program, HTML document, DNA sequence, ordinary text.

    An alphabet Σ is the set of possible characters for a family of strings.

    Examples of alphabets: ASCII, Unicode, {0, 1}, {A, C, G, T}.

    Let P be a string of size m.
      A substring P[i .. j] of P is the subsequence of P consisting of the characters with ranks between i and j.
      A prefix of P is a substring of the type P[0 .. i].
      A suffix of P is a substring of the type P[i .. m − 1].

    Given strings T (text) and P (pattern), the pattern matching problem consists of finding a substring of T equal to P.

    Applications: text editors, search engines, biological research.
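The slide's definitions use inclusive index ranges, while Python slices are half-open. The sketch below shows how the two conventions correspond. The helper names `substring`, `prefix` and `suffix` are hypothetical and do not come from the slides.

```python
def substring(P: str, i: int, j: int) -> str:
    """P[i .. j] with both ends inclusive, as defined on the slide."""
    return P[i : j + 1]  # Python slices exclude the end index, hence j + 1


def prefix(P: str, i: int) -> str:
    """Prefix P[0 .. i]."""
    return substring(P, 0, i)


def suffix(P: str, i: int) -> str:
    """Suffix P[i .. m - 1], where m = len(P)."""
    return substring(P, i, len(P) - 1)


P = "abacab"
print(substring(P, 1, 3))  # bac
print(prefix(P, 2))        # aba
print(suffix(P, 3))        # cab
```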

  • Brute-Force Pattern Matching

    The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift of P relative to T, until either a match is found or all placements of the pattern have been tried.

    Brute-force pattern matching runs in time O(nm).

    Example of worst case: T = aaa … ah, P = aaah. This may occur in images and DNA sequences but is unlikely in English text.

    Algorithm BruteForceMatch(T, P)
      Input: text T of size n and pattern P of size m
      Output: starting index of a substring of T equal to P, or −1 if no such substring exists
      for i ← 0 to n − m
        { test shift i of the pattern }
        j ← 0
        while j < m ∧ T[i + j] = P[j]
          j ← j + 1
        if j = m
          return i { match at i }
      return −1 { no match anywhere }
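Here is a direct Python sketch of BruteForceMatch. The function name `brute_force_match` is our own, and Python string indexing stands in for the slide's array notation.

```python
def brute_force_match(T: str, P: str) -> int:
    """Return the starting index of the first occurrence of P in T, or -1."""
    n, m = len(T), len(P)
    for i in range(n - m + 1):  # test shift i of the pattern
        j = 0
        while j < m and T[i + j] == P[j]:  # extend the match left to right
            j += 1
        if j == m:
            return i  # match at i
    return -1  # no match anywhere


print(brute_force_match("aaaaah", "aaah"))  # 2
```

With the worst-case input T = aaa … ah, P = aaah, almost every shift matches m − 1 characters before failing. That is where the O(nm) bound comes from.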

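  • Boyer-Moore Heuristics

    The Boyer-Moore pattern matching algorithm is based on two heuristics.

    Looking-glass heuristic: compare P with a subsequence of T moving backwards.

    Character-jump heuristic: when a mismatch occurs at T[i] = c, shift P according to the last occurrence of c in P. If P contains c, align that last occurrence with T[i] (sliding by one position if it lies to the right of the mismatch). Otherwise, align P[0] with T[i + 1].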
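The two heuristics can be sketched for a single alignment of P, where P[0] sits under T[s]. The helpers `compare_backwards` and `character_jump` are hypothetical names. For clarity, this sketch finds the last occurrence with `str.rfind` instead of a precomputed table.

```python
def compare_backwards(T: str, P: str, s: int) -> int:
    """Looking-glass heuristic: compare P with T[s : s + len(P)] right to left.

    Returns the index j of the first mismatch, or -1 if the whole window matches.
    """
    for j in range(len(P) - 1, -1, -1):
        if T[s + j] != P[j]:
            return j
    return -1


def character_jump(P: str, c: str, j: int) -> int:
    """Character-jump heuristic: how far P slides after text char c mismatches P[j]."""
    l = P.rfind(c)  # last occurrence of c in P, or -1 if c does not occur
    # l < j: align P[l] with c (slide j - l); l > j: slide by one;
    # l = -1: slide j + 1, so P[0] lands just past the mismatched character.
    return max(1, j - l)


T, P = "abacaabadcabacabaabb", "abacab"
j = compare_backwards(T, P, 0)            # first mismatch at j = 5 (T[5] = 'a')
print(j, character_jump(P, T[0 + j], j))  # 5 1
```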

  • Last-Occurrence Function

    Boyer-Moore's algorithm preprocesses the pattern P and the alphabet Σ to build the last-occurrence function L mapping Σ to integers, where L(c) is defined as the largest index i such that P[i] = c, or −1 if no such index exists.

    Example: Σ = {a, b, c, d}, P = abacab

      c      a  b  c  d
      L(c)   4  5  3  −1

    The last-occurrence function can be represented by an array indexed by the numeric codes of the characters.

    The last-occurrence function can be computed in time O(m + s), where m is the size of P and s is the size of Σ.
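Here is a minimal sketch of the O(m + s) construction. It swaps the slide's array indexed by character codes for a Python dict keyed by character. The name `last_occurrence` is our own.

```python
def last_occurrence(P: str, alphabet: str) -> dict[str, int]:
    """L(c) = largest i with P[i] == c, or -1 if c does not occur in P."""
    L = {c: -1 for c in alphabet}  # O(s): every character starts at -1
    for i, c in enumerate(P):      # O(m): later indices overwrite earlier ones
        L[c] = i
    return L


print(last_occurrence("abacab", "abcd"))  # {'a': 4, 'b': 5, 'c': 3, 'd': -1}
```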

  • The Boyer-Moore Algorithm

    Algorithm BoyerMooreMatch(T, P, Σ)
      L ← lastOccurenceFunction(P, Σ)
      i ← m − 1
      j ← m − 1
      repeat
        if T[i] = P[j]
          if j = 0
            return i { match at i }
          else
            i ← i − 1
            j ← j − 1
        else
          { character-jump }
          l ← L[T[i]]
          i ← i + m − min(j, 1 + l)
          j ← m − 1
      until i > n − 1
      return −1 { no match }

    [Figure: the two character-jump cases, with l = L[T[i]]. Case 1: j ≤ 1 + l, i advances by m − j. Case 2: 1 + l ≤ j, i advances by m − (1 + l).]
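Here is a Python sketch of BoyerMooreMatch. The name `boyer_moore_match` is our own. A dict with `.get(c, -1)` replaces the last-occurrence array, so characters absent from P need no explicit alphabet.

```python
def boyer_moore_match(T: str, P: str) -> int:
    """Return the index of the first occurrence of P in T, or -1."""
    n, m = len(T), len(P)
    if m == 0:
        return 0  # assumed convention: the empty pattern matches at 0
    L = {}
    for k, c in enumerate(P):  # last-occurrence function
        L[c] = k
    i = j = m - 1
    while i <= n - 1:  # "repeat ... until i > n - 1", tested up front
        if T[i] == P[j]:
            if j == 0:
                return i  # match at i
            i -= 1  # looking-glass: move left in both T and P
            j -= 1
        else:
            l = L.get(T[i], -1)  # character-jump
            i += m - min(j, 1 + l)
            j = m - 1
    return -1  # no match


print(boyer_moore_match("abacaabadcabacabaabb", "abacab"))  # 10
```

The sketch tests the loop condition before the first comparison. So a pattern longer than the text returns −1 instead of reading T[m − 1] past the end of the text.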

  • Example

    [Figure: Boyer-Moore matching P = abacab against T = abacaabadcabacabaabb; the pattern is shown in six successive alignments, with the 13 character comparisons numbered in order.]
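To reproduce the counts in the figure, the algorithm can be instrumented to record every character comparison and every alignment (the position of P[0] in T). The function `boyer_moore_trace` is a hypothetical helper that follows BoyerMooreMatch step by step.

```python
def boyer_moore_trace(T: str, P: str) -> tuple[int, int, list[int]]:
    """Return (match index or -1, number of comparisons, alignments tried)."""
    n, m = len(T), len(P)
    if m == 0:
        return 0, 0, []
    L = {}
    for k, c in enumerate(P):
        L[c] = k
    comparisons, alignments = 0, []
    i = j = m - 1
    while i <= n - 1:
        if j == m - 1:  # a new alignment starts with P[m-1] under T[i]
            alignments.append(i - (m - 1))
        comparisons += 1
        if T[i] == P[j]:
            if j == 0:
                return i, comparisons, alignments
            i -= 1
            j -= 1
        else:
            i += m - min(j, 1 + L.get(T[i], -1))
            j = m - 1
    return -1, comparisons, alignments


print(boyer_moore_trace("abacaabadcabacabaabb", "abacab"))
# (10, 13, [0, 1, 2, 3, 9, 10])
```

The six alignments and 13 comparisons match the figure. The jump from alignment 3 to 9 is the character-jump over 'd', which does not occur in P.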

  • Analysis

    Boyer-Moore's algorithm runs in time O(nm + s).

    Example of worst case: T = aaa … a, P = baaa.

    The worst case may occur in images and DNA sequences but is unlikely in English text.

    Boyer-Moore's algorithm is significantly faster than the brute-force algorithm on English text.

    [Figure: Boyer-Moore matching P = baaaaa against T = aaaaaaaaa; the pattern is shown in four successive alignments, with the 24 character comparisons numbered in order.]
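The comparison count in the worst-case figure can be derived directly. The calculation below assumes T = a^n and P = b a^{m−1} with m ≤ n, a generalization of the slide's example.

```latex
\begin{align*}
&\text{Each alignment: } P[m-1], \dots, P[1] = a \text{ all match, then } P[0] = b \ne a
  \;\Rightarrow\; m \text{ comparisons.}\\
&\text{Character jump at } j = 0,\ c = a,\ l = L(a) = m - 1:\quad
  i \leftarrow i + m - \min(0,\, m) = i + m,\\
&\qquad\text{so } P \text{ slides right by exactly one position.}\\
&\text{Alignments: } n - m + 1, \qquad
  \text{comparisons: } m\,(n - m + 1) = \Theta(nm) \text{ when } m \le n/2.\\
&\text{Figure: } n = 9,\ m = 6 \;\Rightarrow\; 6 \cdot (9 - 6 + 1) = 24.
\end{align*}
```

Adding the O(m + s) preprocessing gives the O(nm + s) bound stated on the slide.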

  • The KMP Algorithm

    Knuth-Morris-Pratt's algorithm compares the pattern to the text left to right, but shifts the pattern more intelligently than the brute-force algorithm.

    When a mismatch occurs, what is the most we can shift the pattern so as to avoid redundant comparisons?

    Answer: the largest prefix of P[0..j] that is a suffix of P[1..j].

    [Figure: text . . a b a a b x . . . . . with P = abaaba mismatching at x; after the shift, the leading a b of P is labeled "No need to repeat these comparisons" and the position of x is labeled "Resume comparing here".]

  • KMP Failure Function

    Knuth-Morris-Pratt's algorithm preprocesses the pattern to find matches of prefixes of the pattern with the pattern itself.

    The failure function F(j) is defined as the size of the largest prefix of P[0..j] that is also a suffix of P[1..j].

    Knuth-Morris-Pratt's algorithm modifies the brute-force algorithm so that if a mismatch occurs at P[j] ≠ T[i], we set j ← F(j − 1).

      j      0  1  2  3  4  5
      P[j]   a  b  a  a  b  a
      F(j)   0  0  1  1  2  3

    [Figure: text . . a b a a b x . . . . . with P = abaaba mismatching at x; the pattern is shifted by the amount given by F(j − 1).]
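The definition of F(j) can be checked by brute force, which reproduces the table above. The sketch `failure_function_naive` is a hypothetical, definition-based helper that runs in O(m³) time. It illustrates what F means, not the O(m) construction the KMP algorithm uses.

```python
def failure_function_naive(P: str) -> list[int]:
    """F(j) = size of the largest prefix of P[0..j] that is also a suffix of P[1..j]."""
    m = len(P)
    F = [0] * m
    for j in range(m):
        # Try lengths k from longest to shortest; k <= j because the suffix
        # must lie inside P[1..j], which has only j characters.
        for k in range(j, 0, -1):
            if P[:k] == P[j - k + 1 : j + 1]:
                F[j] = k
                break
    return F


F = failure_function_naive("abaaba")
print(F)         # [0, 0, 1, 1, 2, 3]
print(F[5 - 1])  # 2: after a mismatch at j = 5, KMP resumes with j = F(4) = 2
```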

  • The KMP Algorithm

    The failure function can be represented by an array and can be computed in O(m) time.

    At each iteration of the while-loop, either i increases by one, or the shift amount i − j increases by at least one (observe that F(j − 1) < j).

    Hence, there are no more than 2n iterations of the while-loop.

    Thus, KMP's algorithm runs in optimal time O(m + n).

    Algorithm KMPMatch(T, P)
      F ← failureFunction(P)
      i ← 0
      j ← 0
      while i < n
        if T[i] = P[j]
          if j = m − 1
            return i − j { match }
          else
            i ← i + 1
            j ← j + 1
        else
          if j > 0
            j ← F[j − 1]
          else
            i ← i + 1
      return −1 { no match }
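Here is a self-contained Python sketch of KMPMatch. The helper `failure_function` follows the failureFunction pseudocode on the next slide, and both names are our own. The empty-pattern convention is an assumption.

```python
def failure_function(P: str) -> list[int]:
    """F[i] for every i in 0..m-1, built in O(m) time."""
    m = len(P)
    F = [0] * m
    i, j = 1, 0
    while i < m:
        if P[i] == P[j]:
            F[i] = j + 1  # matched j + 1 characters
            i += 1
            j += 1
        elif j > 0:
            j = F[j - 1]  # shift P using the failure function
        else:
            F[i] = 0  # no match
            i += 1
    return F


def kmp_match(T: str, P: str) -> int:
    """Index of the first occurrence of P in T, or -1."""
    n, m = len(T), len(P)
    if m == 0:
        return 0  # assumed convention: the empty pattern matches at 0
    F = failure_function(P)
    i = j = 0
    while i < n:
        if T[i] == P[j]:
            if j == m - 1:
                return i - j  # match
            i += 1
            j += 1
        elif j > 0:
            j = F[j - 1]  # keep i; reuse the prefix already known to match
        else:
            i += 1
    return -1  # no match


print(kmp_match("abacaabadcabacabaabb", "abacab"))  # 10
```

Unlike brute force, i never moves backwards: after a mismatch only j shrinks. This is why the loop runs at most 2n times.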

  • Computing the Failure Function

    The failure function can be represented by an array and can be computed in O(m) time.

    The construction is similar to the KMP algorithm itself.

    At each iteration of the while-loop, either i increases by one, or the shift amount i − j increases by at least one (observe that F(j − 1) < j).

    Hence, there are no more than 2m iterations of the while-loop.

    Algorithm failureFunction(P)
      F[0] ← 0
      i ← 1
      j ← 0
      while i < m
        if P[i] = P[j]
          { we have matched j + 1 characters }
          F[i] ← j + 1
          i ← i + 1
          j ← j + 1
        else if j > 0 then
          { use failure function to shift P }
          j ← F[j − 1]
        else
          F[i] ← 0 { no match }
          i ← i + 1
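To make the 2m bound concrete, the construction can be instrumented to count while-loop iterations. The name `failure_function_counted` is hypothetical. The body is the pseudocode above with a counter added.

```python
def failure_function_counted(P: str) -> tuple[list[int], int]:
    """Return (F, number of while-loop iterations) for pattern P."""
    m = len(P)
    F = [0] * m
    i, j = 1, 0
    iterations = 0
    while i < m:
        iterations += 1
        if P[i] == P[j]:
            F[i] = j + 1  # we have matched j + 1 characters
            i += 1
            j += 1
        elif j > 0:
            j = F[j - 1]  # use failure function to shift P
        else:
            F[i] = 0  # no match
            i += 1
    return F, iterations


print(failure_function_counted("abacab"))  # ([0, 0, 1, 0, 1, 2], 6)
```

Each iteration increases either i or i − j, and both stay at most m. So the count can never exceed 2m.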

  • Example

    [Figure: KMP matching P = abacab against a sample text; the pattern is shown in five successive alignments, with the 19 character comparisons numbered in order.]

      j      0  1  2  3  4  5
      P[j]   a  b  a  c  a  b
      F(j)   0  0  1  0  1  2