Text Search
DESCRIPTION
Text search methods. Analysis of algorithms.
TRANSCRIPT
-
7/21/2019 Busqueda de texto
1/13
2004 Goodrich, Tamassia Pattern Matching 1
Pattern Matching
[Figure: the pattern a b a c a b placed at two shifts under the text a b a c a a b, with character comparisons numbered 1 to 4]
-
Strings
A string is a sequence of characters
Examples of strings: Java program, HTML document, DNA sequence, ordinary text
An alphabet Σ is the set of possible characters for a family of strings
Examples of alphabets: ASCII, Unicode, {0, 1}, {A, C, G, T}
Let P be a string of size m
  A substring P[i .. j] of P is the subsequence of P consisting of the characters with ranks between i and j
  A prefix of P is a substring of the type P[0 .. i]
  A suffix of P is a substring of the type P[i .. m - 1]
Given strings T (text) and P (pattern), the pattern matching problem consists of finding a substring of T equal to P
Applications: text editors, search engines, biological research
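The substring, prefix, and suffix notation above can be tried out with Python slices. This is only an illustrative sketch: the helper names `substring`, `prefixes`, and `suffixes` and the sample string are assumptions, not part of the slides. Note that the slides' P[i .. j] includes position j, while a Python slice excludes its end index.

```python
def substring(P, i, j):
    """Return P[i .. j] in the slides' inclusive notation."""
    return P[i:j + 1]

def prefixes(P):
    """All prefixes P[0 .. i] for i = 0 .. m - 1."""
    return [P[:i + 1] for i in range(len(P))]

def suffixes(P):
    """All suffixes P[i .. m - 1] for i = 0 .. m - 1."""
    return [P[i:] for i in range(len(P))]

P = "abacab"                # sample string chosen for illustration
print(substring(P, 1, 3))   # bac
print(prefixes(P)[2])       # aba
print(suffixes(P)[4])       # ab
```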
-
Brute-Force Pattern Matching
The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift of P relative to T, until either
  a match is found, or
  all placements of the pattern have been tried
Brute-force pattern matching runs in time O(nm)
Example of worst case:
  T = aaa ... ah
  P = aaah
  may occur in images and DNA sequences
  unlikely in English text
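Algorithm BruteForceMatch(T, P)
  Input: text T of size n and pattern P of size m
  Output: starting index of a substring of T equal to P, or -1 if no such substring exists
  for i ← 0 to n - m
    { test shift i of the pattern }
    j ← 0
    while j < m ∧ T[i + j] = P[j]
      j ← j + 1
    if j = m
      return i { match at i }
  return -1 { no match anywhere }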
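A minimal Python sketch of the brute-force matcher described on this slide. The function name `brute_force_match` and the sample inputs are illustrative choices; the logic tries every shift i from 0 to n - m and compares characters left to right.

```python
def brute_force_match(text, pattern):
    """Return the first index where pattern occurs in text, or -1."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):          # test shift i of the pattern
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:                      # all m characters matched
            return i
    return -1                           # no placement matched

print(brute_force_match("aaaah", "aaah"))   # 1
print(brute_force_match("abc", "d"))        # -1
```

On inputs shaped like the worst case above, almost every shift runs the inner loop for nearly m steps before failing, which is where the O(nm) bound comes from.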
-
Boyer-Moore Heuristics
The Boyer-Moore pattern matching algorithm is based on two heuristics
Looking-glass heuristic: compare P with a subsequence of T moving backwards
Character-jump heuristic: ...
-
Last-Occurrence Function
Boyer-Moore's algorithm preprocesses the pattern P and the alphabet Σ to build the last-occurrence function L mapping Σ to integers, where L(c) is defined as
  the largest index i such that P[i] = c, or
  -1 if no such index exists
Example:
  Σ = {a, b, c, d}
  P = abacab

  c     a   b   c   d
  L(c)  4   5   3   -1

The last-occurrence function can be represented by an array indexed by the numeric codes of the characters
The last-occurrence function can be computed in time O(m + s), where m is the size of P and s is the size of Σ
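As a rough sketch, the last-occurrence function can be built in one pass over the alphabet and one pass over the pattern. The slides suggest an array indexed by character codes; a Python dict is swapped in here purely for brevity, and the name `last_occurrence` is an assumption.

```python
def last_occurrence(pattern, alphabet):
    """Map each character c to the largest i with pattern[i] == c, else -1."""
    last = {c: -1 for c in alphabet}    # O(s): default for absent characters
    for i, c in enumerate(pattern):     # O(m): later indices overwrite earlier
        last[c] = i
    return last

print(last_occurrence("abacab", "abcd"))
# {'a': 4, 'b': 5, 'c': 3, 'd': -1}
```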
-
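The Boyer-Moore Algorithm
Algorithm BoyerMooreMatch(T, P, Σ)
  L ← lastOccurenceFunction(P, Σ)
  i ← m - 1
  j ← m - 1
  repeat
    if T[i] = P[j]
      if j = 0
        return i { match at i }
      else
        i ← i - 1
        j ← j - 1
    else
      { character-jump }
      l ← L[T[i]]
      i ← i + m - min(j, 1 + l)
      j ← m - 1
  until i > n - 1
  return -1 { no match }

Case 1: j ≤ 1 + l
[Figure: mismatch of P[j] = b against T[i] = a, where the last occurrence of a in P lies to the right of position j; i advances by m - j]
Case 2: 1 + l ≤ j
[Figure: mismatch of P[j] = b against T[i] = a, where the last occurrence of a in P is at position l to the left of j; i advances by m - (1 + l)]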
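The pseudocode on this slide can be sketched in Python roughly as follows. This is the simplified variant shown here (looking-glass plus character-jump only), not the full Boyer-Moore algorithm with the good-suffix rule. The name `boyer_moore_match` is illustrative, a dict stands in for the last-occurrence array, and the `repeat ... until` loop is written as a `while` so that a pattern longer than the text is handled without an out-of-range access.

```python
def boyer_moore_match(text, pattern):
    """Simplified Boyer-Moore: return index of first match, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # last[c] = largest index of c in pattern; absent characters map to -1
    last = {c: k for k, c in enumerate(pattern)}
    i = j = m - 1                       # start at the end of the pattern
    while i <= n - 1:
        if text[i] == pattern[j]:
            if j == 0:
                return i                # match starts at i
            i -= 1                      # looking-glass: move backwards
            j -= 1
        else:
            l = last.get(text[i], -1)   # character-jump
            i += m - min(j, 1 + l)
            j = m - 1
    return -1

print(boyer_moore_match("abacaabadcabacabaabb", "abacab"))   # 10
```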
-
Example
T = abacaabadcabacabaabb
P = abacab
[Figure: Boyer-Moore trace over six placements of P under T; comparisons numbered 1 to 13, ending with P matching T at index 10]
-
Analysis
Boyer-Moore's algorithm runs in time O(nm + s)
Example of worst case:
  T = aaa ... a
  P = baaa
The worst case may occur in images and DNA sequences but is unlikely in English text
Boyer-Moore's algorithm is significantly faster than the brute-force algorithm on English text
[Figure: worst-case trace with T = aaaaaaaaa and P = baaaaa; P is placed at four successive shifts and the comparisons are numbered 1 to 24]
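To make the worst case concrete, the sketch below counts character comparisons made by the simplified Boyer-Moore loop from these slides. The function name `boyer_moore_comparisons` is an assumption, and the counter is instrumentation added for illustration. On T = aaaaaaaaa and P = baaaaa, each of the n - m + 1 = 4 placements costs m = 6 comparisons and the pattern advances by only one position each time.

```python
def boyer_moore_comparisons(text, pattern):
    """Count comparisons made by the simplified Boyer-Moore loop."""
    n, m = len(text), len(pattern)
    last = {c: k for k, c in enumerate(pattern)}
    comparisons = 0
    i = j = m - 1
    while m > 0 and i <= n - 1:
        comparisons += 1                # one text/pattern character comparison
        if text[i] == pattern[j]:
            if j == 0:
                break                   # full match found
            i -= 1
            j -= 1
        else:
            i += m - min(j, 1 + last.get(text[i], -1))
            j = m - 1
    return comparisons

print(boyer_moore_comparisons("a" * 9, "b" + "a" * 5))   # 24
```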
-
The KMP Algorithm
The Knuth-Morris-Pratt algorithm compares the pattern to the text from left to right, but shifts the pattern more intelligently than the brute-force algorithm
When a mismatch occurs, what is the most we can shift the pattern so as to avoid redundant comparisons?
Answer: the largest prefix of P[0..j] that is a suffix of P[1..j]
[Figure: text . . a b a a b x . . . with pattern a b a a b a mismatching at x (pattern position j); after the shift, the already-matched prefix a b is labelled "No need to repeat these comparisons" and the position under x is labelled "Resume comparing here"]
-
KMP Failure Function
The Knuth-Morris-Pratt algorithm preprocesses the pattern to find matches of prefixes of the pattern with the pattern itself
The failure function F(j) is defined as the size of the largest prefix of P[0..j] that is also a suffix of P[1..j]
The Knuth-Morris-Pratt algorithm modifies the brute-force algorithm so that if a mismatch occurs at P[j] ≠ T[i] we set j ← F(j - 1)

  j     0  1  2  3  4  5
  P[j]  a  b  a  a  b  a
  F(j)  0  0  1  1  2  3

[Figure: text . . a b a a b x . . . with pattern a b a a b a mismatching at x (pattern position j); the pattern is shifted so that its prefix of length F(j - 1) stays aligned with the text]
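The definition of F(j) can be checked directly, without the efficient construction, by testing candidate prefix lengths from longest to shortest. This is a deliberately naive sketch meant only to illustrate the definition; the name `failure_naive` is an assumption, and its running time is well above O(m).

```python
def failure_naive(pattern):
    """F[j] = size of the largest prefix of P[0..j] that is a suffix of P[1..j]."""
    F = []
    for j in range(len(pattern)):
        best = 0
        # A suffix of P[1..j] has at most j characters, so try k = j, ..., 1.
        for k in range(j, 0, -1):
            if pattern[:k] == pattern[j - k + 1:j + 1]:
                best = k
                break
        F.append(best)
    return F

print(failure_naive("abaaba"))   # [0, 0, 1, 1, 2, 3]
```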
-
The KMP Algorithm
The failure function can be represented by an array and can be computed in O(m) time
At each iteration of the while-loop, either
  i increases by one, or
  the shift amount i - j increases by at least one (observe that F(j - 1) < j)
Neither i nor i - j can exceed n; hence, there are no more than 2n iterations of the while-loop
Thus, KMP's algorithm runs in optimal time O(m + n)
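Algorithm KMPMatch(T, P)
  F ← failureFunction(P)
  i ← 0
  j ← 0
  while i < n
    if T[i] = P[j]
      if j = m - 1
        return i - j { match }
      else
        i ← i + 1
        j ← j + 1
    else
      if j > 0
        j ← F[j - 1]
      else
        i ← i + 1
  return -1 { no match }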
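A hedged Python sketch of the matching loop follows. To keep the focus on the loop itself, the failure function is passed in as a precomputed list; the parameter name `failure`, the function name `kmp_match`, and the sample text are assumptions made for illustration. The text pointer i never moves backwards: on a mismatch only j is reset.

```python
def kmp_match(text, pattern, failure):
    """Return index of first match of pattern in text, or -1.

    failure[j] must be the KMP failure function value F(j) for pattern.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    i = j = 0
    while i < n:
        if text[i] == pattern[j]:
            if j == m - 1:
                return i - j            # match starts at i - j
            i += 1
            j += 1
        elif j > 0:
            j = failure[j - 1]          # shift pattern, keep i in place
        else:
            i += 1                      # mismatch at pattern start
    return -1

F = [0, 0, 1, 0, 1, 2]                  # failure function of "abacab"
print(kmp_match("abacaabadcabacabaabb", "abacab", F))   # 10
```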
-
Computing the Failure Function
The failure function can be represented by an array and can be computed in O(m) time
The construction is similar to the KMP algorithm itself
At each iteration of the while-loop, either
  i increases by one, or
  the shift amount i - j increases by at least one (observe that F(j - 1) < j)
Neither i nor i - j can exceed m; hence, there are no more than 2m iterations of the while-loop
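Algorithm failureFunction(P)
  F[0] ← 0
  i ← 1
  j ← 0
  while i < m
    if P[i] = P[j]
      { we have matched j + 1 characters }
      F[i] ← j + 1
      i ← i + 1
      j ← j + 1
    else if j > 0 then
      { use failure function to shift P }
      j ← F[j - 1]
    else
      F[i] ← 0 { no match }
      i ← i + 1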
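The linear-time construction can be sketched in Python as below. The name `failure_function` is an illustrative choice; the loop matches the pattern against itself, reusing already-computed entries of F whenever a mismatch occurs.

```python
def failure_function(pattern):
    """Compute the KMP failure function of pattern in O(m) time."""
    m = len(pattern)
    F = [0] * m
    i, j = 1, 0
    while i < m:
        if pattern[i] == pattern[j]:
            F[i] = j + 1                # we have matched j + 1 characters
            i += 1
            j += 1
        elif j > 0:
            j = F[j - 1]                # use failure function to shift P
        else:
            F[i] = 0                    # no match
            i += 1
    return F

print(failure_function("abacab"))   # [0, 0, 1, 0, 1, 2]
print(failure_function("abaaba"))   # [0, 0, 1, 1, 2, 3]
```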
-
Example
T = abacaabaccabacabaabb
P = abacab
[Figure: KMP trace over five placements of P under T; comparisons numbered 1 to 19, ending with P matching T at index 10]

  j     0  1  2  3  4  5
  P[j]  a  b  a  c  a  b
  F(j)  0  0  1  0  1  2