advisor: prof. r. c. t. lee reporter: z. h. pan
Post on 21-Jan-2016
1
Bit-parallel approximate string matching algorithms with transposition
Heikki Hyyrö, Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229
Advisor: Prof. R. C. T. Lee
Reporter: Z. H. Pan
2
This paper is based on “Fast text searching allowing errors” [WM92], which handles three edit operations: insertion, deletion and substitution. This paper adds a fourth operation, transposition. We will use these operations to solve the approximate string matching problem with a bit-parallel approach.
Transposition: swapping two adjacent characters in the string.
Example:
Insertion: cat → cast
Deletion: cat → at
Substitution: cat → car
Transposition: cta → cat
3
We first apply the three operations insertion, deletion and substitution to the approximate string matching problem. Transposition is introduced after these three operations.
4
Our approximate string matching problem is defined as follows:
We are given a text T1,n, a pattern P1,m and an error bound k.
We want to find all positions i, where 1≦i≦n, such that there exists a suffix A of T1,i with d(A,P)≦k, where d(x,y) is the edit distance (or difference) between x and y.
Example: T=deaabeg, P=aabac and k=2.
For i=5. T1,5= deaab.
We note that there exists a suffix A=aab of T1,5 such that d(A,P)=d(aab,aabac)=2.
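The definition above can be checked directly on small inputs. The following is a minimal brute-force sketch in Python, not the bit-parallel method of the paper; the function names `edit_distance` and `approximate_end_positions` are hypothetical, and d is taken to be the usual edit distance with insertion, deletion and substitution only.

```python
def edit_distance(x, y):
    """Edit distance with insertion, deletion and substitution (no transposition)."""
    prev = list(range(len(y) + 1))          # distances from the empty prefix of x
    for a_idx, a in enumerate(x, 1):
        cur = [a_idx]
        for b_idx, b in enumerate(y, 1):
            cur.append(min(prev[b_idx] + 1,             # delete a
                           cur[b_idx - 1] + 1,          # insert b
                           prev[b_idx - 1] + (a != b))) # match or substitute
        prev = cur
    return prev[-1]


def approximate_end_positions(text, pattern, k):
    """All i (1-based) such that some suffix A of T1,i has d(A, P) <= k."""
    ends = []
    for i in range(1, len(text) + 1):
        prefix = text[:i]
        # s == i gives the empty suffix
        if any(edit_distance(prefix[s:], pattern) <= k for s in range(i + 1)):
            ends.append(i)
    return ends


print(edit_distance("aab", "aabac"))
print(approximate_end_positions("deaabeg", "aabac", 2))
```

This check is far slower than the bit-parallel approach, but it is handy for validating the tables on small examples.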
5
Our approach is based upon the following observation:
Case 1:
Let S be a substring of T. Write S = S1S2 and P = P1P2.
If there exists a suffix S2 of S and a suffix P2 of P such that
d(S2, P2) = 0, and d(S1, P1) ≦ k,
we have d(S, P) ≦ k.
[Figure: S = S1 S2 inside T, aligned with P = P1 P2.]
6
Example:
A=addcd and B=abcd. k=2. We may decompose A and B as follows: A=add+cd. B=ab+cd. d(cd,cd)=0 and d(add,ab)=2.
Thus d(A,B)=2.
7
Case 2:
Let S be a substring of T. Write S = S1S2 and P = P1P2.
If there exists a suffix S2 of S and a suffix P2 of P such that
d(S2, P2) = 1, and d(S1, P1) ≦ k-1,
we have d(S, P) ≦ k.
[Figure: S = S1 S2 inside T, aligned with P = P1 P2.]
8
Example:
A=addb and B=dc. k=3. We may decompose A and B as follows: A=add+b. B=d+c. d(b,c)=1 and d(add,d)=2.
Thus d(A,B)=3.
9
To solve our approximate string matching problem, we use a table, called $R^k[n,m]$, where
$R^k(i,j)$ = 1 if there exists a suffix A of T1,i such that d(A,P1,j)≦k,
$R^k(i,j)$ = 0 otherwise,
where 1≦i≦n and 1≦j≦m.
Example:
T:aabaacaabacab, P:aabac and k=1.
Consider i=9, j=4.
T1,9=aabaacaab
P1,4=aaba
A=T7,9=aab
d(A,P1,4)=d(aab,aaba)=1
Hence $R^1(9,4)=1$.
The table $R^1[13,5]$ (column i is written as $R^1(i,1) \ldots R^1(i,5)$):
i       1     2     3     4     5     6     7     8     9     10    11    12    13
t_i     a     a     b     a     a     c     a     a     b     a     c     a     b
R^1_i   11000 11100 11110 11111 11111 11101 11010 11100 11110 11111 11011 11001 11100
10
It will be shown later that $R^k$ is obtained from $R^{k-1}$. Thus, it is important for us to know how to find $R^0$. Later, $R^0$ will be obtained by the bit-parallel Shift-And algorithm. But first, we explain it in terms of dynamic programming [S80] (String Matching with Errors).
$R^0(i,j)$ can be found by dynamic programming.
$R^0(i,0)=1$ for all i. $R^0(0,j)=0$ for all j≧1.
$R^0(i,j)$ = 1 if $R^0(i-1,j-1)=1$ and ti=pj.
$R^0(i,j)$ = 0 otherwise.
From the above, we can see that the ith column of $R^0$ can be obtained from the (i-1)th column of $R^0$, ti and pj.
Example:
Text = aabaacaab
pattern = aabac
Initially, row j=0 is all 1's ($R^0(i,0)=1$ for i=0,…,9) and column i=0 is (1,0,0,0,0,0) for j=0,…,5.
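The dynamic programming recurrence for $R^0$ can be sketched as follows. This is a minimal illustration in Python, assuming the boundary values stated on this slide; the function name `r0_table_dp` is hypothetical.

```python
def r0_table_dp(text, pattern):
    """Fill R^0 by dynamic programming; r[i][j] holds R^0(i, j) for 0<=i<=n, 0<=j<=m."""
    n, m = len(text), len(pattern)
    r = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        r[i][0] = 1                      # R^0(i, 0) = 1 for all i
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # R^0(i, j) = 1 iff R^0(i-1, j-1) = 1 and t_i = p_j
            if r[i - 1][j - 1] == 1 and text[i - 1] == pattern[j - 1]:
                r[i][j] = 1
    return r


table = r0_table_dp("aabaacaab", "aabac")
print(table[8][1:], table[9][1:])        # columns 8 and 9, rows j = 1..5
```

Each column depends only on the previous column, which is what makes the bit-parallel update possible.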
11
Example: T : aabaacaabacab, P : aabac and k=0.
The table $R^0[13,5]$ (column i is written as $R^0(i,1) \ldots R^0(i,5)$):
i       1     2     3     4     5     6     7     8     9     10    11    12    13
t_i     a     a     b     a     a     c     a     a     b     a     c     a     b
R^0_i   10000 11000 00100 10010 11000 00000 10000 11000 00100 10010 00001 10000 00000
For instance, the 9th column is $(R^0(9,1),\ldots,R^0(9,5)) = (0,0,1,0,0)$.
The 8th column is $(R^0(8,1),\ldots,R^0(8,5)) = (1,1,0,0,0)$.
$R^0(9,3)=1$ because $R^0(8,2)=1$ and t9=p3=‘b’.
$R^0(9,2)=0$ because, although $R^0(8,1)=1$, t9≠p2.
But we do not use the dynamic programming approach, because we want to use the bit-parallel approach.
12
Let us denote the ith column of $R^k$ by $R^k_i$, whose length equals m.
For instance, with T : aabaacaabacab, P : aabac and k=0:
$R^0_4 = (R^0(4,1),R^0(4,2),R^0(4,3),R^0(4,4),R^0(4,5)) = (1,0,0,1,0)$
$R^0_5 = (1,1,0,0,0)$
To find $R^0(i,j)$, we consider two cases:
Case 1: j=1
If ti=pj, $R^0(i,j)=1$; otherwise, $R^0(i,j)=0$.
Case 2: j>1
If $R^0(i-1,j-1)=1$ and ti=pj, $R^0(i,j)=1$; otherwise, $R^0(i,j)=0$.
13
We now define a right shift operation >>i on a vector A=(a1,a2,…,am) as follows:
$A \gg i = (\underbrace{0,\ldots,0}_{i},a_1,\ldots,a_{m-i})$
That is, we delete the last i elements and add i 0’s at the front.
For instance, A=(1,0,0,1,1,1). Then A>>1=(0,1,0,0,1,1).
We use & and v to represent the AND and OR operations. We further use $a^m b^n$ to denote $(\underbrace{a,\ldots,a}_{m},\underbrace{b,\ldots,b}_{n})$.
For example, $10^3 = (1,0,0,0)$.
14
Example:
T : aabaacaabacab,
P : aabac and k=0.
Consider $R^0_9$, m=5.
$R^0_9 = (0,0,1,0,0)$
$(R^0_9 \gg 1) \vee 10^4 = (0,0,0,1,0) \vee (1,0,0,0,0) = (1,0,0,1,0)$
15
Given a character α of the alphabet and a pattern P, we need to know where α appears in P.
This information is contained in a vector Σ(α) which is defined as follows:
Σ(α)[i]=1 if pi=α.
Σ(α)[i]=0 otherwise.
Example: P : aabac
Σ(a) =( 1,1,0,1,0 )
Σ(b) =( 0,0,1,0,0 )
Σ(c) =( 0,0,0,0,1 )
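The Σ vectors can be built in one pass over the pattern. A minimal sketch in Python, with the hypothetical helper name `build_sigma`; vectors are kept as strings with position 1 on the left, as written on the slides.

```python
def build_sigma(pattern):
    """Map each character of the pattern to its occurrence vector Σ(ch)."""
    sigma = {}
    for ch in set(pattern):
        # position i (1-based, left to right) is '1' exactly when p_i == ch
        sigma[ch] = "".join("1" if p == ch else "0" for p in pattern)
    return sigma


sigma = build_sigma("aabac")
print(sigma["a"], sigma["b"], sigma["c"])
```

Characters that do not occur in P have the all-zero vector, so a lookup with a default of `"0" * len(pattern)` covers them.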
16
Example:
T : aabaacaabacab,
P : aabac and k=0.
Using the following rule:
Case 1: j=1
If ti=pj, $R^0(i,j)=1$; otherwise, $R^0(i,j)=0$.
Case 2: j>1
If $R^0(i-1,j-1)=1$ and ti=pj, $R^0(i,j)=1$; otherwise, $R^0(i,j)=0$.
Σ(a) =( 1,1,0,1,0 ) and p4=t10=‘a’.
Since $R^0(9,3)=1$ and p4=t10, we have $R^0(10,4)=1$.
Since $R^0(9,2)=0$, we have $R^0(10,3)=0$.
17
Similarly, we can conclude that $R^0(10,1)=1$.
Example:
T : aabaacaabacab,
P : aabac and k=0.
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000, where * denotes any other character.
$R^0_{10} = ((R^0_9 \gg 1) \vee 10^4) \;\&\; \Sigma[t_{10}]$
=((0,0,0,1,0)v(1,0,0,0,0))&(1,1,0,1,0)
=(1,0,0,1,0)
18
Shift-And
In general, we update the i-th column of $R^0[n,m]$, which is $R^0_i$, by using the formula
$R^0_i = ((R^0_{i-1} \gg 1) \vee 10^{m-1}) \;\&\; \Sigma(t_i)$
for each new character $t_i$. If $R^0(i,m)=1$, report a match at position i.
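The Shift-And update can be sketched in Python as follows. This is a minimal illustration, not code from the paper: the names `shift_and` and `to_slide_vector` are hypothetical, and bit j-1 of a Python integer is assumed to stand for position j, so the slides' ">>1" becomes "<< 1" in the code and "v 10^{m-1}" becomes "| 1".

```python
def to_slide_vector(x, m):
    """Render an m-bit state with position 1 on the left, as on the slides."""
    return "".join("1" if (x >> j) & 1 else "0" for j in range(m))


def shift_and(text, pattern):
    """Return (columns of R^0 as strings, 1-based end positions of exact matches)."""
    m = len(pattern)
    sigma = {}
    for j, ch in enumerate(pattern):
        sigma[ch] = sigma.get(ch, 0) | (1 << j)      # Σ(ch) as a bit mask
    r = 0                                            # R^0_0 = 0^m
    columns, matches = [], []
    for i, ch in enumerate(text, 1):
        # R^0_i = ((R^0_{i-1} >> 1) v 10^{m-1}) & Σ(t_i), in slide notation
        r = ((r << 1) | 1) & sigma.get(ch, 0)
        columns.append(to_slide_vector(r, m))
        if r & (1 << (m - 1)):                       # R^0(i, m) = 1
            matches.append(i)
    return columns, matches


columns, matches = shift_and("aabaacaabacab", "aabac")
print(columns)
print(matches)
```

For patterns longer than the machine word, a real implementation would split the state into several words; Python integers hide that detail.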
19
Example:
Text = aabaacaabacab
pattern = aabac
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
The Shift-And formula: $R^0_i = ((R^0_{i-1} \gg 1) \vee 10^{m-1}) \;\&\; \Sigma(t_i)$
[Figure: the prefixes a, aa, aab, aaba, aabac of P (j=1,…,5) aligned under the text.]
when i=1 ($t_1$=‘a’):
$R^0_0$ = 00000
$(R^0_0 \gg 1) \vee 10^{m-1}$ = 10000
Σ(a) = 11010
$R^0_1$ = 10000 & 11010 = 10000
That is, $R^0(1,1)=1$ and $R^0(1,j)=0$ for j=2,…,5.
The operation “&” checks, for every j at once, whether ti equals pj.
20
Example:
Text = aabaacaabacab
pattern = aabac
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
The Shift-And formula: $R^0_i = ((R^0_{i-1} \gg 1) \vee 10^{m-1}) \;\&\; \Sigma(t_i)$
[Figure: the prefixes a, aa, aab, aaba, aabac of P (j=1,…,5) aligned under the text.]
when i=2 ($t_2$=‘a’):
$R^0_1$ = 10000
$(R^0_1 \gg 1) \vee 10^{m-1}$ = 11000
Σ(a) = 11010
$R^0_2$ = 11000 & 11010 = 11000
That is, $R^0(2,1)=1$, $R^0(2,2)=1$ and $R^0(2,j)=0$ for j=3,4,5.
21
Example:
Text = aabaacaabacab
pattern = aabac
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
The Shift-And formula: $R^0_i = ((R^0_{i-1} \gg 1) \vee 10^{m-1}) \;\&\; \Sigma(t_i)$
[Figure: the prefixes a, aa, aab, aaba, aabac of P (j=1,…,5) aligned under the text.]
when i=3 ($t_3$=‘b’):
$R^0_2$ = 11000
$(R^0_2 \gg 1) \vee 10^{m-1}$ = 11100
Σ(b) = 00100
$R^0_3$ = 11100 & 00100 = 00100
That is, $R^0(3,3)=1$ and $R^0(3,j)=0$ for j=1,2,4,5.
22
Example:
Text = aabaacaabacab
pattern = aabac
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
The Shift-And formula: $R^0_i = ((R^0_{i-1} \gg 1) \vee 10^{m-1}) \;\&\; \Sigma(t_i)$
[Figure: the prefixes a, aa, aab, aaba, aabac of P (j=1,…,5) aligned under the text.]
when i=4 ($t_4$=‘a’):
$R^0_3$ = 00100
$(R^0_3 \gg 1) \vee 10^{m-1}$ = 10010
Σ(a) = 11010
$R^0_4$ = 10010 & 11010 = 10010
That is, $R^0(4,1)=1$, $R^0(4,4)=1$ and $R^0(4,j)=0$ for j=2,3,5.
23
Example:
Text = aabaacaabacab
pattern = aabac
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
The Shift-And formula: $R^0_i = ((R^0_{i-1} \gg 1) \vee 10^{m-1}) \;\&\; \Sigma(t_i)$
[Figure: the prefixes a, aa, aab, aaba, aabac of P (j=1,…,5) aligned under the text.]
when i=5 ($t_5$=‘a’):
$R^0_4$ = 10010
$(R^0_4 \gg 1) \vee 10^{m-1}$ = 11001
Σ(a) = 11010
$R^0_5$ = 11001 & 11010 = 11000
That is, $R^0(5,1)=1$, $R^0(5,2)=1$ and $R^0(5,j)=0$ for j=3,4,5.
24
Example:
Text = aabaacaabacab
pattern = aabac
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
The complete table, including column i=0 and row j=0 (each column is written as $R^0(i,0) \ldots R^0(i,5)$):
i       0      1      2      3      4      5      6      7      8      9      10     11     12     13
t_i            a      a      b      a      a      c      a      a      b      a      c      a      b
R^0     100000 110000 111000 100100 110010 111000 100000 110000 111000 100100 110010 100001 110000 100000
25
Approximate String Matching
The approximate string matching problem is the exact string matching problem with errors allowed. We can use an approach like Shift-And to obtain the $R^k[n,m]$ table, where
$R^k(i,j)$ = 1 if there exists a suffix A of T1,i such that d(A,P1,j)≦k,
$R^k(i,j)$ = 0 otherwise,
where 1≦i≦n and 1≦j≦m.
Here, we introduce three operations in approximate string matching: insertion, deletion and substitution.
Let $R^k_I(i,j)$, $R^k_D(i,j)$ and $R^k_S(i,j)$ denote the $R^k(i,j)$ related to insertion, deletion and substitution respectively.
$R^k_I(i,j)=1$, $R^k_D(i,j)=1$ and $R^k_S(i,j)=1$ indicate that we can perform an insertion, a deletion and a substitution respectively without violating the error bound k.
26
If ti=pj and $R^k(i-1,j-1)=1$, then $R^k(i,j)=1$.
Otherwise, $R^k(i,j) = R^k_I(i,j) \vee R^k_D(i,j) \vee R^k_S(i,j)$.
Example:
T:aabaacaabacab, P:aabac and k=1.
Consider i=6 and j=5.
S=T1,6=aabaac and P1,5=aabac
∵ t6=p5=‘c’ and $R^1(5,4)=1$ (T1,5=aabaa, P1,4=aaba), we have $R^1(6,5)=1$.
There exists a suffix A of S such that d(A,P)≦1.
27
Insertion
$R^k_I(i,j)=1$ if ti≠pj and $R^{k-1}(i-1,j)=1$.
$R^k_I(i,j)=0$ otherwise.
Example: when k=1.
T: aabacb
P: aabac
T1,i-1=aabac matches P1,j=aabac exactly, and the last character b of T (position i) is an insertion.
28
Example: Text = aabaacaabacab. Pattern = aabac. k=1.
$R^k_I(i,j)=1$ if ti≠pj and $R^{k-1}(i-1,j)=1$.
$R^k_I(i,j)=0$ otherwise.
The tables $R^0[13,5]$ and $R^1_I[13,5]$:
i       1     2     3     4     5     6     7     8     9     10    11    12    13
t_i     a     a     b     a     a     c     a     a     b     a     c     a     b
R^0_i   10000 11000 00100 10010 11000 00000 10000 11000 00100 10010 00001 10000 00000
R^1_I   10000 11000 11100 11110 11010 11001 11000 11000 11100 11110 10011 11001 10100
(1) When i=6 and j=4.
t6=‘c’≠p4=‘a’ and $R^0(5,4)=0$, so $R^1_I(6,4)=0$.
(2) When i=11 and j=4.
t11=‘c’≠p4=‘a’ and $R^0(10,4)=1$, so $R^1_I(11,4)=1$.
29
Example: Text = aabaacaabacab, Pattern = aabac and k=1
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
The insertion formula:
$R^k_i = (((R^k_{i-1} \gg 1) \vee 10^{m-1}) \;\&\; \Sigma(t_i)) \vee R^{k-1}_{i-1}$
Computing $R^1_6$ ($t_6$=‘c’):
$(R^1_5 \gg 1) \vee 10^{m-1}$ = 11101
Σ(c) = 00001
11101 & 00001 = 00001
$R^0_5$ = 11000
$R^1_6$ = 00001 v 11000 = 11001
(T1,6=aabaac matches P=aabac with one insertion.)
Computing $R^1_4$ ($t_4$=‘a’):
$(R^1_3 \gg 1) \vee 10^{m-1}$ = 11110
Σ(a) = 11010
11110 & 11010 = 11010
$R^0_3$ = 00100
$R^1_4$ = 11010 v 00100 = 11110
(T1,4=aaba matches P1,3=aab with one insertion.)
30
Deletion
$R^k_D(i,j)=1$ if ti≠pj and $R^{k-1}(i,j-1)=1$.
$R^k_D(i,j)=0$ otherwise.
Example: when k=1.
T: aabac
P: aabacb
T1,i=aabac matches P1,j-1=aabac exactly, and the last character b of P (position j) is a deletion.
31
Example: Text = aabaacaabacab. Pattern = aabac. k=1.
$R^k_D(i,j)=1$ if ti≠pj and $R^{k-1}(i,j-1)=1$.
$R^k_D(i,j)=0$ otherwise.
The tables $R^0[13,5]$ and $R^1_D[13,5]$:
i       1     2     3     4     5     6     7     8     9     10    11    12    13
t_i     a     a     b     a     a     c     a     a     b     a     c     a     b
R^0_i   10000 11000 00100 10010 11000 00000 10000 11000 00100 10010 00001 10000 00000
R^1_D   11000 11100 10110 11011 11100 10000 11000 11100 10110 11011 10001 11000 10100
(1) When i=6 and j=4.
t6=‘c’≠p4=‘a’ and $R^0(6,3)=0$, so $R^1_D(6,4)=0$.
(2) When i=3 and j=4.
t3=‘b’≠p4=‘a’ and $R^0(3,3)=1$, so $R^1_D(3,4)=1$.
32
Example: Text = aabaacaabacab, Pattern = aabac and k=1.
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
The deletion formula:
$R^k_i = (((R^k_{i-1} \gg 1) \vee 10^{m-1}) \;\&\; \Sigma(t_i)) \vee ((R^{k-1}_i \gg 1) \vee 10^{m-1})$
Computing $R^1_{13}$ ($t_{13}$=‘b’):
$(R^1_{12} \gg 1) \vee 10^{m-1}$ = 11100
Σ(b) = 00100
11100 & 00100 = 00100
$(R^0_{13} \gg 1) \vee 10^{m-1}$ = 10000
$R^1_{13}$ = 00100 v 10000 = 10100
Computing $R^1_3$ ($t_3$=‘b’):
$(R^1_2 \gg 1) \vee 10^{m-1}$ = 11110
Σ(b) = 00100
11110 & 00100 = 00100
$(R^0_3 \gg 1) \vee 10^{m-1}$ = 10010
$R^1_3$ = 00100 v 10010 = 10110
(T1,3=aab matches P1,4=aaba with one deletion.)
33
Substitution
$R^k_S(i,j)=1$ if ti≠pj and $R^{k-1}(i-1,j-1)=1$.
$R^k_S(i,j)=0$ otherwise.
Example: when k=1.
T1,i-1 and P1,j-1 are both aabac, and the last characters ti and pj differ (a versus b). One substitution makes the two strings equal.
34
Example: Text = aabaacaabacab. Pattern = aabac. k=1.
$R^k_S(i,j)=1$ if ti≠pj and $R^{k-1}(i-1,j-1)=1$.
$R^k_S(i,j)=0$ otherwise.
The tables $R^0[13,5]$ and $R^1_S[13,5]$:
i       1     2     3     4     5     6     7     8     9     10    11    12    13
t_i     a     a     b     a     a     c     a     a     b     a     c     a     b
R^0_i   10000 11000 00100 10010 11000 00000 10000 11000 00100 10010 00001 10000 00000
R^1_S   10000 11000 11100 11010 11001 11100 11010 11000 11100 11010 11001 11000 11100
(1) When i=6 and j=4.
t6=‘c’≠p4=‘a’ and $R^0(5,3)=0$, so $R^1_S(6,4)=0$.
(2) When i=5 and j=5.
t5=‘a’≠p5=‘c’ and $R^0(4,4)=1$, so $R^1_S(5,5)=1$.
35
Example: Text = aabaacaabacab, Pattern = aabac and k=1.
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
The substitution formula:
$R^k_i = (((R^k_{i-1} \gg 1) \vee 10^{m-1}) \;\&\; \Sigma(t_i)) \vee ((R^{k-1}_{i-1} \gg 1) \vee 10^{m-1})$
Computing $R^1_{13}$ ($t_{13}$=‘b’):
$(R^1_{12} \gg 1) \vee 10^{m-1}$ = 11100
Σ(b) = 00100
11100 & 00100 = 00100
$(R^0_{12} \gg 1) \vee 10^{m-1}$ = 11000
$R^1_{13}$ = 00100 v 11000 = 11100
Computing $R^1_5$ ($t_5$=‘a’):
$(R^1_4 \gg 1) \vee 10^{m-1}$ = 11101
Σ(a) = 11010
11101 & 11010 = 11000
$(R^0_4 \gg 1) \vee 10^{m-1}$ = 11001
$R^1_5$ = 11000 v 11001 = 11001
(T1,5=aabaa matches P=aabac with one substitution.)
36
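The algorithm that combines insertion, deletion and substitution allows k errors. We simply combine the formulas of insertion, deletion and substitution using the “or” operation, as follows:
The insertion: $R^k_i = (((R^k_{i-1} \gg 1) \vee 10^{m-1}) \;\&\; \Sigma(t_i)) \vee R^{k-1}_{i-1}$
The deletion: $R^k_i = (((R^k_{i-1} \gg 1) \vee 10^{m-1}) \;\&\; \Sigma(t_i)) \vee ((R^{k-1}_i \gg 1) \vee 10^{m-1})$
The substitution: $R^k_i = (((R^k_{i-1} \gg 1) \vee 10^{m-1}) \;\&\; \Sigma(t_i)) \vee ((R^{k-1}_{i-1} \gg 1) \vee 10^{m-1})$
Initially, $R^k_0 = 1^k 0^{m-k}$.
$R^k_i = ((R^k_{i-1} \gg 1) \;\&\; \Sigma(t_i)) \vee R^{k-1}_{i-1} \vee (R^{k-1}_i \gg 1) \vee (R^{k-1}_{i-1} \gg 1) \vee 10^{m-1}$
The first term covers the case ti=pj and $R^k(i-1,j-1)=1$. The next three terms are insertion, deletion and substitution, respectively.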
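The combined recurrence can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `approx_columns` and `to_slide_vector` are hypothetical, bit j-1 of an integer stands for position j (so the slides' ">>1" is "<< 1" here and "v 10^{m-1}" is "| 1"), and level 0 uses the Shift-And update.

```python
def to_slide_vector(x, m):
    """Render an m-bit state with position 1 on the left, as on the slides."""
    return "".join("1" if (x >> j) & 1 else "0" for j in range(m))


def approx_columns(text, pattern, k):
    """Return (columns of R^k as strings, end positions i with R^k(i, m) = 1)."""
    m = len(pattern)
    full = (1 << m) - 1
    sigma = {}
    for j, ch in enumerate(pattern):
        sigma[ch] = sigma.get(ch, 0) | (1 << j)
    prev = [((1 << d) - 1) & full for d in range(k + 1)]   # R^d_0 = 1^d 0^(m-d)
    columns, ends = [], []
    for i, ch in enumerate(text, 1):
        s = sigma.get(ch, 0)
        cur = [((prev[0] << 1) | 1) & s]                   # level 0: Shift-And
        for d in range(1, k + 1):
            match = (prev[d] << 1) & s                     # t_i = p_j, same level
            ins = prev[d - 1]                              # R^{d-1}_{i-1}
            dele = cur[d - 1] << 1                         # R^{d-1}_i >> 1 (slides)
            sub = prev[d - 1] << 1                         # R^{d-1}_{i-1} >> 1 (slides)
            cur.append((match | ins | dele | sub | 1) & full)
        prev = cur
        columns.append(to_slide_vector(cur[k], m))
        if cur[k] & (1 << (m - 1)):
            ends.append(i)
    return columns, ends


columns, ends = approx_columns("aabaacaabacab", "aabac", 1)
print(columns)
print(ends)
```

Keeping one state per error level d = 0,…,k means each text character costs k+1 word operations per machine word of the pattern.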
37
Example: Text = aabaacaabacab pattern = aabac k=1
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
$R^0_0$ = 00000, $R^1_0$ = 10000
Computing $R^1_4$ ($t_4$=‘a’):
$R^1_3 \gg 1$ = 01111
Σ(a) = 11010
$(R^1_3 \gg 1) \;\&\; \Sigma(a)$ = 01010
$R^0_3$ = 00100
$R^0_3 \gg 1$ = 00010
$R^0_4 \gg 1$ = 01001 ← for deletion
$10^{m-1}$ = 10000
$R^1_4$ = 01010 v 00100 v 00010 v 01001 v 10000 = 11111
T1,4=aaba matches P=aabac with one deletion.
So far, $R^1_1,\ldots,R^1_4$ = 11000, 11100, 11110, 11111.
38
Example: Text = aabaacaabacab pattern = aabac k=1
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
$R^0_0$ = 00000, $R^1_0$ = 10000
Computing $R^1_5$ ($t_5$=‘a’):
$R^1_4 \gg 1$ = 01111
Σ(a) = 11010
$(R^1_4 \gg 1) \;\&\; \Sigma(a)$ = 01010
$R^0_4$ = 10010 ← for insertion
$R^0_4 \gg 1$ = 01001
$R^0_5 \gg 1$ = 01100
$10^{m-1}$ = 10000
$R^1_5$ = 01010 v 10010 v 01001 v 01100 v 10000 = 11111
T1,5=aabaa matches P1,4=aaba with one insertion.
So far, $R^1_1,\ldots,R^1_5$ = 11000, 11100, 11110, 11111, 11111.
39
Example: Text = aabaacaabacab pattern = aabac k=1
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
$R^0_0$ = 00000, $R^1_0$ = 10000
Computing $R^1_6$ ($t_6$=‘c’):
$R^1_5 \gg 1$ = 01111
Σ(c) = 00001
$(R^1_5 \gg 1) \;\&\; \Sigma(c)$ = 00001
$R^0_5$ = 11000
$R^0_5 \gg 1$ = 01100 ← for substitution
$R^0_6 \gg 1$ = 00000
$10^{m-1}$ = 10000
$R^1_6$ = 00001 v 11000 v 01100 v 00000 v 10000 = 11101
The suffix aac of T1,6 matches P1,3=aab with one substitution.
So far, $R^1_1,\ldots,R^1_6$ = 11000, 11100, 11110, 11111, 11111, 11101.
40
Example: Text = aabaacaabacab pattern = aabac k=1
Σ(a)=11010, Σ(b)=00100, Σ(c)=00001, Σ(*)=00000
$R^0_0$ = 00000, $R^1_0$ = 10000
The complete table $R^1[13,5]$:
i       1     2     3     4     5     6     7     8     9     10    11    12    13
t_i     a     a     b     a     a     c     a     a     b     a     c     a     b
R^1_i   11000 11100 11110 11111 11111 11101 11010 11100 11110 11111 11011 11001 11100
41
Transposition
Example: A=badec and B=bdace.
If transposition is allowed, we may transpose B as follows:
B=bdace → A=badec
Define $R^k_T$ for transposition:
$R^k_T(i,j)=1$ if $R^{k-1}(i-2,j-2)=1$ & ti=pj-1 & ti-1=pj
$R^k_T(i,j)=0$ otherwise.
42
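We combine the operations of insertion, deletion, substitution and transposition for k errors.
Initially, $R^k_0 = 1^k 0^{m-k}$ and $R^k_{-1} = 0^m$.
The formula is as follows:
$R^k_i = ((R^k_{i-1} \gg 1) \;\&\; \Sigma(t_i))$ (ti=pj and $R^k(i-1,j-1)=1$)
$\vee\; R^{k-1}_{i-1}$ (insertion)
$\vee\; (R^{k-1}_i \gg 1)$ (deletion)
$\vee\; (R^{k-1}_{i-1} \gg 1)$ (substitution)
$\vee\; ((R^{k-1}_{i-2} \gg 2) \;\&\; \Sigma(t_{i-1}) \;\&\; (\Sigma(t_i) \gg 1))$ (transposition)
$\vee\; 10^{m-1}$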
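The recurrence with transposition can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the names `approx_columns_transposition` and `to_slide_vector` are hypothetical, bit j-1 of an integer stands for position j (so the slides' ">>" becomes "<<" in the code), and the state before the text starts is assumed to be all zeros.

```python
def to_slide_vector(x, m):
    """Render an m-bit state with position 1 on the left, as on the slides."""
    return "".join("1" if (x >> j) & 1 else "0" for j in range(m))


def approx_columns_transposition(text, pattern, k):
    """Columns of R^k (as strings) when transposition is also allowed."""
    m = len(pattern)
    full = (1 << m) - 1
    sigma = {}
    for j, ch in enumerate(pattern):
        sigma[ch] = sigma.get(ch, 0) | (1 << j)
    prev = [((1 << d) - 1) & full for d in range(k + 1)]   # R^d_0 = 1^d 0^(m-d)
    prev2 = [0] * (k + 1)                                  # R^d_{-1} = 0^m
    prev_s = 0                                             # no character before t_1
    columns = []
    for ch in text:
        s = sigma.get(ch, 0)
        cur = [((prev[0] << 1) | 1) & s]                   # level 0: Shift-And
        for d in range(1, k + 1):
            match = (prev[d] << 1) & s
            ins = prev[d - 1]
            dele = cur[d - 1] << 1
            sub = prev[d - 1] << 1
            # (R^{d-1}_{i-2} >> 2) & Σ(t_{i-1}) & (Σ(t_i) >> 1), in slide notation
            trans = (prev2[d - 1] << 2) & prev_s & (s << 1)
            cur.append((match | ins | dele | sub | trans | 1) & full)
        prev2, prev, prev_s = prev, cur, s
        columns.append(to_slide_vector(cur[k], m))
    return columns


print(approx_columns_transposition("abcbcaccbb", "acbcb", 1))
print(approx_columns_transposition("abcbcaccbb", "acbcb", 2))
```

Compared with the version without transposition, the only extra state is the column from two steps back and the Σ vector of the previous text character.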
43
Example: T=abcbcaccbb, P=acbcb and k=1.
Σ(a)=10000, Σ(b)=00101, Σ(c)=01010, Σ(*)=00000
$R^0_{-1}$ = 00000, $R^0_0$ = 00000, $R^1_0$ = 10000
The tables $R^0[10,5]$ and $R^1[10,5]$:
i       1     2     3     4     5     6     7     8     9     10
t_i     a     b     c     b     c     a     c     c     b     b
R^0_i   10000 00000 00000 00000 00000 10000 01000 00000 00000 00000
R^1_i   11000 11100 11110 10101 11010 11000 11100 11110 10111 10001
Computing $R^1_3$ ($t_2$=‘b’, $t_3$=‘c’):
$R^1_2 \gg 1$ = 01110
Σ(c) = 01010
$(R^1_2 \gg 1) \;\&\; \Sigma(c)$ = 01010
$R^0_2$ = 00000
$R^0_2 \gg 1$ = 00000
$R^0_3 \gg 1$ = 00000
$10^{m-1}$ = 10000
Transposition: $(R^0_1 \gg 2) \;\&\; \Sigma(b) \;\&\; (\Sigma(c) \gg 1)$ = 00100 & 00101 & 00101 = 00100
$R^1_3$ = 01010 v 00000 v 00000 v 00000 v 10000 v 00100 = 11110
The original partial text is “abc” and the pattern prefix is “acb”: we transpose “bc” to be “cb”.
“a” is matched within k≦0, and the transposition adds one error.
So we can tackle this within k≦1.
44
Example: T=abcbcaccbb, P=acbcb and k=1.
Σ(a)=10000, Σ(b)=00101, Σ(c)=01010, Σ(*)=00000
$R^0_{-1}$ = 00000, $R^0_0$ = 00000, $R^1_0$ = 10000
Computing $R^1_9$ ($t_8$=‘c’, $t_9$=‘b’):
$R^1_8 \gg 1$ = 01111
Σ(b) = 00101
$(R^1_8 \gg 1) \;\&\; \Sigma(b)$ = 00101
$R^0_8$ = 00000
$R^0_8 \gg 1$ = 00000
$R^0_9 \gg 1$ = 00000
$10^{m-1}$ = 10000
Transposition: $(R^0_7 \gg 2) \;\&\; \Sigma(c) \;\&\; (\Sigma(b) \gg 1)$ = 00010 & 01010 & 00010 = 00010
$R^1_9$ = 00101 v 00000 v 00000 v 00000 v 10000 v 00010 = 10111
The original partial text is “accb” (T6,9) and the pattern prefix is “acbc”: we transpose “cb” to be “bc”.
“ac” is matched within k≦0, and the transposition adds one error.
So we can tackle this within k≦1.
45
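Example: T=abcbcaccbb, P=acbcb and k=2.
Σ(a)=10000, Σ(b)=00101, Σ(c)=01010, Σ(*)=00000
$R^1_{-1}$ = 00000, $R^1_0$ = 10000, $R^2_0$ = 11000
The tables $R^1[10,5]$ and $R^2[10,5]$:
i       1     2     3     4     5     6     7     8     9     10
t_i     a     b     c     b     c     a     c     c     b     b
R^1_i   11000 11100 11110 10101 11010 11000 11100 11110 10111 10001
R^2_i   11100 11110 11111 11111 11111 11111 11110 11111 11111 11111
Computing $R^2_3$ ($t_2$=‘b’, $t_3$=‘c’):
$R^2_2 \gg 1$ = 01111
Σ(c) = 01010
$(R^2_2 \gg 1) \;\&\; \Sigma(c)$ = 01010
$R^1_2$ = 11100
$R^1_2 \gg 1$ = 01110
$R^1_3 \gg 1$ = 01111
$10^{m-1}$ = 10000
Transposition: $(R^1_1 \gg 2) \;\&\; \Sigma(b) \;\&\; (\Sigma(c) \gg 1)$ = 00110 & 00101 & 00101 = 00100
$R^2_3$ = 01010 v 11100 v 01110 v 01111 v 10000 v 00100 = 11111
The original partial text is “abc” and the pattern prefix is “acb”: we transpose “bc” to be “cb”.
The part before the transposed pair is matched within k≦1, and the transposition adds one error.
So we can tackle this within k≦2.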
46
Example: T=abcbcaccbb, P=acbcb and k=2.
Σ(a)=10000, Σ(b)=00101, Σ(c)=01010, Σ(*)=00000
$R^1_{-1}$ = 00000, $R^1_0$ = 10000, $R^2_0$ = 11000
Computing $R^2_4$ ($t_3$=‘c’, $t_4$=‘b’):
$R^2_3 \gg 1$ = 01111
Σ(b) = 00101
$(R^2_3 \gg 1) \;\&\; \Sigma(b)$ = 00101
$R^1_3$ = 11110
$R^1_3 \gg 1$ = 01111
$R^1_4 \gg 1$ = 01010
$10^{m-1}$ = 10000
Transposition: $(R^1_2 \gg 2) \;\&\; \Sigma(c) \;\&\; (\Sigma(b) \gg 1)$ = 00111 & 01010 & 00010 = 00010
$R^2_4$ = 00101 v 11110 v 01111 v 01010 v 10000 v 00010 = 11111
The original partial text is “abcb” and the pattern prefix is “acbc”: we transpose “cb” to be “bc”.
“ab” matches “ac” within k≦1, and the transposition adds one error.
So we can tackle this within k≦2.
47
Example: T=abcbcaccbb, P=acbcb and k=2.
Σ(a)=10000, Σ(b)=00101, Σ(c)=01010, Σ(*)=00000
$R^1_{-1}$ = 00000, $R^1_0$ = 10000, $R^2_0$ = 11000
Computing $R^2_5$ ($t_4$=‘b’, $t_5$=‘c’):
$R^2_4 \gg 1$ = 01111
Σ(c) = 01010
$(R^2_4 \gg 1) \;\&\; \Sigma(c)$ = 01010
$R^1_4$ = 10101
$R^1_4 \gg 1$ = 01010
$R^1_5 \gg 1$ = 01101
$10^{m-1}$ = 10000
Transposition: $(R^1_3 \gg 2) \;\&\; \Sigma(b) \;\&\; (\Sigma(c) \gg 1)$ = 00111 & 00101 & 00101 = 00101
$R^2_5$ = 01010 v 10101 v 01010 v 01101 v 10000 v 00101 = 11111
Consider the bit j=3 of the transposition term. The text ends with “bc” (T4,5) and the pattern prefix “acb” ends with “cb”: we transpose “bc” to be “cb”.
The part before the transposed pair is matched within k≦1, and the transposition adds one error.
So we can tackle this within k≦2.
48
Example: T=abcbcaccbb, P=acbcb and k=2.
Σ(a)=10000, Σ(b)=00101, Σ(c)=01010, Σ(*)=00000
$R^1_{-1}$ = 00000, $R^1_0$ = 10000, $R^2_0$ = 11000
Computing $R^2_5$ ($t_4$=‘b’, $t_5$=‘c’):
$R^2_4 \gg 1$ = 01111
Σ(c) = 01010
$(R^2_4 \gg 1) \;\&\; \Sigma(c)$ = 01010
$R^1_4$ = 10101
$R^1_4 \gg 1$ = 01010
$R^1_5 \gg 1$ = 01101
$10^{m-1}$ = 10000
Transposition: $(R^1_3 \gg 2) \;\&\; \Sigma(b) \;\&\; (\Sigma(c) \gg 1)$ = 00111 & 00101 & 00101 = 00101
$R^2_5$ = 01010 v 10101 v 01010 v 01101 v 10000 v 00101 = 11111
Consider the bit j=5 of the transposition term. The text is “abc bc” (T1,5) and the pattern is “acb cb”: we transpose the last “bc” to be “cb”.
“abc” matches “acb” within k≦1, and the transposition adds one error.
So we can tackle this within k≦2.
49
Example: T=abcbcaccbb, P=acbcb and k=2.
Σ(a)=10000, Σ(b)=00101, Σ(c)=01010, Σ(*)=00000
$R^1_{-1}$ = 00000, $R^1_0$ = 10000, $R^2_0$ = 11000
Computing $R^2_9$ ($t_8$=‘c’, $t_9$=‘b’):
$R^2_8 \gg 1$ = 01111
Σ(b) = 00101
$(R^2_8 \gg 1) \;\&\; \Sigma(b)$ = 00101
$R^1_8$ = 11110
$R^1_8 \gg 1$ = 01111
$R^1_9 \gg 1$ = 01011
$10^{m-1}$ = 10000
Transposition: $(R^1_7 \gg 2) \;\&\; \Sigma(c) \;\&\; (\Sigma(b) \gg 1)$ = 00111 & 01010 & 00010 = 00010
$R^2_9$ = 00101 v 11110 v 01111 v 01011 v 10000 v 00010 = 11111
The original partial text is “accb” (T6,9) and the pattern prefix is “acbc”: we transpose “cb” to be “bc”.
The part before the transposed pair is matched within k≦1, and the transposition adds one error.
So we can tackle this within k≦2.
50
Time complexity
$O\left(k \left\lceil \frac{m}{w} \right\rceil n\right)$
k: the error bound
m: the length of the pattern
n: the length of the text
w: the word size of the computer
51
Reference
[WM92] Sun Wu and Udi Manber, Fast text searching allowing errors, Communications of the ACM, Vol. 35, 1992, pp. 83-91
52
~ Thanks for your attention ~