Approximate Search of Regular Expressions Using Bit-Parallel Algorithms
Kristo Tammeoja, Jaak Vilo
Teooriapäevad Rõuges, 2007
Contents
- Regular expression (RE) syntax
- Glushkov’s automaton
- Existing bit-parallel algorithms
  - Exact matching
  - Approximate matching
- New feature added
  - Error-free regions
Regular expression
Syntax
- Grouping: ( , )
- Alternation: |
- Quantifiers: *, +, ?, {m,n}, {m,}
- Character classes (example: [a-z])
Matching as used in this presentation
- Regular expression: A*
- AAAAA: match
- BAAAC: no match
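The two examples suggest that a text counts as a match only when the RE covers it from start to end. Here is a minimal sketch of that reading with Python's standard `re` module, assuming the slide's RE is `A*` and the two texts are `AAAAA` and `BAAAC`. The helper name `is_match` is hypothetical.

```python
import re

def is_match(regex: str, text: str) -> bool:
    """Whole-text matching: the RE must cover the text from start to end."""
    return re.fullmatch(regex, text) is not None

for text in ("AAAAA", "BAAAC"):
    print(text, "match" if is_match("A*", text) else "no match")
```

An unanchored search (`re.search`) would report a match for BAAAC as well, which is why the distinction matters for the examples that follow.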
Regular expression, 1 error allowed
- R(E|G)(EX)*: exact matching
- 1:R(E|G)(EX)*: 1 error allowed
- 1:R<E|G>(EX)*: 1 error allowed, <E|G> is an error-free region
- 1:R(E|G)<EX>*: 1 error allowed, <EX> is an error-free region

Texts matching R(E|G)(EX)* exactly: RE, RG, REEX, RGEX, REEXEX
Texts with one error (subst., del., ins.): RR, R, REX, REGEX, REEEXEX, REERXEX
[Figure: each one-error text is marked "match" or "no match" under the patterns that allow 1 error]
Glushkov’s automaton
R ( E | G ) ( E X ) *
- Character in RE = state in automaton
- + one state for the beginning of the RE
- Transitions show which characters/positions can follow one another
- All transitions entering a state are labeled by the same character
  - for example, after reading character ‘E’ only states with label ‘E’ can be active
[Figure: Glushkov automaton for R(E|G)(EX)* with an initial state and one state per character R, E, G, E, X, built up step by step along the text prefixes R..., RE..., RG..., REE..., RGE..., RGEX..., RGEXE...]
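The construction described above can be sketched with the standard first/last/follow computation. This is an illustrative sketch, not the authors' code: the tuple encoding of the RE (`"chr"`, `"cat"`, `"alt"`, `"star"`) and the function name `glushkov` are assumptions made for this example.

```python
def glushkov(ast):
    """Return (labels, follow, final) for an RE given as nested tuples.

    State 0 is the extra state for the beginning of the RE; every
    character of the RE gets its own state, numbered left to right.
    """
    labels = [None]           # labels[s]: character that enters state s
    follow = [set()]          # follow[s]: states reachable from state s

    def visit(node):
        """Return (nullable, first positions, last positions) of node."""
        kind = node[0]
        if kind == "chr":
            labels.append(node[1])
            follow.append(set())
            p = len(labels) - 1
            return False, {p}, {p}
        if kind == "star":
            _, first, last = visit(node[1])
            for p in last:                 # loop back to the start
                follow[p] |= first
            return True, first, last
        n1, f1, l1 = visit(node[1])
        n2, f2, l2 = visit(node[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:                       # concatenation
            follow[p] |= f2
        return (n1 and n2,
                f1 | (f2 if n1 else set()),
                l2 | (l1 if n2 else set()))

    nullable, first, last = visit(ast)
    follow[0] = set(first)
    final = set(last) | ({0} if nullable else set())
    return labels, follow, final


def c(ch):
    return ("chr", ch)

# R(E|G)(EX)*
regex = ("cat", ("cat", c("R"), ("alt", c("E"), c("G"))),
         ("star", ("cat", c("E"), c("X"))))
labels, follow, final = glushkov(regex)
print(labels)   # [None, 'R', 'E', 'G', 'E', 'X']
print(follow)   # [{1}, {2, 3}, {4}, {4}, {5}, {4}]
print(final)    # {2, 3, 5}
```

Because each state is entered only by its own character, `labels` alone is enough to say which states can be active after reading a given character.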
Exact search
- Simulation of NFA = changing active states based on the character read from the text
- We use bit-vectors (one bit for each state) to hold active states
- δ(D, a)
  - D – bit-vector of active states
  - a – character read
  - Returns new bit-vector
- 2^|D| · |Σ| different sets of parameters
  - |D| – number of states in automaton
  - |Σ| – alphabet size
Exact search
“After reading character ‘E’ only states with label ‘E’ can be active”, so ...
δ(D, a) = T[D] & B[a]
- T[D] – states that can be reached from states in D by any character
- B[a] – states that can be reached by character a
Exact search
δ(D, a) = T[D] & B[a]
Example: RE AA|AB|AC, with one bit per state in the order initial, A, A, A, B, A, C

| a | B[a] |
|---|---|
| ‘A’ | 0111010 |
| ‘B’ | 0000100 |
| ‘C’ | 0000001 |

| D | T[D] |
|---|---|
| 1000000 | 0101010 |
| 0100000 | 0010000 |
| ... | ... |
| 0101010 | 0010101 |
| ... | ... |

δ(0101010, ‘A’) = T[0101010] & B[‘A’] = 0010101 & 0111010 = 0010000
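The tables of this example can be rebuilt in a few lines. This is a sketch under stated assumptions: bit-vectors are Python integers with bit s standing for state s, the helper names `mask`, `bits` and `delta` are hypothetical, and `bits` prints the initial state first so the strings line up with the slide.

```python
# Glushkov automaton for AA|AB|AC, written out by hand.
LABELS = [None, "A", "A", "A", "B", "A", "C"]     # state 0 = initial state
FOLLOW = [{1, 3, 5}, {2}, set(), {4}, set(), {6}, set()]
N = len(LABELS)

def mask(states):
    """Bit-vector (Python int) with bit s set for every state s."""
    return sum(1 << s for s in states)

def bits(d):
    """Render a bit-vector as on the slide: initial state leftmost."""
    return "".join("1" if (d >> i) & 1 else "0" for i in range(N))

# B[a]: states labeled with character a.
B = {}
for s, ch in enumerate(LABELS):
    if ch is not None:
        B[ch] = B.get(ch, 0) | (1 << s)

# T[D]: union of the follow sets of all states in D, for every possible D.
T = [0] * (1 << N)
for d in range(1, 1 << N):
    low = d & -d                                  # lowest set bit of d
    T[d] = T[d ^ low] | mask(FOLLOW[low.bit_length() - 1])

def delta(d, a):
    return T[d] & B.get(a, 0)

print(bits(B["A"]))                          # 0111010
print(bits(T[mask({0})]))                    # 0101010
print(bits(delta(mask({1, 3, 5}), "A")))     # 0010000
```

The table `T` has 2^|D| entries and `B` has |Σ| entries, instead of one entry per (D, a) pair.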
Exact search

    D ← 100..00                        // initial state active
    F ← bit-vector of final states
    For pos ∈ 1 ... n Do               // scanning text
        D ← T[D] & B[t_pos]
        If D & F ≠ 000..00 Then match
    End of For
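The loop above translates almost line by line into Python. The sketch below is an illustration under assumptions, not the authors' implementation: it uses the automaton of R(E|G)(EX)* written out by hand, stores bit-vectors as integers with bit s for state s, and the function name `search` is hypothetical. As in the pseudocode, the initial state is active only at the start, so matches are anchored at the beginning of the text.

```python
LABELS = [None, "R", "E", "G", "E", "X"]        # state 0 = initial state
FOLLOW = [{1}, {2, 3}, {4}, {4}, {5}, {4}]      # Glushkov transitions
FINAL = {2, 3, 5}
N = len(LABELS)

def mask(states):
    """Bit-vector (Python int) with bit s set for every state s."""
    return sum(1 << s for s in states)

# B[a]: states labeled with character a.
B = {}
for s, ch in enumerate(LABELS):
    if ch is not None:
        B[ch] = B.get(ch, 0) | (1 << s)

# T[D]: states reachable from any state in D, filled in increasing order of D.
T = [0] * (1 << N)
for d in range(1, 1 << N):
    low = d & -d                                # lowest set bit of d
    T[d] = T[d ^ low] | mask(FOLLOW[low.bit_length() - 1])

F = mask(FINAL)

def search(text):
    """Return the 1-based positions at which a final state is active."""
    hits = []
    D = 1                                       # only the initial state active
    for pos, c in enumerate(text, start=1):
        D = T[D] & B.get(c, 0)
        if D & F:
            hits.append(pos)
    return hits

print(search("REEXEX"))   # [2, 4, 6]
```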
Approximate search
Errors
- Insertion
- Deletion
- Substitution
Approximate search
- When searching with k errors we make k+1 replicas of the automaton, one for each error level
- Plus we need transitions for errors
[Figure: two copies of the automaton, labeled "No errors" and "Up to 1 error"; the transitions between the copies are still open, marked "?"]
Approximate search
- R0, R1 – current bit-vectors
- R0’, R1’ – bit-vectors after processing character c
R0’ = T[R0] & B[c]
R1’ = ?
Approximate search
R1’ = T[R1] & B[c] | R0 | T[R0’] | T[R0]
The four terms, in order:
- no errors: T[R1] & B[c]. Same as in exact search
- del: R0. Active states remain the same
- ins: T[R0’]. Insert new character after the current one; just one step in automaton (ε-transitions)
- subst: T[R0]. Similar to exact matching, except we don’t care about the character read (Σ-transitions)
[Figure: two copies of the automaton, "No errors" and "Up to 1 error", connected by Σ-transitions (del), ε-transitions (ins) and Σ-transitions (subst); example texts shown: EGEX (no errors), RAEGEX (del), REEX (ins), RAGEX (subst)]
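The recurrence can be tried out directly. The sketch below rests on several assumptions that are not on the slides: it matches whole texts against R(E|G)(EX)*, it generalizes from R0 and R1 to k+1 rows by applying the same formula between consecutive rows, and it seeds row i with the states reachable through up to i ε-steps. The name `approx_match` is hypothetical.

```python
LABELS = [None, "R", "E", "G", "E", "X"]        # state 0 = initial state
FOLLOW = [{1}, {2, 3}, {4}, {4}, {5}, {4}]      # Glushkov transitions
FINAL = {2, 3, 5}
N = len(LABELS)

def mask(states):
    return sum(1 << s for s in states)

B = {}
for s, ch in enumerate(LABELS):
    if ch is not None:
        B[ch] = B.get(ch, 0) | (1 << s)

T = [0] * (1 << N)
for d in range(1, 1 << N):
    low = d & -d
    T[d] = T[d ^ low] | mask(FOLLOW[low.bit_length() - 1])

F = mask(FINAL)

def approx_match(text, k=1):
    """True if the whole text matches the RE with at most k errors."""
    R = [1]                                     # row 0: initial state only
    for i in range(1, k + 1):                   # assumed seeding of row i
        R.append(R[i - 1] | T[R[i - 1]])
    for c in text:
        b = B.get(c, 0)
        new = [T[R[0]] & b]                     # R0' = T[R0] & B[c]
        for i in range(1, k + 1):
            new.append((T[R[i]] & b)            # no errors
                       | R[i - 1]               # del: states stay the same
                       | T[new[i - 1]]          # ins: one epsilon step
                       | T[R[i - 1]])           # subst: any character
        R = new
    return (R[k] & F) != 0

for text in ("REEX", "REX", "RR", "XX"):
    print(text, approx_match(text, k=0), approx_match(text, k=1))
```

With k=0 only the first line of the loop is active, so the function falls back to exact matching.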
Error-free regions
Approximate search
R1’ = T[R1] & B[c] | R0 | T[R0’] | T[R0]
With no-error regions
R1’ = T[R1] & B[c] | R0 & I | Te[R0’] | Te[R0]
The terms correspond, in order, to: no errors, del, ins, subst
Error-free regions
Deletion from text
- Make two copies of ‘C’
- Those characters that error-free regions can end with must be duplicated

| Text | A(BC)+B | A<(BC)+>B | A<BC>+B |
|---|---|---|---|
| AXBCBCB | match | match | match |
| ABXCBCB | match | no match | no match |
| ABCBCXB | match | match | match |
| ABCXBCB | match | no match | match |
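Error-free regions
[Figure: automaton for A(BC)+B with states A, B, C, B, and automaton for A<(BC)+>B with states A, B, C1, C2, B, where the error-free region is marked and C is duplicated into C1 and C2]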
Error-free regions
A<(BC)+>B
[Figure: two copies of the automaton with states A, B, C1, C2, B and the error-free region marked; the error transitions between the copies are added one error type at a time]
Without error-free regions: R1’ = T[R1] & B[c] | R0 | T[R0’] | T[R0]
With error-free regions: R1’ = T[R1] & B[c] | R0 & I | Te[R0’] | Te[R0]
- no errors: T[R1] & B[c]
- del: R0 & I, with I = 1 1 0 0 1 1
- ins: Te[R0’] (ε-transitions)
- subst: Te[R0] (Σ-transitions)
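The modified recurrence can be checked against the A<(BC)+>B column of the match table. The sketch below fills in details the slides leave open, so treat them as assumptions: I is read with the initial state first (states initial, A, B, C1, C2, B), and Te is taken to be T restricted to target states outside the error-free region. The names `OUTSIDE`, `TE` and `match_one_error` are hypothetical.

```python
# A<(BC)+>B with C duplicated: C1 loops back to B, C2 leaves the region.
LABELS = [None, "A", "B", "C", "C", "B"]        # states 0..5, 0 = initial
FOLLOW = [{1}, {2}, {3, 4}, {2}, {5}, set()]
FINAL = {5}
N = len(LABELS)

def mask(states):
    return sum(1 << s for s in states)

B = {}
for s, ch in enumerate(LABELS):
    if ch is not None:
        B[ch] = B.get(ch, 0) | (1 << s)

T = [0] * (1 << N)
for d in range(1, 1 << N):
    low = d & -d
    T[d] = T[d ^ low] | mask(FOLLOW[low.bit_length() - 1])

F = mask(FINAL)
I = mask({0, 1, 4, 5})          # slide: I = 1 1 0 0 1 1 (initial state first)
OUTSIDE = mask({0, 1, 5})       # assumption: states not inside <(BC)+>
TE = [t & OUTSIDE for t in T]   # assumption: error steps may not enter the region

def match_one_error(text):
    """Whole-text match with at most 1 error, none inside the region."""
    R0 = 1
    R1 = R0 | TE[R0]            # assumed seeding: one epsilon step
    for c in text:
        b = B.get(c, 0)
        new_R0 = T[R0] & b
        R1 = (T[R1] & b) | (R0 & I) | TE[new_R0] | TE[R0]
        R0 = new_R0
    return (R1 & F) != 0

for text in ("AXBCBCB", "ABXCBCB", "ABCBCXB", "ABCXBCB"):
    print(text, "match" if match_one_error(text) else "no match")
```

Under these assumptions the four texts come out as match, no match, match, no match, the same as the A<(BC)+>B column. The duplicated C is what lets ABCBCXB match: only C2, the copy that leaves the region, has its bit set in I.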