Amihood Amir, Bar-Ilan University and Georgia Tech, UWSL 2006


Amihood Amir

Bar-Ilan University and Georgia Tech

UWSL 2006


Issues of Concern:

- Local errors:
  - Occlusion
  - Transmission and resolution
  - Details
- Scaling
- Rotation
- Integration of all above issues


It seems daunting, but…


CPM 2003: Morelia, Mexico


Some History… String Matching, motivated by text editing.

Text $T = t_0 t_1 \cdots t_n$ and pattern $P = p_0 p_1 \cdots p_m$ over alphabet $\Sigma$.
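The slide leaves the one-dimensional algorithm unnamed. For reference, here is a minimal sketch of exact string matching using Knuth-Morris-Pratt (our choice, not the slide's). The function names are hypothetical, and the code accepts any sequences whose elements compare with `==`.

```python
def kmp_failure(p):
    """fail[i] = length of the longest proper border of p[:i + 1]."""
    fail = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(t, p):
    """Return every index i with t[i:i + len(p)] == p (p must be non-empty)."""
    fail = kmp_failure(p)
    hits, k = [], 0
    for i, c in enumerate(t):
        while k > 0 and c != p[k]:
            k = fail[k - 1]          # fall back to the next shorter border
        if c == p[k]:
            k += 1
        if k == len(p):              # a full match ends at position i
            hits.append(i - len(p) + 1)
            k = fail[k - 1]
    return hits


print(kmp_search("abababa", "aba"))  # [0, 2, 4]
```

Overlapping occurrences are reported, because after a full match the search falls back to the longest border instead of restarting.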


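Historic Two Dimensional Model:

$$T = \begin{pmatrix} t_{0,0} & t_{0,1} & \cdots & t_{0,n} \\ t_{1,0} & t_{1,1} & \cdots & t_{1,n} \\ \vdots & & & \vdots \\ t_{n,0} & t_{n,1} & \cdots & t_{n,n} \end{pmatrix} \qquad P = \begin{pmatrix} p_{0,0} & p_{0,1} & \cdots & p_{0,m} \\ p_{1,0} & p_{1,1} & \cdots & p_{1,m} \\ \vdots & & & \vdots \\ p_{m,0} & p_{m,1} & \cdots & p_{m,m} \end{pmatrix}$$

The pattern occurs at text location $(i,j)$ if $t_{i+k,\,j+l} = p_{k,l}$ for all $k, l = 0, \ldots, m$.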
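A minimal sketch of the definition above, assuming the text and pattern are given as lists of equal-length rows (strings or lists). `occurrences_2d` is a hypothetical name. It checks every location directly in $O(n^2 m^2)$ time, so it serves only as a reference for testing faster methods. It uses 0-based Python sizes rather than the slide's inclusive index ranges.

```python
def occurrences_2d(text, pattern):
    """All (i, j) with text[i + k][j + l] == pattern[k][l] for every pattern cell (k, l)."""
    n_rows, n_cols = len(text), len(text[0])
    m_rows, m_cols = len(pattern), len(pattern[0])
    hits = []
    for i in range(n_rows - m_rows + 1):          # top-left corners that keep the
        for j in range(n_cols - m_cols + 1):      # whole pattern inside the text
            if all(text[i + k][j + l] == pattern[k][l]
                   for k in range(m_rows) for l in range(m_cols)):
                hits.append((i, j))
    return hits


print(occurrences_2d(["abab", "baba", "abab"], ["ab", "ba"]))  # [(0, 0), (0, 2), (1, 1)]
```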


Bird-Baker Algorithm (1976)

Time: $O(n^2)$ for bounded fixed alphabets; $O(n^2 \log m)$ for infinite alphabets.

Technique: linearization.


Linearization: concatenate the rows of the text (or pattern) and use string matching tools.

In this case, the Aho and Corasick algorithm.


Find all pattern rows, then align them.
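A minimal sketch of the Bird-Baker idea (name the pattern rows, then align them column by column), under our own simplifications. A dictionary lookup of every pattern-width text window stands in for Aho-Corasick, and KMP (our choice here) does the column alignment. The function names are hypothetical, and the sketch does not reach the $O(n^2)$ bound.

```python
def kmp_search(t, p):
    """All i with t[i:i + len(p)] == p, for non-empty p (Knuth-Morris-Pratt)."""
    fail, k = [0] * len(p), 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    hits, k = [], 0
    for i, c in enumerate(t):
        while k > 0 and c != p[k]:
            k = fail[k - 1]
        if c == p[k]:
            k += 1
        if k == len(p):
            hits.append(i - len(p) + 1)
            k = fail[k - 1]
    return hits


def bird_baker_sketch(text, pattern):
    """Return all (i, j) where the pattern (a list of equal-length rows) occurs in the text."""
    width = len(pattern[0])
    # Step 1: give each distinct pattern row a name; identical rows share a name.
    names = {}
    pattern_column = [names.setdefault(tuple(row), len(names)) for row in pattern]
    # Step 2: name every text window of the pattern's width (-1 if it is no pattern row).
    # Bird-Baker uses Aho-Corasick here; this lookup costs O(width) per window.
    n_rows, n_cols = len(text), len(text[0])
    named = [[names.get(tuple(text[r][c:c + width]), -1) for c in range(n_cols - width + 1)]
             for r in range(n_rows)]
    # Step 3: align the rows by 1D matching down each column of names.
    hits = []
    for c in range(n_cols - width + 1):
        column = [named[r][c] for r in range(n_rows)]
        hits.extend((r, c) for r in kmp_search(column, pattern_column))
    return sorted(hits)


print(bird_baker_sketch(["abab", "baba", "abab"], ["ab", "ba"]))  # [(0, 0), (0, 2), (1, 1)]
```

Naming the rows turns the 2D problem into 1D matching of name sequences, which is the linearization the slide refers to.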


Another linearization: pad each pattern row of length $m$ with $n - m$ “don’t cares”, then concatenate the rows.

Time: $O(n^2 \log m)$, using Fischer-Paterson (1972).
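A minimal sketch of the don't-care linearization, assuming every text row has length $n$ and using `None` as the don't-care symbol (both our choices; `linearized_match` is a hypothetical name). For clarity it compares symbols directly instead of using the Fischer-Paterson convolution, so it shows the reduction, not the $O(n^2 \log m)$ bound. Matches that would wrap around a row boundary must be discarded.

```python
def linearized_match(text, pattern, dont_care=None):
    """2D occurrences found by 1D matching with don't cares (direct comparison, no FFT)."""
    n = len(text[0])                     # every text row has length n
    m = len(pattern[0])                  # every pattern row has length m <= n
    flat_text = [ch for row in text for ch in row]
    # Concatenate the pattern rows, separated by n - m don't cares.
    flat_pat = []
    for k, row in enumerate(pattern):
        if k > 0:
            flat_pat.extend([dont_care] * (n - m))
        flat_pat.extend(row)
    hits = []
    for q in range(len(flat_text) - len(flat_pat) + 1):
        if all(p is dont_care or p == flat_text[q + s] for s, p in enumerate(flat_pat)):
            i, j = divmod(q, n)
            if j <= n - m:               # otherwise the match wraps across a row boundary
                hits.append((i, j))
    return hits


print(linearized_match(["abab", "baba", "abab"], ["ab", "ba"]))  # [(0, 0), (0, 2), (1, 1)]
```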


Advantages and Disadvantages of Model

Pros: Can use known techniques.

Cons:
- Complexity degradation (e.g. extra log factor in exact matching).
- Inherent difficulties in definitions (will be addressed later).


First Truly 2D Algorithm: The Dueling Method (A-Benson-Farach 1991)

Idea: Assume that all potential pattern “starts” agree on their overlap, i.e. all of them want to see the same symbol in every text location.


Dueling Method…

Time for checking every text element’s correctness: linear.

Every candidate with an incorrect element in its range is eliminated.

Method: the “wave”.

Total time: $O(n^2)$.


Dueling Method…

How do we arrange for candidates to agree on overlap? Duel!

[Figure: two overlapping candidates on a grid of A’s containing a single V.]

When there is a conflict between two candidates, a single text check eliminates at least one candidate.

The text location can be pre-computed because of transitivity.

The dueling phase is thus linear time.
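A minimal sketch of a single duel between two candidate locations whose pattern copies overlap. A witness for an offset $(a, b)$ is a pattern position where the pattern disagrees with its own copy shifted by $(a, b)$; probing the text there eliminates at least one candidate. `witness` and `duel` are hypothetical names, and the witness is computed naively on demand rather than taken from a precomputed table.

```python
def witness(pattern, a, b):
    """A position (r, c) where pattern[r][c] != pattern[r - a][c - b], or None.

    (a, b) is the offset of the second pattern copy relative to the first, with a >= 0.
    """
    m_rows, m_cols = len(pattern), len(pattern[0])
    for r in range(a, m_rows):
        for c in range(max(0, b), min(m_cols, m_cols + b)):
            if pattern[r][c] != pattern[r - a][c - b]:
                return (r, c)
    return None


def duel(text, pattern, cand1, cand2):
    """Return the surviving candidate(s) after probing a single text position."""
    (i1, j1), (i2, j2) = sorted([cand1, cand2])   # guarantees a = i2 - i1 >= 0
    w = witness(pattern, i2 - i1, j2 - j1)
    if w is None:
        return [(i1, j1), (i2, j2)]              # the copies agree on their overlap
    r, c = w
    # Both candidates cover text[i1 + r][j1 + c] but expect different symbols there.
    if text[i1 + r][j1 + c] == pattern[r][c]:
        return [(i1, j1)]                        # the second candidate is wrong
    return [(i2, j2)]                            # the first candidate is wrong


print(duel(["aab", "abb"], ["aa", "ab"], (0, 1), (0, 0)))  # [(0, 0)]
```

The survivor of a duel is not yet verified; it only agrees with the text at the probed cell, which is why the checking phase follows.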


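Discrete Scaling (A-Landau-Vishkin 1990)

In our limited model, scaling means “blowing up” a symbol.

Example: scaling a symbol A by 3 means a 3×3 matrix:

$$\begin{matrix} A & A & A \\ A & A & A \\ A & A & A \end{matrix}$$

Scaling the matrix

$$\begin{matrix} X & X & X \\ X & O & X \\ X & X & X \end{matrix}$$

by 2 gives:

$$\begin{matrix} X & X & X & X & X & X \\ X & X & X & X & X & X \\ X & X & O & O & X & X \\ X & X & O & O & X & X \\ X & X & X & X & X & X \\ X & X & X & X & X & X \end{matrix}$$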
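Discrete scaling as defined above fits in one function. This is a sketch; `scale_discrete` is a hypothetical name, and the scale factor is assumed to be a positive integer.

```python
def scale_discrete(matrix, k):
    """Replace every symbol by a k x k block of copies (k a positive integer)."""
    return [[row[c // k] for c in range(len(row) * k)]   # each symbol repeated k times
            for row in matrix for _ in range(k)]          # each row repeated k times


for row in scale_discrete(["XXX", "XOX", "XXX"], 2):
    print(" ".join(row))
```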


Scaled Occurrences of Pattern in Text:

[Figure: a text containing occurrences of the pattern $\begin{matrix} X & X & X \\ X & O & X \\ X & X & X \end{matrix}$ at scale 1, scale 2 and scale 3.]


Discrete Scaling Algorithms

A-Landau-Vishkin 90: Can find all discrete scales of the pattern in linear time (alphabet dependent).

A-Calinescu 94: Alphabet-independent and dictionary linear-time discrete scaling algorithm.


Tools used:

- For comparing substrings in constant time: suffix trees and LCA (Weiner 1973, Harel-Tarjan 1984), or suffix arrays and LCP (Kärkkäinen-Sanders 2003).
- For computing the number of sub-row repetitions in constant time: range-minimum queries (Gabow-Bentley-Tarjan 1984).
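A minimal sketch of substring comparison through a suffix array and LCP array, under assumptions of our own. The suffix array is built naively by sorting, not with the Kärkkäinen-Sanders algorithm. The LCP array uses Kasai et al.'s algorithm, which the slide does not name. The LCP of two arbitrary suffixes is the minimum of the LCP array between their ranks, taken here over a slice instead of an $O(1)$ range-minimum structure. To compare sub-rows, one would apply this to the concatenated rows with separators. All names are hypothetical.

```python
def suffix_array(s):
    """Naive construction by sorting the suffixes (not the linear-time algorithm)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """Kasai et al.: lcp[r] = LCP of the suffixes at ranks r - 1 and r (lcp[0] = 0)."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp, h = [0] * n, 0
    for i in range(n):                 # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]        # suffix just before s[i:] in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1                 # the next suffix keeps at least h - 1
        else:
            h = 0
    return rank, lcp


def suffix_lcp(s, rank, lcp, i, j):
    """LCP of s[i:] and s[j:]: the minimum of lcp strictly after the smaller rank."""
    if i == j:
        return len(s) - i
    a, b = sorted((rank[i], rank[j]))
    return min(lcp[a + 1:b + 1])       # a range-minimum structure makes this O(1)


s = "banana"
sa = suffix_array(s)
rank, lcp = lcp_array(s, sa)
# s[1:4] == s[3:6] ("ana") because the two suffixes share a prefix of length >= 3.
print(suffix_lcp(s, rank, lcp, 1, 3) >= 3)  # True
```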


How is it used?

An LCA query tells us that the orange line occurs here.

How many times does this line repeat?

How is this done?


Construct an array of numbers where every location holds the length of the LCP of this row and the next, e.g.

$k, 0, k, k, k, k, k, k, k, k, k, 0, 0$

To make sure that the orange line appears in every row of this range, the minimum number in the range has to be at least $k$.
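A minimal sketch of the repetition count, assuming `lcp_next[r]` already holds the LCP of row $r$ and row $r+1$ (computed here by direct comparison rather than with suffix trees). A sparse table, one standard range-minimum structure and our choice here, answers each minimum query in $O(1)$. A binary search then finds how many consecutive rows share their first $k$ symbols. All names are hypothetical.

```python
class SparseTableMin:
    """Range-minimum queries in O(1) after O(N log N) preprocessing."""

    def __init__(self, values):
        self.table = [list(values)]          # table[t][i] = min(values[i : i + 2**t])
        span = 1
        while 2 * span <= len(values):
            prev = self.table[-1]
            self.table.append([min(prev[i], prev[i + span])
                               for i in range(len(values) - 2 * span + 1)])
            span *= 2

    def query(self, lo, hi):
        """Minimum of values[lo..hi], both ends inclusive, lo <= hi."""
        t = (hi - lo + 1).bit_length() - 1   # largest power of two that fits
        row = self.table[t]
        return min(row[lo], row[hi - (1 << t) + 1])


def lcp_len(u, v):
    """Length of the longest common prefix of two rows (direct comparison)."""
    n = 0
    while n < min(len(u), len(v)) and u[n] == v[n]:
        n += 1
    return n


def repetitions(rmq, n_rows, start, k):
    """How many consecutive rows, from `start` on, share their first k symbols with row `start`."""
    lo, hi = start, n_rows - 1               # the last row of the run lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if rmq.query(start, mid - 1) >= k:   # rows start..mid all agree on k symbols
            lo = mid
        else:
            hi = mid - 1
    return lo - start + 1


rows = ["abcx", "abcy", "abzz", "abcq"]
lcp_next = [lcp_len(rows[r], rows[r + 1]) for r in range(len(rows) - 1)]   # [3, 2, 2]
rmq = SparseTableMin(lcp_next)
print(repetitions(rmq, len(rows), 0, 3))   # 2: only rows 0 and 1 start with "abc"
```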


How do we know what scale the orange line has? Run-length compression.

Find the symbol part, then the repetition factor.

This idea led to the compressed matching paradigm…

AAABBCCCCDAAAABBBBBBCA

Symbols: A B C D A B C A
Repetition factors: 3 2 4 1 4 6 1 1
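The split into symbols and repetition factors is a one-liner with `itertools.groupby`. `run_length_encode` is a hypothetical name.

```python
from itertools import groupby


def run_length_encode(s):
    """Split s into maximal runs: [(symbol, repetition factor), ...]."""
    return [(symbol, len(list(run))) for symbol, run in groupby(s)]


runs = run_length_encode("AAABBCCCCDAAAABBBBBBCA")
symbols = "".join(symbol for symbol, _ in runs)     # "ABCDABCA"
factors = [count for _, count in runs]              # [3, 2, 4, 1, 4, 6, 1, 1]
print(symbols, factors)
```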


Compressed Matching

Suppose the text (and pattern?) are compressed. Examples: run-length compression of rows (fax); LZ78 of rows (GIF).

Find the pattern in the text without decompressing.

A-Benson 92, A-Benson-Farach 94, A-Landau-Sokol 03 (×2)

This led to a decade of work in the stringology and data compression community.
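A one-dimensional sketch of matching directly on run-length-encoded data. The slides' setting compresses the rows of a 2D text; this only illustrates the core idea. Because runs are maximal, a pattern with at least two runs can occur only where its interior runs equal text runs exactly, its first run is a suffix of a text run, and its last run is a prefix of one. The function names are hypothetical.

```python
from itertools import groupby


def rle(s):
    """Maximal runs of s as (symbol, length) pairs."""
    return [(c, len(list(g))) for c, g in groupby(s)]


def rle_match(text_runs, pat_runs):
    """Start offsets (in the uncompressed text) of a non-empty pattern, computed run by run."""
    offsets, pos = [], 0
    for _, length in text_runs:            # uncompressed offset where each run begins
        offsets.append(pos)
        pos += length
    k = len(pat_runs)
    starts = []
    if k == 1:                             # c^l fits anywhere inside a run c^L with L >= l
        c, l = pat_runs[0]
        for (tc, tl), off in zip(text_runs, offsets):
            if tc == c and tl >= l:
                starts.extend(range(off, off + tl - l + 1))
        return starts
    for i in range(len(text_runs) - k + 1):
        window = text_runs[i:i + k]
        head_ok = window[0][0] == pat_runs[0][0] and window[0][1] >= pat_runs[0][1]
        tail_ok = window[-1][0] == pat_runs[-1][0] and window[-1][1] >= pat_runs[-1][1]
        if head_ok and tail_ok and window[1:-1] == pat_runs[1:-1]:
            # The pattern's first run is a suffix of text run i.
            starts.append(offsets[i] + window[0][1] - pat_runs[0][1])
    return starts


print(rle_match(rle("AAABBCCCCDAAAABBBBBBCA"), rle("ABB")))  # [2, 13]
```

Each candidate alignment is checked in time proportional to the number of pattern runs, independent of the run lengths.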


Compressed Matching (very partial list from citeseer…)

Pattern Matching in Compressed Raster Images -Pajarola,Widmayer (1996)

Direct Pattern Matching on Compressed Text - de Moura, Navarro, Ziviani (1998)

A General Practical Approach to Pattern Matching over.. - Navarro, Raffinot (1998)

Randomized Efficient Algorithms for Compressed Strings: the.. - Gasieniec et al. (1996)

Approximate String Matching over Ziv-Lempel Compressed Text - Kärkkäinen, Navarro, Ukkonen (2000)

Pattern Matching Machine for Text Compressed Using Finite State.. - Takeda (1997)

…  


Model Deficiencies

How do we scale to non-discrete sizes (e.g. 1.35)?

How do we model rotations?


A Model of Digitization (Landau-Vishkin 1994)

“Real-life” resolution is fine enough to be assumed continuous.

This is dealt with by a discrete sampling of space, done by, e.g., the camera.

[Figure: “real life” versus the digitized sample.]


Rotation (Fredriksson-Ukkonen 1998)

Consider the text as a grid of pixels, each having a color. Consider the pattern as an m × m grid of pixels with colors. Assume the center of every pattern pixel has a “hole”. Lay the pattern grid on the text, with the center declared the “rotation pivot”.


[Figure: a 7×7 text with grid lines numbered 0 to 7; pixel labels run from T[1,1] to T[7,7], and T[5,4] is also labeled.]


[Figure: a 4×4 pattern with grid lines numbered 0 to 4; its center is the rotation pivot.]


[Figure: the 4×4 pattern over an 8×8 text (grid lines numbered 0 to 8) in location ((3,4), 45°).]
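A minimal sketch of checking one rotated occurrence in this model. The conventions are our assumptions, not the slides': 0-based text pixel `text[r][c]` covers the unit square $[c, c+1) \times [r, r+1)$, the pivot is a text grid point given as $(x, y)$, and the angle is in degrees, measured from the $x$ axis toward the $y$ axis. Each pattern pixel's hole is rotated about the pivot and compared with the text pixel beneath it. `rotated_match` is a hypothetical name.

```python
import math


def rotated_match(text, pattern, pivot, angle_deg):
    """Does the m x m pattern, rotated by angle_deg about `pivot`, match the text under its holes?"""
    m = len(pattern)
    px, py = pivot
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    for r in range(m):
        for c in range(m):
            # Hole of pattern pixel (r, c), relative to the pattern center (the pivot).
            dx, dy = c + 0.5 - m / 2, r + 0.5 - m / 2
            # Rotate the hole about the pivot; the text pixel containing it supplies the color.
            x = px + dx * cos_t - dy * sin_t
            y = py + dx * sin_t + dy * cos_t
            tr, tc = math.floor(y), math.floor(x)
            if not (0 <= tr < len(text) and 0 <= tc < len(text[0])):
                return False                  # the hole falls outside the text
            if text[tr][tc] != pattern[r][c]:
                return False
    return True
```

With pivot (2, 2) and angle 0 on a 4×4 text, every hole lands in the pixel with the same indices, so the function reduces to plain equality of the two grids.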


Rotated Matching Algorithms

Fredriksson-Ukkonen 1998: Filter. Good expected time. $O(n^2 m^5)$ worst case.

Fredriksson-Navarro-Ukkonen 2000: $O(n^2 m^3)$.

A-Butman-Crochemore-Landau-Schaps 2004: Proved that output size is $O(n^2 m^3)$.

A-Kapah-Tsur 2004: $O(n^2 m^2)$.


A Taste of Handling Rotations

Naïve Idea: Try all possible rotated patterns. Examples:

[Figure: a 4×4 pattern with pixels numbered 1 to 16 (rows, top to bottom: 13 14 15 16 / 9 10 11 12 / 5 6 7 8 / 1 2 3 4), shown in its original orientation and after rotations by 19°, 21° and 26°.]


Proposed Solution

Every rotated pattern can be found in the text using FFT in time $O(n^2 \log m)$.

If there are $N$ rotated patterns, the total time is $O(N n^2 \log m)$. What is $N$?


Upper Bound

There are $m^2$ pixels.

Each pixel center crosses at most $4m$ grid lines.

Therefore there are $O(m^3)$ different rotated patterns.
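A sketch that enumerates the critical angles at which some pattern-pixel center crosses a text grid line. Between two consecutive critical angles no center changes pixel, so the rotated pattern cannot change, and the number of distinct critical angles bounds the number of different rotated patterns. Assumptions (ours): the pivot is a text grid point, so relative to it the grid lines are $x = k$ and $y = k$ for integers $k$, which suits even $m$; angles closer than $10^{-9}$ are merged. `critical_angles` is a hypothetical name.

```python
import math


def critical_angles(m):
    """Sorted angles in [0, 2*pi) at which some pattern-pixel center crosses a text grid line."""
    two_pi = 2 * math.pi
    angles = []
    for r in range(m):
        for c in range(m):
            # Pixel center relative to the pivot, in polar form (rad, phi).
            x, y = c + 0.5 - m / 2, r + 0.5 - m / 2
            rad, phi = math.hypot(x, y), math.atan2(y, x)
            if rad == 0:
                continue                       # a center on the pivot never moves
            # Rotated center: (rad*cos(theta + phi), rad*sin(theta + phi)).
            for k in range(-math.floor(rad), math.floor(rad) + 1):
                ratio = max(-1.0, min(1.0, k / rad))
                a, s = math.acos(ratio), math.asin(ratio)
                # x-coordinate equals k: theta + phi = +-a; y-coordinate equals k: s or pi - s.
                for theta in (a - phi, -a - phi, s - phi, math.pi - s - phi):
                    angles.append(theta % two_pi)
    # Merge angles that agree up to rounding, including across the 0 / 2*pi seam.
    angles.sort()
    distinct = []
    for t in angles:
        if not distinct or t - distinct[-1] > 1e-9:
            distinct.append(t)
    if len(distinct) > 1 and distinct[0] + two_pi - distinct[-1] <= 1e-9:
        distinct.pop()
    return distinct


for m in (2, 4, 8):
    print(m, len(critical_angles(m)))
```

For $m = 2$ all four centers hit the axes together, at 45°, 135°, 225° and 315°, so only four critical angles remain.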


Lower Bound

Could many points cross a grid line together?

We will show: Lower bound: $\Omega(m^3)$.

Restriction: We consider only points in the set $P$ defined as follows.


Our Subset of Consideration:

$P$ is a subset of pattern coordinates such that:
1) The coordinates are in quadrant I.
2) The coordinates are only the points $(x,y)$ where $x$ and $y$ are co-prime.


Key Lemma (A-Butman-Crochemore-Landau-Schaps 2003)

For $X_1, X_2 \in P$ with $X_1 \neq X_2$, it is impossible that $X_1$ and $X_2$ cross a grid line at the same rotation angle.

[Figure: the Key Lemma illustrated on a grid with coordinates from -8 to 8, showing the origin O and points labeled X1, Y1, X2, Y2 and Z.]


How does it help?

Theorem (Geometry): $|P| = \frac{6 m^2}{\pi^2} + O(m \log m)$, i.e. $|P| = \Theta(m^2)$.
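A quick numerical check of the theorem, assuming $P$ is the set of co-prime points with $1 \le x, y \le m$ (our reading of "quadrant I" for a pattern of size $m$). `coprime_points` is a hypothetical name.

```python
import math


def coprime_points(m):
    """Points (x, y) with 1 <= x, y <= m and gcd(x, y) == 1 (our reading of P)."""
    return [(x, y) for x in range(1, m + 1) for y in range(1, m + 1) if math.gcd(x, y) == 1]


m = 300
print(len(coprime_points(m)) / m ** 2, 6 / math.pi ** 2)   # the two values are close
```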


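Consider $P_{m/4} = \{(x,y) \mid (x,y) \in P,\ 0 \le x, y \le m/4\}$ and the set $P \setminus P_{m/4}$.

Schematically: the shaded area.

In the shaded area there are at most $\frac{m^2}{16}$ points.

So in $P \setminus P_{m/4}$ there are at least $\frac{6m^2}{\pi^2} - \frac{m^2}{16}$ points, i.e. $\frac{(96 - \pi^2)\, m^2}{16 \pi^2}$ points.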
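A numerical check of this counting step under the same assumption ($P$ is the set of co-prime points with $1 \le x, y \le m$). The slide's bound drops the $O(m \log m)$ error term, so it is compared here only for a moderately large $m$. Function names are hypothetical.

```python
import math


def outside_corner_count(m):
    """|P minus P_{m/4}|, with P = co-prime points in 1 <= x, y <= m (our assumption)."""
    count = 0
    for x in range(1, m + 1):
        for y in range(1, m + 1):
            in_corner = x <= m / 4 and y <= m / 4       # the shaded area P_{m/4}
            if math.gcd(x, y) == 1 and not in_corner:
                count += 1
    return count


def slide_bound(m):
    """(96 - pi^2) m^2 / (16 pi^2): the slide's count, which drops lower-order terms."""
    return (96 - math.pi ** 2) * m ** 2 / (16 * math.pi ** 2)


print(outside_corner_count(400), round(slide_bound(400)))
```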


Each of the $\Omega(m^2)$ points in $P \setminus P_{m/4}$ (the yellow area) lies at distance more than $m/4$ from the pivot, so it crosses the grid $\Omega(m)$ times, and no two of them cross together.

Conclude: There are $\Omega(m^3)$ different rotated patterns.


Real Scaled Matching (A-Butman-Lewenstein-Porat 2003, A-Chencinsky 2006)

Assume the text and pattern grids are the unit scale.

A scale-up of the pattern enlarges its grid.

The center of each pixel of the underlying unit grid takes the color of the scaled pattern pixel under it.
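A minimal sketch of this model, assuming a square pattern, a real factor $r \ge 1$, and a scaled pattern occupying $[0, rm)^2$ on the unit grid. Each unit pixel whose center $i + 0.5$ lies inside takes the color of scaled pattern pixel $\lfloor (i + 0.5)/r \rfloor$. `scale_real` is a hypothetical name, and a center falling exactly on a scaled-pixel boundary is resolved by rounding down, which is another assumption.

```python
import math


def scale_real(pattern, r):
    """Sample an m x m pattern, scaled by a real factor r >= 1, at the unit-grid pixel centers."""
    m = len(pattern)
    size = math.ceil(r * m - 0.5)                              # centers i + 0.5 below r * m
    src = [math.floor((i + 0.5) / r) for i in range(size)]     # scaled pixel under each center
    return [[pattern[a][b] for b in src] for a in src]


for row in scale_real(["ab", "cd"], 1.6):
    print(" ".join(row))
```

For an integer factor this coincides with discrete scaling, since $\lfloor (i + 0.5)/k \rfloor = \lfloor i/k \rfloor$.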


Pattern

Pattern scaled continuously to 1.6

Pattern scaled continuously to 1.6 with superimposed unit grid

Pattern discretely scaled to 1.6


Does this work?

We tried it on “Lenna”…


Scale 1.3 Lenna

Original Lenna

Scale 2 Lenna


Lenna Today


Algorithm’s running time for text size n × n and pattern size m × m: $O(n^2 m^2)$.


The Future?

- Faster rotation: did not utilize the pattern, did not utilize neighboring information.
- Faster scaling.
- The holy grail: INTEGRATION.
- Compressed matching: lossy compressions.


THANK YOU