
Post on 26-Dec-2015


SMAWK

REVISE

Global alignment (Revise)

[Figure: alignment graph (grid) with numbered vertices]

Alignment graph for S = aacgacga, T = ctacgaga

Complexity: O(n²)

V(i,j) = max {
  V(i-1,j-1) + δ(S[i], T[j]),
  V(i-1,j) + δ(S[i], -),
  V(i,j-1) + δ(-, T[j])
}

where δ(x, y) is the score of aligning x with y, and - denotes a gap.
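The recurrence above can be sketched in a few lines of Python. The slides do not state the scoring function, so the values used here (match +1, mismatch -1, gap -1) and the function name `global_alignment_score` are assumptions made only for illustration.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Fill the O(n^2) table V and return the global alignment score V[n][m].

    The scoring values are illustrative assumptions, not taken from the slides.
    """
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # First column / first row: a prefix aligned against gaps only.
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + gap
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(
                V[i - 1][j - 1] + sub,  # align S[i] with T[j]
                V[i - 1][j] + gap,      # align S[i] with a gap
                V[i][j - 1] + gap,      # align a gap with T[j]
            )
    return V[n][m]


print(global_alignment_score("acg", "ag"))  # 1 under the assumed scores
```

Every cell is computed once from three neighbours, which is where the O(n²) bound comes from.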

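DIST and OUT matrix (Revise)

[Figure: block of the alignment graph with input border I and output border O, border cells numbered 0 to 5]

Block: sub-sequences “acg”, “ag”

DIST matrix (rows: input border cells; columns: output border cells; △: no path)

        0    1    2    3    4    5
I0      0   -1   -2   -3    △    △
I1     -1   -1   -2   -1   -3    △
I2     -2    0    0    1   -1   -3
I3      △   -2   -2    0   -2   -2
I4      △    △   -2    0   -1   -1
I5      △    △    △   -2   -1    0

I (input borders): I0=1, I1=2, I2=3, I3=2, I4=1, I5=3

OUT matrix: OUT[i,j] = I[i] + DIST[i,j]

        0    1    2    3    4    5
I0      1    0   -1   -2   -∞   -∞
I1      1    1    0    1   -1   -∞
I2      1    3    3    4    2    0
I3    -12    0    0    2    0    0
I4    -13  -13   -1    1    0    0
I5    -14  -14  -14    1    2    3

O (output borders) = column maxima of OUT

O0   O1   O2   O3   O4   O5
 1    3    3    4    2    3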
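As a small check of the numbers on this slide, the following Python sketch rebuilds OUT from DIST and I and takes the column maxima. It is a simplification: every "no path" entry (△) is modelled as negative infinity, whereas the slide pads some of them with large negative finite values; the column maxima are the same either way. The helper names are hypothetical.

```python
NEG_INF = float("-inf")  # stands in for the slide's "no path" entries

# DIST for the block "acg" x "ag", copied from the slide.
DIST = [
    [0, -1, -2, -3, NEG_INF, NEG_INF],
    [-1, -1, -2, -1, -3, NEG_INF],
    [-2, 0, 0, 1, -1, -3],
    [NEG_INF, -2, -2, 0, -2, -2],
    [NEG_INF, NEG_INF, -2, 0, -1, -1],
    [NEG_INF, NEG_INF, NEG_INF, -2, -1, 0],
]
I = [1, 2, 3, 2, 1, 3]  # input border values I0..I5


def out_matrix(dist, inputs):
    """OUT[i][j] = I[i] + DIST[i][j]."""
    return [[inputs[i] + d for d in row] for i, row in enumerate(dist)]


def column_maxima(matrix):
    """Brute-force column maxima (no SMAWK yet)."""
    return [max(col) for col in zip(*matrix)]


OUT = out_matrix(DIST, I)
O = column_maxima(OUT)
print(O)  # [1, 3, 3, 4, 2, 3]
```

This brute-force version reads every entry of OUT; the point of the later slides is that total monotonicity lets SMAWK avoid that.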

Compute O without explicit OUT

[Figure: block of the alignment graph with input border I and output border O, border cells numbered 0 to 5]

Block: sub-sequences “acg”, “ag”

DIST matrix (△: no path)

        0    1    2    3    4    5
I0      0   -1   -2   -3    △    △
I1     -1   -1   -2   -1   -3    △
I2     -2    0    0    1   -1   -3
I3      △   -2   -2    0   -2   -2
I4      △    △   -2    0   -1   -1
I5      △    △    △   -2   -1    0

I (input borders): I0=1, I1=2, I2=3, I3=2, I4=1, I5=3

O[j] = max over i of (I[i] + DIST[i,j])

O0   O1   O2   O3   O4   O5
 1    3    3    4    2    3

SMAWK

• Aggarwal, Park and Schmidt observed that DIST and OUT matrices are Monge arrays.

• Definition: a matrix M[0…m, 0…n] is totally monotone if either condition 1 or 2 below holds for all a, b = 0…m; c, d = 0…n; a < b and c < d:
1. Convex condition: M[a,c] ≥ M[b,c] ⟹ M[a,d] ≥ M[b,d].
2. Concave condition: M[a,c] ≤ M[b,c] ⟹ M[a,d] ≤ M[b,d].

SMAWK

• Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.

Presentation Outline

• What are Monge arrays?
– Monge ⟹ totally monotone

• Why is the DIST alignment matrix a Monge array?

• How to search totally monotone arrays efficiently?
– SMAWK
  • Given a totally monotone array
  • Compute all column maxima in O(n)

MONGE AND TOTALLY MONOTONE PROPERTIES

Monge

• A matrix M[0…m, 0…n] is Monge if either condition 1 or 2 below holds for all a, b = 0…m; c, d = 0…n; a < b and c < d:
1. Convex condition: M[a,c] + M[b,d] ≤ M[a,d] + M[b,c]
2. Concave condition: M[a,c] + M[b,d] ≥ M[a,d] + M[b,c]

        c         d         z
a     M[a,c]    M[a,d]     …
b     M[b,c]    M[b,d]     …
x       …         …

Totally monotone

• A matrix M[0…m, 0…n] is totally monotone if either condition 1 or 2 below holds for all a, b = 0…m; c, d = 0…n; a < b and c < d:
1. Convex condition: M[a,c] ≥ M[b,c] ⟹ M[a,d] ≥ M[b,d]
2. Concave condition: M[a,c] ≤ M[b,c] ⟹ M[a,d] ≤ M[b,d]
• Monge ⟹ totally monotone

        c         d         z
a     M[a,c]    M[a,d]     …
b     M[b,c]    M[b,d]     …
x       …         …
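The step from Monge to totally monotone is short enough to write out. The derivation below takes the convex Monge condition to be M[a,c] + M[b,d] ≤ M[a,d] + M[b,c] and the convex totally monotone condition to be M[a,c] ≥ M[b,c] ⟹ M[a,d] ≥ M[b,d], for a < b and c < d; the concave case follows by reversing every inequality.

```latex
% Convex case, for a < b and c < d.
\begin{align*}
M[a,c] + M[b,d] \le M[a,d] + M[b,c]
  &\iff M[a,c] - M[b,c] \le M[a,d] - M[b,d]. \\
\text{If } M[a,c] \ge M[b,c]:\quad
0 \le M[a,c] - M[b,c] &\le M[a,d] - M[b,d]
  \implies M[a,d] \ge M[b,d].
\end{align*}
% Concave case: reverse every inequality.
\begin{align*}
\text{If } M[a,c] \le M[b,c]:\quad
M[a,d] - M[b,d] \le M[a,c] - M[b,c] \le 0
  \implies M[a,d] \le M[b,d].
\end{align*}
```

The converse does not hold in general: total monotonicity only constrains the signs of the row differences, not their sizes.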

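Intuition

• Monge: quadrangle inequality

[Figure: quadrangle on the points a, b, c, d, with further points x and z]

        c         d         z
a     M[a,c]    M[a,d]     …
b     M[b,c]    M[b,d]     …
x       …         …

M[a,c] + M[b,d] ≤ M[a,d] + M[b,c] (convex case; ≥ in the concave case)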

History

• Computational geometry
• All nearest neighbor problem
– Shamos and Hoey proved Θ(n log n) in 1975
• All farthest neighbor problem
– F. P. Preparata proved Θ(n log n) in 1977
• All farthest neighbor problem in a convex polygon
– Lee and Preparata proved O(n) in 1978

SMAWK

• Aggarwal et al. proved O(n) for all farthest neighbors in a convex polygon in 1987

• Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.

DIST AND OUT MATRICES

• Assumption
– Row and column maxima of a totally monotone matrix can be computed in O(n)

• Why are the DIST and OUT matrices of the alignment problem totally monotone?


SMAWK

DIST is Monge

[Figure: block of the alignment graph with input border I and output border O, border cells numbered 0 to 5]

DIST is Monge array

• Monge
– M[a,c] + M[b,d] ≥ M[a,d] + M[b,c]

• Totally monotone by concave condition:
– M[a,c] ≤ M[b,c] ⟹ M[a,d] ≤ M[b,d]
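The two conditions can be checked mechanically on the part of the example DIST matrix where every entry is defined. The sketch below uses rows I1 to I3 and columns 1 to 4 of the DIST table from the earlier slide; the function names are hypothetical, and the brute-force check over all index pairs is meant for illustration only.

```python
from itertools import combinations


def is_concave_monge(M):
    """M[a][c] + M[b][d] >= M[a][d] + M[b][c] for all a < b, c < d."""
    rows, cols = range(len(M)), range(len(M[0]))
    return all(
        M[a][c] + M[b][d] >= M[a][d] + M[b][c]
        for a, b in combinations(rows, 2)
        for c, d in combinations(cols, 2)
    )


def is_concave_totally_monotone(M):
    """M[a][c] <= M[b][c] implies M[a][d] <= M[b][d] for all a < b, c < d."""
    rows, cols = range(len(M)), range(len(M[0]))
    return all(
        M[a][c] > M[b][c] or M[a][d] <= M[b][d]
        for a, b in combinations(rows, 2)
        for c, d in combinations(cols, 2)
    )


# Rows I1..I3, columns 1..4 of the example DIST matrix (all entries defined).
DIST_SUB = [
    [-1, -2, -1, -3],
    [0, 0, 1, -1],
    [-2, -2, 0, -2],
]

print(is_concave_monge(DIST_SUB))             # True
print(is_concave_totally_monotone(DIST_SUB))  # True
```

`combinations` yields index pairs in increasing order, which matches the a < b and c < d requirement of the definitions.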

Comment on this approach

• Advantages
– Easy to parallelize
– Easy to combine

• Disadvantages
– Need to compute/keep more information

Applications

• Parallel sequence alignment
– O(log m log n) time
– Using O(m n / log m) processors (CREW PRAM)

• Best non-overlapping alignment score
– O(n² log² n) time

• Tandem approximate repeat
– O(n² log n) time

• Common Substring Alignment

SMAWK

Find all column minima of the following totally monotone array

        1    2    3    4    5    6    7    8    9
 1     25   42   57   78   90  103  123  142  151
 2     21   35   48   65   76   85  105  123  130
 3     13   26   35   51   58   67   86  100  104
 4     10   20   28   42   48   56   75   86   88
 5     20   29   33   44   49   55   73   82   80
 6     13   21   24   35   39   44   59   65   59
 7     19   25   28   38   42   44   57   61   52
 8     35   37   40   48   48   49   62   62   49
 9     37   36   37   42   39   39   51   50   37
10     41   39   37   42   35   33   44   43   29
11     58   56   54   55   47   41   50   47   29
12     66   64   61   61   51   44   52   45   24
13     82   76   72   70   56   49   55   46   23
14     99   91   83   80   63   56   59   46   20
15    124  116  107  100   80   71   72   58   28
16    133  125  113  106   86   75   74   59   25
17    156  146  131  120   97   84   80   65   31
18    178  164  146  135  110   96   92   73   39

For every 2 × 2 sub-matrix

[a b]
[c d]

• b < d ⟹ a < c; b = d ⟹ a ≤ c
• a > c ⟹ b > d; a = c ⟹ b ≥ d

Observation 1

[Figure: the 18 × 9 example matrix with a 2 × 2 sub-matrix [a b][c d]]

• a > c ⟹ b > d; a = c ⟹ b ≥ d

Observation 2

[Figure: the 18 × 9 example matrix with a 2 × 2 sub-matrix [a b][c d]]

• b < d ⟹ a < c; b = d ⟹ a ≤ c

• SMAWK is a recursive algorithm of 2 steps
– REDUCE
– INTERPOLATE

• REDUCE removes rows
• INTERPOLATE removes half of the columns

REDUCE

• Rows are scanned from top to bottom; a row is removed once it can no longer contain a column minimum
• Rows removed, in order: 1, 2, 3, 5, 8, 9, 15, 17, 18
• In each surviving row, the entries to the left of its comparison column are discarded

Matrix after REDUCE (9 rows, 9 columns)

        1    2    3    4    5    6    7    8    9
 4     10   20   28   42   48   56   75   86   88
 6          21   24   35   39   44   59   65   59
 7               28   38   42   44   57   61   52
10                    42   35   33   44   43   29
11                         47   41   50   47   29
12                              44   52   45   24
13                                   55   46   23
14                                        46   20
16                                             25

INTERPOLATE

Remove all odd-indexed columns

        2    4    6    8
 4     20   42   56   86
 6     21   35   44   65
 7          38   44   61
10          42   33   43
11               41   47
12               44   45
13                    46
14                    46
16

RECURSIVE

Find all column minima of the remaining columns, then find the minima of the odd-indexed columns by scanning only the rows between the minima of the neighbouring even-indexed columns.

Entries examined

        1    2    3    4    5    6    7    8    9
 4     10   20   28
 6               24   35   39
 7                         42
10                         35   33   44   43   29
11                                             29
12                                             24
13                                             23
14                                             20
16                                             25

Column minima

column     1    2    3    4    5    6    7    8    9
minimum   10   20   24   35   35   33   44   43   20
row        4    4    6    6   10   10   10   10   14
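The REDUCE / INTERPOLATE / recursion walkthrough can be sketched in Python for column minima as follows. This is an illustrative reading of the slides, not the authors' code: the function names, the `f(row, col)` lookup interface and the tie-breaking rule (ties go to the lower row) are assumptions. Indices are 0-based inside the code and converted to the slides' 1-based row numbers when printing.

```python
MATRIX = [
    [25, 42, 57, 78, 90, 103, 123, 142, 151],
    [21, 35, 48, 65, 76, 85, 105, 123, 130],
    [13, 26, 35, 51, 58, 67, 86, 100, 104],
    [10, 20, 28, 42, 48, 56, 75, 86, 88],
    [20, 29, 33, 44, 49, 55, 73, 82, 80],
    [13, 21, 24, 35, 39, 44, 59, 65, 59],
    [19, 25, 28, 38, 42, 44, 57, 61, 52],
    [35, 37, 40, 48, 48, 49, 62, 62, 49],
    [37, 36, 37, 42, 39, 39, 51, 50, 37],
    [41, 39, 37, 42, 35, 33, 44, 43, 29],
    [58, 56, 54, 55, 47, 41, 50, 47, 29],
    [66, 64, 61, 61, 51, 44, 52, 45, 24],
    [82, 76, 72, 70, 56, 49, 55, 46, 23],
    [99, 91, 83, 80, 63, 56, 59, 46, 20],
    [124, 116, 107, 100, 80, 71, 72, 58, 28],
    [133, 125, 113, 106, 86, 75, 74, 59, 25],
    [156, 146, 131, 120, 97, 84, 80, 65, 31],
    [178, 164, 146, 135, 110, 96, 92, 73, 39],
]


def reduce_rows(rows, cols, f):
    """REDUCE: keep at most len(cols) rows that can still hold a column minimum."""
    stack = []
    for r in rows:
        while stack:
            c = cols[len(stack) - 1]  # comparison column of the stack top
            if f(stack[-1], c) < f(r, c):
                break                 # top is strictly better there: keep it
            stack.pop()               # top is beaten from this column onwards
        if len(stack) < len(cols):
            stack.append(r)
    return stack


def smawk_column_minima(rows, cols, f, result=None):
    """Return {col: row of the column minimum} for a totally monotone f."""
    if result is None:
        result = {}
    if not cols:
        return result
    rows = reduce_rows(rows, cols, f)
    # Recurse on every second column (the slides' even-indexed columns).
    smawk_column_minima(rows, cols[1::2], f, result)
    # INTERPOLATE: each remaining column is searched only between the
    # minimum rows of its two neighbours.
    k = 0
    for i in range(0, len(cols), 2):
        c = cols[i]
        last = result[cols[i + 1]] if i + 1 < len(cols) else rows[-1]
        best = rows[k]
        while rows[k] != last:
            k += 1
            if f(rows[k], c) <= f(best, c):  # ties go to the lower row
                best = rows[k]
        result[c] = best
    return result


def lookup(r, c):
    return MATRIX[r][c]


rows = list(range(len(MATRIX)))
cols = list(range(len(MATRIX[0])))
argmin = smawk_column_minima(rows, cols, lookup)
minima = [MATRIX[argmin[c]][c] for c in cols]
print([argmin[c] + 1 for c in cols])  # [4, 4, 6, 6, 10, 10, 10, 10, 14]
print(minima)                         # [10, 20, 24, 35, 35, 33, 44, 43, 20]
```

On the example matrix the first call to `reduce_rows` keeps rows 4, 6, 7, 10, 11, 12, 13, 14 and 16 (1-based), which matches the REDUCE frames. The sketch relies on the matrix being totally monotone; on an arbitrary matrix the interpolation scan is not guaranteed to be valid.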

APPROXIMATE TANDEM REPEAT

Application of DIST and SMAWK

Tandem repeat

• IRQI QLWLR QIWIR LRQL


Observation

• Approximate tandem repeat
– With the mid-point c
– Alignments
  • start at column c
  • end at row c

[Figure: n × n alignment grid, axes from 0 to n, with column c and row c marked]

• 4 cases
– Cross column n/2
– Cross row n/2
– Inside sub-triangle [0, n/2]
– Inside sub-triangle [n/2, n]

Algorithm

1. Find all repeats that cross
– row n/2
– column n/2

2. Recursively solve the
– sub-array [0..n/2, 0..n/2]
– sub-array [n/2..n, n/2..n]

[Figure: grid split at n/2, with mid-points c1, c2, c3 marked on both axes]

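Cross column n/2

• Combine
– Best path from column c to (k, n/2)
– Best path from (k, n/2) to row c

[Figure: n × n alignment grid, axes from 0 to n, with column c, row c and column n/2 marked]

Cross column n/2

• Sub-problems:
– DIST_col(c,n/2)[i,j]
– DIST_row(c,n/2)[i,j]

[Figure: grid with n/2 and mid-points c1, c2 marked]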

Cross column n/2

• DIST_col(c,n/2)[i,j]: O(n³) words

• Encode in an array of binary trees
– Using O(n² log n) words
– B[j,c] is a binary tree
– B[j,c](i) is a leaf of the tree
– Read an entry of DIST_col(c,n/2)[i,j] in O(log n)

[Figure: grid with n/2 and mid-points c1, c2 marked]

Algorithm

1. Find all repeats: O(n² log n)
– cross row n/2
– cross column n/2

2. Recursively solve the
– sub-array [0..n/2, 0..n/2]
– sub-array [n/2..n, n/2..n]

[Figure: grid split at n/2, with mid-points c1, c2, c3 marked on both axes]
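The overall O(n² log n) bound quoted under Applications follows from the recursion above, assuming that the crossing step costs at most c·n² log n on an array of size n (c is a hypothetical constant introduced here) and that each of the two recursive calls works on an array of half the size.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + c\,n^2 \log n \\
     &\le \sum_{k \ge 0} 2^k \cdot c \left(\frac{n}{2^k}\right)^{2} \log n
      = c\,n^2 \log n \sum_{k \ge 0} 2^{-k} \\
     &\le 2c\,n^2 \log n = O(n^2 \log n).
\end{align*}
```

The work halves at each level of the recursion, so the top level dominates the total.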

References

• Aggarwal, A. and Park, J. Notes on Searching in Multidimensional Monotone Arrays. IEEE

• Jeanette P. Schmidt. All highest scoring paths in weighted grid graphs and their application to finding all approximate repeats in strings. SIAM.

• Lawrence L. Larmore. The SMAWK Algorithm. UNLV.

• Apostolico, A. and Atallah, M.J. and Larmore, L.L. and McFaddin, S. Efficient Parallel Algorithms for String Editing and Related Problems. SIAM J. Comput.

• Landau, G.M. and Ziv-Ukelson, M. On the Common Substring Alignment Problem. J. of Algorithms
