Two Decades of Algorithms for the Fréchet Distance

Anne Driemel

Based on work of Bringmann, Künnemann, Har-Peled, Wenk, Agarwal, Fox, Pan, Ying and others.

May 30, 2016


2/56

/ department of mathematics and computer science

Overview of this talk

• Sequence Alignment Measures
• Quadratic lower bounds based on SETH
• Strongly subquadratic-time algorithms under input assumptions
• Partial distance measures based on Fréchet distance
• Data structures for Fréchet queries
• Median curves and clustering under the Fréchet distance

Overview of this talk

• Sequence Alignment Measures
  • for strings: LCS, edit distance
  • for time series: dynamic time warping, discrete Fréchet distance
  • for curves: continuous Fréchet distance
• Quadratic lower bounds based on SETH
• Strongly subquadratic-time algorithms under input assumptions
• Partial distance measures based on Fréchet distance
• Data structures for Fréchet queries
• Median curves and clustering under the Fréchet distance


Sequences, Time series and Trajectories

• Examples of sequences:
  • DNA sequences [D’haeseleer, 2006]
  • Text documents
• Examples of time series:
  • Stock prices
  • Sensor data (weather, manufacturing, IT infrastructure, ...)
  • Click streams and search trends
• Examples of trajectories:
  • Trajectories of taxis, migrating animals, sports players, ...
  • Gesture trajectories on touchscreens

Anne Driemel, TU Dortmund


Sequence Alignment Measures—Strings

Let A and B be sequences of elements from a fixed alphabet.

LCS: Longest common subsequence of A and B.

A = jellybeans
B = bellybutton

Edit distance: The minimal number of operations needed to transform A into B using the basic edit operations (insert, delete, and replace).

A = jellybeans → bellybeans → ... → bellybuttons → bellybutton = B
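The two string measures above can both be computed with the textbook quadratic dynamic program. The following is a minimal sketch, not code from the talk; the function names `lcs_length` and `edit_distance` are chosen here for illustration, and unit cost per edit operation is assumed.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, start=1):
            if ch == other:
                cur.append(prev[j - 1] + 1)        # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))  # skip a symbol of a or of b
        prev = cur
    return prev[-1]


def edit_distance(a, b):
    """Minimum number of insert, delete, and replace operations turning a into b."""
    prev = list(range(len(b) + 1))  # transforming the empty prefix of a
    for i, ch in enumerate(a, start=1):
        cur = [i]
        for j, other in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                   # delete ch
                cur[j - 1] + 1,                # insert other
                prev[j - 1] + (ch != other),   # match or replace
            ))
        prev = cur
    return prev[-1]


print(lcs_length("jellybeans", "bellybutton"))     # 6 ("ellybn")
print(edit_distance("jellybeans", "bellybutton"))  # 6
```

Both functions use O(|A| · |B|) time and keep only two rows of the table in memory.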


Sequence Alignment Measures—Traversals

A: j1 e2 l3 l4 y5 b6 e7 a8 n9 s10

B: b1 e2 l3 l4 y5 b6 u7 t8 t9 o10 n11

Traversal: ((1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 7), (9, 7), (9, 8), (9, 9), (9, 10), (9, 11), (10, 12), (11, 12))

[Figure: the traversal drawn step by step between the two sequences; its steps are labeled replace, match, deletion, and insertion.]


Sequence Alignment Measures—Time Series

Let X and Y be sequences of elements from a metric space.

Dynamic time warping: The minimum summed cost over all possible traversals $T = ((a_1, b_1), \dots, (a_t, b_t))$.

$$\min_T \sum_{1 \le i \le t} d(X[a_i], Y[b_i])$$

Discrete Fréchet distance: The minimum bottleneck cost over all possible traversals $T = ((a_1, b_1), \dots, (a_t, b_t))$.

$$\min_T \max_{1 \le i \le t} d(X[a_i], Y[b_i])$$
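The two definitions differ only in how the costs along a traversal are combined (sum versus maximum), so one quadratic dynamic program covers both. The sketch below is illustrative and not taken from the talk; the helper name `alignment_cost` and the `combine` parameter are assumptions made here.

```python
import math
import operator


def alignment_cost(X, Y, d, combine):
    """Optimal traversal cost; combine is operator.add (DTW) or max (Fréchet)."""
    n, m = len(X), len(Y)
    table = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            here = d(X[i], Y[j])
            if i == 0 and j == 0:
                table[i][j] = here
                continue
            # A traversal reaches (i, j) from one of three predecessor pairs.
            best_prev = min(
                table[i - 1][j] if i > 0 else math.inf,
                table[i][j - 1] if j > 0 else math.inf,
                table[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
            )
            table[i][j] = combine(best_prev, here)
    return table[n - 1][m - 1]


def dtw(X, Y, d):
    return alignment_cost(X, Y, d, operator.add)


def discrete_frechet(X, Y, d):
    return alignment_cost(X, Y, d, max)


absolute = lambda x, y: abs(x - y)
print(dtw([0, 1, 2, 3], [0, 3], absolute))               # 2
print(discrete_frechet([0, 1, 2, 3], [0, 3], absolute))  # 1
```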


Sequence Alignment Measures—Curves

Let X and Y be parametrized curves in $\mathbb{R}^d$.

Continuous Fréchet distance: The minimum bottleneck cost over all possible monotone bijections f, g.

$$\inf_{f, g} \sup_{0 \le t \le 1} \|X(f(t)) - Y(g(t))\|$$


Computation of Sequence Alignments

• Basic quadratic-time algorithms:
  • Sequence alignment: simple dynamic programming
  • DTW, discrete Fréchet: shortest path in a graph
  • Continuous Fréchet distance [Alt and Godau, 1995]
• Slightly below-quadratic time algorithms in the word RAM model:
  • Discrete Fréchet distance [Agarwal et al., 2014]: $O(n^2 \log\log n / \log n)$ time
  • Continuous Fréchet distance [Buchin et al., 2014a]: $O(n^2 (\log\log n)^2 / \log n)$ time (decision problem)
• Many heuristics and filtering techniques:
  • Dynamic time warping [Keogh and Ratanamahatana, 2005]
  • Edit distance [Gravano et al., 2001] (see also [Chen et al., 2005])



SETH-Hardness of alignment problems

Strong Exponential Time Hypothesis (SETH): There is no ε > 0 such that, for all k ≥ 3, k-SAT can be solved in time $O(2^{(1-\varepsilon)N})$, where N is the number of variables.

• If SETH is true, then the following alignment measures cannot be computed in time $O(n^{2-\varepsilon})$ for any fixed ε > 0 ("strongly subquadratic" time).
  • Fréchet distance [Bringmann, 2014, Bringmann and Mulzer, 2016]
  • Edit distance [Backurs and Indyk, 2015]
  • Dynamic time warping [Abboud et al., 2015, Bringmann and Künnemann, 2015b]
  • LCS [Abboud et al., 2015, Bringmann and Künnemann, 2015b]


SETH-Hardness of alignment problems

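Reduction chain (top row, left to right):

k-SAT (N variables, M clauses)
→ Orthogonal Vectors (sets of vectors of size $|A| = |B| = n = 2^{N/2}$ and dimension $d = M$)
→ Sequence alignment (sequences of length $O(nd)$)

Consequence (bottom row, right to left):

Assume an algorithm with running time $O((nd)^{2-\varepsilon})$ for some fixed ε > 0
→ algorithm with running time $O(2^{(1-\varepsilon')N})$, $\varepsilon' = \varepsilon/2$
→ SETH is false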
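The value ε′ = ε/2 follows from substituting the instance sizes into the assumed running time. A short worked version, under the additional assumption (not stated on the slide) that M is polynomial in N so that the factor $M^{2-\varepsilon}$ is of lower order:

$$(nd)^{2-\varepsilon} = \left(2^{N/2} \cdot M\right)^{2-\varepsilon} = 2^{(1-\varepsilon/2)N} \cdot M^{2-\varepsilon}$$

Up to that polynomial factor, this is $2^{(1-\varepsilon')N}$ with $\varepsilon' = \varepsilon/2$, which would contradict SETH.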


SETH-Hardness of alignment problems

• k-SAT Problem:
  • Given N variables $x_1, \dots, x_N$ and M CNF-clauses
  • Does there exist a truth assignment that satisfies all clauses?
• Orthogonal Vectors Problem:
  • Given sets of vectors A and B from $\{0, 1\}^d$
  • Does there exist a pair $a_i \in A$, $b_j \in B$ with $\langle a_i, b_j \rangle = 0$?
• Reduction [Williams, 2004]:
  • Split the set of variables into two halves and consider the $2^{N/2}$ truth assignments of each half
  • Record for each truth assignment z:
    • $a_z[i] = 0$ iff z applied to $x_1, \dots, x_{N/2}$ satisfies the ith clause
    • $b_z[i] = 0$ iff z applied to $x_{N/2+1}, \dots, x_N$ satisfies the ith clause
  • Yields two sets of vectors A and B with size $2^{N/2}$ and dimension M
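The reduction above can be sketched in a few lines. This is an illustrative brute-force version, not code from the talk: clauses are assumed to be lists of signed integers (a positive number for a variable, a negative number for its negation), and the names `half_vectors` and `sat_via_orthogonal_vectors` are hypothetical. The final all-pairs check stands in for an actual Orthogonal Vectors algorithm.

```python
from itertools import product


def half_vectors(clauses, variables):
    """One 0/1 vector per assignment of `variables`; entry i is 0 iff clause i is satisfied."""
    vectors = []
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        vector = []
        for clause in clauses:
            satisfied = any(
                abs(lit) in assignment and assignment[abs(lit)] == (lit > 0)
                for lit in clause
            )
            vector.append(0 if satisfied else 1)
        vectors.append(tuple(vector))
    return vectors


def sat_via_orthogonal_vectors(clauses, num_vars):
    """Formula is satisfiable iff some a in A and b in B are orthogonal."""
    half = num_vars // 2
    A = half_vectors(clauses, list(range(1, half + 1)))
    B = half_vectors(clauses, list(range(half + 1, num_vars + 1)))
    # <a, b> = 0 means every clause is satisfied by at least one of the two halves.
    return any(all(x * y == 0 for x, y in zip(a, b)) for a in A for b in B)


print(sat_via_orthogonal_vectors([[1, 2], [-1, -2]], 2))  # True
print(sat_via_orthogonal_vectors([[1], [-1]], 2))         # False
```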


SETH-Hardness of alignment problems

• Reduction to Sequence Alignment [Bringmann and Künnemann, 2015b]
  • Let δ be a sequence alignment measure
  • Creates sequence X from set A and sequence Y from set B
  • Coordinate gadgets: encode $0_X, 1_X, 0_Y, 1_Y$ as substrings such that there exist $\rho_0, \rho_1$ with:
    • $\rho_0 = \delta(0_X, 1_Y) = \delta(1_X, 0_Y) = \delta(0_X, 0_Y)$
    • $\rho_1 = \delta(1_X, 1_Y)$
    • $\rho_0 < \rho_1$
  • Alignment gadgets: force optimal matching to use substrings atomically
  • Example encoding $a_i = 0101$ and $b_j = 1010$

[Figure: gadget sequences $0_X\,1_X\,0_X\,1_X\,0_X\,0_X\,0_X\,0_X\,0_X\,1_X$ (top) and $0_Y\,1_Y\,0_Y\,1_Y\,1_Y$ (bottom). Successive frames show an alignment of cost $(d+1) \cdot \rho_0$, an alignment of cost $d \cdot \rho_0 + \rho_1$, and a non-atomic matching.]


SETH-Hardness of alignment problems

“This edit distance is something that I’ve been trying to get better algorithms for since I was a graduate student, in the mid-’90s. I certainly spent lots of late nights on that — without any progress whatsoever. So at least now there’s a feeling of closure. The problem can be put to sleep.”

Piotr Indyk (MIT News, June 10, 2015)


Overview of this talk

• Sequence Alignment Measures
• Quadratic lower bounds based on SETH
• Strongly subquadratic-time algorithms under input assumptions
  • Realistic input models
  • Useful tools and techniques
    • Distance matrix
    • Free space diagram
    • Curve simplification
    • Well-separated pair decomposition
  • (1 + ε)-approximation algorithms
• Partial distance measures based on Fréchet distance
• Data structures for Fréchet queries
• Median curves and clustering under the Fréchet distance


Zoo of Realistic Input Models

| Type of curve | Definition |
| --- | --- |
| κ-straight [Alt et al., 2004] | For any two points p, q on X: $\lvert X[p, q] \rvert \le \kappa \lVert p - q \rVert$ |
| κ-bounded [Alt et al., 2004] | For any two points p, q on X: $X[p, q] \subseteq b(p, r) \cup b(q, r)$, where $r = \kappa \lVert p - q \rVert / 2$ |
| backbone curves [Aronov et al., 2006] | For any two vertices v, u of X: $\lVert v - u \rVert \ge 1$, and any edge of X has length in $[c_1, c_2]$ for constants $c_1, c_2$ |
| κ-packed curves [Driemel et al., 2012] | For any ball $b(p, r)$: the length of X inside the ball satisfies $\lVert X \cap b(p, r) \rVert \le \kappa r$ |
| κ-low-density curves [de Berg et al., 2002] | Any ball $b(p, r)$ intersects at most κ edges of X that are longer than r |

[Figures: one illustration per curve type, omitted in this transcript.]


Zoo of Realistic Input Models

Idea of κ-packed curves: Capture natural families of curves


Tools: Distance Matrix

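• Two sequences X and Y
• Matrix entry $\delta_{i,j}$ stores the distance between the ith element of X and the jth element of Y
• A traversal of X and Y corresponds to a monotone path through the entries
• DTW = shortest path in a vertex-weighted graph

[Figure: sequences X and Y with elements $x_i$ and $y_j$ at distance $\delta_{ij}$, next to the distance matrix with entry $\delta_{ij}$ in row i and column j.]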
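To make the correspondence between traversals and monotone paths concrete, the sketch below builds the distance matrix and recovers one optimal DTW traversal as a shortest path through its entries. It is an illustration under simple assumptions (0-based indices, a user-supplied distance function `d`); the names `distance_matrix` and `dtw_path` are chosen here and do not come from the talk.

```python
def distance_matrix(X, Y, d):
    """Entry [i][j] is the distance between the ith element of X and the jth element of Y."""
    return [[d(x, y) for y in Y] for x in X]


def dtw_path(delta):
    """Cheapest monotone path from entry (0, 0) to entry (n-1, m-1), with its total weight."""
    n, m = len(delta), len(delta[0])
    cost = [[0.0] * m for _ in range(n)]
    prev = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            candidates = []
            if i > 0:
                candidates.append((cost[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1], (i, j - 1)))
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            best = 0.0
            if candidates:
                best, prev[i][j] = min(candidates)
            cost[i][j] = delta[i][j] + best  # vertex weight plus cheapest way in
    # Walk the predecessor pointers back from the last entry.
    path, cell = [], (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = prev[cell[0]][cell[1]]
    path.reverse()
    return cost[n - 1][m - 1], path


delta = distance_matrix([0, 1, 2], [0, 2], lambda x, y: abs(x - y))
total, path = dtw_path(delta)
print(total, path)
```

Because every step increases at least one index, the graph is acyclic and a single pass in row-major order suffices; no priority queue is needed.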


Tools: Free space diagram

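Due to [Alt and Godau, 1995]:

• Parametric space of X and Y
• δ-free space = pairs of points on X and Y which are within distance δ
• Free space complexity $N_{\le\delta}$: number of non-empty cells
• Fréchet distance $d(X, Y) \le \delta$ iff there exists a monotone path π through the δ-free space, from the bottom-left to the top-right corner of the diagram
• Testing for π takes $O(N_{\le\delta})$ time using a BFS on the reachable cells

[Figure: free space diagram of X and Y. A point $p = (p_x, p_y)$ of the parametric space corresponds to the pair of points $X(p_x)$, $Y(p_y)$ and is free if they are within distance δ. Later frames show a monotone path π through the free space.]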
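The BFS idea is easiest to show for the discrete analogue, where a cell is simply a pair of vertices and is free when the two vertices are within distance δ. The sketch below is that simplified version, not the continuous algorithm of Alt and Godau, which additionally tracks free intervals on cell boundaries. The function name is hypothetical, and points are assumed to be coordinate tuples.

```python
import math
from collections import deque


def discrete_frechet_at_most(X, Y, delta):
    """True iff a monotone path of free vertex pairs leads from (0, 0) to (n-1, m-1)."""
    n, m = len(X), len(Y)

    def free(i, j):
        return math.dist(X[i], Y[j]) <= delta

    if not free(0, 0) or not free(n - 1, m - 1):
        return False
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        i, j = queue.popleft()
        if (i, j) == (n - 1, m - 1):
            return True
        # Monotone steps: advance on X, on Y, or on both.
        for di, dj in ((1, 0), (0, 1), (1, 1)):
            a, b = i + di, j + dj
            if a < n and b < m and (a, b) not in seen and free(a, b):
                seen.add((a, b))
                queue.append((a, b))
    return False


X = [(0, 0), (1, 0), (2, 0)]
Y = [(0, 1), (1, 1), (2, 1)]
print(discrete_frechet_at_most(X, Y, 1.0))  # True
print(discrete_frechet_at_most(X, Y, 0.9))  # False
```

Only free cells that are actually reached enter the queue, which mirrors the slide's point that the running time depends on the free space complexity rather than on the full n × m grid.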


Tools: Curve Simplification

• Many more sophisticated algorithms: [Godau, 1991, Agarwal et al., 2005a, Abam et al., 2010, Agarwal et al., 2005b]
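The slide's figure is lost in this transcript. As a point of comparison for the "more sophisticated" algorithms, here is a sketch of a simple greedy vertex-skipping simplification of the kind used with the approximation algorithms later in the talk. That the figure showed exactly this procedure is an assumption; the name `simplify` and the parameter `mu` are chosen here for illustration.

```python
import math


def simplify(curve, mu):
    """Keep the first vertex, then each vertex at distance >= mu from the last kept one.

    The final vertex is always kept so the endpoints of the curve are preserved.
    """
    if not curve:
        return []
    kept = [curve[0]]
    for point in curve[1:-1]:
        if math.dist(kept[-1], point) >= mu:
            kept.append(point)
    if len(curve) > 1:
        kept.append(curve[-1])
    return kept


curve = [(0, 0), (0.1, 0), (0.2, 0.1), (1, 0), (1.05, 0), (2, 0)]
print(simplify(curve, 0.5))  # [(0, 0), (1, 0), (2, 0)]
```

The procedure runs in linear time and every edge of the result, except possibly the last, has length at least `mu`.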


Tools: Approximate Pairwise Distances

• s-well-separated pair
  • A and B are s-well-separated iff there exist two balls $b(p_A, r)$ and $b(p_B, r)$ such that:
    (i) $A \subseteq b(p_A, r)$
    (ii) $B \subseteq b(p_B, r)$
    (iii) $\|p_A - p_B\| \ge (s + 2) r$
• s-well-separated pair decomposition (WSPD)
  • For any set of points P one can compute a set of pairs $(A_1, B_1), \dots, (A_m, B_m)$ such that:
    (i) each pair $(A_i, B_i)$ is s-well-separated
    (ii) any two points p, q ∈ P are covered by a unique pair $(A_i, B_i)$
  • Can be computed with $m = O(s^d n)$ in time $O(n \log n)$ [Callahan and Kosaraju, 1995]

[Figure: point sets A and B enclosed in two balls of radius r that are at least sr apart.]
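A small sketch of how the separation condition can be tested for two concrete point sets. It uses one particular choice of enclosing balls (centered at the bounding-box center), so it is a sufficient test only: it may report False for sets that are well-separated with respect to better-placed balls. The helper names are hypothetical.

```python
import math


def enclosing_ball(points):
    """A (not necessarily smallest) enclosing ball, centered at the bounding-box center."""
    dim = len(points[0])
    center = tuple(
        (min(p[k] for p in points) + max(p[k] for p in points)) / 2
        for k in range(dim)
    )
    radius = max(math.dist(center, p) for p in points)
    return center, radius


def is_well_separated(A, B, s):
    """Check conditions (i)-(iii) for balls of common radius r around the two centers."""
    center_a, radius_a = enclosing_ball(A)
    center_b, radius_b = enclosing_ball(B)
    r = max(radius_a, radius_b)
    return math.dist(center_a, center_b) >= (s + 2) * r


A = [(0, 0), (1, 0)]
B = [(10, 0), (11, 0)]
print(is_well_separated(A, B, 2))   # True
print(is_well_separated(A, B, 20))  # False
```

This is why the decomposition helps with approximate pairwise distances: if the centers are at distance at least (s + 2)r, all distances between a point of A and a point of B lie within a factor 1 + 4/s of each other, so one representative distance per pair suffices.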


(1+ ε)-Approximation—Fréchet distance

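• Packing lemma from [Driemel et al., 2012]
  • Consider the pairs of edges that are within distance δ to each other
  • We charge every pair to the shorter edge
  • Let u be such a shorter edge
  • Assume that $\|u\| \ge \varepsilon\delta$
  • We bound the charges to u:

$$\frac{\|b \cap Y\|}{\|u\|} \le \frac{\kappa r}{\|u\|} \le O\!\left(\frac{\kappa}{\varepsilon}\right), \quad \text{where } r = \tfrac{3}{2}\|u\| + \delta$$

  • In total $O(\kappa n / \varepsilon)$ pairs of edges that contain potential matches

[Figure: edge u of X with length ‖u‖, the curve Y passing within distance δ of u, and the ball b around u.]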
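The last inequality in the bound can be spelled out by substituting r and using the assumption on the length of u. This worked step assumes $0 < \varepsilon \le 1$, which the slide does not state explicitly:

$$\frac{\kappa r}{\|u\|} = \kappa \left( \frac{3}{2} + \frac{\delta}{\|u\|} \right) \le \kappa \left( \frac{3}{2} + \frac{1}{\varepsilon} \right) = O\!\left(\frac{\kappa}{\varepsilon}\right)$$

The middle step uses $\|u\| \ge \varepsilon\delta$, that is, $\delta / \|u\| \le 1/\varepsilon$. Summing the $O(\kappa/\varepsilon)$ charges over the n edges gives the total of $O(\kappa n / \varepsilon)$ pairs.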


(1+ ε)-Approximation—Fréchet distance

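• Decision algorithm:
  • Simplify the curves with parameter εδ
  • Compute reachability in the free space diagram to answer either:
    (i) $d(X, Y) \le (1 + \varepsilon)\delta$
    (ii) $d(X, Y) > \delta$
  • By the packing lemma, this takes $O(\kappa n / \varepsilon)$ time
• Main algorithm:
  • Compute candidate values using WSPD on vertices
  • Binary search using the decision algorithm

[Figure: curves π and σ with their simplifications π′ and σ′, each within distance µ of its original; $d_F(\pi', \sigma')$ is compared with $d_F(\pi, \sigma)$.]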
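The "candidate values plus binary search" pattern of the main algorithm can be illustrated on the discrete Fréchet distance, where it is exact. This sketch deliberately swaps in simpler ingredients than the talk's: the candidate set is all vertex-to-vertex distances (quadratically many, rather than a WSPD-based set), and the decision procedure is a plain reachability table without simplification. All names are hypothetical.

```python
import math


def discrete_frechet_at_most(X, Y, delta):
    """Decision procedure: reachability over vertex pairs within distance delta."""
    n, m = len(X), len(Y)
    reach = [[False] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if math.dist(X[i], Y[j]) > delta:
                continue  # blocked pair
            if i == 0 and j == 0:
                reach[i][j] = True
            else:
                reach[i][j] = (
                    (i > 0 and reach[i - 1][j])
                    or (j > 0 and reach[i][j - 1])
                    or (i > 0 and j > 0 and reach[i - 1][j - 1])
                )
    return reach[n - 1][m - 1]


def discrete_frechet_by_search(X, Y):
    """Smallest candidate value accepted by the decision procedure."""
    candidates = sorted({math.dist(x, y) for x in X for y in Y})
    lo, hi = 0, len(candidates) - 1  # the largest candidate is always accepted
    while lo < hi:
        mid = (lo + hi) // 2
        if discrete_frechet_at_most(X, Y, candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]


X = [(0, 0), (1, 0), (2, 0)]
Y = [(0, 1), (1, 1), (2, 1)]
print(discrete_frechet_by_search(X, Y))  # 1.0
```

The search is valid because the decision answer is monotone in δ, so only a logarithmic number of decision calls is needed.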


(1+ ε)-Approximation—Fréchet distance

• Result for various realistic input models [Driemel et al., 2012]

| Type of curve | Running time |
| --- | --- |
| κ-packed | $O(\kappa n / \varepsilon + \kappa n \log n)$ |
| κ-straight | Same as 2κ-packed |
| κ-bounded | $O((\kappa / \varepsilon)^d n + \kappa^d n \log n)$ |
| O(1)-low density | $O(n^{2(d-1)/d} / \varepsilon^2 + n^{2(d-1)/d} \log n)$ |
| c-packed & closed | $O(c^2 n / \varepsilon^2 + c^2 n \log n)$ |

• Slightly faster by a factor $1/\sqrt{\varepsilon}$: [Bringmann and Künnemann, 2015a]
• For κ-packed curves this is optimal unless SETH is false [Bringmann, 2014]
• Previous similar approach [Aronov et al., 2006]


(1+ ε)-Approximation—DTW

Due to [Agarwal et al., 2016]:

• 'Plateau' rectangles
  • Decomposition of curves based on WSPD
  • Uniform $\delta_{ij}$ inside a rectangle
• Algorithm
  • Truncate WSPD-quadtree using n-approximation
  • Yields $O(\log n)$ levels of the tree
  • Cover optimal path with 'plateau' rectangles
  • Compute paths ending on rectangle boundaries only

[Figure: distance matrix of X and Y partitioned into rectangles, with column indices $i_1, \dots, i_4$ and row indices $j_1, j_2$.]


(1+ ε)-Approximation—DTW

• Result for various realistic input models [Agarwal et al., 2016]

| Type of curve | Running time |
| --- | --- |
| κ-packed | $O((\kappa n / \varepsilon) \log n)$ |
| κ-bounded | $O((\kappa / \varepsilon)^d n \log n)$ |
| backbone sequences | $O(\varepsilon^{-1} n^{2 - 1/d} \log n)$ |

• Algorithm can be extended to a version of Edit distance:

$$\min_C \sum_{1 \le i \le t} d(X[a_i], Y[b_i]) + g(m + n - 2|C|)$$

where C is a subset of a traversal and g(·) is a gap penalty


Partial distance measures for curves

Partial Matching: Given two polygonal curves X and Y and a parameter δ > 0, does there exist a contiguous part X′ of X such that the Fréchet distance between X′ and Y is at most δ?

[Figure: curve Y matched to a contiguous part of curve X.]


Partial distance measures for curves

Partial Alignment: Given two polygonal curves X and Y and a parameter δ > 0, maximize the length of subcurves of X and Y that are within distance δ to each other under the Fréchet distance.

[Figure: curves X and Y with the matched subcurves highlighted.]


Partial distance measures for curves

k-Shortcut-Fréchet distanceGiven two polygonal curves X and Y and an integer parameter k, computethe minimal Fréchet distance between Y and any k-shortcut curve of X.

k-Shortcut-CurveX becomes a k shortcut curve of X by replacing at most k disjoint subcurvesby straight-line edges (shortcuts).

X

X
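To make the definition concrete, here is a small sketch that builds a k-shortcut curve, assuming shortcuts start and end at vertices of X and are given as index pairs. The function name `shortcut_curve` and its argument layout are illustrative assumptions, not notation from the talk.

```python
def shortcut_curve(X, shortcuts, k):
    """Apply at most k vertex-to-vertex shortcuts to the polygonal curve X.

    X is a list of vertices; each shortcut (i, j) with i < j replaces the
    subcurve from X[i] to X[j] by the straight edge X[i] -> X[j].
    """
    if len(shortcuts) > k:
        raise ValueError("more than k shortcuts")
    result = []
    pos = 0  # index of the next vertex of X that is still to be copied
    for i, j in sorted(shortcuts):
        if not (pos <= i < j < len(X)):
            raise ValueError("shortcuts must be disjoint, in range, with i < j")
        result.extend(X[pos:i + 1])  # keep vertices up to and including X[i]
        pos = j                      # skip the interior vertices of the subcurve
    result.extend(X[pos:])
    return result
```

Consecutive vertices of the returned list are joined by straight edges, so every skipped stretch of X becomes a single shortcut edge.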

• Partial Matching
  • O(n^2) time [Maheshwari et al., 2014]
  • Approach: 'free space map'
• Partial Alignment
  • O(n^3 log n)-time exact algorithms for L_1 and L_∞ [Buchin et al., 2009]
  • O((n^3/ε) log(n/ε))-time numerical approximation for L_2 [Maheshwari et al., 2014]
  • Exact computation not possible for L_2 in the Algebraic Computation Model over the Rational Numbers (ACMQ) [Maheshwari et al., 2014]
• k-Shortcut Fréchet distance
  • Vertex-shortcutting can be (3 + ε)-approximated [Driemel and Har-Peled, 2013]
    • κ-packed curves with at most k shortcuts: O(κ^2 k n log^3 n)
    • κ-packed curves with unbounded k: O(κ^2 n log^3 n)
  • Discrete Fréchet distance [Avraham et al., 2015]
    • Shortcuts on one curve only: O(n^(6/5+ε))-time (randomized) algorithm
    • Shortcuts on both curves: O(n^(4/3) log^3 n)-time algorithm
  • Continuous shortcutting is NP-hard [Buchin et al., 2014b]

Shortcut Fréchet distance

Find a path π in the parametric space:
• π has to stay inside the δ-free space
• π has to be monotone
• π can use (affordable) shortcuts between vertices of X

Definition (Tunnels and Gates)
• Tunnel(p, q) matches a shortcut X[p_x, q_x] to a subcurve Y[p_y, q_y]
• Gates are the endpoints of a tunnel
• The price of a tunnel is the Fréchet distance between the shortcut and the subcurve

[Figure: curves X and Y and the path π, with a tunnel between gates p and q that matches a shortcut of X to a subcurve of Y]


Preprocessing:
• Balanced binary tree on edges of Y
• Every node: a subcurve of Y
• Store the Fréchet distance to the shortcut in each node

Price of Tunnel(p, q):
• Find decomposition of subcurve Y[p_y, q_y] of size m = O(log n)
• Compute Fréchet distance ∆ to X[p_x, q_x] in O(m log m) time
• Return ∆ + max_{0 ≤ i ≤ m} d_F(node_i)

⇒ 3-approximation, more work to achieve (1 + ε)-approximation
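The decomposition step can be sketched as follows. This is a minimal illustration, assuming the edges of Y are indexed 0..n-1 and tree nodes are identified by half-open edge ranges; the per-node distances in `node_error` and the value ∆ are assumed to be computed elsewhere, and both function names are hypothetical.

```python
def canonical_nodes(lo, hi, node_lo, node_hi):
    """Nodes of a balanced binary tree over the edge range
    [node_lo, node_hi) whose ranges exactly tile the query range [lo, hi)."""
    if lo >= node_hi or hi <= node_lo:
        return []                      # node is disjoint from the query range
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]    # node lies fully inside the query range
    mid = (node_lo + node_hi) // 2
    return (canonical_nodes(lo, hi, node_lo, mid)
            + canonical_nodes(lo, hi, mid, node_hi))

def tunnel_price_bound(delta_to_shortcut, node_error, nodes):
    """Combine Delta with the largest stored per-node distance,
    following the 'Return' step above."""
    return delta_to_shortcut + max(node_error[nd] for nd in nodes)

# Edges 1..6 of a curve with 8 edges decompose into four tree nodes.
print(canonical_nodes(1, 7, 0, 8))  # [(1, 2), (2, 4), (4, 6), (6, 7)]
```

Each level of the tree contributes at most two nodes to the result, which is where the size bound m = O(log n) comes from.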



Data structures for Fréchet queries

Task: Preprocess a set of curves to answer ...
• Nearest-neighbor queries
  • returns the curve closest to a query curve
• Range queries
  • returns all curves within a certain distance to a query curve
• Range counting queries
  • returns the number of curves within a certain distance to a query curve
• Distance queries
  • returns the distance to a query curve

Variants
• Approximation
• Probabilistic (e.g., LSH)
• Preprocess 'all subcurves of a given curve'
• Preprocess 'all paths in a given graph'
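As a point of comparison for the query types listed above, the following sketch answers them by brute force with no preprocessing, assuming curves are lists of points compared under the discrete Fréchet distance. It only fixes what each query returns; the function names are illustrative and none of the data structures discussed in this talk work this way.

```python
from math import dist

def discrete_frechet(P, Q):
    """Discrete Frechet distance by the standard dynamic program."""
    prev = None
    for i in range(len(P)):
        cur = [0.0] * len(Q)
        for j in range(len(Q)):
            d = dist(P[i], Q[j])
            if i == 0 and j == 0:
                cur[j] = d
            elif i == 0:
                cur[j] = max(cur[j - 1], d)
            elif j == 0:
                cur[j] = max(prev[j], d)
            else:
                cur[j] = max(min(prev[j], prev[j - 1], cur[j - 1]), d)
        prev = cur
    return prev[-1]

def nearest_neighbor(curves, query):
    """Nearest-neighbor query: the input curve closest to the query."""
    return min(curves, key=lambda c: discrete_frechet(c, query))

def range_query(curves, query, r):
    """Range query: all input curves within distance r of the query."""
    return [c for c in curves if discrete_frechet(c, query) <= r]

def range_count(curves, query, r):
    """Range counting query: the number of curves within distance r."""
    return len(range_query(curves, query, r))
```

Every query here costs one quadratic-time distance computation per stored curve, which is the cost the data structures aim to avoid.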

• Approximate-Nearest-Neighbor data structure [Indyk, 2002]:
  • for the discrete Fréchet distance
  • Performance
    • Input: n sequences of maximum length m
    • Approximation O((log n log log m)^(t−1)) for any t > 1
    • Query time O((m + log n)^O(t))
    • Space exponential in t m^(1/t)
  • Approach
    • Use ∞-product metric on subcurves
    • Store each input curve using different partitions
    • Optimal partition is used by the nearest neighbor
    • Defers alignment problem into the data structure

• Approximate range counting data structure [de Berg et al., 2013]:
  • Preprocesses all subcurves of a given curve
  • Range counting
  • Queries with line segments
  • Performance
    • Approximation factor 2 + 3√2
    • Query time O((n/√s) polylog n)
    • Space O(s polylog n)
    • For any n ≤ s ≤ n^2
  • Approach
    • Simplify the curve with range parameter ε
    • Organize all subcurves of 1, 2, and 3 edges in partition tree [Chan, 2012]
    • Use filtering to allow variable range size ε


• Approximate range searching data structure [Gudmundsson and Smid, 2015]:
  • Preprocess all paths of a geometric tree
  • Unit range testing
  • Assumptions: tree is κ-packed, edge length Ω(1)
  • Queries with line segments
    • Approximation factor √2 + ε
    • Query time O(polylog n)
    • Space O(n polylog n)
  • Queries with polygonal curves of size k
    • Approximation factor 3
    • Query time O(k polylog n)
    • Space O(n polylog n)
  • Approach
    • Path decomposition of tree [Cole and Vishkin, 1988]
    • Build data structure of [Driemel and Har-Peled, 2013] for each path

• Distance-queries data structure [Driemel and Har-Peled, 2013]:
  • Queries with line segments
    • Approximation factor (1 + ε)
    • Query time O(log n log log n)
    • Space O(n)
  • Queries with polygonal curves of size k
    • Approximation factor 23
    • Query time O(k^2 log n log(k log n))
    • Space O(n log n)
  • Used to determine prices of shortcuts

Type of query                        Input            Query                     Approx. factor               Query time
Range queries (range counting)       polygonal curve  line segment              2 + 3√2                      O((n/√s) polylog n)
Range queries (unit-range testing*)  geometric tree   line segment              √2 + ε                       O(polylog n)
Range queries (unit-range testing*)  geometric tree   polygonal curve Q         3                            O(|Q| polylog n)
Distance queries                     polygonal curve  line segment              1 + ε                        O(log n log log n)
Distance queries                     polygonal curve  polygonal curve Q         23                           O(|Q|^2 polylog(|Q| n))
NN queries                           set of curves    curve (discrete Fréchet)  O((log n log log m)^(t−1))   O((m + log n)^O(t)), for any t > 1

(*) using input assumptions


Median curves and clustering

Task: Summarize a set of curves
• Common alignment of k curves
  • Find k reparametrizations that minimize the k-wise Fréchet distance
• Median curve of k curves
  • Walking a pack of k dogs: Find optimal path of the dog walker
• Clustering of n curves
  • Walking a pack of n dogs with a group of k dog walkers: Find k optimal paths and assignment of dogs to walkers
• Subcurve clustering
  • Find disjoint subcurves that together form a re-occurring pattern

• Median curve of k curves
  • Walking a pack of k dogs: Find optimal path of the dog walker
  • Minimum enclosing ball in metric space
  • Dynamic programming in O(n^k) time and space for curves of length n
  • Studied for different distance measures:
    • Continuous Fréchet distance [Dumitrescu and Rote, 2004]
    • Discrete Fréchet distance [Ahn et al., 2016]
    • Weak Fréchet distance [Har-Peled and Raichel, 2013]
    • Dynamic time warping [Petitjean and Gançarski, 2012]

Variants
• min sum of distances (Fermat-Weber problem)
• constraints on center curve
  • vertices of the input
  • complexity

• Clustering of n curves
  • Walking a pack of n dogs with a group of k dog walkers: Find k optimal paths and assignment of dogs to walkers
  • First studied by [Driemel et al., 2016]
    • Constrain each center curve to use at most ℓ vertices
    • (k, ℓ)-center: minimize maximum Fréchet distance to closest center
    • (k, ℓ)-median: minimize sum of Fréchet distances to closest center
  • Univariate curves:
    • (1 + ε)-approximation in O(nm log m) time (ignoring dependencies on k, ℓ, ε)
    • m denotes the maximal length of the curves
  • Curves in any dimension:
    • (k, ℓ)-center: 8-approximation in O(knmℓ log(mℓ)) time
    • (k, ℓ)-median: 65-approximation in O((nk + k^7 log^5 n) mℓ log(mℓ)) time
  • Sampling property for (1, ℓ)-median (see also [Ackermann et al., 2010])
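To illustrate the center objective (minimize the maximum distance to the closest center), here is a sketch of the classic farthest-first traversal for k-center. It is not the algorithm of [Driemel et al., 2016]: it assumes centers are picked among the input curves, ignores the complexity bound ℓ, and uses the discrete Fréchet distance; all function names are hypothetical.

```python
from math import dist

def discrete_frechet(P, Q):
    """Discrete Frechet distance by the standard dynamic program."""
    prev = None
    for i in range(len(P)):
        cur = [0.0] * len(Q)
        for j in range(len(Q)):
            d = dist(P[i], Q[j])
            if i == 0 and j == 0:
                cur[j] = d
            elif i == 0:
                cur[j] = max(cur[j - 1], d)
            elif j == 0:
                cur[j] = max(prev[j], d)
            else:
                cur[j] = max(min(prev[j], prev[j - 1], cur[j - 1]), d)
        prev = cur
    return prev[-1]

def farthest_first_centers(curves, k):
    """Pick up to k input curves as centers by farthest-first traversal.

    Returns the indices of the chosen centers and the resulting maximum
    distance of any curve to its closest center.
    """
    centers = [0]  # arbitrary first center: the first input curve
    d_to_center = [discrete_frechet(c, curves[0]) for c in curves]
    while len(centers) < min(k, len(curves)):
        # The next center is the curve farthest from all current centers.
        nxt = max(range(len(curves)), key=d_to_center.__getitem__)
        centers.append(nxt)
        for idx, c in enumerate(curves):
            d_to_center[idx] = min(d_to_center[idx],
                                   discrete_frechet(c, curves[nxt]))
    return centers, max(d_to_center)
```

Assigning each curve to its closest chosen center then gives the clustering, that is, the assignment of dogs to walkers.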

• Subcurve clustering
  • Find disjoint subcurves that together form a re-occurring pattern
  • First studied by [Buchin et al., 2011]
  • Does there exist a set of subcurves S with:
    • size of set |S| > k
    • length of a representative is at least ℓ
    • common Fréchet distance at most ε
  • 2-approximation algorithm: O(ℓ n^2) time and O(n ℓ^2) space
  • GPU implementation [Gudmundsson and Valladares, 2015]

Summary

• Sequence Alignment Measures
  • for Strings: LCS, Edit distance
  • for Time Series: Dynamic time warping, Discrete Fréchet distance
  • for Curves: Continuous Fréchet distance
• Quadratic lower bounds based on SETH
• Strongly subquadratic-time algorithms under input assumptions
• Partial distance measures based on Fréchet distance
• Data structures for Fréchet queries
• Median curves and clustering under the Fréchet distance

References

Abam, M., de Berg, M., Hachenberger, P., and Zarei, A. (2010). Streaming algorithms for line simplification. Discrete & Computational Geometry, 43:497–515.

Abboud, A., Backurs, A., and Williams, V. V. (2015). Tight hardness results for LCS and other sequence similarity measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59–78.

Ackermann, M. R., Blömer, J., and Sohler, C. (2010). Clustering for metric and nonmetric distance measures. ACM Trans. Algorithms, 6(4):59:1–59:26.

Agarwal, P. K., Avraham, R. B., Kaplan, H., and Sharir, M. (2014). Computing the discrete Fréchet distance in subquadratic time. SIAM J. Comput., 43(2):429–449.

Agarwal, P. K., Fox, K., Pan, J., and Ying, R. (2016). Approximating dynamic time warping and edit distance for a pair of point sequences. In SOCG. To appear.

Agarwal, P. K., Har-Peled, S., Mustafa, N., and Wang, Y. (2005a). Near-linear time approximation algorithms for curve simplification in two and three dimensions. Algorithmica, 42:203–219.

Agarwal, P. K., Har-Peled, S., Mustafa, N. H., and Wang, Y. (2005b). Near-linear time approximation algorithms for curve simplification. Algorithmica, 42:203–219.

Ahn, H., Alt, H., Buchin, M., Oh, E., Scharf, L., and Wenk, C. (2016). A middle curve based on discrete Fréchet distance. In LATIN 2016: Theoretical Informatics - 12th Latin American Symposium, Ensenada, Mexico, April 11-15, 2016, Proceedings, pages 14–26.

Alt, H. and Godau, M. (1995). Computing the Fréchet distance between two polygonal curves. International Journal of Computational Geometry & Applications, 5:75–91.

Alt, H., Knauer, C., and Wenk, C. (2004). Comparison of distance measures for planar curves. Algorithmica, 38(1):45–58.

Aronov, B., Har-Peled, S., Knauer, C., Wang, Y., and Wenk, C. (2006). Fréchet distance for curves, revisited. In Proceedings of the 14th Annual European Symposium on Algorithms, pages 52–63.

Avraham, R. B., Filtser, O., Kaplan, H., Katz, M. J., and Sharir, M. (2015). The discrete and semicontinuous Fréchet distance with shortcuts via approximate distance counting and selection. ACM Trans. Algorithms, 11(4):29.

Backurs, A. and Indyk, P. (2015). Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC '15, pages 51–58, New York, NY, USA. ACM.

Bringmann, K. (2014). Why walking the dog takes time: Frechet distance has no strongly subquadratic algorithms unless SETH fails. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pages 661–670.

Bringmann, K. and Künnemann, M. (2015a). Improved approximation for Fréchet distance on c-packed curves matching conditional lower bounds. In Algorithms and Computation - 26th International Symposium, ISAAC 2015, Nagoya, Japan, December 9-11, 2015, Proceedings, pages 517–528.

Bringmann, K. and Künnemann, M. (2015b). Quadratic conditional lower bounds for string problems and dynamic time warping. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 79–97.

Bringmann, K. and Mulzer, W. (2016). Approximability of the discrete Fréchet distance. JoCG, 7(2):46–76.

Buchin, K., Buchin, M., Gudmundsson, J., Löffler, M., and Luo, J. (2011). Detecting commuting patterns by clustering subtrajectories. International Journal of Computational Geometry & Applications, 21(03):253–282.

Buchin, K., Buchin, M., Meulemans, W., and Mulzer, W. (2014a). Four Soviets walk the dog: With an application to Alt's conjecture. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '14, pages 1399–1413. SIAM.

Buchin, K., Buchin, M., and Wang, Y. (2009). Exact algorithms for partial curve matching via the Fréchet distance. In Proceedings of the 20th ACM-SIAM Symposium on Discrete Algorithms, pages 645–654.

Buchin, M., Driemel, A., and Speckmann, B. (2014b). Computing the Fréchet distance with shortcuts is NP-hard. In 30th Annual Symposium on Computational Geometry, SOCG'14, Kyoto, Japan, June 08 - 11, 2014, page 367.

Callahan, P. B. and Kosaraju, S. R. (1995). A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. Journal of the Association for Computing Machinery, 42:67–90.

Chan, T. M. (2012). Optimal partition trees. Volume 47, pages 661–690.

Chen, L., Özsu, M. T., and Oria, V. (2005). Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, SIGMOD '05, pages 491–502, New York, NY, USA. ACM.

Cole, R. and Vishkin, U. (1988). The accelerated centroid decomposition technique for optimal parallel tree evaluation in logarithmic time. Algorithmica, 3(1):329–346.

de Berg, M., Cook IV, A. F., and Gudmundsson, J. (2013). Fast Fréchet queries. Computational Geometry: Theory and Applications, 46(6):747–755.

de Berg, M., Katz, M. J., Stappen, A. v., and Vleugels, J. (2002). Realistic input models for geometric algorithms. Algorithmica, 34:81–97.

D'haeseleer, P. (2006). What are DNA sequence motifs? Nature Biotechnology, 24(4):423–425.

Driemel, A. and Har-Peled, S. (2013). Jaywalking your dog – computing the Fréchet distance with shortcuts. SIAM Journal on Computing, 42(5):1830–1866.

Driemel, A., Har-Peled, S., and Wenk, C. (2012). Approximating the Fréchet distance for realistic curves in near-linear time. Discrete & Computational Geometry, 48(1):94–127.

Driemel, A., Krivosija, A., and Sohler, C. (2016). Clustering time series under the Fréchet distance. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 766–785.

Dumitrescu, A. and Rote, G. (2004). On the Fréchet distance of a set of curves. In Proceedings of the 16th Canadian Conference on Computational Geometry, CCCG'04, Concordia University, Montréal, Québec, Canada, August 9-11, 2004, pages 162–165.

Godau, M. (1991). A natural metric for curves – computing the distance for polygonal chains and approximation algorithms. In STACS 91: 8th Annual Symposium on Theoretical Aspects of Computer Science, Hamburg, Germany, February 14–16, 1991, Proceedings, pages 127–136.

Gravano, L., Ipeirotis, P. G., Jagadish, H. V., Koudas, N., Muthukrishnan, S., and Srivastava, D. (2001). Approximate string joins in a database (almost) for free. In Proceedings of the 27th International Conference on Very Large Data Bases, VLDB '01, pages 491–500, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Gudmundsson, J. and Smid, M. (2015). Fast algorithms for approximate Fréchet matching queries in geometric trees. Computational Geometry, 48(6):479–494.

Gudmundsson, J. and Valladares, N. (2015). A GPU approach to subtrajectory clustering using the Fréchet distance. IEEE Trans. Parallel Distrib. Syst., 26(4):924–937.

Har-Peled, S. and Raichel, B. (2013). The Fréchet distance revisited and extended. ACM Transactions on Algorithms. To appear.

Indyk, P. (2002). Approximate nearest neighbor algorithms for Frechet distance via product metrics. In Proceedings of the Eighteenth Annual Symposium on Computational Geometry, SOCG '02, pages 102–106, New York, NY, USA. ACM.

Keogh, E. and Ratanamahatana, C. A. (2005). Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3):358–386.

Maheshwari, A., Sack, J., Shahbaz, K., and Zarrabi-Zadeh, H. (2014). Improved algorithms for partial curve matching. Algorithmica, 69(3):641–657.

Petitjean, F. and Gançarski, P. (2012). Summarizing a set of time series by averaging: From Steiner sequence to compact multiple alignment. Theoretical Computer Science, 414(1):76–91.

Williams, R. (2004). A new algorithm for optimal constraint satisfaction and its implications. In Automata, Languages and Programming, pages 1227–1237. Springer.