lower bounds on streaming algorithms for approximating the length of the longest increasing...

Post on 26-Mar-2015


Lower Bounds on Streaming Algorithms for Approximating the Length of the Longest Increasing Subsequence

Anna Gal UT Austin

Parikshit Gopalan U. Washington & UT Austin

Data Stream Model of Computation

Input: X1 X2 X3 … Xn

Storage

• Single pass.

• Small storage space, update time.

• Surprisingly powerful [Alon-Matias-Szegedy, …]

Estimating Sortedness on Data Streams

Cannot sort efficiently.

Can we tell if the data needs to be sorted? [Ajtai-Jayram-Kumar-Sivakumar, Gupta-Zane, Cormode-Muthukrishnan-Sahinalp, Liben-Nowell-Vee-Zhu, Woodruff-Sun, G.-Jayram-Kumar-Sivakumar]

Measuring Sortedness:

• Length of Longest Increasing Subsequence.

• Ulam/Edit distance.

• Inversion/Kendall Tau distance.

Longest Increasing Subsequence

LIS(σ): Length of Longest Increasing Subsequence of the input sequence σ.

5 7 8 1 4 2 10 3 6 9

Studied in statistics, biology, computer science … [Gusfield, Pevzner, Aldous-Diaconis, …]
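As a reference point for the definition, here is a minimal sketch that computes the LIS length of the example sequence with a quadratic dynamic program. It is not a streaming algorithm and is not taken from the slides; the function name `lis_length` is chosen for illustration, and "increasing" is assumed to mean strictly increasing.

```python
def lis_length(seq):
    """Length of the longest strictly increasing subsequence (O(n^2) time)."""
    best = []  # best[k] = length of the longest increasing subsequence ending at seq[k]
    for k, x in enumerate(seq):
        # Extend the best subsequence that ends at an earlier, smaller element.
        prev = [best[j] for j in range(k) if seq[j] < x]
        best.append(1 + max(prev, default=0))
    return max(best, default=0)


example = [5, 7, 8, 1, 4, 2, 10, 3, 6, 9]
print(lis_length(example))  # 5, witnessed by 1 2 3 6 9
```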

Prior Work

• Exact Computation of LIS(σ):

– Patience Sorting [Ross, Mallows]: O(n) space, 1-pass streaming algorithm.

– Ω(n) space lower bound. [G.-Jayram-Krauthgamer-Kumar’07, Woodruff-Sun’07]

• Approximating LIS(σ):

– Deterministic, O(√(n/ε)) space, (1 + ε)-approx. [G.-Jayram-Krauthgamer-Kumar’07]

Conjecture [GJKK]: Every 1-pass deterministic algorithm that gives a 1.1-approximation to LIS(σ) requires Ω(√n) space.
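The exact one-pass algorithm mentioned above can be sketched as follows. This is the textbook form of patience sorting, written here as an illustration rather than as the authors' code; the class name `PatienceLIS` and its methods are hypothetical. The stored state is one number per pile, so the space used grows with the LIS length and can reach n on a sorted stream.

```python
from bisect import bisect_left


class PatienceLIS:
    """One-pass exact LIS length via patience sorting (strictly increasing)."""

    def __init__(self):
        # tails[k] = smallest possible last element of an increasing
        # subsequence of length k + 1 seen so far.
        self.tails = []

    def update(self, x):
        k = bisect_left(self.tails, x)
        if k == len(self.tails):
            self.tails.append(x)   # x extends the longest subsequence so far
        else:
            self.tails[k] = x      # x is a smaller tail for length k + 1

    def query(self):
        return len(self.tails)


alg = PatienceLIS()
for x in [5, 7, 8, 1, 4, 2, 10, 3, 6, 9]:
    alg.update(x)
print(alg.query(), alg.tails)  # 5 [1, 2, 3, 6, 9]
```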

Our Results

Thm: Any det. O(1)-pass algorithm that gives a (1 + ε) approximation to the LIS requires space Ω(√(n/ε)).

• Tight bounds in n, ε.

• Proof via direct sum approach.

• Direct sum for maximum communication in the private messages model.

• Separation between communication models.

A Communication Problem

Consider the following problem:

• t players, t numbers each.

• Goal: Approximate length of the LIS.

• Enough to show a lower bound of Ω(t) on maximum message size (the stream has n = t² numbers, so Ω(t) = Ω(√n)).

Example with one row per player, P1 to Pt:

1.8 2.9 3.7 4.9
1.6 2.8 3.5 4.6
1.3 2.5 3.3 4.5
1 2 3.2 4.2

A Communication Problem

[GJKK]: Consider the following decision problem:

No instance (all columns non-increasing), one row per player, P1 to Pt:

1.8 2.9 3.7 4.9
1.6 2.8 3.5 4.6
1.3 2.5 3.3 4.5
1 2 3.2 4.2

Yes instance (some column increasing), one row per player, P1 to Pt:

1.7 2.8 3.4 4.8
1.6 2.6 3.5 4.6
1.3 2.5 3.1 4.5
1.1 2.1 3.9 4.2
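The two example grids from this slide can be checked mechanically. The sketch below assumes that the stream is the concatenation of the players' rows in the order P1, …, Pt and that "increasing" means strictly increasing. The helper names are illustrative and not from the talk.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def columns(rows):
    return [list(col) for col in zip(*rows)]


def all_columns_non_increasing(rows):
    return all(c[k] >= c[k + 1] for c in columns(rows) for k in range(len(c) - 1))


no_instance = [[1.8, 2.9, 3.7, 4.9],
               [1.6, 2.8, 3.5, 4.6],
               [1.3, 2.5, 3.3, 4.5],
               [1.0, 2.0, 3.2, 4.2]]
yes_instance = [[1.7, 2.8, 3.4, 4.8],
                [1.6, 2.6, 3.5, 4.6],
                [1.3, 2.5, 3.1, 4.5],
                [1.1, 2.1, 3.9, 4.2]]

for name, rows in [("No", no_instance), ("Yes", yes_instance)]:
    stream = [x for row in rows for x in row]  # P1's row first, then P2's, ...
    print(name,
          all_columns_non_increasing(rows),
          [lis_length(c) for c in columns(rows)],  # LIS inside each column
          lis_length(stream))                      # LIS of the whole stream
# No  True  [1, 1, 1, 1] 4
# Yes False [1, 1, 3, 1] 6
```

In the No grid every column contributes at most one element to an increasing subsequence, so the stream LIS is t = 4. In the Yes grid the third column has an increasing subsequence of length 3, which raises the stream LIS to 6.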

Direct Sum Paradigm

Primitive Problem: on inputs x1, y1, compute p(x1, y1).

Direct Sum Problem: on inputs x1,…,xn and y1,…,yn, compute ∨i p(xi, yi).

Can run n copies of protocol for p.

Direct-Sum Question: Is this the best possible?

Set-Disjointness, Inner Product, …

Techniques for proving direct-sum theorems: [KN, CKSW, BJKS, SS, …]
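The baseline "run n copies of the protocol for p" can be sketched as follows. Two assumptions are made for illustration only: the primitive p is a single-coordinate intersection test, as in Set-Disjointness, and the n answers are combined with a logical OR. Both function names are hypothetical.

```python
def p(x, y):
    """Hypothetical primitive problem: do both parties hold a 1 in this coordinate?"""
    return x == 1 and y == 1


def direct_sum(xs, ys):
    """Solve n independent copies of p and combine the answers (assumed: OR).

    Returns the combined answer and the number of primitive protocols run,
    which is the cost the direct-sum question asks whether one can beat.
    """
    answers = [p(x, y) for x, y in zip(xs, ys)]
    return any(answers), len(answers)


print(direct_sum([1, 0, 1], [0, 1, 1]))  # (True, 3): the sets intersect in the last coordinate
print(direct_sum([1, 0], [0, 1]))        # (False, 2): disjoint
```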

Primitive Problem

No instance (one number per player, P1 to Pt): 0.7, 0.5, 0.3, 0.2

Yes instance (one number per player, P1 to Pt): 0.4, 0.5, 0.4, 0.9

Direct Sum of Primitive Problems

Each column is one primitive problem; rows belong to players P1 to Pt.

No (all No instances):

0.8 0.9 0.7 0.9
0.6 0.8 0.5 0.6
0.3 0.5 0.3 0.5
0.0 0.2 0.2 0.2

Yes (one Yes instance):

0.7 0.8 0.4 0.9
0.6 0.6 0.5 0.6
0.3 0.5 0.1 0.5
0.1 0.1 0.9 0.2

[GG] An Easier Problem

No instance:

1.8 2.9 3.7 4.9
1.8 2.9 3.7 4.9
1.8 2.9 3.7 4.9
1.8 2.9 3.7 4.9

Yes instance:

1.7 2.8 3.4 4.8
1.7 2.8 3.5 4.8
1.7 2.8 3.4 4.8
1.7 2.8 3.9 4.8

Hope: Some player distinguishes between many No instances.

BlackBoard Model of One-Way Communication

• Players speak in order.

• Every message seen by all.

• Last player outputs answer.

Problem is Easy in the BlackBoard model

BlackBoard protocol with max. communication 2 log(m).

Private Messages Model

• Messages seen by next player only.

• Suffices for streaming lower bound.

• Requires non-standard techniques.

Strong lower bound for maximum communication in the private messages model.

Thm: Any det. O(1)-pass algorithm that gives a (1 + ε) approximation to the LIS requires space Ω(√(n/ε)).

Separation between blackboard and private messages.
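The reason the private messages model suffices for streaming lower bounds is the standard simulation: each player runs the streaming algorithm on its own numbers, starting from the memory state it received, and forwards the resulting state to the next player only. The sketch below illustrates this with patience sorting standing in for the streaming algorithm. The functions `player` and `run_protocol` are hypothetical names, and message size is measured in stored numbers rather than bits.

```python
from bisect import bisect_left


def player(message, row):
    """One player: resume the streaming algorithm from the received state."""
    tails = list(message)
    for x in row:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return tails  # forwarded to the next player only


def run_protocol(rows):
    """Simulate one pass; return (answer, largest message in stored numbers)."""
    message, max_len = [], 0
    for row in rows:
        message = player(message, row)
        max_len = max(max_len, len(message))
    return len(message), max_len


rows = [[1.7, 2.8, 3.4, 4.8],
        [1.6, 2.6, 3.5, 4.6],
        [1.3, 2.5, 3.1, 4.5],
        [1.1, 2.1, 3.9, 4.2]]
print(run_protocol(rows))  # (6, 6)
```

Read in the other direction, a lower bound on the maximum message size in this model is a lower bound on the memory of any streaming algorithm simulated this way.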

Proof Outline

• Step 1: Primitive Problem (one round).

• Step 2: Direct-sum Problem (one round).

• Multi-round Protocols.

Primitive Problem

No instance (one number per player, P1 to Pt): 3.4, 3.4, 3.4, 3.4

Yes instance (one number per player, P1 to Pt): 3.4, 3.5, 3.4, 3.9

Alphabet of size m > t. Yes Case: LIS(σ) > t/2.

Easy: Bound of ≈ (log m)/t on max communication.

Thm: Max communication is at least log (m/t).
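To get a feel for the gap between the easy bound and the theorem, the two expressions can be evaluated for concrete parameters. The values of m and t below are arbitrary illustrative choices, and logarithms are assumed to be base 2.

```python
import math


def easy_bound(m, t):
    """The easy bound from the slide: about (log m) / t."""
    return math.log2(m) / t


def theorem_bound(m, t):
    """The bound from the theorem: log(m / t)."""
    return math.log2(m / t)


m, t = 2**20, 16  # hypothetical alphabet size and number of players
print(easy_bound(m, t))     # 1.25
print(theorem_bound(m, t))  # 16.0
```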

Lower Bound for Primitive Problem

P_i's message is determined by the prefix x_1…x_i.

M_i(a): Prefixes x_1…x_i on which P_i sends the same message as on a…a.

q_i(a): Length of longest IS in M_i(a) ending below a.

[Plot: q_i(a) as a function of i.]

• Monotone: x_1…x_i ∈ M_i(a) ⇒ x_1…x_i a ∈ M_{i+1}(a).

• Bounded by t/2, by correctness.

Map a to the first i such that q_{i-1}(a) = q_i(a).

Some i occurs at least m/t times (pigeonhole: m values of a, at most t values of i).

Lower Bound for Primitive Problem

Take the m/t values a, b, c, … that are mapped to the same i, with prefixes seen by P_{i-1}:

x_1 < … < x_{i-1} = a

y_1 < … < y_{i-1} = b

z_1 < … < z_{i-1} = c

Claim: P_{i-1} must distinguish a…a from b…b from c…c.

Suppose P_{i-1} sends the same message on a…a and on b…b, where a ≤ b. Then x_1…x_{i-1} gets the same message as b…b, so P_i sends the same message on x_1…x_{i-1} b as on b…bb, and

x_1 ≤ … ≤ x_{i-1} = a ≤ b.

But q_i(b) = i-1. Contradiction.

Hence P_{i-1} must distinguish a…a from b…b from c…c.

Gives log(m/t) lower bound.

Lower Bound for General Problem

M_i(a_1…a_t): i × t prefixes on which P_i sends the same message as on (a_1…a_t)^i, that is, i rows each equal to a_1…a_t.

q_{i,j}(a_1…a_t): Length of longest IS in column j ending at/before a_j.

[Figure: an i × t prefix with rows x_{1,1} x_{1,2} … x_{1,t} through x_{i,1} x_{i,2} … x_{i,t}, followed by rows a_1 a_2 … a_t. Each input a = a_1…a_t is associated with the values q_{i,1}(a), …, q_{i,t}(a).]

Part II:

Show that P_{i-1} distinguishes between the inputs in a set I of ≈ (m/t)^t inputs.

Gives a lower bound of log(|I|) ≈ t log(m/t).
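To connect this count to the streaming bound, here is a worked instance under two assumptions that are not stated on this slide: logarithms are base 2, and the alphabet satisfies m ≥ 2t. With t players holding t numbers each, the stream has n = t² numbers, so:

```latex
\[
  \log_2 |I| \;\approx\; t \log_2\!\left(\frac{m}{t}\right)
  \;\ge\; t \log_2 2 \;=\; t \;=\; \sqrt{n}
  \qquad \text{(assuming } m \ge 2t \text{ and } n = t^2\text{)}.
\]
```

This only illustrates the dependence on n; the dependence on ε in the theorem is not covered by this sketch.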

Lower Bound for Many Rounds

Part I: Messages sent by P_i in round 2 and beyond depend on entire input.

Need to change defn. of M_i(a_1…a_t).

Part II: Reduce to 2-player protocol involving P_{i-1} and P_t.

Thm: Any deterministic O(1)-pass algorithm that gives a (1 + ε) approximation to the LIS requires space Ω(√(n/ε)).

Conclusions

• Exact Computation of LIS(σ):

– Patience Sorting [Ross, Mallows]

– O(n) space, 1-pass streaming algorithm.

– Ω(n) space lower bound. [G.-Jayram-Krauthgamer-Kumar, Woodruff-Sun]

• Approximating LIS(σ):

– O(√(n/ε)) space, deterministic 1-pass algorithm. [G.-Jayram-Krauthgamer-Kumar]

– This paper: The bound is tight for deterministic, O(1)-pass algorithms.

– [Ergun-Jowhari’08]: Different proof.

Randomized Complexity of LIS

Problem: Is there a randomized streaming algorithm to approximate the LIS using space o(√n)?

• [Woodruff-Sun] O(log m) lower bound

• [Chakrabarti]: Randomized private-messages protocol for the direct-sum problem.
