Lower Bounds on Streaming Algorithms for Approximating the
Length of the Longest Increasing Subsequence.
Anna Gal UT Austin
Parikshit Gopalan U. Washington & UT Austin
Data Stream Model of Computation
Input: X1 X2 X3 … Xn
Storage
• Single pass.
• Small storage space, update time.
• Surprisingly powerful [Alon-Matias-Szegedy, …]
Estimating Sortedness on Data-Streams
Cannot sort efficiently.
Can we tell if the data needs to be sorted?
[Ajtai-Jayram-Kumar-Sivakumar, Gupta-Zane, Cormode-Muthukrishnan-Sahinalp, Liben-Nowell-Vee-Zhu, Woodruff-Sun, G.-Jayram-Kumar-Sivakumar]
Measuring Sortedness:
• Length of Longest Increasing Subsequence.
• Ulam/Edit distance.
• Inversion/Kendall Tau distance.
Longest Increasing Subsequence
LIS(σ): Length of Longest Increasing Subsequence.
5 7 8 1 4 2 10 3 6 9
Studied in statistics, biology, computer science … [Gusfield, Pevzner, Aldous-Diaconis…]
Prior Work
• Exact Computation of LIS(σ):
– Patience Sorting [Ross, Mallows]: O(n) space, 1-pass streaming algorithm.
– Ω(n) space lower bound. [G.-Jayram-Krauthgamer-Kumar’07, Woodruff-Sun’07]
• Approximating LIS(σ):
– Deterministic, O(√(n/ε)) space, (1 + ε)-approx. [G.-Jayram-Krauthgamer-Kumar’07]
Conjecture [GJKK]: Every 1-pass deterministic algorithm that gives a 1.1-approximation to LIS(σ) requires Ω(√n) space.
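A minimal sketch of the exact one-pass computation mentioned above, assuming the standard textbook form of patience sorting (the function name `lis_length` is illustrative, not from the slides). It keeps one "pile top" per achievable subsequence length, which is where the O(n) worst-case space comes from.

```python
from bisect import bisect_left

def lis_length(stream):
    """One pass over the stream; returns the length of the longest
    strictly increasing subsequence."""
    # tails[k] = smallest possible last element of an increasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for x in stream:
        k = bisect_left(tails, x)      # first position with tails[k] >= x
        if k == len(tails):
            tails.append(x)            # x extends the longest subsequence
        else:
            tails[k] = x               # x is a smaller tail for length k + 1
    return len(tails)

# The example sequence from the slides; one longest subsequence is 1 2 3 6 9.
example = [5, 7, 8, 1, 4, 2, 10, 3, 6, 9]
print(lis_length(example))  # 5
```

The array `tails` can grow to n entries on a sorted input, so this is the O(n)-space baseline that the approximation results improve on.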
Our Results
Thm: Any det. O(1)-pass algorithm that gives a (1 + ε)-approximation to the LIS requires Ω(√(n/ε)) space.
• Tight bounds in n, ε.
• Proof via direct sum approach.
• Direct sum for maximum communication in the private messages model.
• Separation between communication models.
A Communication Problem
Consider the following problem:
• t players, t numbers each.
• Goal: Approximate length of the LIS.
• Enough to show a lower bound of Ω(t) on maximum message size.
Example input (rows, top to bottom, held by P1, P2, …, Pt):
1.8 2.9 3.7 4.9
1.6 2.8 3.5 4.6
1.3 2.5 3.3 4.5
1 2 3.2 4.2
A Communication Problem
[GJKK]: Consider the following decision problem (rows, top to bottom, held by P1, P2, …, Pt):
No: All columns non-increasing.
1.8 2.9 3.7 4.9
1.6 2.8 3.5 4.6
1.3 2.5 3.3 4.5
1 2 3.2 4.2
Yes: Some column increasing.
1.7 2.8 3.4 4.8
1.6 2.6 3.5 4.6
1.3 2.5 3.1 4.5
1.1 2.1 3.9 4.2
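To see why the decision problem is tied to approximating the LIS, the sketch below reads the two example matrices row by row as a stream (an assumption: P1 holds the top row and the players' rows are concatenated in order) and compares LIS lengths. The helper names are illustrative only.

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

def columns(matrix):
    return [list(col) for col in zip(*matrix)]

def stream_of(matrix):
    # Assumed stream order: P1's row, then P2's row, and so on.
    return [x for row in matrix for x in row]

def all_columns_non_increasing(matrix):
    return all(a >= b for col in columns(matrix) for a, b in zip(col, col[1:]))

NO = [[1.8, 2.9, 3.7, 4.9],
      [1.6, 2.8, 3.5, 4.6],
      [1.3, 2.5, 3.3, 4.5],
      [1.0, 2.0, 3.2, 4.2]]
YES = [[1.7, 2.8, 3.4, 4.8],
       [1.6, 2.6, 3.5, 4.6],
       [1.3, 2.5, 3.1, 4.5],
       [1.1, 2.1, 3.9, 4.2]]

print(all_columns_non_increasing(NO), lis_length(stream_of(NO)))    # True 4
print(all_columns_non_increasing(YES), lis_length(stream_of(YES)))  # False 6
print([lis_length(col) for col in columns(YES)])                    # [1, 1, 3, 1]
```

In the No matrix each column contributes at most one element to any increasing subsequence, so the LIS is t = 4; in the Yes matrix the third column contributes 3.4, 3.5, 3.9, which lifts the LIS to 6.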
Direct Sum Paradigm
Primitive Problem: on inputs x1 and y1, compute p(x1, y1).
Direct Sum Problem: on inputs x1,…,xn and y1,…,yn, compute ∨i p(xi, yi).
Can run n copies of protocol for p.
Direct-Sum Question: Is this the best possible?
Set-Disjointness, Inner Product…
Techniques for proving direct-sum theorems: [KN, CKSW, BJKS, SS…]
Primitive Problem
P1, P2, …, Pt hold one number each.
No: 0.7 0.5 0.3 0.2
Yes: 0.4 0.5 0.4 0.9
Direct Sum of Primitive Problems
Each column is one primitive problem (rows, top to bottom, held by P1, P2, …, Pt).
No: All No instances.
0.8 0.9 0.7 0.9
0.6 0.8 0.5 0.6
0.3 0.5 0.3 0.5
0.0 0.2 0.2 0.2
Yes: One Yes instance.
0.7 0.8 0.4 0.9
0.6 0.6 0.5 0.6
0.3 0.5 0.1 0.5
0.1 0.1 0.9 0.2
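A small sketch of the direct-sum structure: each column is classified as a primitive instance and the answers are OR-ed together. Two assumptions are made for illustration: values are stored as integer tenths (0.8 becomes 8) to avoid floating-point comparisons, and column j is shifted by j (that is, by 10·j tenths) to obtain an LIS instance, a rule read off the example matrices rather than stated on the slides. The primitive rule ("No" if non-increasing, "Yes" if the LIS exceeds t/2) follows the Primitive Problem slides; all function names are hypothetical.

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

def primitive(column):
    """Promise problem on one column; None means the promise is violated."""
    if all(a >= b for a, b in zip(column, column[1:])):
        return "No"
    if lis_length(column) > len(column) / 2:
        return "Yes"
    return None

def direct_sum(matrix):
    """OR of the primitive answers over all columns."""
    answers = [primitive(list(col)) for col in zip(*matrix)]
    if None in answers:
        return None
    return "Yes" if "Yes" in answers else "No"

def embed(matrix):
    """Assumed embedding: shift column j (1-based) by 10 * j tenths."""
    return [[x + 10 * (j + 1) for j, x in enumerate(row)] for row in matrix]

# Example matrices from the slides, in tenths.
NO = [[8, 9, 7, 9], [6, 8, 5, 6], [3, 5, 3, 5], [0, 2, 2, 2]]
YES = [[7, 8, 4, 9], [6, 6, 5, 6], [3, 5, 1, 5], [1, 1, 9, 2]]

for name, m in (("NO", NO), ("YES", YES)):
    stream = [x for row in embed(m) for x in row]
    print(name, direct_sum(m), lis_length(stream))
```

Under these assumptions the No matrix embeds to a stream with LIS 4 and the Yes matrix to one with LIS 6, so an algorithm that approximates the LIS well enough also answers the OR.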
[GG] An Easier Problem
No:
1.8 2.9 3.7 4.9
1.8 2.9 3.7 4.9
1.8 2.9 3.7 4.9
1.8 2.9 3.7 4.9
Yes:
1.7 2.8 3.4 4.8
1.7 2.8 3.5 4.8
1.7 2.8 3.4 4.8
1.7 2.8 3.9 4.8
Hope: Some player distinguishes between many No instances.
BlackBoard Model of One-Way Communication
• Players speak in order.
• Every message seen by all.
• Last player outputs answer.
Problem is Easy in the BlackBoard model
BlackBoard protocol with max. communication 2 log(m).
Private Messages Model
• Messages seen by next player only.
• Suffices for streaming lower bound.
• Requires non-standard techniques.
Strong lower bound for maximum communication in the private messages model.
Thm: Any det. O(1)-pass algorithm that gives a (1 + ε)-approximation to the LIS requires Ω(√(n/ε)) space.
Separation between blackboard and private messages.
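Why private messages suffice for a streaming lower bound can be sketched as a simulation: each player feeds its own numbers to the streaming algorithm and forwards only the memory state to the next player, so the largest message is no bigger than the algorithm's memory. The sketch below uses exact patience sorting as a stand-in streaming algorithm (an assumption for illustration; any one-pass algorithm would do), and measures message size in stored numbers rather than bits.

```python
from bisect import bisect_left

def stream_step(state, x):
    """One update of the stand-in one-pass algorithm; returns the new memory state."""
    state = list(state)
    k = bisect_left(state, x)
    if k == len(state):
        state.append(x)
    else:
        state[k] = x
    return state

def run_stream(stream):
    """Run the algorithm directly on the whole stream."""
    state = []
    for x in stream:
        state = stream_step(state, x)
    return len(state)

def run_private_messages(rows):
    """Player i sees only the message from player i-1 plus its own row."""
    message = []                # P1 starts from the empty memory state
    max_message = 0
    for row in rows:            # players speak in order P1, P2, ..., Pt
        for x in row:
            message = stream_step(message, x)
        max_message = max(max_message, len(message))
    return len(message), max_message   # the last player outputs the answer

rows = [[1.7, 2.8, 3.4, 4.8],
        [1.6, 2.6, 3.5, 4.6],
        [1.3, 2.5, 3.1, 4.5],
        [1.1, 2.1, 3.9, 4.2]]
answer, max_message = run_private_messages(rows)
print(answer, max_message, run_stream([x for row in rows for x in row]))
```

Read contrapositively, a lower bound on the maximum message size in this model is a lower bound on the memory of any one-pass streaming algorithm.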
Proof Outline
• Step 1: Primitive Problem (one round).
• Step 2: Direct-sum Problem (one round).
• Multi-round Protocols.
Primitive Problem
P1, P2, …, Pt hold one number each.
No: 3.4 3.4 3.4 3.4
Yes: 3.4 3.5 3.4 3.9
Alphabet of size m > t. Yes Case: LIS(σ) > t/2.
Easy: Bound of ≈ (log m)/t on max communication.
Thm: Max communication is at least log (m/t).
Lower Bound for Primitive Problem
Pi's message is specified by prefix x1…xi.
Mi(a): Prefixes where Pi sends the same message as a…a.
qi(a): Length of longest IS in Mi(a) ending below a.
Lower Bound for Primitive Problem
qi(a) as a function of i:
• Monotone: x1…xi ∈ Mi(a) ⇒ x1…xi a ∈ Mi+1(a).
• Bounded by t/2: Correctness.
Lower Bound for Primitive Problem
Map a to first i s.t. qi-1(a) = qi(a).
Some i occurs m/t times (pigeonhole: m values of a, at most t values of i).
Lower Bound for Primitive Problem
The m/t values a, b, c, … mapped to the same i come with prefixes:
a…a and x1…xi-1, where x1 < … < xi-1 = a
b…b and y1…yi-1, where y1 < … < yi-1 = b
c…c and z1…zi-1, where z1 < … < zi-1 = c
Claim: Pi-1 must distinguish a…a from b…b from c…c.
Lower Bound for Primitive Problem
Suppose Pi-1 sends the same message on a…a and b…b, with a ≤ b. Then Pi sends the same message on each of:
a…ab, x1…xi-1b
b…bb, y1…yi-1b
x1 ≤ … ≤ xi-1 = a ≤ b
But qi(b) = i-1. Contradiction.
Hence Pi-1 must distinguish a…a from b…b from c…c.
Gives log(m/t) lower bound.
Lower Bound for General Problem
Mi(a1…at): i × t prefixes where Pi sends the same message as (a1…at)^i.
qi,j(a1…at): Length of longest IS in column j ending at/before aj.
x1,1 x1,2 … x1,t
… … … …
xi,1 xi,2 … xi,t
a1 a2 … at
… … … …
a1 a2 … at
qi,1(a) … qi,t(a)
Part II:
Show that Pi-1 distinguishes between inputs in a set I of ≈ (m/t)^t inputs.
Gives a lower bound of log(|I|) ≈ t log (m/t).
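The arithmetic behind the last line can be written out as a short sketch. It takes |I| ≈ (m/t)^t from the slide and the stream length n = t · t from the communication problem (t players, t numbers each); the condition m ≥ 2t and base-2 logarithms are assumptions added here only to make the final inequality concrete.

```latex
\[
\log |I| \;\approx\; \log\!\left(\frac{m}{t}\right)^{t}
\;=\; t \,\log\frac{m}{t}
\;\ge\; t \;=\; \sqrt{n}
\qquad \text{(assuming } m \ge 2t,\; n = t^{2}\text{)}.
\]
```

This matches the Ω(t) target on maximum message size stated for the communication problem.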
Lower Bound for Many Rounds
Part I: Messages sent by Pi in round 2 and beyond depend on entire input.
Need to change defn. of Mi(a1…at).
Part II: Reduce to 2-player protocol involving Pi-1 and Pt.
Thm: Any deterministic O(1)-pass algorithm that gives a (1 + ε)-approximation to the LIS requires Ω(√(n/ε)) space.
Conclusions
• Exact Computation of LIS(σ):
– Patience Sorting [Ross, Mallows]
– O(n) space, 1-pass streaming algorithm.
– Ω(n) space lower bound. [G.-Jayram-Krauthgamer-Kumar, Woodruff-Sun]
• Approximating LIS(σ):
– O(√(n/ε)) space, deterministic 1-pass algorithm. [G.-Jayram-Krauthgamer-Kumar]
– This paper: The bound is tight for deterministic, O(1)-pass algorithms.
– [Ergun-Jowhari’08]: Different proof.
Randomized Complexity of LIS
Problem: Is there a randomized streaming algorithm to approximate the LIS using space o(√n)?
• [Woodruff-Sun] O(log m) lower bound
• [Chakrabarti]: Randomized private-messages protocol for the direct-sum problem.