
Element Distinctness, Frequency Moments, and Sliding Windows

Raphael Clifford

University of Bristol, UK

arXiv:1309.3690

FOCS 2013

Joint work with Paul Beame and Widad Machmouchi

Time-space tradeoffs

[Image: assorted beer cans (Wikipedia: Beer)]

1. Frequency moments. E.g. how many different beer cans?

2. Element distinctness (ED). Have I had the same can twice?

Particularly simple to solve if presorted.

Time-space tradeoffs

What is the complexity of these problems using small space?

• Any solution using sorting requires T ∈ Ω(n²/S) [Borodin-Cook 82, Beame 91].

• What is the true complexity using small space?

• Are both problems really as hard as sorting? (No.)

• How about multi-output or sliding window versions?


Sliding window ED and frequency moments

A B C B A B C

• For windows of length 3, #distinct elements (F0) = 3, 2, 3, 2, 3.

• Sliding window ED gives 1, 0, 1, 0, 1 (1 iff every element in the window is distinct).

• We show sliding window ED is easier than sorting but sliding window F0 mod 2 is as hard as sorting.
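As a quick check of the example, here is a brute-force reference computation (linear-space and purely illustrative, not one of the small-space algorithms discussed later; the window length 3 is inferred from the slide's output values):

```python
def sliding_f0_and_ed(x, w):
    """Brute-force reference: for every window of length w, F0 is the
    number of distinct elements and ED is 1 iff all w elements are
    distinct. Uses linear space; only meant to check the example."""
    windows = [x[i:i + w] for i in range(len(x) - w + 1)]
    f0 = [len(set(win)) for win in windows]
    ed = [1 if len(set(win)) == w else 0 for win in windows]
    return f0, ed

# The slide's example, with window length 3:
assert sliding_f0_and_ed("ABCBABC", 3) == ([3, 2, 3, 2, 3], [1, 0, 1, 0, 1])
```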

Our new results

Our new upper and lower bounds:

Frequency moments
  Single window:   T ∈ Ω(n √(log(n/S) / log log(n/S)))  [BSSV 03]
  Sliding window:  T ∈ Ω(n²/S) (New);  T ∈ O(n²/S) (New)

Element distinctness
  Single window:   T ∈ Ω(n √(log(n/S) / log log(n/S)))  [BSSV 03];  T ∈ O(n √(n/S)) (New)
  Sliding window:  T ∈ O(n √(n/S)) (New)

F0 mod 2
  Single window:   T ∈ O(n²/S)  [PR 98]
  Sliding window:  T ∈ O(n²/S) (New);  T ∈ Ω(n²/S) (New)

Previous element distinctness lower bounds:

                      Comparison model               Multi-way branching
Borodin et al. 1987   T ∈ Ω(n^(3/2) √(log n) / S)    -
Yao 1988              T ∈ Ω(n^(2−ε(n)) / S)          -
Ajtai 1999            -                              S ∈ o(n) ⇒ T ∈ ω(n)
Beame et al. 2003     -                              T ∈ Ω(n √(log(n/S) / log log(n/S)))


The overall method for our upper bounds

• Using Floyd's cycle finding algorithm (as in Pollard's rho), we construct a T ∈ O(n √(n/S)) randomised branching program algorithm for single window ED.

  • 1-sided error, with error probability inversely polynomial in n.

• Reduction from sliding-window ED to single window ED.

  • T ∈ O(n √(n/S)) for a single window gives T ∈ O(n √(n/S)) for sliding windows.

• Sliding window frequency moments: T ∈ O(n²/S) in the comparison model.

Cycle finding in graphs

Floyd's "tortoise and hare" algorithm.

[Figure: animation of a tortoise pointer and a hare pointer stepping through a rho-shaped functional graph until they meet on the cycle.]
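For reference, here is a standard self-contained implementation of Floyd's algorithm (the textbook version, not code from the paper). It finds the tail length and cycle length of the eventually periodic sequence x0, f(x0), f(f(x0)), ... using O(1) space:

```python
def floyd_cycle_finding(f, x0):
    """Floyd's "tortoise and hare" cycle finding.

    Given a function f on a finite set and a start point x0, the
    sequence x0, f(x0), f(f(x0)), ... is eventually periodic. Returns
    (mu, lam): the index of the first element on the cycle and the
    cycle length, using O(1) space.
    """
    # Phase 1: advance the tortoise by 1 and the hare by 2 until they meet.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))

    # Phase 2: the meeting point's distance from the cycle entry is a
    # multiple of the cycle length, so restarting one pointer at x0 and
    # moving both one step at a time makes them meet at the cycle entry.
    mu = 0
    tortoise = x0
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(hare)
        mu += 1

    # Phase 3: measure the cycle length by walking once around the cycle.
    lam = 1
    hare = f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1

    return mu, lam
```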

Randomised cycle finding

ED input: x = 134, 921, 37, 812, 396, 452, 921, 98, 157 at positions 1, ..., 9.

Pick a random hash function h mapping values into [n]. This induces a functional graph on positions 1, ..., 9 with an edge from i to h(x_i). A duplicate in the input (here x_2 = x_7 = 921) forces two positions to point to the same node, so it shows up as a collision in this graph, and collisions can be found by cycle finding.

[Figure: animation of the input array, the random hash function from values to positions, and the induced graph on positions 1-9.]
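A word-RAM sketch of one collision-finding run (my reconstruction, not the paper's branching-program formulation): a lazily sampled dictionary stands in for the random hash function h, and a rho walk on the induced map f(i) = h(x[i]) produces a candidate collision that is then checked against the actual input:

```python
import random

def rho_collision(f, start):
    """Find two distinct points u != v on the rho walk from `start`
    with f(u) == f(v), or None if `start` already lies on the cycle."""
    # Phase 1 (Floyd): find a meeting point of the tortoise and hare.
    tortoise, hare = f(start), f(f(start))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    # Phase 2: walk both pointers one step at a time from `start` and
    # from the meeting point; they meet at the entry of the cycle. The
    # last tail node and the last cycle node both map to the entry,
    # and they are distinct, giving a collision.
    prev_tail, prev_cycle = None, hare
    tortoise = start
    while tortoise != hare:
        prev_tail, prev_cycle = tortoise, hare
        tortoise, hare = f(tortoise), f(hare)
    if prev_tail is None:
        return None  # no tail: the start point was already on the cycle
    return prev_tail, prev_cycle

def ed_one_run(x, seed):
    """One collision-finding run for ED: sample a random hash from
    values to positions, walk the induced graph f(i) = h(x[i]), and
    report a duplicate only if the collision found is a real one."""
    n = len(x)
    rng = random.Random(seed)
    h = {}  # lazily sampled random function: value -> position in [n]

    def f(i):
        v = x[i]
        if v not in h:
            h[v] = rng.randrange(n)
        return h[v]

    pair = rho_collision(f, rng.randrange(n))
    if pair is not None and x[pair[0]] == x[pair[1]]:
        return pair      # genuine duplicate: x[i] == x[j] with i != j
    return None          # hash collision only (or none); try a new run
```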

A new small-space upper bound for ED

Sampling uniformly from [n]:

• By the birthday paradox, we expect to find a repeated value after Θ(√n) samples.

• The probability of any fixed pair of numbers appearing before a repeated value approaches 2/n.

So there is a reasonable chance of finding a real input duplicate in one cycle, and using constant space we can repeat Θ(n) times. But to run faster using more space, we can't just run S instances in parallel.


A new small-space upper bound for ED

• Maintain a redirection list to split cycles. Update the list whenever a new collision is found.

• Cycles roughly halve in length each time they are visited.

We find all collisions reachable from any S distinct starting points using O(S) items of space and time roughly proportional to the size of the subgraph explored.

A new small-space upper bound for ED

Theorem
There is a randomised branching program algorithm computing ED with 1-sided error that uses space S and T ∈ O(n √(n/S)).

1. Run roughly n/S independent runs of collision-finding with independent random choices of hash functions and independent choices of roughly S starting indices.

2. Use a run-time cut-off bounding the number of explored vertices at 2√(Sn).

3. On each run, check if any of the collisions found is a duplicate in x, in which case output ED(x) = 0 and halt.

4. If none is found in any round then output ED(x) = 1.

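Putting the four steps together, here is a minimal end-to-end sketch on a word RAM, reusing rho_collision from the sketch above. Note the paper explores its roughly S starting points together via the redirection list; running them one after another, as here, only illustrates the structure (runs, cut-off, 1-sided output), not the claimed time bound:

```python
import math, random

def element_distinctness(x, S):
    """Sketch of the theorem's algorithm: roughly n/S runs, each with a
    fresh random hash function, roughly S starting indices, and a
    cut-off of 2*sqrt(S*n) explored vertices. Output 0 ("duplicate
    found") is always correct; output 1 errs with small probability."""
    n = len(x)
    runs = -(-n // S)               # ceil(n / S) independent runs
    budget = 2 * math.isqrt(S * n)  # cut-off on explored vertices per run

    class OutOfBudget(Exception):
        pass

    for _ in range(runs):
        rng = random.Random()       # fresh random hash function per run
        h, steps = {}, 0

        def f(i):
            nonlocal steps
            steps += 1
            if steps > budget:      # step 2: run-time cut-off
                raise OutOfBudget
            v = x[i]
            if v not in h:
                h[v] = rng.randrange(n)
            return h[v]

        try:
            for _ in range(S):      # step 1: roughly S starting indices
                pair = rho_collision(f, rng.randrange(n))
                # Step 3: check whether the collision is a real duplicate.
                if pair is not None and x[pair[0]] == x[pair[1]]:
                    return 0        # ED(x) = 0: duplicate found
        except OutOfBudget:
            pass                    # budget exhausted; go to next run
    return 1                        # step 4: ED(x) = 1 (may rarely err)
```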

Sliding window ED

Theorem
Sliding window ED can be solved in time T ∈ O(n √(n/S)) with 1-sided error probability o(1/n).

Idea:

• Reduce to single window ED.

• A duplicate in one window determines a large number of outputs: if positions i < j hold equal values, every window containing both i and j must output 0 (see the sketch below).
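A sketch of the reduction idea only, not the paper's reduction (which is more careful about bounding the total number of single-window calls). Here ed_duplicate is a hypothetical single-window routine assumed to return a duplicate pair or None:

```python
def sliding_window_ed(x, w, ed_duplicate):
    """Given a window start k, ed_duplicate(k) is assumed to return a
    duplicate pair (i, j) with k <= i < j < k + w and x[i] == x[j], or
    None if the window is distinct. One duplicate forces output 0 for
    every window containing both positions, i.e. all starts in
    [j - w + 1, i], so one call settles a whole block of outputs."""
    n_out = len(x) - w + 1
    out = [None] * n_out
    k = 0
    while k < n_out:
        pair = ed_duplicate(k)
        if pair is None:
            out[k] = 1           # window k has all-distinct elements
            k += 1
        else:
            i, j = pair
            # Every start s with s <= i and s + w > j sees both copies.
            for s in range(max(0, j - w + 1), min(i, n_out - 1) + 1):
                out[s] = 0
            k = i + 1            # all starts up to i are now settled
    return out

def brute_duplicate(x, w):
    """Brute-force stand-in for the single-window ED routine."""
    def oracle(k):
        seen = {}
        for j in range(k, k + w):
            if x[j] in seen:
                return (seen[x[j]], j)
            seen[x[j]] = j
        return None
    return oracle

# Matches the earlier example:
assert sliding_window_ed("ABCBABC", 3,
                         brute_duplicate("ABCBABC", 3)) == [1, 0, 1, 0, 1]
```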

A general sequential lower bound for sliding window F0

Framework of [Borodin-Cook 82, Abrahamson 91]

T ∈ Ω(n²/S) follows if, for some random input distribution, no matter how cn input values are fixed, any fixed set of Θ(S) output values occurs with probability at most 2^(−S).

• Informally: for the first S outputs, it is hard to predict the next output value even when a constant fraction of the input vector is known.


The sliding window F0 lower bound

Need to show: for the first S outputs, it is hard to predict the next output value even when a constant fraction of the input vector is known.

• Input uniform over [n]^(2n−1). But some outputs are easy to predict. [Example windows from the animation: "a ? ? ? ? a", "a a a a a ?", "a b c d e ?".]

• The uniform distribution gives whp Ω(n) positions i where:

  • x_i is unique in the input.

  • F0(i, i + n − 1) of the window is in the range [0.5n, 0.85n].

• Only consider outputs for these positions and show they are hard to predict.


Summing up

• Finding two identical items is easier than sorting the input.

• Sliding window element distinctness is still easier than sorting.

• F0 mod 2 may be better than ED as an example of a hard decision problem to study.

• Sliding window F0 mod 2 has the same complexity as sorting (ignoring log factors).

• Is our new complexity for element distinctness in fact tight?

Thank you

www.background-free.com
