![Page 1: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/1.jpg)
Trading off space for passes in graph streaming problems
Camil Demetrescu, Irene Finocchi, Andrea Ribichini
University of Rome “La Sapienza”
Dagstuhl Seminar 05361
![Page 2: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/2.jpg)
Processing massive data streams
Large body of work in recent years
Practically motivated, raises interesting theoretical questions
Areas: Databases, Sensors, Networking, Hardware, Programming languages
Core problems: Algorithms, Complexity, Statistics, Probability, Approximation theory
![Page 3: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/3.jpg)
Classical streaming
[Figure: the working memory M scans the input stream sequentially, once per pass (1st pass, 2nd pass, ...)]
p = number of passes
s = size of working memory M (space in bits)
n = size of input stream (# of items)
![Page 4: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/4.jpg)
Classical streaming
Seminal work by Munro and Paterson (1980): pass-efficient selection and sorting
Several problems shown to be solvable with polylog(n) space and passes in the 90’s (e.g., approximating frequency moments)
Classical streaming is very restrictive: for many fundamental problems (e.g., on graphs) it is provably impossible to achieve polylog(n) space and passes
![Page 5: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/5.jpg)
Graph streaming problems
For many basic graph problems (e.g., connectivity, shortest paths):
passes = Ω(N / space) (N = number of vertices)
Recent interest in graph problems in “semi-streaming” models, where:
space = O(N · polylog(N))
passes = O(polylog(N))
[Feigenbaum et al., ICALP 2004]
O(N · polylog(N)) space “sweet spot” for graph streaming problems [Muthukrishnan, 2001]
![Page 6: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/6.jpg)
Graph algorithms in classical streaming
Approximate triangle counting [Bar-Yossef et al., SODA 2002]
Matching, bipartiteness, connectivity, MST, t-spanners, … [Feigenbaum et al., ICALP 2004, SODA 2005]
All of them make one, or very few passes, but require Ω(N) space
![Page 7: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/7.jpg)
Trading off space for passes
Natural question: Can we reduce space if we do more passes?
[Munro and Paterson ‘80, Henzinger et al. ‘99]
Example:
Processing a 50 GB graph on a 1 GB RAM PC (4 billion vertices, 6 billion edges)
s = Θ(N/p) algorithm: ~16 passes (a few hours)
s = Θ(N) algorithm: out of memory (16 GB RAM would be required)
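The arithmetic behind this example can be checked with a short sketch. It assumes 4 bytes of state per vertex, which is what the slide's figures imply (16 GB for 4 billion vertices) but is not stated explicitly. The function name `passes_needed` is hypothetical.

```python
import math

GB = 10**9


def passes_needed(num_vertices, bytes_per_vertex, ram_bytes):
    """Smallest p such that a 1/p share of the per-vertex state fits in RAM."""
    total_state = num_vertices * bytes_per_vertex
    return math.ceil(total_state / ram_bytes)


N = 4 * 10**9          # vertices, from the slide
BYTES_PER_VERTEX = 4   # assumption: 16 GB / 4 billion vertices

# RAM required by an algorithm that keeps state for all N vertices at once
full_state_gb = N * BYTES_PER_VERTEX / GB

# Passes required by an s = N/p algorithm on a 1 GB machine
p = passes_needed(N, BYTES_PER_VERTEX, 1 * GB)

print(full_state_gb, p)
```

Under that assumption the full state needs 16 GB, and a 1 GB machine needs 16 passes, matching the slide.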
![Page 8: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/8.jpg)
Some facts on modern commodity I/O
Sequential access rates are comparable to (or even faster than) random access rates in main memory:
• A RAID disk controller can deliver a 100 MB/s access rate
• On a 1+ GHz Pentium PC, random access to 2 GB of main memory in 32-byte chunks: 80 MB/s effective access rate
Sequential access uses caches optimally (this makes algorithms cache-oblivious)
[Ruhl ‘03, Rajagopalan ‘02]
![Page 9: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/9.jpg)
Some facts on modern commodity I/O
Classical read-only streaming perhaps overly pessimistic?
Why not exploit temporary storage?
The above facts imply that both reading and writing sequentially can improve performance
External memory storage is cheap (less than a dollar per gigabyte) and readily available
![Page 10: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/10.jpg)
The StreamSort model [Aggarwal et al. ’04]
[Figure: in the 1st pass the working memory M reads the input stream and writes an intermediate stream; in the 2nd pass M reads the intermediate stream and writes the output stream]
Use a sorting primitive to reorder the stream
![Page 11: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/11.jpg)
How much power does sorting yield?
Open problem: No clue on how to get polylog(N) bounds for Shortest Paths (even BFS) in StreamSort
Good news: Undirected connectivity can be solved in polylog(N) space and passes in StreamSort
[Aggarwal et al., FOCS 2004]
![Page 12: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/12.jpg)
Dish of the day
We show that StreamSort can yield interesting results even without using sorting at all
(call this more restrictive model W-Stream: allows intermediate streams, but no sorting)
In this model, we show effective space/passes tradeoffs for natural graph streaming problems
We address:
- Connectivity
- Single-source shortest paths
![Page 13: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/13.jpg)
Graph connectivity
UCON: G=(V,E) undirected graph with N vertices given as stream of edges in arbitrary order. Find out if G is connected.
Lower bound: UCON in W-Stream: p = Ω(N / s)
We now show the following:
Upper bound: UCON in W-Stream: p = O(N · log N / s)
![Page 14: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/14.jpg)
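Graph connectivity: algorithm
[Figure: a pass reads the graph G from the input stream, maintains a forest F in memory, and writes a smaller graph G’ to the output stream; shown on an example graph with 12 vertices]
Generic pass: two phases
Red phase
Blue phase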
![Page 15: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/15.jpg)
Graph connectivity: analysis
How many passes?
At each pass we lose at least |V(F)| / 2 = Θ(s / log N) vertices
Invariant: F is induced by a set of edges ⇒ each tree in F contains at least two vertices
p = O(N · log N / s)
All vertices of F that are not component representatives disappear from the output graph
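The pass structure and the analysis above can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the authors' exact algorithm: the slides do not spell out the red and blue phases, so the split used here (first fill the forest F, then relabel and forward the remaining edges) is one plausible reading. Python lists stand in for the sequential streams, and `cap` is a hypothetical parameter playing the role of the s / log N vertices that fit in memory.

```python
def find(parent, x):
    """Representative of x in the in-memory forest F (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def ucon_w_stream(num_vertices, edges, cap):
    """Return (is_connected, passes); `cap` = max vertices held in memory."""
    if cap < 2:
        raise ValueError("memory must hold at least two vertices")
    components = num_vertices      # a counter of O(log N) bits
    stream = list(edges)           # stands in for the input stream
    passes = 0
    while stream:
        passes += 1
        parent = {}                # forest F, rebuilt from scratch in each pass
        out = []                   # stands in for the output stream
        filling = True             # assumed "red" phase: F is still growing
        for u, v in stream:
            if u == v:
                continue           # self-loops carry no connectivity information
            if filling:
                new = [x for x in {u, v} if x not in parent]
                if len(parent) + len(new) <= cap:
                    for x in new:
                        parent[x] = x
                    ru, rv = find(parent, u), find(parent, v)
                    if ru != rv:
                        parent[ru] = rv
                        components -= 1
                    continue       # edge absorbed into F, nothing is written
                filling = False    # F is full: it stays frozen from now on
            # assumed "blue" phase: relabel endpoints by their representative
            ru = find(parent, u) if u in parent else u
            rv = find(parent, v) if v in parent else v
            if ru != rv:
                out.append((ru, rv))
        stream = out               # the output stream feeds the next pass
    return components == 1, passes


path = [(1, 2), (2, 3), (3, 4)]
print(ucon_w_stream(4, path, cap=2))
print(ucon_w_stream(4, [(1, 2), (3, 4)], cap=4))
```

Because F is frozen once it is full, representatives do not change while edges are being forwarded, so every vertex of F other than its tree's representative vanishes from the next stream. Every tree of F has at least two vertices, which gives the loss of at least |V(F)| / 2 vertices per pass used in the analysis.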
![Page 16: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/16.jpg)
Single-source shortest paths
SSSP: G=(V,E,w) weighted directed graph with N vertices given as arbitrary stream of edges. Find distances from a given source t to all other vertices.
Lower bound 1: BFS in W-Stream: p = Ω(N / s)
Lower bound 2: finding vertices up to constant distance d: p ≤ d ⇒ $s = \Omega(N^{1+1/(2d)})$ [Feigenbaum et al., SODA 2005]
Space-efficient algorithms for SSSP always require multiple passes
![Page 17: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/17.jpg)
Single-source shortest paths
Hard even using sorting as a primitive
No sublinear-space streaming algorithm for SSSP previously known.
We make a first step, showing that we can solve SSSP in W-Stream in sublinear space and passes simultaneously in directed graphs with small integer edge weights
Previous results on distances in streaming: approximate (spanners) in undirected graphs only
![Page 18: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/18.jpg)
Single-source shortest paths: bound
Thm: For any space restriction s, there is a randomized one-sided error algorithm for directed SSSP in W-Stream with edge weights in {1, 2, …, C} s.t.:
$p = O\left(\frac{C \cdot N \cdot \log^{3/2} N}{\sqrt{s}}\right)$
For $C = O(s^{1/2-\varepsilon})$ and polynomial sublinear space, we also get sublinear p
In this talk we focus on C = 1 (BFS):
$p = \tilde{O}\left(\frac{N}{\sqrt{s}}\right)$, $p = \Omega\left(\frac{N}{s}\right)$
![Page 19: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/19.jpg)
Single-source shortest paths: approach
Overall approach: First build many short paths “in parallel”, then stitch them together to form long paths.
For a given space restriction, this helps us reduce the number of passes to find long paths
![Page 20: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/20.jpg)
Single-source shortest paths: step 1/5
Pick a set K of $(s / \log N)^{1/2}$ random vertices, including the source t
Example (chain): 1 → 6 → 10 → 5 → 8 → 3 → 7 → 2 → 4 → 9, with source t = 1
![Page 21: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/21.jpg)
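Single-source shortest paths: step 2/5
Find distances up to (N log N) / |K| from each vertex in K (short distances)
Example (chain): 1 → 6 → 10 → 5 → 8 → 3 → 7 → 2 → 4 → 9, with source t = 1 and K = {1, 5, 7, 4}
[Figure: each vertex of K is labeled 0, and the vertices that follow it on the chain are labeled with their short distances 1, 2, 3; a bracket of length (N log N) / |K| marks the range explored from each vertex of K]
The more memory we have, the larger |K|, and thus the smaller the # of passes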
![Page 22: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/22.jpg)
Single-source shortest paths: step 3/5
Build a graph G’ = (K, E’), where: (x,y) ∈ E’ ⇔ dist(x,y) ≤ (N log N) / |K| in G
Example (chain): 1 → 6 → 10 → 5 → 8 → 3 → 7 → 2 → 4 → 9, with source t = 1 and K = {1, 5, 7, 4}
G’: 1 → 5 (weight 3), 5 → 7 (weight 3), 7 → 4 (weight 2)
![Page 23: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/23.jpg)
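Single-source shortest paths: step 4/5
Find in G’ distances from t to all other vertices of K
Example (chain): 1 → 6 → 10 → 5 → 8 → 3 → 7 → 2 → 4 → 9, with source t = 1 and K = {1, 5, 7, 4}
G’: 1 → 5 (weight 3), 5 → 7 (weight 3), 7 → 4 (weight 2)
Distances from t in G’: 1: 0, 5: 3, 7: 6, 4: 8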
![Page 24: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/24.jpg)
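Single-source shortest paths: step 5/5
For each v, let: $dist(t,v) = \min_{c \in K} \{ dist(t,c) + dist(c,v) \}$
(final distances)
Example (chain): 1 → 6 → 10 → 5 → 8 → 3 → 7 → 2 → 4 → 9, with source t = 1 and K = {1, 5, 7, 4}
Distances from t to the vertices of K: 1: 0, 5: 3, 7: 6, 4: 8
Final distances of the other vertices: 6: 1, 10: 2, 8: 4, 3: 5, 2: 7, 9: 9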
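The five steps can be sketched as a small in-memory simulation for unit edge weights (the C = 1 case). This is an illustration only: it ignores the space restriction (it stores all short distances at once), models the edge stream as a Python list, and the names `pick_K`, `short_distances`, and `sssp_sketch` as well as the `limit` parameter (standing in for (N log N) / |K|) are hypothetical.

```python
import math
import random

INF = math.inf


def pick_K(vertices, t, size, seed=None):
    """Step 1: a random vertex set of the given size that contains the source t."""
    rng = random.Random(seed)
    others = [v for v in vertices if v != t]
    return [t] + rng.sample(others, min(size - 1, len(others)))


def short_distances(edges, K, limit):
    """Step 2: dist[c][v] for each c in K and each v within `limit` hops of c."""
    dist = {c: {c: 0} for c in K}
    for _ in range(limit):                  # one scan of the edge stream per round
        updates = []
        for u, v in edges:
            for c in K:
                du = dist[c].get(u)
                if du is not None and du + 1 < dist[c].get(v, INF):
                    updates.append((c, v, du + 1))
        for c, v, d in updates:             # applied after the scan: one hop per round
            dist[c][v] = min(d, dist[c].get(v, INF))
    return dist


def sssp_sketch(vertices, edges, t, K, limit):
    K = list(dict.fromkeys([t, *K]))        # make sure t is in K, drop duplicates
    short = short_distances(edges, K, limit)
    # Step 3: G' has an edge (x, y) of weight dist(x, y) when dist(x, y) <= limit
    g_prime = [(x, y, short[x][y]) for x in K for y in K
               if x != y and y in short[x]]
    # Step 4: distances from t inside G' (Bellman-Ford on the small graph)
    dk = {c: INF for c in K}
    dk[t] = 0
    for _ in range(len(K) - 1):
        for x, y, w in g_prime:
            if dk[x] + w < dk[y]:
                dk[y] = dk[x] + w
    # Step 5: stitch short paths onto the distances of the vertices in K
    return {v: min((dk[c] + short[c][v] for c in K if v in short[c]), default=INF)
            for v in vertices}


chain = [1, 6, 10, 5, 8, 3, 7, 2, 4, 9]
chain_edges = list(zip(chain, chain[1:]))
result = sssp_sketch(chain, chain_edges, t=1, K=[1, 5, 7, 4], limit=3)
print(result)
```

On the chain example with K = {1, 5, 7, 4} and a short-distance limit of 3, the sketch reproduces the distances shown on the slides (0, 3, 6, 8 for the vertices of K and 1, 2, 4, 5, 7, 9 for the others). With a random K from `pick_K`, a vertex whose shortest path contains a stretch longer than `limit` that misses K is reported at distance infinity, so distances are never underestimated.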
![Page 25: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/25.jpg)
Results are correct with high prob.
Sampling thm [Greene & Knuth, ’80]. Let K be a set of vertices chosen uniformly at random. Then the probability that a simple path with more than (c · N · log N) / |K| vertices intersects K is at least $1 - 1/N^{c}$ for any c > 0
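A short calculation shows where a bound of this form comes from. It is a sketch under two assumptions not stated on the slide: the vertices of K are drawn independently and uniformly (with replacement), and log denotes the natural logarithm. Let P be a simple path with $\ell > c \cdot N \cdot \ln N / |K|$ vertices. Each sampled vertex misses P with probability $1 - \ell/N$, so

```latex
\Pr[P \cap K = \emptyset]
  = \left(1 - \frac{\ell}{N}\right)^{|K|}
  \le e^{-\ell |K| / N}
  < e^{-c \ln N}
  = N^{-c}
```

Hence P intersects K with probability greater than $1 - 1/N^{c}$. Sampling without replacement can only make a miss less likely, so the same bound holds in that case.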
![Page 26: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/26.jpg)
Conclusions and further work
We have shown effective space/passes tradeoffs for problems that seem hard in classical streaming (graph connectivity & shortest paths)
Can we close the gap between upper and lower bound for BFS in W-Stream?
Can we do the same in the classical read-only streaming model?
Can we prove stronger lower bounds in classical streaming?
Space/passes tradeoffs for other problems?
![Page 27: Trading off space for passes in graph streaming problems](https://reader036.vdocuments.us/reader036/viewer/2022081516/5681424a550346895dae73f0/html5/thumbnails/27.jpg)
Thank you