I/O-Efficient Graph Algorithms
Norbert Zeh, Duke University
EEF Summer School on Massive Data Sets, Århus, Denmark
June 26 – July 1, 2002
Motivation
For theoreticians:
• Graph problems are neat and often difficult, hence interesting
For practitioners:
• Massive graphs arise in GIS, web modelling, ...
• Problems in computational geometry can be expressed as graph problems
• Many abstract problems are best viewed as graph problems
• Extreme case: pointer-based data structures are graphs with extra information at their nodes
Outline
• Fundamental graph problems
  • List ranking
  • Algorithms for trees
    • Euler tour
    • Tree labelling
  • Graph searching
    • BFS/DFS
  • Connectivity
    • Connected components
    • Minimum spanning tree
  • Single-source shortest paths
Outline
• Techniques and data structures
  • Graph contraction
  • Time-forward processing
  • Tournament tree
  • Buffered repository tree
• Lower bounds
  • List ranking
  • Connectivity
• Planar graphs
Introduction and “Simple” Problems
• List ranking
• Euler tour
• Tree labelling
• Evaluating directed acyclic graphs
• Greedy graph algorithms
List Ranking
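Given a linked list L whose elements carry weights, compute for every element its rank: the sum of the weights of all elements from the head of L up to and including that element. Example:
Element: 1 2 3 4 5 6
Weight:  3 1 5 2 3 1
Rank:    3 4 9 11 14 15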
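A minimal Python sketch of the problem follows. It assumes a hypothetical array representation: `succ[i]` is the index of the next element, or -1 at the tail, and `weight[i]` is the weight of element i. This is the straightforward internal-memory algorithm. It follows successor pointers from the head and accumulates weights, and that access pattern becomes expensive once the list no longer fits in memory.

```python
def list_rank(head, succ, weight):
    """Sequential list ranking by pointer chasing.

    head   -- index of the first list element
    succ   -- succ[i] is the index of the element after i, or -1 for the tail
    weight -- weight[i] is the weight of element i
    Returns rank[i] = sum of the weights from the head up to and including i.
    """
    rank = [0] * len(succ)
    total = 0
    v = head
    while v != -1:
        total += weight[v]
        rank[v] = total
        v = succ[v]  # in external memory, each hop may touch a new block
    return rank


# The six-element example, stored in list order.
print(list_rank(0, [1, 2, 3, 4, 5, -1], [3, 1, 5, 2, 3, 1]))  # [3, 4, 9, 11, 14, 15]
```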
Why Is List Ranking Non-Trivial?
[Figure: a list of 16 elements laid out across disk blocks so that consecutive list elements are stored in different blocks]
Following successor pointers one element at a time, the internal-memory algorithm spends Θ(N) I/Os in the worst case.
An Efficient List Ranking Algorithm
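• Assume an independent set I of size at least N/3 can be found efficiently (in O(sort(N)) I/Os).
• Example with I = {3, 5}: remove every element of I and add its weight to its successor, rank the compressed list recursively, then reintegrate the removed elements (e.g. rank(3) = rank(4) − weight(4) = 11 − 2 = 9):
Weights of L:               3 1 5 2 3 1
Compressed list (1, 2, 4, 6): 3 1 7 4
Ranks of compressed list:   3 4 11 15
Ranks of L:                 3 4 9 11 14 15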
An Efficient List Ranking Algorithm
• Compressing L:
  • Sort the elements in L \ I
  • Sort the elements in I by their successor pointers
  • Scan the two lists to update the label of succ(v), for every element v ∈ I
• Since |L \ I| ≤ 2N/3, the I/O-complexity of this procedure is $I(N) = I(2N/3) + O(\mathrm{sort}(N)) = O(\mathrm{sort}(N))$
Theorem: A list of size N can be ranked in O(sort(N)) I/Os.
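The recursion can be simulated in memory to check its logic. The sketch below is an assumption-laden stand-in rather than the I/O-efficient implementation: plain loops and dicts replace the sorting and scanning steps. The independent set is obtained by walking the list and taking every other element, never the tail. That set differs from the I = {3, 5} of the example but plays the same role. Removed weights are pushed to successors, and removed elements get rank(succ(v)) − weight(succ(v)) afterwards.

```python
def rank_by_contraction(head, succ, weight):
    """Rank a list by removing an independent set, recursing, and reintegrating.

    head   -- first element of the list
    succ   -- dict: element -> successor, or None for the tail
    weight -- dict: element -> weight
    Returns a dict: element -> sum of weights from the head up to and including it.
    """
    if len(succ) <= 2:
        # Base case: rank the tiny list by following pointers.
        rank, total, v = {}, 0, head
        while v is not None:
            total += weight[v]
            rank[v] = total
            v = succ[v]
        return rank

    # Independent set excluding the tail: take every element whose predecessor
    # was not taken (a stand-in for the O(sort(N)) independent-set step).
    indep, v, prev_taken = set(), head, False
    while succ[v] is not None:
        if not prev_taken:
            indep.add(v)
        prev_taken = not prev_taken
        v = succ[v]

    # Compress: bridge over every v in I and add weight(v) to succ(v).
    pred = {s: p for p, s in succ.items() if s is not None}
    new_succ = {u: succ[u] for u in succ if u not in indep}
    new_weight = {u: weight[u] for u in succ if u not in indep}
    for u in indep:
        s = succ[u]                  # exists, because the tail is not in I
        new_weight[s] += weight[u]   # s is not in I, because I is independent
        if u in pred:
            new_succ[pred[u]] = s    # the predecessor now points past u
    new_head = succ[head] if head in indep else head

    rank = rank_by_contraction(new_head, new_succ, new_weight)
    # Reintegrate: rank(u) = rank(succ(u)) - weight(succ(u)) at this level.
    for u in indep:
        rank[u] = rank[succ[u]] - weight[succ[u]]
    return rank


succ = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: None}
weight = {1: 3, 2: 1, 3: 5, 4: 2, 5: 3, 6: 1}
print(rank_by_contraction(1, succ, weight) == {1: 3, 2: 4, 3: 9, 4: 11, 5: 14, 6: 15})  # True
```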
The Euler Tour Technique
Goal: Given a tree T, represent it by a list L so that certain computations on T can be performed by ranking L.
The Euler Tour Technique
Theorem: Given the adjacency lists of the vertices in T, an Euler tour can be constructed in O(scan(N)) I/Os.
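• Let {v,w_1},…,{v,w_r} be the edges incident to v, in the order in which they appear in v's adjacency list
• Then succ((w_i,v)) = (v,w_{i+1}), where w_{r+1} = w_1
[Figure: vertex v with neighbors w_1, …, w_4]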
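The successor rule can be sketched in Python. The adjacency lists are assumed to be a hypothetical dict mapping each vertex to its neighbours in a fixed order. Each adjacency list is read once, and the successor of (w_i, v) comes from the next entry in v's list, wrapping around at the end. The helper `tour_from` is also hypothetical. It only unrolls the cyclic tour from a chosen starting edge so the result can be inspected.

```python
def euler_tour_successors(adj):
    """Successor of every directed edge in the Euler tour of a tree.

    adj -- dict: vertex -> list of neighbours w_1, ..., w_r in a fixed order
    Implements succ((w_i, v)) = (v, w_{i+1}), with w_{r+1} = w_1.
    """
    succ = {}
    for v, nbrs in adj.items():          # one pass over all adjacency lists
        r = len(nbrs)
        for i, w in enumerate(nbrs):
            succ[(w, v)] = (v, nbrs[(i + 1) % r])
    return succ


def tour_from(succ, start):
    """List the cyclic tour, beginning at edge `start`."""
    tour, e = [start], succ[start]
    while e != start:
        tour.append(e)
        e = succ[e]
    return tour


adj = {"r": ["a", "b"], "a": ["r", "c"], "b": ["r"], "c": ["a"]}
print(tour_from(euler_tour_successors(adj), ("r", "a")))
# [('r', 'a'), ('a', 'c'), ('c', 'a'), ('a', 'r'), ('r', 'b'), ('b', 'r')]
```

Every tree edge appears exactly twice, once in each direction, so a tree with N vertices yields a tour of 2(N − 1) edges.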
Rooting a Tree
• Choosing a vertex r as the root of a tree T defines parent-child relationships between adjacent nodes
• Rooting tree T = computing, for every edge {v,w}, which endpoint is the parent and which is the child
• v = p(w) if and only if rank((v,w)) < rank((w,v)), where the Euler tour is ranked starting at an edge leaving r
Theorem: A tree can be rooted in O(sort(N)) I/Os.
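The rank comparison can be shown as a small in-memory sketch. It assumes the same hypothetical adjacency-list dict and a root with at least one neighbour. The cyclic tour is cut at the first edge leaving the root and ranked sequentially, standing in for list ranking. The tour enters the subtree of w through (p(w), w) before it leaves through (w, p(w)), so the earlier of the two directed copies identifies the parent.

```python
def root_tree(adj, root):
    """Return parent[v] for every vertex (None for the root) via Euler-tour ranks."""
    # Euler tour successors: succ((w_i, v)) = (v, w_{i+1 mod r}).
    succ = {}
    for v, nbrs in adj.items():
        for i, w in enumerate(nbrs):
            succ[(w, v)] = (v, nbrs[(i + 1) % len(nbrs)])
    # Break the cyclic tour at the first edge out of the root and rank it
    # (sequentially here; list ranking does this in O(sort(N)) I/Os).
    start = (root, adj[root][0])
    rank, e, k = {}, start, 1
    while True:
        rank[e] = k
        k += 1
        e = succ[e]
        if e == start:
            break
    # v = p(w) iff rank((v, w)) < rank((w, v)).
    parent = {root: None}
    for (v, w) in rank:
        if rank[(v, w)] < rank[(w, v)]:
            parent[w] = v
    return parent


adj = {"r": ["a", "b"], "a": ["r", "c"], "b": ["r"], "c": ["a"]}
print(root_tree(adj, "r") == {"r": None, "a": "r", "b": "r", "c": "a"})  # True
```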
Computing a Preorder Numbering
Theorem: A preorder numbering of a rooted tree T can be computed in O(sort(N)) I/Os.
[Figure: a rooted tree labelled with preorder numbers; its Euler-tour edges are labelled with 0/1 weights and their ranks]
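preorder#(v) = rank((p(v),v)), where every edge (p(v),v) directed away from the root has weight 1, every edge directed towards the root has weight 0, and the root receives number 0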
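A minimal sketch of the weighted ranking follows, assuming a hypothetical adjacency-list dict. It builds the tour and decides the direction of each edge with the rank comparison used for rooting. Down edges get weight 1 and up edges weight 0. The running weighted rank at each down edge (p(v), v) is then v's preorder number. Sequential loops stand in for the O(sort(N)) list-ranking step.

```python
def preorder_numbers(adj, root):
    """Preorder numbers via a 0/1-weighted ranking of the Euler tour."""
    # Euler tour successors: succ((w_i, v)) = (v, w_{i+1 mod r}).
    succ = {}
    for v, nbrs in adj.items():
        for i, w in enumerate(nbrs):
            succ[(w, v)] = (v, nbrs[(i + 1) % len(nbrs)])
    # List the tour starting at the first edge out of the root.
    start = (root, adj[root][0])
    tour, e = [start], succ[start]
    while e != start:
        tour.append(e)
        e = succ[e]
    pos = {e: i for i, e in enumerate(tour)}
    # An edge (v, w) points downwards iff it precedes its reversal, i.e. v = p(w).
    down = {e for e in tour if pos[e] < pos[(e[1], e[0])]}
    # Weighted ranking: weight 1 on down edges, 0 on up edges.
    number, total = {root: 0}, 0
    for e in tour:
        if e in down:
            total += 1
            number[e[1]] = total      # preorder#(v) = rank((p(v), v))
    return number


adj = {"r": ["a", "b"], "a": ["r", "c"], "b": ["r"], "c": ["a"]}
print(preorder_numbers(adj, "r") == {"r": 0, "a": 1, "c": 2, "b": 3})  # True
```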
Computing Subtree Sizes
Theorem: The nodes of T can be labelled with their subtree sizes in O(sort(N)) I/Os.
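[Figure: a rooted tree whose nodes are labelled with their subtree sizes and whose Euler-tour edges are labelled with their ranks]
With every edge of the Euler tour given weight 1: $|T_v| = \frac{\mathrm{rank}((v,p(v))) - \mathrm{rank}((p(v),v)) + 1}{2}$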
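The formula can be checked by counting tour edges, assuming every Euler-tour edge has weight 1 so that ranks are positions in the tour. The tour enters T_v through (p(v),v). It traverses each of the |T_v| − 1 edges of T_v twice and then leaves through (v,p(v)):

```latex
\begin{aligned}
\mathrm{rank}((v,p(v))) - \mathrm{rank}((p(v),v))
  &= \underbrace{2\,(|T_v| - 1)}_{\text{each edge of } T_v \text{ traversed twice}} + 1,\\
|T_v| &= \frac{\mathrm{rank}((v,p(v))) - \mathrm{rank}((p(v),v)) + 1}{2}.
\end{aligned}
```

For a leaf, the two edges are consecutive in the tour, so the difference is 1 and |T_v| = 1. The root has no parent edge; its subtree size is simply N.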
Evaluating a Directed Acyclic Graph
• More general: Given a labelling φ, compute a labelling ψ so that ψ(v) is computed from φ(v) and ψ(u_1),…,ψ(u_r), where u_1,…,u_r are v’s in-neighbors
[Figure: an example DAG whose vertices carry 0/1 labels]
Time-Forward Processing
• Assume nodes are given in topologically sorted order.
[Figure: the example DAG with its vertices numbered 1 to 12 in topological order and labelled 0 or 1]
Use a priority queue Q to send data along the edges. After evaluating a vertex v, insert an entry (w, v, x) with priority w for every out-neighbor w of v, where x is the value computed for v. When w is evaluated, the entries addressed to w are the minimum entries of Q.
[Figure: successive contents of Q while the example DAG is evaluated; the first entries are (6,1,0), (4,2,1) and (5,2,1)]
Time-Forward Processing
Analysis:
• Vertex set and adjacency lists are scanned: O(scan(|V| + |E|)) I/Os
• Priority queue:
  • Every edge is inserted into and deleted from Q exactly once
  ⇒ O(|E|) priority queue operations ⇒ O(sort(|E|)) I/Os
Theorem: A directed acyclic graph G = (V,E) can be evaluated in O(sort(|V| + |E|)) I/Os.
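Here is an in-memory sketch of time-forward processing. It assumes vertices are numbered 0..n−1 in topological order. Python's `heapq` stands in for the I/O-efficient priority queue. The hypothetical combining function `f` computes ψ(v) from φ(v) and the values received from in-neighbours. Messages mirror the slide's (target, source, value) triples and are keyed by their target.

```python
import heapq


def evaluate_dag(n, out_edges, phi, f):
    """Evaluate a DAG by time-forward processing.

    n         -- vertices are 0..n-1, numbered in topological order
    out_edges -- out_edges[v] lists the out-neighbours of v (all larger than v)
    phi       -- phi[v] is the input label of v
    f         -- f(phi[v], values) computes psi(v) from phi(v) and the values
                 psi(u) sent by the in-neighbours u of v
    """
    q = []                     # messages (target, source, value), keyed by target
    psi = [None] * n
    for v in range(n):
        values = []
        # Every message still in q targets a vertex >= v, so the messages for v
        # are exactly the minimum entries of q.
        while q and q[0][0] == v:
            _, _, value = heapq.heappop(q)
            values.append(value)
        psi[v] = f(phi[v], values)
        for w in out_edges[v]:
            heapq.heappush(q, (w, v, psi[v]))   # send psi(v) forward in time to w
    return psi


# Usage: count the paths from vertex 0 to every vertex.
out_edges = [[1, 2], [3], [3], [4], []]
print(evaluate_dag(5, out_edges, [1, 0, 0, 0, 0], lambda p, vals: p + sum(vals)))
# [1, 1, 1, 2, 2]
```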
Maximal Independent Set (MIS)
Algorithm GREEDYMIS:
1. I ← ∅
2. for every vertex v ∈ G do
3.   if no neighbor of v is in I then
4.     Add v to I
5.   end if
6. end for
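Observation: It suffices to check the neighbors of v that were visited in earlier iterations. Directing every edge from its earlier-visited to its later-visited endpoint therefore turns GREEDYMIS into a DAG evaluation problem.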
[Figure: GREEDYMIS applied to an example graph with vertices numbered 1 to 11]
Maximal Independent Set (MIS)
Theorem: A maximal independent set of a graph G = (V,E) can be computed in O(sort(|V|+|E|)) I/Os.
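A sketch of GREEDYMIS as time-forward processing follows. It assumes the vertices are numbered 0..n−1 in visiting order and uses `heapq` as the priority queue. Each edge is directed from its smaller to its larger endpoint. After deciding v, the sketch sends the bit "v ∈ I" to every later neighbour, and a vertex joins I only if no received bit is true.

```python
import heapq


def greedy_mis(n, edges):
    """GREEDYMIS in vertex order 0..n-1, implemented with time-forward processing.

    edges -- iterable of undirected edges (u, v); self-loops are ignored
    Returns the sorted list of vertices in the maximal independent set.
    """
    later = [[] for _ in range(n)]
    for u, v in edges:
        if u != v:
            later[min(u, v)].append(max(u, v))   # direct edge towards later vertex
    q, in_mis = [], [False] * n
    for v in range(n):
        blocked = False
        # Messages (target, flag): flag is True if the sender joined I.
        while q and q[0][0] == v:
            _, flag = heapq.heappop(q)
            blocked = blocked or flag
        in_mis[v] = not blocked
        for w in later[v]:
            heapq.heappush(q, (w, in_mis[v]))
    return [v for v in range(n) if in_mis[v]]


print(greedy_mis(5, [(0, 1), (1, 2), (2, 3), (3, 4)]))  # [0, 2, 4]
```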
Large Independent Set of a List
Corollary: An independent set of size at least N/3 for a list L of size N can be found in O(sort(N)) I/Os.
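• Every vertex in an MIS I prevents at most two other vertices (its list neighbors) from being in I, and by maximality every vertex outside I is prevented by some vertex in I
⇒ N − |I| ≤ 2|I| ⇒ Every MIS has size at least N/3.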
Graph Connectivity
• Connected components
• Minimum spanning tree
Connectivity: A Semi-External Algorithm
Analysis:
• Scan the vertex set to load the vertices into main memory
• Scan the edge set to carry out the algorithm
• O(scan(|V| + |E|)) I/Os
Theorem: The connected components of a graph can be computed in O(scan(|V| + |E|)) I/Os, provided that |V| ≤ M.
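The algorithm itself survives only as a figure in this transcript. One common way to realize it is a union-find structure over the vertex set, kept in main memory while the edges are streamed in a single scan. That choice is an assumption here, not necessarily the slide's method. Only per-vertex state is stored, which is what the condition |V| ≤ M allows.

```python
def semi_external_components(vertices, edge_stream):
    """Label connected components, keeping only per-vertex state in memory.

    vertices    -- iterable of vertex ids (assumed to fit in memory, |V| <= M)
    edge_stream -- iterable of edges (u, v), consumed in a single pass
    Returns dict vertex -> component label (a representative vertex).
    """
    parent = {v: v for v in vertices}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edge_stream:          # one scan over the edge set
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv           # merge the two components
    return {v: find(v) for v in parent}


labels = semi_external_components(range(6), iter([(0, 1), (1, 2), (3, 4)]))
print(len(set(labels.values())))  # 3
```

In practice `edge_stream` could be a generator reading edges from a file, so the edge set never has to be held in memory.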
Connectivity: The General Case
Idea:
• If |V| ≤ M:
  • Use the semi-external algorithm
• If |V| > M:
  • Identify simple connected subgraphs of G
  • Contract these subgraphs to obtain a graph G’ = (V’,E’) with |V’| ≤ c|V|, c < 1
  • Recursively compute the connected components of G’
  • Obtain a labelling of the connected components of G from the labelling of the components of G’
Connectivity: The General Case
[Figure: an example graph on vertices a to n and the contracted graph on vertices A to E]
Connectivity: The General Case
Main steps:
• For every vertex, find its smallest neighbor (easy)
• Compute the connected components of the graph H induced by the selected edges
• Contract each component into a single vertex (easy)
• Call the procedure recursively
• Copy the label of every vertex v ∈ G’ to all vertices in G represented by v (easy)
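The main steps can be traced with an in-memory sketch. Vertex ids are assumed comparable. Components of H are found here by pointer jumping, a hypothetical stand-in for the I/O-efficient step. The sketch also omits the |V| ≤ M cutoff and recurses until no edges remain. The smallest-neighbour choices can form cycles only of length 2, so making the smaller endpoint of each 2-cycle a root turns H into a forest.

```python
def contract_components(vertices, edges):
    """Connected components by repeated contraction (in-memory sketch).

    vertices -- list of comparable vertex ids; edges -- list of pairs (u, v)
    Returns dict vertex -> label; two vertices get the same label iff connected.
    """
    edges = [(u, v) for u, v in edges if u != v]
    if not edges:
        return {v: v for v in vertices}      # base case: no edges left
    # Step 1: every vertex selects its smallest neighbour.
    smallest = {}
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if a not in smallest or b < smallest[a]:
                smallest[a] = b
    # Step 2: components of H. Break each 2-cycle at its smaller endpoint,
    # then let every vertex jump to the root of its tree.
    ptr = {}
    for v in vertices:
        m = smallest.get(v, v)               # isolated vertices point to themselves
        if smallest.get(m) == v and v < m:
            m = v
        ptr[v] = m
    changed = True
    while changed:
        changed = False
        for v in vertices:
            if ptr[ptr[v]] != ptr[v]:
                ptr[v] = ptr[ptr[v]]
                changed = True
    # Step 3: contract every component of H into its root.
    new_vertices = sorted({ptr[v] for v in vertices})
    new_edges = list({(ptr[u], ptr[v]) for u, v in edges if ptr[u] != ptr[v]})
    # Steps 4 and 5: recurse, then copy each contracted vertex's label back.
    label = contract_components(new_vertices, new_edges)
    return {v: label[ptr[v]] for v in vertices}


lab = contract_components([0, 1, 2, 3], [(0, 2), (1, 3), (2, 3)])
print(len(set(lab.values())))  # 1
```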
Connectivity: The General Case
• Every connected component of H has size at least 2 ⇒ |V’| ≤ |V|/2 ⇒ log(|V|/M) recursive calls
Theorem: The connected components of a graph G = (V,E) can be computed in $O(\mathrm{sort}(|E|)\log(|V|/M))$ I/Os.
Connectivity: The General Case
• Later: BFS in O(|V| + sort(|E|)) I/Os ⇒ can be used to identify connected components
• When |V| ≤ |E|/B, this algorithm takes O(sort(|E|)) I/Os
• Can therefore stop the recursion after log(|V|B/|E|) recursive calls
Theorem: The connected components of a graph G = (V,E) can be computed in $O(\mathrm{sort}(|E|)\log(|V|B/|E|))$ I/Os.
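The number of levels follows from the halving argument. After k recursive calls at most |V|/2^k vertices remain, while the number of edges never grows. A worked version, assuming |V| > |E|/B and that BFS is run once |V_k| ≤ |E|/B:

```latex
\begin{aligned}
|V_k| \le \frac{|V|}{2^k} \le \frac{|E|}{B}
  &\iff k \ge \log_2\frac{|V|\,B}{|E|},\\
\text{total cost}
  &= \underbrace{O\!\left(\mathrm{sort}(|E|)\,\log\frac{|V|\,B}{|E|}\right)}_{\text{contraction levels}}
   + \underbrace{O\!\left(|V_k| + \mathrm{sort}(|E|)\right)}_{\text{BFS on } G_k}
   = O\!\left(\mathrm{sort}(|E|)\,\log\frac{|V|\,B}{|E|}\right).
\end{aligned}
```

The BFS term is absorbed because |V_k| ≤ |E|/B = O(scan(|E|)).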
Minimum Spanning Tree (MST)
Observation: The connectivity algorithm can be augmented to produce a spanning tree of G by reporting, in every round, the edges of G represented by the selected edges.
[Figure: the example graph on vertices a to n and its contracted graph on vertices A to E]
Minimum Spanning Tree (MST)
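To obtain a minimum spanning tree:
• For every vertex v, choose the edge of minimum weight incident to v (instead of the edge to v's smallest neighbor)
• Some book-keeping:
  • The weight of an edge e in the compressed graph = the minimum weight of all edges represented by e
  • When “e is added” to T, in fact add this minimum-weight edge
[Figure: vertex v with weighted edges to neighbors a, b, c and d]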
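A Borůvka-style sketch of the contraction MST follows. It assumes edges are given as hypothetical (weight, u, v) triples and that ties are broken by edge index. With ties broken this way, the chosen edges always form a forest. Each compressed edge keeps the (weight, index) of the lightest original edge it represents, so adding a compressed edge adds that original edge. In-memory union-find stands in for the I/O-efficient contraction.

```python
def _find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def boruvka_mst(n, edges):
    """Minimum spanning forest by repeated contraction.

    n     -- vertices are 0..n-1
    edges -- list of (weight, u, v); ties are broken by the edge's list index
    Returns the sorted indices of the chosen edges.
    """
    # Compressed graph: (a, b) -> (weight, index) of the lightest original edge
    # represented by the compressed edge {a, b}.
    cur = {}
    for i, (w, u, v) in enumerate(edges):
        if u != v:
            key = (min(u, v), max(u, v))
            cur[key] = min(cur.get(key, (w, i)), (w, i))
    mst = []
    while cur:
        # Every (super-)vertex picks its minimum-weight incident edge.
        best = {}
        for (a, b), wi in cur.items():
            for x in (a, b):
                if x not in best or wi < best[x][0]:
                    best[x] = (wi, a, b)
        # Add each picked edge once; "adding e" means adding its original edge.
        parent = {x: x for x in best}
        for wi, a, b in set(best.values()):
            ra, rb = _find(parent, a), _find(parent, b)
            if ra != rb:
                parent[ra] = rb
                mst.append(wi[1])
        # Contract: relabel endpoints, drop loops, keep the lightest parallel edge.
        nxt = {}
        for (a, b), wi in cur.items():
            ra, rb = _find(parent, a), _find(parent, b)
            if ra != rb:
                key = (min(ra, rb), max(ra, rb))
                nxt[key] = min(nxt.get(key, wi), wi)
        cur = nxt
    return sorted(mst)


edges = [(4, 0, 1), (1, 0, 2), (5, 0, 3), (3, 0, 4), (2, 1, 2), (7, 3, 4), (6, 2, 3)]
print(boruvka_mst(5, edges))  # [1, 2, 3, 4]
```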
Minimum Spanning Tree (MST)
Theorem: An MST of a graph G = (V,E) can be computed in $O(\mathrm{sort}(|E|)\log(|V|/M))$ I/Os.
A Fast MST Algorithm
• Idea:
  • Assume an MST can be computed in O(|V| + sort(|E|)) I/Os
  • Then, again, the recursion can be stopped after log(|V|B/|E|) iterations
• Prim’s algorithm:
A Fast MST Algorithm
• Maintain a superset of the blue edges in a priority queue Q
• When the edge {v,w} of minimum weight is retrieved, test whether v and w are both in T:
  • Yes ⇒ discard the edge
  • No ⇒ add the edge to the MST and add all edges incident to w to Q, except {v,w} (assuming that w ∉ T)
Problem: How to test whether v, w ∈ T?
A Fast MST Algorithm
• If v, w ∈ T but {v,w} ∉ T, then both v and w have inserted edge {v,w} into Q
  ⇒ There are two copies of {v,w} in Q
• The two copies are consecutive (edge weights assumed distinct) ⇒ perform two DELETEMIN operations, returning {v,w} and {y,z}
  • If {v,w} = {y,z}, discard both
  • Otherwise, add {v,w} to T and re-insert {y,z}
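The duplicate-detection trick can be sketched with `heapq` as the priority queue. The sketch assumes a simple, connected graph given as a hypothetical dict of (weight, edge_id, neighbour) lists with distinct (weight, edge_id) keys, so the two copies of an edge are adjacent in Q. Each queue entry also records which endpoint inserted it. A lone copy therefore tells us which endpoint is new, and the algorithm never asks whether a vertex is already in T.

```python
import heapq


def prim_without_membership_test(adj, root):
    """Prim's algorithm that never asks whether a vertex is already in T.

    adj -- dict: vertex -> list of (weight, edge_id, neighbour) for a simple,
           connected, undirected graph; (weight, edge_id) keys are distinct.
    Returns the set of edge ids of the MST.
    """
    q, tree = [], set()

    def add_vertex(v, via=None):
        # Insert all edges incident to v except the edge v was reached by.
        for wt, eid, u in adj[v]:
            if eid != via:
                heapq.heappush(q, (wt, eid, v, u))

    add_vertex(root)
    while q:
        first = heapq.heappop(q)                       # first DELETEMIN
        second = heapq.heappop(q) if q else None       # second DELETEMIN
        if second is not None and second[:2] == first[:2]:
            continue                  # two copies: both endpoints in T, discard both
        if second is not None:
            heapq.heappush(q, second) # a different edge {y,z}: re-insert it
        _, eid, _, u = first          # u is the endpoint not yet in T
        tree.add(eid)
        add_vertex(u, via=eid)
    return tree


edges = [(4, 0, 1), (1, 0, 2), (5, 0, 3), (3, 0, 4), (2, 1, 2), (7, 3, 4), (6, 2, 3)]
adj = {v: [] for v in range(5)}
for i, (w, a, b) in enumerate(edges):
    adj[a].append((w, i, b))
    adj[b].append((w, i, a))
print(sorted(prim_without_membership_test(adj, 0)))  # [1, 2, 3, 4]
```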
A Fast MST Algorithm
Analysis:
• O(|V| + scan(|E|)) I/Os for retrieving adjacency lists
• O(sort(|E|)) I/Os for priority queue operations
Theorem: An MST of a graph G = (V,E) can be found in O(|V| + sort(|E|)) I/Os.
Corollary: An MST of a graph G = (V,E) can be found in $O(\mathrm{sort}(|E|)\log(|V|B/|E|))$ I/Os.
Graph Contraction and Sparse Graphs
• A graph G = (V,E) is sparse if for any graph H obtainable from G through a series of edge contractions, |E(H)| = O(|V(H)|).
• For a sparse graph, the number of vertices and edges in G reduces by a constant factor in each iteration of the connectivity and MST algorithms.
Theorem: The connected components or an MST of a sparse graph with N vertices can be computed in O(sort(N)) I/Os.
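The summation behind sparseness can be sketched as follows. It assumes the vertex count shrinks by a constant factor c < 1 per level (c = 1/2 for the connectivity algorithm) and that each level costs O(sort) of the current graph size. Sparseness keeps |E_i| = O(|V_i|), and sort(x) = (x/B) log_{M/B}(x/B) satisfies sort(c^i N) ≤ c^i sort(N):

```latex
\sum_{i \ge 0} O\!\left(\mathrm{sort}(|V_i| + |E_i|)\right)
  = \sum_{i \ge 0} O\!\left(\mathrm{sort}(c^i N)\right)
  \le O\!\left(\mathrm{sort}(N)\right) \sum_{i \ge 0} c^i
  = O\!\left(\frac{\mathrm{sort}(N)}{1 - c}\right)
  = O(\mathrm{sort}(N)).
```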
Three Techniques for Graph Algorithms
• Time-forward processing:
  • Express graph problems as evaluation problems of DAGs
• Graph contraction:
  • Reduce the size of G while maintaining the properties of interest
  • Solve the problem recursively on the compressed graph
  • Construct a solution for G from the solution for the compressed graph
• Bootstrapping:
  • Switch to a generally less efficient algorithm as soon as (part of the) input is small enough