I/O-Efficient Graph Algorithms
Norbert Zeh
Duke University
EEF Summer School on Massive Data Sets
Århus, Denmark
June 26 – July 1, 2002
Motivation
For theoreticians:
• Graph problems are neat, often difficult, hence interesting
For practitioners:
• Massive graphs arise in GIS, web modelling, ...
• Problems in computational geometry can be expressed as graph problems
• Many abstract problems are best viewed as graph problems
• Extreme: Pointer-based data structures = graphs with extra information at their nodes
Outline
• Fundamental graph problems
  • List ranking
  • Algorithms for trees
    • Euler tour
    • Tree labelling
  • Graph searching
    • BFS/DFS
  • Connectivity
    • Connected components
    • Minimum spanning tree
  • Single source shortest paths
• Techniques and data structures
  • Graph contraction
  • Time-forward processing
  • Tournament tree
  • Buffered repository tree
• Lower bounds
  • List ranking
  • Connectivity
• Planar graphs
Introduction and “Simple” Problems
• List ranking
• Euler tour
• Tree labelling
• Evaluating directed acyclic graphs
• Greedy graph algorithms
List Ranking
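[Figure: a list of six elements (1 to 6) with weights 3, 1, 5, 2, 3, 1. The rank of each element is the sum of the weights from the head up to and including that element: 3, 4, 9, 11, 14, 15.]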
Why Is List Ranking Non-Trivial?
[Figure: a 16-element list stored in disk blocks of four elements each. In the layout 1 5 9 13 | 2 6 10 14 | 3 7 11 15 | 4 8 12 16, following the list from one element to the next touches a different block at every step.]
The internal memory algorithm spends Ω(N) I/Os in the worst case.
An Efficient List Ranking Algorithm
• Assume an independent set of size at least N/3 can be found efficiently (in O(sort(N)) I/Os).
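[Figure: the list with weights 3, 1, 5, 2, 3, 1 is compressed by removing an independent set, which gives a list with weights 3, 1, 7, 4. Ranking the compressed list gives 3, 4, 11, 15. Re-inserting the removed elements gives the final ranks 3, 4, 9, 11, 14, 15.]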
An Efficient List Ranking Algorithm
• Compressing L:
  • Sort elements in L \ I
  • Sort elements in I by their successor pointers
  • Scan the two lists to update the label of succ(v), for every element v ∈ I
• The I/O-complexity of this procedure is I(N) ≤ I(2N/3) + O(sort(N)) = O(sort(N))
Theorem: A list of size N can be ranked in O(sort(N)) I/Os.
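The compress, recurse, and re-insert steps can be sketched in ordinary in-memory Python. This is only an illustration of the recursion, not an I/O-efficient implementation: the function name `rank_list`, the dictionary representation, and the simple greedy choice of the independent set are assumptions made here, and the sorting and scanning passes of the external-memory version are replaced by dictionary lookups.

```python
def rank_list(succ, weight, head):
    """Return {node: rank}; rank = sum of weights from the head up to the node.

    succ maps each node to its successor (None for the tail).
    """
    if len(succ) == 1:
        return {head: weight[head]}
    pred = {s: v for v, s in succ.items() if s is not None}

    # Greedy independent set. The head is never chosen, so every removed
    # node has a predecessor that stays in the compressed list.
    indep = set()
    for v in sorted(succ):
        if v != head and pred[v] not in indep and succ[v] not in indep:
            indep.add(v)

    # Compress: bypass each removed node and add its weight to its successor.
    new_succ = {v: s for v, s in succ.items() if v not in indep}
    new_weight = {v: w for v, w in weight.items() if v not in indep}
    for v in indep:
        p, s = pred[v], succ[v]
        new_succ[p] = s
        if s is not None:
            new_weight[s] += weight[v]

    # Rank the compressed list, then re-insert the removed nodes.
    ranks = rank_list(new_succ, new_weight, head)
    for v in indep:
        ranks[v] = ranks[pred[v]] + weight[v]
    return ranks
```

On the six-element example with weights 3, 1, 5, 2, 3, 1 this returns the ranks 3, 4, 9, 11, 14, 15.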
The Euler Tour Technique
Goal: Given a tree T, represent it by a list L so that certain computations on T can be performed by ranking L.
The Euler Tour Technique
Theorem: Given the adjacency lists of the vertices in T, an Euler tour can be constructed in O(scan(N)) I/Os.
• Let {v,w1},…,{v,wr} be the edges incident to v
• Then succ((wi,v)) = (v,wi+1), with indices taken cyclically (wr+1 = w1)
[Figure: a vertex v with neighbors w1, w2, w3, w4.]
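A minimal sketch of the successor rule, assuming the tree is given as a Python dictionary of adjacency lists and that the index i+1 wraps around to 1 after the last neighbor. The names `euler_tour_successors` and `tour_from` are introduced here for illustration only.

```python
def euler_tour_successors(adj):
    """Map every directed edge (w, v) to its successor (v, next neighbor of v)."""
    succ = {}
    for v, nbrs in adj.items():
        r = len(nbrs)
        for i, w in enumerate(nbrs):
            # Neighbor after w in v's adjacency list, wrapping around.
            succ[(w, v)] = (v, nbrs[(i + 1) % r])
    return succ


def tour_from(adj, root):
    """Follow the successor pointers once around, starting at the root."""
    succ = euler_tour_successors(adj)
    start = (root, adj[root][0])
    tour, edge = [start], succ[start]
    while edge != start:
        tour.append(edge)
        edge = succ[edge]
    return tour


adj = {"r": ["a", "b"], "a": ["r", "c"], "b": ["r"], "c": ["a"]}
tour = tour_from(adj, "r")
```

Every tree edge appears in the tour twice, once in each direction, so a tree with N vertices gives a tour of 2(N - 1) directed edges.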
Rooting a Tree
• Choosing a vertex r as the root of a tree T defines parent-child relationships between adjacent nodes
• Rooting tree T = computing for every edge {v,w} who is the parent and who is the child
• v = p(w) if and only if rank((v,w)) < rank((w,v))
Theorem: A tree can be rooted in O(sort(N)) I/Os.
Computing a Preorder Numbering
Theorem: A preorder numbering of a rooted tree T can be computed in O(sort(N)) I/Os.
[Figure: a rooted tree and its Euler tour. Tour edges carry weight 1 or 0, and the ranks of the weighted tour give the preorder numbers of the vertices.]
preorder#(v) = rank((p(v),v))
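The rooting rule and the preorder formula can be combined in one pass over the tour. The sketch below assumes the tour is a Python list of directed edges starting at the root, that parent-to-child edges get weight 1 and child-to-parent edges weight 0 (the usual weighting for this technique), and that the root gets preorder number 0. The function name `root_and_preorder` is hypothetical.

```python
def root_and_preorder(tour):
    """Return (parent, preorder) computed from an Euler tour given as a list."""
    position = {edge: i for i, edge in enumerate(tour)}
    root = tour[0][0]
    parent, preorder = {}, {root: 0}
    rank = 0  # running sum of edge weights along the tour
    for v, w in tour:
        if position[(v, w)] < position[(w, v)]:
            # (v, w) comes before (w, v), so v = p(w); this edge has weight 1.
            parent[w] = v
            rank += 1
            preorder[w] = rank
        # Otherwise the edge leads back to the parent and has weight 0.
    return parent, preorder


tour = [("r", "a"), ("a", "c"), ("c", "a"), ("a", "r"), ("r", "b"), ("b", "r")]
parent, preorder = root_and_preorder(tour)
```

In external memory the positions come from list ranking rather than from a Python list, which is where the O(sort(N)) bound originates.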
Computing Subtree Sizes
Theorem: The nodes of T can be labelled with their subtree sizes in O(sort(N)) I/Os.
[Figure: a rooted tree on 10 vertices labelled with subtree sizes, and its Euler tour with every edge labelled by its rank.]
|T(v)| = (rank((v,p(v))) − rank((p(v),v)) + 1) / 2
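A short sketch of the subtree-size computation, assuming every tour edge has weight 1 and using the identity |T(v)| = (rank((v,p(v))) − rank((p(v),v)) + 1) / 2. The identity holds because the tour passes over exactly 2|T(v)| edges from the edge entering v to the edge leaving v, both included. The root has no parent edge, so it is assigned the total number of vertices. The name `subtree_sizes` is an assumption made here.

```python
def subtree_sizes(tour):
    """Return {vertex: size of its subtree} from an Euler tour given as a list."""
    rank = {edge: i + 1 for i, edge in enumerate(tour)}  # unit weights
    root = tour[0][0]
    sizes = {root: len(tour) // 2 + 1}  # a tree with N vertices has 2(N-1) tour edges
    for v, w in tour:
        if rank[(v, w)] < rank[(w, v)]:  # v = p(w)
            sizes[w] = (rank[(w, v)] - rank[(v, w)] + 1) // 2
    return sizes


tour = [("r", "a"), ("a", "c"), ("c", "a"), ("a", "r"), ("r", "b"), ("b", "r")]
sizes = subtree_sizes(tour)
```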
Evaluating a Directed Acyclic Graph
• More general: Given a labelling φ, compute a labelling ψ so that ψ(v) is computed from φ(v) and ψ(u1),…,ψ(ur), where u1,…,ur are v’s in-neighbors
[Figure: example DAG with 0/1 labels on its vertices; the vertices are numbered 1 to 12.]
Time-Forward Processing
• Assume nodes are given in topologically sorted order.
Use priority queue Q to send data along the edges.
[Animation: successive contents of Q on the 12-vertex example. Each entry (w,v,x) appears to record the value x sent from vertex v to its out-neighbor w, and entries are ordered by w. Q holds (6,1,0) after vertex 1 is processed and (12,9,1) (12,10,0) just before vertex 12 is processed.]
Time-Forward Processing
Analysis:
• Vertex set + adjacency lists scanned ⇒ O(scan(|V| + |E|)) I/Os
• Priority queue:
  • Every edge inserted into and deleted from Q exactly once
  ⇒ O(|E|) priority queue operations ⇒ O(sort(|E|)) I/Os
Theorem: A directed acyclic graph G = (V,E) can be evaluated in O(sort(|V| + |E|)) I/Os.
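Time-forward processing can be sketched with Python's `heapq` standing in for the external priority queue. The sketch assumes vertices are numbered 0 to n-1 in topological order, and that the local rule is passed in as a function; the names `evaluate_dag`, `phi`, `out_nbrs`, and `combine` are introduced here for illustration.

```python
import heapq


def evaluate_dag(phi, out_nbrs, combine):
    """Evaluate a DAG whose vertices 0..n-1 are in topological order.

    phi[v] is the input label of v, out_nbrs[v] lists the out-neighbors
    of v (all larger than v), and combine(phi_v, incoming_values) returns
    the output label of v.
    """
    queue = []  # entries (target, source, value), ordered by target
    psi = []
    for v in range(len(phi)):
        # Everything addressed to v is at the front of the queue.
        incoming = []
        while queue and queue[0][0] == v:
            _, _, value = heapq.heappop(queue)
            incoming.append(value)
        psi.append(combine(phi[v], incoming))
        # Send the new label forward in time to the out-neighbors.
        for w in out_nbrs[v]:
            heapq.heappush(queue, (w, v, psi[v]))
    return psi


# Example rule: weight of the heaviest path ending at each vertex.
heaviest = evaluate_dag(
    [1, 2, 3, 4],
    [[1, 2], [3], [3], []],
    lambda own, incoming: own + max(incoming, default=0),
)
```

Each edge is pushed once and popped once, which matches the O(|E|) priority queue operations counted in the analysis.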
Maximal Independent Set (MIS)
Algorithm GREEDYMIS:
I ← ∅
for every vertex v ∈ G do
  if no neighbor of v is in I then
    Add v to I
  end if
end for
Observation: It suffices to consider all neighbors of v which have been visited in a previous iteration.
Maximal Independent Set (MIS)
[Figure: example graph with vertices numbered 1 to 11, processed by GREEDYMIS in increasing order of vertex number.]
Maximal Independent Set (MIS)
Theorem: A maximal independent set of a graph G = (V,E) can be computed in O(sort(|V|+|E|)) I/Os.
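One way to read GREEDYMIS as a time-forward computation: a vertex that joins I notifies its higher-numbered neighbors through the priority queue, and a vertex joins I only if no notification is waiting for it. The sketch below assumes vertices numbered 0 to n-1, an undirected edge list without self-loops, and `heapq` in place of an external priority queue; the name `greedy_mis` is hypothetical.

```python
import heapq


def greedy_mis(n, edges):
    """Return a maximal independent set of an undirected graph on 0..n-1."""
    # Direct every edge from its lower-numbered to its higher-numbered endpoint.
    higher = [[] for _ in range(n)]
    for u, v in edges:
        higher[min(u, v)].append(max(u, v))

    queue = []  # vertices that have a lower-numbered neighbor in I
    mis = []
    for v in range(n):
        blocked = False
        while queue and queue[0] == v:
            heapq.heappop(queue)
            blocked = True
        if not blocked:
            mis.append(v)
            for w in higher[v]:
                heapq.heappush(queue, w)
    return mis


path_mis = greedy_mis(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)])
```

Only previously visited neighbors are consulted, which is the observation the slides use to justify the approach.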
Large Independent Set of a List
Corollary: An independent set of size at least N/3 for a list L of size N can be found in O(sort(N)) I/Os.
• Every vertex in an MIS I prevents at most two other vertices (its predecessor and its successor) from being in I
⇒ Every MIS has size at least N/3.
Graph Connectivity
• Connected components
• Minimum spanning tree
Connectivity: A Semi-External Algorithm
Analysis:
• Scan vertex set to load vertices into main memory
• Scan edge set to carry out algorithm
• O(scan(|V| + |E|)) I/Os
Theorem: The connected components of a graph can be computed in O(scan(|V| + |E|)) I/Os, provided that |V| ≤ M.
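The slides do not spell out the in-memory data structure. One common choice, assumed here, is a union-find structure over the vertices with the edges streamed past it once. The name `semi_external_components` and the path-halving variant of `find` are choices made for this sketch.

```python
def semi_external_components(vertices, edge_stream):
    """Label connected components with one pass over the edges.

    Only the per-vertex parent table is kept in memory; edge_stream may be
    any iterable and is read exactly once.
    """
    vertices = list(vertices)
    parent = {v: v for v in vertices}

    def find(v):
        # Path halving: point every other node on the path at its grandparent.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, w in edge_stream:
        root_u, root_w = find(u), find(w)
        if root_u != root_w:
            parent[root_u] = root_w

    return {v: find(v) for v in vertices}


labels = semi_external_components(range(6), [(0, 1), (1, 2), (3, 4)])
```

Two vertices are in the same component exactly when they receive the same label.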
Connectivity: The General Case
Idea:
• If |V| ≤ M
  • Use semi-external algorithm
• If |V| > M
  • Identify simple connected subgraphs of G
  • Contract these subgraphs to obtain graph G’ = (V’,E’) with |V’| ≤ c|V|, c < 1
  • Recursively compute connected components of G’
  • Obtain labelling of connected components of G from labelling of components of G’
[Figure: contracted graph with vertices A, B, C, D, E.]
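Connectivity: The General Case
[Figure: example graph with vertices a to n, contracted into vertices A to E. Component labels 1 and 2 computed on the contracted graph are copied back to the original vertices.]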
Connectivity: The General Case
Main steps:
• Find smallest neighbors (easy)
• Compute connected components of graph H induced by selected edges
• Contract each component into a single vertex (easy)
• Call the procedure recursively
• Copy label of every vertex v ∈ G’ to all vertices in G represented by v (easy)
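The five main steps can be sketched in memory as follows. Two simplifications are assumed: the recursion continues until no edges remain instead of switching to the semi-external algorithm once the vertex set fits in memory, and the components of H are found by following smallest-neighbor pointers until a pair of vertices that point at each other is reached. Vertices are assumed to be comparable (for example integers), and the function names are hypothetical.

```python
def connected_components(vertices, edges):
    """Return {vertex: component label} using repeated contraction."""
    vertices = list(vertices)
    edges = {(u, v) for u, v in edges if u != v}  # drop self-loops
    if not edges:
        return {v: v for v in vertices}

    # Step 1: every non-isolated vertex selects its smallest neighbor.
    smallest = {}
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if a not in smallest or b < smallest[a]:
                smallest[a] = b

    # Step 2: components of H. Following the pointers always ends in a pair
    # of vertices selecting each other; the smaller one represents the component.
    def representative(v):
        if v not in smallest:
            return v  # isolated vertex
        while smallest[smallest[v]] != v:
            v = smallest[v]
        return min(v, smallest[v])

    rep = {v: representative(v) for v in vertices}

    # Step 3: contract each component of H into its representative.
    new_vertices = set(rep.values())
    new_edges = {(rep[u], rep[v]) for u, v in edges}

    # Step 4: recurse on the contracted graph.
    labels = connected_components(new_vertices, new_edges)

    # Step 5: copy labels back to the original vertices.
    return {v: labels[rep[v]] for v in vertices}


labels = connected_components(range(7), [(0, 1), (1, 2), (2, 0), (3, 4), (5, 4)])
```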
Connectivity: The General Case
• Every connected component of H has size at least 2 ⇒ |V’| ≤ |V|/2 ⇒ O(log(|V|/M)) recursive calls
Theorem: The connected components of a graph G = (V,E) can be computed in O(sort(|E|) · log(|V|/M)) I/Os.
Connectivity: The General Case
• Later: BFS in O(|V| + sort(|E|)) I/Os ⇒ Can be used to identify connected components
• When |V| = |E|/B, algorithm takes O(sort(|E|)) I/Os
• Can stop recursion after log(|V|B/|E|) recursive calls
Theorem: The connected components of a graph G = (V,E) can be computed in O(sort(|E|) · log(|V|B/|E|)) I/Os.
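The stopping rule can be checked with a short calculation. It assumes that every contraction round at least halves the number of vertices and costs O(sort(|E|)) I/Os, and it writes V_k for the vertex set after k rounds, a notation introduced here.

```latex
\begin{align*}
|V_k| \le \frac{|V|}{2^k} \le \frac{|E|}{B}
  &\quad\text{once}\quad k \ge \log_2 \frac{|V|\,B}{|E|}, \\
O\bigl(|V_k| + \mathrm{sort}(|E|)\bigr)
  &= O\Bigl(\tfrac{|E|}{B} + \mathrm{sort}(|E|)\Bigr)
   = O\bigl(\mathrm{sort}(|E|)\bigr), \\
\text{total cost}
  &= O\Bigl(\mathrm{sort}(|E|) \cdot \log \tfrac{|V|\,B}{|E|}\Bigr)
   + O\bigl(\mathrm{sort}(|E|)\bigr).
\end{align*}
```

The first line gives the number of rounds, the second the cost of the BFS-based algorithm on the contracted graph, and the third their sum.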
Minimum Spanning Tree (MST)
Observation: Connectivity algorithm can be augmented to produce a spanning tree of G.
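[Figure: the example graph with vertices a to n and its contraction into vertices A to E, with the edges chosen during contraction forming a spanning tree.]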
Minimum Spanning Tree (MST)
To obtain a minimum spanning tree:
• Choose edge of minimum weight incident to v
• Some book-keeping:
  • The weight of an edge e in the compressed graph = the min weight of all edges represented by e
  • When “e is added” to T, add in fact this minimum edge
[Figure: a vertex v with neighbors a, b, c, d and edge weights 4, 1, 5, 3.]
Minimum Spanning Tree (MST)
[Figure: the example graph with vertices a to n and its contraction into vertices A to E.]
Theorem: An MST of a graph G = (V,E) can be computed in O(sort(|E|) · log(|V|/M)) I/Os.
A Fast MST Algorithm
• Idea:
  • Assume MST can be computed in O(|V| + sort(|E|)) I/Os
  • Again recursion can be stopped after log(|V|B/|E|) iterations
• Prim’s algorithm:
A Fast MST Algorithm
• Maintain superset of blue edges in priority queue Q
• When edge {v,w} of minimum weight is retrieved, test whether v,w are both in T
  • Yes ⇒ discard edge
  • No ⇒ Add edge to MST and add all edges incident to w to Q, except {v,w} (assuming that w ∉ T)
Problem: How to test whether v,w ∈ T.
A Fast MST Algorithm
• If v,w ∈ T, but {v,w} ∉ T, then both v and w have inserted edge {v,w} into Q
⇒ There are two copies of {v,w} in Q
• They are consecutive (given distinct edge weights) ⇒ Perform two DELETEMIN operations, retrieving {v,w} and then {y,z}
  • If {v,w} = {y,z}, discard both
  • Otherwise, add {v,w} to T and re-insert {y,z}
A Fast MST Algorithm
Analysis:
• O(|V| + scan(|E|)) I/Os for retrieving adjacency lists
• O(sort(|E|)) I/Os for priority queue operations
Theorem: An MST of a graph G = (V,E) can be found in O(|V| + sort(|E|)) I/Os.
Corollary: An MST of a graph G = (V,E) can be found in O(sort(|E|) · log(|V|B/|E|)) I/Os.
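The duplicate-edge test can be sketched with `heapq`. The sketch assumes a connected simple graph with distinct edge weights, so that two queue entries with equal weight are always the two copies of the same edge. The function name `prim_mst` and the adjacency format are assumptions made here; no table of tree vertices is kept.

```python
import heapq


def prim_mst(adj, start):
    """Prim's algorithm without a membership table for tree vertices.

    adj[v] is a list of (weight, neighbor) pairs. Returns the tree edges
    as (weight, v, w) triples in the order they are added.
    """
    queue, tree = [], []

    def push_edges(v, skip=None):
        # v inserts all incident edges except the one it was reached by.
        for weight, w in adj[v]:
            if w != skip:
                heapq.heappush(queue, (weight, v, w))

    push_edges(start)
    while queue:
        first = heapq.heappop(queue)
        if queue:
            second = heapq.heappop(queue)
            if second[0] == first[0]:
                continue  # two copies of one edge: both endpoints are in T
            heapq.heappush(queue, second)  # a different edge: re-insert it
        weight, v, w = first
        tree.append(first)
        push_edges(w, skip=v)
    return tree


adj = {
    "a": [(1, "b"), (3, "c")],
    "b": [(1, "a"), (2, "c")],
    "c": [(2, "b"), (3, "a"), (4, "d")],
    "d": [(4, "c")],
}
mst = prim_mst(adj, "a")
```

In the example, the edge of weight 3 between a and c is inserted once by each endpoint, so both copies surface together and are discarded.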
Graph Contraction and Sparse Graphs
• A graph G = (V,E) is sparse if for any graph H obtainable from G through a series of edge contractions, |E(H)| = O(|V(H)|).
• For a sparse graph, the number of vertices and edges in G reduces by a constant factor in each iteration of the connectivity and MST algorithms.
Theorem: The connected components or a MST of a sparse graph with N vertices can be computed in O(sort(N)) I/Os.
Three Techniques for Graph Algorithms
• Time-forward processing:
  • Express graph problems as evaluation problems of DAGs
• Graph contraction:
  • Reduce the size of G while maintaining the properties of interest
  • Solve problem recursively on compressed graph
  • Construct solution for G from solution for compressed graph
• Bootstrapping:
  • Switch to generally less efficient algorithm as soon as (part of the) input is small enough