Data Structures and Algorithms in Parallel Computing Lecture 3


Page 1: Data Structures and Algorithms in Parallel Computing Lecture 3

Data Structures and Algorithms in Parallel Computing

Lecture 3

Page 2: Data Structures and Algorithms in Parallel Computing Lecture 3

Graphs

• G=(V,E) where V is the set of vertices and E the set of edges

• Weighted graph – edges have weights w(e)
• Sparse graph – m=|E| << n=|V|
• Diameter of a graph D(G) – the maximum, over all pairs of vertices (u, v), of the minimum number of edges that must be traversed to get from u to v
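The diameter definition can be checked with a small BFS-based sketch in Python; the path-graph example below is my own, not from the slides:

```python
from collections import deque

def bfs_distances(adj, source):
    """Return a dict of shortest hop counts from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """D(G): max over all vertex pairs of the minimum edge count between them."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Path graph 0-1-2-3: the farthest pair is (0, 3), three edges apart.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```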

Page 3: Data Structures and Algorithms in Parallel Computing Lecture 3

Examples

• Twitter graph
  – Vertices are users
  – Mentions or retweets are edges
  – Fast changing topology
    • 5,700 tweets/second on average
    • Highest peak: 143,199 tweets/second (August 3, 2013)

Page 4: Data Structures and Algorithms in Parallel Computing Lecture 3

Examples

• Romanian road network
  – Vertices are cities
  – Edges are main national roads
  – Slow changing topology

Page 5: Data Structures and Algorithms in Parallel Computing Lecture 3

Big Data

• Large volume
  – >2 B internet users¹
  – >1 B active Facebook users¹
  – >2.5 M daily active Twitter users²
• High velocity
  – >7.5 K tweets/second¹
  – >1.5 K Skype calls/second¹
  – >2 M emails/second¹

¹ http://www.internetlivestats.com
² http://www.statista.com/

[Figure: Twitter interactions – users Kat, Wiki, Pam, John, and Han linked by mention (@Pam, @Wiki) and retweet (RT @Pam) edges]

Page 6: Data Structures and Algorithms in Parallel Computing Lecture 3

Applications

• Community detection
  – Detect communities in graphs (users which interact a lot)
• Graph dynamics
  – Spread of viral infections
  – Viral marketing
• Shortest drive path given real traffic data
• …

Page 7: Data Structures and Algorithms in Parallel Computing Lecture 3

Limitations

• Memory
  – Graphs are too large to fit in memory
  – Require large shared memory architectures or distributed systems
• Compute
  – The size and rate of change place a lower bound on the required processing speed

Page 8: Data Structures and Algorithms in Parallel Computing Lecture 3

Parallel graph processing

• Sequential techniques are often hard to parallelize

• Easy
  – Minimum spanning tree
  – Biconnected components
• Hard
  – Single source shortest path
• Efficiency also depends on the type of graph
  – Sparse, dense, power law

Page 9: Data Structures and Algorithms in Parallel Computing Lecture 3

Graph representations

• Edge lists
  – A list of edges, each edge a vertex pair
• Adjacency lists
  – An array of lists G[i]
  – Each element corresponds to a vertex and contains the set of its neighbors
  – Easy to convert to an edge list
• Adjacency matrices
  – A binary matrix where Aij=1 iff (i,j) in E
  – Unsuitable for sparse graphs as it requires O(n²) space
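The three representations can be made concrete with a short Python sketch; the 4-vertex undirected graph is a made-up example, not from the slides:

```python
# Three representations of the same undirected graph, plus conversions.
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency lists: one neighbor list per vertex.
adj = [[] for _ in range(n)]
for u, v in edge_list:
    adj[u].append(v)
    adj[v].append(u)

# Convert adjacency lists back to an edge list (emit each edge once, u < v).
edges_back = [(u, v) for u in range(n) for v in adj[u] if u < v]

# Adjacency matrix: A[i][j] = 1 iff (i, j) in E; always O(n^2) space,
# which is wasteful when m << n^2.
A = [[0] * n for _ in range(n)]
for u, v in edge_list:
    A[u][v] = A[v][u] = 1
```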

Page 10: Data Structures and Algorithms in Parallel Computing Lecture 3

Graph representations

• For parallel algorithms
  – Replace linked lists with arrays
  – Makes it easier to process each array in parallel
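One standard array layout that replaces per-vertex linked lists is compressed sparse row (CSR); the slides don't name it explicitly, so this Python sketch is offered as one possible concrete form:

```python
# Compressed adjacency arrays (CSR-style): an offsets array plus one flat
# neighbor array. Vertex u's neighbors are neighbors[offsets[u]:offsets[u+1]],
# so each vertex's slice can be handed to a different worker.
def to_csr(adj):
    offsets = [0]
    neighbors = []
    for nbrs in adj:
        neighbors.extend(nbrs)
        offsets.append(len(neighbors))
    return offsets, neighbors

adj = [[1, 2], [0, 2], [0, 1, 3], [2]]   # adjacency lists for a small graph
offsets, neighbors = to_csr(adj)
```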

Page 11: Data Structures and Algorithms in Parallel Computing Lecture 3

Some examples

• Breadth first search• Connected components• Minimum spanning tree

Page 12: Data Structures and Algorithms in Parallel Computing Lecture 3

Breadth first search

• Traverse a tree or graph from the root, visiting all neighbours at the current level before moving to the next level
• Keep a queue of unvisited vertices

Page 13: Data Structures and Algorithms in Parallel Computing Lecture 3

Parallel BFS

• Graph is given as an adjacency array
• Visit the vertices of each level in parallel
• No need for a queue
• Keep a set of frontier vertices, generated at each step by collecting the neighbors of the current frontier set
• The initial frontier set is the single source vertex (the root for trees)
• Due to parallelism, the same new frontier vertex may be collected by multiple vertices
  – Requires an extra step to clean duplicates
  – Each new frontier vertex is linked to only one, randomly chosen, parent
• Work is O(m+n)
• A concurrent write model ensures a total depth of O(D)
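The frontier scheme above can be sketched sequentially in Python; the dict-overwrite below plays the role of the duplicate-cleaning step (concurrent writes, one parent survives), and the example graph is my own:

```python
def frontier_bfs(adj, root):
    """Level-synchronous BFS. In a real parallel run each frontier vertex is
    expanded by a separate worker; here a dict absorbs duplicate discoveries
    so each new vertex keeps exactly one parent."""
    parent = {root: root}
    frontier = [root]
    levels = [[root]]
    while frontier:
        next_frontier = {}
        for u in frontier:                # expanded in parallel in principle
            for v in adj[u]:
                if v not in parent:
                    next_frontier[v] = u  # duplicate writes: one parent wins
        parent.update(next_frontier)
        frontier = sorted(next_frontier)
        if frontier:
            levels.append(frontier)
    return parent, levels

# Diamond graph: vertex 3 is reachable from both 1 and 2 in the same level.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
parent, levels = frontier_bfs(adj, 0)
```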

Page 14: Data Structures and Algorithms in Parallel Computing Lecture 3

Example

Page 15: Data Structures and Algorithms in Parallel Computing Lecture 3

Algorithm

Page 16: Data Structures and Algorithms in Parallel Computing Lecture 3

Connected components

• Label two vertices with the same label iff there is a path between them
• Sequentially this can be achieved by depth first or breadth first search
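The sequential BFS-based labeling might look like the following sketch (the example graph, with two components plus an isolated vertex, is assumed for illustration):

```python
from collections import deque

def label_components(adj):
    """Give vertices the same label iff a path connects them (sequential BFS)."""
    label = {}
    for s in adj:
        if s in label:
            continue
        label[s] = s                 # first vertex found names the component
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in label:
                    label[v] = s
                    queue.append(v)
    return label

g = {0: [1], 1: [0], 2: [3], 3: [2], 4: []}
```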

Page 17: Data Structures and Algorithms in Parallel Computing Lecture 3

Parallel CC

• Use parallel BFS?
  – Inefficient due to its O(D) depth
  – Good only for small diameter graphs
• Solution
  – Use graph contraction
  – Start with the set of vertices
  – Recursively contract each neighborhood to a single vertex
  – O(log n) steps for each component to contract
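One way to realize the contraction is the random mate technique: each round every super-vertex flips a coin, and tails hook onto an adjacent head. This Python sketch simulates the rounds sequentially; the driver loop and path compression are my own framing, not the lecture's listing:

```python
import random

def random_mate_cc(vertices, edges):
    """Random-mate contraction for connected components (a sketch).
    O(log n) rounds suffice in expectation."""
    comp = {v: v for v in vertices}

    def find(v):                       # follow pointers to the representative
        while comp[v] != comp[comp[v]]:
            comp[v] = comp[comp[v]]    # path compression
        return comp[v]

    while any(find(u) != find(v) for u, v in edges):
        heads = {v: random.random() < 0.5 for v in vertices}   # coin flips
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            if not heads[ru] and heads[rv]:
                comp[ru] = rv          # tail hooks onto head
            elif heads[ru] and not heads[rv]:
                comp[rv] = ru
    return {v: find(v) for v in vertices}

random.seed(1)
labels = random_mate_cc(range(5), [(0, 1), (1, 2), (3, 4)])
```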

Page 18: Data Structures and Algorithms in Parallel Computing Lecture 3

Random mate graph contraction

Page 19: Data Structures and Algorithms in Parallel Computing Lecture 3

Algorithm

Page 20: Data Structures and Algorithms in Parallel Computing Lecture 3

Deterministic graph contraction

• Create disjoint trees by assigning decreasing labels to each child, and link a vertex only if its label is smaller than that of the last child

Page 21: Data Structures and Algorithms in Parallel Computing Lecture 3

Algorithm

• No relabeling required

Page 22: Data Structures and Algorithms in Parallel Computing Lecture 3

Worst case scenario

• Star graph with the root in the middle

Page 23: Data Structures and Algorithms in Parallel Computing Lecture 3

Spanning trees

• CC can be extended to finding
  – the spanning tree of a graph
  – the minimum spanning tree of a weighted graph

Page 24: Data Structures and Algorithms in Parallel Computing Lecture 3

Minimum spanning tree

• All weights are distinct
  – There exists a unique minimum spanning tree
• Based on the random mate CC technique
  – Instead of randomly picking a parent, pick the minimum weight edge and hook to the vertex if it is a parent
  – Keep track of the edges used for hooking
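The minimum-edge hooking above is essentially Borůvka-style contraction. A sequential Python sketch, assuming a connected graph with distinct weights given as (weight, u, v) triples:

```python
def boruvka_mst(n, edges):
    """Each component hooks along its minimum-weight outgoing edge, and the
    edges used for hooking are recorded; with distinct weights the resulting
    MST is unique. Assumes a connected graph on vertices 0..n-1."""
    comp = list(range(n))

    def find(v):
        while comp[v] != comp[comp[v]]:
            comp[v] = comp[comp[v]]
        return comp[v]

    mst = []
    while len(mst) < n - 1:
        # For every component, find its cheapest edge leaving the component.
        best = {}
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in best or w < best[r][0]:
                    best[r] = (w, u, v)
        # Hook along those edges (all components at once in a parallel run).
        for w, u, v in best.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                comp[ru] = rv
                mst.append((w, u, v))
    return mst

edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]
mst = boruvka_mst(4, edges)
```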

Page 25: Data Structures and Algorithms in Parallel Computing Lecture 3

Example

Page 26: Data Structures and Algorithms in Parallel Computing Lecture 3

What’s next?

• BSP model
  – SSSP, connected components, pagerank
• Vertex centric vs. subgraph centric
• Load balancing
  – Importance of partitioning and graph type

• ...