cloud computingmhhammou/15319-s12/lectures/... · 2012-05-14 · cloud computing cs 15-319 dryad...
TRANSCRIPT
![Page 1: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/1.jpg)
Cloud ComputingCS 15-319
Dryad and GraphLabLecture 11, Feb 22, 2012
Majd F. Sakr, Suhail Rehman and Mohammad Hammoud
![Page 2: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/2.jpg)
Today…
Last session Pregel
Today’s session Dryad and GraphLab
Announcement: Project Phases I-A and I-B are due today
2© Carnegie Mellon University in Qatar
![Page 3: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/3.jpg)
Objectives
Discussion on Programming Models
Why parallelism?
Parallel computer architectures
Traditional models of parallel programming
Examples of parallel processing
Message Passing Interface (MPI)
Pregel, Dryad and GraphLab
Last 3 Sessions
MapReduce
Pregel, Dryad and GraphLab
© Carnegie Mellon University in Qatar 3
![Page 4: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/4.jpg)
4
Dryad
© Carnegie Mellon University in Qatar
![Page 5: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/5.jpg)
Dryad In this part, the following concepts of Dryad will
be described:
Dryad Model Dryad Organization Dryad Description Language and An Example Program Fault Tolerance in Dryad
5© Carnegie Mellon University in Qatar
![Page 6: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/6.jpg)
Dryad In this part, the following concepts of Dryad will
be described:
Dryad Model Dryad Organization Dryad Description Language and An Example Program Fault Tolerance in Dryad
6© Carnegie Mellon University in Qatar
![Page 7: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/7.jpg)
Dryad Dryad is a general purpose, high-performance, distributed
computation engine
Dryad is designed for: High-throughput Data-parallel computation Use in a private datacenter
Computation is expressed as a directed-acyclic-graph (DAG) Vertices represent programs Edges represent data channels between vertices
7© Carnegie Mellon University in Qatar
![Page 8: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/8.jpg)
Unix Pipes vs. Dryad DAG
8© Carnegie Mellon University in Qatar
![Page 9: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/9.jpg)
Dryad Job Structure
9
grep
sed
sortawk
perlgrep
grepsed
sort
sortawk
Inputfiles
Vertices (processes)
Outputfiles
ChannelsStage
grep1000 | sed500 | sort1000 | awk500 | perl50
© Carnegie Mellon University in Qatar
![Page 10: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/10.jpg)
Dryad In this part, the following concepts of Dryad will
be described:
Dryad Model Dryad Organization Dryad Description Language and An Example Program Fault Tolerance in Dryad
10© Carnegie Mellon University in Qatar
![Page 11: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/11.jpg)
Dryad System Organization
11
There are 3 roles for machines in Dryad Job Manager (JM) Name Server (NS) Daemon (D)
© Carnegie Mellon University in Qatar
![Page 12: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/12.jpg)
Program Execution (1) The Job Manager (JM):
Creates the job communication graph (job schedule) Contacts the NS to determine the
number of Ds and the topology Assigns Vs to each D (using a
simple task scheduler-not described) for execution
Coordinates data flow through the data plane
Data is distributed using a distributed storage system that shares with the Google File System some properties (e.g., data are split into chunks and replicated across machines)
Dryad also supports the use of NTFS for accessing files locally
12© Carnegie Mellon University in Qatar
![Page 13: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/13.jpg)
JM code
vertex code
Program Execution (2)1. Build
2. Send .exe
3. Start JM
5. Generate graph
7. Serializevertices
8. MonitorVertex execution
4. Querycluster resources
Cluster services6. Initialize vertices
© Carnegie Mellon University in Qatar 13
![Page 14: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/14.jpg)
Data Channels in Dryad Data items can be shuffled between vertices through
data channels
Data channels can be: Shared Memory FIFOs (intra-machine)
TCP Streams (inter-machine)
SMB/NTFS Local Files (temporary)
Distributed File System (persistent)
The performance and fault tolerance of these mechanisms vary
Data channels are abstracted for maximum flexibility
14
X
M
Items
© Carnegie Mellon University in Qatar
![Page 15: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/15.jpg)
Dryad In this part, the following concepts of Dryad will
be described:
Dryad Model Dryad Organization Dryad Description Language and An Example Program Fault Tolerance in Dryad
15© Carnegie Mellon University in Qatar
![Page 16: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/16.jpg)
Dryad Graph Description Language
16
A An
AS = A^n
(Cloning)
A An
AS >= BS
(Pointwise Composition)
B Bn
A An
AS >> BS
(Bipartite Composition)
B Bn
B
C D
(B>=C) || (B>=D)
(Merge)
Here are some operators in the Dryad graph description language:
© Carnegie Mellon University in Qatar
![Page 17: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/17.jpg)
Example Program in Dryad (1) Skyserver SQL Query (Q18):
Find all the objects in the database that have neighboring objects within 30 arc seconds such that at least one of the neighbors has a color similar to the primary object’s color
There are two tables involved photoObjAll and it has 354,254,163
records Neighbors and it has 2,803,165,372
records
For the equivalent Dryad computation, they extracted the columns of interest into two binary files, “ugriz.bin” and “neighbors.bin”
D D
MM 4n
SS 4n
YY
H
n
n
X Xn
U UN N
L L
© Carnegie Mellon University in Qatar 17
![Page 18: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/18.jpg)
D D
MM 4n
SS 4n
YY
H
n
n
X Xn
U UN N
L L
Example Program in Dryad (2)
Took SQL plan Manually coded in Dryad Manually partitioned data
u: objid, colorn: objid, neighborobjid[partition by objid]
selectu.color,n.neighborobjid
from u join nwhere
u.objid = n.objid
(u.color,n.neighborobjid)[re-partition by n.neighborobjid][order by n.neighborobjid]
[distinct][merge outputs]
selectu.objid
from u join <temp>where
u.objid = <temp>.neighborobjid and|u.color - <temp>.color| < d
© Carnegie Mellon University in Qatar 18
![Page 19: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/19.jpg)
Example Program in Dryad (3)
19
D D
MM 4n
SS 4n
YY
H
n
n
X Xn
U UN N
L L
GraphBuilder XSet = moduleX^N;GraphBuilder DSet = moduleD^N;GraphBuilder MSet = moduleM^(N*4);GraphBuilder SSet = moduleS^(N*4);GraphBuilder YSet = moduleY^N;GraphBuilder HSet = moduleH^1;
GraphBuilder XInputs = (ugriz1 >= XSet) || (neighbor >= XSet);
GraphBuilder YInputs = ugriz2 >= YSet;GraphBuilder XToY = XSet >= DSet >> MSet >= SSet;
for (i = 0; i < N*4; ++i){XToY = XToY || (SSet.GetVertex(i) >= YSet.GetVertex(i/4));}
GraphBuilder YToH = YSet >= HSet;GraphBuilder HOutputs = HSet >= output;
GraphBuilder final = XInputs || YInputs || XToY || YToH || HOutputs;
Here is the corresponding Dryad code:
© Carnegie Mellon University in Qatar
![Page 20: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/20.jpg)
Dryad In this part, the following concepts of Dryad will
be described:
Dryad Model Dryad Organization Dryad Description Language and An Example Program Fault Tolerance in Dryad
20© Carnegie Mellon University in Qatar
![Page 21: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/21.jpg)
Fault Tolerance in Dryad (1) Dryad is designed to handle two types of failures:
Vertex failures Channel failures
Vertex failures are handled by the JM and the failed vertex is re-executed on another machine
Channel failures cause the preceding vertex to be re-executed
21© Carnegie Mellon University in Qatar
![Page 22: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/22.jpg)
X[0] X[1] X[3] X[2] X’[2]
Completed vertices Slow vertex
Duplicatevertex
Fault Tolerance in Dryad (2)
Duplication Policy = f(running times, data volumes)
© Carnegie Mellon University in Qatar 22
![Page 23: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/23.jpg)
23
GraphLab
© Carnegie Mellon University in Qatar
![Page 24: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/24.jpg)
GraphLab In this part, the following concepts of GraphLab will
be described:
Motivation for GraphLab GraphLab Data Model and Update Mechanisms Scheduling in GraphLab Consistency Models in GraphLab PageRank in GraphLab
24© Carnegie Mellon University in Qatar
![Page 25: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/25.jpg)
GraphLab In this part, the following concepts of GraphLab will
be described:
Motivation for GraphLab GraphLab Data Model and Update Mechanisms Scheduling in GraphLab Consistency Models in GraphLab PageRank in GraphLab
25© Carnegie Mellon University in Qatar
![Page 26: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/26.jpg)
Motivation for GraphLab Shortcomings of MapReduce Interdependent data computation difficult to perform Overheads of running jobs iteratively – disk access and
startup overhead Communication pattern is not user definable/flexible
Shortcomings of Pregel BSP model requires synchronous computation One slow machine can slow down the entire computation considerably
Shortcomings of Dryad Very flexible but steep learning curve for the programming model
26© Carnegie Mellon University in Qatar
![Page 27: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/27.jpg)
GraphLab GraphLab is a framework for parallel machine learning
27
Data Graph
Shared Data Table
Scheduling
Update Functions and Scopes
© Carnegie Mellon University in Qatar
![Page 28: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/28.jpg)
GraphLab In this part, the following concepts of GraphLab will
be described:
Motivation for GraphLab GraphLab Data Model and Update Mechanisms Scheduling in GraphLab Consistency Models in GraphLab PageRank in GraphLab
28© Carnegie Mellon University in Qatar
![Page 29: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/29.jpg)
Data Graph A graph in GraphLab is associated with data at every vertex and edge
29
Data Graph
Arbitrary blocks of data can be assigned to vertices and edges
© Carnegie Mellon University in Qatar
![Page 30: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/30.jpg)
Update Functions The data graph is modified using update functions
The update function can modify a vertex v and its neighborhood, defined as thescope of v (Sv)
30
v
Sv
© Carnegie Mellon University in Qatar
![Page 31: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/31.jpg)
Shared Data Table
Certain algorithms require global information that is shared among all vertices (Algorithm Parameters, Statistics, etc.) GraphLab exposes a Shared Data Table (SDT)
SDT is an associative map between keys and arbitrary blocks of data T[Key] → Value
The shared data table is updated using the sync mechanism
31
Shared Data Table
© Carnegie Mellon University in Qatar
![Page 32: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/32.jpg)
Sync Mechanism Similar to Reduce in MapReduce
User can define fold, merge and apply functions that are triggered during the global sync mechanism
Fold function allows the user to sequentially aggregate information across all vertices
Merge optionally allows user to perform a parallel tree reduction on the aggregated data collected during the fold operation
Apply function allows the user to finalize the resulting value from the fold/merge operations (such as normalization etc.)
32
syncShared Data Table
© Carnegie Mellon University in Qatar
![Page 33: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/33.jpg)
GraphLab In this part, the following concepts of GraphLab will
be described:
Motivation for GraphLab GraphLab Data Model and Update Mechanisms Scheduling in GraphLab Consistency Models in GraphLab PageRank in GraphLab
33© Carnegie Mellon University in Qatar
![Page 34: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/34.jpg)
Scheduling in GraphLab (1)
CPU 1
CPU 2
The scheduler determines the order that vertices are updated
ee ff gg
kkjjiihh
ddccbbaa bb
iihh
aa
ii
bb ee ff
jj
cc
Sche
dule
r
The process repeats until the scheduler is empty
© Carnegie Mellon University in Qatar 34
![Page 35: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/35.jpg)
Scheduling in GraphLab (2)
An update schedule defines the order in which update functions are applied to vertices A parallel data-structure called the scheduler represents an abstract list of tasks
to be executed in Graphlab
Base (Vertex) schedulers in GraphLab Synchronous scheduler Round-robin scheduler
Job Schedulers in GraphLab FIFO scheduler Priority scheduler
Custom schedulers can be defined by the set scheduler Termination Assessment
If the scheduler has no remaining tasks Or, a termination function can be defined to check for convergence in the data
35© Carnegie Mellon University in Qatar
![Page 36: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/36.jpg)
GraphLab In this part, the following concepts of GraphLab will
be described:
Motivation for GraphLab GraphLab Data Model and Update Mechanisms Scheduling in GraphLab Consistency Models in GraphLab PageRank in GraphLab
36© Carnegie Mellon University in Qatar
![Page 37: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/37.jpg)
Need for Consistency Models How much can computation overlap?
© Carnegie Mellon University in Qatar 37
![Page 38: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/38.jpg)
Consistency Models in GraphLab
GraphLab guarantees sequential consistency Guaranteed to give the same result as a sequential execution of the
computational steps
User-defined consistency models Full Consistency Vertex Consistency Edge Consistency
38© Carnegie Mellon University in Qatar
![Page 39: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/39.jpg)
GraphLab In this part, the following concepts of GraphLab will
be described:
Motivation for GraphLab GraphLab Data Model and Update Mechanisms Scheduling in GraphLab Consistency Models in GraphLab PageRank in GraphLab
39© Carnegie Mellon University in Qatar
![Page 40: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/40.jpg)
PageRank (1) PageRank is a link analysis algorithm
The rank value indicates an importance of a particular web page
A hyperlink to a page counts as a vote of support
A page that is linked to by many pages with high PageRank receives a high rank itself
A PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank
© Carnegie Mellon University in Qatar 40
![Page 41: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/41.jpg)
PageRank (2) Iterate:
Where: α is the random reset probability L[j] is the number of links on page j
1 32
4 65
© Carnegie Mellon University in Qatar 41
![Page 42: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/42.jpg)
pagerank(i, scope){// Get Neighborhood data(R[i], Wij, R[j]) scope;
// Update the vertex data
// Reschedule Neighbors if neededif R[i] changes then reschedule_neighbors_of(i);
}
;][)1(][][
∈
×−+←iNj
ji jRWiR αα
PageRank Example in GraphLab PageRank algorithm is defined as a per-vertex operation working on the scope
of the vertex
Dynamic computation
© Carnegie Mellon University in Qatar 42
![Page 43: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/43.jpg)
43
How MapReduce, Pregel, Dryad and GraphLab
Compare Against Each Other?
© Carnegie Mellon University in Qatar
![Page 44: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/44.jpg)
Comparison of the Programming Models
MapReduce Pregel Dryad GraphLab
ProgrammingModel
Fixed Functions –Map and Reduce
Supersteps over a data graph with messages passed
DAG with program vertices and data edges
Data graph with shared data table and update functions
44
MapReduce Pregel Dryad GraphLab
ProgrammingModel
Fixed Functions –Map and Reduce
Supersteps over a data graph with messages passed
DAG with program vertices and data edges
Data graph with shared data table and update functions
Parallelism Concurrent execution of tasks within map and reduce phases
Concurrent executionof user functions over vertices within a superstep
Concurrent execution of vertices during a stage
Concurrent execution of non-overlapping scopes, defined by consistency model
MapReduce Pregel Dryad GraphLab
ProgrammingModel
Fixed Functions –Map and Reduce
Supersteps over a data graph with messages passed
DAG with program vertices and data edges
Data graph with shared data table and update functions
Parallelism Concurrent execution of tasks within map and reduce phases
Concurrent executionof user functions over vertices within a superstep
Concurrent execution of vertices during a stage
Concurrent execution of non-overlapping scopes, defined by consistency model
Data Handling Distributed file system
Distributed file system
Flexible data channels: Memory, Files, DFS etc.
Undefined – Graphs can be in memory or on disk
MapReduce Pregel Dryad GraphLab
ProgrammingModel
Fixed Functions –Map and Reduce
Supersteps over a data graph with messages passed
DAG with program vertices and data edges
Data graph with shared data table and update functions
Parallelism Concurrent execution of tasks within map and reduce phases
Concurrent executionof user functions over vertices within a superstep
Concurrent execution of vertices during a stage
Concurrent execution of non-overlapping scopes, defined by consistency model
Data Handling Distributed file system
Distributed file system
Flexible data channels: Memory, Files, DFS etc.
Undefined – Graphs can be in memory or on disk
Task Scheduling Fixed Phases –HDFS Locality based map task assignment
Partitioned Graph and Inputs assigned by assignment functions
Job and Stage Managers assign vertices to avaiableddaemons
Pluggable schedulers to schedule update functions
MapReduce Pregel Dryad GraphLab
ProgrammingModel
Fixed Functions –Map and Reduce
Supersteps over a data graph with messages passed
DAG with program vertices and data edges
Data graph with shared data table and update functions
Parallelism Concurrent execution of tasks within map and reduce phases
Concurrent executionof user functions over vertices within a superstep
Concurrent execution of vertices during a stage
Concurrent execution of non-overlapping scopes, defined by consistency model
Data Handling Distributed file system
Distributed file system
Flexible data channels: Memory, Files, DFS etc.
Undefined – Graphs can be in memory or on disk
Task Scheduling Fixed Phases –HDFS Locality based map task assignment
Partitioned Graph and Inputs assigned by assignment functions
Job and Stage Managers assign vertices to avaiableddaemons
Pluggable schedulers to schedule update functions
Fault Tolerance DFS replication + Task reassignment / Speculative execution of Tasks
Checkpointing and superstep re-execution
Vertex and Edge failure recovery
Synchronous and asychronoussnapshots
MapReduce Pregel Dryad GraphLab
ProgrammingModel
Fixed Functions –Map and Reduce
Supersteps over a data graph with messages passed
DAG with program vertices and data edges
Data graph with shared data table and update functions
Parallelism Concurrent execution of tasks within map and reduce phases
Concurrent executionof user functions over vertices within a superstep
Concurrent execution of vertices during a stage
Concurrent execution of non-overlapping scopes, defined by consistency model
Data Handling Distributed file system
Distributed file system
Flexible data channels: Memory, Files, DFS etc.
Undefined – Graphs can be in memory or on disk
Task Scheduling Fixed Phases –HDFS Locality based map task assignment
Partitioned Graph and Inputs assigned by assignment functions
Job and Stage Managers assign vertices to availabledaemons
Pluggable schedulers to schedule update functions
Fault Tolerance DFS replication + Task reassignment / Speculative execution of Tasks
Checkpointing and superstep re-execution
Vertex and Edge failure recovery
Synchronous and asychronoussnapshots
Developed by Google Google Microsoft Carnegie Mellon
© Carnegie Mellon University in Qatar
![Page 45: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/45.jpg)
References This presentation has elements borrowed from various papers and
presentations:
Papers: Pregel: http://kowshik.github.com/JPregel/pregel_paper.pdf Dryad: http://research.microsoft.com/pubs/63785/eurosys07.pdf GraphLab: http://www.select.cs.cmu.edu/publications/paperdir/uai2010-low-gonzalez-kyrola-
bickson-guestrin-hellerstein.pdf
Presentations: Dryad Presentation at Berkeley by M. Budiu:
http://budiu.info/work/dryad-talk-berkeley09.pptx GraphLab1 Presentation: http://graphlab.org/uai2010_graphlab.pptx GraphLab2 Presentation: http://graphlab.org/presentations/nips-biglearn-2011.pptx
45© Carnegie Mellon University in Qatar
![Page 46: Cloud Computingmhhammou/15319-s12/lectures/... · 2012-05-14 · Cloud Computing CS 15-319 Dryad and GraphLab Lecture 11, Feb 22, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud](https://reader034.vdocuments.us/reader034/viewer/2022042909/5f3b323d5f18fc103b2ef5cb/html5/thumbnails/46.jpg)
Next Class
Distributed File Systems
© Carnegie Mellon University in Qatar 46