TRANSCRIPT
[Figure: graph analytics pipeline. Stages: Preprocessing, Compute, Post Proc. Elements: raw XML data, ETL, initial graph, slice, subgraph, compute, PageRank, repeat, analyze, top users.]
[Figure: end-to-end pipeline over HDFS. Steps: Spark preprocess, compute, Spark post. Data flow: raw Wikipedia XML, hyperlinks, PageRank, top 20 pages.]
[Chart: total runtime (in seconds), axis 0 to 1600. Labels: GraphX, GraphLab + Spark, Giraph + Spark. Values shown: 342, 1492, 605, 375.]
Property Graph

Vertex Table
Id   Property (V)
3    (rxin, student)
7    (jgonzal, postdoc)
5    (franklin, professor)
2    (istoica, professor)

Edge Table
SrcId   DstId   Property (E)
3       7       Collaborator
5       3       Advisor
2       5       Colleague
5       7       PI
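The vertex table and edge table above can be read as two ordinary collections that are joined on vertex id. The following is a minimal sketch in plain Python, not the GraphX API: the names `vertices`, `edges`, `triplets`, and `describe` are hypothetical and chosen only for illustration, while the ids and properties are the ones shown on the slide.

```python
# Illustrative only: plain Python collections, not the GraphX API.

# Vertex table: id -> (name, role), values as shown on the slide.
vertices = {
    3: ("rxin", "student"),
    7: ("jgonzal", "postdoc"),
    5: ("franklin", "professor"),
    2: ("istoica", "professor"),
}

# Edge table: (src_id, dst_id, property), values as shown on the slide.
edges = [
    (3, 7, "Collaborator"),
    (5, 3, "Advisor"),
    (2, 5, "Colleague"),
    (5, 7, "PI"),
]


def triplets(vertices, edges):
    """Join every edge with the properties of its source and destination vertex."""
    return [(vertices[src], prop, vertices[dst]) for src, dst, prop in edges]


def describe(triplet):
    """Render one joined (source, edge property, destination) record as text."""
    (src_name, _), prop, (dst_name, _) = triplet
    return f"{src_name} is the {prop} of {dst_name}"


for t in triplets(vertices, edges):
    print(describe(t))
```

Keeping vertices and edges in separate tables means each can be filtered or transformed like any other table, and the graph view is recovered by the join.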
[Figure: data-parallel and graph-parallel views. Labels: Table (rows), Property Graph, Pregel, Result.]
[Figure: Wikipedia analytics pipeline. Raw Wikipedia XML feeds several branches. Labels: Hyperlinks, PageRank, Top 20 Pages (Title, PR); Text Table (Title, Body), Topic Model (LDA), Word Topics (Word, Topic); Editor Graph, Community Detection, User Community (User, Com.); Term-Doc Graph; Discussion Table (User, Disc.); Community Topic (Topic, Com.).]
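[Figure: distributed representation of a property graph across two partitions (Part. 1, Part. 2). Components: Vertex Table (RDD), Edge Table (RDD), Routing Table (RDD).]
Vertices: A, B, C, D, E, F
Edges: A-B, A-C, C-D, B-C, A-E, A-F, E-F, E-D
Routing table: maps each vertex to the partitions (1, 2, or both) that hold its edges.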
[Figure: 2D vertex cut heuristic. Panels: Vertex Cut, Edge Cut.]
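A vertex cut assigns edges, rather than vertices, to partitions, so a vertex may be replicated on every partition that holds one of its edges. The sketch below shows one simplified way a 2D heuristic can work: partitions are laid out as a square grid, the source id picks a column, and the destination id picks a row. It is an assumption-laden illustration in plain Python, not the GraphX implementation; `MIXING_PRIME`, `partition_2d`, and `replication` are hypothetical names, and the edge list reuses the A to F graph from the slide with letters mapped to integers.

```python
import math
from collections import defaultdict

# Hypothetical mixing constant, used only to spread consecutive ids (assumption).
MIXING_PRIME = 1_000_003


def partition_2d(src, dst, num_parts):
    """Assign an edge to a partition: src selects a grid column, dst a grid row."""
    side = math.ceil(math.sqrt(num_parts))
    col = (src * MIXING_PRIME) % side
    row = (dst * MIXING_PRIME) % side
    return (col * side + row) % num_parts


def replication(edges, num_parts):
    """For each vertex, collect the partitions that hold at least one of its edges."""
    placement = defaultdict(set)
    for src, dst in edges:
        p = partition_2d(src, dst, num_parts)
        placement[src].add(p)
        placement[dst].add(p)
    return dict(placement)


# Edges from the slide, with A..F mapped to 1..6.
ids = {name: i for i, name in enumerate("ABCDEF", start=1)}
edge_list = [
    (ids[s], ids[d])
    for s, d in ["AB", "AC", "CD", "BC", "AE", "AF", "EF", "ED"]
]

for vertex, parts in sorted(replication(edge_list, 4).items()):
    print(vertex, sorted(parts))
```

When the number of partitions is a perfect square, a vertex can only appear in one column (as a source) and one row (as a destination) of the grid, so it is replicated on at most 2 * side - 1 partitions, regardless of its degree.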