Preprocessing, Compute, Post Proc.: XML raw data, ETL, slice, compute, repeat, subgraph, PageRank, initial graph...

Post on 17-Dec-2015


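[Figure: analytics pipeline in three stages: Preprocessing, Compute, Post Proc. Labels: XML raw data, ETL, Slice, Compute, Repeat, Initial Graph, Subgraph, PageRank, Analyze, Top Users.]

[Figure: the pipeline across systems. Labels: GraphX, Spark Preprocess, HDFS, Compute, HDFS, Spark Post.]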

[Figure: Raw Wikipedia (XML) -> Hyperlinks -> PageRank -> Top 20 Pages]

[Figure: bar chart of Total Runtime (in Seconds), axis 0 to 1600. Bar labels: GraphLab + Spark, Giraph + Spark. Values shown: 342, 1492, 605, 375.]
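The Hyperlinks -> PageRank -> Top 20 Pages step can be illustrated with a minimal sketch in plain Python. This is textbook power-iteration PageRank, not the GraphX implementation; the link data, the function names `pagerank` and `top_pages`, and the damping factor of 0.85 are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a list of (src, dst) links."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def top_pages(rank, k):
    """Return the k page titles with the highest rank."""
    return sorted(rank, key=rank.get, reverse=True)[:k]


# Hypothetical hyperlink table (made-up page titles, not Wikipedia data).
links = [("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]
ranks = pagerank(links)
best = top_pages(ranks, 2)
```

With real data, `links` would be the hyperlink pairs extracted from the XML dump, and `top_pages(ranks, 20)` would correspond to the Top 20 Pages output.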

Property Graph

Vertex Table

Id   Property (V)
3    (rxin, student)
7    (jgonzal, postdoc)
5    (franklin, professor)
2    (istoica, professor)

Edge Table

SrcId   DstId   Property (E)
3       7       Collaborator
5       3       Advisor
2       5       Colleague
5       7       PI

[Figure: the property graph drawn with vertices rxin (student), jgonzal (postdoc), franklin (professor), istoica (professor) and edges labeled Collaborator, Advisor, Colleague, PI.]
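The way a vertex table and an edge table together form a property graph can be sketched as a join of each edge with the properties of its two endpoints. The sketch below uses plain Python dictionaries and lists rather than Spark or GraphX; the function names `triplets` and `describe` are hypothetical, and only the ids and properties come from the tables.

```python
# Vertex table: id -> (name, role), taken from the Vertex Table.
vertices = {
    3: ("rxin", "student"),
    7: ("jgonzal", "postdoc"),
    5: ("franklin", "professor"),
    2: ("istoica", "professor"),
}

# Edge table: (src_id, dst_id, relationship), taken from the Edge Table.
edges = [
    (3, 7, "Collaborator"),
    (5, 3, "Advisor"),
    (2, 5, "Colleague"),
    (5, 7, "PI"),
]


def triplets(vertices, edges):
    """Join each edge with the properties of its source and destination."""
    return [(vertices[src], prop, vertices[dst]) for src, dst, prop in edges]


def describe(triplet):
    """Render one (source, edge property, destination) triplet as text."""
    (src_name, _), prop, (dst_name, _) = triplet
    return f"{src_name} is the {prop} of {dst_name}"


facts = [describe(t) for t in triplets(vertices, edges)]
```

Keeping the two tables separate means vertex properties are stored once, however many edges touch a vertex; the joined view is materialized only when a computation needs both ends of an edge.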

[Figure: Data-Parallel versus Graph-Parallel computation. Labels: Table, Row, Result, Property Graph, Pregel.]
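As a rough illustration of the graph-parallel side, the following is a simplified Pregel-style loop in plain Python: vertices hold a value, messages flow along edges, and each superstep applies a vertex program to the merged incoming messages. It is a sketch under assumptions, not the GraphX Pregel operator: the names `pregel`, `vprog`, `send_msg`, and `merge_msg` are chosen for illustration, and every edge is scanned in every superstep. The example counts hops from vertex 2 over the four edges of the property graph's edge table.

```python
import math


def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg,
           max_iterations=20):
    """Run supersteps until no messages are sent or the limit is reached."""
    values = {vid: vprog(vid, val, initial_msg) for vid, val in vertices.items()}
    for _ in range(max_iterations):
        inbox = {}
        for src, dst in edges:
            for target, msg in send_msg(src, values[src], dst, values[dst]):
                # Merge messages addressed to the same vertex.
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:
            break
        for vid, msg in inbox.items():
            values[vid] = vprog(vid, values[vid], msg)
    return values


def vprog(vid, value, msg):
    """Keep the smallest hop count seen so far."""
    return min(value, msg)


def send_msg(src, src_val, dst, dst_val):
    """Offer the destination a shorter path if the source has one."""
    if src_val + 1 < dst_val:
        return [(dst, src_val + 1)]
    return []


# Directed edges (SrcId, DstId) from the edge table; hop counts start at vertex 2.
edges = [(3, 7), (5, 3), (2, 5), (5, 7)]
start = {3: math.inf, 7: math.inf, 5: math.inf, 2: 0}
hops = pregel(start, edges, math.inf, vprog, send_msg, min)
```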

[Figure: Wikipedia analytics pipeline. Raw Wikipedia (XML) -> Hyperlinks -> PageRank -> Top 20 Pages (Title, PR). Text Table (Title, Body) -> Topic Model (LDA) -> Word Topics (Word, Topic). Editor Graph -> Community Detection -> User Community (User, Com.). Other labels: Term-Doc Graph, Discussion Table (User, Disc.), Community Topic (Topic, Com.).]

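[Figure: distributed representation of the Property Graph as a Vertex Table (RDD), an Edge Table (RDD), and a Routing Table (RDD), split across Part. 1 and Part. 2. Vertices: A, B, C, D, E, F. The graph drawing groups B, C, A, D together and F, E, A, D together. The Routing Table lists each vertex with partition 1, partition 2, or both.]

Edge Table (RDD)

A B
A C
C D
B C
A E
A F
E F
E D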
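A routing table of this kind can be derived from a partitioned edge table: for each vertex, record every edge partition that references it. The sketch below is plain Python, not the GraphX data structure. The eight edges come from the Edge Table (RDD); the assignment of the first four edges to partition 1 and the last four to partition 2 is an assumption based on their listing order, and `build_routing_table` is a hypothetical name.

```python
from collections import defaultdict

# Edges from the slide. Which partition holds which edge is assumed here.
edge_partitions = {
    1: [("A", "B"), ("A", "C"), ("C", "D"), ("B", "C")],
    2: [("A", "E"), ("A", "F"), ("E", "F"), ("E", "D")],
}


def build_routing_table(edge_partitions):
    """Map each vertex to the sorted list of edge partitions that mention it."""
    routing = defaultdict(set)
    for part_id, edges in edge_partitions.items():
        for src, dst in edges:
            routing[src].add(part_id)
            routing[dst].add(part_id)
    return {vertex: sorted(parts) for vertex, parts in sorted(routing.items())}


routing_table = build_routing_table(edge_partitions)
```

Under this assumed split, only A and D are referenced from both partitions, which would be consistent with A and D appearing in both groups of the graph drawing. A vertex's properties then need to be shipped only to the partitions listed for it.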

2D Vertex Cut Heuristic

[Figure: Vertex Cut compared with Edge Cut]
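One common way to realize a 2D vertex cut is to arrange the partitions in a square grid and pick the column from the source vertex and the row from the destination vertex. The sketch below is a simplified illustration under assumptions, not the partitioner used by any particular system: it requires integer vertex ids and a perfect-square partition count, a plain modulo stands in for a hash function, and the names `partition_2d` and `replication` are hypothetical.

```python
import math


def partition_2d(src, dst, num_parts):
    """Assign an edge to a cell of a side x side grid of partitions."""
    side = math.isqrt(num_parts)
    if side * side != num_parts:
        raise ValueError("this sketch assumes a perfect-square partition count")
    col = src % side  # column chosen by the source vertex
    row = dst % side  # row chosen by the destination vertex
    return col * side + row


def replication(edges, num_parts):
    """Count, per vertex, how many partitions hold at least one of its edges."""
    placed = {}
    for src, dst in edges:
        part = partition_2d(src, dst, num_parts)
        placed.setdefault(src, set()).add(part)
        placed.setdefault(dst, set()).add(part)
    return {vertex: len(parts) for vertex, parts in placed.items()}


# A star graph: hub 0 linked to and from vertices 1..99, spread over 16 partitions.
star = [(0, i) for i in range(1, 100)] + [(i, 0) for i in range(1, 100)]
copies = replication(star, 16)
```

Because a vertex's outgoing edges all land in one grid column and its incoming edges in one grid row, it is replicated to at most 2 * sqrt(P) - 1 of the P partitions, even when its degree is very high.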
