Lighthouse: Large-scale graph pattern matching on Giraph
Timeline
• Inspired by Google Pregel (2010)
• Donated to ASF by Yahoo! in 2011
• Top-level project in 2012
• 1.0 release in January 2013
• 1.1 release in November 2014
• Used at Facebook, LinkedIn, Yahoo!
Vertex-centric API
[Figure: vertex-centric computation, shown at Iteration i and Iteration i+1]
BSP/Pregel implementation
[Figure: processing units PU 1 to PU 5 across Iteration i and Iteration i+1]
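The superstep model can be illustrated with a minimal single-process sketch. This is not Giraph code: `run_bsp` and `propagate_max` are hypothetical names, and propagating the maximum value is only an assumed example computation. Each loop iteration plays the role of one iteration (superstep): vertices read their inbox, update their value, and write messages that become visible only after the barrier.

```python
from collections import defaultdict


def run_bsp(graph, values, compute, max_supersteps=50):
    """Run a vertex-centric computation in synchronous supersteps.

    graph:   dict vertex -> list of out-neighbours
    values:  dict vertex -> value (updated in place)
    compute: f(vertex, value, messages, superstep) -> (new_value, message or None)
    Returns the number of supersteps executed.
    """
    inbox = defaultdict(list)
    for superstep in range(max_supersteps):
        outbox = defaultdict(list)
        # Every vertex is active in superstep 0; later only vertices with mail.
        active = list(graph) if superstep == 0 else [v for v in graph if inbox[v]]
        if not active:
            return superstep
        for v in active:
            values[v], message = compute(v, values[v], inbox[v], superstep)
            if message is not None:
                for neighbour in graph[v]:
                    outbox[neighbour].append(message)
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return max_supersteps


def propagate_max(vertex, value, messages, superstep):
    """Example compute function (an assumption): keep the largest value seen."""
    new_value = max([value] + messages)
    changed = superstep == 0 or new_value > value
    return new_value, (new_value if changed else None)


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
values = {"a": 5, "b": 2, "c": 3}
run_bsp(graph, values, propagate_max)
print(values)
```

A vertex that receives no messages stays idle, so the run ends once no value changes anymore.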
Architecture
[Figure: architecture diagram. Components shown: Master (coordinator); Worker 1, Worker 2, ..., Worker N; Netty; Zookeeper; Hadoop File System (HDFS). Worker internals shown: compute threads, vertices, message inbox, message outbox.]
Lighthouse
Giraph execution algebra
Binding Table. Matching and potential graph patterns are stored in a table that is distributed across the messages sent around by vertices.
• Scan: starts traversals from certain vertices.
• Select: prunes traversals based on expressions.
• Project: adds data to the binding table.
• Hash Join: joins paths generated from different traversals.
• Step Join: performs a further hop in the traversal.
• Move: continues a traversal from different vertices.
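To make the operators concrete, here is a minimal centralized sketch that treats the binding table as a list of rows (dicts from variable name to value). The graph data, function names and signatures are all assumptions made for illustration, not the Lighthouse implementation. Move is left out because in a single process it would only change which vertex holds a row, not the row itself.

```python
# Hypothetical in-memory property graph.
VERTICES = {
    1: {"label": "Person", "firstName": "Antonio", "browser": "Chrome"},
    2: {"label": "Person", "firstName": "Paul", "browser": "Firefox"},
    3: {"label": "Company", "name": "Acme"},
    4: {"label": "Country", "name": "Italy"},
}
EDGES = [(1, "WORK_AT", 3), (2, "WORK_AT", 3), (3, "IS_LOCATED_IN", 4)]


def scan(label, var):
    """Scan: start one traversal (one row) at every vertex with the given label."""
    return [{var: vid} for vid, props in VERTICES.items() if props["label"] == label]


def select(rows, predicate):
    """Select: drop the rows whose bindings fail the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, var, prop, column):
    """Project: copy a property of the vertex bound to `var` into a new column."""
    return [{**row, column: VERTICES[row[var]].get(prop)} for row in rows]


def step_join(rows, var, edge_label, new_var):
    """Step Join: extend each row by one hop along edges with the given label."""
    return [
        {**row, new_var: dst}
        for row in rows
        for src, label, dst in EDGES
        if src == row[var] and label == edge_label
    ]


def hash_join(left, right, var):
    """Hash Join: combine rows of two traversals that agree on `var`."""
    index = {}
    for row in right:
        index.setdefault(row[var], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[var], [])]


# Two traversals that meet at `company`, joined with a hash join.
people = select(
    scan("Person", "person"),
    lambda r: VERTICES[r["person"]]["browser"] == "Chrome",
)
works = step_join(people, "person", "WORK_AT", "company")
located = step_join(scan("Company", "company"), "company", "IS_LOCATED_IN", "country")
result = hash_join(works, located, "company")
print(result)
```

Starting the second traversal from Company vertices is just one possible plan; the same pattern could also be answered with two consecutive step joins.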
Distributed Binding Table
[Figure: binding-table rows (V1, John, …, VN), (V4, Paul, …, VJ), (V7, Mark, …, VL) carried by vertices between Iteration i and Iteration i+1]
MATCH (person:Person {firstName:"Antonio"}) -[:WORK_AT]-> (company),
      (company) -[:IS_LOCATED_IN]-> (country)
WHERE person.browser = "Chrome"
RETURN person.id, person.lastName, company.id, country.id
MATCH (person:Person) -[:WORK_AT]-> (company)
RETURN person.id, person.birthDate, company.id
Scan
Project
StepJoin
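One plausible reading of how Scan, Project and StepJoin could map onto supersteps for the WORK_AT query is sketched below. The graph data, the function names and the split of work between the two supersteps are all assumptions made for illustration. The point is that a binding-table row travels inside a message and grows at each vertex it reaches.

```python
# Hypothetical graph: vertex id -> (label, properties, outgoing (edge label, target) pairs)
GRAPH = {
    "p1": ("Person", {"id": 1, "birthDate": "1980-01-01"}, [("WORK_AT", "c1")]),
    "p2": ("Person", {"id": 2, "birthDate": "1990-05-17"}, [("KNOWS", "p1")]),
    "c1": ("Company", {"id": 10}, []),
}


def superstep_0(vertex_id):
    """Scan and Project locally, then send the row along WORK_AT edges (StepJoin)."""
    label, props, edges = GRAPH[vertex_id]
    if label != "Person":  # Scan: only Person vertices start a traversal
        return []
    row = (props["id"], props["birthDate"])  # Project: person.id, person.birthDate
    return [(target, row) for edge_label, target in edges if edge_label == "WORK_AT"]


def superstep_1(vertex_id, rows):
    """Receiving side of StepJoin: bind the company and append company.id."""
    _, props, _ = GRAPH[vertex_id]
    return [row + (props["id"],) for row in rows]


def run_query():
    """Deliver the messages of superstep 0, then finish the rows in superstep 1."""
    inbox = {}
    for vertex_id in GRAPH:
        for target, row in superstep_0(vertex_id):
            inbox.setdefault(target, []).append(row)
    results = []
    for vertex_id, rows in inbox.items():
        results.extend(superstep_1(vertex_id, rows))
    return results


print(run_query())
```

The person p2 has no WORK_AT edge, so its row is never sent and drops out of the result.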
Cypher path-queries
Desired functionality:
• weighted shortest paths
• multiple sources and destinations
• top N shortest paths for each pair
• provide both paths and their costs
• restrict search to subset of graph
Restrictions:
• Monotonic cost function
• Path-independent local vertex/edge restrictions
Proposal
MATCH p = (a:Start) -[e* | not(endNode(e)).danger ]-> (b:Finish)
CHEAPEST 3 SUM e.distance * e.maxSpeed AS length
RETURN a, b, path, length
Features:
• Selector applied before WHERE condition (optional)
• Number of paths for each pair (e.g. 3) (optional)
• User-defined cost function (required)
• AS keyword to bind distance to variable (optional)
Giraph implementation
Two phases:
• First phase: we compute the routes of the top K shortest paths. Each vertex discovers and registers its preceding vertex on the shortest paths (similar to Pregel BFS).
• Second phase: starting from the “leaves”, we traverse the structure backwards, building the paths.
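The two phases can be sketched in a simplified form. The code below is not the Giraph implementation: it assumes a single source, K = 1 and positive edge weights, and `phase_one` / `phase_two` are hypothetical names. Each `while` iteration stands for one superstep, with `inbox` and `outbox` playing the role of the messages between vertices.

```python
def phase_one(edges, source):
    """Phase 1: propagate costs; each vertex registers its predecessor.

    edges: dict vertex -> list of (target, weight) pairs, weights assumed positive.
    Returns (cost, pred) dicts for every vertex reachable from `source`.
    """
    cost, pred = {}, {}
    inbox = {source: [(0, None)]}  # messages are (cost so far, sender)
    while inbox:  # one loop iteration = one superstep
        outbox = {}
        for v, messages in inbox.items():
            new_cost, sender = min(messages, key=lambda m: m[0])
            if v in cost and cost[v] <= new_cost:
                continue  # no improvement: the vertex stays silent
            cost[v], pred[v] = new_cost, sender
            for target, weight in edges.get(v, []):
                outbox.setdefault(target, []).append((new_cost + weight, v))
        inbox = outbox
    return cost, pred


def phase_two(pred, destinations):
    """Phase 2: start at the destinations ("leaves") and walk back, building paths."""
    inbox = {d: [[d]] for d in destinations if d in pred}
    paths = {}
    while inbox:
        outbox = {}
        for v, partial_paths in inbox.items():
            for path in partial_paths:
                if pred[v] is None:  # reached the source: the path is complete
                    paths[path[-1]] = path
                else:  # hand the path to the predecessor, which prepends itself
                    outbox.setdefault(pred[v], []).append([pred[v]] + path)
        inbox = outbox
    return paths


edges = {"a": [("b", 1), ("c", 4)], "b": [("c", 1), ("d", 5)], "c": [("d", 1)]}
cost, pred = phase_one(edges, "a")
paths = phase_two(pred, ["d"])
print(cost["d"], paths["d"])
```

Extending this to the top K paths would require each vertex to keep up to K (cost, predecessor) entries instead of one, which the sketch leaves out.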
Preliminary results
Thanks.