Property graphs with time

Julia Stoyanovich, joint work with Vera Moffitt

Drexel University, Philadelphia, PA, USA

stoyanovich.org

openCypher Meetup, October 25, 2017

[Figure: snapshots of an evolving network, labeled 2007, 2008, 2009, 2010, and 2011]

https://www.kenedict.com/apples-internal-innovation-network-unraveled-part-1-evolving-networks/

https://arxiv.org/abs/1709.06176

Exploratory analysis of evolving graphs

• Which nodes are showing an increasing popularity trend?

• Have any changes in network connectivity been observed?

• At what time scale can interesting trends be observed?

• How can multiple data sources be used jointly to complement or corroborate information about network evolution?

Goal

Principled and systematic support for usable, scalable, and extensible analysis of evolving graphs

Are Alice and Bill connected?

… by a path?

Snapshot reducibility
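One way to read snapshot reducibility is that a temporal query returns, for each time instant, what its non-temporal counterpart would return on the snapshot at that instant. The sketch below illustrates this for the "connected by a path" question. It assumes edges carry half-open validity intervals [start, end) over integer instants. The data and the names `snapshot`, `connected`, and `connected_over_time` are hypothetical and are not Portal's API.

```python
from collections import defaultdict, deque

# Each temporal edge is (u, v, start, end), valid on [start, end).
# Illustrative data only; "Cathy" is a made-up intermediate node.
EDGES = [
    ("Alice", "Cathy", 1, 4),
    ("Cathy", "Bill", 3, 6),
]

def snapshot(edges, t):
    """Undirected adjacency map of the graph as it exists at instant t."""
    adj = defaultdict(set)
    for u, v, start, end in edges:
        if start <= t < end:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def connected(adj, src, dst):
    """Plain, non-temporal reachability by breadth-first search."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def connected_over_time(edges, src, dst, instants):
    """Temporal answer: the non-temporal answer evaluated on every snapshot."""
    return {t: connected(snapshot(edges, t), src, dst) for t in instants}
```

With this data, Alice and Bill are connected by a path only at the single instant where both edges coexist.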

Are Alice and Bill connected?

extended snapshot reducibility

… by a journey?

… by a path that persists over more than 2 time instants?
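The journey question cannot be answered one snapshot at a time, because a journey may use edges that never coexist. A minimal sketch of an earliest-arrival search, under stated assumptions: edges are undirected and valid on half-open intervals [start, end), traversing an edge is instantaneous, and waiting at a node is allowed. The slides do not define journeys formally, so this definition, the data, and the function names are illustrative.

```python
import heapq

# (u, v, start, end): undirected edge valid on [start, end). Illustrative data.
EDGES = [
    ("Alice", "Cathy", 1, 3),
    ("Cathy", "Bill", 4, 6),
]

def earliest_arrival(edges, src, t0=0):
    """Earliest instant at which each node is reachable from src, starting at t0."""
    adj = {}
    for u, v, start, end in edges:
        adj.setdefault(u, []).append((v, start, end))
        adj.setdefault(v, []).append((u, start, end))
    best = {src: t0}
    heap = [(t0, src)]
    while heap:
        t, node = heapq.heappop(heap)
        if t > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, start, end in adj.get(node, []):
            depart = max(t, start)  # wait until the edge appears
            if depart < end and depart < best.get(nxt, float("inf")):
                best[nxt] = depart
                heapq.heappush(heap, (depart, nxt))
    return best

def has_journey(edges, src, dst, t0=0):
    """True if dst can be reached from src along time-respecting edges."""
    return dst in earliest_arrival(edges, src, t0)
```

In this example no snapshot contains a path between Alice and Bill, yet a journey from Alice to Bill exists. The reverse journey does not, since the Alice-Cathy edge has expired by the time Cathy can be reached from Bill.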

TGraph: an evolving property graph

TGA: Temporal Graph Algebra

• Temporal variants of standard graph operators + novel time-specific operators

• Compositional: TGraph (or a pair of TGraphs) as input - TGraph as output

• Operations maintain model integrity

- graph integrity at each time instant: no dangling edges, a node/edge appears at most once

- temporal integrity: semantics of temporal operations are automatically enforced (formally: point semantics)
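The graph-integrity conditions above can be sketched as a validity check. This is an illustration under assumptions, not Portal's implementation: each node has a list of coalesced existence intervals [start, end), each edge has one interval, and the function name `integrity_problems` is hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one instant."""
    return a[0] < b[1] and b[0] < a[1]

def contains(outer, inner):
    """True if interval inner lies entirely within interval outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def integrity_problems(nodes, edges):
    """nodes: {node_id: [(start, end), ...]}; edges: [(edge_id, u, v, (start, end))]."""
    problems = []
    # A node appears at most once per instant: its intervals must not overlap.
    for node_id, periods in nodes.items():
        ordered = sorted(periods)
        for first, second in zip(ordered, ordered[1:]):
            if overlaps(first, second):
                problems.append(("duplicate node", node_id))
    # No dangling edges: both endpoints must exist whenever the edge does.
    for edge_id, u, v, period in edges:
        for endpoint in (u, v):
            if not any(contains(p, period) for p in nodes.get(endpoint, [])):
                problems.append(("dangling edge", edge_id, endpoint))
    return problems
```

An edge valid on [1, 4) between a node valid on [1, 5) and a node valid on [2, 4) is reported as dangling, because one endpoint does not yet exist at instant 1.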

TGA operations

• trim

• temporal versions of

- vertex-map, edge-map

- subgraph, path

- aggregate messages

- union, intersection, difference - binary

• snapshot analytics

- PageRank, connected components,… - Pregel
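As an illustration of the compositional style (TGraph in, TGraph out), here is a minimal sketch of a trim-like operation that restricts a graph to a time window. It assumes one half-open interval [start, end) per node and per edge. The dictionary layout and the function name `trim` are hypothetical stand-ins and not Portal's data structures.

```python
def trim(tgraph, window):
    """Restrict every node and edge of tgraph to the window [lo, hi)."""
    lo, hi = window

    def clip(period):
        start, end = max(period[0], lo), min(period[1], hi)
        return (start, end) if start < end else None

    nodes = {}
    for node_id, period in tgraph["nodes"].items():
        clipped = clip(period)
        if clipped is not None:
            nodes[node_id] = clipped

    edges = []
    for u, v, period in tgraph["edges"]:
        clipped = clip(period)
        if clipped is not None:
            edges.append((u, v, clipped))

    # The result has the same shape as the input, so operations can be chained.
    return {"nodes": nodes, "edges": edges}
```

If every edge interval in the input lies within the intervals of its endpoints, clipping both against the same window keeps that property, so no dangling edges are introduced.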

TGA operations

• node creation

- based on temporal window: temporal zoom

- attribute-based: structural zoom

• edge creation

Structural zoom

add university nodes Drexel and CMU, and edges between students and these universities
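The example above can be sketched in a few lines. This is a simplified reading of attribute-based node creation, not the operator's formal definition: each student record carries a university attribute and a validity interval [start, end), a university node exists at every instant at which at least one of its students exists, and each student gets an edge to that university. Student names and the function names are hypothetical.

```python
def coalesce(periods):
    """Merge overlapping or adjacent half-open intervals."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def structural_zoom(students):
    """students: list of (name, university, start, end), valid on [start, end)."""
    # One new edge per student record, valid while that record is valid.
    edges = [(name, uni, start, end) for name, uni, start, end in students]
    # One new node per distinct attribute value, valid while any student has it.
    by_university = {}
    for _, uni, start, end in students:
        by_university.setdefault(uni, []).append((start, end))
    university_nodes = {uni: coalesce(p) for uni, p in by_university.items()}
    return university_nodes, edges

STUDENTS = [
    ("Alice", "Drexel", 1, 5),
    ("Bob", "Drexel", 4, 8),
    ("Cathy", "CMU", 2, 3),
    ("Dan", "CMU", 6, 9),
]
```

Here the Drexel node is valid on one merged interval, while the CMU node has a gap between its two students.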

Temporal zoom

coarsen taxi trip start-times into 10-min intervals
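A minimal sketch of the coarsening step: each trip start time is floored to the start of its 10-minute window, and trips are then counted per window. The trip tuples, the place names, and the function names are assumptions made for illustration. How Portal aggregates attributes within a window is not specified on the slide.

```python
from collections import Counter
from datetime import datetime, timedelta

def window_start(ts, minutes=10):
    """Floor a timestamp to the start of its window, counting windows from midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    width = timedelta(minutes=minutes)
    return midnight + ((ts - midnight) // width) * width

def temporal_zoom(trips, minutes=10):
    """Count trips per (pickup, dropoff, window start)."""
    return Counter(
        (pickup, dropoff, window_start(start, minutes))
        for pickup, dropoff, start in trips
    )

# Hypothetical trips: (pickup zone, dropoff zone, start time).
TRIPS = [
    ("Midtown", "JFK", datetime(2017, 10, 25, 9, 1, 30)),
    ("Midtown", "JFK", datetime(2017, 10, 25, 9, 8, 5)),
    ("Midtown", "JFK", datetime(2017, 10, 25, 9, 10, 0)),
]
```

The first two trips fall into the window starting at 09:00 and the third into the window starting at 09:10.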

System architecture

[Figure: system architecture diagram. Component labels: Portal; Interactive Shell; Query Parser; System Catalog; Portal Runtime (optimizer, operators, etc.); GraphX Data Structures; SparkSQL; Spark Runtime; two Workers, each with a Spark Runtime and HDFS]

Spark 2.0, interoperable with SparkSQL and with BigDatalog

Physical data representation

• On-disk: Apache Parquet

- vertex / edge files

- broken down into snapshot groups

- each file sorted on start time, then on node/edge id

• In-memory:

- nested relational (Vertex-Edge RDDs)

- GraphX-based: RepresentativeGraphs (RG), One Graph (OG), HybridGraph (HG)
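The figure on this slide labels graph elements with BitSets of periods. One plausible reading, which the slide text does not confirm, is that a single graph stores each element once together with the set of periods in which it exists. The sketch below illustrates that idea with Python integers as bit sets. The element names and helper functions are hypothetical, and this is not Portal's GraphX-based implementation.

```python
def to_bitset(periods):
    """Encode a collection of 1-based period numbers as an integer bit set."""
    bits = 0
    for p in periods:
        bits |= 1 << (p - 1)
    return bits

def present_in(bits, period):
    """True if the element exists in the given 1-based period."""
    return bool((bits >> (period - 1)) & 1)

def snapshot(elements, period):
    """Names of the elements that exist in the given period."""
    return sorted(name for name, bits in elements.items() if present_in(bits, period))

def shared_periods(a, b):
    """Periods in which both bit sets are present (bitwise intersection)."""
    bits = a & b
    return [p for p in range(1, bits.bit_length() + 1) if present_in(bits, p)]

# Hypothetical edges with the periods in which each one exists.
EDGES = {
    "e1": to_bitset([1, 2, 3, 4]),
    "e2": to_bitset([2, 3]),
}
```

Extracting a snapshot is then a membership test per element, and intersecting the lifetimes of two elements is a single bitwise AND.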

[Figure: example graph elements labeled with BitSets of periods: BitSet(p1,p2,p3,p4), BitSet(p2,p3,p4,p5), BitSet(p5), BitSet(p1,p2,p3,p4,p5), BitSet(p2,p3)]

Performance highlights

• 16-node OpenStack cluster

• Apache Spark 2.0

• 4 cores and 16 GB RAM per node

PageRank on wiki-talk

PageRank on nGrams

PageRank on Twitter

Aggregate messages on wiki-talk

Vertex-subgraph on wiki-talk

Portal vs. G*

average node degree, wiki-talk

Take-aways

• TGraph: a logical model of property graphs with time

• TGA: a compositional temporal graph algebra under point semantics

• Portal: a library on top of Apache Spark, interoperable with SparkSQL

• Ongoing work on a declarative language, multi-operator query optimization, benchmarking

• Planned open source release this Fall

References

• Temporal Graph Algebra, Moffitt & Stoyanovich, DBPL 2017.

• Zooming in on NYC taxi data with Portal, Stoyanovich, Gilbride and Moffitt, DSSG 2017 (arXiv).

• Towards sequenced semantics for evolving graphs, Moffitt & Stoyanovich, EDBT 2017.

• Towards a distributed infrastructure for evolving graph analytics, Moffitt & Stoyanovich, TempWeb 2016.

• Vera Moffitt’s Ph.D. thesis.

Thank you!
