11:40 adopt: suvee - fluxgraph

FluxGraph: A time-machine for your graphs
Davy Suvee, Michel Van Speybroeck
Janssen Pharmaceutica

Upload: graphconnect

Post on 05-Dec-2014


DESCRIPTION

A retrospective cohort study is a medical research study in which the patient records of a group of similar individuals are compared for a particular outcome. For instance, a study can try to assess the impact of smoking behavior on the risk of lung cancer in a group of 40-year-old construction workers who have also been exposed to asbestos. As retrospective cohort studies are historical in nature, researchers require accurate representations of patient records over time in order to correctly assess the importance of particular time-dependent patient characteristics.

During this presentation, we will show how a state-of-the-art Graph Database such as Neo4j can be extended with a set of temporal primitives that help researchers gather the required insights from a set of longitudinal medical records. Graph Databases are an ideal platform to model and store the multi-dimensional data points of the individual patient records and the cohorts to which they belong. By introducing a temporal notion within Graph Databases, physicians are given the power to query beyond time boundaries and get historical access to individual patient characteristics or combinations thereof. Patterns for individual patients can be compared and evaluated against the patterns for the cohort.

In order to validate our proposed approach, we have implemented FluxGraph, a proof-of-concept Temporal Graph Database. As it is Blueprints-compatible, it should be straightforward to integrate the proposed API changes within mature Graph Database products such as Neo4j. The explicit notion of time, combined with the flexible modelling offered by Graph Databases, provides users with an expressive and powerful data store and analysis platform that is difficult or even impossible to implement with traditional relational database technologies.
Davy Suvée (IT Lead - Software Architect at Johnson & Johnson)
Davy Suvée is currently working as an IT Lead/Software Architect in the Research and Development IT division of Janssen Pharmaceutica (Johnson & Johnson). Required to work with big and unstructured scientific data sets, Davy has gathered hands-on expertise and insight into best practices for Big Data and NoSQL. He is also the founder of Datablend and frequently blogs about the practical application of various NoSQL technologies.

TRANSCRIPT

Page 1: 11:40 Adopt: Suvee - Fluxgraph

FluxGraph: A time-machine for your graphs

Davy Suvee, Michel Van Speybroeck

Janssen Pharmaceutica

Page 2: 11:40 Adopt: Suvee - Fluxgraph

about me

➡ working as an it lead / software architect @ janssen pharmaceutica

• dealing with big scientific data sets

• hands-on expertise in big data and NoSQL technologies

who am i ...

Davy Suvee (@DSUVEE)

➡ founder of datablend

• provide big data and NoSQL consultancy

• share practical knowledge and big data use cases via blog

Page 3: 11:40 Adopt: Suvee - Fluxgraph
Page 4: 11:40 Adopt: Suvee - Fluxgraph

graphs and time ...

➡ graphs are continuously changing ...

Page 5: 11:40 Adopt: Suvee - Fluxgraph

graphs and time ...

➡ graphs are continuously changing ...

➡ graphs and time ...

★ neo-versioning by david montag [1]

★ representing time dependent graphs in neo4j by the isi foundation [2]

★ modeling a multilevel index in neo4j by peter neubauer [3]

1. http://github.com/dmontag/neo4j-versioning
2. http://github.com/ccattuto/neo4j-dynagraph/wiki
3. http://blog.neo4j.org/2012/02/modeling-multilevel-index-in-neoj4.html

Page 6: 11:40 Adopt: Suvee - Fluxgraph

graphs and time ...

➡ graphs are continuously changing ...

➡ graphs and time ...

★ neo-versioning by david montag [1]

★ representing time dependent graphs in neo4j by the isi foundation [2]

★ modeling a multilevel index in neo4j by peter neubauer [3]

copy and relink semantics

๏ graph size

๏ object identity

๏ mixing data-model and time-model

1. http://github.com/dmontag/neo4j-versioning
2. http://github.com/ccattuto/neo4j-dynagraph/wiki
3. http://blog.neo4j.org/2012/02/modeling-multilevel-index-in-neoj4.html

Page 7: 11:40 Adopt: Suvee - Fluxgraph

FluxGraph ...

➡ towards a time-aware graph ...

Page 8: 11:40 Adopt: Suvee - Fluxgraph

FluxGraph ...

➡ implement a blueprints-compatible graph on top of Datomic

➡ towards a time-aware graph ...

Page 9: 11:40 Adopt: Suvee - Fluxgraph

FluxGraph ...

➡ implement a blueprints-compatible graph on top of Datomic

➡ make FluxGraph fully time-aware

★ travel your graph through time

★ time-scoped iteration of vertices and edges

★ temporal graph comparison

➡ towards a time-aware graph ...

Page 10: 11:40 Adopt: Suvee - Fluxgraph

travel through time

FluxGraph fg = new FluxGraph();

Page 11: 11:40 Adopt: Suvee - Fluxgraph

travel through time

FluxGraph fg = new FluxGraph();

Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");

[Diagram: a single vertex, Davy]

Page 12: 11:40 Adopt: Suvee - Fluxgraph

travel through time

FluxGraph fg = new FluxGraph();

Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");

Vertex peter = ...

[Diagram: two unconnected vertices, Davy and Peter]

Page 13: 11:40 Adopt: Suvee - Fluxgraph

travel through time

FluxGraph fg = new FluxGraph();

Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");

Vertex peter = ...
Vertex michael = ...

[Diagram: three unconnected vertices, Davy, Peter and Michael]

Page 14: 11:40 Adopt: Suvee - Fluxgraph

travel through time

FluxGraph fg = new FluxGraph();

Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");

Vertex peter = ...
Vertex michael = ...

Edge e1 = fg.addEdge(davy, peter, "knows");

[Diagram: Davy -knows-> Peter; Michael unconnected]

Page 15: 11:40 Adopt: Suvee - Fluxgraph

travel through time

Date checkpoint = new Date();

[Diagram: Davy -knows-> Peter; Michael unconnected]

Page 16: 11:40 Adopt: Suvee - Fluxgraph

travel through time

Date checkpoint = new Date();

davy.setProperty("name", "David");

[Diagram: David -knows-> Peter; Michael unconnected]

Page 17: 11:40 Adopt: Suvee - Fluxgraph

travel through time

Date checkpoint = new Date();

davy.setProperty("name", "David");

Edge e2 = fg.addEdge(davy, michael, "knows");

[Diagram: David -knows-> Peter and David -knows-> Michael]

Page 18: 11:40 Adopt: Suvee - Fluxgraph

travel through time

[Diagram: time axis; the graph starts as Davy -knows-> Peter, with Michael unconnected]

Page 19: 11:40 Adopt: Suvee - Fluxgraph

travel through time

[Diagram: time axis with the checkpoint marked; at the checkpoint the graph is Davy -knows-> Peter, with Michael unconnected]

Page 20: 11:40 Adopt: Suvee - Fluxgraph

travel through time

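[Diagram: time axis with two graph states. At the checkpoint: Davy -knows-> Peter, with Michael unconnected. After the checkpoint: David -knows-> Peter and David -knows-> Michael]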

Page 21: 11:40 Adopt: Suvee - Fluxgraph

travel through time

[Diagram: time axis with two graph states. At the checkpoint: Davy -knows-> Peter, with Michael unconnected. At the current time: David -knows-> Peter and David -knows-> Michael]

Page 22: 11:40 Adopt: Suvee - Fluxgraph

travel through time

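[Diagram: time axis with two graph states. At the checkpoint: Davy -knows-> Peter, with Michael unconnected. At the current time: David -knows-> Peter and David -knows-> Michael]

by default, the graph is read at the current time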

Page 23: 11:40 Adopt: Suvee - Fluxgraph

travel through time

[Diagram: time axis with two graph states. At the checkpoint: Davy -knows-> Peter, with Michael unconnected. At the current time: David -knows-> Peter and David -knows-> Michael]

fg.setCheckpointTime(checkpoint);
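
The checkpoint idea on these slides can be pictured with a small self-contained sketch. It is not FluxGraph code: `VersionedProperty`, `valueAt` and `current` are hypothetical names, the dates are arbitrary, and the only assumption is that every write is kept together with its timestamp, so that a read at the checkpoint returns the value that was visible then.

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

class TimeTravelDemo {
    public static void main(String[] args) {
        VersionedProperty name = new VersionedProperty();
        // Arbitrary example instants, chosen only for illustration.
        Instant created = Instant.parse("2012-01-01T00:00:00Z");
        Instant checkpoint = Instant.parse("2012-06-01T00:00:00Z");
        Instant renamed = Instant.parse("2012-09-01T00:00:00Z");

        name.set(created, "Davy");
        name.set(renamed, "David");

        System.out.println("current:    " + name.current());
        System.out.println("checkpoint: " + name.valueAt(checkpoint));
    }
}

// Hypothetical stand-in for a time-aware property; not the FluxGraph implementation.
class VersionedProperty {
    // Every write is kept, keyed by the instant at which it was made.
    private final TreeMap<Instant, String> history = new TreeMap<>();

    void set(Instant when, String value) {
        history.put(when, value);
    }

    // Value visible at the given instant, or null if nothing had been written yet.
    String valueAt(Instant when) {
        Map.Entry<Instant, String> entry = history.floorEntry(when);
        return entry == null ? null : entry.getValue();
    }

    // Most recent value, which is what a reader sees when no checkpoint is set.
    String current() {
        return history.isEmpty() ? null : history.lastEntry().getValue();
    }
}
```

Reading "by default" at the current time and optionally at a checkpoint mirrors the two modes shown on the slides; how FluxGraph stores its history on top of Datomic is not covered by this sketch.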

Page 24: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

[Diagram: vertex version Davy at t1]

Page 25: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

[Diagram: a change at t2 turns Davy (t1) into Davy’]

Page 26: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

[Diagram: successive changes produce the versions Davy (t1), Davy’ (t2) and Davy’’ (t3)]

Page 27: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

[Diagram: successive changes produce the versions Davy (t1), Davy’ (t2), Davy’’ (t3) and Davy’’’ (tcurrent)]

Page 28: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

[Diagram: successive changes produce the versions Davy (t1), Davy’ (t2), Davy’’ (t3) and Davy’’’ (tcurrent)]

➡ how to find the version of the vertex you are interested in?

Page 29: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

[Diagram: the versions Davy (t1), Davy’ (t2), Davy’’ (t3) and Davy’’’ (tcurrent)]

Page 30: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

[Diagram: the versions Davy (t1), Davy’ (t2), Davy’’ (t3) and Davy’’’ (tcurrent), linked by next and previous pointers]

Page 31: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

[Diagram: the versions Davy (t1), Davy’ (t2), Davy’’ (t3) and Davy’’’ (tcurrent), linked by next and previous pointers]

Vertex previousDavy = davy.getPreviousVersion();

Page 32: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

[Diagram: the versions Davy (t1), Davy’ (t2), Davy’’ (t3) and Davy’’’ (tcurrent), linked by next and previous pointers]

Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();

Page 33: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

[Diagram: the versions Davy (t1), Davy’ (t2), Davy’’ (t3) and Davy’’’ (tcurrent), linked by next and previous pointers]

Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);

Page 34: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

[Diagram: the versions Davy (t1), Davy’ (t2), Davy’’ (t3) and Davy’’’ (tcurrent), linked by next and previous pointers]

Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
Interval valid = davy.getTimerInterval();
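
The version navigation shown above can be sketched as a doubly linked chain of versions. This is an illustrative model only, not the FluxGraph implementation: `VersionedVertex`, `change` and `validUntil` are hypothetical names, time is reduced to plain integers, and the validity interval is assumed to run from a version's own timestamp up to the timestamp of the next version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class VersionChainDemo {
    public static void main(String[] args) {
        VersionedVertex first = new VersionedVertex("Davy", 1);
        VersionedVertex latest = first.change("Davy'", 2).change("Davy''", 3).change("Davy'''", 4);

        System.out.println("previous of latest: " + latest.previousVersion().label);
        for (VersionedVertex v : first.nextVersions()) {
            System.out.println(v.label + " valid from t" + v.validFrom);
        }
    }
}

// Hypothetical model of one version of a vertex; names only loosely mirror the slides.
class VersionedVertex {
    final String label;
    final long validFrom;              // start of the validity interval (inclusive)
    private VersionedVertex previous;  // older version, or null for the first one
    private VersionedVertex next;      // newer version, or null for the current one

    VersionedVertex(String label, long validFrom) {
        this.label = label;
        this.validFrom = validFrom;
    }

    // Records a change: creates a newer version and links it to this one.
    VersionedVertex change(String newLabel, long at) {
        VersionedVertex newer = new VersionedVertex(newLabel, at);
        newer.previous = this;
        this.next = newer;
        return newer;
    }

    VersionedVertex previousVersion() {
        return previous;
    }

    // All newer versions, oldest first.
    List<VersionedVertex> nextVersions() {
        List<VersionedVertex> result = new ArrayList<>();
        for (VersionedVertex v = next; v != null; v = v.next) {
            result.add(v);
        }
        return result;
    }

    // Older versions accepted by the filter, most recent first.
    List<VersionedVertex> previousVersions(Predicate<VersionedVertex> filter) {
        List<VersionedVertex> result = new ArrayList<>();
        for (VersionedVertex v = previous; v != null; v = v.previous) {
            if (filter.test(v)) {
                result.add(v);
            }
        }
        return result;
    }

    // End of the validity interval (exclusive); Long.MAX_VALUE for the current version.
    long validUntil() {
        return next == null ? Long.MAX_VALUE : next.validFrom;
    }
}
```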

Page 35: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

➡ When does an element change?

Page 36: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

➡ When does an element change?

➡ vertex:

★ setting or removing a property

★ being added to or removed from an edge

★ being removed

Page 37: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

➡ When does an element change?

➡ vertex:

★ setting or removing a property

★ being added to or removed from an edge

★ being removed

➡ edge:

★ setting or removing a property

★ being removed

Page 38: 11:40 Adopt: Suvee - Fluxgraph

time-scoped iteration

➡ When does an element change?

➡ vertex:

★ setting or removing a property

★ being added to or removed from an edge

★ being removed

➡ edge:

★ setting or removing a property

★ being removed

➡ ... and each element is time-scoped!

Page 39: 11:40 Adopt: Suvee - Fluxgraph

temporal graph comparison

[Diagram: the current graph (David -knows-> Peter and David -knows-> Michael) next to the checkpoint graph (Davy -knows-> Peter, with Michael unconnected)]

what changed?

Page 40: 11:40 Adopt: Suvee - Fluxgraph

temporal graph comparison

➡ difference (A , B) = union (A , B) - B

➡ ... as an (immutable) graph!

Page 41: 11:40 Adopt: Suvee - Fluxgraph

temporal graph comparison

➡ difference (A , B) = union (A , B) - B

➡ ... as an (immutable) graph!

Page 42: 11:40 Adopt: Suvee - Fluxgraph

temporal graph comparison

➡ difference (A , B) = union (A , B) - B

➡ ... as an (immutable) graph!

difference (current, checkpoint) =

[Diagram: the vertex David with its new knows edge]
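
The formula difference (A, B) = union (A, B) - B can be tried out on plain sets. The sketch below is an assumption-laden simplification, not how FluxGraph computes differences: a graph state is flattened into a set of strings ("facts"), the vertex ids v1 to v3 and the `GraphFacts` helper are hypothetical, and immutability is modelled with an unmodifiable set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

class GraphDifferenceDemo {
    public static void main(String[] args) {
        // Graph at the checkpoint: Davy knows Peter, Michael is unconnected.
        Set<String> checkpoint = GraphFacts.of(
                "v1 name=Davy", "v2 name=Peter", "v3 name=Michael", "v1 -knows-> v2");
        // Current graph: v1 was renamed to David and now also knows Michael.
        Set<String> current = GraphFacts.of(
                "v1 name=David", "v2 name=Peter", "v3 name=Michael",
                "v1 -knows-> v2", "v1 -knows-> v3");

        for (String fact : GraphFacts.difference(current, checkpoint)) {
            System.out.println(fact);
        }
    }
}

// Hypothetical helper: a graph state flattened into a sorted set of facts.
class GraphFacts {
    static Set<String> of(String... facts) {
        return new TreeSet<>(Arrays.asList(facts));
    }

    // difference(A, B) = union(A, B) - B, returned as a read-only set.
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);     // union(A, B)
        result.removeAll(b);  // ... minus B
        return Collections.unmodifiableSet(result);
    }
}
```

With the current graph as A and the checkpoint graph as B, only the renamed vertex and the new knows edge remain, which matches the result pictured on the slide.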

Page 43: 11:40 Adopt: Suvee - Fluxgraph

FluxGraph ...

➡ available on github

http://github.com/datablend/fluxgraph

Page 44: 11:40 Adopt: Suvee - Fluxgraph

use case: longitudinal patient data

[Diagram: one patient over time. t1: patient. t2: patient, smoking. t3: patient, smoking. t4: patient, cancer. t5: patient, cancer, death]

Page 45: 11:40 Adopt: Suvee - Fluxgraph

use case: longitudinal patient data

➡ historical data for 15,000 patients over a period of 10 years (2001-2010)

Page 46: 11:40 Adopt: Suvee - Fluxgraph

use case: longitudinal patient data

➡ historical data for 15,000 patients over a period of 10 years (2001-2010)

➡ example analysis:

★ if a male patient is no longer smoking in 2005

★ what are the chances of getting lung cancer in 2010, comparing

• patients that smoked before 2005

• patients that never smoked

Page 47: 11:40 Adopt: Suvee - Fluxgraph

use case: longitudinal patient data

➡ get all male non-smokers in 2005

fg.setCheckpointTime(new DateTime(2005,12,31).toDate());

Page 48: 11:40 Adopt: Suvee - Fluxgraph

use case: longitudinal patient data

➡ get all male non-smokers in 2005

fg.setCheckpointTime(new DateTime(2005,12,31).toDate());

Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();

Page 49: 11:40 Adopt: Suvee - Fluxgraph

use case: longitudinal patient data

➡ get all male non-smokers in 2005

fg.setCheckpointTime(new DateTime(2005,12,31).toDate());

Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();

while (males.hasNext()) {
    Vertex p2005 = males.next();
    // a patient counts as smoking in 2005 if a smokingStatus edge exists at the checkpoint
    boolean smoking2005 = p2005.getEdges(OUT, "smokingStatus").iterator().hasNext();
}

Page 50: 11:40 Adopt: Suvee - Fluxgraph

use case: longitudinal patient data

➡ which patients were smoking before 2005?

boolean smokingBefore2005 = ((FluxVertex) p2005).getPreviousVersions(new TimeAwareFilter() {
    public TimeAwareElement filter(TimeAwareVertex element) {
        // keep only the earlier versions that have a smokingStatus edge
        return element.getEdges(OUT, "smokingStatus").iterator().hasNext() ? element : null;
    }
}).iterator().hasNext();

Page 51: 11:40 Adopt: Suvee - Fluxgraph

use case: longitudinal patient data

➡ which patients have cancer in 2010?

Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate());

(smokerws: working set of smokers)

Page 52: 11:40 Adopt: Suvee - Fluxgraph

use case: longitudinal patient data

➡ which patients have cancer in 2010?

Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate());

(smokerws: working set of smokers)

➡ extract the patients that have an edge to the cancer node
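
The analysis walked through on the last few slides can be mimicked end to end without a graph database. The sketch below is only an illustration under several assumptions: the patient records are made up and are not study data, gender is ignored, `PatientHistory` and `Cohort` are hypothetical names, and smoking status is modelled as a yearly flag rather than as a smokingStatus edge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CohortDemo {
    public static void main(String[] args) {
        // Made-up records for illustration only; these are not study data.
        List<PatientHistory> patients = new ArrayList<>();
        patients.add(new PatientHistory("p1").smoking(2001, true).smoking(2004, false).cancer(2009));
        patients.add(new PatientHistory("p2").smoking(2002, true).smoking(2003, false));
        patients.add(new PatientHistory("p3").smoking(2001, false));
        patients.add(new PatientHistory("p4").smoking(2001, true)); // still smoking in 2005

        List<PatientHistory> exSmokers = Cohort.exSmokers(patients, 2005);
        List<PatientHistory> neverSmokers = Cohort.neverSmokers(patients, 2005);
        System.out.println("ex-smokers, cancer by 2010:    " + Cohort.cancerRate(exSmokers, 2010));
        System.out.println("never-smokers, cancer by 2010: " + Cohort.cancerRate(neverSmokers, 2010));
    }
}

// Hypothetical longitudinal record: smoking status per year plus an optional diagnosis year.
class PatientHistory {
    final String id;
    private final TreeMap<Integer, Boolean> smokingByYear = new TreeMap<>();
    private Integer cancerYear; // null means no diagnosis recorded

    PatientHistory(String id) { this.id = id; }

    PatientHistory smoking(int year, boolean status) { smokingByYear.put(year, status); return this; }

    PatientHistory cancer(int year) { cancerYear = year; return this; }

    // Status in effect in the given year: the latest entry at or before it.
    boolean smokingIn(int year) {
        Map.Entry<Integer, Boolean> entry = smokingByYear.floorEntry(year);
        return entry != null && entry.getValue();
    }

    // True if any entry strictly before the given year records smoking.
    boolean smokedBefore(int year) {
        return smokingByYear.headMap(year).containsValue(Boolean.TRUE);
    }

    boolean hasCancerBy(int year) { return cancerYear != null && cancerYear <= year; }
}

class Cohort {
    // Not smoking in the given year, but smoked at some point before it.
    static List<PatientHistory> exSmokers(List<PatientHistory> all, int year) {
        List<PatientHistory> result = new ArrayList<>();
        for (PatientHistory p : all) {
            if (!p.smokingIn(year) && p.smokedBefore(year)) result.add(p);
        }
        return result;
    }

    // Not smoking in the given year and no smoking recorded before it.
    static List<PatientHistory> neverSmokers(List<PatientHistory> all, int year) {
        List<PatientHistory> result = new ArrayList<>();
        for (PatientHistory p : all) {
            if (!p.smokingIn(year) && !p.smokedBefore(year)) result.add(p);
        }
        return result;
    }

    // Fraction of the group with a diagnosis by the given year; 0.0 for an empty group.
    static double cancerRate(List<PatientHistory> group, int year) {
        if (group.isEmpty()) return 0.0;
        int cases = 0;
        for (PatientHistory p : group) {
            if (p.hasCancerBy(year)) cases++;
        }
        return (double) cases / group.size();
    }
}
```

In the slides the same questions are answered with a checkpoint at the end of 2005, a filter over previous versions, and a graph difference between 2010 and 2005; the sketch only shows the shape of the comparison between the two groups.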

Page 53: 11:40 Adopt: Suvee - Fluxgraph
Page 54: 11:40 Adopt: Suvee - Fluxgraph

gephi plugin for fluxgraph

[Screenshot: the graph in 2010]

Page 55: 11:40 Adopt: Suvee - Fluxgraph

gephi plugin for fluxgraph

[Screenshot: the graph in 2001]

Page 56: 11:40 Adopt: Suvee - Fluxgraph

gephi plugin for blueprints!

➡ available on github

http://github.com/datablend/gephi-blueprints-plugin

➡ Support for neo4j, orientdb, dex, rexster, ...

Kudos to Timmy Storms (@timmystorms)

Page 57: 11:40 Adopt: Suvee - Fluxgraph

Questions?