11:40 adopt: suvee - fluxgraph
DESCRIPTION
A retrospective cohort study is a medical research study in which the patient records of a group of similar individuals are compared for a particular outcome. For instance, a study can assess the impact of smoking on the risk of lung cancer in a group of 40-year-old construction workers who have also been exposed to asbestos. Because retrospective cohort studies are historical in nature, researchers need accurate representations of patient records over time in order to correctly assess the importance of particular time-dependent patient characteristics. During this presentation, we will show how a state-of-the-art Graph Database such as Neo4j can be extended with a set of temporal primitives that help researchers gather the required insights from a set of longitudinal medical records. Graph Databases are the ideal platform to model and store the multi-dimensional data points of the individual patient records and the cohorts to which they belong. By introducing a temporal notion within Graph Databases, physicians are given the power to query beyond time boundaries and get historical access to individual patient characteristics or combinations thereof. Patterns for individual patients can be compared and evaluated against the patterns for the cohort. To validate our proposed approach, we have implemented FluxGraph, a proof-of-concept Temporal Graph Database. Because FluxGraph is Blueprints-compatible, it should be straightforward to integrate the proposed API changes within mature Graph Database products such as Neo4j. The explicit notion of time, combined with the flexible modelling offered by Graph Databases, provides users with an expressive and powerful data store and analysis platform that is difficult or even impossible to implement with traditional relational database technologies.
Davy Suvée (IT Lead - Software Architect at Johnson & Johnson)
Davy Suvée is currently working as an IT Lead/Software Architect in the Research and Development IT division of Janssen Pharmaceutica (Johnson & Johnson). Required to work with big and unstructured scientific data sets, Davy gathered hands-on expertise and insights in the best practices for Big Data and NoSQL. He is also the founder of Datablend and frequently blogs about the practical application of various NoSQL technologies.
TRANSCRIPT
FluxGraph: A time-machine for your graphs
Davy Suvee, Michel Van Speybroeck
Janssen Pharmaceutica
about me
who am i ...
Davy Suvee (@DSUVEE)
➡ working as an IT lead / software architect @ janssen pharmaceutica
• dealing with big scientific data sets
• hands-on expertise in big data and NoSQL technologies
➡ founder of datablend
• provide big data and NoSQL consultancy
• share practical knowledge and big data use cases via blog
graphs and time ...
➡ graphs are continuously changing ...
➡ graphs and time ...
★ neo-versioning by david montag 1
★ representing time dependent graphs in neo4j by the isi foundation 2
★ modeling a multilevel index in neo4j by peter neubauer 3
➡ copy and relink semantics
๏ graph size
๏ object identity
๏ mixing data-model and time-model
1. http://github.com/dmontag/neo4j-versioning
2. http://github.com/ccattuto/neo4j-dynagraph/wiki
3. http://blog.neo4j.org/2012/02/modeling-multilevel-index-in-neoj4.html
FluxGraph ...
➡ towards a time-aware graph ...
➡ implement a blueprints-compatible graph on top of Datomic
➡ make FluxGraph fully time-aware
★ travel your graph through time
★ time-scoped iteration of vertices and edges
★ temporal graph comparison
travel through time
FluxGraph fg = new FluxGraph();
Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");
Vertex peter = ...
Vertex michael = ...
Edge e1 = fg.addEdge(davy, peter, "knows");
[figure: vertices Davy, Peter and Michael, with a "knows" edge from Davy to Peter]
travel through time
Date checkpoint = new Date();
davy.setProperty("name", "David");
Edge e2 = fg.addEdge(davy, michael, "knows");
[figure: Davy is now named David, with "knows" edges from David to Peter and from David to Michael]
travel through time
[figure: timeline with the graph at "checkpoint" (Davy knows Peter, Michael unconnected) and at "current time" (David knows Peter and Michael)]
➡ current time: by default
fg.setCheckpointTime(checkpoint);
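The checkpoint idea can be illustrated without FluxGraph or Datomic. The sketch below is not FluxGraph code: `TimeTravelSketch`, `set` and `get` are hypothetical names, and it tracks a single property rather than a whole graph. It only assumes that every write is kept together with the time it became valid, so that a read can be answered either at the current time (the default) or at a checkpoint.

```java
import java.util.Date;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for one time-aware property; not part of FluxGraph. */
final class TimeTravelSketch {
    // Every write is kept, keyed by the time (epoch millis) at which it became valid.
    private final TreeMap<Long, String> history = new TreeMap<>();
    // null means "read at the current time", which is the default.
    private Date checkpoint = null;

    void set(Date when, String value) {
        history.put(when.getTime(), value);
    }

    void setCheckpointTime(Date when) {
        checkpoint = when;
    }

    /** Returns the value valid at the checkpoint, or the latest value by default. */
    String get() {
        Map.Entry<Long, String> entry = (checkpoint == null)
                ? history.lastEntry()
                : history.floorEntry(checkpoint.getTime());
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        TimeTravelSketch name = new TimeTravelSketch();
        name.set(new Date(1_000L), "Davy");
        Date checkpoint = new Date(2_000L);
        name.set(new Date(3_000L), "David");

        System.out.println(name.get());   // prints "David" (current time)
        name.setCheckpointTime(checkpoint);
        System.out.println(name.get());   // prints "Davy" (as of the checkpoint)
    }
}
```

Keeping the history sorted by time makes a checkpoint read a single `floorEntry` lookup; how FluxGraph resolves a checkpoint on top of Datomic is not shown in the slides.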
time-scoped iteration
[figure: timeline t1, t2, t3, tcurrent; each change produces a new version of the vertex: Davy, Davy', Davy'', Davy''']
➡ how to find the version of the vertex you are interested in?
time-scoped iteration
[figure: versions Davy, Davy', Davy'', Davy''' on the timeline t1, t2, t3, tcurrent, linked by "next" and "previous" arrows]
Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
Interval valid = davy.getTimerInterval();
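To make the version chain concrete, here is a small self-contained sketch. It is an illustration under assumptions, not the FluxGraph implementation: `VersionChainSketch`, `Version`, `change`, `at` and `previousVersions` are hypothetical names, and time is a plain `long`. Each change closes the validity interval of the latest version and appends a new one, which is what makes "previous" and "next" navigation possible.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical version chain for a single vertex; names are illustrative only. */
final class VersionChainSketch {
    /** One immutable version, valid from start (inclusive) to end (exclusive). */
    static final class Version {
        final String name;
        final long start;
        final long end;

        Version(String name, long start, long end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    private final List<Version> versions = new ArrayList<>();

    /** Appends a new version at the given time and closes the interval of the previous one. */
    void change(String name, long time) {
        if (!versions.isEmpty()) {
            Version last = versions.remove(versions.size() - 1);
            versions.add(new Version(last.name, last.start, time));
        }
        versions.add(new Version(name, time, Long.MAX_VALUE));
    }

    /** Returns the version valid at the given time, or null if none existed yet. */
    Version at(long time) {
        for (Version v : versions) {
            if (v.start <= time && time < v.end) {
                return v;
            }
        }
        return null;
    }

    /** Returns all versions older than v, newest first. */
    List<Version> previousVersions(Version v) {
        List<Version> result = new ArrayList<>();
        for (int i = versions.indexOf(v) - 1; i >= 0; i--) {
            result.add(versions.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        VersionChainSketch davy = new VersionChainSketch();
        davy.change("Davy", 1L);
        davy.change("Davy'", 2L);
        davy.change("Davy''", 3L);

        Version current = davy.at(3L);
        for (Version previous : davy.previousVersions(current)) {
            System.out.println(previous.name + " valid in [" + previous.start + ", " + previous.end + ")");
        }
    }
}
```

A filter such as the one passed to `getPreviousVersions(filter)` on the slide would simply be applied to the list returned by `previousVersions` in this sketch.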
time-scoped iteration
➡ When does an element change?
➡ vertex:
★ setting or removing a property
★ being added to or removed from an edge
★ being removed
➡ edge:
★ setting or removing a property
★ being removed
➡ ... and each element is time-scoped!
temporal graph comparison
[figure: the graph at "current" (David knows Peter and Michael) next to the graph at "checkpoint" (Davy knows Peter)]
➡ what changed?
temporal graph comparison
➡ difference (A , B) = union (A , B) - B
➡ ... as an (immutable) graph!
[figure: difference (current graph, checkpoint graph) = a graph holding the vertex David and one "knows" edge]
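The formula difference (A , B) = union (A , B) - B can be tried out on plain sets. The sketch below is an assumption-laden illustration: it flattens each graph into a set of strings ("facts" such as a property value or an edge), and the names `GraphDifferenceSketch` and `difference` are hypothetical. FluxGraph itself returns the difference as a graph, which this sketch does not attempt.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical set-based illustration of difference(A, B) = union(A, B) - B. */
final class GraphDifferenceSketch {
    /** Returns the facts that are in A but not in B, computed as union(A, B) minus B. */
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.addAll(b);     // union(A, B)
        result.removeAll(b);  // ... minus B
        return result;
    }

    public static void main(String[] args) {
        // Graph at the checkpoint: Davy knows Peter, Michael is unconnected.
        Set<String> checkpoint = new LinkedHashSet<>(Arrays.asList(
                "v1.name=Davy", "v2.name=Peter", "v3.name=Michael", "v1-knows->v2"));
        // Graph at the current time: Davy was renamed and also knows Michael.
        Set<String> current = new LinkedHashSet<>(Arrays.asList(
                "v1.name=David", "v2.name=Peter", "v3.name=Michael",
                "v1-knows->v2", "v1-knows->v3"));

        // prints [v1.name=David, v1-knows->v3]
        System.out.println(difference(current, checkpoint));
    }
}
```

The result contains only what is new or changed at the current time relative to the checkpoint, which matches the slide's example of the renamed vertex and the added "knows" edge.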
FluxGraph ...
http://github.com/datablend/fluxgraph
➡ available on github
use case: longitudinal patient data
[figure: one patient vertex followed over times t1 to t5; it gains an edge to "smoking", later an edge to "cancer", and finally an edge to "death"]
use case: longitudinal patient data
➡ historical data for 15,000 patients over a period of 10 years (2001-2010)
➡ example analysis:
★ if a male patient is no longer smoking in 2005
★ what are the chances of getting lung cancer in 2010, comparing
• patients that smoked before 2005
• patients that never smoked
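use case: longitudinal patient data
➡ get all male non-smokers in 2005
// checkpoint at 31 Dec 2005, 00:00 (assuming DateTime is Joda-Time's, which needs hour and minute arguments)
fg.setCheckpointTime(new DateTime(2005, 12, 31, 0, 0).toDate());
Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();
while (males.hasNext()) {
    Vertex p2005 = males.next();
    // a patient counts as smoking in 2005 if a "smokingStatus" edge exists at the checkpoint
    boolean smoking2005 = p2005.getEdges(OUT, "smokingStatus").iterator().hasNext();
}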
use case: longitudinal patient data
➡ which patients were smoking before 2005?
boolean smokingBefore2005 = ((FluxVertex) p2005).getPreviousVersions(new TimeAwareFilter() {
    public TimeAwareElement filter(TimeAwareVertex element) {
        return element.getEdges(OUT, "smokingStatus").iterator().hasNext() ? element : null;
    }
}).iterator().hasNext();
use case: longitudinal patient data
➡ which patients have cancer in 2010?
Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate());
(smokerws: working set of smokers)
➡ extract the patients that have an edge to the cancer node
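The final step of the example analysis, comparing the two cohorts, can be sketched independently of the graph queries. Everything in the block below is hypothetical: the `Patient` class, its two fields, and the handful of records in `main` are made-up toy values for illustration, not data from the study. It assumes each patient has already been reduced to the last year in which smoking was recorded and a flag for cancer in 2010.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical cohort comparison on toy records; not derived from the study data. */
final class CohortSketch {
    static final class Patient {
        final int lastSmokingYear;   // 0 means the patient never smoked
        final boolean cancerIn2010;

        Patient(int lastSmokingYear, boolean cancerIn2010) {
            this.lastSmokingYear = lastSmokingYear;
            this.cancerIn2010 = cancerIn2010;
        }
    }

    /** Smoked at some point, but no longer in 2005. */
    static boolean exSmokerIn2005(Patient p) {
        return p.lastSmokingYear > 0 && p.lastSmokingYear < 2005;
    }

    static boolean neverSmoked(Patient p) {
        return p.lastSmokingYear == 0;
    }

    /** Fraction of the cohort's members with cancer in 2010 (0.0 for an empty cohort). */
    static double cancerRate(List<Patient> patients, Predicate<Patient> cohort) {
        List<Patient> members = patients.stream().filter(cohort).collect(Collectors.toList());
        if (members.isEmpty()) {
            return 0.0;
        }
        long cases = members.stream().filter(p -> p.cancerIn2010).count();
        return (double) cases / members.size();
    }

    public static void main(String[] args) {
        // Made-up records, for illustration only.
        List<Patient> patients = Arrays.asList(
                new Patient(2003, true), new Patient(2001, false),
                new Patient(0, false), new Patient(0, false),
                new Patient(2008, true));

        System.out.println("ex-smokers:    " + cancerRate(patients, CohortSketch::exSmokerIn2005));
        System.out.println("never smoked:  " + cancerRate(patients, CohortSketch::neverSmoked));
    }
}
```

In the talk's setting, the two predicates would be answered by the time-scoped queries shown on the previous slides rather than by precomputed fields.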
gephi plugin for fluxgraph
[figure: the graph rendered in Gephi at 2010]
gephi plugin for fluxgraph
[figure: the graph rendered in Gephi at 2001]
gephi plugin for blueprints!
http://github.com/datablend/gephi-blueprints-plugin
➡ available on github
➡ support for neo4j, orientdb, dex, rexster, ...
Kudos to Timmy Storms (@timmystorms)
Questions?