Graph Database Workshop
Jeremy Deane http://jeremydeane.net
(UberConf:Conference)-[:HOSTS]->(session:Session)
(developer:Person)-[:ATTENDS]->(session)
(session)-[:PROVIDES]->(skill:Skill)
(developer)-[:LEARNS]->(skill)
Cover.cql
Agenda
Environment Setup
Introduction
Fundamentals
Architecture
Advanced Concepts
Generated with Graphgen - http://bit.ly/1HkTP20
Environment Setup
① Download Neo4j (2.2.3) - http://neo4j.com/download/
② Install to $NEO4J_HOME
③ Start Neo4j ($NEO4J_HOME/bin/neo4j start or %NEO4J_HOME%\bin\Neo4j.bat)
④ Launch Browser - http://localhost:7474
⑤ Default UID/PW - neo4j/neo4j
Cypher Syntax Highlighting:
Sublime 2 Package (Sublime 3 Manual Install)
Vim Bundle
IntelliJ Plug-in
# Start Neo4j (Bash function)
neorun() {
  cd "$NEO4J_HOME/bin"
  ./neo4j start
  cd "$HOME"
}
# Stop Neo4j (Bash function)
neostop() {
  cd "$NEO4J_HOME/bin"
  ./neo4j stop
  cd "$HOME"
}
Workshop Setup
① Clone or Download Github Repo - https://github.com/jtdeane/graph-workshop
② Unpack to $HOME/$WORKSHOP_HOME
③ Open $HOME/$WORKSHOP_HOME/Data Cheat Sheet
④ Bookmark or Open - http://neo4j.com/docs/stable/cypher-refcard/
⑤ Bookmark or Open - http://neo4j.com/docs/stable/
Suggested Naming Conventions
Labels - CamelCase
Relationships - SNAKE_CASE_UPPER_CASE
Properties - snake_case_lower_case
Indexes - snake_case_lower_case
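The naming conventions above can be checked mechanically. The following is a minimal Python sketch, not part of the workshop repository; the regular expressions and the `follows_convention` helper are assumptions made for illustration.

```python
import re

# Hypothetical patterns for the suggested conventions (assumed, not from the workshop).
LABEL = re.compile(r"^(?:[A-Z][a-z0-9]*)+$")                    # CamelCase, e.g. Practitioner
RELATIONSHIP = re.compile(r"^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$")   # e.g. TREATED_BY
PROPERTY = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")       # e.g. birth_date

PATTERNS = {
    "label": LABEL,
    "relationship": RELATIONSHIP,
    "property": PROPERTY,
    "index": PROPERTY,  # indexes use the same style as properties
}

def follows_convention(kind, name):
    """Return True when name matches the suggested style for its kind."""
    return bool(PATTERNS[kind].match(name))

print(follows_convention("relationship", "TREATED_BY"))  # name taken from the domain model
print(follows_convention("relationship", "likes"))       # lower case, so it is rejected
```

A check like this could run in a unit test over the labels and relationship types a data load script is expected to create.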
Domain Model
[Diagram: domain model with nodes Practitioner, Patient, Organization, and Location, and relationships TREATED_BY, WORKS_FOR, and MAINTAINS]
[Diagram: (Patient)-[:TREATED_BY]->(Practitioner)]
Explore Web Console
//Create Node
CREATE (:Practitioner {name:"Zachary Smith", specialty:"General Medicine"})
//Retrieve Node
MATCH (p:Practitioner)
RETURN p
//Update Node
MATCH (p)
WHERE p.name="Zachary Smith"
SET p.specialty="Neurosurgery"
//Retrieve Updated Node
MATCH (p:Practitioner)
WHERE p.specialty="Neurosurgery"
RETURN p.name, p.specialty
//Retrieve Node by ID
MATCH (p)
WHERE ID(p)=0
RETURN p
//Delete Node By ID
MATCH (p)
WHERE ID(p)=0
DELETE p
//Merge Node
MERGE (p:Practitioner {name:"Zachary Smith"})
ON CREATE SET p.created=timestamp()
ON MATCH SET p.updated=timestamp()
Hello.cql
Explore Web Console
//Create Node
CREATE (:Patient {name:"Tim Smith", birth_date:"1965-06-27", conditions:["Diabetes", "Epilepsy"]})
//Create Relationship Long - Requires Patient Tim Smith and Practitioner Zachary Smith
MATCH (p:Practitioner {name:"Zachary Smith"})
MATCH (m:Patient {name:"Tim Smith"})
CREATE (m)-[r:TREATED_BY]->(p)
RETURN m, r, p
//Create Relationship Medium - Requires Practitioner Zachary Smith
MATCH (p:Practitioner {name:"Zachary Smith"})
CREATE (m:Patient {name:"Holly Goodwin", birth_date:"1991-11-17"})-[r:TREATED_BY]->(p)
RETURN m, r, p
//Create Nodes and Relationship Short
CREATE (m:Patient {name:"Jackie Bonk", birth_date:"1978-12-15"})-[r:TREATED_BY]->(p:Practitioner {name:"Yuri Zhivago", specialty:"Immunology"})
RETURN m, r, p
//Clean out all Nodes and Relationships (careful!)
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r
Hello.cql
Introduction
Neuron from mouse cerebellum (160x) - http://bit.ly/1Ja1VrJ
What are Graphs?
Graph Theory: a graph is a representation of a set of objects where some pairs of objects are connected by links
Seven Bridges of Königsberg http://bit.ly/1Lv7C66
Objects are Vertices (Nodes)
Links are Edges (Relationships)
Property Graph: Nodes & Relationships with key-value pairs (Properties)
Neo4j Property Graph: Nodes grouped by Labels
NoSQL Landscape
Sadalage/Fowler
http://amzn.to/1Lv8W8Z
Column
Key-Value
Document
Graph
Graph - Relational Database Comparison
Relational Databases are great for storing transactional data in tables
Graph Databases are great for storing semantically rich connected data in nodes and relationships
Depth   RDB time (ms)   GDB time (ms)   # records
2       16              10              ~2,500
3       30,267          168             ~110,000
4       1,543,505       1,359           ~600,000
5       hang            2,132           ~800,000
From “Graph Databases” by Robinson, Webber and Eifrem, 2013, page 20
Degrees of separation between you and Kevin Bacon; the Relational Database falls over.
Relational Databases require considerable up-front design (e.g. Normalization) resulting in a rigid schema
Graph Databases require no schema and support an emergent design approach
Graph Database Use Cases
Social (Professional) Network
Route Finding and Logistics
Network and System Operations
Security and Advanced Analytics
http://bit.ly/1fYwEOO
Fundamentals
Custom Circuit Board Design - http://bit.ly/1Ja4kTb
Domain Model
[Diagram: domain model with nodes Practitioner, Patient, Organization, and Location, and relationships TREATED_BY, WORKS_FOR, and MAINTAINS]
Initial Data Load
① Execute Favorite “Clean database of nodes and relationships” OR execute the Clean.cql statement below
② Import new Favorite “Initial Data Load”
③ Execute “Initial Data Load” OR
④ Open data.cql and copy contents
⑤ Paste and execute in Web Console
//Clean out all Nodes and Relationships (careful!)
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r
Clean.cql
Nodes
[Diagram: a single node labeled Smith]
(Node) is a thing or noun
(Node) has :Properties
{name:"Zachary Smith", specialty:"General Medicine"}
(:Label) groups (Node)s
:Practitioner
//Retrieve a Node with Label Practitioner with a property equal to Zachary Smith
MATCH (p:Practitioner)
WHERE p.name="Zachary Smith"
RETURN p
//Retrieve all Nodes with Label Patient and order by birth date
MATCH (m:Patient)
RETURN UPPER(m.name), m.birth_date
ORDER BY m.birth_date
//Retrieve all Nodes with Label Patient and with diabetes
MATCH (m:Patient)
WHERE "Diabetes" IN m.conditions
RETURN m
//Retrieve all Nodes with Label Patient and without diabetes
MATCH (m:Patient)
WHERE NOT("Diabetes" IN m.conditions)
RETURN m
Fundamentals.cql
Relationships
(:Relationship) describes how (Node)s are related
[Diagram: (Patient)-[:TREATED_BY]->(Practitioner)]
(:Relationship)s are directional and cannot exist without both (Node)s
//Retrieve all Nodes with WORKS_AT Relationship
MATCH (a)-[r:WORKS_AT]->(b)
RETURN a,r,b
(:Relationship)s are verbs and can have :Properties
{pcp: true}
//Retrieve all Nodes with TREATED_BY Relationship with PCP false
MATCH (a)-[r:TREATED_BY {pcp:false}]->(b)
RETURN a,r,b
//Count the distinct Nodes that MAINTAIN a Node
MATCH (a)-[:MAINTAINS]->(b)
RETURN COUNT(DISTINCT a)
Fundamentals.cql
Modeling
Graphs read as natural language
Subject {Noun} - Acts Upon {Verb} -> Object {Noun}
Graphs are modeled with Circles, Boxes and Arrows
Graph models translate to ASCII art
MATCH (Identifier:Label)-[Identifier:Relationship]->(Identifier:Label)
Graph modeling is very expressive and whiteboard friendly
Modeling Strategies - model using Domain Driven Design (DDD) or model by Questions (e.g. What do you want to do?)
http://amzn.to/1GUkNKA
Modeling Guidelines
• Do not replicate all entity details into Node Properties. Leverage a relational or document database as System of Record or History.
• Create semantically rich relationships, avoiding generic verbs like HAS, CONTAINS, or IS.
• When possible, qualify a relationship with additional information (e.g. weight, origin, or date range) - “Strengthen vs. Atrophy”
• Avoid duplicate relationships - (a)-[:likes]->(b)-[:likes]->(a)
• Use Linked Lists to increase performance (e.g. head, previous)
• Leverage an intermediate Node for n-ary relationships (e.g. Software, Version, Developer, Organization)
Application Programming Interfaces
REST Web Service API
Java Platform Support
Other Popular Languages (C#, Ruby, Python, PHP)
Under the covers - Java Options:
• Core API
• Traversal Framework
• Cypher Query Language (CQL)
Cypher Transactional HTTP Endpoint: POST http://localhost:7474/db/data/transaction/commit
GET http://localhost:7474/db/data
HTTP Interactions
① Install Postman Chrome Plug-in: http://bit.ly/1NooOJr (or similar)
② Set Authorization Header (HTTP Basic)
③ Issue GET http://localhost:7474/db/data and follow the links in the response
④ Explore links (e.g. GET http://localhost:7474/db/data/relationship/types)
⑤ Query via HTTP Transactional Endpoint:
POST http://localhost:7474/db/data/transaction/commit
Accept: application/json
Content-Type: application/json
{
  "statements" : [ {
    "statement" : "MATCH (p:Practitioner) WHERE p.name={name} RETURN p",
    "parameters" : { "name" : "Zachary Smith" }
  } ]
}
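The same request can be assembled without a REST client. The sketch below uses only the Python standard library to build the POST with the headers and body shown on this slide; the `build_cypher_request` helper and the credentials are placeholders chosen for illustration, and the call that actually sends the request is left commented out because it needs a running Neo4j server.

```python
import base64
import json
import urllib.request

def build_cypher_request(statement, parameters, user, password,
                         url="http://localhost:7474/db/data/transaction/commit"):
    """Build (but do not send) a POST to the transactional Cypher endpoint."""
    body = {"statements": [{"statement": statement, "parameters": parameters}]}
    # HTTP Basic: base64 of "user:password"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )

request = build_cypher_request(
    "MATCH (p:Practitioner) WHERE p.name={name} RETURN p",
    {"name": "Zachary Smith"},
    user="neo4j", password="secret",  # placeholder credentials, replace with your own
)

# Sending requires a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Passing the name as a parameter rather than concatenating it into the statement keeps the query text constant across calls.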
Testing
Options:
• Manual testing via REST Clients
• Unit Testing via Framework (e.g. JUnit)
• Functional Testing via Framework (e.g. RobotFramework or SoapUI)
① Requires - https://github.com/jtdeane/graph-workshop
② Navigate to $HOME/$WORKSHOP_HOME/testing
③ To execute tests, enter mvn test
④ Optionally update Java to output results to console
⑤ To re-execute tests, enter mvn test
Architecture
http://bit.ly/1Ja2npT
Graph Database - Architecture
Language APIs
Caches
Files
HA Support
Logging
Plug-ins and Extensions
Neo4j
Java Runtime Environment
Community & Enterprise Edition
Community is GPLv3
Enterprise Edition relaxes Consistency (ACID)
$NEO4J_HOME
Graph Database – Server Modes
Java Runtime Environment
Server Libraries
Embedded Neo4j
Application
Embedded Web Server
Java Runtime Environment
Server Libraries
Neo4j Server
Extensions & Plug-ins
External Application (Client)
Graph Database – Server Extension
① Requires - https://github.com/jtdeane/graph-workshop
② Navigate to $HOME/$WORKSHOP_HOME/extension
③ Build the extension JAR - graph-extension-1.0.0.jar
④ Copy the JAR from ../target to $NEO4J_HOME/plugins
⑤ Register the extension by updating $NEO4J_HOME/Conf
⑥ Restart Neo4j
⑦ Using REST Browser Client (e.g. Postman) query practitioners
org.neo4j.server.thirdparty_jaxrs_classes=ws.cogito.graphs=/extensions/
http://localhost:7474/extensions/directory/practitioners
Deployment Topologies
Single Community Server (Non-Production Environments)
Non-Clustered Community Servers - Cold Standby
HA Clustered Enterprise Servers (Master-Slave)
Linux VM
<Java Runtime Environment>
Neo4j (Master)
Linux VM
<Java Runtime Environment>
Neo4j (Slave)
Linux VM
<Java Runtime Environment>
Neo4j (Slave)
Enterprise Edition High Availability
Read Consistent – Write Lock
Read Write Consistent
Operations & Security
• Operating System and Server Process Monitoring (e.g. Zabbix)
• Log Monitoring and Alerting (e.g. Splunk or Logstash)
• Secure Communications via SSL
• Use HTTP Basic Authentication for Console and REST API Access
• Web Console and REST API are on the same Port
• HTTP Basic requires HTTP/S
• Graph Governance is up to you!
Advanced Concepts
Bee Pollination - http://bit.ly/1HkAa2c
Domain Model
[Diagram: domain model extended with a Caregiver node; nodes Practitioner, Patient, Organization, Location, and Caregiver; relationships TREATED_BY and WORKS_FOR]
Bulk Loads
Batch API (transactional) - POST http://localhost:7474/db/data/batch
Batch Inserter (bypasses Transactions) - Java Only
Importing Comma Separated Values (CSV)
//load caregiver nodes
LOAD CSV WITH HEADERS FROM "file:///YOUR_LOCATION/CaregiverNodes.csv" AS csvLine
CREATE (g:Caregiver {name: csvLine.name, guardian: csvLine.guardian})
RETURN *
//load caregiver patient relationships
LOAD CSV WITH HEADERS FROM "file:///YOUR_LOCATION/PatientRelationships.csv" AS csvLine
MATCH (giver:Caregiver { name:(csvLine.giver)}), (patient:Patient { name:(csvLine.patient)})
CREATE (giver)-[:CARES_FOR { type:(csvLine.type) }]->(patient)
RETURN *
//load caregiver organization relationships
LOAD CSV WITH HEADERS FROM "file:///YOUR_LOCATION/OrganizationRelationships.csv" AS csvLine
MATCH (giver:Caregiver { name:(csvLine.giver)}), (org:Organization { name:(csvLine.organization)})
CREATE (giver)-[:WORKS_FOR { type:(csvLine.status) }]->(org)
RETURN *
Advanced.cql
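To make the row-by-row behavior of the first LOAD CSV statement concrete, here is a rough Python sketch that reads a CSV with headers and pairs each row with a parameterized CREATE statement. The sample row and the `caregiver_statements` helper are made up for illustration; only the column names `name` and `guardian` come from the statement above.

```python
import csv
import io

# Made-up sample content; the real CaregiverNodes.csv ships with the workshop repo.
SAMPLE = "name,guardian\nFlorence Nightingale,false\n"

# Same node shape as the LOAD CSV example, using {param} placeholders.
STATEMENT = "CREATE (g:Caregiver {name: {name}, guardian: {guardian}})"

def caregiver_statements(csv_text):
    """Yield one (statement, parameters) pair per CSV row, keyed by header name."""
    for csv_line in csv.DictReader(io.StringIO(csv_text)):
        yield STATEMENT, {"name": csv_line["name"], "guardian": csv_line["guardian"]}

for statement, parameters in caregiver_statements(SAMPLE):
    print(statement, parameters)
```

As with LOAD CSV, every value arrives as a string, so a field such as `guardian` would need an explicit conversion if a boolean is wanted.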
More Graph Queries
//Find patients who are also practitioners
MATCH (m:Patient), (p:Practitioner)
WHERE m.name=p.name
RETURN p
//All paths to Lovee Johnson
MATCH paths = (m:Patient)-[*]-(node)
WHERE m.name="Lovee Johnson"
RETURN paths
//Shortest path from Lovee Johnson to Florence Nightingale
MATCH (m:Patient {name:"Lovee Johnson"}), (g:Caregiver {name:"Florence Nightingale"}), path = shortestPath((m)-[*..10]-(g))
RETURN path
//Patients with more than one practitioner
MATCH (patient:Patient)-[:TREATED_BY]->(practitioner)
WITH patient, count(practitioner) AS practitioners
WHERE practitioners > 1
RETURN patient
//All patients with a PCP having a name ending in 'y' (REGEX)
MATCH (m:Patient)-[:TREATED_BY {pcp:true}]->(p:Practitioner)
WHERE p.name=~ ".*y"
RETURN m,p
Java 1.7 Regex - http://bit.ly/1LEvt3j
//Return the patients with a family caregiver and their practitioners
MATCH (g:Caregiver)-[:CARES_FOR {type:"Family"}]->(m:Patient)-[:TREATED_BY]->(p:Practitioner)
RETURN m, p, g
Advanced.cql
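The regex query above relies on the pattern matching the whole property value, which is why ".*y" selects names that end in "y". A small Python sketch of that whole-string behavior follows; Python's `re` module stands in for the Java regex engine here, and the two engines differ in some details, so treat it as an approximation.

```python
import re

def name_ends_in_y(name):
    """Whole-string match against ".*y", approximating the =~ test in the query."""
    return re.fullmatch(r".*y", name) is not None

# Names taken from the workshop examples.
for name in ["Leonard McCoy", "Zachary Smith", "Yuri Zhivago"]:
    print(name, name_ends_in_y(name))
```

The match is case sensitive, so a name ending in an upper-case "Y" would not be selected.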
Traversals
Depth-first search (DFS) - Default Neo4j Behavior: 1,2,5,6,3,4,7,8
Breadth-first search (BFS): 1,2,3,4,5,6,7,8
[Diagram: tree of nodes numbered 1 through 8]
Evaluators - e.g. Maximum Depth
Filters - e.g. Uniqueness
Path Expanders - e.g. Direction
• REST API - Executes arbitrary JavaScript code
• Java API - Requires in-depth knowledge of your Graph
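The two visit orders on this slide can be reproduced with a few lines of plain Python. The tree shape below is an assumption, picked because it yields exactly the DFS order 1,2,5,6,3,4,7,8 and the BFS order 1,2,3,4,5,6,7,8; the sketch does not use the Neo4j Traversal Framework.

```python
from collections import deque

# Assumed tree shape (parent -> children), chosen to reproduce the slide's orders.
TREE = {1: [2, 3, 4], 2: [5, 6], 3: [], 4: [7, 8], 5: [], 6: [], 7: [], 8: []}

def depth_first(tree, start):
    """Visit a node, then fully explore each child before moving to its sibling."""
    order, stack = [], [start]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree[node]))  # reversed so the leftmost child pops first
    return order

def breadth_first(tree, start):
    """Visit all nodes at one depth before descending to the next depth."""
    order, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order

print(depth_first(TREE, 1))    # [1, 2, 5, 6, 3, 4, 7, 8]
print(breadth_first(TREE, 1))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The only difference between the two functions is the container: a stack gives depth-first order and a queue gives breadth-first order.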
Indexes
Automatic Indexing - $NEO4J_HOME/conf/neo4j.properties
# Enable auto-indexing for nodes, default is false.
node_auto_indexing=true
# The node property keys to be auto-indexed, if enabled.
node_keys_indexable=name
# Enable auto-indexing for relationships, default is false.
relationship_auto_indexing=true
# The relationship property keys to be auto-indexed, if enabled.
relationship_keys_indexable=pcp,type
Cypher Index Commands
//create index on Patient Label
CREATE INDEX ON :Patient(name)
//drop index on Patient Label
DROP INDEX ON :Patient(name)
GET http://localhost:7474/db/data/schema/index/Patient
Advanced.cql
Constraints
Create and Drop Constraints
//create Unique Practitioner constraint
CREATE CONSTRAINT ON (practitioner:Practitioner) ASSERT practitioner.name IS UNIQUE
//attempt to create duplicate Practitioner - should fail
CREATE (McCoy:Practitioner {name:"Leonard McCoy", specialty:"General Medicine"})
//drop Unique Practitioner constraint
DROP CONSTRAINT ON (practitioner:Practitioner) ASSERT practitioner.name IS UNIQUE
No way to retrieve list of Indexes or Constraints via Cypher (yet)
GET http://localhost:7474/db/data/schema/constraint/Practitioner
Advanced.cql
Visualization
Neo4j Web Console http://localhost:7474/browser
Data Driven Documents (D3.js) http://d3js.org/
Alchemy.js http://bit.ly/1NwH7fB
Linkurious.js http://linkurio.us/toolkit/
VivaGraph.js https://github.com/anvaka/VivaGraphJS
Boston Hubway Graph - By Max De Marzi
Execution from Scripts <script> or Node.JS
Requires data transformation (e.g. Nodes and Relationship Arrays)
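The data transformation mentioned above usually means turning query rows into a nodes array and a links array. A minimal Python sketch follows, assuming rows of (source name, relationship type, target name) and the index-based link shape commonly fed to D3 force layouts; the `to_d3_graph` helper is hypothetical and the exact shape each library expects should be checked against its documentation.

```python
def to_d3_graph(rows):
    """Convert (source, relationship, target) rows into node and link arrays."""
    index = {}            # node name -> position in the nodes array
    nodes, links = [], []
    for source, rel_type, target in rows:
        for name in (source, target):
            if name not in index:      # add each node only once
                index[name] = len(nodes)
                nodes.append({"name": name})
        links.append({"source": index[source],
                      "target": index[target],
                      "type": rel_type})
    return {"nodes": nodes, "links": links}

# Rows as they might come back from a TREATED_BY query over the Hello.cql data.
rows = [
    ("Tim Smith", "TREATED_BY", "Zachary Smith"),
    ("Holly Goodwin", "TREATED_BY", "Zachary Smith"),
]
graph = to_d3_graph(rows)
print(graph)
```

Serializing `graph` with `json.dumps` would give a document that a browser script can load directly.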
Questions & Feedback
My Contact information:
Jeremy Deane, Director of Software Architecture, NaviNet
http://jeremydeane.net
https://github.com/jtdeane/graph-workshop
Supplemental
//Aggregate all providers
MATCH (c:Caregiver)
RETURN c.name AS names
UNION
MATCH (p:Practitioner)
RETURN p.name AS names
Supplemental.cql
//Practitioners with patient counts
MATCH (m:Patient)-[:TREATED_BY]->(p:Practitioner)
WITH p, COUNT(m) AS patients
RETURN p.name, patients
//Patients with provider counts (Practitioner and/or Caregiver)
MATCH (m:Patient)-[:TREATED_BY|:CARES_FOR]-(r)
WITH DISTINCT (m), COUNT(r) AS providers
RETURN m.name, providers
//All Patients with Caregiver (and without = null)
MATCH (m:Patient)
OPTIONAL MATCH (m)<-[:CARES_FOR]-(c:Caregiver)
RETURN m.name, COALESCE(c.name,"INDEPENDENT")
//Profile simple query
PROFILE MATCH (p:Practitioner)
WHERE p.name="Zachary Smith"
RETURN p
//Profile complex query
PROFILE MATCH (m:Patient), (p:Practitioner)
WHERE m.name=p.name
RETURN p