Ying Zhou, School of Information Technologies
COMP5338 – Advanced Data Models
Week 6: Graph Data Model and Introduction to Neo4j
COMP5338 "Advanced Data Models" - 2015 (A. Fekete & Y. Zhou) 06-2
Outline
Brief Review of Graphs
Examples of Graph Data
Graph Engines and Graph Databases
Introduction to Neo4j
Graphs
A graph is a collection of vertices and edges
  A vertex is also called a node
  An edge is also called an arc or a link
Types of Graphs
Undirected graphs
  Edges have no orientation (direction)
  (a, b) is the same as (b, a)
Directed graphs
  Edges have an orientation (direction)
  (a, b) is not the same as (b, a)
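One way to see the difference in code is to store a directed edge as an ordered pair and an undirected edge as an unordered pair. This is a minimal sketch; the vertex names a and b are illustrative only.

```python
# Hypothetical vertex names, used only for illustration.
directed_ab = ("a", "b")               # ordered pair: orientation matters
directed_ba = ("b", "a")
undirected_ab = frozenset(("a", "b"))  # unordered pair: orientation is ignored
undirected_ba = frozenset(("b", "a"))

print(directed_ab == directed_ba)      # False
print(undirected_ab == undirected_ba)  # True
```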
Representing Graph Data
Data structures used to store graphs in programs
  Adjacency list
  Adjacency matrix
Adjacency List
Directed Graph
Undirected Graph
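The slide figures are not reproduced in this transcript, so here is a minimal sketch of an adjacency list. It assumes a small made-up edge list (vertices a, b, c are not taken from the slides). A directed edge is recorded once; an undirected edge is recorded in both directions.

```python
def adjacency_list(edges, directed=True):
    """Map each vertex to the list of vertices it has an edge to."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])      # make sure vertices with no outgoing edge appear
        if not directed:
            adj[v].append(u)       # an undirected edge is stored both ways
    return adj

# Hypothetical sample edges, for illustration only.
edges = [("a", "b"), ("a", "c"), ("b", "c")]
directed = adjacency_list(edges, directed=True)
undirected = adjacency_list(edges, directed=False)
print(directed)
print(undirected)
```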
Adjacency matrix – Directed Graph
Directed Graph
Undirected Graph
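A minimal sketch of an adjacency matrix, again assuming a made-up three-vertex graph rather than the one in the slide figure. Entry [i][j] is 1 when there is an edge from vertex i to vertex j; for an undirected graph the matrix comes out symmetric.

```python
def adjacency_matrix(vertices, edges, directed=True):
    """Build an n x n 0/1 matrix; row = source vertex, column = target vertex."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        if not directed:
            matrix[index[v]][index[u]] = 1  # mirror the entry for undirected graphs
    return matrix

# Hypothetical sample graph, for illustration only.
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("b", "c")]
directed_m = adjacency_matrix(vertices, edges, directed=True)
undirected_m = adjacency_matrix(vertices, edges, directed=False)
for row in directed_m:
    print(row)
```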
Examples of graphs
Social graphs
  Organization structure
  Facebook, LinkedIn, etc.
Computer network topologies
  Data centre layout
  Network routing tables
Road, rail and airline networks
Biological data
Social Graphs and extension
A small social graph (page 2 of the graph database book)
Related information can be captured in the graph (page 3 of the graph database book)
Social Graph with Various Relationships
Page 19 of the graph database book
Transaction information
Page 23 of the graph database book
Typical way to store connected data
Who are Bob's friends?
  SELECT p1.Person
  FROM Person p1
  JOIN PersonFriend pf ON pf.FriendID = p1.ID
  JOIN Person p2 ON pf.PersonID = p2.ID
  WHERE p2.Person = 'Bob'
Page 13 of the graph database book
RDBMS to store Graphs
Who are friends with Bob?
  SELECT p1.Person
  FROM Person p1
  JOIN PersonFriend pf ON pf.PersonID = p1.ID
  JOIN Person p2 ON pf.FriendID = p2.ID
  WHERE p2.Person = 'Bob'
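The two queries differ only in which PersonFriend column is joined to which alias. A minimal sketch using Python's built-in sqlite3 module shows the effect. The table contents are assumed sample rows (Alice, Bob, Zach), not data given on the slides; a row (PersonID, FriendID) is read as "PersonID counts FriendID as a friend", so friendship need not be mutual.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Person TEXT);
    CREATE TABLE PersonFriend (PersonID INTEGER, FriendID INTEGER);
    -- Assumed sample rows, for illustration only.
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (99, 'Zach');
    INSERT INTO PersonFriend VALUES (1, 2), (2, 1), (2, 99);
""")

# Who are Bob's friends? Bob is on the PersonID side of the join table.
bobs_friends = [row[0] for row in conn.execute("""
    SELECT p1.Person
    FROM Person p1
    JOIN PersonFriend pf ON pf.FriendID = p1.ID
    JOIN Person p2 ON pf.PersonID = p2.ID
    WHERE p2.Person = 'Bob'
    ORDER BY p1.Person
""")]

# Who are friends with Bob? Bob is on the FriendID side of the join table.
friends_with_bob = [row[0] for row in conn.execute("""
    SELECT p1.Person
    FROM Person p1
    JOIN PersonFriend pf ON pf.PersonID = p1.ID
    JOIN Person p2 ON pf.FriendID = p2.ID
    WHERE p2.Person = 'Bob'
    ORDER BY p1.Person
""")]

print(bobs_friends)      # ['Alice', 'Zach']
print(friends_with_bob)  # ['Alice']
```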
RDBMS to store Graphs
Who are Alice's friends-of-friends?
  SELECT p1.Person AS PERSON, p2.Person AS FRIEND_OF_FRIEND
  FROM PersonFriend pf1
  JOIN Person p1 ON pf1.PersonID = p1.ID
  JOIN PersonFriend pf2 ON pf2.PersonID = pf1.FriendID
  JOIN Person p2 ON pf2.FriendID = p2.ID
  WHERE p1.Person = 'Alice' AND pf2.FriendID <> p1.ID
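To try the friends-of-friends query, the sketch below runs it with Python's built-in sqlite3 module. The rows are assumed sample data (Alice, Bob, Zach), not taken from the slides. Note that going one hop further costs an extra pass through PersonFriend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Person TEXT);
    CREATE TABLE PersonFriend (PersonID INTEGER, FriendID INTEGER);
    -- Assumed sample rows: Alice -> Bob, Bob -> Alice, Bob -> Zach.
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (99, 'Zach');
    INSERT INTO PersonFriend VALUES (1, 2), (2, 1), (2, 99);
""")

friends_of_friends = conn.execute("""
    SELECT p1.Person AS PERSON, p2.Person AS FRIEND_OF_FRIEND
    FROM PersonFriend pf1
    JOIN Person p1 ON pf1.PersonID = p1.ID
    JOIN PersonFriend pf2 ON pf2.PersonID = pf1.FriendID
    JOIN Person p2 ON pf2.FriendID = p2.ID
    WHERE p1.Person = 'Alice' AND pf2.FriendID <> p1.ID
""").fetchall()

# The final condition stops Alice from being reported as her own friend-of-friend.
print(friends_of_friends)  # [('Alice', 'Zach')]
```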
Graph Technologies
Graph Processing
  Takes data in any input format and performs graph-related operations
  OLAP: Online Analytical Processing of graph data
  Examples: Google Pregel, Apache Giraph
Graph Databases
  Manage, query and process graph data
  Support a high-level query language
  Native storage of graph data
  OLTP (Online Transaction Processing) is possible
  OLAP is also possible
Data Models of Graph Databases
RDF (Resource Description Framework) Model
  Expresses each node-edge relation as a "subject, predicate, object" triple (an RDF statement)
  SPARQL query language
  Examples: AllegroGraph, Apache Jena
Property Graph Model
  Expresses nodes and edges as object-like entities; both can have properties
  Various query languages
  Examples
    Titan
      Supports various NoSQL storage engines: BerkeleyDB, Cassandra, HBase
      Structural query language: Gremlin
    Neo4j
      Native storage manager for graph data (index-free adjacency)
      Declarative query language: Cypher
Resource Description Framework
Proposed in 1999 by the W3C to describe resources on the web, as part of the Semantic Web activity
  Designed to be read and understood by computers
  A major use case is writing ontologies and supporting inference
RDF identifies resources using URIs and describes them with properties and property values
Descriptions are a collection of triples
  Subject: what is being described (the resource)
  Predicate: which property of the subject
  Object: the value of that property
Example: description of a web page
  Subject: index.html
  Predicate: creation date
  Object: 3 September, 2013
  "index.html has a creation-date of 3 September, 2013"
RDF/XML
An XML syntax for RDF
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date>September 3, 2013</exterms:creation-date>
  </rdf:Description>
</rdf:RDF>
Describes the following:
http://www.example.org/index.html has a creation-date of September 3, 2013
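To make the mapping from RDF/XML to a triple concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is not a general RDF/XML parser: it handles only rdf:Description elements with literal-valued child properties, as in this example. The helper name extract_triples is made up for this sketch, and the exterms namespace is written with a trailing slash (an assumption) so that namespace plus local name forms a clean predicate URI.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date>September 3, 2013</exterms:creation-date>
  </rdf:Description>
</rdf:RDF>"""

def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples for simple literal properties."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells qualified names as "{namespace}local";
            # joining the two parts gives the predicate URI.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, prop.text))
    return triples

for triple in extract_triples(rdf_xml):
    print(triple)
```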
RDF/Triples
RDF-based graph storage systems store and retrieve triples using SPARQL, an SQL-like declarative query language
Four triples
  Pi foaf:name "Pi"
  Pi foaf:age "14"
  Pi foaf:knows R.Parker
  R.Parker foaf:name "Richard Parker"
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <foaf:Person rdf:nodeID="Pi">
    <foaf:name>Pi</foaf:name>
    <foaf:age>14</foaf:age>
    <foaf:knows>
      <foaf:Person rdf:nodeID="R.Parker">
        <foaf:name>Richard Parker</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
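The four triples can be held as plain tuples and queried by pattern, which is roughly the idea behind a triple store. The sketch below is a hand-rolled matcher, not SPARQL; the function name match and the use of None as a wildcard are choices made for this illustration.

```python
# The four triples from the slide, as (subject, predicate, object) tuples.
triples = [
    ("Pi", "foaf:name", "Pi"),
    ("Pi", "foaf:age", "14"),
    ("Pi", "foaf:knows", "R.Parker"),
    ("R.Parker", "foaf:name", "Richard Parker"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# "What are the names of the people Pi knows?" takes two chained patterns:
# first find whom Pi knows, then look up each of their names.
known = [o for (_, _, o) in match(triples, "Pi", "foaf:knows")]
names = [o for k in known for (_, _, o) in match(triples, k, "foaf:name")]
print(names)  # ['Richard Parker']
```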
Property Graph Model
Proposed by Neo Technology
No standard definition or specification
Both nodes and edges can have properties
  The RDF model cannot express edge properties in a natural, easy-to-understand way
The actual storage varies
The query language varies
Example edge properties (from the figure): since: 1 year; common: sailing
Neo4j
Native graph storage using the property graph model
  Index-free adjacency
  Nodes and relationships are stored
  Supports property indexes
Replication
  Single-master, multiple-slave replication
  A Neo4j cluster is limited to the master/slave replication configuration
  The database engine is not distributed
Cypher query language
  Also supports SPARQL
We will discuss the internals and indexing next week
Property Graph Model
A property graph has the following characteristics
It contains nodes and relationships
Nodes contain properties
  Properties are stored as key-value pairs
  A node can have labels
Relationships connect nodes
  Each has a direction, a type/name, a start node and an end node
  No dangling relationships (a node that still has a relationship cannot be deleted)
Properties
  Both nodes and relationships have properties
  Useful for modeling, and for querying based on properties of relationships
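These characteristics can be sketched as a small in-memory data structure. This is an illustration of the model only, not how Neo4j stores data. The class and method names are invented for the sketch, and the people (Alice, Bob) and the KNOWS relationship are assumed examples; the since and common properties follow the earlier figure.

```python
class Node:
    def __init__(self, *labels, **properties):
        self.labels = set(labels)           # zero or more labels
        self.properties = dict(properties)  # key-value pairs

class Relationship:
    def __init__(self, start, rel_type, end, **properties):
        self.start, self.type, self.end = start, rel_type, end  # directed: start -> end
        self.properties = dict(properties)

class PropertyGraph:
    def __init__(self):
        self.nodes = []
        self.relationships = []

    def add_node(self, *labels, **properties):
        node = Node(*labels, **properties)
        self.nodes.append(node)
        return node

    def relate(self, start, rel_type, end, **properties):
        # A relationship always needs a start node and an end node in the graph.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("both end points must be nodes in this graph")
        rel = Relationship(start, rel_type, end, **properties)
        self.relationships.append(rel)
        return rel

    def delete_relationship(self, rel):
        self.relationships.remove(rel)

    def delete_node(self, node):
        # No dangling relationships: refuse while the node is still connected.
        if any(node in (r.start, r.end) for r in self.relationships):
            raise ValueError("node still has relationships")
        self.nodes.remove(node)

g = PropertyGraph()
alice = g.add_node("Person", name="Alice")  # hypothetical example nodes
bob = g.add_node("Person", name="Bob")
knows = g.relate(alice, "KNOWS", bob, since="1 year", common="sailing")

try:
    g.delete_node(alice)
    deleted_while_connected = True
except ValueError:
    deleted_while_connected = False

g.delete_relationship(knows)  # remove the relationship first ...
g.delete_node(alice)          # ... then the node can be deleted
```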
http://docs.neo4j.org/chunked/milestone/graphdb-neo4j.html
Property Graph Model: Example
Property Graph Model: Nodes
http://docs.neo4j.org/chunked/milestone/graphdb-neo4j-nodes.html
Property Graph Model: Relationships
http://docs.neo4j.org/chunked/milestone/graphdb-neo4j-relationships.html
Property Graph Model: Properties
http://docs.neo4j.org/chunked/milestone/graphdb-neo4j-properties.html
Property Graph Model: Labels
A node may be labeled with any number of labels, including none, making labels an optional addition to the graph
http://docs.neo4j.org/chunked/milestone/graphdb-neo4j-labels.html#_label_names
Property Graph Model: Paths
A path is one or more nodes with connecting relationships, typically retrieved as a query or traversal result.
A path of length zero
A path of length one
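The length of a path is the number of relationships in it, so a single node on its own is a path of length zero. A minimal sketch, assuming a path is written as an alternating list of nodes and relationships (the node and relationship names are illustrative):

```python
def path_length(path):
    """Length of a path given as [node, rel, node, rel, ..., node]."""
    if len(path) % 2 == 0:
        raise ValueError("a path must start and end with a node")
    return len(path) // 2  # number of relationships

zero = ["keanu"]                       # one node, no relationship
one = ["keanu", "ACTS_IN", "matrix1"]  # two nodes joined by one relationship

print(path_length(zero))  # 0
print(path_length(one))   # 1
```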
Cypher
Cypher is a query language specific to Neo4j
  Easy to read and understand
  It uses patterns to represent core concepts in the property graph model
  It uses clauses to build queries; certain clauses and keywords are inspired by SQL
Cypher patterns: node
A single node
  A node is described using a pair of parentheses and is typically given an identifier
  E.g.: (n) means a node n
Related nodes
  Related nodes are expressed using an arrow between two nodes
  E.g.: (a)-->(b); (a)-->(b)<--(c); or (a)-->()<--(c)
Labels
  Label(s) can be attached to a node
  E.g.: (a:User)-->(b)
Specifying properties
  Properties are a list of name-value pairs enclosed in curly brackets
  E.g.: (a { name: "Andres", sport: "Brazilian Ju-Jitsu" })
http://neo4j.com/docs/stable/introduction-pattern.html
Cypher patterns: relationships
Basic relationships
  Direction is not important: (a)--(b)
  Named relationship: (a)-[r]->(b)
  Named and typed relationship: (a)-[r:REL_TYPE]->(b)
  Relationship that may belong to one of a set of types: (a)-[r:TYPE1|TYPE2]->(b)
  Typed but not named relationship: (a)-[:REL_TYPE]->(b)
Relationships of variable length
  (a)-[*2]->(b) describes a path of length 2 between node a and node b
  (a)-[*3..5]->(b) describes a path of minimum length 3 and maximum length 5 between node a and node b
  Either bound can be omitted: (a)-[*3..]->(b), (a)-[*..5]->(b)
  Both bounds can be omitted as well: (a)-[*]->(b)
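What a variable-length pattern matches can be sketched as a bounded depth-first search. The code below is an illustration under stated assumptions, not Neo4j's implementation. The function name and the sample edges are made up. An omitted lower bound is treated as 1, and a relationship is never used twice within one path, so the search ends even when the upper bound is omitted.

```python
def variable_length_paths(edges, start, end, min_len=1, max_len=None):
    """Directed paths from start to end using between min_len and max_len edges."""
    paths = []

    def extend(node, used, path):
        if node == end and len(used) >= min_len:
            paths.append(path)
        if max_len is not None and len(used) == max_len:
            return  # upper bound reached: do not go any deeper
        for i, (u, v) in enumerate(edges):
            if u == node and i not in used:  # follow each unused outgoing edge
                extend(v, used | {i}, path + [v])

    extend(start, frozenset(), [start])
    return paths

# Hypothetical directed edges, for illustration only.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

exactly_two = variable_length_paths(edges, "a", "c", min_len=2, max_len=2)  # like (a)-[*2]->(c)
up_to_five = variable_length_paths(edges, "a", "d", max_len=5)              # like (a)-[*..5]->(d)
at_least_three = variable_length_paths(edges, "a", "d", min_len=3)          # like (a)-[*3..]->(d)
print(exactly_two)
print(up_to_five)
print(at_least_three)
```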
Cypher Clauses - Create
CREATE
  Create nodes or relationships with properties
Create a node matrix1 with the label Movie
  CREATE (matrix1:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
Create a node keanu with the label Actor
  CREATE (keanu:Actor {name:'Keanu Reeves', born:1964})
Create a relationship ACTS_IN (keanu and matrix1 must already be bound in the same statement)
  CREATE (keanu)-[:ACTS_IN {roles:'Neo'}]->(matrix1)
Cypher – Read
MATCH … RETURN
Return all nodes
  MATCH (n) RETURN n
Return all nodes with a given label
  MATCH (movie:Movie) RETURN movie
Return all actors in the movie "The Matrix"
  MATCH (a:Actor)-[:ACTS_IN]->(m:Movie {title:"The Matrix"}) RETURN a.name
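To show what the last pattern asks for, here is a rough in-memory equivalent in Python. It mirrors the meaning of the pattern (label, relationship type, direction and property filter), not how Neo4j evaluates it. The dictionary layout and the function name actors_in are assumptions made for this sketch; the data comes from the CREATE examples.

```python
# Nodes keyed by an identifier; each has labels plus key-value properties.
nodes = {
    "keanu": {"labels": {"Actor"}, "name": "Keanu Reeves", "born": 1964},
    "matrix1": {"labels": {"Movie"}, "title": "The Matrix", "released": 1999},
}
# Relationships as (start, type, end, properties); direction is start -> end.
relationships = [("keanu", "ACTS_IN", "matrix1", {"roles": "Neo"})]

def actors_in(title):
    """Rough equivalent of (a:Actor)-[:ACTS_IN]->(m:Movie {title: ...}) RETURN a.name."""
    return [
        nodes[start]["name"]
        for start, rel_type, end, _props in relationships
        if rel_type == "ACTS_IN"
        and "Actor" in nodes[start]["labels"]
        and "Movie" in nodes[end]["labels"]
        and nodes[end].get("title") == title
    ]

print(actors_in("The Matrix"))  # ['Keanu Reeves']
```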
http://docs.neo4j.org/chunked/milestone/query-match.html
Cypher - Update
MATCH … SET/REMOVE … RETURN
Set the age property on all actor nodes
  MATCH (n:Actor)
  SET n.age = 2014 - n.born
  RETURN n
Remove a property
  MATCH (n:Actor)
  REMOVE n.age
  RETURN n
Remove a label
  MATCH (n:Actor {name:"Keanu Reeves"})
  REMOVE n:Actor
  RETURN n
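The effect of the three updates can be traced on a plain Python dictionary. This is only a sketch of the semantics; the node layout is an assumption for illustration, and the values come from the CREATE examples.

```python
actors = [{"labels": {"Actor"}, "name": "Keanu Reeves", "born": 1964}]

# SET n.age = 2014 - n.born, for every node labelled Actor
for n in actors:
    if "Actor" in n["labels"]:
        n["age"] = 2014 - n["born"]
age_after_set = actors[0]["age"]  # 2014 - 1964 = 50

# REMOVE n.age: the property disappears from the node
for n in actors:
    if "Actor" in n["labels"]:
        n.pop("age", None)

# REMOVE n:Actor, for the node named "Keanu Reeves": the node stays, the label goes
for n in actors:
    if "Actor" in n["labels"] and n["name"] == "Keanu Reeves":
        n["labels"].discard("Actor")

print(age_after_set)  # 50
print(actors[0])
```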
Cypher - Delete
MATCH … DELETE
Delete a relationship
  MATCH (n {name:"Keanu Reeves"})-[r:ACTS_IN]->()
  DELETE r
Delete a node and all of its relationships
  MATCH (m {title:'The Matrix'})-[r]-()
  DELETE m, r
Upcoming Assessment
Quiz tonight in the lab!
Group assignment instructions can be viewed in Elearning and edlms
  The data set can be downloaded from Elearning
  Work in groups of up to 3 students
  Both design and implementation parts
  Submit a report and demo your system in week 10
You can form groups now
  An Elearning link for group sign-up will be available soon
References
Ian Robinson, Jim Webber and Emil Eifrem, Graph Databases, Second Edition, O'Reilly Media Inc., June 2015
  You can download this book from the Neo4j site; http://www.neo4j.org/learn will redirect you to http://graphdatabases.com/
The Neo4j Manual v2.2.5
  The Neo4j Graph Database (http://neo4j.com/docs/stable/graphdb-neo4j.html)
  Cypher Query Language (http://neo4j.com/docs/stable/cypher-query-lang.html)
Noel Yuhanna, Market Overview: Graph Databases, Forrester White Paper, May 2015
Renzo Angles, A Comparison of Current Graph Database Models, ICDE Workshops 2012 (DOI: 10.1109/ICDEW.2012.31)
Renzo Angles and Claudio Gutierrez, Survey of Graph Database Models, ACM Computing Surveys, Vol. 40, No. 1, Article 1, February 2008 (DOI: 10.1145/1322432.1322433)