
Ying Zhou School of Information Technologies COMP5338 – Advanced Data Models Week 6: Graph Data Model and Introduction to Neo4j


Post on 12-Jan-2016



Ying Zhou, School of Information Technologies

COMP5338 – Advanced Data Models

Week 6: Graph Data Model and Introduction to Neo4j


COMP5338 "Advanced Data Models" - 2015 (A. Fekete & Y. Zhou) 06-2

Outline

Brief Review of Graphs
Examples of Graph Data
Graph Engines and Graph Databases
Introduction to Neo4j


Graphs

A graph is just a collection of vertices and edges
  A vertex is also called a node
  An edge is also called an arc or a link


Types of Graphs

Undirected graphs
  Edges have no orientation (direction)
  (a, b) is the same as (b, a)

Directed graphs
  Edges have an orientation (direction)
  (a, b) is not the same as (b, a)


Representing Graph Data

Data structures used to store graphs in programs
  Adjacency list
  Adjacency matrix


Adjacency List

[Figures: adjacency lists for a directed graph and for an undirected graph]


Adjacency Matrix

[Figures: adjacency matrices for a directed graph and for an undirected graph]
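The two representations can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the three-vertex graph and the function names `adjacency_list` and `adjacency_matrix` are made up for the example.

```python
def adjacency_list(vertices, edges, directed=True):
    """Map each vertex to the list of vertices it has an edge to."""
    adj = {v: [] for v in vertices}
    for a, b in edges:
        adj[a].append(b)
        if not directed:
            adj[b].append(a)  # in an undirected graph (a, b) is the same edge as (b, a)
    return adj


def adjacency_matrix(vertices, edges, directed=True):
    """Return a |V| x |V| matrix; entry [i][j] is 1 if there is an edge from i to j."""
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        if not directed:
            matrix[index[b]][index[a]] = 1
    return matrix


# Hypothetical example graph: a -> b, b -> c, c -> a
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "a")]

directed_list = adjacency_list(vertices, edges, directed=True)
undirected_list = adjacency_list(vertices, edges, directed=False)
directed_matrix = adjacency_matrix(vertices, edges, directed=True)
undirected_matrix = adjacency_matrix(vertices, edges, directed=False)
```

For the undirected graph the matrix is symmetric, and every edge appears in two adjacency lists. The list form takes space proportional to the number of edges, while the matrix always takes |V| x |V| entries.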


Examples of graphs

Social graphs
  Organization structure
  Facebook, LinkedIn, etc.
Computer network topologies
  Data centre layout
  Network routing tables
Road, rail and airline networks
Biological data


Social Graphs and extension

[Figure: a small social graph (page 2 of the graph database book)]

[Figure: related information can be captured in the graph (page 3 of the graph database book)]


Social Graph with Various Relationships

Page 19 of the graph database book

Transaction information

Page 23 of the graph database book

Typical way to store connected data

Who are Bob's friends?

SELECT p1.Person
FROM Person p1
JOIN PersonFriend pf ON pf.FriendID = p1.ID
JOIN Person p2 ON pf.PersonID = p2.ID
WHERE p2.Person = 'Bob'

Page 13 of the graph database book

RDBMS to store Graphs

Who are friends with Bob?

SELECT p1.Person
FROM Person p1
JOIN PersonFriend pf ON pf.PersonID = p1.ID
JOIN Person p2 ON pf.FriendID = p2.ID
WHERE p2.Person = 'Bob'
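The two Bob queries differ only in which side of the PersonFriend row is joined to Bob, which is easy to miss. The sketch below runs both against an in-memory SQLite database. The table layout follows the column names used in the queries, but the sample rows are an assumption made for illustration, not data from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Person TEXT);
    CREATE TABLE PersonFriend (PersonID INTEGER, FriendID INTEGER);
    -- Hypothetical sample data: each PersonFriend row is one direction only,
    -- read as "PersonID counts FriendID as a friend".
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (99, 'Zach');
    INSERT INTO PersonFriend VALUES (1, 2), (2, 1), (2, 99), (99, 1);
""")

# Bob is on the PersonID side: the people Bob counts as friends.
BOBS_FRIENDS = """
    SELECT p1.Person
    FROM Person p1
    JOIN PersonFriend pf ON pf.FriendID = p1.ID
    JOIN Person p2 ON pf.PersonID = p2.ID
    WHERE p2.Person = 'Bob'
"""

# Bob is on the FriendID side: the people who count Bob as a friend.
FRIENDS_WITH_BOB = """
    SELECT p1.Person
    FROM Person p1
    JOIN PersonFriend pf ON pf.PersonID = p1.ID
    JOIN Person p2 ON pf.FriendID = p2.ID
    WHERE p2.Person = 'Bob'
"""

bobs_friends = sorted(row[0] for row in conn.execute(BOBS_FRIENDS))
friends_with_bob = sorted(row[0] for row in conn.execute(FRIENDS_WITH_BOB))
print(bobs_friends, friends_with_bob)
```

With this sample data the two result sets differ, because Bob lists Zach as a friend but Zach does not list Bob.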


RDBMS to store Graphs

Who are Alice's friends-of-friends?

SELECT p1.Person AS PERSON, p2.Person AS FRIEND_OF_FRIEND
FROM PersonFriend pf1
JOIN Person p1 ON pf1.PersonID = p1.ID
JOIN PersonFriend pf2 ON pf2.PersonID = pf1.FriendID
JOIN Person p2 ON pf2.FriendID = p2.ID
WHERE p1.Person = 'Alice' AND pf2.FriendID <> p1.ID
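The friends-of-friends query can be tried out the same way. The sketch below is self-contained and again uses made-up sample rows (an assumption, not lecture data). It shows that each extra hop in the graph costs one more self-join on PersonFriend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Person TEXT);
    CREATE TABLE PersonFriend (PersonID INTEGER, FriendID INTEGER);
    -- Hypothetical sample data, one row per direction of a friendship.
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (99, 'Zach');
    INSERT INTO PersonFriend VALUES (1, 2), (2, 1), (2, 99), (99, 1);
""")

FRIENDS_OF_FRIENDS = """
    SELECT p1.Person AS PERSON, p2.Person AS FRIEND_OF_FRIEND
    FROM PersonFriend pf1                                  -- hop 1: Alice -> friend
    JOIN Person p1 ON pf1.PersonID = p1.ID
    JOIN PersonFriend pf2 ON pf2.PersonID = pf1.FriendID   -- hop 2: friend -> friend of friend
    JOIN Person p2 ON pf2.FriendID = p2.ID
    WHERE p1.Person = 'Alice' AND pf2.FriendID <> p1.ID    -- do not report Alice herself
"""

friends_of_friends = conn.execute(FRIENDS_OF_FRIENDS).fetchall()
print(friends_of_friends)
```

Here Alice reaches Zach through Bob. The path back from Bob to Alice is filtered out by the last condition in the WHERE clause.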


Graph Technologies

Graph processing
  Takes data in any input format and performs graph-related operations
  OLAP – Online Analytical Processing of graph data
  Google Pregel, Apache Giraph

Graph databases
  Manage, query and process graph data
  Support a high-level query language
  Native storage of graph data
  OLTP – Online Transaction Processing – possible
  OLAP – also possible


Data Models of Graph Databases

RDF (Resource Description Framework) model
  Expresses a node-edge relation as a "subject, predicate, object" triple (an RDF statement)
  SPARQL query language
  Examples: AllegroGraph, Apache Jena

Property graph model
  Expresses nodes and edges as object-like entities; both can have properties
  Various query languages
  Examples
    Titan
      • Supports various NoSQL storage engines: BerkeleyDB, Cassandra, HBase
      • Structural query language: Gremlin
    Neo4j
      • Native storage manager for graph data (index-free adjacency)
      • Declarative query language: Cypher


Resource Description Framework

Proposed in 1999 by the W3C to describe resources on the web, as part of its Semantic Web activity
  Designed to be read and understood by computers
  A major use case is to write ontologies and to support inference

RDF identifies resources using URIs and describes resources with properties and property values

Descriptions are a collection of triples
  Subject – what does it describe (the resource)
  Predicate – what property of the subject
  Object – what is the value of the property

Example – description of a web page
  Subject – index.html
  Predicate – creation date
  Object – 3 September, 2013
  index.html has a creation-date of 3 September, 2013


RDF/XML

An XML syntax for RDF

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date>September 3, 2013</exterms:creation-date>
  </rdf:Description>
</rdf:RDF>

Describes the following:

http://www.example.org/index.html has a creation-date of September 3, 2013


RDF/Triples

RDF-based graph storage systems store and retrieve triples using SPARQL, an SQL-like declarative query language

Four triples
  Pi foaf:name "Pi"
  Pi foaf:age "14"
  Pi foaf:knows R.Parker
  R.Parker foaf:name "Richard Parker"


<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <foaf:Person rdf:nodeID="Pi">
    <foaf:name>Pi</foaf:name>
    <foaf:age>14</foaf:age>
    <foaf:knows>
      <foaf:Person rdf:nodeID="R.Parker">
        <foaf:name>Richard Parker</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
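To make the triple idea concrete, the four triples listed on this slide can be held as plain Python tuples and queried by pattern. This is only a rough sketch of the idea behind triple matching, not SPARQL and not how any particular RDF store works. The helper names `match` and `names_of_people_known_by` are hypothetical.

```python
# The four triples from the slide, as (subject, predicate, object) tuples.
triples = [
    ("Pi", "foaf:name", "Pi"),
    ("Pi", "foaf:age", "14"),
    ("Pi", "foaf:knows", "R.Parker"),
    ("R.Parker", "foaf:name", "Richard Parker"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def names_of_people_known_by(triples, subject):
    """Follow foaf:knows from the subject, then look up foaf:name of each target."""
    known = [o for (_, _, o) in match(triples, subject, "foaf:knows")]
    return [o for k in known for (_, _, o) in match(triples, k, "foaf:name")]


everything_about_pi = match(triples, subject="Pi")
known_names = names_of_people_known_by(triples, "Pi")
print(known_names)
```

The second helper chains two patterns through a shared value, which is the same shape of question a SPARQL query with two triple patterns would ask.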


Property Graph Model

Proposed by Neo Technology
No standard definition or specification
Both nodes and edges can have properties
  The RDF model cannot express edge properties in a natural and easy-to-understand way
The actual storage varies
The query language varies

[Figure: an edge carrying the properties since: 1 year and common: sailing]


Neo4j

Native graph storage using the property graph model
  Index-free adjacency
    Nodes and relationships are stored
  Supports property indexes

Replication
  Single master, multiple slaves replication
    A Neo4j cluster is limited to the master/slave replication configuration
    The database engine is not distributed

Cypher – query language
  Also supports SPARQL

We will discuss the internals and indexing next week


Property Graph Model

A property graph has the following characteristics
  It contains nodes and relationships
  Nodes contain properties
    Properties are stored in the form of key-value pairs
    A node can have labels
  Relationships connect nodes
    A relationship has a direction, a type/name, a start node and an end node
    No dangling relationships (can't delete a node with a relationship)
  Properties
    Both nodes and relationships have properties
    Useful in modeling and querying based on properties of relationships


http://docs.neo4j.org/chunked/milestone/graphdb-neo4j.html
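The characteristics above can be mirrored in a small in-memory model. This is a teaching sketch under stated assumptions, not Neo4j's storage format or API. The class names `Node`, `Relationship` and `PropertyGraph` and their methods are invented for the example; the sample data reuses the Keanu Reeves and The Matrix values from the Cypher slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)         # a node can have zero or more labels
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    rel_type: str                                    # type/name
    start: int                                       # start node id
    end: int                                         # end node id (direction is start -> end)
    properties: dict = field(default_factory=dict)   # relationships carry properties too


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels=(), **properties):
        self.nodes[node_id] = Node(node_id, set(labels), dict(properties))
        return self.nodes[node_id]

    def add_relationship(self, start, rel_type, end, **properties):
        # Both end points must exist, so a relationship can never dangle.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("start and end nodes must exist")
        rel = Relationship(rel_type, start, end, dict(properties))
        self.relationships.append(rel)
        return rel

    def delete_node(self, node_id):
        # Refuse to delete a node that still has a relationship attached.
        if any(node_id in (r.start, r.end) for r in self.relationships):
            raise ValueError("node still has relationships")
        del self.nodes[node_id]


g = PropertyGraph()
g.add_node(1, ["Actor"], name="Keanu Reeves", born=1964)
g.add_node(2, ["Movie"], title="The Matrix", released=1999)
acts_in = g.add_relationship(1, "ACTS_IN", 2, roles="Neo")

try:
    g.delete_node(2)
    deleted = True
except ValueError:
    deleted = False  # the ACTS_IN relationship blocks the delete
```

Removing the relationship first and then deleting the node succeeds, which is the order the "no dangling relationships" rule forces.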


Property Graph Model: Example

Property Graph Model: Nodes

http://docs.neo4j.org/chunked/milestone/graphdb-neo4j-nodes.html

Property Graph Model: Relationships

http://docs.neo4j.org/chunked/milestone/graphdb-neo4j-relationships.html

Property Graph Model: Properties

http://docs.neo4j.org/chunked/milestone/graphdb-neo4j-properties.html

Property Graph Model: Labels


A node may be labeled with any number of labels, including none, making labels an optional addition to the graph

http://docs.neo4j.org/chunked/milestone/graphdb-neo4j-labels.html#_label_names


Property Graph Model: Paths

A path is one or more nodes with connecting relationships, typically retrieved as a query or traversal result.

[Figures: a path of length zero; a path of length one]

Cypher

Cypher is a query language specific to Neo4j
Easy to read and understand
It uses patterns to represent core concepts in the property graph model
It uses clauses to build queries; certain clauses and keywords are inspired by SQL


Cypher patterns: node

A single node
  A node is described using a pair of parentheses, and is typically given an identifier
  E.g.: (n) means a node n

Related nodes
  Related nodes are expressed using an arrow between two nodes
  E.g.: (a)-->(b); (a)-->(b)<--(c); or (a)-->()<--(c)

Labels
  Label(s) can be attached to a node
  E.g.: (a:User)-->(b)

Specifying properties
  Properties are a list of name-value pairs enclosed in curly brackets
  E.g.: (a { name: "Andres", sport: "Brazilian Ju-Jitsu" })

http://neo4j.com/docs/stable/introduction-pattern.html

Cypher patterns: relationships

Basic relationships
  Direction is not important: (a)--(b)
  Named relationship: (a)-[r]->(b)
  Named and typed relationship: (a)-[r:REL_TYPE]->(b)
  Relationship that may belong to one of a set of types: (a)-[r:TYPE1|TYPE2]->(b)
  Typed but not named relationship: (a)-[:REL_TYPE]->(b)

Relationships of variable length
  (a)-[*2]->(b) describes a path of length 2 between node a and node b
  (a)-[*3..5]->(b) describes a path of minimum length 3 and maximum length 5 between node a and node b
  Either bound can be omitted: (a)-[*3..]->(b), (a)-[*..5]->(b)
  Both bounds can be omitted as well: (a)-[*]->(b)
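The meaning of the hop bounds can be illustrated with a small Python search over a directed graph. This is a sketch of the idea only and says nothing about how Neo4j evaluates such patterns. The function `variable_length_paths` and the four-node graph are hypothetical. The sketch assumes no parallel edges and never reuses a relationship within one path.

```python
def variable_length_paths(graph, start, end, min_hops=1, max_hops=None):
    """Return all directed paths from start to end whose length (number of
    relationships) lies between min_hops and max_hops inclusive.
    graph maps each node to the list of nodes it points to."""
    found = []

    def walk(node, path, used_edges):
        hops = len(path) - 1  # path length = number of relationships
        if node == end and hops >= min_hops:
            found.append(path)
        if max_hops is not None and hops == max_hops:
            return  # upper bound reached, do not extend further
        for neighbour in graph.get(node, []):
            edge = (node, neighbour)
            if edge not in used_edges:  # each relationship at most once per path
                walk(neighbour, path + [neighbour], used_edges | {edge})

    walk(start, [start], frozenset())
    return found


# Hypothetical graph: a -> b -> c -> d, plus a shortcut a -> c
graph = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": []}

exactly_two = variable_length_paths(graph, "a", "d", 2, 2)    # like (a)-[*2]->(d)
three_to_five = variable_length_paths(graph, "a", "d", 3, 5)  # like (a)-[*3..5]->(d)
any_length = variable_length_paths(graph, "a", "d")           # like (a)-[*]->(d)
```

In this graph the two-hop pattern finds only the shortcut a, c, d, the three-to-five pattern finds only a, b, c, d, and the unbounded pattern finds both.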


Cypher Clauses - Create

CREATE
  Create nodes or relationships with properties

Create a node matrix1 with the label Movie
  CREATE (matrix1:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})

Create a node keanu with the label Actor
  CREATE (keanu:Actor {name:'Keanu Reeves', born:1964})

Create a relationship ACTS_IN
  CREATE (keanu)-[:ACTS_IN {roles:'Neo'}]->(matrix1)


Cypher – Read

MATCH … RETURN

Return all nodes

MATCH (n) RETURN n

Return all nodes with a given label

MATCH (movie:Movie) RETURN movie

Return all actors in the movie “The Matrix”

MATCH (a:Actor)-[:ACTS_IN]->(m:Movie{title:"The Matrix"}) RETURN a.name

http://docs.neo4j.org/chunked/milestone/query-match.html

Cypher - Update

MATCH … SET/REMOVE … RETURN

Set the age property for all actor nodes
  MATCH (n:Actor)
  SET n.age = 2014 - n.born
  RETURN n

Remove a property
  MATCH (n:Actor)
  REMOVE n.age
  RETURN n

Remove a label
  MATCH (n:Actor {name:"Keanu Reeves"})
  REMOVE n:Actor
  RETURN n


Cypher - Delete

MATCH … DELETE

Delete a relationship
  MATCH (n {name:"Keanu Reeves"})-[r:ACTS_IN]->()
  DELETE r

Delete a node and all of its relationships
  MATCH (m {title:'The Matrix'})-[r]-()
  DELETE m, r


Upcoming Assessment

Quiz tonight in the lab!
Group assignment instructions can be viewed in Elearning and edlms
  The data set can be downloaded from Elearning
  Work in groups of up to 3 students
  Both design and implementation parts
  Submit a report and demo your system
    In week 10
You can form groups now
  An Elearning link will be available for group sign-up soon


References

Ian Robinson, Jim Webber and Emil Eifrem, Graph Databases, Second Edition, O'Reilly Media Inc., June 2015
  You can download this book from the Neo4j site; http://www.neo4j.org/learn will redirect you to http://graphdatabases.com/
The Neo4j Manual v2.2.5
  The Neo4j Graph Database (http://neo4j.com/docs/stable/graphdb-neo4j.html)
  Cypher Query Language (http://neo4j.com/docs/stable/cypher-query-lang.html)
Noel Yuhanna, Market Overview: Graph Databases, Forrester White Paper, May 2015
Renzo Angles, A Comparison of Current Graph Database Models, ICDE Workshops 2012 (DOI: 10.1109/ICDEW.2012.31)
Renzo Angles and Claudio Gutierrez, Survey of Graph Database Models, ACM Computing Surveys, Vol. 40, No. 1, Article 1, February 2008 (DOI: 10.1145/1322432.1322433)
