Hands on Training – Graph Database with Neo4j

Posted on 18-Jan-2017


www.serendio.com

Content
• Introduction
• The Graph Database: Neo4j
• Neo4j - Cypher Query Language
  – CRUD operations
• Use cases
  – Mumbai Local Train
  – Movie Recommendation
  – Email Analytics
• Neo4j Tools: Import, Visualization
• Conclusion

Introduction to NoSQL

NoSQL = Not Only SQL

What is a Graph?

[Figure: an example graph with vertices V1 to V7 connected by links]

A set of objects connected by links.

Graph-Based Technologies in the Big Data/NoSQL Domain

Storage & Traversal/Query
• Neo4j
• TitanDB
• OrientDB

Processing/Computation Engines
• Apache Giraph
• GraphLab
• Apache Spark Graph ML/GraphX

Graph Databases

• A database that follows a graph structure
• Each node knows its adjacent nodes
• As the number of nodes increases, the cost of a local step remains the same
• Indexes for lookups
• Optimized for traversing connected data
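A minimal Python sketch of "each node knows its adjacent nodes": every node keeps its own neighbour list, so one local step reads only that list instead of searching a global table. This is a toy in-memory structure, not Neo4j's storage engine; the class name and the edges are hypothetical, with vertex names borrowed from the "What is Graph?" slide.

```python
class AdjacencyGraph:
    """Toy graph in which every node stores its own neighbour list."""

    def __init__(self):
        self._adj = {}  # node -> list of adjacent nodes

    def add_edge(self, a, b):
        # Record the link on both endpoints so either side can step to the other.
        self._adj.setdefault(a, []).append(b)
        self._adj.setdefault(b, []).append(a)

    def neighbours(self, node):
        # One local step: read this node's own list. The work grows with the
        # node's degree, not with the total number of nodes in the graph.
        return list(self._adj.get(node, []))


# Hypothetical edges between some of the slide's vertices.
g = AdjacencyGraph()
for a, b in [("V1", "V2"), ("V1", "V4"), ("V2", "V3"), ("V4", "V5")]:
    g.add_edge(a, b)

print(g.neighbours("V1"))  # ['V2', 'V4']
```

Adding more unrelated nodes to `g` would not change the cost of `g.neighbours("V1")`, which is the property the slide describes.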

Neo4j

• Graph database from Neo Technology
• A schema-free labeled property graph database + Lucene index
• Perfect for complex, highly connected data
• Reliable, with real ACID transactions
• Scalable: billions of nodes and relationships; scale out with a highly available Neo4j Cluster
• Server with REST API, or embeddable
• Declarative query language (Cypher)

Neo4j: Strengths & Weaknesses

Strengths
• Powerful data model
• Whiteboard friendly
• Fast for connected data
• Easy to query

Weaknesses
• Sharding
• Requires a conceptual shift (graph-like thinking)

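Four Building Blocks

• Nodes
• Relationships
• Properties
• Labels

[Figure: (:USER)-[:RELATIVE]-(:PET). The USER node has Name: Mike; the PET node has Animal: Dog and Name: Apple; Age: 25 is also shown as a node property; the RELATIVE relationship has the property Relation: Owner.]

Serendio Proprietary and Confidential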
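To make the four building blocks concrete, here is a small Python model of a labeled property graph: nodes carry a set of labels plus a property map, and relationships carry a type, a start node, an end node and their own properties. The class names are hypothetical and this is not Neo4j's API; the USER to PET direction is an assumption, since the slide shows no arrow.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set[str]                                   # e.g. {"USER"}
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    type: str                                          # e.g. "RELATIVE"
    start: Node
    end: Node
    properties: dict[str, object] = field(default_factory=dict)


mike = Node({"USER"}, {"Name": "Mike"})
apple = Node({"PET"}, {"Animal": "Dog", "Name": "Apple"})
# Direction USER -> PET is assumed for illustration.
rel = Relationship("RELATIVE", mike, apple, {"Relation": "Owner"})

print(rel.start.properties["Name"], rel.type, rel.end.properties["Name"])
# Mike RELATIVE Apple
```

Relationships here are first-class objects with their own properties, which is what distinguishes a property graph from a plain adjacency list. The builtin generics (`set[str]`, `dict[...]`) need Python 3.9 or later.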

SQL to Graph DB: Data Model Transformation

SQL                   Graph DB
Table                 Type of node
Rows of table         Nodes
Columns of table      Node properties
Foreign keys, joins   Relationships

SQL to Graph DB: Data Model Transformation

Table: Actor
Name           Movie Language
Rajnikant      Tamil
Maheshbabu     Telugu
Vijay          Tamil
Prabhas        Telugu

Table: Movie
Name           Lead Actor
Bahubali       Prabhas
Puli           Vijay
Shrimanthudu   Maheshbabu
Robot          Rajnikant

[Figure: the same data as a graph. ACTOR nodes (Name: Prabhas, Movie Language: Telugu) and (Name: Rajnikant, Movie Language: Tamil) are connected by LEAD_ACTOR relationships to MOVIE nodes (Name: Bahubali) and (Name: Robot).]
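The transformation above can be sketched in Python: each row becomes a node labeled after its table, columns become properties, and the foreign-key column (Lead Actor) becomes a LEAD_ACTOR relationship instead of a stored property. The function name and the ACTOR to MOVIE direction are assumptions for illustration.

```python
# Rows copied from the Actor and Movie tables above.
actor_rows = [("Rajnikant", "Tamil"), ("Maheshbabu", "Telugu"),
              ("Vijay", "Tamil"), ("Prabhas", "Telugu")]
movie_rows = [("Bahubali", "Prabhas"), ("Puli", "Vijay"),
              ("Shrimanthudu", "Maheshbabu"), ("Robot", "Rajnikant")]


def tables_to_graph(actor_rows, movie_rows):
    """Apply the mapping: row -> node, column -> property, foreign key -> relationship."""
    nodes = {}
    for name, language in actor_rows:
        # Table name becomes the node label; columns become properties.
        nodes[("ACTOR", name)] = {"Name": name, "Movie Language": language}
    for name, _lead_actor in movie_rows:
        # The foreign-key column is not copied as a property.
        nodes[("MOVIE", name)] = {"Name": name}

    relationships = []
    for movie, lead_actor in movie_rows:
        if ("ACTOR", lead_actor) not in nodes:
            raise KeyError(f"dangling foreign key: {lead_actor}")
        # The join becomes an explicit, stored relationship.
        relationships.append((("ACTOR", lead_actor), "LEAD_ACTOR", ("MOVIE", movie)))
    return nodes, relationships


nodes, relationships = tables_to_graph(actor_rows, movie_rows)
print(len(nodes), len(relationships))  # 8 4
```

Once the join is stored as a relationship, "which movies did Prabhas lead?" is a single hop from his node rather than a join at query time.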

Interact with Neo4j

• Web interface
  – http://IP:7474/browser/
  – http://IP:7474/webadmin/
• Neo4j Console
• REST API
• Java native libraries

How to Query a Graph Database?

• Graph query languages
  – Cypher
  – Gremlin

A pattern-matching query language for graphs

Cypher

Cypher Query Language

• Declarative
• SQL-inspired
• Pattern based

[Figure: Apple --LIKES--> Orange]

(Apple:FRUIT) - [connect:RELATIVE] -> (Orange:FRUIT)

Cypher: Getting Started

Structure:
• Similar to SQL
• Most common clauses:
  – MATCH: the graph pattern to match
  – WHERE: add constraints or filters
  – RETURN: what to return

Cypher: Frequently Used Queries

• Get the whole database: MATCH (n) RETURN n

• Delete the whole database: MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r

CRUD Operations

Copy the code from the link and paste it into the Neo4j Web Browser.

MATCH:
• MATCH (n) RETURN n
• MATCH (movie:Movie) RETURN movie
• MATCH (movie:Movie { title: 'Bahubali' }) RETURN movie
• MATCH (director { name:'Rajamouli' })--(movie) RETURN movie.title
• MATCH (raj:Person { name:'Rajamouli' })--(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli' })-->(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli' })<--(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli' })-[:DIRECTED]->(movie:Movie) RETURN movie
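The last four MATCH queries differ only in the relationship pattern: `--` ignores direction, `-->` follows relationships leaving Rajamouli, `<--` follows relationships arriving at him, and `-[:DIRECTED]->` also restricts the relationship type. The Python sketch below mimics that over a hypothetical edge list (the sample data and the `related` helper are illustrative only, and labels are ignored). If the data stores (director)-[:DIRECTED]->(movie), the `<--` form returns nothing.

```python
# Hypothetical sample data as (start, type, end) triples.
edges = [
    ("Rajamouli", "DIRECTED", "Bahubali"),
    ("Prabhas", "ACTED_IN", "Bahubali"),
]


def related(person, direction="any", rel_type=None):
    """Mimic (raj)--(movie), (raj)-->(movie), (raj)<--(movie) and (raj)-[:TYPE]->(movie).

    direction is "any", "out" or "in"; labels are ignored in this sketch.
    """
    result = []
    for start, rtype, end in edges:
        if rel_type is not None and rtype != rel_type:
            continue
        if direction in ("any", "out") and start == person:
            result.append(end)      # relationship points away from the person
        if direction in ("any", "in") and end == person:
            result.append(start)    # relationship points towards the person
    return result


print(related("Rajamouli", "any"))              # ['Bahubali']
print(related("Rajamouli", "out"))              # ['Bahubali']
print(related("Rajamouli", "in"))               # []
print(related("Rajamouli", "out", "DIRECTED"))  # ['Bahubali']
```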

CRUD Operations

WHERE:
• MATCH (n) WHERE n:Movie RETURN n
• MATCH (n) WHERE n.name <> 'Prabhas' RETURN n
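A subtlety of the second WHERE query: a node without a `name` property is not returned. In Cypher a missing property reads as null, `null <> 'Prabhas'` evaluates to null, and WHERE treats null as false. A small Python sketch of both filters, with `None` standing in for null; the nodes and the helper name are hypothetical.

```python
def cypher_neq(a, b):
    """`<>` with Cypher's null rule: any comparison involving null is null (None)."""
    if a is None or b is None:
        return None
    return a != b


# Hypothetical nodes; the Movie node has no `name` property.
nodes = [
    {"labels": {"Person"}, "name": "Prabhas"},
    {"labels": {"Person"}, "name": "Rajamouli"},
    {"labels": {"Movie"}, "title": "Bahubali"},
]

# MATCH (n) WHERE n:Movie RETURN n
movies = [n for n in nodes if "Movie" in n["labels"]]

# MATCH (n) WHERE n.name <> 'Prabhas' RETURN n
# A missing property reads as null, and WHERE keeps a row only if the result is true.
not_prabhas = [n for n in nodes if cypher_neq(n.get("name"), "Prabhas") is True]

print([n.get("title") for n in movies])    # ['Bahubali']
print([n["name"] for n in not_prabhas])    # ['Rajamouli']
```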

CRUD Operations

Let's clean the database:

MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r

CRUD Operations

CREATE: Nodes:
• CREATE (n)
• CREATE (n),(m)
• CREATE (n:Person)
• CREATE (n:Person:Swedish)
• CREATE (n:Person { name : 'Andres', title : 'Developer' })
• CREATE (a:Person { name : 'Roman' }) RETURN a

CRUD Operations

CREATE: Relationships:
• MATCH (a:Person),(b:Person)
  WHERE a.name = 'Roman' AND b.name = 'Andres'
  CREATE (a)-[r:RELTYPE]->(b)
  RETURN r
• MATCH (a:Person),(b:Person)
  WHERE a.name = 'Roman' AND b.name = 'Andres'
  CREATE (a)-[r:RELTYPE { name : a.name + '<->' + b.name }]->(b)
  RETURN r

CRUD Operations

CREATE: Relationships (a whole path):
• CREATE p = (andres { name:'Andres' })-[:WORKS_AT]->(neo)<-[:WORKS_AT]-(michael { name:'Michael' })
  RETURN p

CRUD Operations

UPDATE: Labels and Properties:
• MATCH (n:Person { name : 'Andres' }) SET n:Person:Coder
• MATCH (n:Person { name : 'Andres', title : 'Developer' }) SET n.title = 'Mang'

CRUD Operations

DELETE:
• MATCH (n:Person) WHERE n.name = 'Andres' DELETE n
• MATCH (n { name: 'Andres' })-[r]-() DELETE n, r
• MATCH (n:Person) DELETE n
• MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
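Why some DELETE forms also match `r`: Neo4j will not delete a node that still has relationships (the transaction fails), so the first form fails for Andres if the RELTYPE relationship created earlier still exists, while the second and fourth forms delete the relationships in the same statement. A toy Python model of that rule; the class and data are hypothetical.

```python
class ToyGraph:
    """Hypothetical in-memory graph that enforces Neo4j's delete rule."""

    def __init__(self, nodes, rels):
        self.nodes = set(nodes)
        self.rels = set(rels)  # (start, type, end) triples

    def delete(self, nodes, rels=()):
        remaining = self.rels - set(rels)
        # Check everything before changing anything, like a transaction.
        for n in nodes:
            if any(n in (start, end) for start, _type, end in remaining):
                raise ValueError(f"cannot delete {n!r}: it still has relationships")
        self.rels = remaining
        self.nodes -= set(nodes)


g = ToyGraph({"Andres", "Roman"}, {("Roman", "RELTYPE", "Andres")})
try:
    g.delete(["Andres"])  # like: MATCH (n:Person) WHERE n.name = 'Andres' DELETE n
except ValueError as err:
    print(err)
# like: MATCH (n { name: 'Andres' })-[r]-() DELETE n, r
g.delete(["Andres"], [("Roman", "RELTYPE", "Andres")])
print(g.nodes)  # {'Roman'}
```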

Functions

Predicates:
• ALL(identifier IN collection WHERE predicate)
• ANY(identifier IN collection WHERE predicate)
• NONE(identifier IN collection WHERE predicate)
• SINGLE(identifier IN collection WHERE predicate)
• EXISTS(pattern-or-property)

Scalar Functions:
• LENGTH(collection/pattern expression)
• TYPE(relationship)
• ID(property-container)
• COALESCE(expression [, expression]*)
• HEAD(expression)
• LAST(expression)
• TIMESTAMP()
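Rough Python equivalents of several predicate and scalar functions, to show what each computes. They are illustrative helpers with hypothetical names, and they ignore how Cypher treats null elements inside a list.

```python
def cy_all(items, pred):
    return all(pred(x) for x in items)             # ALL(x IN items WHERE pred)


def cy_any(items, pred):
    return any(pred(x) for x in items)             # ANY(x IN items WHERE pred)


def cy_none(items, pred):
    return not any(pred(x) for x in items)         # NONE(x IN items WHERE pred)


def cy_single(items, pred):
    return sum(1 for x in items if pred(x)) == 1   # SINGLE(x IN items WHERE pred)


def coalesce(*values):
    return next((v for v in values if v is not None), None)  # first non-null value


def head(items):
    return items[0] if items else None             # HEAD of an empty list is null


def last(items):
    return items[-1] if items else None            # LAST of an empty list is null


ratings = [4, 5, 3]  # hypothetical ratings
print(cy_all(ratings, lambda r: r >= 3), cy_single(ratings, lambda r: r == 5))  # True True
print(coalesce(None, "unknown"), head(ratings), last(ratings))  # unknown 4 3
```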

Functions

Collection Functions:
• NODES(path)
• RELATIONSHIPS(path)
• LABELS(node)
• FILTER(identifier IN collection WHERE predicate)
• REDUCE(accumulator = initial, identifier IN collection | expression)

Mathematical Functions:
• ABS(expression)
• COS(expression)
• LOG(expression)
• ROUND(expression)
• SQRT(expression)
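FILTER and REDUCE map directly onto a list comprehension and a fold. Newer Cypher versions also write FILTER as the list comprehension `[r IN ratings WHERE r > 3]`. The REDUCE-then-SQRT pattern below is the same one the cosine-similarity query in Use Case 2 uses for vector lengths. The ratings are hypothetical.

```python
from functools import reduce
from math import sqrt

ratings = [4, 5, 3]  # hypothetical ratings

# FILTER(r IN ratings WHERE r > 3)
high = [r for r in ratings if r > 3]

# REDUCE(acc = 0.0, r IN ratings | acc + r^2): fold the list into one value
sum_of_squares = reduce(lambda acc, r: acc + r ** 2, ratings, 0.0)

# SQRT(...) of that fold is the vector length
length = sqrt(sum_of_squares)

print(high, sum_of_squares)  # [4, 5] 50.0
```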

Neo4j in Action

Use Cases

Use case 1: Mumbai Local Train*

Problem
• Four main railway lines: Western, Central, Harbour and Trans-Harbour.
• Each line serves various sections of the city.
• To travel across sections, one must change lines at various interchange stations.
• Find the shortest path from the source station to the destination station.

* https://gist.github.com/luanne/8159102

Use case 1: Mumbai Local Train (contd.)

Use case 1: Mumbai Local Train (contd.)

Solution:
• Create the railway network graph.
• Use a shortest-path algorithm between the source and destination stations.

Use case 1: Mumbai Local Train (contd.)

Graph Database Model:

[Figure: (Station)-[:NEXT]->(Station)]

Use case 1: Mumbai Local Train (contd.)

Create Graph
• Open the file from the link below, copy and paste it into Neo4j, and run it.

Use case 1: Mumbai Local Train (contd.)

• Query 1: The graph
  MATCH (n) RETURN n

• Query 2: Route from Churchgate to Vashi
  MATCH (s1 {name:"Churchgate"}), (s2 {name:"Vashi"}), p = shortestPath((s1)-[:NEXT*]->(s2)) RETURN p

• Query 3: Route from Santa Cruz to Dockyard Road
  MATCH (s1 {name:"Santa Cruz"}), (s2 {name:"Dockyard Road"}), p = shortestPath((s1)-[:NEXT*]-(s2)) RETURN p
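`shortestPath` returns a path with the fewest relationships. As an illustration only (not Neo4j's implementation), here is a breadth-first search in Python over a made-up five-station network named A to E, since the gist's station data is not reproduced here. The `directed` flag mirrors the difference between Query 2 (`->`, follow NEXT only forwards) and Query 3 (`-`, ignore direction).

```python
from collections import deque


def shortest_path(next_links, source, target, directed=True):
    """Breadth-first search over NEXT links; returns the station list with the
    fewest hops, or None if target is unreachable."""
    adj = {}
    for a, b in next_links:
        adj.setdefault(a, []).append(b)
        if not directed:                 # like -[:NEXT*]- : ignore direction
            adj.setdefault(b, []).append(a)

    previous = {source: None}            # also serves as the visited set
    queue = deque([source])
    while queue:
        station = queue.popleft()
        if station == target:
            path = []
            while station is not None:   # walk back to the source
                path.append(station)
                station = previous[station]
            return path[::-1]
        for nxt in adj.get(station, []):
            if nxt not in previous:
                previous[nxt] = station
                queue.append(nxt)
    return None


# Hypothetical network: A->B->C->D is one line, A->E->D is a shorter branch.
links = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]
print(shortest_path(links, "A", "D"))                  # ['A', 'E', 'D']
print(shortest_path(links, "D", "A"))                  # None
print(shortest_path(links, "D", "A", directed=False))  # ['D', 'E', 'A']
```

The second call shows why the direction in the pattern matters: if NEXT only points one way, a directed search against that direction finds no route.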

Use Case 2: Movie Recommendation*

Problem:
• We are running an IMDb-type website.
• We have a dataset of movie ratings given by users.
• Our problem is to generate a list of movies to recommend to each user.

* http://www.neo4j.org/graphgist?8173017

Use Case 2: Movie Recommendation (contd.)

Solution:
• Find people who have given similar ratings to the movies both of them have watched.
• Then recommend movies that one of them has not seen and the other has rated highly.
• Cosine similarity to calculate the similarity between users.
• k-nearest neighbors (k-NN) to find similar users.
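A minimal Python sketch of both steps, computed the same way as the Cypher similarity query in this use case: the dot product and both vector lengths are taken only over movies that both users rated. User names other than Nishant and all ratings are hypothetical, and ratings are assumed to be positive (an all-zero vector would divide by zero).

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over co-rated movies; None if there is no overlap."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return None
    dot = sum(ratings_a[m] * ratings_b[m] for m in common)
    len_a = sqrt(sum(ratings_a[m] ** 2 for m in common))
    len_b = sqrt(sum(ratings_b[m] ** 2 for m in common))
    return dot / (len_a * len_b)


def k_nearest(user, all_ratings, k):
    """k-NN: the k other users with the highest similarity to `user`."""
    scored = []
    for other, ratings in all_ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(all_ratings[user], ratings)
        if sim is not None:
            scored.append((other, sim))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


ratings = {
    "Nishant": {"Bahubali": 5, "Robot": 4, "Puli": 1},
    "Asha": {"Bahubali": 5, "Robot": 4, "Shrimanthudu": 5},
    "Ravi": {"Bahubali": 1, "Puli": 5},
}
print(k_nearest("Nishant", ratings, k=1))
```

Because only co-rated movies count, two users who share a single movie always score 1.0 with positive ratings; that is a limitation of this formulation, not of the sketch.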

Use Case 2: Movie Recommendation (contd.)

• Cosine similarity:

• k-NN:
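The formula images from this slide were not recoverable. The standard definitions, written to match the cosine-similarity query in this use case (which sums only over M_xy, the set of movies rated by both users x and y; r_{x,m} is the rating x gave movie m, and U is the set of users), are:

```latex
\mathrm{sim}(x, y) = \frac{\sum_{m \in M_{xy}} r_{x,m}\, r_{y,m}}
  {\sqrt{\sum_{m \in M_{xy}} r_{x,m}^{2}}\;\sqrt{\sum_{m \in M_{xy}} r_{y,m}^{2}}}

N_k(x) = \{\, y_1, \dots, y_k \,\} \subseteq U \setminus \{x\}, \quad
\mathrm{sim}(x, y_i) \ge \mathrm{sim}(x, z)
\;\; \text{for all } y_i \in N_k(x),\ z \in U \setminus \bigl(N_k(x) \cup \{x\}\bigr)
```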

Use Case 2: Movie Recommendation (contd.)

• Let's create a real dataset together.

• Visit: http://graphlab.byethost7.com/movie_recco/index.php

Use Case 2: Movie Recommendation (contd.)

Dataset:
• Nodes:
  – movies.csv
  – users.csv
• Edges:
  – rating.csv

Extra files we will create:
• movies_header.csv
• users_header.csv
• rating_header.csv

Use Case 2: Movie Recommendation (contd.)

• Import to Neo4j:

$ ./neo4j-import \
    --into /tmp/graph.db \
    --nodes:USER users_header.csv,users.csv \
    --nodes:MOVIES movies_header.csv,movies.csv \
    --relationships:RATING rating_header.csv,rating.csv

Use Case 2: Movie Recommendation (contd.)

• Query: Add cosine similarity

MATCH (p1:USER)-[x:RATING]->(m:MOVIES)<-[y:RATING]-(p2:USER)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
     SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
     SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
     p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)

Use Case 2: Movie Recommendation (contd.)

• Query: See who your nearest neighbors are by similarity

MATCH (p1:USER {name:'Nishant'})-[s:SIMILARITY]-(p2:USER)
WITH p2, s.similarity AS sim
ORDER BY sim DESC
LIMIT 5
RETURN p2.name AS Neighbor, sim AS Similarity

Use Case 2: Movie Recommendation (contd.)

• Query: Recommendations, finally :D

MATCH (b:USER)-[r:RATING]->(m:MOVIES), (b)-[s:SIMILARITY]-(a:USER {name:'Nishant'})
WHERE NOT((a)-[:RATING]->(m))
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS reco
ORDER BY reco DESC
RETURN movie AS Movie, reco AS Recommendation
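The recommendation query scores each movie the target user has not rated by averaging the ratings of the three most similar neighbours who rated it (`COLLECT(rating)[0..3]` keeps the first three after sorting by similarity). A Python sketch of that scoring; the helper name, the user names other than Nishant, and all numbers are hypothetical.

```python
def recommend(target, ratings, neighbour_sims, top_n=3):
    """Score unseen movies by the mean rating of the top_n most similar raters."""
    seen = set(ratings[target])
    per_movie = {}
    for other, sim in neighbour_sims.items():
        for movie, rating in ratings[other].items():
            if movie not in seen:                         # WHERE NOT((a)-[:RATING]->(m))
                per_movie.setdefault(movie, []).append((sim, rating))

    scores = []
    for movie, pairs in per_movie.items():
        pairs.sort(key=lambda p: p[0], reverse=True)      # ORDER BY similarity DESC
        top = [rating for _sim, rating in pairs[:top_n]]  # COLLECT(rating)[0..3]
        scores.append((movie, sum(top) / len(top)))       # REDUCE(...) * 1.0 / LENGTH(...)
    scores.sort(key=lambda s: s[1], reverse=True)         # ORDER BY reco DESC
    return scores


ratings = {
    "Nishant": {"Bahubali": 5},
    "Asha": {"Bahubali": 5, "Robot": 4, "Puli": 2},
    "Ravi": {"Robot": 2, "Puli": 4},
    "Mira": {"Robot": 5},
    "Dev": {"Robot": 1},
}
similarity_to_nishant = {"Asha": 0.9, "Ravi": 0.5, "Mira": 0.4, "Dev": 0.1}
print(recommend("Nishant", ratings, similarity_to_nishant))
```

Dev's rating of Robot is ignored because he is the fourth most similar rater, so only the three closest neighbours' opinions shape the score.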

Use Case 3: Email Analytics*

Overview:
• Framework for analyzing large email datasets
• Capable of performing sentiment analysis and topic extraction on email datasets
• Accessed through a command-line interface
• Incubated at Serendio; now an open-source project

* https://github.com/serendio-labs/email-analytics

Use Case 3: Email Analytics (contd.)

System Architecture:

Use Case 3: Email Analytics (contd.)

• DEMO

Use Case 3: Email Analytics (contd.)

Possible use cases:
• Keep track of your employees' activities
• Fraud detection
• Data mining for business analytics

Use Case 3: Email Analytics (contd.)

• Come forward and contribute. The project needs attention in these areas:
  – Web UI
  – REST API
  – Unit tests
  – Custom email format support
  – Other features

Neo4j with Other Technologies

Neo4j-Integration

Neo4j with Other Technologies

• Data import
  – LOAD CSV
  – neo4j-import
• Graph visualization
  – Alistair Jones (Arrow)
  – Alchemy.js (GraphJSON)
  – Neo4j Browser
  – Linkurious
  – KeyLines
  – D3.js

Neo4j Integration
• Apache Spark
• Elasticsearch
• Docker

Conclusion

Graph database technologies like Neo4j have a lot of potential to solve many complex problems. Neo4j is a mature technology that can be used in designing solutions.

nishant@serendio.com

Serendio provides Big Data Science Solutions & Services for Data-Driven Enterprises.

Learn more at: serendio.com/index.php/case-studies

Thank You!
