Graph Databases Josh Adell <[email protected]> 20110719

Uploaded by josh-adell on 10-May-2015

Who am I?

• Software developer: PHP, JavaScript, SQL
• http://www.dunnwell.com
• Fan of using the right tool for the job

The Problem

The Solution?

> -- Given "Keanu Reeves" find a connection to "Kevin Bacon"
> SELECT ??? FROM cast WHERE ???

+---------------------------------------------------------------------+
| actor_name                 | movie_title                            |
+============================+========================================+
| Jennifer Connelly          | Higher Learning                        |
+----------------------------+----------------------------------------+
| Laurence Fishburne         | Mystic River                           |
+----------------------------+----------------------------------------+
| Laurence Fishburne         | Higher Learning                        |
+----------------------------+----------------------------------------+
| Kevin Bacon                | Mystic River                           |
+----------------------------+----------------------------------------+
| Keanu Reeves               | The Matrix                             |
+----------------------------+----------------------------------------+
| Laurence Fishburne         | The Matrix                             |
+----------------------------+----------------------------------------+

Find Every Actor at Each Degree

> -- First degree
> SELECT actor_name FROM cast WHERE movie_title IN (
>   SELECT DISTINCT movie_title FROM cast WHERE actor_name='Kevin Bacon')

> -- Second degree
> SELECT actor_name FROM cast WHERE movie_title IN (
>   SELECT DISTINCT movie_title FROM cast WHERE actor_name IN (
>     SELECT actor_name FROM cast WHERE movie_title IN (
>       SELECT DISTINCT movie_title FROM cast WHERE actor_name='Kevin Bacon')))

> -- Third degree
> SELECT actor_name FROM cast WHERE movie_title IN (
>   SELECT DISTINCT movie_title FROM cast WHERE actor_name IN (
>     SELECT actor_name FROM cast WHERE movie_title IN (
>       SELECT DISTINCT movie_title FROM cast WHERE actor_name IN (
>         SELECT actor_name FROM cast WHERE movie_title IN (
>           SELECT DISTINCT movie_title FROM cast WHERE actor_name='Kevin Bacon')))))
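For comparison, the same degree-by-degree search can be sketched as a breadth-first search, which expands each actor once instead of re-running every earlier degree as a nested subquery. This is a minimal JavaScript illustration over the six sample rows of the `cast` table (not code from the talk); `castRows` and `degreesFrom` are hypothetical names.

```javascript
// Sample rows from the `cast` table: [actor_name, movie_title].
const castRows = [
  ["Jennifer Connelly", "Higher Learning"],
  ["Laurence Fishburne", "Mystic River"],
  ["Laurence Fishburne", "Higher Learning"],
  ["Kevin Bacon", "Mystic River"],
  ["Keanu Reeves", "The Matrix"],
  ["Laurence Fishburne", "The Matrix"],
];

// Returns a Map of actor -> degree of separation from `start`.
function degreesFrom(start, rows) {
  // Index the table once in both directions (actor -> movies, movie -> actors).
  const moviesOf = new Map();
  const actorsIn = new Map();
  for (const [actor, movie] of rows) {
    if (!moviesOf.has(actor)) moviesOf.set(actor, []);
    if (!actorsIn.has(movie)) actorsIn.set(movie, []);
    moviesOf.get(actor).push(movie);
    actorsIn.get(movie).push(actor);
  }

  // Breadth-first search: each pass over the frontier is one more degree.
  const degree = new Map([[start, 0]]);
  let frontier = [start];
  while (frontier.length > 0) {
    const next = [];
    for (const actor of frontier) {
      for (const movie of moviesOf.get(actor) || []) {
        for (const coStar of actorsIn.get(movie)) {
          if (!degree.has(coStar)) {
            degree.set(coStar, degree.get(actor) + 1);
            next.push(coStar);
          }
        }
      }
    }
    frontier = next;
  }
  return degree;
}

const degrees = degreesFrom("Kevin Bacon", castRows);
console.log(degrees.get("Keanu Reeves")); // 2
```

Reaching one more degree costs one more pass over the frontier rather than two more levels of nested subquery.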

The Truth

Relational databases aren't very good with relationships

[Diagram: "Data" vs. "RDBMs"]

RDBs Use Set Math

The Real Problem

Finding relationships across multiple degrees of separation
    ...and across multiple data types
    ...and where you don't even know there is a relationship

The Real Solution

Computer Science Definition

A graph is an ordered pair G = (V, E) where V is a set of vertices and E is a set of edges, which are pairs of vertices.
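A minimal JavaScript sketch (not from the talk) that stores a small undirected graph literally as the pair (V, E), with E kept as an array of vertex pairs, and derives each vertex's neighbours from it. The vertices and the `neighbours` helper are hypothetical.

```javascript
// G = (V, E): V is a set of vertices; E is stored here as an array of vertex pairs.
const V = new Set(["A", "B", "C", "D"]);
const E = [
  ["A", "B"],
  ["B", "C"],
  ["C", "A"],
  ["C", "D"],
];

// Build an adjacency list: in an undirected graph, the edge {u, v}
// makes v a neighbour of u and u a neighbour of v.
function neighbours(vertices, edges) {
  const adj = new Map();
  for (const v of vertices) adj.set(v, []);
  for (const [u, v] of edges) {
    adj.get(u).push(v);
    adj.get(v).push(u);
  }
  return adj;
}

const adj = neighbours(V, E);
console.log(adj.get("C").join(", ")); // B, A, D
```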

Some Graph DB Vocabulary

• Node: vertex
• Relationship: edge
• Property: meta-datum attached to a node or relationship
• Path: an ordered list of nodes and relationships
• Index: node or relationship lookup table

Relationships are First-Class Citizens

• Have a type
• Have properties
• Have a direction
    o Domain semantics
    o Traversable in any direction
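These properties of relationships can be sketched as a tiny in-memory property graph. This is a hypothetical JavaScript illustration, not how Neo4j stores data: each relationship carries a type, its own property map and a direction (start to end), and it is indexed from both ends so it can be followed outgoing or incoming. The `Graph` class and the `ACTED_IN` type are made up for the example.

```javascript
// A minimal in-memory property graph (illustrative only, not Neo4j's storage model).
class Graph {
  constructor() {
    this.nodes = new Map();    // id -> { id, properties }
    this.outgoing = new Map(); // id -> relationships that start at the node
    this.incoming = new Map(); // id -> relationships that end at the node
    this.nextId = 1;
  }

  addNode(properties = {}) {
    const node = { id: this.nextId++, properties };
    this.nodes.set(node.id, node);
    this.outgoing.set(node.id, []);
    this.incoming.set(node.id, []);
    return node;
  }

  // A relationship has a type, its own properties and a direction (start -> end).
  relate(start, end, type, properties = {}) {
    const rel = { start: start.id, end: end.id, type, properties };
    this.outgoing.get(start.id).push(rel);
    this.incoming.get(end.id).push(rel);
    return rel;
  }

  // Nodes reachable over relationships of `type`; direction is "out", "in" or "both".
  neighbours(node, type, direction = "both") {
    const result = [];
    if (direction !== "in") {
      for (const rel of this.outgoing.get(node.id)) {
        if (rel.type === type) result.push(this.nodes.get(rel.end));
      }
    }
    if (direction !== "out") {
      for (const rel of this.incoming.get(node.id)) {
        if (rel.type === type) result.push(this.nodes.get(rel.start));
      }
    }
    return result;
  }
}

const g = new Graph();
const keanu = g.addNode({ name: "Keanu Reeves" });
const matrix = g.addNode({ title: "The Matrix" });
g.relate(keanu, matrix, "ACTED_IN", { role: "Neo" });

// The single ACTED_IN relationship can be followed from either end.
console.log(g.neighbours(keanu, "ACTED_IN", "out")[0].properties.title); // The Matrix
console.log(g.neighbours(matrix, "ACTED_IN", "in")[0].properties.name); // Keanu Reeves
```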

Graph Examples

Relational Databases are Graphs!

New Solution to the Bacon Problem

$keanu = $actorIndex->findOne('name', 'Keanu Reeves');
$kevin = $actorIndex->findOne('name', 'Kevin Bacon');

$path = $keanu->findPathsTo($kevin)->getSinglePath();
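One plausible way to answer the Bacon question on a graph is a breadth-first search that records how each node was reached and then walks those links back. The following JavaScript sketch assumes an undirected graph in which actors and movies are both nodes, built from the sample `cast` rows; `findShortestPath` is a hypothetical helper, not Neo4jPHP's implementation.

```javascript
// Undirected actor/movie graph built from the sample cast rows.
const rows = [
  ["Jennifer Connelly", "Higher Learning"],
  ["Laurence Fishburne", "Mystic River"],
  ["Laurence Fishburne", "Higher Learning"],
  ["Kevin Bacon", "Mystic River"],
  ["Keanu Reeves", "The Matrix"],
  ["Laurence Fishburne", "The Matrix"],
];

const adjacency = new Map();
function link(a, b) {
  if (!adjacency.has(a)) adjacency.set(a, []);
  adjacency.get(a).push(b);
}
for (const [actor, movie] of rows) {
  link(actor, movie);
  link(movie, actor);
}

// Breadth-first search; returns the nodes of one shortest path, or null if unreachable.
function findShortestPath(from, to) {
  const cameFrom = new Map([[from, null]]);
  const queue = [from];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === to) {
      // Walk the parent links back to the start, then reverse.
      const path = [];
      for (let n = to; n !== null; n = cameFrom.get(n)) path.push(n);
      return path.reverse();
    }
    for (const next of adjacency.get(current) || []) {
      if (!cameFrom.has(next)) {
        cameFrom.set(next, current);
        queue.push(next);
      }
    }
  }
  return null;
}

console.log(findShortestPath("Keanu Reeves", "Kevin Bacon").join(" -> "));
// Keanu Reeves -> The Matrix -> Laurence Fishburne -> Mystic River -> Kevin Bacon
```

`queue.shift()` keeps the sketch short; for large graphs an index-based queue avoids re-copying the array.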

Some Graph Use Cases

• Social networking
• Manufacturing
• Map directions
• Fraud detection
• Multi-tenancy

Modelling a Domain with Graphs

• Graphs are "whiteboard-friendly"
• Nouns become nodes
• Verbs become relationships
• Properties are adjectives and adverbs

Audience Participation!

Neo4j

• Neo Technology
• http://neo4j.org
• Embedded in Java applications
• Standalone server via REST
• Plugins: spatial, lucene, rdf

• http://github.com/jadell/Neo4jPHP

Using the REST client

// Neo4jPHP classes (assumes an autoloader is registered)
use Everyman\Neo4j\Client, Everyman\Neo4j\Transport, Everyman\Neo4j\Node, Everyman\Neo4j\Index;

$client = new Client(new Transport());

$customer = new Node($client);
$customer->setProperty('name', 'Josh')->save();

$store = new Node($client);
$store->setProperty('name', 'Home Despot')
      ->setProperty('location', 'Durham, NC')->save();

$order = new Node($client);
$order->save();
$item = new Node($client);
$item->setProperty('item_number', 'Q32-ESM')->save();

$order->relateTo($item, 'CONTAINS')->save();
$customer->relateTo($order, 'BOUGHT')->save();
$store->relateTo($order, 'SOLD')->save();

$customerIndex = new Index($client, Index::TypeNode, 'customers');
$customerIndex->add($customer, 'name', $customer->getProperty('name'));
$customerIndex->add($customer, 'rating', 'A++');
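The `customers` index behaves like a lookup table from a key/value pair to the nodes filed under it, which is why the same node can be added under both `name` and `rating`. A hypothetical JavaScript sketch of that idea; the `NodeLookup` class is illustrative and is not Neo4jPHP's `Index` class.

```javascript
// Hypothetical lookup table: key -> value -> nodes filed under that pair.
class NodeLookup {
  constructor() {
    this.byKey = new Map();
  }

  add(node, key, value) {
    if (!this.byKey.has(key)) this.byKey.set(key, new Map());
    const byValue = this.byKey.get(key);
    if (!byValue.has(value)) byValue.set(value, []);
    byValue.get(value).push(node);
  }

  // Every node filed under key=value (an empty array if none).
  find(key, value) {
    const byValue = this.byKey.get(key);
    return (byValue && byValue.get(value)) || [];
  }

  // The first match, or null.
  findOne(key, value) {
    const hits = this.find(key, value);
    return hits.length > 0 ? hits[0] : null;
  }
}

// The same node filed under two different keys, like the customer above.
const customers = new NodeLookup();
const josh = { id: 1, properties: { name: "Josh" } };
customers.add(josh, "name", josh.properties.name);
customers.add(josh, "rating", "A++");

console.log(customers.findOne("rating", "A++") === josh); // true
```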

Graph Mining

• Paths
• Traversals
• Ad-hoc Queries

Path Finding

• Find any connection from node A to node B
• Limit by relationship types and/or direction
• Path finding algorithms: all, simple, shortest, Dijkstra

$customer = $customerIndex->findOne('name', 'Josh');
$item = $itemIndex->findOne('item_number', 'Q32-ESM');

$path = $item->findPathsTo($customer)
             ->setMaxDepth(2)
             ->getSinglePath();

foreach ($path as $node) {
    echo $node->getId() . "\n";
}
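Of the algorithms listed, Dijkstra's is the one that accounts for a cost on each relationship. A minimal JavaScript sketch, assuming a small weighted, directed graph with made-up node names and costs; a linear scan stands in for a priority queue to keep it short.

```javascript
// Weighted directed graph: node -> array of [neighbour, cost] (made-up data).
const roads = {
  A: [["B", 4], ["C", 1]],
  B: [["D", 1]],
  C: [["B", 2], ["D", 5]],
  D: [],
};

// Dijkstra's algorithm: returns { cost, path } for the cheapest route, or null.
function dijkstra(graph, source, target) {
  const dist = new Map([[source, 0]]);
  const prev = new Map();
  const done = new Set();

  while (true) {
    // Pick the unfinished node with the smallest known distance (linear scan).
    let current = null;
    for (const [node, d] of dist) {
      if (!done.has(node) && (current === null || d < dist.get(current))) current = node;
    }
    if (current === null) return null; // target is unreachable
    if (current === target) break;
    done.add(current);

    // Relax every outgoing edge of the chosen node.
    for (const [next, cost] of graph[current] || []) {
      const candidate = dist.get(current) + cost;
      if (!dist.has(next) || candidate < dist.get(next)) {
        dist.set(next, candidate);
        prev.set(next, current);
      }
    }
  }

  // Rebuild the route by following predecessors back to the source.
  const path = [target];
  while (path[0] !== source) path.unshift(prev.get(path[0]));
  return { cost: dist.get(target), path };
}

const route = dijkstra(roads, "A", "D");
console.log(route.cost, route.path.join(" -> ")); // 4 A -> C -> B -> D
```

The direct-looking edge A -> B (cost 4) loses to A -> C -> B (cost 3), which is exactly the case a plain hop-count shortest path would miss.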

Traversal

• Complex/custom path finding
• Base the next decision on the path so far

$traversal = new Traversal($client);
$traversal->setOrder(Traversal::OrderDepthFirst)
    ->setUniqueness(Traversal::UniquenessNodeGlobal)
    ->setPruneEvaluator('javascript', '(function traverse(pos) {
        if (pos.length() == 1 && pos.lastRelationship().getType() == "CONTAINS") {
            return false;
        } else if (pos.length() == 2 && pos.lastRelationship().getType() == "BOUGHT") {
            return false;
        }
        return true;
    })(position)')
    ->setReturnFilter('javascript',
        "position.endNode().getProperty('type') == 'Customer'");

$customers = $traversal->getResults($item, Traversal::ReturnTypeNode);
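The traversal combines two callbacks: a prune evaluator that looks at the path so far and decides whether to stop expanding it, and a return filter that decides which positions are returned. A hypothetical JavaScript sketch of that mechanism with depth-first order and global node uniqueness, over the customer/store/order/item graph from the REST client example. The relationship list, the `nodeType` map and the length-0 check in `prune` (so the start node is always expanded) are assumptions of the sketch, not Neo4j behaviour.

```javascript
// Relationships from the order example as [start, type, end] (illustrative data).
const rels = [
  ["customer", "BOUGHT", "order"],
  ["store", "SOLD", "order"],
  ["order", "CONTAINS", "item"],
];
const nodeType = { customer: "Customer", store: "Store", order: "Order", item: "Item" };

// Depth-first traversal with global node uniqueness (each node is reached once).
// prune(path) returns true to stop expanding; returnFilter(path) returns true to
// include the path's end node in the results.
function traverse(start, prune, returnFilter) {
  const visited = new Set([start]);
  const results = [];
  const stack = [{ nodes: [start], rels: [] }]; // path length = rels.length
  while (stack.length > 0) {
    const path = stack.pop();
    const end = path.nodes[path.nodes.length - 1];
    if (returnFilter(path)) results.push(end);
    if (prune(path)) continue;
    // Relationships are followed in either direction.
    for (const [from, type, to] of rels) {
      const next = from === end ? to : to === end ? from : null;
      if (next === null || visited.has(next)) continue;
      visited.add(next);
      stack.push({ nodes: [...path.nodes, next], rels: [...path.rels, type] });
    }
  }
  return results;
}

const lastType = (p) => p.rels[p.rels.length - 1];

// Only keep expanding along item <-CONTAINS- order <-BOUGHT- customer.
const prune = (p) =>
  !(p.rels.length === 0 ||
    (p.rels.length === 1 && lastType(p) === "CONTAINS") ||
    (p.rels.length === 2 && lastType(p) === "BOUGHT"));
const isCustomer = (p) => nodeType[p.nodes[p.nodes.length - 1]] === "Customer";

console.log(traverse("item", prune, isCustomer).join(", ")); // customer
```

The store is still reached (at depth 2 via `SOLD`), but the return filter drops it and the prune evaluator stops the search there.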

Gremlin

• Uses a mathematical notation approach
• Complex traversal behaviors, including backtracking
• https://github.com/tinkerpop/gremlin/wiki

m = [:]
g.v(1).out('likes').in('likes').out('likes').groupCount(m)
m.sort{a,b -> a.value <=> b.value}
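The Gremlin line walks from vertex 1 to the things it likes, back to everyone who likes those things, then out to what they like, counting how many times each vertex is reached. A minimal JavaScript sketch of the same three steps plus `groupCount` and the ascending sort, over made-up vertex ids and `likes` edges:

```javascript
// Directed 'likes' edges as [from, to] pairs (made-up data).
const likes = [
  [1, 10], [1, 11],
  [2, 10], [2, 12],
  [3, 11], [3, 12], [3, 13],
];

// One result per edge followed, so duplicates are kept (as in a Gremlin pipeline).
const outLikes = (vs) => vs.flatMap((v) => likes.filter(([from]) => from === v).map(([, to]) => to));
const inLikes = (vs) => vs.flatMap((v) => likes.filter(([, to]) => to === v).map(([from]) => from));

// g.v(1).out('likes').in('likes').out('likes').groupCount(m)
const m = new Map();
for (const v of outLikes(inLikes(outLikes([1])))) {
  m.set(v, (m.get(v) || 0) + 1);
}

// m.sort{a,b -> a.value <=> b.value}: ascending by count
const sorted = [...m.entries()].sort((a, b) => a[1] - b[1]);
console.log(sorted.map(([v, n]) => `${v}:${n}`).join(" ")); // 13:1 12:2 10:3 11:3
```

Vertices 10 and 11, which vertex 1 already likes, are counted as well, just as in the Gremlin pipeline.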

Cypher

• "What to find" vs. "How to find"

$query = 'START item=(1)
MATCH (item)<-[:CONTAINS]-(order)<-[:BOUGHT]-(customer)
RETURN customer';

$cypher = new Cypher\Query($client, $query);
$customers = $cypher->getResultSet();

Cypher Syntax

START item = (1)
START item = (1,2,3)
START item = (items, 'name:Q32*')
START item = (1), customer = (2,3)

MATCH (item)<--(order)
MATCH (order)-->(item)
MATCH (order)-[r]->(item)
MATCH ()--(item)
MATCH (supplier)-[:SUPPLIES]->(item)<-[:CONTAINS]-(order),
      (customer)-[:RATED]->(item)
WHERE customer.name = 'Josh' AND supplier.coupon = 'freewidget'

RETURN item, order
RETURN customer, item, r.rating
RETURN r~TYPE
RETURN COUNT(*)
ORDER BY customer.name DESC
RETURN AVG(r.rating)
LIMIT 3 SKIP 2

Cypher - All Together Now

// Find the top 10 `widget` ratings by customers who bought AND rated
// `widgets`, and the supplier

START item = (items, 'name:widget')
MATCH (item)<--(order)<--(customer)-[r:RATED]->(item)<--(supplier)
RETURN customer, r.rating, supplier
ORDER BY r.rating DESC
LIMIT 10
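To show the "how" that this query leaves to the database, here is a hypothetical JavaScript sketch that matches the same pattern over an in-memory relationship list and then applies `ORDER BY r.rating DESC LIMIT 10`. The data are made up, the index lookup is skipped, and the untyped arrows in the query are given the types `CONTAINS`, `BOUGHT` and `SUPPLIES` from the earlier slides as an assumption.

```javascript
// Relationships as [start, type, end, properties] (made-up data).
// Untyped arrows in the Cypher query are assumed to be CONTAINS, BOUGHT and SUPPLIES.
const rels = [
  ["acme", "SUPPLIES", "widget", {}],
  ["order1", "CONTAINS", "widget", {}],
  ["order2", "CONTAINS", "widget", {}],
  ["josh", "BOUGHT", "order1", {}],
  ["ann", "BOUGHT", "order2", {}],
  ["josh", "RATED", "widget", { rating: 4 }],
  ["ann", "RATED", "widget", { rating: 5 }],
  ["bob", "RATED", "widget", { rating: 1 }], // rated but never bought
];

// All relationships of `type` that end at `end`.
const startsOf = (type, end) => rels.filter((r) => r[1] === type && r[2] === end);

// (item)<-[:CONTAINS]-(order)<-[:BOUGHT]-(customer)-[r:RATED]->(item)<-[:SUPPLIES]-(supplier)
function topRatings(item, limit) {
  const rows = [];
  for (const [order] of startsOf("CONTAINS", item)) {
    for (const [customer] of startsOf("BOUGHT", order)) {
      for (const [rater, , , props] of startsOf("RATED", item)) {
        if (rater !== customer) continue; // the buyer must also be the rater
        for (const [supplier] of startsOf("SUPPLIES", item)) {
          rows.push({ customer, rating: props.rating, supplier });
        }
      }
    }
  }
  // ORDER BY r.rating DESC LIMIT n
  return rows.sort((a, b) => b.rating - a.rating).slice(0, limit);
}

const top = topRatings("widget", 10);
console.log(top.map((r) => `${r.customer} ${r.rating} ${r.supplier}`).join("; ")); // ann 5 acme; josh 4 acme
```

`bob` rated the widget without buying it, so the pattern excludes him; spelling out these nested loops by hand is the work Cypher's declarative `MATCH` takes off the caller.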

Tools

• Neoclipse
• Webadmin

Are RDBs Useful At All?

• Aggregation
• Ordered data
• Truly tabular data
• Few or clearly defined relationships

Questions?

Resources

• http://neo4j.org
• http://docs.neo4j.org
• http://www.youtube.com/watch?v=UodTzseLh04
    o Emil Eifrem (Neo Tech. CEO) webinar
    o Check out around the 54-minute mark

• http://github.com/jadell/Neo4jPHP

• http://joshadell.com
• [email protected]
• @josh_adell
• Google+, Facebook, LinkedIn