Graph databases in PHP @ PHPCon Poland 10-22-2011
DESCRIPTION
Presentation given at the national PHP conference in Poland, in Kielce, in October 2011, introducing graph databases in PHP and taking a practical look at OrientDB.
TRANSCRIPT
David Funaro, Alessandro Nadalin
GraphDB in PHP
Sunday, 23 October 2011
Agenda
• Theory
• When to use a graph?
• Why GraphDB?
• The GraphDB community
• OrientDB
• OrientDB in PHP
• Demo
Essential (Theory)
Graph G = (V, E), where V is a set of vertices and E is a set of edges.
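To make G = (V, E) concrete, here is a minimal PHP sketch that stores V and E as plain arrays and derives an adjacency list (vertex => neighbours) from them. The vertex names and the function name adjacencyList are illustrative assumptions. Edges are treated as undirected, as in the friendship example.

```php
<?php
// V: the set of vertices; E: the set of edges, each a pair of vertices.
$V = ['A', 'B', 'D', 'E'];
$E = [['A', 'B'], ['B', 'D'], ['D', 'E']];

/**
 * Build an adjacency list (vertex => list of neighbours) from V and E.
 * Edges are undirected here, so each one is recorded in both directions.
 */
function adjacencyList(array $vertices, array $edges): array
{
    $adj = array_fill_keys($vertices, []);
    foreach ($edges as [$u, $v]) {
        $adj[$u][] = $v;
        $adj[$v][] = $u;
    }
    return $adj;
}

$adj = adjacencyList($V, $E);
// The degree of a vertex is the number of edges touching it.
$degreeOfB = count($adj['B']);
```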
Binary Relation
Example: Itchy "Hates" Scratchy. Itchy and Scratchy are vertices (A and B); the "Hates" relation between them is an edge.
Graph
(Figure: vertices A, B, D, E, F and G connected by edges)
Undirected Graph
(Figure: vertices A, B, D, E and F connected by undirected edges)
Example: Friendship
Directed Edge
(Figure: vertices A and B connected by a directed edge)
Directed Graph
(Figure: vertices A, B, D and F connected by directed edges)
Example: Followee
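In a directed graph, each edge has a direction. The neighbours of a vertex therefore split into outgoing edges (the accounts it follows) and incoming edges (the accounts that follow it). The sketch below assumes the follow relationships arrive as hypothetical (follower, followee) pairs:

```php
<?php
// Each pair [$from, $to] is a directed edge: $from follows $to.
$follows = [['A', 'B'], ['A', 'D'], ['F', 'A'], ['D', 'A']];

$followees = []; // outgoing edges: vertex => vertices it follows
$followers = []; // incoming edges: vertex => vertices that follow it
foreach ($follows as [$from, $to]) {
    $followees[$from][] = $to;
    $followers[$to][] = $from;
}

// Out-degree and in-degree of A.
$outA = count($followees['A'] ?? []);
$inA = count($followers['A'] ?? []);
```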
Path
(Figure: a path, i.e. a sequence of vertices joined by edges, through a graph with vertices A, B, D, E, F and G)
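One standard way to find a path is breadth-first search (BFS), which returns a path with the fewest edges. The sketch below runs BFS over an adjacency list. The example graph is made up and is not the one in the figure.

```php
<?php
/**
 * Breadth-first search: return a path with the fewest edges from $start
 * to $goal as a list of vertices, or null if $goal is unreachable.
 */
function bfsPath(array $adj, string $start, string $goal): ?array
{
    $previous = [$start => null]; // visited vertex => vertex we came from
    $queue = new SplQueue();
    $queue->enqueue($start);

    while (!$queue->isEmpty()) {
        $current = $queue->dequeue();
        if ($current === $goal) {
            // Walk the predecessor links back to the start.
            $path = [];
            for ($v = $goal; $v !== null; $v = $previous[$v]) {
                $path[] = $v;
            }
            return array_reverse($path);
        }
        foreach ($adj[$current] ?? [] as $next) {
            if (!array_key_exists($next, $previous)) {
                $previous[$next] = $current;
                $queue->enqueue($next);
            }
        }
    }
    return null;
}

$adj = [
    'A' => ['B', 'F'],
    'B' => ['A', 'D'],
    'D' => ['B', 'E'],
    'E' => ['D', 'G'],
    'F' => ['A'],
    'G' => ['E'],
];
```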
Graph -> GraphDB
GraphDB is a database that uses the graph as its primary data structure.
... when to use a graph?
Web in ’99
Web in 2005
The social web
Your data is a graph
A tree is a graph
A parent_id column is a graph
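A parent_id column stores one edge per row, because each record points to its parent. The sketch below assumes hypothetical rows with id and parent_id keys, as they might come from a relational table. It turns those rows into an adjacency list (parent => children) and walks the resulting tree.

```php
<?php
// Rows as they might be fetched from a table with a parent_id column.
$rows = [
    ['id' => 1, 'parent_id' => null], // root
    ['id' => 2, 'parent_id' => 1],
    ['id' => 3, 'parent_id' => 1],
    ['id' => 4, 'parent_id' => 3],
];

// Each non-null parent_id is an edge parent -> child.
$children = [];
foreach ($rows as $row) {
    if ($row['parent_id'] !== null) {
        $children[$row['parent_id']][] = $row['id'];
    }
}

/** Depth-first walk returning every descendant of $id. */
function descendants(array $children, int $id): array
{
    $result = [];
    foreach ($children[$id] ?? [] as $child) {
        $result[] = $child;
        $result = array_merge($result, descendants($children, $child));
    }
    return $result;
}
```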
Recommendations
(Figure, built up over several slides: a graph with the vertices John, Rome, Milan, Cinema A, Cinema B, Cinema C, Se7en, Mr Bean, Thriller and Fun, connected by "lives in", "location", "shows", "type" and "likes" edges. Starting from John, the traversal marks each candidate as a match (✓) or a non-match (x).)
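The step-by-step traversal can be sketched in PHP as follows. The exact wiring of the figure is not recoverable, so the edges below are assumptions for illustration. Starting from a person, the sketch keeps only cinemas in the person's city and only films whose type the person likes.

```php
<?php
// Hypothetical edges; the exact wiring of the slide's figure is not known.
$livesIn  = ['John' => 'Rome'];
$likes    = ['John' => ['Thriller']];
$location = ['Cinema A' => 'Rome', 'Cinema B' => 'Milan', 'Cinema C' => 'Rome'];
$shows    = ['Cinema A' => ['Mr Bean'], 'Cinema B' => ['Se7en'], 'Cinema C' => ['Se7en']];
$type     = ['Se7en' => 'Thriller', 'Mr Bean' => 'Fun'];

/**
 * Traverse person -> city <- cinema -> film -> type and keep the
 * (cinema, film) pairs whose film type the person likes.
 */
function recommend(string $person, array $livesIn, array $likes, array $location, array $shows, array $type): array
{
    $city = $livesIn[$person];
    $result = [];
    foreach ($location as $cinema => $cinemaCity) {
        if ($cinemaCity !== $city) {
            continue; // no match: the cinema is in another city
        }
        foreach ($shows[$cinema] ?? [] as $film) {
            if (in_array($type[$film], $likes[$person] ?? [], true)) {
                $result[] = [$cinema, $film]; // match
            }
        }
    }
    return $result;
}

$suggestions = recommend('John', $livesIn, $likes, $location, $shows, $type);
```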
Solve decision problems
Maximum flow
Given a dataset, calculate how best to organize it.
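The maximum-flow problem asks how much can be pushed from a source to a sink through edges with limited capacity. Below is a compact sketch of the Edmonds-Karp algorithm, which repeatedly runs a breadth-first search for augmenting paths. The capacity matrix is a made-up example.

```php
<?php
/**
 * Edmonds-Karp maximum flow on a capacity matrix $cap[$u][$v].
 * Repeatedly finds a shortest augmenting path with BFS in the residual graph.
 */
function maxFlow(array $cap, int $source, int $sink): int
{
    $n = count($cap);
    $flow = 0;
    while (true) {
        // BFS over edges that still have residual capacity.
        $parent = array_fill(0, $n, -1);
        $parent[$source] = $source;
        $queue = [$source];
        while ($queue && $parent[$sink] === -1) {
            $u = array_shift($queue);
            for ($v = 0; $v < $n; $v++) {
                if ($parent[$v] === -1 && $cap[$u][$v] > 0) {
                    $parent[$v] = $u;
                    $queue[] = $v;
                }
            }
        }
        if ($parent[$sink] === -1) {
            return $flow; // no augmenting path left
        }
        // Bottleneck capacity along the path found.
        $push = PHP_INT_MAX;
        for ($v = $sink; $v !== $source; $v = $parent[$v]) {
            $push = min($push, $cap[$parent[$v]][$v]);
        }
        // Update residual capacities: forward edges down, reverse edges up.
        for ($v = $sink; $v !== $source; $v = $parent[$v]) {
            $u = $parent[$v];
            $cap[$u][$v] -= $push;
            $cap[$v][$u] += $push;
        }
        $flow += $push;
    }
}

// Vertex 0 is the source, vertex 3 the sink.
$capacity = [
    [0, 3, 2, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
];
```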
Travelling salesman problem
The pizza guy needs to deliver to A, B and C.
The decision is based on distance, traffic, time and so on.
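With only a handful of stops (A, B, C), the travelling salesman problem can be solved by brute force: try every order of visits and keep the cheapest round trip. The sketch below assumes a hypothetical cost matrix in which distance, traffic and time have already been folded into one number per pair of locations. P is a made-up pizzeria vertex.

```php
<?php
/** All orderings of $items (n! of them, so only viable for very few stops). */
function permutations(array $items): array
{
    if (count($items) <= 1) {
        return [$items];
    }
    $result = [];
    foreach ($items as $i => $item) {
        $rest = $items;
        unset($rest[$i]);
        foreach (permutations(array_values($rest)) as $perm) {
            array_unshift($perm, $item);
            $result[] = $perm;
        }
    }
    return $result;
}

/** Cheapest round trip that starts and ends at $start and visits every stop. */
function cheapestTour(array $cost, string $start, array $stops): array
{
    $best = null;
    $bestCost = INF;
    foreach (permutations($stops) as $order) {
        $route = array_merge([$start], $order, [$start]);
        $total = 0;
        for ($i = 0; $i < count($route) - 1; $i++) {
            $total += $cost[$route[$i]][$route[$i + 1]];
        }
        if ($total < $bestCost) {
            $best = $route;
            $bestCost = $total;
        }
    }
    return [$best, $bestCost];
}

// Hypothetical costs between the pizzeria (P) and the delivery points.
$cost = [
    'P' => ['A' => 2, 'B' => 9, 'C' => 4],
    'A' => ['P' => 2, 'B' => 3, 'C' => 8],
    'B' => ['P' => 9, 'A' => 3, 'C' => 2],
    'C' => ['P' => 4, 'A' => 8, 'B' => 2],
];
[$route, $total] = cheapestTour($cost, 'P', ['A', 'B', 'C']);
```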
Shortest path
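When edges carry non-negative weights (for example, distance, traffic and time combined), Dijkstra's algorithm is the classic way to compute shortest paths. The sketch below uses PHP's SplPriorityQueue. Because that queue extracts the highest priority first, distances are stored negated. The road network is invented for illustration.

```php
<?php
/**
 * Dijkstra's algorithm: cheapest distance from $source to every reachable
 * vertex. $graph[$u][$v] is a non-negative edge weight.
 */
function dijkstra(array $graph, string $source): array
{
    $dist = [$source => 0];
    $queue = new SplPriorityQueue();
    $queue->setExtractFlags(SplPriorityQueue::EXTR_BOTH);
    $queue->insert($source, 0);

    while (!$queue->isEmpty()) {
        $top = $queue->extract();
        $u = $top['data'];
        $d = -$top['priority']; // priorities are negated distances
        if ($d > $dist[$u]) {
            continue; // stale entry: a shorter path was already found
        }
        foreach ($graph[$u] ?? [] as $v => $w) {
            $candidate = $d + $w;
            if (!isset($dist[$v]) || $candidate < $dist[$v]) {
                $dist[$v] = $candidate;
                $queue->insert($v, -$candidate);
            }
        }
    }
    return $dist;
}

$roads = [
    'P' => ['A' => 4, 'B' => 1],
    'B' => ['A' => 2, 'C' => 5],
    'A' => ['C' => 1],
];
$dist = dijkstra($roads, 'P');
```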
Identify "special" nodes of the graph
Given your dataset, organize some clusters.
Are there some nodes which cannot belong to any cluster?
They probably have some properties that differ from the average.
ACHTUNG! TERRORISTEN! ("Attention! Terrorists!")
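Real clustering algorithms vary widely. As a deliberately simple stand-in, the sketch below treats each connected component as a "cluster". It then flags the vertices of components smaller than a chosen threshold as candidates for the "special" nodes described above. The threshold and the data are assumptions.

```php
<?php
/** Group vertices into connected components (undirected adjacency list). */
function components(array $adj): array
{
    $seen = [];
    $result = [];
    foreach (array_keys($adj) as $start) {
        if (isset($seen[$start])) {
            continue;
        }
        // Iterative depth-first search collecting one component.
        $stack = [$start];
        $seen[$start] = true;
        $component = [];
        while ($stack) {
            $v = array_pop($stack);
            $component[] = $v;
            foreach ($adj[$v] as $w) {
                if (!isset($seen[$w])) {
                    $seen[$w] = true;
                    $stack[] = $w;
                }
            }
        }
        $result[] = $component;
    }
    return $result;
}

/** Vertices that sit in components smaller than $minSize. */
function outliers(array $adj, int $minSize): array
{
    $out = [];
    foreach (components($adj) as $component) {
        if (count($component) < $minSize) {
            $out = array_merge($out, $component);
        }
    }
    return $out;
}

$adj = [
    'a' => ['b', 'c'], 'b' => ['a', 'c'], 'c' => ['a', 'b'],
    'd' => ['e'], 'e' => ['d', 'f'], 'f' => ['e'],
    'x' => [],
];
```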
but ... why GraphDB?
Representing a Graph in:
✓ Relational Database (MySQL, Oracle)
✓ Document Oriented DB (MongoDB, CouchDB)
✓ XML Database (MarkLogic, eXist-db)
http://www.slideshare.net/slidarko/problemsolving-using-graph-traversals-searching-scoring-ranking-and-recommendation
What is the difference?
GraphDB
A graph database is any storage system that provides index-free adjacency.
http://www.slideshare.net/slidarko/problemsolving-using-graph-traversals-searching-scoring-ranking-and-recommendation
Step by step example
Given a list of people, find their homepages
Tree-based DB way
1. Take a person from the list: David Funaro
2. Put the name in the search engine
3. Find http://davidfunaro.com
The cost of finding a single friend's homepage grows as the table of friends' homepages grows.
GraphDB way
It is as if the GraphDB had an additional piece of information: the anchor (<a>).
1. Get the embedded information (the index): www.odino.org
<a href="http://odino.org">Alessandro Nadalin</a>
The anchor works as a local index to reach the document: this is index-free adjacency.
Local cost
The local cost is O(k) = constant.
Thus, as the graph grows in size, the cost of a local step remains the same.
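Index-free adjacency can be pictured with plain PHP objects. Each vertex keeps direct references to its neighbours, so a local step is a reference dereference that never consults a global table. The hypothetical Vertex class below illustrates the idea; it is not OrientDB's API. For contrast, findHomepage scans a table, so its work grows with the number of rows.

```php
<?php
/** Hypothetical vertex that stores direct references to its neighbours. */
final class Vertex
{
    /** @var string */
    public $name;
    /** @var Vertex[] outgoing neighbours, held as object references */
    public $out = [];

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function link(Vertex $to): void
    {
        $this->out[] = $to;
    }
}

$david = new Vertex('David Funaro');
$homepage = new Vertex('http://davidfunaro.com');
$david->link($homepage);

// Local step: dereference the stored neighbour, no global lookup involved.
$reached = $david->out[0]->name;

// For comparison: finding the same homepage by scanning a table of rows.
// The work grows with the number of rows (an index would make it O(log n)).
function findHomepage(array $table, string $person): ?string
{
    foreach ($table as [$name, $url]) {
        if ($name === $person) {
            return $url;
        }
    }
    return null;
}
```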
Any database can implicitly represent a graph, BUT only a graph database makes the graph structure explicit.
Benchmark
• 1 million vertices
• 4 million edges
• Scale-free topology
• Postgres vs Neo4j
• Both hash and B-tree
Depth   RDBMS       Graph
1       100 ms      30 ms
2       1000 ms     500 ms
3       10000 ms    3000 ms
4       100000 ms   50000 ms
5       N/A         100000 ms
http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/
GraphDB community
Databases
TinkerPop stack: the community that is building and feeding the GraphDB ecosystem
Blueprints: the data model and its implementations
Blueprints is a collection of interfaces, implementations, ouplementations, and test suites for the property graph data model. Blueprints is analogous to the JDBC, but for graph databases.
https://github.com/tinkerpop/blueprints/wiki/
Pipes: a dataflow framework using process graphs
Provides a collection of "pipes" that are connected together to form processing pipelines.
Gremlin: a graph-based programming language
A Turing-complete, graph-based programming language that compiles Gremlin syntax down to Pipes.
Rexster: a RESTful graph shell
Allows Blueprints graphs to be exposed through a RESTful API (HTTP).
What's hot
OrientDB
Glossary
RID: <10:05>, where 10 is the cluster and 05 is the position of the record within that cluster
Class
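A RID is written as cluster:position, often with a leading "#", as in "#13:0". The hypothetical helper below splits a RID into its two parts. This sketch accepts only non-negative numbers.

```php
<?php
/**
 * Split an OrientDB record ID such as "#10:5" or "10:05" into
 * [cluster id, position in the cluster]. Hypothetical helper.
 */
function parseRid(string $rid): array
{
    if (!preg_match('/^#?(\d+):(\d+)$/', $rid, $m)) {
        throw new InvalidArgumentException("Not a RID: $rid");
    }
    return [(int) $m[1], (int) $m[2]];
}

[$cluster, $position] = parseRid('#10:5');
```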
Main features
Inheritance
class Vehicle
class Bike extends Vehicle
class Car extends Vehicle
SELECT FROM Vehicle WHERE owner = 1:1
can return records of class Bike or Car.
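In PHP terms, a polymorphic query behaves like filtering by a base class: any instance of Vehicle matches, whatever its concrete subclass. The sketch below uses hypothetical in-memory objects. It mimics the query's behaviour only and does not show how OrientDB stores classes.

```php
<?php
class Vehicle
{
    public $owner;

    public function __construct(string $owner)
    {
        $this->owner = $owner;
    }
}

class Bike extends Vehicle
{
}

class Car extends Vehicle
{
}

$records = [new Bike('1:1'), new Car('1:1'), new Car('1:2'), new stdClass()];

// Rough equivalent of: SELECT FROM Vehicle WHERE owner = 1:1
$result = array_values(array_filter(
    $records,
    fn($r) => $r instanceof Vehicle && $r->owner === '1:1'
));
```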
Traversal
SELECT FROM fellas WHERE any() traverse(0,-1) ( @rid = [Michelle @rid] )
SELECT FROM fellas WHERE any() traverse(0,2) ( @rid = [Michelle @rid] )
The two numbers set the depth range of the traversal: traverse(0,-1) follows links at any depth (-1 means no limit), while traverse(0,2) stops at depth 2.
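The depth bound can be pictured as a breadth-first search that stops expanding after a maximum depth. The sketch below starts from Michelle over a hypothetical, undirected "knows" relation and, like traverse(0,-1), treats -1 as "no limit". It mirrors the idea, not OrientDB's implementation.

```php
<?php
/**
 * Vertices reachable from $start within $maxDepth hops (depth 0 is $start
 * itself). A $maxDepth of -1 means no limit.
 */
function withinDepth(array $adj, string $start, int $maxDepth): array
{
    $depth = [$start => 0];
    $queue = [$start];
    while ($queue) {
        $v = array_shift($queue);
        if ($maxDepth !== -1 && $depth[$v] >= $maxDepth) {
            continue; // do not expand beyond the maximum depth
        }
        foreach ($adj[$v] ?? [] as $w) {
            if (!isset($depth[$w])) {
                $depth[$w] = $depth[$v] + 1;
                $queue[] = $w;
            }
        }
    }
    return array_keys($depth);
}

$knows = [
    'Michelle' => ['Ann'],
    'Ann' => ['Michelle', 'Bob'],
    'Bob' => ['Ann', 'Carl'],
    'Carl' => ['Bob'],
];
```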
SQL syntax
Beyond SQL
SELECT FROM authors WHERE book.title = ...
(the dot notation follows the link from an author to its book, without a JOIN)
ACID
Speaks JSON
{
  "schema": { "name": "Address" },
  "result": [{
    "@type": "d", "@rid": "#13:0", "@version": 6, "@class": "Address",
    "type": "Residence", "street": "Piazza Navona, 1", "city": "#14:0", "nick": "Luca2"
  }, { ... ...
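Because the response is plain JSON, json_decode() is enough to consume it from PHP. The sketch below uses a made-up response shaped like the one above, with a hypothetical second record. It indexes the records by @rid and then follows the city field, which holds a link stored as a RID string.

```php
<?php
// Hypothetical response: an Address record linking to a second record.
$json = '{
  "result": [
    {"@type": "d", "@rid": "#13:0", "@class": "Address", "street": "Piazza Navona, 1", "city": "#14:0"},
    {"@type": "d", "@rid": "#14:0", "@class": "City", "name": "Rome"}
  ]
}';

// Decode into associative arrays and index the records by their RID.
$data = json_decode($json, true);
$byRid = [];
foreach ($data['result'] as $record) {
    $byRid[$record['@rid']] = $record;
}

// "city" holds a link (a RID), so following it is a lookup by RID.
$address = $byRid['#13:0'];
$cityName = $byRid[$address['city']]['name'];
```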
Double protocol
HTTP: universal, easy to interact with
Binary: blazing fast
On-record SELECTs
SELECT FROM cats
SELECT FROM 11:0
SELECT FROM [11:0,11:1]
SELECT FROM [11:0,12:0]
Stress-free setup
2 MB
./orient/bin/server.sh
In-memory DB
or disk-persisted
Supports standards
OrientDB
• Inheritance
• Traversal
• SQL-like syntax
• ACID
• Speaks JSON
• Double protocol
• On-record SELECTs
• TinkerPop compliant
Oh, it's Java.
PHP?
Somebody started writing the binary-protocol binding
https://github.com/AntonTerekhov/OrientDB-PHP (beta 0.4.1, 28 April 2010)
$db = new OrientDB($host, $port);
$record = $db->recordLoad('1:1', '*:-1');
// $record instance of OrientDBRecord
and others
Orient Library
... are writing a complete library
https://github.com/congow/Orient
Orient = PHP library to work with OrientDB
Data Mapper
Query Builder
HTTP Binding
HTTP Binding
use Congow\Orient;
use Congow\Orient\Foundation\Binding;

// cURL-based HTTP client used by the binding
$driver = new Orient\Http\Client\Curl();
// host, port (OrientDB's HTTP port), username, password, database
$orient = new Binding($driver, '127.0.0.1', '2480', 'admin', 'admin', 'demo');

// Run an SQL query; the response body is JSON
$response = $orient->query("SELECT FROM Address");
$output = json_decode($response->getBody());

foreach ($output->result as $address) {
    var_dump($address->street);
}

{
  "schema": { "name": "Address" },
  "result": [{
    "@type": "d", "@rid": "#13:0", "@version": 6, "@class": "Address",
    "type": "Residence", "street": "Piazza Navona, 1", "city": "#14:0", "nick": "Luca2"
  }, { ... ...
Apart from ->query($SQL):
->get|delete|postClass($class)
->post|delete|put|getDocument($rid)
...and much more! (connect, disconnect, ...)
Query Builder
use Congow\Orient\Query;

$query = new Query();
$query->from(array('users'))->where('username = ?', "admin");

echo $query->getRaw(); // SELECT FROM users WHERE username = "admin"
$query->select(array('name', 'username', 'email'), false)
      ->from(array('12:0', '12:1'), false)
      ->where('any() traverse ( any() like "%danger%" )')
      ->orWhere("1 = ?", 1)
      ->andWhere("links = ?", 1)
      ->limit(20)
      ->orderBy('username')
      ->orderBy('name', true, true)
      ->range("12:0", "12:1");

SELECT name, username, email FROM [12:0, 12:1] WHERE any() traverse ( any() like "%danger%" ) OR 1 = "1" AND links = "1" ORDER BY name, username LIMIT 20 RANGE 12:0 12:1
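A builder can turn where('username = ?', "admin") into the quoted SQL shown above through simple placeholder substitution. The hypothetical helper below sketches that step. It is not the Congow\Orient\Query implementation, and naive quoting like this does not replace proper escaping.

```php
<?php
/**
 * Replace each "?" in $condition with the next value, wrapped in double
 * quotes, with embedded quotes and backslashes escaped.
 * Hypothetical helper for illustration only.
 */
function bindPlaceholders(string $condition, array $values): string
{
    $i = 0;
    return preg_replace_callback('/\?/', function () use (&$i, $values) {
        $value = (string) $values[$i++];
        return '"' . addcslashes($value, '"\\') . '"';
    }, $condition);
}

$where = bindPlaceholders('username = ?', ['admin']);
```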
Data Mapper
A Doctrine2-like ODM
namespace Poland\PHPCon\Entity;

use Congow\Orient\ODM\Mapper\Annotations as ODM;

/**
 * @ODM\Document(class="Person")
 */
class Speaker
{
    /**
     * @ODM\Property(type="string")
     */
    protected $name;

    public function setName($name)
    {
        $this->name = $name;
    }
}
Domain Driven Design
{
  "schema": { "name": "Speaker" },
  "result": [{
    "@type": "d", "@rid": "#1:0", "@version": 6, "@class": "Speaker",
    "name": "David Coallier"
  }, { ... ...

$david = $mapper->hydrate(json_decode($speaker));
// $david instanceof Poland\PHPCon\Entity\Speaker
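A data mapper's hydrate step copies the fields of a decoded record onto a new entity. The sketch below is hypothetical and is not Congow\Orient's mapper: it skips the annotation lookup a real ODM performs. It ignores OrientDB metadata such as @rid and calls a matching setter (setName for "name") when one exists.

```php
<?php
class Speaker
{
    protected $name;

    public function setName($name)
    {
        $this->name = $name;
    }

    public function getName()
    {
        return $this->name;
    }
}

/** Create an instance of $class and copy matching fields from $record. */
function hydrate(string $class, stdClass $record)
{
    $entity = new $class();
    foreach (get_object_vars($record) as $field => $value) {
        // Skip OrientDB metadata such as @rid, @class, @version.
        if ($field[0] === '@') {
            continue;
        }
        $setter = 'set' . ucfirst($field);
        if (method_exists($entity, $setter)) {
            $entity->$setter($value);
        }
    }
    return $entity;
}

$response = json_decode('{"result": [{"@rid": "#1:0", "@class": "Speaker", "name": "David Coallier"}]}');
$david = hydrate(Speaker::class, $response->result[0]);
```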
Repository Pattern
$repo = $manager->getRepository('Speaker');
$speakers = $repo->findAll();
$speaker = $repo->find($rid);
$criteria = array('Name' => 'Lorna');
$lornas = $repo->findBy($criteria);
$criteria = array('Name' => 'Lorna', 'last_name' => 'Jane');
$lornaJ = $repo->findOneBy($criteria);
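The repository methods map naturally onto simple filtering. The hypothetical in-memory sketch below is not the library's repository, which would query OrientDB. Here findBy keeps the records whose fields equal every criterion, and findOneBy returns the first such record or null. The sample records are made up.

```php
<?php
class InMemoryRepository
{
    /** @var array records indexed by RID */
    private $records;

    public function __construct(array $records)
    {
        $this->records = $records;
    }

    public function findAll(): array
    {
        return array_values($this->records);
    }

    public function find(string $rid): ?array
    {
        return $this->records[$rid] ?? null;
    }

    public function findBy(array $criteria): array
    {
        $matches = [];
        foreach ($this->records as $record) {
            foreach ($criteria as $field => $value) {
                if (($record[$field] ?? null) !== $value) {
                    continue 2; // this record fails one criterion
                }
            }
            $matches[] = $record;
        }
        return $matches;
    }

    public function findOneBy(array $criteria): ?array
    {
        return $this->findBy($criteria)[0] ?? null;
    }
}

$repo = new InMemoryRepository([
    '#1:0' => ['Name' => 'Lorna', 'last_name' => 'Jane'],
    '#1:1' => ['Name' => 'Lorna', 'last_name' => 'Doe'],
    '#1:2' => ['Name' => 'David', 'last_name' => 'Funaro'],
]);
```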
Know your boundaries
https://github.com/doctrine/common/tree/master/lib/Doctrine/Common/Persistence
Theory sucks.
Demo
Menu items in RDBMS
id  type      page  url
1   external  NULL  http://www.google.com
2   page      1     NULL
Menu items in OrientDB: class Link, extended by ExternalLink and PageLink
ExternalLink
rid  title   url
8:2  google  google.com
PageLink
rid  title  page
9:1  home   1
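On the PHP side, the OrientDB layout of the demo maps onto a small class hierarchy. An ExternalLink points to a URL, a PageLink points to a page, and both share the Link base class. The class shapes and the getTarget() method are hypothetical, and so is the "/page/" URL scheme for internal pages.

```php
<?php
abstract class Link
{
    protected $title;

    public function __construct(string $title)
    {
        $this->title = $title;
    }

    /** Where the menu item points to; each subclass decides. */
    abstract public function getTarget(): string;
}

class ExternalLink extends Link
{
    private $url;

    public function __construct(string $title, string $url)
    {
        parent::__construct($title);
        $this->url = $url;
    }

    public function getTarget(): string
    {
        return $this->url;
    }
}

class PageLink extends Link
{
    private $page;

    public function __construct(string $title, int $page)
    {
        parent::__construct($title);
        $this->page = $page;
    }

    public function getTarget(): string
    {
        return '/page/' . $this->page; // hypothetical URL scheme for internal pages
    }
}

// Unlike the RDBMS table, there are no NULL columns: each class has only its own fields.
$menu = [new ExternalLink('google', 'google.com'), new PageLink('home', 1)];
$targets = array_map(function (Link $link) {
    return $link->getTarget();
}, $menu);
```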
That’s all, folks!
David Funaro @ingdavidino http://davidfunaro.com
Alessandro Nadalin @_odino_ http://odino.org