Thinking in documents
DESCRIPTION
Introduction to NoSQL databases in general, focusing on MongoDB
TRANSCRIPT
![Page 1: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/1.jpg)
Thinking in Documents (dropping ACID)
César D. [email protected]
http://crodas.org/
PHP Conference 2009
São Paulo, Brazil
![Page 2: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/2.jpg)
Who is this fellow?
• Paraguayan
• Part of the Google Summer of Code 2008
• PHP Classes Innovation Award winner 2007, 2008
• ... and a few other things
@crodas - http://crodas.org/ - LATEX 2
![Page 3: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/3.jpg)
Agenda
• How to scale
• The Web’s major bottleneck
• NoSQL databases
  • Redis
  • Tokyo Cabinet
  • Cassandra
  • CouchDB
  • MongoDB
• Thinking in documents
  • Data behavior
  • Complex operations
• PHP Integration (The fun part!)
• Map/Reduce (Extra time)
![Page 5: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/5.jpg)
Increase computational power
![Page 6: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/6.jpg)
To make it reliable
![Page 8: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/8.jpg)
How to scale
• Buying more hardware (and connectivity)
• Reverse (threaded) proxies
• DNS round robin for your reverse proxies
• Gearmand
• Memcached
• And... what about the data?
![Page 9: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/9.jpg)
How to scale data?
![Page 11: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/11.jpg)
Scaling RDBMS - Solutions
• Master-Slave replication
• Multi-Master replication
• Data sharding
• DRBD and Heartbeat (RAID-1 over the network)
![Page 13: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/13.jpg)
Master-Slave replication
• We need to modify our app
• It is only worth it if our application is read-intensive
• It doesn’t spread the data across servers
• Single point of failure
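The "modify our app" point can be sketched as a small routing function: writes go to the master, reads may go to any slave. This is only an illustration under stated assumptions; `pick_server()` and the host names are hypothetical and not part of the talk.

```php
<?php
// Hypothetical helper: choose a server for a SQL statement.
// Writes must go to the master; reads can go to any slave.
function pick_server($sql, $master, array $slaves)
{
    $isRead = preg_match('/^\s*SELECT\b/i', $sql) === 1;
    if (!$isRead || count($slaves) === 0) {
        return $master;
    }
    // Spread reads across the slaves at random.
    return $slaves[array_rand($slaves)];
}

// Assumed host names, for illustration only.
$master = "db-master.example";
$slaves = array("db-slave-1.example", "db-slave-2.example");

$writeTarget = pick_server("UPDATE posts SET title = 'x'", $master, $slaves);
$readTarget  = pick_server("SELECT * FROM posts", $master, $slaves);
```

Every write still lands on one machine, which is why this setup helps read-heavy applications only and leaves the master as a single point of failure.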
![Page 14: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/14.jpg)
Scaling RDBMS - Problems
• SQL
• JOIN
• Autoincrement
• Transactions (ACID)
![Page 16: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/16.jpg)
Strong Consistency, High Availability, Partition-tolerance
CAP Theorem
![Page 17: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/17.jpg)
BASE
Basically Available, Soft state, Eventually consistent
![Page 18: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/18.jpg)
Everybody is doing it
• Amazon
• eBay
• Yahoo!
• ...
![Page 19: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/19.jpg)
Open implementations
• Cassandra
• Redis
• Tokyo Cabinet/Tyrant
• CouchDB
• MongoDB (FTW!)
• ...
![Page 20: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/20.jpg)
Cassandra
• No master (p2p)
• Storage model similar to BigTable
• Open source
• Incrementally scalable
• PHP interface (with Thrift)
• I have not played with it much.
![Page 22: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/22.jpg)
Key-value
• Fast
• Similar to PHP’s array
• Simple
• Easy to distribute across machines
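One reason key-value data is easy to distribute: the key alone can decide which machine holds the value. A minimal sketch, assuming a fixed server list; `server_for_key()` and the server names are hypothetical.

```php
<?php
// Hypothetical helper: map a key to one of N servers by hashing it.
function server_for_key($key, array $servers)
{
    // Mask to 31 bits so the hash is non-negative on 32-bit builds too.
    $hash = crc32($key) & 0x7fffffff;
    return $servers[$hash % count($servers)];
}

// Assumed server names, for illustration only.
$servers = array("cache-1", "cache-2", "cache-3");

$first  = server_for_key("user:42", $servers);
$second = server_for_key("user:42", $servers);
```

Plain modulo remaps most keys whenever the server list changes; consistent hashing is the common refinement when servers come and go.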
![Page 23: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/23.jpg)
Memcached
• It is a key-value store engine used as a cache.
• No persistence (RAM only, uses LRU eviction)
• Lightning fast
• Well supported
• *Everybody* is using it
• Several clients for PHP [I even wrote one ;-)]
![Page 24: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/24.jpg)
Redis
• Very new
• As fast as Memcached
• Persistent to disk
• Very simple protocol
• Supports lists and sets
• Replication
• Operations on the key space
• I loved it!
  • Until I realised it is an in-memory DB
![Page 25: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/25.jpg)
Tokyo Tyrant
• Very similar to BerkeleyDB (dba_open())
• Performs well (I’ve been playing a bit with it)
• Actively developed
• HTTP Interface (+/-)
• Memcached Protocol (++)
• Moving toward document-oriented storage (supports "tables")
![Page 26: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/26.jpg)
Document-oriented DB
![Page 27: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/27.jpg)
http://www.flickr.com/photos/beglen/152027605/
![Page 28: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/28.jpg)
What is a "Document"?
<?php
$collection[$id] = array(
    "title"    => "PHP rules",
    "tags"     => array("php", "web"),
    "body"     => "... PHP rules ...",
    "comments" => array(
        array("author" => "crodas", "comment" => "Yes it does"),
    ),
);
?>
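A document shaped like the one above maps directly onto JSON, the representation that CouchDB and MongoDB build on. A minimal sketch using PHP's built-in `json_encode()` and `json_decode()`; the variable names are illustrative only.

```php
<?php
// Illustrative document with the same shape as the slide's example.
$post = array(
    "title"    => "PHP rules",
    "tags"     => array("php", "web"),
    "body"     => "... PHP rules ...",
    "comments" => array(
        array("author" => "crodas", "comment" => "Yes it does"),
    ),
);

// Nested PHP arrays become nested JSON arrays and objects.
$json = json_encode($post);

// Decoding with the second argument set to true returns PHP arrays again.
$roundTrip = json_decode($json, true);
```

Tags and comments travel inside the post itself, so no join table is needed to put the document back together.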
![Page 29: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/29.jpg)
Document Databases
• Schema free
• Document versioning
• Improved Key-value store
• Great for storing objects
![Page 31: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/31.jpg)
CouchDB
• Apache project
• Asynchronous replication
• JSON-based (XML free!)
• RESTful interface (might be bad)
• Views are materialized on demand (not indexes :-( )
• Cool admin
• Safe IO (Append only)
• Distributed (concurrent) by nature (written in Erlang)
![Page 34: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/34.jpg)
MongoDB
• Forget about what its name means in Portuguese.
• Fast, Fast, Fast
• JSON and BSON (Binary JSON-ish)
• Asynchronous replication, autosharding
• Supports indexes (FTW!)
• Nested documents (FTW!)
• Advanced queries (FTW!)
• Native extension for PHP
![Page 35: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/35.jpg)
MongoDB - Advanced
• Select
  • $gt, $lt, $gte, $lte, $eq, $ne: >, <, >=, <=, ==, !=
  • $in, $nin
  • $size, $exists
  • group()
  • limit()
  • skip()
  • ...
• Update
  • $push
  • $pull
  • $inc
  • ...
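To make the select operators concrete, here is a rough in-memory sketch of how a filter could be checked against a single document. `matches()` is a hypothetical helper written for illustration; it covers only a handful of the operators listed above and is not how MongoDB evaluates queries internally.

```php
<?php
// Hypothetical matcher, for illustration only: checks one document
// against a filter that uses a few of the operators listed above.
function matches(array $doc, array $filter)
{
    foreach ($filter as $field => $cond) {
        $exists = array_key_exists($field, $doc);
        $value  = $exists ? $doc[$field] : null;
        if (!is_array($cond)) {
            // Plain value: equality test.
            if (!$exists || $value != $cond) {
                return false;
            }
            continue;
        }
        foreach ($cond as $op => $arg) {
            switch ($op) {
                case '$gt':     $ok = $exists && $value > $arg; break;
                case '$lt':     $ok = $exists && $value < $arg; break;
                case '$in':     $ok = $exists && in_array($value, $arg); break;
                case '$nin':    $ok = !$exists || !in_array($value, $arg); break;
                case '$exists': $ok = ($exists === (bool) $arg); break;
                case '$size':   $ok = is_array($value) && count($value) === $arg; break;
                default:        $ok = false; // operator not covered by this sketch
            }
            if (!$ok) {
                return false;
            }
        }
    }
    return true;
}

$doc = array("field" => 6, "enable" => 1, "worth" => 3, "tags" => array("php", "web"));

$hit = matches($doc, array(
    'field'  => array('$in' => array(5, 6, 7)),
    'enable' => 1,
    'worth'  => array('$lt' => 5),
    'tags'   => array('$size' => 2),
));
$miss = matches($doc, array('worth' => array('$gt' => 5)));
```

Every condition in a filter has to hold for the document to match, which corresponds to chaining conditions with AND in SQL.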
![Page 36: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/36.jpg)
pecl install mongo
![Page 37: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/37.jpg)
MongoDB - Connection
<?php
/* connect to localhost:27017 */
$connection = new Mongo();
/* connect to a remote host (default port) */
$connection = new Mongo("example.com");
/* connect to a remote host at a given port */
$connection = new Mongo("example.com:65432");
/* select a DB (created if it doesn't exist yet) */
$db = $connection->selectDB("db_name");
?>
![Page 38: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/38.jpg)
MongoDB - "Tables"
<?php
$db = $connection->selectDB("db_name");
$table = $db->getCollection("table");
?>
![Page 39: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/39.jpg)
From SQL to MongoDB
![Page 40: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/40.jpg)
MongoDB - Count
<?php
/* SELECT count(*) FROM table */
$collection->count();
/* SELECT count(*) FROM table WHERE foo = 1 */
$collection->find(array("foo" => 1))->count();
?>
![Page 41: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/41.jpg)
MongoDB - Queries
<?php
/*
 * SELECT * FROM table WHERE field IN (5,6,7) AND enable = 1
 * AND worth < 5
 * ORDER BY timestamp DESC
 */
$collection->ensureIndex(
    array('field' => 1, 'enable' => 1, 'worth' => 1, 'timestamp' => -1)
);
$filter = array(
    'field'  => array('$in' => array(5, 6, 7)),
    'enable' => 1,
    'worth'  => array('$lt' => 5),
);
$results = $collection->find($filter)->sort(array('timestamp' => -1));
![Page 42: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/42.jpg)
MongoDB - Pagination
<?php
/*
 * SELECT * FROM table WHERE field IN (5,6,7) AND enable = 1
 * AND worth < 5
 * ORDER BY timestamp DESC LIMIT $offset, 20
 */
$filter = array(
    'field'  => array('$in' => array(5, 6, 7)),
    'enable' => 1,
    'worth'  => array('$lt' => 5),
);
$cursor = $collection->find($filter);
$cursor = $cursor->sort(array('timestamp' => -1))->skip($offset)->limit(20);
foreach ($cursor as $result) {
    var_dump($result);
}
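The sort, skip and limit chain has the same meaning as ORDER BY with LIMIT offset, count. A minimal in-memory sketch of that behavior, with no database involved; `page()` and the sample rows are made up for illustration.

```php
<?php
// Illustrative in-memory rows; the timestamp values are made up.
$rows = array();
for ($i = 1; $i <= 50; $i++) {
    $rows[] = array("id" => $i, "timestamp" => 1000 + $i);
}

// Hypothetical helper: sort by timestamp DESC, then skip and limit.
function page(array $rows, $offset, $limit)
{
    usort($rows, function ($a, $b) {
        return $b["timestamp"] - $a["timestamp"]; // newest first
    });
    return array_slice($rows, $offset, $limit);
}

// Second page of 20 results: skip the 20 newest rows, take the next 20.
$secondPage = page($rows, 20, 20);
```

Skipping still walks past the skipped entries, so very deep pages tend to get slower as the offset grows.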
![Page 43: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/43.jpg)
Thinking in documents
![Page 45: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/45.jpg)
MongoDB - Data structure
<?php
$post = array(
    "title" => "...",
    "body"  => "...",
    "uri"   => "...",
    "comments" => array(
        array(
            "email"   => "...",
            "name"    => "...",
            "comment" => "...",
        ),
    ),
    "tags" => array("tag1", "tag2"),
);
/* Creating indexes (they're important) */
$collection->ensureIndex("uri");
$collection->ensureIndex("comments.email");
$collection->ensureIndex("tags");
![Page 46: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/46.jpg)
MongoDB - Data structure
<?php
/*
 * One query replaces all three of these:
 * - SELECT * FROM posts WHERE uri = <uri>
 * - SELECT tags.tag FROM post_has_tags
 *   INNER JOIN tags ON (tags_id = tags.id) WHERE post_id = <post_id>
 * - SELECT * FROM comments WHERE post = <post_id>
 */
$result = $collection->find(array("uri" => "<uri>"));
?>
![Page 47: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/47.jpg)
MongoDB
<?php
/*
 * SELECT posts.* FROM posts INNER
 * JOIN comments ON (comments.post = posts.id)
 * WHERE comments.email = '<email>'
 */
$filter = array(
    "comments.email" => '[email protected]',
);
$result = $collection->find($filter);
?>
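The dotted key "comments.email" reaches into the list of embedded comments. A rough sketch of how such a path can be resolved against a nested PHP array; `values_at()` is a hypothetical helper and the addresses are placeholders, so treat it as an illustration of the idea, not of MongoDB's internals.

```php
<?php
// Hypothetical helper: collect every value found at a dotted path,
// descending into lists of embedded documents along the way.
function values_at(array $doc, $path)
{
    $current = array($doc);
    foreach (explode(".", $path) as $key) {
        $next = array();
        foreach ($current as $node) {
            if (!is_array($node)) {
                continue;
            }
            if (array_key_exists($key, $node)) {
                $next[] = $node[$key];
            } else {
                // A list of embedded documents: look inside each one.
                foreach ($node as $child) {
                    if (is_array($child) && array_key_exists($key, $child)) {
                        $next[] = $child[$key];
                    }
                }
            }
        }
        $current = $next;
    }
    return $current;
}

// Placeholder data, for illustration only.
$post = array(
    "uri" => "/php-rules",
    "comments" => array(
        array("email" => "a@example.com", "comment" => "Yes it does"),
        array("email" => "b@example.com", "comment" => "Agreed"),
    ),
);

$emails = values_at($post, "comments.email");
$found  = in_array("b@example.com", $emails, true);
```

A post matches when any one of its embedded comments carries the address, which is what the INNER JOIN in the SQL version expresses.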
![Page 48: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/48.jpg)
MongoDB
<?php
/*
 * SELECT * FROM posts
 * WHERE id IN (SELECT posts_id FROM posts_has_tags
 *   INNER JOIN tags ON (tags_id = tags.id) WHERE tag = <tag>)
 */
$filter = array(
    "tags" => '<tag>',
);
$result = $collection->find($filter);
?>
![Page 49: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/49.jpg)
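MongoDB
<?php
/*
 * SELECT * FROM posts WHERE id IN (
 *   SELECT post FROM comments GROUP
 *   BY post HAVING count(*) > 10)
 *
 * Caveat: $size matches an exact array length only, so combining it
 * with $gt as below is not expected to work. Keeping a counter field
 * and filtering on that is the usual workaround.
 */
$filter = array(
    "comments" => array('$size' => array('$gt' => 10)),
);
$result = $collection->find($filter);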
?>
![Page 50: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/50.jpg)
MongoDB
<?php
/*
 * SELECT * FROM posts WHERE 10 < (
 *   SELECT count(*) FROM comments
 *   WHERE post = posts.id)
 */
/* on inserting a comment */
$collection->update(
    array("uri" => "uri"), // select
    array('$inc' => array('comments_size' => 1)) // increment
);
$filter = array(
    "comments_size" => array('$gt' => 10),
);
$result = $collection->find($filter);
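The counter approach above trades a little extra work on every write for a cheap read. A minimal in-memory sketch of the idea, assuming the counter is bumped each time a comment is appended; `$posts` and `add_comment()` are hypothetical stand-ins for a collection and an update call.

```php
<?php
// Hypothetical in-memory stand-in for a collection, keyed by uri.
$posts = array(
    "/a" => array("uri" => "/a", "comments" => array(), "comments_size" => 0),
    "/b" => array("uri" => "/b", "comments" => array(), "comments_size" => 0),
);

// Append the comment and bump the counter in the same step,
// mirroring what $push and $inc would do in a single update.
function add_comment(array &$posts, $uri, array $comment)
{
    $posts[$uri]["comments"][] = $comment;
    $posts[$uri]["comments_size"]++;
}

for ($i = 0; $i < 11; $i++) {
    add_comment($posts, "/a", array("name" => "user$i", "comment" => "..."));
}
add_comment($posts, "/b", array("name" => "user0", "comment" => "..."));

// In-memory equivalent of array("comments_size" => array('$gt' => 10)).
$popular = array_filter($posts, function ($post) {
    return $post["comments_size"] > 10;
});
```

The counter stays correct only if every code path that adds or removes a comment also updates it, which is the price of denormalizing.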
![Page 51: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/51.jpg)
Map/Reduce (extra time)
![Page 52: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/52.jpg)
Map/Reduce -- Theory
<?php
for ($i = 0; $i < 50; $i++) {
    $result[$i] = pow($i, 2);
}
var_dump($result);
/*
 * If pow() takes 1 second:
 *  1 process   = 50 seconds
 * 10 processes =  5 seconds
 */
?>
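The arithmetic above works because each pow() call is independent of the others, so the inputs can be split into chunks. A small sketch of that split, with the ten "workers" simulated one after another in a single process; nothing here actually runs in parallel.

```php
<?php
$inputs = range(0, 49);

// Split the 50 inputs into 10 chunks, one per (hypothetical) worker.
$chunks = array_chunk($inputs, 5);

// Each worker would handle its chunk independently; here they run in turn.
$partials = array();
foreach ($chunks as $chunk) {
    $partials[] = array_map(function ($i) {
        return pow($i, 2);
    }, $chunk);
}

// Merge the partial results back together, in order.
$result = array_merge(...$partials);
```

With ten real processes each chunk would take 5 of the hypothetical one-second steps, which is where the "10 processes = 5 seconds" estimate comes from.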
![Page 53: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/53.jpg)
Map/Reduce -- Theory II
<?php
$data = range(1, 1000);
/* MAP */
foreach ($data as $key => $value) {
    $n_key = $value % 10;
    /* append */
    $tmp[$n_key][] = $value;
}
/* REDUCE */
foreach ($tmp as $key => $value) {
    $value = array_sum($value);
    print "{$key} = {$value}\n";
}
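The same computation can be written with the map and reduce steps pulled out as separate functions, which is roughly the shape a map/reduce framework expects. The names `map_fn()` and `reduce_fn()` are hypothetical, and the grouping step that a framework would do for you is written out by hand.

```php
<?php
// Hypothetical map function: emit a (key, value) pair for one input.
function map_fn($value)
{
    return array($value % 10, $value);
}

// Hypothetical reduce function: combine all values that share a key.
function reduce_fn($key, array $values)
{
    return array_sum($values);
}

// Group the emitted pairs by key.
$groups = array();
foreach (range(1, 1000) as $value) {
    list($key, $emitted) = map_fn($value);
    $groups[$key][] = $emitted;
}

// Reduce each group independently of the others.
$totals = array();
foreach ($groups as $key => $values) {
    $totals[$key] = reduce_fn($key, $values);
}
ksort($totals);
```

Neither function touches shared state, so map calls and per-key reduce calls could be handed to different machines.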
![Page 55: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/55.jpg)
Thank you fellows!
![Page 56: Thinking in documents](https://reader033.vdocuments.us/reader033/viewer/2022042713/54620833b1af9fba388b4c7a/html5/thumbnails/56.jpg)
@crodas
crodas.org