Thinking in Documents (dropping ACID)

César D. Rodas
[email protected]
http://crodas.org/

PHP Conference 2009
São Paulo, Brasil

DESCRIPTION

An introduction to NoSQL databases in general, focusing on MongoDB.

TRANSCRIPT

Page 1: Thinking in documents

Thinking in Documents (dropping ACID)

César D. Rodas
[email protected]
http://crodas.org/

PHP Conference 2009
São Paulo, Brasil

Page 2: Thinking in documents

Who is this fellow?

• Paraguayan

• Part of the Google Summer of Code 2008

• PHP Classes Innovation Award winner 2007, 2008

• ... and a few other things

@crodas - http://crodas.org/ - LATEX 2

Page 3: Thinking in documents

Agenda

• How to scale

• The Web's major bottleneck

• NoSQL databases
    • Redis
    • Tokyo Cabinet
    • Cassandra
    • CouchDB
    • MongoDB

• Thinking in documents
    • Data behavior
    • Complex operations

• PHP Integration (The fun part!)

• Map/Reduce (Extra time)

Page 4: Thinking in documents

Scaling?

Page 5: Thinking in documents

Increase computational power

Page 6: Thinking in documents

To make it reliable

Page 7: Thinking in documents

DISTRIBUTED

Page 8: Thinking in documents

How to scale

• Buying more hardware (and connectivity)

• Reverse (threaded) proxies

• DNS round-robin for your reverse proxies

• Gearmand

• Memcached

• And... what about the data?

Page 9: Thinking in documents

How to scale data?

Page 10: Thinking in documents

The hardest way

Page 11: Thinking in documents

Scaling RDBMS - Solutions

• Master-Slave replication

• Multi-Master replication

• Data sharding

• DRBD and Heartbeat (RAID-1 over the network)
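
Data sharding spreads rows over several servers according to a key. A minimal sketch of hash-based sharding, assuming all rows of one user live on the shard picked from the user id; `shardFor()` and the host names are hypothetical:

```php
<?php
/**
 * Hypothetical helper: picks the shard that stores every row of a user.
 * Rows of different users may live on different servers, so a JOIN
 * across users can no longer be answered by a single database.
 */
function shardFor(int $userId, array $shards): string
{
    // Modulo on the numeric key; a string key could be hashed first, e.g. with crc32().
    return $shards[$userId % count($shards)];
}

$shards = array('db0.example.com', 'db1.example.com', 'db2.example.com');

echo shardFor(42, $shards), "\n"; // 42 % 3 = 0, prints db0.example.com
echo shardFor(43, $shards), "\n"; // 43 % 3 = 1, prints db1.example.com
```

The catch with plain modulo is that changing the number of shards sends most keys to a different server.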

Page 12: Thinking in documents

Page 13: Thinking in documents

Master-Slave replication

• We need to modify our app (writes go to the master, reads can go to the slaves)

• It is only worth it if our application is read-intensive

• It doesn't spread the data across servers

• The master is a single point of failure
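
"Modify our app" mostly means routing each statement to the right server. A minimal sketch, assuming only plain SELECTs may be read from a slave; `ReplicationRouter` is a hypothetical class and the host names stand in for real connections:

```php
<?php
/**
 * Hypothetical read/write router for master-slave replication.
 * Statements that change data go to the master; SELECTs are spread
 * round-robin over the slaves.
 */
final class ReplicationRouter
{
    private $master;
    private $slaves;
    private $next = 0;

    public function __construct(string $master, array $slaves)
    {
        $this->master = $master;
        $this->slaves = array_values($slaves);
    }

    public function hostFor(string $sql): string
    {
        // Assumption: only statements starting with SELECT are safe on a slave.
        if ($this->slaves && preg_match('/^\s*SELECT\b/i', $sql)) {
            $host = $this->slaves[$this->next % count($this->slaves)];
            $this->next++;
            return $host;
        }
        return $this->master;
    }
}

$router = new ReplicationRouter(
    'master.example.com',
    array('slave1.example.com', 'slave2.example.com')
);
echo $router->hostFor('UPDATE posts SET title = "x" WHERE id = 1'), "\n"; // prints master.example.com
echo $router->hostFor('SELECT * FROM posts'), "\n";                     // prints slave1.example.com
```

Since replication is asynchronous, a read sent to a slave right after a write may still see the old data.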

Page 14: Thinking in documents

Scaling RDBMS - Problems

• SQL

• JOIN

• Autoincrement

• Transactions (ACID)
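
Autoincrement needs one shared counter, which breaks once several masters accept writes. One common workaround, sketched here assuming a fixed number of servers, gives each server its own arithmetic sequence; MySQL exposes the same idea through its `auto_increment_increment` and `auto_increment_offset` settings. `interleavedIds()` is a hypothetical helper:

```php
<?php
/**
 * Hypothetical helper: the IDs server number $serverIndex (1-based) out of
 * $serverCount would hand out, so two masters never produce the same ID
 * without talking to each other.
 */
function interleavedIds(int $serverIndex, int $serverCount, int $howMany): array
{
    $ids = array();
    for ($i = 0; $i < $howMany; $i++) {
        // Server k produces k, k + n, k + 2n, ...
        $ids[] = $serverIndex + $i * $serverCount;
    }
    return $ids;
}

echo implode(', ', interleavedIds(1, 2, 3)), "\n"; // prints 1, 3, 5
echo implode(', ', interleavedIds(2, 2, 3)), "\n"; // prints 2, 4, 6
```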

Page 15: Thinking in documents

The easiest way

Page 16: Thinking in documents

CAP Theorem

Strong Consistency, High Availability, Partition tolerance: a distributed system can guarantee at most two of the three.

Page 17: Thinking in documents

BASE

Basically Available, Soft state, Eventually consistent
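
A minimal sketch of the "eventually consistent" part, assuming a last-write-wins rule: each replica accepts writes on its own, and when replicas exchange state every key keeps the value with the newest timestamp. The functions and the integer timestamps (a stand-in for a real clock) are hypothetical; real systems often use vector clocks or other conflict handling instead:

```php
<?php
/**
 * Eventual consistency sketch with a last-write-wins (LWW) rule.
 * A replica is an array key => ['value' => ..., 'ts' => ...].
 */
function lwwWrite(array $replica, string $key, $value, int $ts): array
{
    // Keep the incoming write only if it is newer than what we have.
    // (Equal timestamps would need a tie-breaker, e.g. a replica id.)
    if (!isset($replica[$key]) || $replica[$key]['ts'] < $ts) {
        $replica[$key] = array('value' => $value, 'ts' => $ts);
    }
    return $replica;
}

function lwwMerge(array $a, array $b): array
{
    // Anti-entropy: replay every entry of $b into $a.
    foreach ($b as $key => $entry) {
        $a = lwwWrite($a, (string) $key, $entry['value'], $entry['ts']);
    }
    return $a;
}

// While partitioned, each replica accepts its own write...
$r1 = lwwWrite(array(), 'title', 'PHP rules', 1);
$r2 = lwwWrite(array(), 'title', 'PHP really rules', 2);

// ...and once they exchange state they agree, whatever the merge order.
$m1 = lwwMerge($r1, $r2);
$m2 = lwwMerge($r2, $r1);
echo $m1['title']['value'], "\n"; // prints PHP really rules
```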

Page 18: Thinking in documents

Everybody is doing it

• Google

• Amazon

• eBay

• Yahoo!

• Facebook

• ...

Page 19: Thinking in documents

Open implementations

• Cassandra

• Redis

• Tokyo Cabinet/Tyrant

• CouchDB

• MongoDB (FTW!)

• ...

Page 20: Thinking in documents

Cassandra

• No master (p2p)

• Storage model similar to BigTable's

• Open source

• Incrementally scalable

• PHP interface (with Thrift)

• I never played much with it.

Page 21: Thinking in documents

Key-value

Page 22: Thinking in documents

Key-value

• Fast

• Similar to PHP's array

• Simple

• Easy to distribute across machines
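
One way key-value stores are spread across machines is consistent hashing. A minimal sketch, assuming crc32() as the hash and 64 virtual points per server; `HashRing` is a hypothetical class, not the API of any particular client:

```php
<?php
/**
 * Minimal consistent-hashing ring (illustrative only).
 * Every server is placed on the ring at several points ("virtual nodes");
 * a key belongs to the first point at or after its own hash, wrapping
 * around at the end.
 */
final class HashRing
{
    /** @var array<int, string> ring position => server */
    private $points = array();

    public function __construct(array $servers, int $replicas = 64)
    {
        foreach ($servers as $server) {
            for ($i = 0; $i < $replicas; $i++) {
                // A colliding position simply goes to the later server.
                $this->points[crc32($server . '#' . $i)] = $server;
            }
        }
        ksort($this->points);
    }

    public function serverFor(string $key): string
    {
        $hash = crc32($key);
        // Linear scan for clarity; a real ring would binary-search.
        foreach ($this->points as $point => $server) {
            if ($point >= $hash) {
                return $server;
            }
        }
        // Past the last point: wrap around to the first one.
        return reset($this->points);
    }
}

$ring = new HashRing(array('cache1', 'cache2', 'cache3'));
echo $ring->serverFor('user:42'), "\n";
```

Compared with `crc32($key) % count($servers)`, adding a fourth server to this ring only moves the keys that now land on its points; every other key stays where it was.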

Page 23: Thinking in documents

Memcached

• It is a key-value store used as a cache.

• No persistence (RAM only, evicts with LRU)

• Lightning fast

• Well supported

• *Everybody* is using it

• Several clients for PHP [I even wrote one ;-)]
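
When memory is full, memcached evicts the least recently used items. A minimal sketch of that LRU policy in plain PHP (not memcached's slab-based implementation), relying on PHP arrays keeping insertion order; `LruCache` is a hypothetical class:

```php
<?php
/**
 * Minimal LRU cache: the first element of $items is always the least
 * recently used one, the last element the most recently used.
 */
final class LruCache
{
    private $capacity;
    private $items = array();

    public function __construct(int $capacity)
    {
        $this->capacity = $capacity;
    }

    public function get(string $key)
    {
        if (!array_key_exists($key, $this->items)) {
            return null; // cache miss
        }
        // Move the key to the end: it is now the most recently used.
        $value = $this->items[$key];
        unset($this->items[$key]);
        $this->items[$key] = $value;
        return $value;
    }

    public function set(string $key, $value): void
    {
        unset($this->items[$key]);
        $this->items[$key] = $value;
        if (count($this->items) > $this->capacity) {
            // Over capacity: evict the least recently used entry (the first key).
            reset($this->items);
            unset($this->items[key($this->items)]);
        }
    }
}

$cache = new LruCache(2);
$cache->set('a', 1);
$cache->set('b', 2);
$cache->get('a');            // touch "a", so "b" becomes the least recently used
$cache->set('c', 3);         // over capacity: evicts "b"
var_dump($cache->get('b'));  // prints NULL
```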

Page 24: Thinking in documents

Redis

• Very new

• As fast as Memcached

• Persistent to disk

• Very simple protocol

• Supports lists and tuples

• Replication

• Operations on the key space

• I loved it!
    • Until I realised it is an in-memory DB

Page 25: Thinking in documents

Tokyo Tyrant

• Very similar to BerkeleyDB (dba_open())

• Performs well (I've been playing with it a bit)

• Actively developed

• HTTP interface (+/-)

• Memcached protocol (++)

• Moving towards document-oriented storage (supports "tables")

Page 26: Thinking in documents

Document-oriented DB

Page 27: Thinking in documents

http://www.flickr.com/photos/beglen/152027605/

Page 28: Thinking in documents

What is a "Document"?

<?php
$collection[$id] = array(
    "title"    => "PHP rules",
    "tags"     => array("php", "web"),
    "body"     => "... PHP rules ...",
    "comments" => array(
        array("author" => "crodas", "comment" => "Yes it does"),
    ),
);
?>
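
A document is just a nested structure, so it maps directly onto JSON, the format CouchDB speaks and the one MongoDB's BSON is modeled on. A quick check in plain PHP with json_encode(); the `$id` value is hypothetical and no database is involved:

```php
<?php
$id = 1; // hypothetical document id
$collection = array();
$collection[$id] = array(
    "title"    => "PHP rules",
    "tags"     => array("php", "web"),
    "body"     => "... PHP rules ...",
    "comments" => array(
        array("author" => "crodas", "comment" => "Yes it does"),
    ),
);

// Lists become JSON arrays, associative arrays become JSON objects.
$json = json_encode($collection[$id]);
echo $json, "\n";
// prints {"title":"PHP rules","tags":["php","web"],"body":"... PHP rules ...","comments":[{"author":"crodas","comment":"Yes it does"}]}
```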

Page 29: Thinking in documents

Document Databases

• Schema free

• Document versioning

• Improved key-value store

• Great for storing objects
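
"Schema free" means documents in one collection need not share fields, so adding a field is just writing it, with no ALTER TABLE. A small sketch where a plain PHP array stands in for the collection and the reading code supplies a default for a missing field (the "anonymous" default is an assumption):

```php
<?php
$posts = array(
    array('title' => 'PHP rules', 'tags' => array('php', 'web')),
    array('title' => 'Hello world', 'author' => 'crodas'), // no tags, new field
);

$authors = array();
foreach ($posts as $post) {
    // Older documents lack "author"; no migration was needed to add it.
    $authors[] = $post['author'] ?? 'anonymous';
}
echo implode(', ', $authors), "\n"; // prints anonymous, crodas
```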

Page 30: Thinking in documents

Page 31: Thinking in documents

CouchDB

• Apache project

• Asynchronous replication

• JSON-based (XML free!)

• RESTful interface (might be bad)

• Views are materialized on demand (not indexes :-( )

• Cool admin

• Safe IO (append only)

• Distributed (concurrent) by nature (written in Erlang)

Page 32: Thinking in documents

Page 33: Thinking in documents

Page 34: Thinking in documents

MongoDB

• Forget about what its name means in Portuguese.

• Fast, fast, fast

• JSON and BSON (Binary JSON-ish)

• Asynchronous replication, autosharding

• Supports indexes (FTW!)

• Nested documents (FTW!)

• Advanced queries (FTW!)

• Native extension for PHP

Page 35: Thinking in documents

MongoDB - Advanced

• Select
    • $gt, $lt, $gte, $lte, $eq, $ne: >, <, >=, <=, ==, !=
    • $in, $nin
    • $size, $exists
    • group()
    • limit()
    • skip()
    • ...

• Update
    • $push
    • $pull
    • $inc
    • ...
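
To make the selector semantics concrete, here is a toy matcher for a subset of them, run on plain PHP arrays. It assumes flat documents (no dotted paths, no "array contains" equality); `matches()` is hypothetical and is not how MongoDB evaluates queries:

```php
<?php
function matches(array $doc, array $filter): bool
{
    foreach ($filter as $field => $cond) {
        $exists = array_key_exists($field, $doc);
        $value  = $exists ? $doc[$field] : null;

        if (!is_array($cond)) {
            // {field: value} is plain equality.
            if (!$exists || $value !== $cond) {
                return false;
            }
            continue;
        }

        foreach ($cond as $op => $arg) {
            switch ($op) {
                case '$gt':     $ok = $exists && $value > $arg;   break;
                case '$gte':    $ok = $exists && $value >= $arg;  break;
                case '$lt':     $ok = $exists && $value < $arg;   break;
                case '$lte':    $ok = $exists && $value <= $arg;  break;
                case '$eq':     $ok = $exists && $value === $arg; break;
                // $ne and $nin also match documents that lack the field.
                case '$ne':     $ok = !$exists || $value !== $arg; break;
                case '$in':     $ok = $exists && in_array($value, $arg, true); break;
                case '$nin':    $ok = !$exists || !in_array($value, $arg, true); break;
                case '$exists': $ok = $exists === (bool) $arg; break;
                // $size only compares against an exact length.
                case '$size':   $ok = is_array($value) && count($value) === $arg; break;
                default:
                    throw new InvalidArgumentException("Unsupported operator $op");
            }
            if (!$ok) {
                return false;
            }
        }
    }
    return true;
}

$doc = array('field' => 6, 'enable' => 1, 'worth' => 3, 'tags' => array('php', 'web'));
var_dump(matches($doc, array(
    'field'  => array('$in' => array(5, 6, 7)),
    'enable' => 1,
    'worth'  => array('$lt' => 5),
))); // prints bool(true)
```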

Page 36: Thinking in documents

pecl install mongo

Page 37: Thinking in documents

MongoDB - Connection

<?php

/* connects to localhost:27017 */
$connection = new Mongo();

/* connect to a remote host (default port) */
$connection = new Mongo("example.com");

/* connect to a remote host at a given port */
$connection = new Mongo("example.com:65432");

/* select some DB (and create it if it doesn't exist yet) */
$db = $connection->selectDB("db name");

?>

Page 38: Thinking in documents

MongoDB - "Tables"

<?php

$db = $connection->selectDB("db name");
$table = $db->getCollection("table");

?>

Page 39: Thinking in documents

FROM SQL to MongoDB

Page 40: Thinking in documents

MongoDB - Count

<?php
/* SELECT count(*) FROM table */
$collection->count();

/* SELECT count(*) FROM table WHERE foo = 1 */
$collection->find(array("foo" => 1))->count();

?>

Page 41: Thinking in documents

MongoDB - Queries

<?php
/*
 * SELECT * FROM table WHERE field IN (5,6,7) AND enable = 1
 * AND worth < 5
 * ORDER BY timestamp DESC
 */

$collection->ensureIndex(
    array('field' => 1, 'enable' => 1, 'worth' => 1, 'timestamp' => -1)
);

$filter = array(
    'field'  => array('$in' => array(5, 6, 7)),
    'enable' => 1,
    'worth'  => array('$lt' => 5),
);
$results = $collection->find($filter)->sort(array('timestamp' => -1));

Page 42: Thinking in documents

MongoDB - Pagination

<?php
/*
 * SELECT * FROM table WHERE field IN (5,6,7) AND enable = 1
 * AND worth < 5
 * ORDER BY timestamp DESC LIMIT $offset, 20
 */
$filter = array(
    'field'  => array('$in' => array(5, 6, 7)),
    'enable' => 1,
    'worth'  => array('$lt' => 5),
);

$cursor = $collection->find($filter);
$cursor = $cursor->sort(array('timestamp' => -1))->skip($offset)->limit(20);

foreach ($cursor as $result) {
    var_dump($result);
}

Page 43: Thinking in documents

Thinking in documents

Page 44: Thinking in documents

Page 45: Thinking in documents

MongoDB - Data structure

<?php
$post = array(
    "title"    => "...",
    "body"     => "...",
    "uri"      => "...",
    "comments" => array(
        array(
            "email"   => "...",
            "name"    => "...",
            "comment" => "...",
        ),
    ),
    "tags"     => array("tag1", "tag2"),
);

/* Creating indexes (they're important) */
$collection->ensureIndex("uri");
$collection->ensureIndex("comments.email");
$collection->ensureIndex("tags");
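
This one document replaces what a relational schema keeps in three tables (posts, comments, tags). A sketch of that denormalization step in plain PHP; `buildPostDocument()` and the row and column names are hypothetical:

```php
<?php
/**
 * Hypothetical helper: assembles the embedded post document from rows
 * that a relational schema would keep in separate tables.
 */
function buildPostDocument(array $post, array $comments, array $tags): array
{
    $doc = $post;
    $doc['comments'] = array();
    foreach ($comments as $row) {
        if ($row['post'] === $post['id']) {
            unset($row['post']); // the link is implicit once the comment is embedded
            $doc['comments'][] = $row;
        }
    }
    $doc['tags'] = array();
    foreach ($tags as $row) {
        if ($row['post_id'] === $post['id']) {
            $doc['tags'][] = $row['tag'];
        }
    }
    return $doc;
}

$postRow = array('id' => 10, 'title' => '...', 'body' => '...', 'uri' => '/php-rules');
$commentRows = array(
    array('post' => 10, 'email' => 'a@example.com', 'name' => 'A', 'comment' => '...'),
    array('post' => 11, 'email' => 'b@example.com', 'name' => 'B', 'comment' => '...'),
);
$tagRows = array(
    array('post_id' => 10, 'tag' => 'tag1'),
    array('post_id' => 10, 'tag' => 'tag2'),
);

// One read of $post now answers what needed three queries (or JOINs) before.
$post = buildPostDocument($postRow, $commentRows, $tagRows);
```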

Page 46: Thinking in documents

MongoDB - Data structure

<?php
/*
 * - SELECT * FROM posts WHERE uri = <uri>
 * - SELECT tags.tag FROM post_has_tags
 *   INNER JOIN tags ON (tags_id = tags.id) WHERE post_id = <post_id>
 * - SELECT * FROM comments WHERE post = <post_id>
 */

$result = $collection->find(array("uri" => "<uri>"));

?>

Page 47: Thinking in documents

MongoDB

<?php
/*
 * SELECT posts.* FROM posts
 * INNER JOIN comments ON (comments.post = posts.id)
 * WHERE comments.email = '<email>'
 */

$filter = array(
    "comments.email" => '[email protected]',
);

$result = $collection->find($filter);

?>

Page 48: Thinking in documents

MongoDB

<?php
/*
 * SELECT * FROM posts
 * WHERE id IN (SELECT post_id FROM post_has_tags
 *   INNER JOIN tags ON (tags_id = tags.id) WHERE tag = <tag>)
 */

$filter = array(
    "tags" => '<tag>',
);

$result = $collection->find($filter);

?>

Page 49: Thinking in documents

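MongoDB

<?php
/*
 * SELECT * FROM posts WHERE id IN (
 *   SELECT post FROM comments GROUP BY post HAVING count(*) > 10)
 */

/* $size only matches an exact length, so it cannot take $gt.
 * "an element exists at index 10" means "more than 10 comments". */
$filter = array(
    "comments.10" => array('$exists' => true),
);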

$result = $collection->find($filter);

?>

Page 50: Thinking in documents

MongoDB

<?php
/*
 * SELECT * FROM posts WHERE 10 < (
 *   SELECT count(*) FROM comments WHERE post = posts.id)
 */

/* on inserting a comment, increment the post's counter */
$collection->update(
    array("uri" => "uri"),                        // select
    array('$inc' => array('comments_size' => 1))  // increment
);

$filter = array(
    "comments_size" => array('$gt' => 10),
);

$result = $collection->find($filter);
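
What the server does with such an update can be emulated on a plain PHP array. `applyUpdate()` is a hypothetical helper covering only `$inc`, `$push` and `$pull` on top-level fields. In MongoDB itself, `$push` and `$inc` on different fields can go in the same update() call, and an update to one document is atomic, so the counter cannot drift away from the comments array:

```php
<?php
function applyUpdate(array $doc, array $update): array
{
    foreach ($update as $op => $fields) {
        foreach ($fields as $field => $arg) {
            switch ($op) {
                case '$inc':
                    // A missing counter behaves as 0.
                    $doc[$field] = ($doc[$field] ?? 0) + $arg;
                    break;
                case '$push':
                    // A missing array is created.
                    $doc[$field][] = $arg;
                    break;
                case '$pull':
                    // Remove every element equal to $arg.
                    $doc[$field] = array_values(array_filter(
                        $doc[$field] ?? array(),
                        function ($item) use ($arg) {
                            return $item !== $arg;
                        }
                    ));
                    break;
                default:
                    throw new InvalidArgumentException("Unsupported operator $op");
            }
        }
    }
    return $doc;
}

$post = array('uri' => 'uri', 'comments' => array());

// Pushing the comment and bumping the counter in one update keeps them in sync.
$post = applyUpdate($post, array(
    '$push' => array('comments' => array('email' => 'a@example.com', 'comment' => '...')),
    '$inc'  => array('comments_size' => 1),
));
echo $post['comments_size'], "\n"; // prints 1
```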

Page 51: Thinking in documents

Map/Reduce

Extra time

Page 52: Thinking in documents

Map/Reduce -- Theory

<?php

$result = array();
for ($i = 0; $i < 50; $i++) {
    $result[$i] = pow($i, 2);
}

var_dump($result);

/*
 * IF pow takes 1 second:
 * 1 process    = 50 seconds
 * 10 processes = 5 seconds
 */

?>

Page 53: Thinking in documents

Map/Reduce -- Theory II

<?php

$data = range(1, 1000);
$tmp  = array();

/* MAP: group every value under the key $value % 10 */
foreach ($data as $key => $value) {
    $n_key = $value % 10;
    /* append */
    $tmp[$n_key][] = $value;
}

/* REDUCE: sum each group */
foreach ($tmp as $key => $value) {
    $value = array_sum($value);
    print "{$key} = {$value}\n";
}
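
The same two phases can be written once as a generic function that takes the map and reduce steps as callbacks. A single-process sketch with a word count, the classic example; `mapReduce()` is a hypothetical name, not a MongoDB or PHP API:

```php
<?php
/**
 * $map turns one input item into a list of [key, value] pairs; the pairs
 * are grouped by key; $reduce folds each group into one value.
 */
function mapReduce(array $input, callable $map, callable $reduce): array
{
    $groups = array();
    foreach ($input as $item) {
        foreach ($map($item) as $pair) {
            list($key, $value) = $pair;
            $groups[$key][] = $value; // group (shuffle) step
        }
    }
    $result = array();
    foreach ($groups as $key => $values) {
        $result[$key] = $reduce($key, $values);
    }
    return $result;
}

$docs = array('php rules', 'mongodb rules', 'php');
$counts = mapReduce(
    $docs,
    function ($doc) {
        $pairs = array();
        foreach (explode(' ', $doc) as $word) {
            $pairs[] = array($word, 1);
        }
        return $pairs;
    },
    function ($key, $values) {
        return array_sum($values);
    }
);
print_r($counts);
```

Each call to the map callback only looks at one input item, and each reduce call only at one group, so both phases can be split across processes or machines; that is where the speed-up of the 50-second pow() example comes from.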

Page 54: Thinking in documents

Questions?

Page 55: Thinking in documents

Thank you, fellows!

Page 56: Thinking in documents

@crodas

crodas.org

Page 57: Thinking in documents

Powered by...
