cassandra - php

Cassandra: Integrating Cassandra into your project (Tuesday 12 November 2013)

Upload: mauritsl

Post on 15-Jan-2015

DESCRIPTION

Presentation on integrating Cassandra in PHP projects. 12 November 2013, PHPMeetup Amersfoort

TRANSCRIPT

Page 1: Cassandra - PHP

Cassandra

Integrating Cassandra into your project

Tuesday 12 November 2013

Maurits Lawende

• Has worked at Dutch Open Projects (DOP) since 2007

• Development and technical design for challenging Drupal sites

• Development of SaaS solutions in PHP & NodeJS


ToDoToDay

• Data versus information

• History and usage of Cassandra

• How to use Cassandra

• Developments


Data versus information

Celko, J. (1999). Data and databases

SQL is designed for information

DBMS knows how to use your data

SQL is designed for flexibility

Not even a single line on scalability

SQL

Nearly 40 years of experience

SQL

Never designed for scalability

Alexa top 10

• Google (BigTable)

• Facebook (MySQL)

• YouTube (MySQL)

• Yahoo

• Baidu (HyperTable)

• Wikipedia (MySQL)

• QQ.com

• LinkedIn (Voldemort)

• Live.com

• Twitter (MySQL)

Cassandra users

• Facebook (+ Redis & HBase & MySQL)

• Twitter (+ MySQL)

• Reddit (+ Postgres)

• Digg (+ Redis)

• Bit.ly (+ MongoDB)

• Netflix

Jeff Hammerbacher left Facebook in 2008

Back to basics

Don’t think SQL

Key/value store

Evolved towards tables

Just data

• No joins

• Limited sorting capabilities

• No aggregation, grouping, subqueries whatsoever

Schemaless

• Fixed column families (not tables), but:

• Dynamic column names


Operations in Cassandra 1.0

• CREATE KEYSPACE name

• USE name

• CREATE COLUMN FAMILY name

• DROP KEYSPACE name

• DROP COLUMN FAMILY name

• SET columnfamily['row']['column'] = 'value';

• GET columnfamily['row'];

• LIST columnfamily;

• DEL columnfamily['row'];

• DEL columnfamily['row']['column'];

• post['uuid']['title'] = 'First post!';

• user['mau']['firstname'] = 'Maurits';

• user['mau']['lastname'] = 'Lawende';

Column family   Row key   Column      Value
post            uuid      title       First post!
user            mau       firstname   Maurits
user            mau       lastname    Lawende

Sorted by row key, then column name (all ascending)

• post['uuid']['title'] = 'First post!';

• post['uuid']['user'] = 'mau';

• user['mau']['firstname'] = 'Maurits';

How to get a list of blog posts by “mau”?

WHERE user = 'mau'

Bad Request: No indexed columns present in by-columns clause with Equal operator

Sequential scans are rejected

WHERE user = 'mau' ORDER BY date DESC LIMIT 10

Bad Request: Order by is currently only supported on the clustered columns of the PRIMARY KEY

Bad Request: ORDER BY is only supported when the partition key is restricted by an EQ or an IN.

Only possible when user and date are in the primary key

Predictable performance

No performance degradation after data growth

• post['uuid']['title'] = 'First post!';

• post['uuid']['user'] = 'mau';

• user['mau']['firstname'] = 'Maurits';

• user['mau']['post001'] = 'uuid';

• user['mau']['post002'] = 'uuid';

Any order and limit

Join

No uuid IN (...) or ORs

• post['uuid']['title'] = 'First post!';

• user['mau']['firstname'] = 'Maurits';

• user['mau']['post001:uuid'] = 'First post!';

• user['mau']['post002:uuid'] = 'Second post!';

Only one query required to get a user profile with the latest posts

Limits: row key 64 KB, column name 64 KB, column value 2 GB; 2 billion cells per row
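The wide-row layout above can be imitated with nested PHP arrays. This is only a toy sketch of the idea, not how Cassandra stores data; the `$user` array, the `uuid-a` / `uuid-b` placeholders and the `latestPosts()` helper are hypothetical names introduced for illustration.

```php
<?php
// Toy model of one column family: row key => (column name => value).
// Cassandra keeps columns sorted by name within a row; ksort() imitates that.
$user = array(
    'mau' => array(
        'post002:uuid-b' => 'Second post!',
        'firstname'      => 'Maurits',
        'post001:uuid-a' => 'First post!',
    ),
);

/**
 * Return the newest $limit posts from a single row: one lookup, no join.
 * Hypothetical helper, for illustration only.
 */
function latestPosts(array $row, $limit)
{
    ksort($row); // columns ordered by name, as in a wide row
    $posts = array();
    foreach ($row as $column => $value) {
        if (strpos($column, 'post') === 0) {
            $posts[$column] = $value;
        }
    }
    // The highest column names are the newest posts.
    return array_slice(array_reverse($posts, true), 0, $limit, true);
}

$latest = latestPosts($user['mau'], 1);
```

Because the post titles are copied into the user row, the profile and its latest posts come out of the same row, which is the denormalisation the slides describe.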


Beauty?

• Dirty in the SQL world, but;

• It’s a best practice in Big Data

• Don’t think of it as a relational database

• No strict rules on how to use it, just push it to the limits

Each row is a snapshot of data meant to satisfy a given query, sort of like a materialized view.


Storage in a cluster


Cluster structures

• Master-slave

• Master-master

• Sharding

• HDFS / GlusterFS

• HyperTable

• Dynamo

No master or single point of failure

Every node is (nearly) identical

Distribution and replication

Token ring: 0 to 2^127
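A rough sketch of how a token ring can place data: each row key is hashed to a token, the first node at or after that token owns the row, and the following nodes on the ring hold the replicas. The node names, the shrunken 0..99 token space and both function names are assumptions made for readability; real tokens range from 0 to 2^127 and real placement depends on the configured partitioner and replication strategy.

```php
<?php
// Hypothetical 4-node ring, keyed by token. The token space is shrunk
// to 0..99 here; real Cassandra tokens range from 0 to 2^127.
$ring = array(0 => 'node-a', 25 => 'node-b', 50 => 'node-c', 75 => 'node-d');

/** Map a row key to a token in 0..99 (md5-based, illustration only). */
function tokenFor($rowKey)
{
    return hexdec(substr(md5($rowKey), 0, 6)) % 100;
}

/**
 * Return the nodes holding a token: the owner plus the next
 * ($replicationFactor - 1) nodes clockwise on the ring.
 */
function replicasFor($token, array $ring, $replicationFactor)
{
    ksort($ring);
    $tokens = array_keys($ring);
    $start = 0; // wrap around to the first node by default
    foreach ($tokens as $i => $nodeToken) {
        if ($nodeToken >= $token) {
            $start = $i;
            break;
        }
    }
    $replicas = array();
    $count = min($replicationFactor, count($tokens));
    for ($i = 0; $i < $count; $i++) {
        $replicas[] = $ring[$tokens[($start + $i) % count($tokens)]];
    }
    return $replicas;
}

$replicas = replicasFor(tokenFor('mau'), $ring, 3);
```

Adding a node then only means inserting one more token into `$ring`; keys between the previous token and the new one move to the new node.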


Client can connect to any node


Seed nodes

• Required for bootstrapping nodes

• Define 2 or 3 seed nodes per cluster


Extending the ring

• Assign a token for new node

• Configure seed node host

• Start Cassandra on new node


Consistency


Writing data

• Hinted handoff

• Write to commit log

• Write in memory

• Write to disk (together with timestamp)


Write consistency

• Choose from ANY, ONE, TWO, THREE, QUORUM, ALL

• QUORUM = floor((replication factor / 2) + 1)
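The QUORUM formula is easy to check in code. The sketch below also adds one well-known rule of thumb that is not on the slide: a read is guaranteed to see the latest write when the read and write replica counts together exceed the replication factor. Both function names are made up for this example.

```php
<?php
/** Number of replicas that must acknowledge at consistency level QUORUM. */
function quorum($replicationFactor)
{
    return intdiv($replicationFactor, 2) + 1;
}

/**
 * True when the replicas read and the replicas written must overlap,
 * i.e. when R + W > replication factor.
 */
function overlaps($readReplicas, $writeReplicas, $replicationFactor)
{
    return $readReplicas + $writeReplicas > $replicationFactor;
}

$rf = 3;
$q = quorum($rf); // 2 of the 3 replicas
$consistent = overlaps($q, $q, $rf); // QUORUM reads + QUORUM writes overlap
```

With a replication factor of 3, writing and reading at ONE does not overlap, which is where eventual consistency comes in.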


Read consistency

• Choose from ONE, TWO, THREE, QUORUM, ALL

• Most recent copy is returned
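"Most recent copy is returned" relies on the timestamp stored with every write. A minimal sketch of that last-write-wins choice, assuming each replica response is an array with hypothetical `value` and `timestamp` keys:

```php
<?php
/**
 * Pick the newest copy among replica responses (last write wins).
 * Each response is array('value' => mixed, 'timestamp' => int).
 * Returns null when there are no responses.
 */
function mostRecent(array $responses)
{
    $newest = null;
    foreach ($responses as $response) {
        if ($newest === null || $response['timestamp'] > $newest['timestamp']) {
            $newest = $response;
        }
    }
    return $newest;
}

$responses = array(
    array('value' => 'Maurits', 'timestamp' => 1384250000),
    array('value' => 'Mau',     'timestamp' => 1384250500), // written later
);
$winner = mostRecent($responses);
```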

Read repair

• Compares data with 2 other replicas in the background

• Fixes inconsistent and missing data

• At 10% of all reads

Node repair

• Gradually compares all data in nodes with replicas

• Required in conjunction with read repair to fix ‘forgotten deletes’

ACID properties

• Atomic: completed successfully or entirely rolled back

• Consistent: transactions never invalidate the database state

• Isolated: transactions are processed as if sequentially

• Durable: completed actions are persistent

CAP theorem

Impossible to achieve all three:

• Consistency

• Availability

• Partition tolerance

Eventual consistency

Not guaranteed to be consistent, but becomes consistent later

• Best effort

• Consistency is not always more important than speed and scalability (doesn’t require locking)

• Configurable consistency level, but no transaction support

Surrogate keys

Say bye to sequences

• Not consistent across the cluster

• Counters are for counting

• Native support for UUIDs: f47ac10b-58cc-4372-a567-0e02b2c3d479
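Since sequences are out, keys are usually generated client-side. A small sketch of producing a random (version 4) UUID in plain PHP, assuming PHP 7 or later for `random_bytes()`; the function name `uuid4()` is just a label for this example.

```php
<?php
/** Generate a random (version 4) UUID string. */
function uuid4()
{
    $bytes = random_bytes(16);
    // Set the version (4) and the RFC 4122 variant bits.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    // 32 hex characters, grouped as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

$postId = uuid4();
```

No coordination between nodes or web servers is needed, which is the property a cluster-wide sequence cannot offer.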

Cassandra 1.2

• No longer schemaless

• Introduced CQL3

• No wide tables anymore


Collections

• Lists

• Maps

• Sets

Lists

• user['mau']['posts'] = 'uuid';

• CREATE TABLE user ( username text PRIMARY KEY, posts list<uuid> );

• Append: UPDATE user SET posts = posts + [f47ac10b-58cc-4372-a567-0e02b2c3d479] WHERE username = 'mau';

• Prepend: UPDATE user SET posts = [f47ac10b-58cc-4372-a567-0e02b2c3d479] + posts WHERE username = 'mau';

Sets

• CREATE TABLE user ( username text PRIMARY KEY, emails set<text> );

• UPDATE user SET emails = emails + {'[email protected]'} WHERE username = 'mau';

Maps

• CREATE TABLE user ( username text PRIMARY KEY, attending map<timestamp,text> );

• UPDATE user SET attending['2013-11-12'] = 'PHPMeetup' WHERE username = 'mau';

• DELETE attending['2013-12-05'] FROM user WHERE username = 'mau';

Limits on collections

• 64K

• Whole collection loaded in memory when reading / writing

• Not an alternative to wide tables!

No size check in CQL: SET list = list + ['...']

Wide tables in CQL3

• CREATE TABLE tweets ( tweet_id uuid PRIMARY KEY, author varchar, body varchar );

• CREATE TABLE timeline ( user_id varchar, tweet_id uuid, author varchar, body varchar, PRIMARY KEY (user_id, tweet_id) );

Row key (user_id)   Column        Value
mau                 uuid:author   anne
mau                 uuid:body     Tweet from Anne
mike                uuid:author   david
mike                uuid:body     Tweet from David

For schemaless lovers:

CREATE TABLE name ( rowkey varchar, columnname varchar, value blob, PRIMARY KEY (rowkey, columnname) );


Secondary index

• CREATE INDEX name ON table (column);

• High memory usage when used with high cardinality

Iteration

• SELECT * FROM users

• SELECT * FROM users LIMIT 10 OFFSET 100 (unpredictable performance)

• SELECT token(username), username, country, age FROM user WHERE token(username) > 23947239 LIMIT 10
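The token-based query above can drive a paging loop: remember the last token seen and ask for the next page after it. The sketch below keeps the database out of the picture; `$fetchPage` is a hypothetical callable that, in a real application, would run the `token(username) > ... LIMIT ...` query through whatever driver is in use, and the in-memory `$table` only stands in for the cluster so the example runs on its own.

```php
<?php
/**
 * Collect all usernames page by page, resuming after the last token seen.
 * $fetchPage(token|null, limit) must return rows shaped like
 * array('token' => int, 'username' => string), ordered by token.
 */
function iterateByToken(callable $fetchPage, $pageSize)
{
    $usernames = array();
    $lastToken = null;
    do {
        $page = $fetchPage($lastToken, $pageSize);
        foreach ($page as $row) {
            $usernames[] = $row['username'];
            $lastToken = $row['token'];
        }
        // A short page means there is nothing left to fetch.
    } while (count($page) === $pageSize);
    return $usernames;
}

// In-memory stand-in for the cluster (assumption for this sketch).
$table = array(
    array('token' => -50, 'username' => 'anne'),
    array('token' => 10,  'username' => 'david'),
    array('token' => 42,  'username' => 'mau'),
);
$fakeFetch = function ($afterToken, $limit) use ($table) {
    $matches = array_filter($table, function ($row) use ($afterToken) {
        return $afterToken === null || $row['token'] > $afterToken;
    });
    return array_slice(array_values($matches), 0, $limit);
};

$usernames = iterateByToken($fakeFetch, 2);
```

Every page costs the same regardless of how far into the table it is, which is what an OFFSET-style query cannot promise.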

Queries are always controlled by one node

Even if data from 100 nodes is involved

MapReduce

Or just ‘MapRed’


• array_map

• array_reduce


map()

• Processes a subset of the data

• array_map(function($v) { return strtoupper($v); }, array('a', 'b'))


reduce()

• Merge results from the mapping function

• array_reduce(array(1, 2, 3), function($a, $b) { return $a + $b; });

[Figure: MapReduce. Many map() calls run over subsets of the data and their outputs are combined into a single result.]

Wordcount

$data = array('red green blue', 'orange blue', 'purple green');

// map: count the words within each line
$data = array_map(function($v) {
  $words = array();
  foreach (explode(' ', $v) as $word)
    $words[$word] = isset($words[$word]) ? $words[$word] + 1 : 1;
  return $words;
}, $data);
// reduce: merge the per-line counts into one total
$data = array_reduce($data, function($a, $b) {
  foreach ($a as $word => $count)
    $b[$word] = isset($b[$word]) ? $b[$word] + $count : $count;
  return $b;
}, array());

Result (key order may differ): array('red' => 1, 'green' => 2, 'blue' => 2, 'orange' => 1, 'purple' => 1)

ORDER BY value LIMIT 5

$data = array(array(4,5,2), array(62,35,1), array(74,56,2,34));

// map: sort each subset and keep its 5 smallest values
$data = array_map(function($v) {
  sort($v);
  return array_slice($v, 0, 5);
}, $data);
// reduce: merge two partial results and keep the 5 smallest again
$data = array_reduce($data, function($a, $b) {
  $v = array_merge($a, $b);
  sort($v);
  return array_slice($v, 0, 5);
}, array());

Result: array(1, 2, 2, 4, 5)


Remember

• Getting information is a bumpy road in big data

• Use MapRed to transform data into information


MapReduce

• No native support in Cassandra

• MapReduce possible with Hadoop (requires Java programming)

Pig

input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
filtered_words = FILTER words BY word MATCHES '\\w+';
word_groups = GROUP filtered_words BY word;
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
ordered_word_count = ORDER word_count BY count DESC;
STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';


Hive

SELECT v['ip'], COUNT(1) AS cnt FROM www_access GROUP BY v['ip'] ORDER BY cnt DESC LIMIT 30


Pig and Hive

• Using MapReduce

• No(t very) predictable performance

• Good for analysis


Hack your own

• Not too difficult

• Data can be split into subsets by filtering on tokens

• Application must run on all MapRed nodes

• Probably better performance than Pig / Hive
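Splitting the work by token could look roughly like this: divide the token range into contiguous sub-ranges and hand one to each worker, which then selects only rows whose token falls inside its range before mapping and reducing. The function name and the small 0..99 range are assumptions for illustration; the real token range is far larger than a PHP integer, so a real implementation would need arbitrary-precision arithmetic.

```php
<?php
/**
 * Split the token range [$min, $max] into $parts contiguous sub-ranges.
 * Returns a list of array($from, $to) pairs, both ends inclusive.
 */
function splitTokenRange($min, $max, $parts)
{
    $ranges = array();
    $size = intdiv($max - $min + 1, $parts);
    $from = $min;
    for ($i = 0; $i < $parts; $i++) {
        // The last range absorbs any remainder.
        $to = ($i === $parts - 1) ? $max : $from + $size - 1;
        $ranges[] = array($from, $to);
        $from = $to + 1;
    }
    return $ranges;
}

// One sub-range per worker; each worker maps its own subset.
$ranges = splitTokenRange(0, 99, 4);
```

The partial results from the workers are then merged with a reduce step, exactly as in the array_map / array_reduce examples earlier in the talk.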


Interfaces / protocols

• Thrift

• Binary protocol (1.2+)

• Gossip (internode communication)


Thrift

• Something like SOAP in a binary format

• Tool which generates libraries based on definition files

• Supports many languages (incl. PHP, JS, NodeJS, C, Java, Python, Ruby, ...)

• Also used by HyperTable, HBase, Accumulo and ElasticSearch

• Sole interface before 1.2


• No support for collections


Binary protocol

• Recommended protocol for Cassandra 1.2

• Few client libraries available

• No binary connectors were available for PHP: https://github.com/mauritsl/php-cassandra

php-cassandra

require('lib/cassandra/Cassandra.php');
use Cassandra\Connection as Cassandra;

// Connect to a node and select the keyspace
$connection = new Cassandra('localhost', 'keyspace');

// Run a CQL query and iterate over the resulting rows
$rows = $connection->query('SELECT * FROM user');
foreach ($rows as $row) {
  print $row->firstname;
  print $row->listfield[0];
}

$rows->count();
$rows->getColumns();


Scaling applications

Rule 1: Don’t ask for NoSQL drivers for a CMS

Cassandra does not fit all

(same story for every NoSQL solution)

Every page (or API call) should require only a few queries, ideally just one


Static versus Dynamic data

• Static: information that doesn’t change very often

  • E.g. translations

  • May go in an RDBMS or local storage (files?)

• Dynamic: many changes

  • Changes must be visible on all nodes

  • Use Cassandra


Local versus Global data

• Logging

  • Separate logs per node

• Cache

  • Sometimes no need to share cache between nodes

• Statistics

  • Can be kept local for a limited time

• Sessions

  • Dependent on session stickiness


Caching

• Memcache is recommended for local cache

• Cassandra can be used for global cache

• Has a TTL feature: INSERT INTO ... (...) VALUES (...) USING TTL 86400

What about files?

• Use Hadoop Distributed File System (HDFS) or GlusterFS

• Or use Cassandra

• Split files in chunks to avoid hotspots and save the heap

• Not uncommon to have files in Cassandra

• github.com/Netflix/astyanax

• GBs are OK, but do not store TBs
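Chunking a file before storing it can be sketched in a few lines. The `fileId:index` key scheme and the `chunkFile()` name are hypothetical; the point is only that each chunk gets its own key, so chunks spread over the ring instead of piling up in one row, and no single value has to fit in the heap at once.

```php
<?php
/**
 * Split file contents into fixed-size chunks, each under its own key.
 * Returns array('<fileId>:<index>' => chunk bytes), in order.
 */
function chunkFile($fileId, $contents, $chunkSize)
{
    $chunks = array();
    foreach (str_split($contents, $chunkSize) as $i => $chunk) {
        $chunks[$fileId . ':' . $i] = $chunk;
    }
    return $chunks;
}

// Tiny example payload; a real chunk size would be far larger.
$chunks = chunkFile('report.pdf', 'abcdefghij', 4);

// Reassemble by concatenating the chunks in order.
$restored = implode('', $chunks);
```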


Maximum size of cluster?

• No satisfactory answer

• Probably more dependent on network equipment

• Rack awareness helps here

• Facebook: 150 node cluster, 50TB data (2010)

• Easou: 400 node cluster, 300TB data (300 million images)


Minimum size of a cluster?

• Can run on a single node

• 4GB RAM recommended

• Runs fine on 1GB RAM

• “Hot data” should fit in RAM

Installing Cassandra

• Install JDK (Oracle Java recommended but OpenJDK works OK)

• Add Cassandra repository

• apt-get install cassandra

• Set listen and seed address (IP address of node and seed)

• (Re)start Cassandra


Last words...

Data versus information

Data structure is naturally responsive for information

Predictable performance

History and usage

Jeff Hammerbacher

How to use it

Schema design, CQL3 and limits

Developments

CQL3 and binary protocol


Thank you!

Questions?