HRX Meetup Group 8/20/2014: Cassandra and How to Scale Your Database
DESCRIPTION
HR5 alum Stephen Portanova will be presenting on the highly scalable database Cassandra, which is used by Reddit, Netflix, CERN, and The Weather Channel. 'nuff said.
TRANSCRIPT
Cassandra
Pretty Cool
History
Google Bigtable
Amazon Dynamo
Today
Why Should You Care
● Horizontal Scaling (basically auto sharding)
● Multiple Nodes - Highly Available
● Really Fast Writes
● Not too shabby at reads either - SLICES!!
● Bright Future
The Cluster
● replication factor (rf)
● read consistency (r)
● write consistency (w)
● clustering - shard on partition key
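The three knobs above interact through simple arithmetic: a read is guaranteed to see the latest acknowledged write when the read and write replica sets must overlap, that is when r + w > rf. The sketch below is only an illustration of that rule; the function names `quorum` and `overlaps` are made up for this example and are not part of any Cassandra API.

```python
def quorum(rf: int) -> int:
    """Smallest number of replicas that forms a majority of rf replicas."""
    return rf // 2 + 1


def overlaps(rf: int, r: int, w: int) -> bool:
    """True when every read set shares at least one replica with every write set."""
    return r + w > rf


rf = 3
print(quorum(rf))                                # 2
print(overlaps(rf, r=quorum(rf), w=quorum(rf)))  # True: quorum reads + quorum writes
print(overlaps(rf, r=1, w=1))                    # False: fast, but reads may be stale
```

With rf = 3, reading and writing at quorum (2 replicas each) keeps reads consistent while still tolerating one node being down.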
The One Ring
Storage - Vnodes
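A rough way to picture the ring with vnodes: every node owns several tokens on a circle, a partition key is hashed to a token, and the replicas are the next distinct nodes found walking clockwise. The sketch below assumes an MD5-based stand-in hash and a hypothetical `Ring` class; Cassandra's real partitioner and replication strategies are more involved.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash for illustration only; not Cassandra's actual partitioner.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical model of a token ring where each node owns several vnodes."""

    def __init__(self, nodes, vnodes=8):
        # Each node gets `vnodes` tokens scattered around the ring.
        self.tokens = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )

    def replicas(self, partition_key, rf=3):
        # Find the first vnode at or after the key's token, then walk clockwise
        # collecting distinct nodes until rf replicas are found.
        start = bisect.bisect_left(self.tokens, (token(partition_key),))
        owners = []
        for i in range(len(self.tokens)):
            node = self.tokens[(start + i) % len(self.tokens)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == rf:
                break
        return owners


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("111", rf=3))  # three distinct node names
```

Because placement depends only on the hash of the partition key, any node can work out where a row lives without a central lookup.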
Data Model
● Wide rows
● Slice Queries
● Denormalization
● Index tables
Data Model - Simple Key
CREATE TABLE email_app.emails (
    user_id text,
    subject text,
    to_add text,
    cc text,
    body text,
    PRIMARY KEY (user_id)
);
ROW KEY: user_id
Data Model - Simple Inserts
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'at my place');
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('999', 'wat', '[email protected]', '[email protected]', 'is going on?');
Data Model - Simple Inserts Result
SELECT * FROM email_app.emails;
Row key 111:
    subject   to_add   cc         body
    party     cat@     hippo@     at my place
Row key 999:
    subject   to_add   cc         body
    wat       horse@   giraffe@   is going on?
Mental Model - Nested Hash
Row Keys:        111, 999
Columns:         subject, to, cc, body (one set per row key)
Column Values:   one value per column
Data Model - Simple Insert - Again
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'at my place');
Row key 111:
    subject   to_add   cc         body
    party     cat@     hippo@     at my place
Row key 999:
    subject   to_add   cc         body
    wat       horse@   giraffe@   is going on?
IDEMPOTENT: repeating the same insert leaves the data unchanged.
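The nested-hash picture and the idempotent insert can be mimicked with a plain dictionary of dictionaries. This is only a mental-model sketch: the `insert` helper is hypothetical, and the shortened addresses are the placeholder values shown on the result slides.

```python
# Outer dict: row key -> inner dict of column name -> column value.
table = {}


def insert(table, user_id, **columns):
    """Upsert: create the row if missing, then overwrite the given columns."""
    table.setdefault(user_id, {}).update(columns)


insert(table, "111", subject="party", to_add="cat@", cc="hippo@", body="at my place")
insert(table, "999", subject="wat", to_add="horse@", cc="giraffe@", body="is going on?")

snapshot = {key: dict(cols) for key, cols in table.items()}

# Running the same insert again changes nothing.
insert(table, "111", subject="party", to_add="cat@", cc="hippo@", body="at my place")
print(table == snapshot)  # True
```

Since an insert is really an overwrite keyed by the row key, retrying a write after a timeout is safe in this model.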
Data Model - Composite Key 1
CREATE TABLE email_app.emails (
    user_id text,
    subject text,
    to_add text,
    cc text,
    body text,
    PRIMARY KEY (user_id, subject)
);
ROW KEY: user_id
CLUSTERING KEY: subject
Data Model - Composite Insert 1
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'at my place');
Same as Before. Right???
Data Model - Composite Insert Result
SELECT * FROM emails WHERE user_id = '111';
Row key 111:
    party|to_add   party|cc   party|body
    cat@           hippo@     at my place
Each column name is prefixed with the subject ("party").
Mental Model - Nested Hash
Row Key (user_id):             111
Clustering Column (subject):   party
Columns:                       to_add, cc, body
Column Values:                 one value per column
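One way to see why the result is not "the same as before": with a clustering column, the stored column names carry the clustering value as a prefix. The sketch below flattens a row the way the slides draw it; `to_cells` is a hypothetical helper, and the layout is a simplified picture rather than the exact on-disk format.

```python
def to_cells(row, partition_col, clustering_col):
    """Return (row_key, cells) with cell names prefixed by the clustering value."""
    row_key = row[partition_col]
    prefix = row[clustering_col]
    cells = {
        f"{prefix}|{name}": value
        for name, value in row.items()
        if name not in (partition_col, clustering_col)
    }
    return row_key, cells


row = {
    "user_id": "111",
    "subject": "party",
    "to_add": "cat@",
    "cc": "hippo@",
    "body": "at my place",
}
print(to_cells(row, "user_id", "subject"))
# ('111', {'party|to_add': 'cat@', 'party|cc': 'hippo@', 'party|body': 'at my place'})
```

A second email for the same user simply adds more prefixed cells to the same wide row.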
Data Model - Composite Insert 2
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'swim', '[email protected]', '[email protected]', 'in the pool');
Composite Insert 2 Result
SELECT * FROM emails WHERE user_id = '111';
Row key 111:
    party|to_add   party|cc   party|body    swim|to_add   swim|cc   swim|body
    cat@           hippo@     at my place   cat@          hippo@b   in the pool
Sorted by clustering column - "subject"
Mental Model - Nested Sorted Hash
Row Key (user_id):                     111
Clustering Column (subject), sorted:   party, swim
Columns under each subject:            to, cc, body
Column Values:                         one value per column
Why sorted?
SLICE QUERIES!!
SELECT * FROM emails WHERE user_id = '111' AND (subject) >= ('s') AND (subject) < ('t');
Row key 111:
    party|to_add   party|cc   party|body    swim|to_add   swim|cc   swim|body
    cat@           giraffe@   at my place   cat@          hippo@b   in the pool
The slice 's' <= subject < 't' covers only the swim|* columns.
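Keeping the clustering values sorted is what makes a slice cheap: the start and end of the range are found by binary search and everything in between is contiguous. The sketch below models one partition that way; the `Partition` class is an illustration and not Cassandra code.

```python
import bisect


class Partition:
    """Hypothetical model of one wide row, sorted by clustering value."""

    def __init__(self):
        self.keys = []   # clustering values, kept sorted
        self.rows = {}   # clustering value -> column dict

    def upsert(self, clustering, columns):
        if clustering not in self.rows:
            bisect.insort(self.keys, clustering)
            self.rows[clustering] = {}
        self.rows[clustering].update(columns)

    def slice(self, start, end):
        """Return rows with start <= clustering value < end, in sorted order."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        return [(key, self.rows[key]) for key in self.keys[lo:hi]]


emails = Partition()  # the partition for user_id '111'
emails.upsert("swim", {"to_add": "cat@", "body": "in the pool"})
emails.upsert("party", {"to_add": "cat@", "body": "at my place"})

print(emails.keys)             # ['party', 'swim']
print(emails.slice("s", "t"))  # [('swim', {'to_add': 'cat@', 'body': 'in the pool'})]
```

The insertion order does not matter; the partition is always read back in clustering order.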
DM - Compound Composite Key
CREATE TABLE email_app.emails (
    user_id text,
    subject text,
    to_add text,
    cc text,
    body text,
    PRIMARY KEY ((user_id, subject), to_add)
);
ROW KEY: (user_id, subject)
CLUSTERING KEY: to_add
Composite / Compound Inserts
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'wat', '[email protected]', '[email protected]', 'is going on?');
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'at my place');
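Composite / Compound Inserts Result
SELECT * FROM emails WHERE user_id = '111' AND subject = 'party';
Row key 111:party:
    cat@|cc   cat@|body
    hippo@    at my place
Each column name is prefixed with the to_add value ("cat@").
SELECT * FROM emails WHERE user_id = '111';
This query cannot be run as written: both parts of the row key, user_id and subject, must be given.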
Composite / Compound Insert 2
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'all the time');
SELECT * FROM emails WHERE user_id = '111' AND subject = 'party';
Row key 111:party:
    cat@|cc    cat@|body     dog@|cc   dog@|body
    giraffe@   at my place   hippo@b   all the time
Sorting / slice on - "to_add"
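With a compound row key, the whole (user_id, subject) pair is what gets hashed and looked up, so a partition can only be found when both parts are known. The dictionary sketch below mirrors that; the `insert` and `select` helpers are hypothetical and the addresses are the shortened placeholders from the result slide.

```python
# Partitions are keyed by the full (user_id, subject) pair.
table = {}


def insert(user_id, subject, to_add, cc, body):
    partition = table.setdefault((user_id, subject), {})
    partition[to_add] = {"cc": cc, "body": body}  # to_add is the clustering key


def select(user_id, subject):
    """Return the partition's rows ordered by the clustering key to_add."""
    partition = table.get((user_id, subject), {})
    return [(to_add, partition[to_add]) for to_add in sorted(partition)]


insert("111", "party", "dog@", "hippo@b", "all the time")
insert("111", "party", "cat@", "giraffe@", "at my place")
insert("111", "wat", "horse@", "giraffe@", "is going on?")

print([to_add for to_add, _ in select("111", "party")])  # ['cat@', 'dog@']
print("111" in table)  # False: user_id alone does not identify a partition
```

Finding every subject for user '111' in this model would mean scanning all partitions, which is why the table should be keyed to match the queries it has to serve.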
DM - Compound Composite Key 2
CREATE TABLE email_app.emails (
    user_id text,
    subject text,
    to_add text,
    cc text,
    body text,
    PRIMARY KEY ((user_id, subject), to_add, cc)
);
ROW KEY: (user_id, subject)
CLUSTERING KEYS: to_add, cc
Composite / Clustered Inserts
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'all the time');
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'At my place');
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'At my place');
DM - Composite / Clustered Inserts Result
SELECT * FROM emails WHERE user_id = '111' AND subject = 'party';
Row key 111|party:
    cat@|hippo@|body   cat@|mouse@|body   dog@|hippo@|body
    at my place        at my place        all the time
Slice on (to_add) OR (to_add, cc)
Mental Model - Nested Sorted Hash
Row Key (user_id + subject):   111|party
Clustering Columns:
    to_add:   cat, dog
    cc:       hippo, mouse (under cat); hippo (under dog)
Column Values:                 body (one per to_add/cc pair)
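With two clustering columns, the sort key inside the partition behaves like a tuple (to_add, cc), so a slice can fix just the first element or both. A minimal sketch, assuming the shortened placeholder values from the slides and a hypothetical `slice_prefix` helper:

```python
import bisect

# Cells of partition 111|party, keyed by the (to_add, cc) tuple and kept sorted.
cells = sorted([
    (("dog@", "hippo@"), "all the time"),
    (("cat@", "mouse@"), "at my place"),
    (("cat@", "hippo@"), "at my place"),
])
keys = [key for key, _ in cells]


def slice_prefix(prefix):
    """Return all cells whose clustering tuple starts with `prefix`."""
    lo = bisect.bisect_left(keys, prefix)  # a shorter tuple sorts before its extensions
    hi = lo
    while hi < len(keys) and keys[hi][:len(prefix)] == prefix:
        hi += 1
    return cells[lo:hi]


print(slice_prefix(("cat@",)))           # both cat@ cells, ordered by cc
print(slice_prefix(("cat@", "mouse@")))  # [(('cat@', 'mouse@'), 'at my place')]
```

A slice on cc alone is not possible in this model because cells with the same cc are not contiguous; the leading clustering column has to be fixed first.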
Part 2 / 8 of this 7 hour talk
● Denormalization
● Index Column Families
● Cassandra Internals (memtables, SSTables, compaction, repair)
Part 8 / 8: The Future
● Continually improving
● More and more adoption
● Awesome projects
● http://www.datastax.com/documentation/cassandra/2.0/pdf/cassandra20.pdf
● http://planetcassandra.org/