HRX Meetup Group 8/20/2014: Cassandra and How to Scale Your Database

Post on 15-Jan-2015

Category:

Technology

DESCRIPTION

HR5 alum Stephen Portanova will be presenting on the highly scalable database Cassandra, which is used by Reddit, Netflix, CERN, and The Weather Channel. 'nuff said.

TRANSCRIPT

Cassandra: Pretty Cool

History

● Google Bigtable

● Amazon Dynamo

● Today

Why Should You Care

● Horizontal Scaling (basically auto sharding)

● Multiple Nodes - Highly Available

● Really Fast Writes

● Not too shabby at reads either - SLICES!!

● Bright Future

The Cluster

● replication factor (rf)

● read consistency (r)

● write consistency (w)

● clustering - shard on partition key
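The three settings above interact: a read is guaranteed to touch at least one replica that acknowledged the latest write whenever r + w > rf. The short Python sketch below only illustrates that arithmetic; the function names are made up for this example and are not part of any Cassandra driver.

```python
def quorum(rf: int) -> int:
    """Smallest majority of rf replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def overlaps(rf: int, r: int, w: int) -> bool:
    """True when every set of r replicas must share a member with every set of w replicas."""
    return r + w > rf


rf = 3
q = quorum(rf)
print(q)                    # 2
print(overlaps(rf, q, q))   # True: quorum reads + quorum writes overlap
print(overlaps(rf, 1, 1))   # False: a read may miss the latest write
```

With rf = 3, quorum reads and writes tolerate one replica being down while still overlapping, which is why that combination is a common starting point.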

The One Ring

Storage - Vnodes
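As a rough picture of the ring and vnodes, the sketch below hashes each partition key to a token, gives every node several tokens (vnodes) on the ring, and walks clockwise to pick replicas. It is a simplification under stated assumptions: MD5 from the standard library stands in for Cassandra's default Murmur3 partitioner, and the `Ring` class, node names, and vnode count are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a ring position (MD5 is an assumption made for this sketch)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes_per_node=8):
        # Each physical node owns several tokens (vnodes) scattered around the ring.
        self._entries = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, partition_key, rf=3):
        """Walk clockwise from the key's token, collecting up to rf distinct nodes."""
        start = bisect.bisect_left(self._tokens, token(partition_key))
        found = []
        for offset in range(len(self._entries)):
            _, node = self._entries[(start + offset) % len(self._entries)]
            if node not in found:
                found.append(node)
            if len(found) == rf:
                break
        return found


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("111", rf=3))  # three distinct nodes; which ones depends on the hash
```

Because every node holds many small token ranges, adding or removing a node moves a little data from many peers instead of a large range from one neighbor.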

Data Model

● Wide rows

● Slice Queries

● Denormalization

● Index tables

Data Model - Simple Key

CREATE TABLE email_app.emails (
  user_id text,
  subject text,
  to_add text,
  cc text,
  body text,
  PRIMARY KEY (user_id)
);

Row key: user_id

Data Model - Simple Inserts

INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place');

INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('999', 'wat', 'horse@b.com', 'giraffe@b.com', 'is going on?');

Data Model - Simple Inserts Result

SELECT * FROM email_app.emails;

111    subject    to_add    cc          body
       party      cat@      hippo@      at my place

999    subject    to_add    cc          body
       wat        horse@    giraffe@    is going on?

Mental Model - Nested Hash

Row keys (user_id): 111, 999

Column values under each row key: subject, to_add, cc, body

Data Model - Simple Insert - Again

INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place');

111    subject    to_add    cc          body
       party      cat@      hippo@      at my place

999    subject    to_add    cc          body
       wat        horse@    giraffe@    is going on?

IDEMPOTENT: repeating the same insert leaves both rows unchanged.
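The nested-hash picture and the idempotent insert can be tried out with a plain dictionary of dictionaries. This is only a mental-model sketch; the `upsert` helper is a name invented here and says nothing about how Cassandra stores data internally.

```python
def upsert(table, user_id, **columns):
    """Write columns under a row key, overwriting any existing values."""
    table.setdefault(user_id, {}).update(columns)


emails = {}  # row key -> {column name -> value}, mirroring PRIMARY KEY (user_id)
party = dict(subject="party", to_add="cat@b.com", cc="hippo@b.com", body="at my place")
upsert(emails, "111", **party)
upsert(emails, "999", subject="wat", to_add="horse@b.com",
       cc="giraffe@b.com", body="is going on?")

before = {key: dict(cols) for key, cols in emails.items()}
upsert(emails, "111", **party)    # the same insert, a second time
print(emails == before)           # True: nothing changed
print(emails["111"]["subject"])   # party
```

An insert with an existing row key simply overwrites the named columns, so replaying a write is harmless.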

Data Model - Composite Key 1

CREATE TABLE email_app.emails (
  user_id text,
  subject text,
  to_add text,
  cc text,
  body text,
  PRIMARY KEY (user_id, subject)
);

Row key: user_id
Clustering key: subject

Data Model - Composite Insert 1

INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place');

Same as Before. Right???

Data Model - Composite Insert Result

SELECT * FROM emails WHERE user_id = '111';

111    party|to_add    party|cc    party|body
       cat@            hippo@      at my place

Each column name is now prefixed with the subject ("party").

Mental Model - Nested Hash

Row key (user_id): 111

Clustering column (subject): party

Column values under the subject: to_add, cc, body

Data Model - Composite Insert 2

INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'swim', 'cat@b.com', 'hippo@b.com', 'in the pool');

Composite Insert 2 Result

SELECT * FROM emails WHERE user_id = '111';

111    party|to_add    party|cc    party|body     swim|to_add    swim|cc    swim|body
       cat@            hippo@      at my place    cat@           hippo@     in the pool

Sorted by clustering column - "subject"

Mental Model - Nested Sorted Hash

Row key (user_id): 111

Clustering column (subject), kept sorted: party, swim

Column values under each subject: to_add, cc, body
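The nested sorted hash can be sketched as a dictionary of partitions, each mapping a subject to its columns and read back in subject order. This is a toy model under stated assumptions: the `Table` class is hypothetical, and it sorts at read time for brevity, whereas Cassandra keeps the data sorted in storage.

```python
from collections import defaultdict


class Table:
    """Row key -> clustering value -> columns (a toy model, not Cassandra's storage engine)."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def insert(self, user_id, subject, **columns):
        # Same user_id and subject overwrites; a new subject adds a row to the partition.
        self._partitions[user_id].setdefault(subject, {}).update(columns)

    def select(self, user_id):
        """Return (subject, columns) pairs for one partition, ordered by subject."""
        partition = self._partitions.get(user_id, {})
        return [(subject, partition[subject]) for subject in sorted(partition)]


emails = Table()
emails.insert("111", "swim", to_add="cat@b.com", cc="hippo@b.com", body="in the pool")
emails.insert("111", "party", to_add="cat@b.com", cc="hippo@b.com", body="at my place")

print([subject for subject, _ in emails.select("111")])  # ['party', 'swim']
```

Unlike the simple-key table, the second insert for user 111 does not overwrite the first, because the subject is part of the primary key.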

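Why sorted?

SELECT * FROM emails WHERE user_id = '111' AND subject >= 's' AND subject < 't';

111    party|to_add    party|cc    party|body     swim|to_add    swim|cc    swim|body
       cat@            hippo@      at my place    cat@           hippo@     in the pool

SLICE QUERIES!! Only the swim columns fall in the range, and because the partition is sorted by subject they can be read as one contiguous slice.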
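A slice over sorted keys comes down to two binary searches. The sketch below mirrors the query above on a sorted Python list; `slice_partition` is an illustrative helper, not a driver call.

```python
import bisect

# One partition (user_id = '111'), with subjects kept in sorted order.
subjects = ["party", "swim"]
rows = {
    "party": {"to_add": "cat@b.com", "cc": "hippo@b.com", "body": "at my place"},
    "swim": {"to_add": "cat@b.com", "cc": "hippo@b.com", "body": "in the pool"},
}


def slice_partition(sorted_keys, lower, upper):
    """Return keys k with lower <= k < upper using two binary searches."""
    start = bisect.bisect_left(sorted_keys, lower)
    end = bisect.bisect_left(sorted_keys, upper)
    return sorted_keys[start:end]


hits = slice_partition(subjects, "s", "t")
print(hits)                    # ['swim']
print(rows[hits[0]]["body"])   # in the pool
```

The cost depends on the size of the slice rather than the size of the partition, which is what makes wide rows practical to read.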

DM - Compound Composite Key

CREATE TABLE email_app.emails (
  user_id text,
  subject text,
  to_add text,
  cc text,
  body text,
  PRIMARY KEY ((user_id, subject), to_add)
);

Row key: (user_id, subject)
Clustering key: to_add

Composite / Compound Inserts

INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'wat', 'horse@b.com', 'giraffe@b.com', 'is going on?');

INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place');

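Composite / Compound Insert Result

SELECT * FROM emails WHERE user_id = '111' AND subject = 'party';

111:party    cat@|cc    cat@|body
             hippo@     at my place

Each column name is now prefixed with to_add ("cat@").

SELECT * FROM emails WHERE user_id = '111';

This second query is rejected as written: the row key is now (user_id, subject), so both parts must be supplied.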

Data Model - Composite Insert 1

INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'dog@b.com', 'hippo@b.com', 'all the time');

SELECT * FROM emails WHERE user_id = '111' AND subject = 'party';

111:party    cat@|cc    cat@|body      dog@|cc    dog@|body
             hippo@     at my place    hippo@     all the time

Sorting / slice on - "to_add"
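With a compound row key, the pair (user_id, subject) locates the partition and to_add orders the rows inside it. A minimal Python sketch of that layout follows, with hypothetical `insert` and `select` helpers standing in for the CQL statements above.

```python
# Row key is the pair (user_id, subject); rows inside a partition are keyed by to_add.
partitions = {}


def insert(user_id, subject, to_add, cc, body):
    partition = partitions.setdefault((user_id, subject), {})
    partition[to_add] = {"cc": cc, "body": body}


def select(user_id, subject):
    """Both key parts are needed to find the partition; rows come back sorted by to_add."""
    partition = partitions.get((user_id, subject), {})
    return [(to_add, partition[to_add]) for to_add in sorted(partition)]


insert("111", "wat", "horse@b.com", "giraffe@b.com", "is going on?")
insert("111", "party", "dog@b.com", "hippo@b.com", "all the time")
insert("111", "party", "cat@b.com", "hippo@b.com", "at my place")

print([to_add for to_add, _ in select("111", "party")])  # ['cat@b.com', 'dog@b.com']
print(len(partitions))                                   # 2
```

Looking up user 111 alone would mean scanning every partition in this model, which is the intuition for why the row key has to be given in full.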

DM - Compound Composite Key 2

CREATE TABLE email_app.emails (
  user_id text,
  subject text,
  to_add text,
  cc text,
  body text,
  PRIMARY KEY ((user_id, subject), to_add, cc)
);

Row key: (user_id, subject)
Clustering keys: to_add, cc

Composite / Clustered Inserts

INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'dog@b.com', 'hippo@b.com', 'all the time');

INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'At my place');

INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'cat@b.com', 'mouse@b.com', 'At my place');

DM - Composite / Clustered Inserts

SELECT * FROM emails WHERE user_id = '111' AND subject = 'party';

111|party    cat@|hippo@|body    cat@|mouse@|body    dog@|hippo@|body
             At my place         At my place         all the time

Slice on (to_add) OR (to_add, cc)

Mental Model - Nested Sorted Hash

Row key (user_id + subject): 111|party

Clustering columns, kept sorted:

to_add: cat, dog

cc: hippo, mouse (under cat); hippo (under dog)

Column values: body (one under each cc)
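With two clustering columns the sort key is the tuple (to_add, cc), so a slice can fix just to_add or both columns. The sketch below shows that prefix slicing on a sorted Python list; `slice_prefix` is an illustrative helper and the data is the three inserts above.

```python
import bisect

# Rows of partition ('111', 'party'), sorted by the clustering tuple (to_add, cc).
rows = sorted([
    (("dog@b.com", "hippo@b.com"), "all the time"),
    (("cat@b.com", "hippo@b.com"), "At my place"),
    (("cat@b.com", "mouse@b.com"), "At my place"),
])
keys = [key for key, _ in rows]


def slice_prefix(prefix):
    """Return rows whose clustering tuple starts with prefix: (to_add,) or (to_add, cc)."""
    start = bisect.bisect_left(keys, prefix)
    end = start
    while end < len(keys) and keys[end][:len(prefix)] == prefix:
        end += 1
    return rows[start:end]


print([key for key, _ in slice_prefix(("cat@b.com",))])
# [('cat@b.com', 'hippo@b.com'), ('cat@b.com', 'mouse@b.com')]
print([key for key, _ in slice_prefix(("cat@b.com", "mouse@b.com"))])
# [('cat@b.com', 'mouse@b.com')]
```

A prefix that skips to_add and fixes only cc would not be contiguous in this ordering, which is why slices follow the declared order of the clustering columns.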

Part 2 / 8 of this 7 hour talk

● Denormalization

● Index Column Families

● Cassandra Internals (memtables, SSTables, compaction, repair)

Part 8 / 8: The Future

● Continually improving

● More and more adoption

● Awesome projects

● http://www.datastax.com/documentation/cassandra/2.0/pdf/cassandra20.pdf

● http://planetcassandra.org/
