Big Data Grows Up: A (Re)introduction to Cassandra


DESCRIPTION

For the last several years Cassandra has been the heavyweight in the NoSQL space. But its massive scalability was accompanied by a bare-bones feature set, a substantial learning curve, and a Thrift-based RPC mechanism that left newbies bewildered by a sea of potential client libraries, all with their own fragmented semantics. Over the last year that’s all changed, culminating in the recently unveiled Cassandra 2.0. In this talk I’ll bring you up to speed on Cassandra Query Language, cursors, the new native libraries, lightweight transactions, virtual nodes, and loads of other new goodies. Whether you’re completely new to Cassandra or a seasoned veteran who wants the latest scoop, this talk has something for you.

TRANSCRIPT

Big Data Grows Up: A (re)introduction to Cassandra

Robbie Strickland

Who am I?

Robbie Strickland
Software Development Manager
The Weather Channel

rostrickland@gmail.com
@dont_use_twitter

Who am I?

● Cassandra user/contributor since 2010
● … it was at release 0.5 back then
● 4 years? Oracle DBAs aren’t impressed
● Done lots of dumb stuff with Cassandra
● … and some really awesome stuff too

Cassandra in 2010

Cassandra in 2014

Why Cassandra?

It’s fast:

● No locks
● Tunable consistency
● Sequential R/W
● Decentralized

Why Cassandra?

It scales (linearly):

● Multi data center
● No SPOF
● DHT
● Hadoop integration

Why Cassandra?

It’s fault tolerant:

● Automatic replication
● Masterless
● Failed nodes replaced with ease

What’s different?

… a lot in the last year (ish)

What’s new?

● Virtual nodes
● O(n) data moved off-heap
● CQL3 (and defining schemas)
● Native protocol/driver
● Collections
● Lightweight transactions
● Compaction throttling that actually works

What’s gone?

● Manual token management
● Supercolumns
● Thrift (if you use the native driver)
● Directly managing storage rows

What’s still the same?

● Still not an RDBMS
● Still no joins (see above)
● Still no ad-hoc queries (see above again)
● Still requires a denormalized data model (^^)
● Still need to know what the heck you’re doing

Linear scalability without the migraine

Token Management

The old way
● 1 token per node
● Assigned manually
● Adding nodes == reassignment of all tokens
● Node rebuild heavily taxes a few nodes

[Diagram: ring of nodes A-F, one token per node (cluster with no vnodes)]

… enter Vnodes
● n tokens per node
● Assigned magically
● Adding nodes == painless
● Node rebuild distributed across many nodes
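Enabling vnodes is a one-line change in cassandra.yaml; a minimal sketch (256 is the commonly used default token count, not a requirement):

# cassandra.yaml
num_tokens: 256     # each node claims 256 small token ranges
# initial_token:    # leave unset when vnodes are enabled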

[Diagram: ring with many tokens (A-N) spread across four nodes (cluster with vnodes)]

Node rebuild without Vnodes

Node rebuild with Vnodes

because the JVM sometimes sucks

Going Off-heap

Why go off-heap?

● GC overhead
● JVM no good with big heap sizes
● GC overhead
● GC overhead
● GC overhead

O(n) data structures

● Row cache
● Bloom filters
● Compression offsets
● Partition summary

… all these are moved off-heap

New memory allocation

[Diagram: native memory now holds the row cache, bloom filters, compression offsets, and partition summary; the partition key cache stays on the JVM heap]

Or, how to build a killer data store without a crappy interface

Death of a (Thrift) Salesman

Reasons not to ditch Thrift

● Lots of client libraries still use it
● You finally got it installed
● You didn’t know there was another choice
● It sucks less than many alternatives

… in spite of all those benefits, you really should ditch Thrift because:

● It requires your entire result set to fit into RAM on both client and server

● The native protocol is better, faster, and supports all the new features

● Thrift-based client libraries are always a step behind

● It’s going away eventually

… and did I mention ...

It requires your entire result set to fit into RAM on both client and server!!!

Requesting too much data

really catchy tag line here

Going Native

Native protocol

● It’s binary, making it lighter weight
● It supports cursors (FTW!)
● It supports prepared statements
● Cluster awareness built-in
● Either synchronous or asynchronous ops
● Only supports CQL-based operations
● Can be used side-by-side with Thrift

Native drivers

from DataStax:
● Java
● C#
● Python

… other community supported drivers available

Native query example

val cluster = Cluster.builder()
  .addContactPoints(host1, host2, host3)
  .build()
val session = cluster.connect()

val insert = session.prepare(
  "INSERT INTO myKsp.myTable (myKey, col1, col2) VALUES (?,?,?)")
val select = session.prepare(
  "SELECT * FROM myKsp.myTable WHERE myKey = ?")

session.execute(insert.bind(myKey, col1, col2))
val result = session.execute(select.bind(myKey))
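The same session supports asynchronous execution as well; a minimal sketch using the DataStax driver’s executeAsync, reusing the select statement prepared above:

val future = session.executeAsync(select.bind(myKey))  // returns a ResultSetFuture immediately
val result = future.getUninterruptibly()               // block here, or register a callback instead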

Or, how to make Cassandra more awesome while simultaneously irritating early adopters

Wait, was that SQL?!!

Introducing CQL3

● Because the first two attempts sucked
● Stands for “Cassandra Query Language”
● Looks a heck of a lot like SQL
● … but isn’t
● Substantially lowers the learning curve
● … but also makes it easier to screw up
● An abstraction over the storage rows

Storage rows

[default@unknown] create keyspace Library;
[default@unknown] use Library;
[default@Library] create column family Books
...   with comparator=UTF8Type
...   and key_validation_class=UTF8Type
...   and default_validation_class=UTF8Type;
[default@Library] set Books['Patriot Games']['author'] = 'Tom Clancy';
[default@Library] set Books['Patriot Games']['year'] = '1987';
[default@Library] list Books;

RowKey: Patriot Games
=> (name=author, value=Tom Clancy, timestamp=1393102991499000)
=> (name=year, value=1987, timestamp=1393103015955000)

Storage rows - composites

[default@Library] create column family Authors
...   with key_validation_class=UTF8Type
...   and comparator='CompositeType(LongType,UTF8Type,UTF8Type)'
...   and default_validation_class=UTF8Type;
[default@Library] set Authors['Tom Clancy']['1987:Patriot Games:publisher'] = 'Putnam';
[default@Library] set Authors['Tom Clancy']['1987:Patriot Games:ISBN'] = '0-399-13241-4';
[default@Library] set Authors['Tom Clancy']['1993:Without Remorse:publisher'] = 'Putnam';
[default@Library] set Authors['Tom Clancy']['1993:Without Remorse:ISBN'] = '0-399-13825-0';
[default@Library] list Authors;

RowKey: Tom Clancy
=> (name=1987:Patriot Games:ISBN, value=0-399-13241-4, timestamp=1393104011458000)
=> (name=1987:Patriot Games:publisher, value=Putnam, timestamp=1393103948577000)
=> (name=1993:Without Remorse:ISBN, value=0-399-13825-0, timestamp=1393104109214000)
=> (name=1993:Without Remorse:publisher, value=Putnam, timestamp=1393104083773000)

CQL - simple intro

cqlsh> CREATE KEYSPACE Library WITH REPLICATION = {'class':'SimpleStrategy', 'replication_factor':1};
cqlsh> use Library;
cqlsh:library> CREATE TABLE Books (
           ...   title varchar,
           ...   author varchar,
           ...   year int,
           ...   PRIMARY KEY (title)
           ... );
cqlsh:library> INSERT INTO Books (title, author, year) VALUES ('Patriot Games', 'Tom Clancy', 1987);
cqlsh:library> INSERT INTO Books (title, author, year) VALUES ('Without Remorse', 'Tom Clancy', 1993);

CQL - simple intro

Storage rows:
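The slide’s screenshot didn’t survive, but based on the CLI listings above, the storage rows for those two inserts would look roughly like this (CQL3 adds an empty marker column per row; timestamps elided):

RowKey: Patriot Games
=> (name=, value=, timestamp=...)
=> (name=author, value=Tom Clancy, timestamp=...)
=> (name=year, value=1987, timestamp=...)
RowKey: Without Remorse
=> (name=, value=, timestamp=...)
=> (name=author, value=Tom Clancy, timestamp=...)
=> (name=year, value=1993, timestamp=...)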

CQL - composite key

CREATE TABLE Authors (
  name varchar,
  year int,
  title varchar,
  publisher varchar,
  ISBN varchar,
  PRIMARY KEY (name, year, title)
)

CQL - composite key

Storage rows:
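Again the screenshot is missing; given the Authors rows from the CLI example earlier, the layout would be roughly as follows: the clustering columns (year, title) become the column-name prefix, with one storage column per non-key field plus CQL3’s empty marker:

RowKey: Tom Clancy
=> (name=1987:Patriot Games:, value=, timestamp=...)
=> (name=1987:Patriot Games:ISBN, value=0-399-13241-4, timestamp=...)
=> (name=1987:Patriot Games:publisher, value=Putnam, timestamp=...)
=> (name=1993:Without Remorse:, value=, timestamp=...)
=> (name=1993:Without Remorse:ISBN, value=0-399-13825-0, timestamp=...)
=> (name=1993:Without Remorse:publisher, value=Putnam, timestamp=...)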

Keys and Filters

● Ad hoc queries are NOT supported
● Query by key
● Key must include all potential filter columns
● Must include partition key in filter
● Subsequent filters must be in order
● Only last filter can be a range (see the queries below)
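To make those rules concrete, here are a few queries against the Authors table above (PRIMARY KEY (name, year, title)); the first three are accepted, the last two rejected:

-- partition key alone: OK
SELECT * FROM Authors WHERE name = 'Tom Clancy';
-- clustering columns filtered in order: OK
SELECT * FROM Authors WHERE name = 'Tom Clancy' AND year = 1987;
-- range on the last filter only: OK
SELECT * FROM Authors WHERE name = 'Tom Clancy' AND year >= 1990;
-- no partition key: rejected
SELECT * FROM Authors WHERE year = 1987;
-- skips the year clustering column: rejected
SELECT * FROM Authors WHERE name = 'Tom Clancy' AND title = 'Patriot Games';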

Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (title)
)

Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (author, title)
)

Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (author, year)
)

Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (year, author)
)

Secondary Indexes

● Allows query-by-value
● CREATE INDEX myIdx ON myTable (myCol)
● Works well on low cardinality fields
● Won’t scale for high cardinality fields
● Don’t overuse it -- not a quick fix for a bad data model

Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (author)
)
CREATE INDEX Books_year ON Books(year)
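With that index in place, query-by-value works:

SELECT * FROM Books WHERE year = 1987;

(Under the hood this consults the index on every node, which is why it won’t scale for high-cardinality fields.)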

Composite Partition Keys

● PRIMARY KEY((year, author), title)
● Creates a more granular shard key
● Can be useful to make certain queries more efficient, or to better distribute data
● Updates sharing a partition key are atomic and isolated

Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY ((year, author), title)
)
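With the composite partition key, both components must be supplied to read a partition:

-- OK: the full partition key is specified
SELECT * FROM Books WHERE year = 1987 AND author = 'Tom Clancy';
-- rejected: year is only half the partition key
SELECT * FROM Books WHERE year = 1987;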

Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (year, author, title)
)

denormalization done well

Collections

Supported types

● Sets - ordered naturally
● Lists - ordered by index
● Maps - key/value pairs

Caveats

● Max 64k items in a collection
● Max 64k size per item
● Collections are read in their entirety, so keep them small (see the sketch below)

Sets

[Diagram: set storage layout, showing the set name and item value]

Lists

[Diagram: list storage layout, showing the list name, ordering metadata, and list item value]

Maps

[Diagram: map storage layout, showing the map name, key, and value]

(tracing on)

TRON

Using tracing

● In cqlsh, “tracing on”
● … enjoy!
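For example, a minimal cqlsh session, reusing one of the Books tables above (the trace itself lists each step with its source node and elapsed time):

cqlsh:library> TRACING ON;
cqlsh:library> SELECT * FROM Books WHERE author = 'Tom Clancy';
-- cqlsh prints the result, then a step-by-step trace of the request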

Example

[Screenshot: sample trace output; the example uses timestamp 1393126200000]

Antipattern

CREATE TABLE WorkQueue (
  name varchar,
  time bigint,
  workItem varchar,
  PRIMARY KEY (name, time)
)

… do a bunch of inserts ...

SELECT * FROM WorkQueue WHERE name='ToDo' ORDER BY time ASC;
DELETE FROM WorkQueue WHERE name='ToDo' AND time=[some_time]

Antipattern - enqueue

Antipattern - dequeue

Antipattern

20k tombstones!! 13ms of 17ms spent reading tombstones

(no it’s not ACID)

Lightweight Transactions

Primer

● Supports basic Compare-and-Set ops
● Provides linearizable consistency
● … aka serial isolation
● Uses “Paxos light” under the hood
● Still expensive -- four round trips!
● For most cases quorum reads/writes will be sufficient

Usage

INSERT INTO Users (login, name)
VALUES ('rs_atl', 'Robbie Strickland')
IF NOT EXISTS;

UPDATE Users
SET password='super_secure_password'
WHERE login='rs_atl'
IF reset_token='some_reset_token';

Other cool stuff

● Triggers (experimental)
● Batching multiple requests (sketch below)
● Leveled compaction
● Configuration via CQL
● Gossip-based rack/DC configuration
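For instance, a CQL batch groups several statements into a single request (a minimal sketch reusing the Books table from earlier):

BEGIN BATCH
  INSERT INTO Books (title, author, year) VALUES ('Patriot Games', 'Tom Clancy', 1987);
  INSERT INTO Books (title, author, year) VALUES ('Without Remorse', 'Tom Clancy', 1993);
APPLY BATCH;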

Thank you!

Robbie Strickland
Software Development Manager
The Weather Channel

rostrickland@gmail.com
@dont_use_twitter
