intro to cassandra

©2013 DataStax Confidential. Do not distribute without consent.

Jon Haddad, Luke Tillman Technical Evangelists, DataStax @rustyrazorblade, @LukeTillman

Introduction to Apache Cassandra

1

Small Data• 100's of MB to low GB, single user • sed, awk, grep are great • sqlite • Limitations: • bad for multiple concurrent users (file sharing!)

Medium Data• Fits on 1 machine • RDBMS is fine • postgres • mysql

• Supports hundreds of concurrent users • ACID makes us feel good • Scales vertically

Can RDBMS work for big data?

Replication: ACID is a lie

Client

Master Slave

replication lag

Consistent results? Nope!

Third Normal Form Doesn't Scale• Queries are unpredictable • Users are impatient • Data must be denormalized • If data > memory, you = history • Disk seeks are the worst

(SELECT CONCAT(city_name,', ',region) value, latitude, longitude, id, population, ( 3959 * acos( cos( radians($latitude) ) * cos( radians( latitude ) ) * cos( radians( longitude ) - radians($longitude) ) + sin( radians($latitude) ) * sin( radians( latitude ) ) ) ) AS distance, CASE region WHEN '$region' THEN 1 ELSE 0 END AS region_match FROM `cities` $where and foo_count > 5 ORDER BY region_match desc, foo_count desc limit 0, 11) UNION (SELECT CONCAT(city_name,', ',region) value, latitude, longitude, id, population, ( 3959 * acos( cos( radians($latitude) ) * cos( radians( latitude ) ) * cos( radians( longitude ) - radians($longitude) ) + sin( radians($latitude) ) * sin( radians( latitude ) ) ) ) AS distance,

Sharding is a Nightmare• Data is all over the place •No more joins •No more aggregations • Denormalize all the things • Querying secondary indexes

requires hitting every shard • Adding shards requires manually

moving data • Schema changes

High Availability.. not really•Master failover… who's responsible? • Another moving part… • Bolted on hack

•Multi-DC is a mess • Downtime is frequent • Change database settings (innodb buffer

pool, etc) • Drive, power supply failures • OS updates

Summary of Failure• Scaling is a pain • ACID is naive at best • You aren't consistent

• Re-sharding is a manual process •We're going to denormalize for

performance •High availability is complicated,

requires additional operational overhead

Lessons Learned• Consistency is not practical • So we give it up

•Manual sharding & rebalancing is hard • So let's build in

• Every moving part makes systems more complex • So let's simplify our architecture - no more master / slave

• Scaling up is expensive • We want commodity hardware

• Scatter / gather no good • We denormalize for real time query performance • Goal is to always hit 1 machine

What is Apache Cassandra?• Fast Distributed Database •High Availability • Linear Scalability • Predictable Performance •No SPOF •Multi-DC • Commodity Hardware • Easy to manage operationally •Not a drop in replacement for

RDBMS

Hash Ring•No master / slave / replica sets •No config servers, zookeeper • Data is partitioned around the ring • Data is replicated to RF=N servers • All nodes hold data and can answer

queries (both reads & writes) • Location of data on ring is

determined by partition key

CAP Tradeoffs

• Cassandra chooses Availability & Partition Tolerance over Consistency • Queries have tunable consistency level • ALL, QUORUM, ONE •Hinted Handoff to deal with failed nodes

Data Modeling

Data Structures• Like an RDBMS, Cassandra uses a Table to

store data • But there’s where the similarities end • Partitions within tables • Rows within partitions (or a single row) • CQL to create tables & query data • Partition keys determine where a partition

is found • Clustering keys determine ordering of rows

within a partition

Table

Partition

Row

Keyspace

Example: Single Row Partition• Simple User system • Identified by name (pk) • 1 Row per partition • This is familiar territory

name age job

jon 33 evangelist

luke 33 evangelist

old pete 108 retired

s. seagal 62 actor

JCVD 53 actor

cqlsh:demo> select * from user WHERE name = 'JCVD'

cqlsh:demo> create table user (name text primary key, age int, job text);

Example: Multiple Rows

• Comments on photos • Comments are always selected by

the photo_id • There are only 4 rows in 2 partitions • In the real world, use UUIDs instead

of int for PK

photo_id comment_id user comment

5 1 jon hi

5 2 luke oh hey

5 3 JCVD AHHHHH!!!

6 4 jon great pic

select * from comment where photo_id=5

create table comment ( photo_id int, comment_id int, user text, comment text, primary key (photo_id, comment_id));

Partition with Clustering

photo_id comment_id user comment comment_id user comment comment_id user comment

5 1 jon hi 2 luke oh hey 3 JCVD AHHHHH!!!

6 4 jon great pic

•Multiple rows are transposed into a single partition • Partitions vary in size • Old terminology - "wide row"

Model Tables to Answer Queries• This is not 3NF!! •We always query by partition key • Create many tables aka

materialized views •Manage in your app code • Denormalize!!

user age

jon 33

luke 33

JCVD 53

age user user

33 jon luke

53 JCVD

CREATE TABLE age_to_user ( age int, user text, primary key (age, user));

CQL Data Types

Basic Types Collections

text uuid counter map

int timeuuid list

decimal set

blob

Read the CQL documentation for the full list of types

Reads & Writes

The Write Path

•Writes are written to any node in the cluster (coordinator) •Writes are written to commit log, then to

memtable • Every write includes a timestamp •Memtable flushed to disk periodically

(sstable) •New memtable is created in memory • Deletes are a special write case, called a

“tombstone”

What is an SSTable?• Immutable data file for row storage • Deletes are written as tombstones • Every write includes a timestamp of when it

was written • Partition is spread across multiple SSTables • Same column can be in multiple SSTables •Merged through compaction, only latest

timestamp is kept • Easy backups!

sstable sstable sstable

sstable

The Read Path• Any server may be queried, it acts as the

coordinator • Contacts nodes with the requested key • On each node, data is pulled from

SSTables and merged • Consistency< ALL performs read repair

in background (read_repair_chance)

Analytics with Spark

Spark at a Glance• Scala, Python, Java •Hadoop alternative - batch analytics • Distributed SQL • Real time analytics via streaming •Machine learning • GraphX (in progress) • Open source connector available • Built into DSE

Picking a distribution

Open Source• Latest, bleeding edge features • File JIRAs • Support via mailing list & IRC • Fix bugs • cassandra.apache.org • Perfect for hacking

http://cassandra.apache.org

DataStax Enterprise• Integrated Multi-DC Solr • Integrated Spark • Free Startup Program • <3MM rev & <$30M funding

• Extended support • Additional QA • Focused on stable releases for enterprise • Included on USB drive

Summary•How do I query my data if I can only

query by key? • Denormalize! • Create multiple views into your data

(multiple tables) • Cassandra is built for fast writes • Use fast writes to do as few reads as

possible • Use Spark for advanced analytics and

real time analysis

intro to cassandra

Technology