Cassandra concepts, patterns and anti-patterns

Dave Gardner (@davegardnerisme), ApacheCon EU 2012

Upload: dave-gardner

Post on 06-May-2015

DESCRIPTION

An introduction to the fundamental concepts behind Apache Cassandra. This talk explains the engineering principles that make Cassandra such an attractive choice for building highly resilient and available systems and then goes on to explain how to use it - covering basic data modelling patterns and anti-patterns.

TRANSCRIPT

Page 1: Cassandra concepts, patterns and anti-patterns

Cassandra concepts, patterns and anti-patterns - ApacheCon EU 2012

Cassandra concepts, patterns and anti-patterns

Dave Gardner
@davegardnerisme

ApacheCon EU 2012

Page 2: Cassandra concepts, patterns and anti-patterns

Agenda

• Choosing NoSQL
• Cassandra concepts (Dynamo and Big Table)
• Patterns and anti-patterns of use

Page 3: Cassandra concepts, patterns and anti-patterns

Choosing NoSQL...

Page 4: Cassandra concepts, patterns and anti-patterns

1. Find data store that doesn’t use SQL
2. Anything
3. Cram all the things into it
4. Triumphantly blog this success
5. Complain a month later when it bursts into flames

http://www.slideshare.net/rbranson/how-do-i-cassandra/4

Page 5: Cassandra concepts, patterns and anti-patterns

“NoSQL DBs trade off traditional features to better support new and emerging use cases”

http://www.slideshare.net/argv0/riak-use-cases-dissecting-the-solutions-to-hard-problems

Page 6: Cassandra concepts, patterns and anti-patterns

More widely used, tested and documented software... (MySQL first OS release 1998)

... for a relatively immature product (Cassandra first open-sourced in 2008)

Page 7: Cassandra concepts, patterns and anti-patterns

Ad-hoc querying... (SQL join, group by, having, order)

... for a rich data model with limited ad-hoc querying ability (Cassandra makes you denormalise)

Page 8: Cassandra concepts, patterns and anti-patterns

What do we get in return?

Page 9: Cassandra concepts, patterns and anti-patterns

Proven horizontal scalability

Cassandra scales reads and writes linearly as new nodes are added

Page 10: Cassandra concepts, patterns and anti-patterns

http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

Page 11: Cassandra concepts, patterns and anti-patterns

High availability

Cassandra is fault-resistant with tunable consistency levels

Page 12: Cassandra concepts, patterns and anti-patterns

Low latency, solid performance

Cassandra has very good write performance

Page 13: Cassandra concepts, patterns and anti-patterns

http://blog.cubrid.org/dev-platform/nosql-benchmarking/

* Add pinch of salt

Page 14: Cassandra concepts, patterns and anti-patterns

Operational simplicity

Homogenous cluster, no “master” node, no SPOF

Page 15: Cassandra concepts, patterns and anti-patterns

Rich data model

Cassandra is more than simple key-value – columns, composites, counters, secondary indexes

Page 16: Cassandra concepts, patterns and anti-patterns

Choosing NoSQL...

Page 17: Cassandra concepts, patterns and anti-patterns

“they say … I can’t decide between this project and this project even though they look nothing like each other. And the fact that you can’t decide indicates that you don’t actually have a problem that requires them.”

http://nosqltapes.com/video/benjamin-black-on-nosql-cloud-computing-and-fast_ip (at 30:15)

Page 18: Cassandra concepts, patterns and anti-patterns

Or you haven’t learned enough about them..

Page 19: Cassandra concepts, patterns and anti-patterns

• What tradeoffs are you making?
• How is it designed?
• What algorithms does it use?
• Are the fundamental design decisions sane?

http://www.alberton.info/nosql_databases_what_when_why_phpuk2011.html

Page 20: Cassandra concepts, patterns and anti-patterns

Concepts...

Page 21: Cassandra concepts, patterns and anti-patterns

Amazon Dynamo + Google Big Table

Amazon Dynamo:
• Consistent hashing
• Vector clocks *
• Gossip protocol
• Hinted handoff
• Read repair

http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

Google Big Table:
• Columnar
• SSTable storage
• Append-only
• Memtable
• Compaction

http://labs.google.com/papers/bigtable-osdi06.pdf

* not in Cassandra

Page 22: Cassandra concepts, patterns and anti-patterns

Distributed Hash Table (DHT)

[Diagram: client and nodes 1 to 6]

tokens are integers from 0 to 2^127

Page 23: Cassandra concepts, patterns and anti-patterns

consistent hashing

[Diagram: client, coordinator node, nodes 1 to 6]

Page 24: Cassandra concepts, patterns and anti-patterns

replication factor (RF) 3

[Diagram: client, coordinator node, nodes 1 to 6]
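The ring lookup on the last three slides can be sketched in a few lines of Python. This is a toy model, not Cassandra's code: the hash function (SHA-256), the evenly spaced node tokens, the node names and the "next nodes clockwise" replica placement are all assumptions made for illustration.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space quoted on the slide


def token_for(key: str) -> int:
    """Map a row key to a token. SHA-256 is an illustrative choice of hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def build_ring(node_names):
    """Give each node an evenly spaced token; returns (token, node) pairs in token order."""
    step = RING_SIZE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]


def replicas_for(key, ring, rf=3):
    """Find the first node whose token is >= the key's token, then walk clockwise."""
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Six hypothetical nodes, as in the diagram, with RF = 3.
ring = build_ring([f"node{i}" for i in range(1, 7)])
owners = replicas_for("USERID1234", ring, rf=3)
print(owners)
```

Any node can act as coordinator: it runs the same lookup and forwards the request to the replicas it finds.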

Page 25: Cassandra concepts, patterns and anti-patterns

Consistency Level (CL)

How many replicas must respond to declare success?

Page 26: Cassandra concepts, patterns and anti-patterns

For read operations

Level          Description
ONE            1st response
QUORUM         N/2 + 1 replicas
LOCAL_QUORUM   N/2 + 1 replicas in local data centre
EACH_QUORUM    N/2 + 1 replicas in each data centre
ALL            All replicas

http://wiki.apache.org/cassandra/API#Read

Page 27: Cassandra concepts, patterns and anti-patterns

For write operations

Level          Description
ANY            One node, including hinted handoff
ONE            One node
QUORUM         N/2 + 1 replicas
LOCAL_QUORUM   N/2 + 1 replicas in local data centre
EACH_QUORUM    N/2 + 1 replicas in each data centre
ALL            All replicas

http://wiki.apache.org/cassandra/API#Write
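As a worked example of the "N/2 + 1" rule in the two tables, the sketch below computes how many replicas must respond for a given replication factor. The function names are hypothetical, and the per-data-centre levels are left out because they depend on how replicas are spread across data centres.

```python
def quorum(replication_factor: int) -> int:
    """N/2 + 1, using integer division."""
    return replication_factor // 2 + 1


def required_responses(level: str, replication_factor: int) -> int:
    """Replicas that must respond before the operation is declared a success."""
    if level in ("ANY", "ONE"):
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")


for rf in (1, 2, 3, 5):
    print(rf, required_responses("QUORUM", rf))
```

With RF = 3 a quorum is 2, so a quorum read or write still succeeds with one replica down.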

Page 28: Cassandra concepts, patterns and anti-patterns

RF = 3
CL = Quorum

[Diagram: client, coordinator node, nodes 1 to 6]

Page 29: Cassandra concepts, patterns and anti-patterns

Hinted Handoff

A hint is written to the coordinator node when a replica is down

http://wiki.apache.org/cassandra/HintedHandoff

Page 30: Cassandra concepts, patterns and anti-patterns

RF = 3
CL = Quorum

[Diagram: client, coordinator node, nodes 1 to 6; one replica is marked "node offline" and the coordinator holds a "hint"]
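A minimal simulation of the scenario in this diagram, assuming an in-memory `Node` class and helper names that are purely illustrative. Which node is the coordinator and which replica is offline are also assumptions; the transcript does not say.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.data = {}
        self.hints = []  # (target node, key, value) kept for replicas that were down


def write(coordinator, replicas, key, value, required):
    """Send the write to every replica; store a hint for each one that is down."""
    acks = 0
    for replica in replicas:
        if replica.online:
            replica.data[key] = value
            acks += 1
        else:
            coordinator.hints.append((replica, key, value))
    return acks >= required


def replay_hints(coordinator):
    """Deliver stored hints to targets that are back online; keep the rest."""
    remaining = []
    for target, key, value in coordinator.hints:
        if target.online:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    coordinator.hints = remaining


nodes = {i: Node(f"node{i}") for i in range(1, 7)}
coordinator = nodes[3]
replicas = [nodes[4], nodes[5], nodes[6]]  # RF = 3
nodes[5].online = False                    # assumed offline replica

acknowledged = write(coordinator, replicas, "USERID1234", "Dave", required=2)  # CL = Quorum
missed_before_replay = "USERID1234" not in nodes[5].data

nodes[5].online = True
replay_hints(coordinator)
print(acknowledged, nodes[5].data)
```

The write succeeds at quorum with two of three replicas, and the third catches up once the hint is replayed.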

Page 31: Cassandra concepts, patterns and anti-patterns

Read Repair

Background digest query on-read to find and update out-of-date replicas*

http://wiki.apache.org/cassandra/ReadRepair

* carried out in the background unless CL:ALL

Page 32: Cassandra concepts, patterns and anti-patterns

RF = 3
CL = One

[Diagram: client, coordinator node, nodes 1 to 6]

background digest query, then update out-of-date replicas
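The read-repair flow can be sketched as follows. This is a simplified model under stated assumptions: each replica is a plain dict mapping a key to a `(timestamp, value)` pair, the digest is a SHA-256 of the cell's text form, and the repair runs inline rather than in the background.

```python
import hashlib


def digest(cell) -> str:
    """Short fingerprint of a cell, so replicas can be compared cheaply."""
    return hashlib.sha256(repr(cell).encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """Answer from the first replica (CL = One), then repair any stale replicas."""
    answer = replicas[0].get(key)
    digests = {digest(replica.get(key)) for replica in replicas}
    if len(digests) > 1:  # at least one replica disagrees
        newest = max(
            (replica[key] for replica in replicas if key in replica),
            key=lambda cell: cell[0],  # highest timestamp wins
        )
        for replica in replicas:
            if replica.get(key) != newest:
                replica[key] = newest
    return answer


replicas = [
    {"job": (1, "Developer")},
    {"job": (2, "Cruft")},  # this replica saw a later write
    {"job": (1, "Developer")},
]
first_read = read_with_repair(replicas, "job")
second_read = read_with_repair(replicas, "job")
print(first_read, second_read)
```

Note that the first read at CL = One can return the stale value; the repair only guarantees that later reads see the newest one.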

Page 33: Cassandra concepts, patterns and anti-patterns

Big Table...

Page 34: Cassandra concepts, patterns and anti-patterns

• Sparse column based data model
• SSTable disk storage
• Append-only commit log
• Memtable (buffer and sort)
• Immutable SSTable files
• Compaction

http://research.google.com/archive/bigtable-osdi06.pdf
http://www.slideshare.net/geminimobile/bigtable-4820829

Page 35: Cassandra concepts, patterns and anti-patterns

Column: Name, Value + timestamp

Timestamp used for conflict resolution (last write wins)
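Last-write-wins can be stated in a few lines. The `Column` type and `resolve` function below are illustrative names, and the tie-breaking rule (keep the first argument when timestamps are equal) is an assumption of this sketch.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int


def resolve(a: Column, b: Column) -> Column:
    """Keep whichever version of the column carries the higher timestamp."""
    return a if a.timestamp >= b.timestamp else b


older = Column("job", "Developer", timestamp=100)
newer = Column("job", "Cruft", timestamp=200)
winner = resolve(older, newer)
print(winner.value)
```

Because the rule depends only on timestamps, replicas reach the same result whatever order the two versions arrive in.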

Page 36: Cassandra concepts, patterns and anti-patterns

[Diagram: a sequence of columns, each with a name and a value]

we can have millions of columns *

* theoretically up to 2 billion

Page 37: Cassandra concepts, patterns and anti-patterns

Row

[Diagram: a row key followed by its columns, each with a name and a value]

Page 38: Cassandra concepts, patterns and anti-patterns

Column Family

[Diagram: three rows, each a row key followed by three columns]

we can have billions of rows

Page 39: Cassandra concepts, patterns and anti-patterns

Write path

[Diagram: a write goes to the Commit Log (disk) and the Memtable (memory); the Memtable is flushed to SSTables (disk)]

Memtable: buffer writes and sort data
flush on time or size trigger
SSTables: immutable
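A toy version of this write path, assuming a size-based flush trigger only. The class name `ToyStore`, the `flush_threshold` parameter and the in-memory "disk" are all inventions of this sketch; the commit log is never truncated here, which a real store would need to handle.

```python
class ToyStore:
    def __init__(self, flush_threshold=3):
        self.commit_log = []  # append-only record of every write
        self.memtable = {}    # in-memory buffer of recent writes
        self.sstables = []    # immutable tuples of (key, value) pairs sorted by key
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Sort the buffered writes and freeze them as a new SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all SSTables into one; newer values overwrite older ones."""
        merged = {}
        for table in self.sstables:  # oldest first
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]


store = ToyStore(flush_threshold=3)
for key, value in [("a", "1"), ("b", "2"), ("c", "3"), ("a", "new"), ("d", "4"), ("e", "5")]:
    store.write(key, value)
tables_before = len(store.sstables)
store.compact()
print(tables_before, len(store.sstables), store.read("a"))
```

Writes never modify an existing SSTable; stale values are only dropped when compaction rewrites the files.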

Page 40: Cassandra concepts, patterns and anti-patterns

Sorted data written to disk in blocks

Each “query” can be answered from a single slice of disk

Therefore start from your queries and work backwards
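To illustrate why sorted storage makes a "slice" cheap, the sketch below reads a contiguous range of columns with two binary searches. The row contents and column names (date strings) are hypothetical.

```python
import bisect


def column_slice(columns, start, finish):
    """columns: list of (name, value) pairs already sorted by name."""
    names = [name for name, _ in columns]
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    return columns[lo:hi]


# Hypothetical wide row whose column names sort in the order we want to query.
row = sorted({
    "2012-11-01": "login",
    "2012-11-02": "purchase",
    "2012-11-03": "logout",
    "2012-11-04": "login",
}.items())

result = column_slice(row, "2012-11-02", "2012-11-03")
print(result)
```

Choosing column names so that the data a query needs sits side by side is what "start from your queries and work backwards" amounts to in practice.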

Page 41: Cassandra concepts, patterns and anti-patterns

Patterns and anti-patterns...

Page 42: Cassandra concepts, patterns and anti-patterns

Page 43: Cassandra concepts, patterns and anti-patterns

Storing entities as individual columns under one row

Pattern

Page 44: Cassandra concepts, patterns and anti-patterns

Pattern

one row per user

row: USERID1234
name: Dave
email: [email protected]
job: Developer

we can use C* secondary indexes to fetch all users with job=developer
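What a secondary index buys you can be shown with a toy inverted index over column values. This is not how Cassandra implements its indexes; the second user and the helper name `build_index` are made up for the example, and the email column is omitted.

```python
from collections import defaultdict

# One row per user, one column per property.
users = {
    "USERID1234": {"name": "Dave", "job": "Developer"},
    "USERID5678": {"name": "Sam", "job": "Designer"},  # hypothetical second user
}


def build_index(rows, column):
    """Map each value of `column` to the set of row keys that hold it."""
    index = defaultdict(set)
    for row_key, columns in rows.items():
        if column in columns:
            index[columns[column]].add(row_key)
    return index


job_index = build_index(users, "job")
developers = sorted(job_index["Developer"])
print(developers)
```

The lookup only works because "job" is its own column; the value can be indexed without parsing the rest of the entity.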

Page 45: Cassandra concepts, patterns and anti-patterns

Storing whole entity as single column blob

Anti-pattern

Page 46: Cassandra concepts, patterns and anti-patterns

one row per user

row: USERID1234
data: {"name":"Dave", "email":"[email protected]", "job":"Developer"}

now we can’t use secondary indexes, nor can we easily update safely

Anti-pattern

Page 47: Cassandra concepts, patterns and anti-patterns

Mutate just the changes to entities, make use of C* conflict resolution

Pattern

Page 48: Cassandra concepts, patterns and anti-patterns

$userCf->insert( "USER1234", array("job" => "Cruft") );

Pattern

we only update the “job” column, avoiding the race condition of reading all properties and then writing them all back when only one has changed
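The difference between the two approaches can be simulated without a database. In this sketch two clients change different properties at about the same time; `merge_columns`, the timestamps and the new name "David" are illustrative assumptions.

```python
import json


def merge_columns(row, updates, timestamp):
    """Apply per-column updates; a column is replaced only by a newer-or-equal timestamp."""
    for name, value in updates.items():
        current = row.get(name)
        if current is None or timestamp >= current[1]:
            row[name] = (value, timestamp)


# Pattern: one column per property, each client writes only what it changed.
row = {"name": ("Dave", 1), "job": ("Developer", 1)}
merge_columns(row, {"job": "Cruft"}, timestamp=2)   # client A
merge_columns(row, {"name": "David"}, timestamp=3)  # client B

# Anti-pattern: both clients read the same blob, change one field, write it all back.
blob = json.dumps({"name": "Dave", "job": "Developer"})
client_a = json.loads(blob)
client_b = json.loads(blob)
client_a["job"] = "Cruft"
client_b["name"] = "David"
blob = json.dumps(client_a)
blob = json.dumps(client_b)  # overwrites client A's change

print(row["job"][0], json.loads(blob)["job"])
```

With separate columns both changes survive; with the blob, client A's update to "job" is silently lost.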

Page 49: Cassandra concepts, patterns and anti-patterns

Lock, read, update

Anti-pattern

Page 50: Cassandra concepts, patterns and anti-patterns

Don’t overwrite anything; store as time series data

Pattern

Page 51: Cassandra concepts, patterns and anti-patterns

row: USERID1234

a384cff0-26c1-11e2-81c1-0800200c9a66: {"action":"create", "name":"Dave"}
10dc4c40-26c2-11e2-81c1-0800200c9a66: {"action":"update", "name":"foo"}

Pattern

column name is a type 1 UUID (time based)
http://www.famkruithof.net/guid-uuid-timebased.html

one row per user; many columns (wide row)
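The two column names on this slide can be decoded with Python's standard `uuid` module to show why time-based UUIDs work as time-series column names. The helper `uuid1_datetime` is a name chosen for this sketch; the events are stored out of order here on purpose.

```python
import uuid
from datetime import datetime, timezone

# 100-nanosecond ticks between the UUID epoch (1582-10-15) and the Unix epoch.
GREGORIAN_OFFSET = 0x01B21DD213814000


def uuid1_datetime(u: uuid.UUID) -> datetime:
    """Convert the 60-bit timestamp of a version 1 UUID to a UTC datetime."""
    return datetime.fromtimestamp((u.time - GREGORIAN_OFFSET) / 1e7, tz=timezone.utc)


events = {
    uuid.UUID("10dc4c40-26c2-11e2-81c1-0800200c9a66"): {"action": "update", "name": "foo"},
    uuid.UUID("a384cff0-26c1-11e2-81c1-0800200c9a66"): {"action": "create", "name": "Dave"},
}

# Order by the embedded timestamp, not by the text form of the UUID.
ordered = sorted(events.items(), key=lambda item: item[0].time)
actions = [event["action"] for _, event in ordered]
print(actions, uuid1_datetime(ordered[0][0]).year)
```

Sorting the UUID strings alphabetically would put "10dc..." before "a384..." and reverse the history, which is why the column comparator needs to understand the time component.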

Page 52: Cassandra concepts, patterns and anti-patterns

We can store all sorts of stuff as time series

http://rubyscale.com/2011/basic-time-series-with-cassandra/

Pattern

Page 53: Cassandra concepts, patterns and anti-patterns

Order Preserving Partitioner (OPP)

http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

Anti-pattern

Page 54: Cassandra concepts, patterns and anti-patterns

Distributed counters

Pattern

Page 55: Cassandra concepts, patterns and anti-patterns

Super Columns (a trap for the unwary)

http://rubyscale.com/2010/beware-the-supercolumn-its-a-trap-for-the-unwary/

Anti-pattern

Page 56: Cassandra concepts, patterns and anti-patterns

In conclusion...

Page 57: Cassandra concepts, patterns and anti-patterns

Cassandra is founded on sound design principles

Page 58: Cassandra concepts, patterns and anti-patterns

The data model is incredibly powerful

Page 59: Cassandra concepts, patterns and anti-patterns

CQL and a new breed of clients are making it easier to use

Page 60: Cassandra concepts, patterns and anti-patterns

Lots of tools and integrations exist to expand the feature set

Page 61: Cassandra concepts, patterns and anti-patterns

There is a strong community and multiple companies offering professional support

Page 62: Cassandra concepts, patterns and anti-patterns

Thanks

Learn more about Cassandra (if you’re ever in London)
meetup.com/Cassandra-London

Learn more about the fundamentals
http://nosqlsummer.org/

Watch videos from Cassandra SF 2011
http://www.datastax.com/events/cassandrasf2011/presentations

looking for a job?

Page 63: Cassandra concepts, patterns and anti-patterns

Extending functionality

Search via Apache Solr and DataStax Enterprise
http://www.datastax.com/technologies/solr

Batch processing via Apache Hadoop and DataStax Enterprise
http://www.datastax.com/technologies/hadoop

Real-time analytics via Acunu Reflex
http://www.acunu.com/acunu-analytics.html