cassandra intro -- theedge2012

Practical Introduction to Cassandra, March 2012, #theedge2012, Sonia Margulis, @robosonia

Post on 03-Jul-2015

DESCRIPTION

This is an introductory presentation to Cassandra, the database of choice for high availability and insane scalability. I gave this talk at TheEdge conference.

TRANSCRIPT

Page 1: Cassandra Intro -- TheEdge2012

Practical Introduction to Cassandra

March 2012

#theedge2012

Sonia Margulis

@robosonia

Page 2: Cassandra Intro -- TheEdge2012

Your Application

Page 3: Cassandra Intro -- TheEdge2012

Gone Viral

Page 4: Cassandra Intro -- TheEdge2012

Best Hardware Money Can Buy

Page 5: Cassandra Intro -- TheEdge2012

Improve Reads

Page 6: Cassandra Intro -- TheEdge2012

Sharding RDBMS – A Nightmare

Page 7: Cassandra Intro -- TheEdge2012

Cassandra’s Sweet Spot

Inherently Clustered

Many concurrent users

Linear Scalability

High Volumes of Operations

Distributed

Page 8: Cassandra Intro -- TheEdge2012

The Road to Mastership

» Introduction to Cassandra

» Running a Server

» Data Model

» Communicating with the Server

» Modeling Data

» Growing a Cluster

Current section: Introduction to Cassandra

Page 9: Cassandra Intro -- TheEdge2012

A non-relational database

Values availability

Scales out, not up

Open source

Active community

Page 10: Cassandra Intro -- TheEdge2012

Always Available

Page 11: Cassandra Intro -- TheEdge2012

Who Uses It?

Page 12: Cassandra Intro -- TheEdge2012

Use Case: Social & Timelines

Page 13: Cassandra Intro -- TheEdge2012

Use Case: Statistics & Logs

Image: Logs by Rick Payette

Page 14: Cassandra Intro -- TheEdge2012

The Road to Mastership

» Introduction to Cassandra

» Running a Server

» Data Model

» Communicating with the Server

» Modeling Data

» Growing a Cluster

Current section: Running a Server

Page 15: Cassandra Intro -- TheEdge2012

The Cassandra Project

» Project

» Runs on:

» Apache License

» Current release: 1.0.8

sonia@hiro:~/apache-cassandra-1.0.8$

You are here

Page 16: Cassandra Intro -- TheEdge2012

Running a Server

sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra -f

....

Now serving reads.

localhost/127.0.0.1:9160

Page 17: Cassandra Intro -- TheEdge2012

Connecting to Our Server

Cassandra command line interface (CLI) tool

sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra-cli --host 127.0.0.1 --port 9160

Connected to: "Test Cluster" on localhost/9160

Welcome to Cassandra CLI version 1.0.8

Page 18: Cassandra Intro -- TheEdge2012

Creating a Keyspace

A keyspace is Cassandra's equivalent of an RDBMS database.

Let's start using it:

[default@unknown] create keyspace demo;

[default@unknown] use demo;

[default@demo]

Page 19: Cassandra Intro -- TheEdge2012

Creating a Column Family

A column family holds data, much like a table in RDBMS.

Start adding data

[default@demo] create column family user;

[default@demo] set user[1][a]=utf8('foo');

[default@demo] set user[2][b]=utf8('bar');

[default@demo] set user[2][c]=utf8('test');

Page 20: Cassandra Intro -- TheEdge2012

Retrieving Data

Retrieving columns by user key

[default@demo] get user[2];

(column=b, value=bar)

(column=c, value=test)

Returned 2 results.

Page 21: Cassandra Intro -- TheEdge2012

The Road to Mastership

» Introduction to Cassandra

» Running a Server

» Data Model

» Communicating with the Server

» Modeling Data

» Growing a Cluster

Current section: Data Model

Page 22: Cassandra Intro -- TheEdge2012

Column

Column Name

Value

Page 23: Cassandra Intro -- TheEdge2012

Column

Column name: name

Value: Peter Parker

Page 24: Cassandra Intro -- TheEdge2012

Row

Row id: spiderman

Columns: icon = (image), name = Peter Parker, residence = New York

Page 25: Cassandra Intro -- TheEdge2012

Row

Row Id: spiderman

Columns: icon = (image), name = Peter Parker, residence = New York

Page 26: Cassandra Intro -- TheEdge2012

Column Family

Row id        icon       name       residence
spider-man    (image)    Peter P    New York
batman        (image)    Bruce W    Gotham
hulk          (image)    Bruce B    New York

Page 27: Cassandra Intro -- TheEdge2012

Column Family

Row id        icon       name       residence
spider-man    (image)    Peter P    New York
batman        (image)    Bruce W    Gotham
hulk          (image)    Bruce B    New York

set user['spiderman']['name'] = 'Peter Parker'

Column Family: user

Row id: spiderman

Column name: name

Value: Peter Parker
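
The column family / row id / column name / value hierarchy above can be pictured as a map of sorted maps. The following is only a rough in-memory sketch of that idea, assuming string keys and values; the class name ColumnFamilySketch and its methods are hypothetical and say nothing about how Cassandra stores data internally.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative in-memory model only; not Cassandra's storage engine.
class ColumnFamilySketch {
    // row id -> (column name -> value); columns are kept sorted by name
    private final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    // Mirrors the CLI call: set user['spiderman']['name'] = 'Peter Parker'
    void set(String rowId, String columnName, String value) {
        rows.computeIfAbsent(rowId, k -> new TreeMap<>()).put(columnName, value);
    }

    // Mirrors the CLI call: get user['spiderman'] (all columns of one row)
    SortedMap<String, String> get(String rowId) {
        return rows.getOrDefault(rowId, new TreeMap<>());
    }

    public static void main(String[] args) {
        ColumnFamilySketch user = new ColumnFamilySketch();
        user.set("spiderman", "name", "Peter Parker");
        user.set("spiderman", "residence", "New York");
        user.set("batman", "residence", "Gotham");
        System.out.println(user.get("spiderman"));
    }
}
```

Each row carries its own set of columns, so rows in the same column family do not have to share a schema.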

Page 28: Cassandra Intro -- TheEdge2012

The Allies Column Family

batman: Robin, Alfred

spider-man: Iceman, Firestar, Iron Man, Storm

Page 29: Cassandra Intro -- TheEdge2012

Published Issues Column Family

batman: 1/5/1939 ... 2/3/2012, 9/3/2012 (~2600 columns)

spider-man: 1/8/1962 ... 1/3/2012, 8/3/2012 (~3800 columns)

Column names are publication dates; column values are shown as ###.

Page 30: Cassandra Intro -- TheEdge2012

Model Flexibility

Flexible Data Model

Image: photostock / FreeDigitalPhotos.net

Page 31: Cassandra Intro -- TheEdge2012

Keyspace

» Like RDBMS database

» A container for column families

» One keyspace per application, in most cases

[default@unknown] create keyspace demo;

Page 32: Cassandra Intro -- TheEdge2012

Expiring Columns – TTL

set users['spiderman']['passwd_reminder'] = 'abcd' with ttl = 7200;

Row id: spider-man

Columns: icon = (image), name = Peter P, residence = New York, passwd_reminder = abcd

7200s = 2 hours
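
As a rough illustration of what a TTL means for readers, the sketch below models a column that stops being visible once its time to live has elapsed. The class ExpiringColumn, its fields, and the use of plain seconds for timestamps are assumptions made for this example, not Cassandra internals.

```java
// Illustrative sketch only; names and the seconds-based clock are hypothetical.
class ExpiringColumn {
    final String name;
    final String value;
    final long writtenAtSeconds; // when the column was written
    final int ttlSeconds;        // e.g. 7200 for the two-hour password reminder

    ExpiringColumn(String name, String value, long writtenAtSeconds, int ttlSeconds) {
        this.name = name;
        this.value = value;
        this.writtenAtSeconds = writtenAtSeconds;
        this.ttlSeconds = ttlSeconds;
    }

    // The column is readable only while fewer than ttlSeconds have passed since the write.
    boolean isLive(long nowSeconds) {
        return nowSeconds - writtenAtSeconds < ttlSeconds;
    }

    public static void main(String[] args) {
        ExpiringColumn reminder = new ExpiringColumn("passwd_reminder", "abcd", 0L, 7200);
        System.out.println("after 1 hour:  " + reminder.isLive(3600));
        System.out.println("after 2 hours: " + reminder.isLive(7200));
    }
}
```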

Page 33: Cassandra Intro -- TheEdge2012

Distributed Counters

Row id: javaedge.com

Counter columns: sessions = 1035, speakers = 3402

incr page_views['javaedge.com']['speakers'] by 1

get page_views['javaedge.com']['speakers']

Page 34: Cassandra Intro -- TheEdge2012

The Road to Mastership

» Introduction to Cassandra

» Running a Server

» Data Model

» Communicating with the Server

» Modeling Data

» Growing a Cluster

Current section: Communication with the Server: Clients

Page 35: Cassandra Intro -- TheEdge2012

Cassandra Query Language

» Looks a lot like SQL

» Mostly valid SQL

SELECT name, universe
FROM users
WHERE KEY = 'hulk'

INSERT INTO users (KEY, name, universe) VALUES ('hulk', 'Bruce', 'marvel')

Page 36: Cassandra Intro -- TheEdge2012

Advantages of using CQL

» Run ad-hoc queries

» Very familiar, easier to use

» Stable interface

▪ For library developers

▪ For users

Page 37: Cassandra Intro -- TheEdge2012

CQL Example

SELECT name, residence FROM users

SELECT 01/1/2011 .. 1/1/2012
FROM published_issues
WHERE KEY = 'spiderman'

SELECT FIRST 5
FROM allies
WHERE KEY = 'spiderman'

Page 40: Cassandra Intro -- TheEdge2012

Cassandra JDBC Driver

import java.sql.*;

Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");

Connection con = DriverManager.getConnection("jdbc:cassandra://localhost:9160/keyspace");

Page 41: Cassandra Intro -- TheEdge2012

Cassandra JDBC Driver

Statement stmt = con.createStatement();

// Build the CQL text as one Java string; key is supplied by the caller.
ResultSet rs = stmt.executeQuery(
    "SELECT name, residence " +
    "FROM users " +
    "WHERE KEY = '" + key + "'");

Page 42: Cassandra Intro -- TheEdge2012

Cassandra JDBC Driver

JDBC

Page 43: Cassandra Intro -- TheEdge2012

Hector

SliceQuery<...> query =
    HFactory.createSliceQuery(keyspace, ...);

query.setRange(startDate, endDate, false, 100)
     .setColumnFamily("published_issues")
     .setKey("spiderman");

QueryResult<ColumnSlice<Date, String>> result = query.execute();

Page 44: Cassandra Intro -- TheEdge2012

Hector: Advanced Features

» Failover support

» Connection pooling

» Load balancing

» JMX counters

» Object mapper

Page 45: Cassandra Intro -- TheEdge2012

Maven plugin

mvn cassandra:start

mvn cassandra:cql-exec

mvn cassandra:stop

Run your tests

Page 46: Cassandra Intro -- TheEdge2012

The Road to Mastership

» Introduction to Cassandra

» Running a Server

» Data Model

» Communicating with the Server

» Modeling Data

» Growing a Cluster

Current section: Modeling Data

Page 47: Cassandra Intro -- TheEdge2012

Queries First

» Use the same Column Family for data that should be fetched together

▪ Reduces IO

» Consider filtering and ordering

Page 48: Cassandra Intro -- TheEdge2012

Denormalize

» Fewer seeks, faster reads

» Storing redundant data

▪ Manually handling data integrity

» Disk space is cheaper than seek time

Page 49: Cassandra Intro -- TheEdge2012

Secondary Index

» Requirement: Find all superheroes that live in New York

Row id: spiderman

Columns: icon = (image), name = Peter Parker, residence = New York

Page 50: Cassandra Intro -- TheEdge2012

Secondary Index

» Requirement: Find all superheroes that live in New York

» Good for indexes with low cardinality

Row id: spiderman

Columns: icon = (image), name = Peter Parker, residence = New York

create column family users
... and column_metadata=
[{column_name: residence, index_type: KEYS}];

SELECT name FROM users WHERE residence = 'New York'

Page 51: Cassandra Intro -- TheEdge2012

Manually Managed Index

» Requirement:

Find a superhero by name

Page 52: Cassandra Intro -- TheEdge2012

Manually Managed Index

» Requirement: Find a superhero by name

» Manually maintain an inverted index

Search term    Keys in users CF
Bruce          batman, hulk
Peter          spiderman
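
A manually managed index means the application writes the index entry itself every time it writes the data. The sketch below imitates that with two in-memory maps standing in for the users column family and the index column family. The class NameIndexSketch and the choice of the first word of the name as the search term are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative sketch only: two maps stand in for two column families.
class NameIndexSketch {
    // users CF: row key -> name
    private final Map<String, String> users = new HashMap<>();
    // index CF: search term -> row keys in the users CF
    private final Map<String, SortedSet<String>> index = new HashMap<>();

    // Every write to users is paired with a write to the inverted index.
    void addUser(String rowKey, String name) {
        users.put(rowKey, name);
        String term = name.split(" ")[0]; // assumption: index on the first word of the name
        index.computeIfAbsent(term, t -> new TreeSet<>()).add(rowKey);
    }

    // A lookup by name reads the index row, then the caller fetches the users rows it lists.
    SortedSet<String> findKeysByName(String term) {
        return index.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        NameIndexSketch sketch = new NameIndexSketch();
        sketch.addUser("batman", "Bruce W");
        sketch.addUser("hulk", "Bruce B");
        sketch.addUser("spiderman", "Peter P");
        System.out.println(sketch.findKeysByName("Bruce"));
    }
}
```

Because nothing enforces the pairing, updates and deletes must also touch both places, which is the data-integrity cost that denormalizing brings.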

Page 53: Cassandra Intro -- TheEdge2012

Bucketing

All issues, by month

Row id: hulk_jan_2012

Columns: 1/1/2012 = Issue-1, 2/1/2012 = Issue-2, 4/1/2012 = Issue-3

Row id: hulk_feb_2012

Columns: 2/2/2012 = Issue-4, 28/2/2012 = Issue-5, 29/2/2012 = Issue-6
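
Bucketing comes down to how the row key is built: the hero name plus a month bucket, so that each month's issues land in their own row. A minimal sketch of such a key builder follows; the class IssueBuckets and the helper rowKey are hypothetical, and the key format simply copies the hulk_jan_2012 pattern from the slide.

```java
import java.time.LocalDate;

// Illustrative sketch only; the key format follows the slide's hulk_jan_2012 example.
class IssueBuckets {
    private static final String[] MONTHS = {
        "jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec"
    };

    // Builds the row key of the monthly bucket that an issue belongs to.
    static String rowKey(String hero, LocalDate published) {
        return hero + "_" + MONTHS[published.getMonthValue() - 1] + "_" + published.getYear();
    }

    public static void main(String[] args) {
        System.out.println(rowKey("hulk", LocalDate.of(2012, 1, 4)));
        System.out.println(rowKey("hulk", LocalDate.of(2012, 2, 29)));
    }
}
```

A reader who wants one month of issues fetches a single, bounded row instead of slicing one ever-growing row.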

Page 54: Cassandra Intro -- TheEdge2012

The Road to Mastership

» Introduction to Cassandra

» Running a Server

» Data Model

» Communicating with the Server

» Modeling Data

» Growing a Cluster

Current section: Cassandra Cluster

Page 55: Cassandra Intro -- TheEdge2012

Virtual Ring

Ring nodes (tokens): 10, 40, 60, 75, 90

Page 56: Cassandra Intro -- TheEdge2012

Node Token

Ring nodes (tokens): 10, 40, 60, 75, 90

Keys     Node
91-10    10
11-40    40
41-60    60
61-75    75
76-90    90

Page 57: Cassandra Intro -- TheEdge2012

Node Token

Ring nodes (tokens): 10, 40, 60, 75, 90

MD5'(hulk) = 20

Token 20 falls in the key range 11-40, so hulk is stored on node 40.

Page 59: Cassandra Intro -- TheEdge2012

Node Token

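Ring nodes (tokens): 10, 40, 60, 75, 90

hulk is stored on node 40.

MD5'(thor) = 42

Token 42 falls in the key range 41-60, so thor is stored on node 60.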
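
The lookup walked through on these slides can be sketched as: hash the key to a token, then pick the first node whose token is greater than or equal to it, wrapping around to the lowest token at the end of the ring. The sketch below uses the toy tokens from the slides; the class TokenRing and method ownerOf are hypothetical names, and real tokens are far larger numbers.

```java
import java.util.TreeSet;

// Illustrative sketch only, using the slides' toy tokens.
class TokenRing {
    private final TreeSet<Integer> nodeTokens = new TreeSet<>();

    TokenRing(int... tokens) {
        for (int token : tokens) {
            nodeTokens.add(token);
        }
    }

    // Returns the token of the node that owns keyToken: the first node token
    // at or after it, wrapping to the lowest token past the end of the ring.
    int ownerOf(int keyToken) {
        Integer owner = nodeTokens.ceiling(keyToken);
        return owner != null ? owner : nodeTokens.first();
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing(10, 40, 60, 75, 90);
        System.out.println("hulk (20) -> node " + ring.ownerOf(20));
        System.out.println("thor (42) -> node " + ring.ownerOf(42));
        System.out.println("token 95  -> node " + ring.ownerOf(95));
    }
}
```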

Page 61: Cassandra Intro -- TheEdge2012

Inter-Node Communication

» Gossip

» Failure Detection

Ring nodes (tokens): 10, 40, 60, 75, 90

Page 62: Cassandra Intro -- TheEdge2012

Fault Tolerance

» Replication factor

» Hinted Handoff

Ring nodes (tokens): 10, 40, 60, 75, 90

Keys on the ring: hulk, thor

Page 63: Cassandra Intro -- TheEdge2012

Replication Factor

» Replication factor

» Hinted Handoff

Ring nodes (tokens): 10, 40, 60, 75, 90

Replication factor = 3: hulk and thor are each stored on three nodes.
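
One way to picture a replication factor of 3 is the simple placement in which the owning node keeps the first copy and the next nodes clockwise on the ring keep the others. The sketch below assumes that placement and the slides' toy tokens; the class ReplicaPlacement and method replicasFor are hypothetical names, and other placement strategies exist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: assumes replicas go to the next nodes clockwise on the ring.
class ReplicaPlacement {
    // Returns the tokens of the nodes that hold a copy of the key, starting with its owner.
    static List<Integer> replicasFor(TreeSet<Integer> ring, int keyToken, int replicationFactor) {
        List<Integer> replicas = new ArrayList<>();
        int copies = Math.min(replicationFactor, ring.size());
        Integer current = ring.ceiling(keyToken);
        while (replicas.size() < copies) {
            if (current == null) {
                current = ring.first(); // wrap around the ring
            }
            replicas.add(current);
            current = ring.higher(current);
        }
        return replicas;
    }

    public static void main(String[] args) {
        TreeSet<Integer> ring = new TreeSet<>(List.of(10, 40, 60, 75, 90));
        System.out.println("hulk (20): " + replicasFor(ring, 20, 3));
        System.out.println("thor (42): " + replicasFor(ring, 42, 3));
    }
}
```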

Page 64: Cassandra Intro -- TheEdge2012

Fault Tolerance

» Replication factor

» Hinted Handoff

Ring nodes (tokens): 10, 40, 60, 75, 90

Page 65: Cassandra Intro -- TheEdge2012

Hinted Handoff

» Replication factor

» Hinted Handoff

Ring nodes (tokens): 10, 40, 60, 75, 90

Page 67: Cassandra Intro -- TheEdge2012

Client Requests

Write Request

Ring nodes (tokens): 10, 40, 60, 75, 90

Coordinator

Page 68: Cassandra Intro -- TheEdge2012

Consistency Level

Ring nodes (tokens): 10, 40, 60, 75, 90

Write Request

Consistency level = ONE

Page 69: Cassandra Intro -- TheEdge2012

Consistency Level

Ring nodes (tokens): 10, 40, 60, 75, 90

Write Request

Consistency level = ALL
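
The consistency level decides how many replicas must acknowledge a request before the coordinator reports success to the client. A small sketch of that arithmetic follows. The slides show only ONE and ALL; QUORUM (a majority of replicas) is added here as a commonly used level, and the enum ConsistencyLevelSketch with its method requiredAcks is a hypothetical name for illustration.

```java
// Illustrative sketch only; QUORUM is not on the slides and is added for comparison.
enum ConsistencyLevelSketch {
    ONE, QUORUM, ALL;

    // Number of replica acknowledgements the coordinator waits for.
    int requiredAcks(int replicationFactor) {
        switch (this) {
            case ONE:
                return 1;
            case QUORUM:
                return replicationFactor / 2 + 1; // majority of replicas
            default:
                return replicationFactor;         // ALL: every replica
        }
    }

    public static void main(String[] args) {
        int replicationFactor = 3;
        for (ConsistencyLevelSketch level : values()) {
            System.out.println(level + " -> " + level.requiredAcks(replicationFactor));
        }
    }
}
```

With a replication factor of 3, ONE keeps writes available while two replicas are down, whereas ALL fails as soon as any replica is unreachable.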

Page 70: Cassandra Intro -- TheEdge2012

The Road to Mastership

» Introduction to Cassandra

» Running a Server

» Data Model

» Communicating with the Server

» Modeling Data

» Growing a Cluster

Current section: Summary

Page 71: Cassandra Intro -- TheEdge2012
Page 72: Cassandra Intro -- TheEdge2012

Where Do You Sign?

» Cassandra

▪ http://cassandra.apache.org

▪ http://www.datastax.com/

• Docs, tutorials & videos

▪ IRC: #cassandra on freenode

» Hector

▪ https://github.com/rantav/hector

▪ https://github.com/zznate/hector-examples