Cassandra Community Webinar: Back to Basics with CQL3

Back to Basics with CQL3, Matt Overstreet, OpenSource Connections

Upload: datastax

Post on 26-Jan-2015

Category: Technology


DESCRIPTION

Cassandra is a distributed, massively scalable, fault tolerant, columnar data store, and if you need the ability to make fast writes, the only thing faster than Cassandra is /dev/null! In this fast-paced presentation, we'll briefly describe big data, and the area of big data that Cassandra is designed to fill. We will cover Cassandra's unique, every-node-the-same architecture. We will reveal Cassandra's internal data structure and explain just why Cassandra is so darned fast. Finally, we'll wrap up with a discussion of data modeling using the new standard protocol: CQL (Cassandra Query Language).

TRANSCRIPT

Page 1: Cassandra Community Webinar: Back to Basics with CQL3

Back to Basics with CQL3

Matt Overstreet, OpenSource Connections

Page 2: Cassandra Community Webinar: Back to Basics with CQL3

Outline

• Overview
• Architecture
• Data Modeling
• Good At/Bad At
• Using Cassandra

Page 3: Cassandra Community Webinar: Back to Basics with CQL3

Outline

• Overview
o What is Big Data?
o How does Cassandra fit?
• Architecture
• Data Modeling
• Good At/Bad At
• Using Cassandra

Page 4: Cassandra Community Webinar: Back to Basics with CQL3

What is Big Data?

• The three V’s (and a C)
o Velocity
o Volume
o Variety
o Complexity

Page 5: Cassandra Community Webinar: Back to Basics with CQL3

What is Big Data?

• Brewer’s CAP theorem
o Consistency - all nodes have the same world view
o Availability - requests can be serviced
o Partition tolerance - network/machine failure
o Can’t have all 3 -- Pick 2!
• Examples
o MySQL – Consistent, Available
o HBase – Consistent, Partition Tolerant
o Cassandra – Available, Partition Tolerant – and “Tunably Consistent”!
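One way to read "Tunably Consistent" is the usual replica-overlap rule for Dynamo-style stores: if the number of replicas that must acknowledge a write plus the number consulted on a read exceeds the replication factor, every read overlaps the latest write. The sketch below is illustrative only; the function names are hypothetical and not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority of replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when the read set and the write set must share at least one replica."""
    return read_replicas + write_replicas > replication_factor


rf = 3
# Quorum writes plus quorum reads: the sets overlap, favoring consistency.
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))
# One replica for each: fastest and most available, but a read may be stale.
print(reads_see_latest_write(rf, 1, 1))
```

Choosing the counts per request is what lets an application trade consistency against availability and latency.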

Page 6: Cassandra Community Webinar: Back to Basics with CQL3

What is Big Data?

• Common theme: Denormalize everything!
o What’s that?
   • JOIN all the tables in the database...
   • … well, not all the tables
o Why?
   • You can shard the database at any point
   • All related data is co-located
• What this means for you
o No joins
o No transactions - potential for inconsistency
o Vastly simplified querying
o No data-modeling -- instead, query-modeling
o “Infinite and easy” scaling potential

Page 7: Cassandra Community Webinar: Back to Basics with CQL3

How Does Cassandra Fit?

• No single point of failure
• Optimized for writes, still good with reads
• Can decide between Consistency and Availability concerns

Page 8: Cassandra Community Webinar: Back to Basics with CQL3

Outline

• Overview
• Architecture
o Ring architecture
o Data partitioning
o Operations
   • Writes
   • Reads
• Data Modeling
• Good At/Bad At
• Using Cassandra

Page 9: Cassandra Community Webinar: Back to Basics with CQL3

Ring Architecture

• No single point of failure
• Nodes talk via gossip
• Democratic - all nodes are equal

Page 10: Cassandra Community Webinar: Back to Basics with CQL3

Data Partitioning

Original partitioning method.

Page 11: Cassandra Community Webinar: Back to Basics with CQL3

Data Partitioning

Flexible partitioning with virtual nodes.
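A rough sketch of partitioning with virtual nodes: each physical node claims several tokens on the ring, and a row key belongs to the first token at or after its hash, with further replicas found by walking clockwise. The `Ring` class, the node names, and the use of MD5 as the hash are assumptions made for illustration; this is not Cassandra's implementation.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is only a stand-in hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes_per_node=8):
        # Each physical node owns several tokens ("virtual nodes") on the ring.
        self._tokens = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [t for t, _ in self._tokens]
        self._node_count = len(set(nodes))

    def replicas(self, row_key, replication_factor=3):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        start = bisect.bisect_left(self._positions, token(row_key))
        wanted = min(replication_factor, self._node_count)
        owners = []
        for offset in range(len(self._tokens)):
            _, node = self._tokens[(start + offset) % len(self._tokens)]
            if node not in owners:
                owners.append(node)
            if len(owners) == wanted:
                break
        return owners


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("Nate"))
```

Because every node holds many small token ranges, adding or removing a node moves a little data from many peers rather than a large range from one neighbor.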

Page 12: Cassandra Community Webinar: Back to Basics with CQL3

Operations: Writes

Requests are sent out to nodes and replicas.

Page 13: Cassandra Community Webinar: Back to Basics with CQL3

Operations: Reads

The coordinator node reaches out to the relevant replicas.

Page 14: Cassandra Community Webinar: Back to Basics with CQL3

Outline

• Overview
• Architecture
• Data Modeling
o Internals
o Cassandra Query Language
o Modeling Strategy
o Example
• Good At/Bad At
• Using Cassandra

Page 15: Cassandra Community Webinar: Back to Basics with CQL3

C* Data Model

Keyspace

Page 16: Cassandra Community Webinar: Back to Basics with CQL3

C* Data Model

Keyspace

Column Family Column Family

Page 19: Cassandra Community Webinar: Back to Basics with CQL3

C* Data Model

Row Key

Page 20: Cassandra Community Webinar: Back to Basics with CQL3

C* Data Model

[Diagram: a Row Key pointing to a Column; each Column holds a Column Name, a Column Value (or Tombstone), a Timestamp, and a Time-to-live]

Page 21: Cassandra Community Webinar: Back to Basics with CQL3

C* Data Model

● Row Key, Column Name, Column Value have types
● Column Name has comparator
● Row Key has partitioner
● Rows can have any number of columns - even in same column family
● Rows can have many columns
● Column Values can be omitted
● Time-to-live is useful!
● Tombstones
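The column structure on this slide can be sketched as a small record. The `Column` class and `reconcile` helper below are hypothetical names, and the rule shown (newest timestamp wins, with a missing value standing in for a tombstone) is a simplification that ignores how real Cassandra breaks timestamp ties.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    value: Optional[bytes]            # None marks a tombstone (a deleted column)
    timestamp: int                    # write time in microseconds
    ttl_seconds: Optional[int] = None

    def is_tombstone(self) -> bool:
        return self.value is None

    def is_expired(self, now_micros: int) -> bool:
        """A column with a time-to-live disappears once the TTL has elapsed."""
        if self.ttl_seconds is None:
            return False
        return now_micros >= self.timestamp + self.ttl_seconds * 1_000_000


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: keep whichever version carries the newer timestamp."""
    return a if a.timestamp >= b.timestamp else b


old = Column("email", b"nate@example.com", timestamp=1_000_000)
deleted = Column("email", None, timestamp=2_000_000)
print(reconcile(old, deleted).is_tombstone())
```

Storing a delete as a timestamped tombstone is what lets replicas agree that a column is gone without reading it first.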

Page 22: Cassandra Community Webinar: Back to Basics with CQL3

C* Data Model: Writes

● Append to CommitLog
● Insert into MemTable
● No read
● Very fast!
● Blocks on CPU before I/O!

[Diagram: MemTable, CommitLog, Row Cache, Key Cache, Bloom Filter, SSTables]
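A toy model of the write path, assuming a single node and ignoring replication: each write is appended to a commit log and upserted into an in-memory table, and a full memtable is flushed to an immutable, sorted SSTable. The `Node` class and its `memtable_limit` parameter are invented for this sketch.

```python
class Node:
    def __init__(self, memtable_limit=3):
        self.commit_log = []      # append-only record of every write
        self.memtable = {}        # in memory: row key -> {column name: value}
        self.sstables = []        # immutable, sorted snapshots of flushed memtables
        self.memtable_limit = memtable_limit

    def write(self, row_key, column, value):
        # No read is needed: the write is appended and upserted blindly.
        self.commit_log.append((row_key, column, value))
        self.memtable.setdefault(row_key, {})[column] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out sorted by row key, then start afresh.
        rows = sorted(self.memtable.items(), key=lambda item: item[0])
        self.sstables.append(tuple((key, dict(cols)) for key, cols in rows))
        self.memtable = {}
        self.commit_log.clear()   # flushed data no longer needs replaying


node = Node(memtable_limit=2)
node.write("Nate", "city", "Charlottesville")
node.write("Patricia", "city", "Richmond")
print(len(node.sstables), node.memtable)
```

Nothing on this path seeks on disk: the commit log is sequential and the flush writes a whole file at once, which is why writes tend to run out of CPU before they run out of I/O.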

Page 25: Cassandra Community Webinar: Back to Basics with CQL3

C* Data Model: Reads

● Get values from MemTable
● Get values from Row Cache if present
● Otherwise check Bloom Filter to find appropriate SSTables
● Check Key Cache for fast SSTable search
● Get values from SSTables
● Repopulate Row Cache
● Super-fast column retrieval
● Fast row slicing

[Diagram: MemTable, CommitLog, Row Cache, Key Cache, Bloom Filter, SSTables]
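The read steps can be sketched in the same toy style. The class and function names here are hypothetical, the key cache is left out for brevity, and the Bloom filter is a deliberately tiny one built on SHA-256; treat it as an illustration of the idea rather than Cassandra's code.

```python
import hashlib


class BloomFilter:
    """Answers "definitely absent" or "possibly present" for a key."""

    def __init__(self, size=64, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    def __init__(self, rows):
        self.rows = dict(rows)                 # row key -> {column: value}
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(row_key, memtable, row_cache, sstables):
    """Merge on-disk columns (cached when possible) with newer memtable columns."""
    if row_key in row_cache:
        on_disk = row_cache[row_key]
    else:
        on_disk = {}
        for sstable in sstables:               # oldest first, newer values overwrite
            if sstable.bloom.might_contain(row_key):
                on_disk.update(sstable.rows.get(row_key, {}))
        row_cache[row_key] = on_disk           # repopulate the row cache
    return {**on_disk, **memtable.get(row_key, {})}


older = SSTable({"Nate": {"city": "Richmond", "lang": "SQL"}})
newer = SSTable({"Nate": {"lang": "CQL"}})
print(read("Nate", {"Nate": {"mood": "happy"}}, {}, [older, newer]))
```

The Bloom filter is what keeps reads from touching every SSTable: a "definitely absent" answer skips the file entirely.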

Page 31: Cassandra Community Webinar: Back to Basics with CQL3

Internals: Twitter Example

• 4 ColumnFamilies
o followers
o following
o tweets
o timeline

Page 32: Cassandra Community Webinar: Back to Basics with CQL3

Internals: Twitter Example

• 4 ColumnFamilies
o followers
o following
o tweets
o timeline
• Nate follows Patricia
o SET followers[Patricia][Nate] = ‘’;
o SET following[Nate][Patricia] = ‘’;
o storing data in column names (not values)
o denormalized, redundant!
• Get all Patricia’s followers
o GET followers[Patricia]
o => Nate,Eric,Scott,Matt,Doug,Kate
o No JOIN!

Page 33: Cassandra Community Webinar: Back to Basics with CQL3

Internals: Twitter Example

• Nate tweets
o SET tweets[Nate][2013-07-19 T 09:20] = “Wonderful morning. This coffee is great.”
o SET tweets[Nate][2013-07-19 T 09:21] = “Oops, smoke is coming out of the SQL server!”
o SET tweets[Nate][2013-07-19 T 09:51] = “Now my coffee is cold :-(”
• Get Nate’s tweets
o GET tweets[Nate]
o …(what you’d expect)...
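The SET/GET notation in the Twitter example maps naturally onto nested dictionaries: column family, then row key, then column name. The sketch below uses hypothetical helper names (`follow`, `get_row`) and sorts column names to mimic a comparator; it models the idea, not a Cassandra client.

```python
from collections import defaultdict

# column family -> row key -> {column name: value}
followers = defaultdict(dict)
following = defaultdict(dict)
tweets = defaultdict(dict)


def follow(follower, followee):
    # The relationship is written twice (denormalized). The column *name*
    # carries the data and the value stays empty.
    followers[followee][follower] = ""
    following[follower][followee] = ""


def get_row(column_family, row_key):
    """Return the column names of one row in comparator (sorted) order."""
    return sorted(column_family.get(row_key, {}))


follow("Nate", "Patricia")
follow("Eric", "Patricia")
tweets["Nate"]["2013-07-19T09:21"] = "Oops, smoke is coming out of the SQL server!"
tweets["Nate"]["2013-07-19T09:20"] = "Wonderful morning. This coffee is great."

print(get_row(followers, "Patricia"))   # ['Eric', 'Nate']
print(get_row(tweets, "Nate"))          # timestamps come back in order
```

Each question ("who follows Patricia?", "what did Nate tweet?") is answered by fetching a single row, which is why no JOIN is needed.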

Page 34: Cassandra Community Webinar: Back to Basics with CQL3

CQL (Cassandra Query Language)

CREATE TABLE users (
    id timeuuid PRIMARY KEY,
    lastname varchar,
    firstname varchar,
    dateOfBirth timestamp
);

Page 37: Cassandra Community Webinar: Back to Basics with CQL3

CQL (Cassandra Query Language)

CREATE TABLE users (
    id timeuuid PRIMARY KEY,
    lastname varchar,
    firstname varchar,
    dateOfBirth timestamp
);

INSERT INTO users (id, lastname, firstname, dateofbirth)
VALUES (now(), 'Berryman', 'John', '1975-09-15');

UPDATE users SET firstname = 'John'
WHERE id = f74c0b20-0862-11e3-8cf6-b74c10b01fc6;

SELECT dateofbirth, firstname, lastname FROM users;

 dateofbirth              | firstname | lastname
--------------------------+-----------+----------
 1975-09-15 00:00:00-0400 | John      | Berryman

Page 38: Cassandra Community Webinar: Back to Basics with CQL3

The CQL/Cassandra Mapping

CREATE TABLE employees (
    company text,
    name text,
    age int,
    role text,
    PRIMARY KEY (company, name)
);

Page 40: Cassandra Community Webinar: Back to Basics with CQL3

The CQL/Cassandra Mapping

CREATE TABLE employees (
    company text,
    name text,
    age int,
    role text,
    PRIMARY KEY (company, name)
);

company | name | age | role
--------+------+-----+-----
OSC     | eric | 38  | ceo
OSC     | john | 37  | dev
RKG     | anya | 29  | lead
RKG     | ben  | 27  | dev
RKG     | chad | 35  | ops

     eric:age  eric:role  john:age  john:role
OSC  38        ceo        37        dev

     anya:age  anya:role  ben:age  ben:role  chad:age  chad:role
RKG  29        lead       27       dev       35        ops
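The mapping on this slide can be reproduced in a few lines: the first primary-key column (company) becomes the storage row key, and the second (name) is prefixed onto every remaining column name. The function `to_storage_rows` is a hypothetical helper written for this illustration, and the `name:column` string is a simplification of how composite column names are really encoded.

```python
def to_storage_rows(cql_rows, partition_key, clustering_key):
    """Group CQL rows by partition key; prefix each cell with the clustering value."""
    storage = {}
    for row in cql_rows:
        wide_row = storage.setdefault(row[partition_key], {})
        for column, value in row.items():
            if column in (partition_key, clustering_key):
                continue
            wide_row[f"{row[clustering_key]}:{column}"] = value
    # Cells within a storage row are kept sorted by column name.
    return {key: dict(sorted(cells.items())) for key, cells in storage.items()}


employees = [
    {"company": "OSC", "name": "eric", "age": 38, "role": "ceo"},
    {"company": "OSC", "name": "john", "age": 37, "role": "dev"},
    {"company": "RKG", "name": "anya", "age": 29, "role": "lead"},
]
for company, cells in to_storage_rows(employees, "company", "name").items():
    print(company, cells)
```

Seen this way, all employees of one company live in a single wide row, so fetching a company's staff is one row read and a range of names is one row slice.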

Page 41: Cassandra Community Webinar: Back to Basics with CQL3

Modeling Strategy

• Don’t think about the data structure
• Do think of the questions you’ll ask
• Consider efficient operations for Cassandra
o Writing (4K writes per second per core)
o Retrieving a row
o Retrieving a row slice
o Retrieving in natural order (which you control)
• Write the data in the way you will query it
• Disk space is cheap
• Separate read-heavy and write-heavy tasks
o Make wise use of caches

Page 42: Cassandra Community Webinar: Back to Basics with CQL3

Modeling Strategy: Anti-Patterns

• Read-then-write
• Heavy deletes
o Scatters dead columns throughout SSTables
o Won’t be corrected until first compaction after gc_grace_seconds (10 days)
• Distributed queue
• JOIN-like behavior
• Super wide-row sneak attack (>2B columns)
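To make the heavy-delete point concrete: a deleted column lingers as a tombstone until a compaction runs after the grace period has passed. The helper below is a hypothetical sketch of that timing rule using the 10-day figure from the slide; it is not Cassandra's compaction logic.

```python
GC_GRACE_SECONDS = 10 * 24 * 60 * 60   # the 10 days quoted on the slide


def can_purge(deleted_at: int, compaction_at: int,
              gc_grace: int = GC_GRACE_SECONDS) -> bool:
    """A tombstone may be dropped only by a compaction run after the grace period."""
    return compaction_at - deleted_at > gc_grace


day = 24 * 60 * 60
print(can_purge(deleted_at=0, compaction_at=3 * day))    # still inside the grace period
print(can_purge(deleted_at=0, compaction_at=11 * day))   # eligible for removal
```

Until that point every read of the row still has to step over the dead columns, which is why delete-heavy workloads and queue-like tables degrade.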

Page 43: Cassandra Community Webinar: Back to Basics with CQL3

QUESTIONS?
