Download - Deep dive into CQL
©2014 DataStax Confidential. Do not distribute without consent.
@rstml
Rustam Aliyev Solution Architect
Deep dive into CQL and CQL improvements in Cassandra 2.1
1
What is CQL? * Cassandra Query Language (CQL)
* SQL-like language for communicating with Cassandra
* Simpler than the Thrift API
* An abstraction layer that hides implementation details
This is what we want to understand
Use Case * Messaging Application
* Group Conversations
* Attachments
Simple CQL Table
CREATE TABLE messages ( conversation_id uuid, message_id timeuuid, content text, sender text, PRIMARY KEY (conversation_id, message_id) );
TimeUUID * Also known as a Version 1 UUID
* Sortable
Timestamp to Microsecond + UUID = TimeUUID
04d580b0-9412-11e3-baa8-0800200c9a66 12 February 2014 13:18:06 GMT
http://www.famkruithof.net/uuid/uuidgen"
=
Primary Key
CREATE TABLE messages ( conversation_id uuid, message_id timeuuid, content text, sender uuid, PRIMARY KEY (conversation_id, message_id) );
Partition Key Clustering Column
* Also Primary Index
Partition Key conversation_id: 04d580b0-9412-…9a66
Replica * Determines partition (and replicas)
* Remaining columns are stored on the determined partition
RF=3
Clustering Column
Merged, Sorted and Stored Sequentially
04d580b0-9412-…9a66
2013-04-03 07:01:00 content: Hi! sender: [email protected]
2013-04-03 07:03:20 content: Hello! Sender: tom@example…
2013-04-03 07:04:52 content: Where are you? sender: [email protected]
2013-04-03 07:05:01 content: in Istanbul sender: tom@example…
2013-04-03 07:06:32 content: wow! how come sender: [email protected]
* Data on disk is ordered based on Clustering Column
* Efficient retrieval with range queries (slice)
SELECT * FROM messages WHERE conversation_id = '04d580b0-9412-…9a66' AND message_id > minTimeuuid('2013-04-03 07:04:00') AND message_id < maxTimeuuid('2013-04-03 07:10:00');
Data on Disk
Partition Key (Row Key)
Column Name 1 Column Value 1
Column Name 2 Column Value 2
Column Name 3 Column Value 3
...
Column Name N Column Value N
Data on Disk
04d580b0-9412-3a00-93d1-46196ee79a66
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]
2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:
2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:content Hello!
2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]
...
Clustering Column (message_id) Column Name Column Value
Partition Key (conversation_id)
INSERT INTO messages (conversation_id, message_id, content, sender) VALUES (04d580b0-‐9412-‐3a00-‐93d1-‐46196ee79a66, 2f3feb0f-‐9c24-‐11e2-‐7f7f-‐7f7f7f7f7f7f, 'Hello!', '[email protected]');
Order of Clustering Keys
CREATE TABLE messages ( conversation_id uuid, message_id timeuuid, content text, sender text, PRIMARY KEY (conversation_id, message_id) ) WITH CLUSTERING ORDER BY (message_id DESC);
* We need only most recent N messages
* Storing messages in reverse TimeUUID order will speedup queries
Static Columns
CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, PRIMARY KEY (conversation_id, message_id) );
* Let’s add conversation owner (admin)
* Owner is related to conversation (Partition Key) not message (Clustering Key)
Static Columns
UPDATE messages SET conversation_owner = '[email protected]' WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66;
* Same UPDATE with non-static field will fail
Static Columns on Disk
04d580b0-9412-3a00-93d1-46196ee79a66
:null:conversation_owner [email protected]
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]
2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:
2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:content Hello!
2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]
...
Static Column
Collections: Set
CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, recipients set<text>, PRIMARY KEY (conversation_id, message_id) );
* We want to keep message recipients
* List of recipients may vary as people join and leave conversation
Collections: Set UPDATE messages SET recipients = {'[email protected]', '[email protected]'} WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66 AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;
Set on Disk
04d580b0-9412-3a00-93d1-46196ee79a66
:null:conversation_owner [email protected]
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipient:[email protected]
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipient:[email protected]
2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:
...
Set
Collections: Map
CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, recipients set<text>, attachments map<text,text>, PRIMARY KEY (conversation_id, message_id) );
* Let’s add attachments to message
* Each attachment would have name and location (URI)
Collections: Map
UPDATE messages SET attachments = {'picture.png':'http://cdn.exmpl.com/1234.png', 'audio.wav':'http://cdn.exmpl.com/5678.wav'} WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66 AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;
Map on Disk 04d580b0-9412-3a00-93d1-46196ee79a66
:null:conversation_owner [email protected]
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipient:[email protected]
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipient:[email protected]
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:attachments:picture.png http://cdn.exmpl.com/1234.png
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:attachments:audio.wav http://cdn.exmpl.com/5678.wav
2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:
...
Map Name Key Value
Collections: List
CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, recipients set<text>, attachments map<text,text>, seen_by list<text>, PRIMARY KEY (conversation_id, message_id) );
* We want to know which participants have seen message and preserve order
Collections: List UPDATE messages SET seen_by = ['[email protected]', '[email protected]'] WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66 AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;
List on Disk 04d580b0-9412-3a00-93d1-46196ee79a66
:null:conversation_owner [email protected]
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipient:[email protected]
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipient:[email protected]
dbcd9d0f-...-7f7f-7f7f7f7f7f7f:seen_by:26017c10-f487-11e2-801f-df9895e5d0f8 [email protected]
dbcd9d0f-...-7f7f-7f7f7f7f7f7f:seen_by:26017c11-f487-11e2-801f-df9895e5d0f8 [email protected]
2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:
...
List Name Element ID (TimeUUID) Value
User Defined Types (UDT)
CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, recipients set<text>, seen_by list<text>, attachments map<text,attachment>, PRIMARY KEY (conversation_id, message_id) );
* New in Cassandra 2.1
* Let’s add more attributes to attachments
CREATE TYPE attachment ( size int, mime text, uri text );
User Defined Types UPDATE messages SET attachments = attachments + { 'picture.png': { size: 10240, mime: 'image/png', uri: 'http://cdn.exmpl.com/1234.png' }} WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66 AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;
UDT on Disk 04d580b0-9412-3a00-93d1-46196ee79a66
:null:conversation_owner [email protected]
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!
dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]
dbcd9d0f-...-7f7f7f7f7f7f:recipient:[email protected]
dbcd9d0f-...-7f7f7f7f7f7f:recipient:[email protected]
dbcd9d0f-...-7f7f7f7f7f7f:attachments:picture.png 10240:'image/png':'http://cdn.exmpl.com/1234.png'
2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:
...
Map Key UDT Value
Secondary Indexes
CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, recipients set<text>, seen_by list<text>, attachments map<text,text>, PRIMARY KEY (conversation_id, message_id) );
* What if we want to lookup messages by sender?
CREATE INDEX sender_idx ON messages(sender); "
Secondary Indexes
Secondary Indexes Internally
sender_idx { "[email protected]" { 54bbfd0f-9c02-11e2-7f7f-7f7f7f7f7f7f : null, df04610f-9c02-11e2-7f7f-7f7f7f7f7f7f : null }, "[email protected]" { a82e4b0f-9c02-11e2-7f7f-7f7f7f7f7f7f : null } }
* Each node will keep reverse index for local data only
Indexes on Collections
CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, recipients set<text>, seen_by list<text>, attachments map<text,text>, PRIMARY KEY (conversation_id, message_id) );
* New in Cassandra 2.1
CREATE INDEX recipients_idx ON messages(recipients); "
Indexes on Collections
Way more information
• 5 minute interviews • Use cases • Free training!
www.planetcassandra.org
Questions?