cassandra community webinar | data model on fire

Patrick McFadin | Chief Evangelist DataStax @PatrickMcFadin

Data Model on Fire

©2013 DataStax Confidential. Do not distribute without consent.


Patrick McFadin Chief Evangelist/Solution Architect - DataStax


Data Model is King
•With 2.0 we now have more choices
•Sometimes the data model is only the first part
•Understanding the underlying engine helps
•You aren’t done until you tune

Load test baby!

Lightweight Transactions

The race is on: Process 1 vs. Process 2

T0, Process 1:
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

T1, Process 2:
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

Got nothing! Good to go!

T2, Process 1:
INSERT INTO users (username, firstname, lastname, email, password, created_date)
VALUES ('pmcfadin', 'Patrick', 'McFadin',
        ['[email protected]'],
        'ba27e03fd95e507daf2937c937d499ab',
        '2011-06-20 13:50:00');

T3, Process 2:
INSERT INTO users (username, firstname, lastname, email, password, created_date)
VALUES ('pmcfadin', 'Paul', 'McFadin',
        ['[email protected]'],
        'ea24e13ad95a209ded8912e937d499de',
        '2011-06-20 13:51:00');

This one wins: the last write silently overwrites Patrick's row.
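To make the failure mode concrete, here is a minimal Python sketch (not Cassandra code) that replays the interleaving above against a toy last-write-wins store. The `LwwStore` class and its integer timestamps are hypothetical stand-ins for Cassandra's per-write timestamps.

```python
class LwwStore:
    """Toy key-value store: each key keeps the write with the highest timestamp."""

    def __init__(self):
        self._rows = {}

    def select(self, key):
        row = self._rows.get(key)
        return None if row is None else row[1]

    def insert(self, key, value, ts):
        # Like a plain CQL INSERT, this is an upsert: there is no existence check.
        # The newer timestamp wins; on a tie this toy keeps the existing value.
        current = self._rows.get(key)
        if current is None or ts > current[0]:
            self._rows[key] = (ts, value)


store = LwwStore()

# T0 and T1: both processes check for the username and find nothing.
p1_saw = store.select("pmcfadin")
p2_saw = store.select("pmcfadin")

# T2 and T3: both conclude the name is free and insert their own row.
if p1_saw is None:
    store.insert("pmcfadin", {"firstname": "Patrick"}, ts=2)
if p2_saw is None:
    store.insert("pmcfadin", {"firstname": "Paul"}, ts=3)

print(store.select("pmcfadin"))  # {'firstname': 'Paul'}
```

Both checks pass and both writes succeed, because nothing ties the read to the write that follows it. Patrick's row is replaced without either process being told.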

Solution: LWT
Process 1

T0:
INSERT INTO users (username, firstname, lastname, email, password, created_date)
VALUES ('pmcfadin', 'Patrick', 'McFadin',
        ['[email protected]'],
        'ba27e03fd95e507daf2937c937d499ab',
        '2011-06-20 13:50:00')
IF NOT EXISTS;

T1:
 [applied]
-----------
      True

•Existence check performed for the record
•Paxos ensures exclusive access
•applied = true: Success


Solution: LWT
Process 2

T2:
INSERT INTO users (username, firstname, lastname, email, password, created_date)
VALUES ('pmcfadin', 'Paul', 'McFadin',
        ['[email protected]'],
        'ea24e13ad95a209ded8912e937d499de',
        '2011-06-20 13:51:00')
IF NOT EXISTS;

T3:
 [applied] | username | created_date             | firstname | lastname
-----------+----------+--------------------------+-----------+----------
     False | pmcfadin | 2011-06-20 13:50:00-0700 |   Patrick |  McFadin

•applied = false: Rejected
•No record stomping!
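The same race, with the existence check and the write made atomic, can be sketched in Python as below. The assumptions are loud here: the `CasStore` class is hypothetical, and a single-process `threading.Lock` stands in for Paxos, which is what Cassandra actually uses to coordinate the check across replicas. Only the shape of the result mirrors `IF NOT EXISTS`: an applied flag, plus the existing row when rejected.

```python
import threading


class CasStore:
    """Toy store whose insert_if_not_exists checks and writes under one lock."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, row):
        with self._lock:
            existing = self._rows.get(key)
            if existing is not None:
                return False, existing  # like [applied] = False, plus the current row
            self._rows[key] = row
            return True, None  # like [applied] = True


store = CasStore()
results = []  # (firstname, (applied, existing_row)) for each attempt


def register(firstname):
    outcome = store.insert_if_not_exists("pmcfadin", {"firstname": firstname})
    results.append((firstname, outcome))


threads = [threading.Thread(target=register, args=(name,)) for name in ("Patrick", "Paul")]
for t in threads:
    t.start()
for t in threads:
    t.join()

for firstname, (applied, existing) in results:
    print(firstname, "applied" if applied else f"rejected, row already has {existing}")
```

Whichever thread gets there first wins; the other receives `False` and the winner's row, which is the "no record stomping" behavior on the slide.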

LWT Fine Print
•Lightweight transactions solve edge conditions
•They have a latency cost
  •Be aware
  •Load test
  •Consider in your data model
•Now go shut down that ZooKeeper mess you have!

Form Versioning: Revisited

Form Versioning Pt 1
•From “The World's Next Top Data Model”
•Great idea, but it has edge conditions

CREATE TABLE working_version (
    username varchar,
    form_id int,
    version_number int,
    locked_by varchar,
    form_attributes map<varchar,varchar>,
    PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);

•Each user has a form
•Each form needs versioning
•Need an exclusive lock on the form

Form Versioning Pt 1

1. Insert first version

INSERT INTO working_version
  (username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin', 1138, 1, '',
  {'FirstName<text>': 'First Name: ',
   'LastName<text>': 'Last Name: ',
   'EmailAddress<text>': 'Email Address: ',
   'Newsletter<radio>': 'Y,N'});

2. Lock for one user

UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;

3. Insert new version. Release lock

INSERT INTO working_version
  (username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin', 1138, 2, null,
  {'FirstName<text>': 'First Name: ',
   'LastName<text>': 'Last Name: ',
   'EmailAddress<text>': 'Email Address: ',
   'Newsletter<checkbox>': 'Y'});

Danger Zone: the lock is a plain UPDATE with no check, so two users can both take it

Form Versioning Pt 2

1. Insert first version, with an exclusive lock

INSERT INTO working_version
  (username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin', 1138, 1, 'pmcfadin',
  {'FirstName<text>': 'First Name: ',
   'LastName<text>': 'Last Name: ',
   'EmailAddress<text>': 'Email Address: ',
   'Newsletter<radio>': 'Y,N'})
IF NOT EXISTS;

2. Update by the lock holder: Accepted

UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Primary Email Address: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'pmcfadin';

3. Update by anyone else: Rejected (sorry dude)

UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Email Adx: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'dude';
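The Pt 2 protocol is a check-and-set on the `locked_by` column. A minimal Python sketch of the same decision logic follows; `FormVersions`, `create_locked`, and `update_if_locked_by` are hypothetical names, and the in-process lock is again only a stand-in for Paxos.

```python
import threading


class FormVersions:
    """Toy model of working_version rows keyed by (username, form_id, version_number)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def create_locked(self, key, owner, attributes):
        # Mirrors INSERT ... IF NOT EXISTS with locked_by set to the owner.
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = {"locked_by": owner, "form_attributes": dict(attributes)}
            return True

    def update_if_locked_by(self, key, user, field, value):
        # Mirrors UPDATE ... SET form_attributes[field] = value ... IF locked_by = user.
        with self._lock:
            row = self._rows.get(key)
            if row is None or row["locked_by"] != user:
                return False
            row["form_attributes"][field] = value
            return True

    def get(self, key):
        return self._rows.get(key)


forms = FormVersions()
key = ("pmcfadin", 1138, 1)
forms.create_locked(key, "pmcfadin", {"EmailAddress<text>": "Email Address: "})

print(forms.update_if_locked_by(key, "pmcfadin", "EmailAddress<text>", "Primary Email Address: "))  # True
print(forms.update_if_locked_by(key, "dude", "EmailAddress<text>", "Email Adx: "))  # False
```

The point of the design is that the lock check and the change happen in one conditional statement, so there is no window between "is it mine?" and "write it".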

Form Versioning Pt 2
•Old way: Edge cases with problems
  •Use external locking?
  •Take your chances?
•New way: Managed expectations (LWT)
  •Exclusive by existence check
  •Continued with IF clause
  •Downside: More latency

Fire: Bring it

Cassandra 2.0 Fire
•Great changes in both 1.2 and 2.0 for performance
•Three big changes in 2.0 I like:
  •Single pass compaction
  •Hints to reduce SSTable reads
  •Faster index reads from off-heap

Why is this important?
•Reducing SSTable reads means fewer seeks
•Disk seeks can add up fast
•5 seeks on a 7200 RPM SATA drive = 5 × 12ms = 60ms of just disk!

Drive Type    Avg Access Time*
7200 RPM      12ms
10k RPM       7ms
15k RPM       5ms
SSD           0.04ms

* Source: www.tomshardware.com

Shared storage == Great sadness
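The "5 seeks = 60ms" figure is seek count times average access time. A short Python sketch of that arithmetic using the table's numbers; it deliberately ignores caching, queueing, and transfer time, so treat it as a rough estimate for cold reads, not a latency model.

```python
# Average access times from the table above, in milliseconds.
ACCESS_TIME_MS = {
    "7200 RPM": 12.0,
    "10k RPM": 7.0,
    "15k RPM": 5.0,
    "SSD": 0.04,
}


def disk_time_ms(seeks, drive):
    """Rough disk time for one read: number of seeks times average access time."""
    return seeks * ACCESS_TIME_MS[drive]


for drive in ACCESS_TIME_MS:
    print(f"5 seeks on {drive}: {disk_time_ms(5, drive):.2f} ms")
# prints 60.00, 35.00, 25.00 and 0.20 ms
```

Shaving one SSTable (one seek) off a read saves a full access time, which is why fewer SSTables per read matters most on spinning disks.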


Quick Diversion
•cfhistograms is your friend
•Histograms of statistics per table
•Collected...
  •per read
  •per write
  •SSTable flush
  •Compaction

nodetool cfhistograms <keyspace> <table>

How do I even read this thing!

Histograms How To

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       0         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

•Offset is a unit-less column
•Its units are assigned by each of the other columns
•Offsets are numerical buckets

Histograms How To: SSTables

•Per read. How many seeks?
•Offset is the number of SSTables read
•Fewer == lower read latency
•107 reads took 1 seek to satisfy

Histograms How To: Write Latency

•Per write. How fast?
•Offset is microseconds

Histograms How To: Read Latency

•Per read. How fast?
•Offset is microseconds

Histograms How To: Partition Size

•Per partition (storage row)
•Offset is size in bytes
•5 partitions are 1250 bytes

Histograms How To: Cell Count

•Per partition (storage row)
•Offset is count of cells in partition
•5 partitions have 10 cells
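Reading the buckets by hand gets tedious on real output. A minimal Python sketch that turns the bucket rows from the videodb/users example into per-column counts and estimates a percentile. It assumes the column order shown above (later Cassandra releases changed the cfhistograms output format) and treats each offset as its bucket's upper bound; the function names are hypothetical.

```python
# Column order assumed from the sample output above.
COLUMNS = ["sstables", "write_latency_us", "read_latency_us",
           "partition_size_bytes", "cell_count"]

# Data rows from the videodb/users example, without the header lines.
SAMPLE = """\
1 107 0 0 0 0
2 0 0 0 0 0
10 0 0 0 0 5
250 0 5 0 0 0
800 0 10 50 0 0
1250 0 0 300 5 0
"""


def parse_histogram(text):
    """Return {column: [(offset, count), ...]} keeping only non-empty buckets."""
    hist = {name: [] for name in COLUMNS}
    for line in text.strip().splitlines():
        offset, *counts = (int(field) for field in line.split())
        for name, count in zip(COLUMNS, counts):
            if count:
                hist[name].append((offset, count))
    return hist


def bucket_percentile(buckets, pct):
    """Smallest offset whose cumulative count reaches pct percent of all samples.

    Offsets are treated as bucket upper bounds, so this is an estimate at
    bucket resolution, not an exact percentile.
    """
    ordered = sorted(buckets)
    total = sum(count for _, count in ordered)
    if total == 0:
        return None
    threshold = total * pct / 100.0
    running = 0
    for offset, count in ordered:
        running += count
        if running >= threshold:
            return offset
    return ordered[-1][0]


hist = parse_histogram(SAMPLE)
print(hist["sstables"])                                # [(1, 107)]
print(bucket_percentile(hist["read_latency_us"], 95))  # 1250
```

Each column is read independently against the shared Offset values, which is exactly the "units are assigned by each column" rule.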


Histograms + Data Model
•Your data model is the key to success
•How do you ensure that?

Test

Measure

Repeat

Real World Example
•Real customer
•Needed very tight SLA on reads
•Read response highly variable
•Loading data increases latency

Problem

Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       2016550   0              0             0               0
2       2064495   0              0             0               0
3       434526    0              0             0               0
4       51084     0              0             0               0
5       0         0              0             0               0
6       0         0              0             0               0
7       0         0              0             0               0
8       0         0              0             0               0
10      0         0              0             0               1629
12      0         0              0             0               2971
14      0         0              0             0               1286
17      0         0              0             0               68
20      0         0              0             0               188
24      0         0              0             0               101
29      0         0              0             0               50799
35      0         0              0             0               269
42      0         0              0             0               132414
50      0         0              0             0               32943
60      0         0              0             0               62099
72      0         0              0             0               116855
86      0         0              0             0               41562
103     0         0              0             0               42796
124     0         0              0             0               46719
149     0         0              0             0               57693
179     0         0              3             0               27659
215     0         0              18            0               26941
258     0         0              47            0               21589
310     0         0              71            0               19494
372     0         0              141           0               8681
446     0         0              67            0               9499
535     0         0              36466         1629            9360
642     0         0              263829        0               4349
770     0         0              608488        2971            4242
924     0         0              209549        1468            2422
1109    0         0              398845        59              1685
1331    0         0              625099        45105           954
1597    0         0              462636        5731            610
1916    0         0              499920        132391          366
2299    0         0              380787        16265           303
2759    0         0              285323        20015           188
3311    0         0              202417        30980           106
3973    0         0              148920        44973           64
4768    0         0              106452        38502           55
5722    0         0              81533         69479           23
6866    0         0              55470         39218           15
8239    0         0              43512         23027           3
9887    0         0              30810         58498           2
11864   0         0              22375         73629           0
14237   0         0              15148         33444           1
17084   0         0              12047         28321           0
20501   0         0              11298         17021           0
24601   0         0              9652          13072           3
29521   0         0              6715          7790            0
35425   0         0              13788         7764            0
42510   0         0              15322         5890            0
51012   0         0              8585          4046            0
61214   0         0              5041          2973            0
73457   0         0              2892          1954            0
88148   0         0              1543          936             0
105778  0         0              900           661             0
126934  0         0              486           409             0
152321  0         0              285           289             0

• Compactions behind

• Disk IO problems

• How to optimize?

Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       2045656   0              0             0               0
2       1813961   0              0             0               0
3       70496     0              0             0               0
4       0         0              0             0               0
5       0         0              0             0               0
6       0         0              0             0               0
7       0         0              0             0               0
8       0         0              0             0               0
10      0         0              0             0               47
12      0         0              0             0               860
14      0         0              0             0               393
17      0         0              0             0               50
20      0         0              0             0               0
24      0         0              0             0               21
29      0         0              0             0               34489
35      0         0              0             0               32
42      0         0              0             0               97226
50      0         0              0             0               24490
60      0         0              0             0               47077
72      0         0              0             0               94761
86      0         0              0             0               32559
103     0         0              0             0               33885
124     0         0              0             0               37051
149     0         0              1             0               48429
179     0         0              17            0               23272
215     0         0              95            0               22459
258     0         0              84            0               17953
310     0         0              174           0               16178
372     0         0              53082         0               7123
446     0         0              318074        0               7836
535     0         0              423140        47              7904
642     0         0              382926        0               3552
770     0         0              365670        860             3525
924     0         0              414824        392             1998
1109    0         0              442701        46              1411
1331    0         0              335862        30325           757
1597    0         0              302920        4082            518
1916    0         0              236448        97224           294
2299    0         0              171726        11843           254
2759    0         0              122880        15160           162
3311    0         0              90413         23484           89
3973    0         0              66682         34799           62
4768    0         0              53385         29619           54
5722    0         0              39121         53155           23
6866    0         0              26828         30702           12
8239    0         0              18930         18627           3
9887    0         0              12517         47739           2
11864   0         0              8269          61853           0
14237   0         0              6049          28875           1
17084   0         0              4614          24391           0
20501   0         0              5868          14450           0
24601   0         0              6167          11112           0
29521   0         0              2879          6609            0
35425   0         0              2054          6654            0
42510   0         0              8913          4986            0
51012   0         0              4429          3352            0
61214   0         0              1541          2465            0
73457   0         0              560           1607            0
88148   0         0              192           809             0
105778  0         0              59            523             0
126934  0         0              19            333             0
152321  0         0              0             262             0

2 ms!

Fewer seeks

• Tuned data disk

• Compactions better

• 1 fewer seek overall

• Further tuning made it even better!
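One way to put a number on "1 fewer seek overall" is the weighted mean of the SSTables column. A minimal Python sketch using the counts from the two histograms above; the helper names are hypothetical, and SSTables read is only a proxy for seeks (data already in cache needs no seek).

```python
# SSTables column (offset = SSTables read, count = number of reads)
# copied from the "Problem" histogram and the tuned histogram above.
BEFORE = {1: 2016550, 2: 2064495, 3: 434526, 4: 51084}
AFTER = {1: 2045656, 2: 1813961, 3: 70496}


def mean_sstables_per_read(hist):
    total_reads = sum(hist.values())
    return sum(sstables * reads for sstables, reads in hist.items()) / total_reads


def share_at_least(hist, sstables):
    """Fraction of reads that touched at least `sstables` SSTables."""
    total_reads = sum(hist.values())
    return sum(reads for n, reads in hist.items() if n >= sstables) / total_reads


for label, hist in (("before", BEFORE), ("after", AFTER)):
    print(f"{label}: mean {mean_sstables_per_read(hist):.2f} SSTables/read, "
          f"{share_at_least(hist, 3):.1%} of reads hit 3+ SSTables")
# before: mean 1.68 SSTables/read, 10.6% of reads hit 3+ SSTables
# after: mean 1.50 SSTables/read, 1.8% of reads hit 3+ SSTables
```

The mean moves only modestly; the bigger change is in the tail, where the 4-SSTable reads disappear entirely.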

What about the partition size?

Partition Size
•Tuning is an option based on size in bytes
•All about the reads
•index_interval
  •How often the partition index is sampled into memory
  •Lower for faster access but more memory usage
•column_index_size_in_kb
  •Add column indexes to a row when the data reaches this size
•Partial row reads? Maybe smaller.
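Both knobs trade memory or index size for less scanning per read. The Python sketch below is a back-of-the-envelope model only, under these assumptions: one in-memory sample per `index_interval` partition-index entries, and roughly one column-index entry per `column_index_size_in_kb` of partition data, with no column index for partitions smaller than that. The partition count and size are hypothetical; the values 128 and 64 match the commonly cited defaults of that era, but check your version's cassandra.yaml.

```python
def summary_entries(partitions, index_interval):
    """Approximate number of partition-index samples kept in memory."""
    return -(-partitions // index_interval)  # ceiling division


def column_index_entries(partition_bytes, column_index_size_in_kb):
    """Approximate column-index entries for one partition (0 if it is too small)."""
    block_bytes = column_index_size_in_kb * 1024
    if partition_bytes < block_bytes:
        return 0
    return -(-partition_bytes // block_bytes)


partitions = 10_000_000  # hypothetical partition count on one node
for interval in (128, 64):
    print(f"index_interval={interval}: ~{summary_entries(partitions, interval):,} samples")

one_mib = 1024 * 1024  # hypothetical wide partition
for size_kb in (64, 16):
    print(f"column_index_size_in_kb={size_kb}: ~{column_index_entries(one_mib, size_kb)} index entries")
```

Halving either setting roughly doubles the entries: more memory or a bigger index, in exchange for a shorter scan to the data a read actually needs.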

Tuning results
•Spent a lot of time tuning disk
•Played with:
  •index_interval (lowered)
  •concurrent_reads (increased)
  •column_index_size_in_kb (lowered)

220 Million Ops/Day

10000 Transactions/Sec Peak

9ms at 95th percentile. Measured at the application!
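As a sanity check on those numbers, averaging 220 million operations over a day and comparing that with the peak shows how far above the mean the cluster had to hold its latency. This Python arithmetic assumes "ops" and "transactions" count the same thing, which the slide does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60

ops_per_day = 220_000_000
peak_per_sec = 10_000

average_per_sec = ops_per_day / SECONDS_PER_DAY
print(f"average: {average_per_sec:,.0f}/sec")                     # average: 2,546/sec
print(f"peak-to-average: {peak_per_sec / average_per_sec:.1f}x")  # peak-to-average: 3.9x
```

A roughly 4x peak over the daily average is the kind of headroom a load test should cover, not just the average rate.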

Offset  SSTables  Write Latency  Read Latency  Row Size        Column Count
1       27425403  0              0             0               0
2       0         0              0             0               0
3       0         0              0             0               0
4       0         0              1             0               0
5       0         0              24            0               0
6       0         0              56            0               0
7       0         0              92            0               0
8       0         0              283           0               0
10      0         0              2834          0               0
12      0         0              11954         0               0
14      0         0              32621         0               1218345
17      0         0              135311        0               0
20      0         0              314195        0               0
24      0         0              610665        0               0
29      0         0              536736        0               0
35      0         0              162541        0               0
42      0         0              25277         0               0
50      0         0              7847          0               0
60      0         0              5864          0               0
72      0         0              9580          0               0
86      0         0              5517          0               0
103     0         0              3822          0               0
124     0         0              1850          0               0
149     0         0              394           0               0
179     0         0              253           0               0
215     0         0              305           0               0
258     0         0              4657297       0               0
310     0         0              12748409      0               0
372     0         0              7475534       0               0
446     0         0              263549        0               0
535     0         0              217171        0               0
642     0         0              41908         1218345         0
770     0         0              24876         0               0
924     0         0              13566         0               0
1109    0         0              10875         0               0
1331    0         0              9379          0               0
1597    0         0              7111          0               0
1916    0         0              5333          0               0
2299    0         0              5072          0               0
2759    0         0              3987          0               0
3311    0         0              5290          0               0
3973    0         0              5169          0               0
4768    0         0              2867          0               0
5722    0         0              2093          0               0
6866    0         0              3177          0               0
8239    0         0              2161          0               0
9887    0         0              1552          0               0
11864   0         0              1200          0               0
14237   0         0              834           0               0
17084   0         0              1380          0               0
20501   0         0              6219          0               0
24601   0         0              4977          0               0
29521   0         0              2114          0               0
35425   0         0              6479          0               0
42510   0         0              18417         0               0
51012   0         0              5532          0               0

• The two hump problem
  • Reads awesome until…
  • Reading from disk

• Solution:
  • Throttle down compaction
  • Tune disk
  • Ignore it
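The two humps can be located mechanically by finding local peaks in the Read Latency column that hold a meaningful share of reads. A minimal Python sketch using the counts above; the 2% cutoff is an arbitrary assumption meant to skip small bumps in the tail, and calling the first hump "fast reads" and the second "reading from disk" follows the slide's interpretation rather than anything the histogram itself records.

```python
# Read Latency column (offset in microseconds, count of reads) from the
# histogram above; only offsets 1-3, which have zero reads, are omitted.
READ_LATENCY = [
    (4, 1), (5, 24), (6, 56), (7, 92), (8, 283), (10, 2834), (12, 11954),
    (14, 32621), (17, 135311), (20, 314195), (24, 610665), (29, 536736),
    (35, 162541), (42, 25277), (50, 7847), (60, 5864), (72, 9580), (86, 5517),
    (103, 3822), (124, 1850), (149, 394), (179, 253), (215, 305),
    (258, 4657297), (310, 12748409), (372, 7475534), (446, 263549),
    (535, 217171), (642, 41908), (770, 24876), (924, 13566), (1109, 10875),
    (1331, 9379), (1597, 7111), (1916, 5333), (2299, 5072), (2759, 3987),
    (3311, 5290), (3973, 5169), (4768, 2867), (5722, 2093), (6866, 3177),
    (8239, 2161), (9887, 1552), (11864, 1200), (14237, 834), (17084, 1380),
    (20501, 6219), (24601, 4977), (29521, 2114), (35425, 6479),
    (42510, 18417), (51012, 5532),
]


def find_humps(buckets, min_fraction=0.02):
    """Offsets of local peaks holding at least min_fraction of the tallest bucket."""
    counts = [count for _, count in buckets]
    tallest = max(counts)
    humps = []
    for i, (offset, count) in enumerate(buckets):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i + 1 < len(counts) else 0
        if count > left and count > right and count >= min_fraction * tallest:
            humps.append(offset)
    return humps


print(find_humps(READ_LATENCY))  # [24, 310]
```

Running the same check on a histogram before and after a change (compaction throttling, disk tuning) is a quick way to see whether the slow hump shrank.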

Disk + Data Model
•Understand the internals
  •Size of partition
  •Compaction
•Learn how to measure
•Load test

*More? My data modeling talks:

The Data Model is Dead, Long Live the Data Model

Become a Super Modeler

The World's Next Top Data Model


Thank you! Time for questions...
