cassandra community webinar | data model on fire


Upload: datastax

Post on 26-Jan-2015


DESCRIPTION

Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example Cassandra 2.0 models, go through the tuning steps and understand the tradeoffs. Many times, just a simple understanding of the underlying Cassandra 2.0 internals can make all the difference. I've helped some of the biggest companies in the world do this and I can help you. Do you feel the need for Cassandra 2.0 speed?

TRANSCRIPT

Page 1: Cassandra Community Webinar | Data Model on Fire

Patrick McFadin | Chief Evangelist DataStax @PatrickMcFadin

Data Model on Fire

©2013 DataStax Confidential. Do not distribute without consent.

@PatrickMcFadin

Patrick McFadin Chief Evangelist/Solution Architect - DataStax

Data Model On Fire

Page 2: Cassandra Community Webinar | Data Model on Fire

Data Model is King

•With 2.0 we now have more choices
•Sometimes the data model is only the first part
•Understanding the underlying engine helps
•You aren’t done until you tune

Load test baby!

Page 3: Cassandra Community Webinar | Data Model on Fire

Light Weight Transactions

Page 4: Cassandra Community Webinar | Data Model on Fire

The race is on

T0, Process 1:

SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

T1, Process 2:

SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

Got nothing! Good to go!

T2, Process 1:

INSERT INTO users (username, firstname,
  lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
  ['[email protected]'],
  'ba27e03fd95e507daf2937c937d499ab',
  '2011-06-20 13:50:00');

T3, Process 2:

INSERT INTO users (username, firstname,
  lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
  ['[email protected]'],
  'ea24e13ad95a209ded8912e937d499de',
  '2011-06-20 13:51:00');

This one wins
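The interleaving on this slide can be replayed in a few lines of Python. This is only a sketch: the `users` dict and the two helper functions are hypothetical stand-ins for the table, not Cassandra client calls, and the four steps run in the T0 to T3 order shown above.

```python
# In-memory stand-in for the users table. NOT a Cassandra client.
users = {}

def select_user(username):
    """Return the stored row, or None (the '(0 rows)' case)."""
    return users.get(username)

def insert_user(username, firstname, lastname):
    """Unconditional write: silently replaces any existing row."""
    users[username] = {"firstname": firstname, "lastname": lastname}

# T0 and T1: both processes check first, and both see nothing.
p1_sees = select_user("pmcfadin")
p2_sees = select_user("pmcfadin")

# T2 and T3: both decide it is safe to insert; the later write wins.
if p1_sees is None:
    insert_user("pmcfadin", "Patrick", "McFadin")
if p2_sees is None:
    insert_user("pmcfadin", "Paul", "McFadin")

print(users["pmcfadin"]["firstname"])  # Paul
```

The check and the write are separate steps, so nothing stops the second insert from stomping on the first.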

Page 5: Cassandra Community Webinar | Data Model on Fire

Solution LWT

Process 1

T0:

INSERT INTO users (username, firstname,
  lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
  ['[email protected]'],
  'ba27e03fd95e507daf2937c937d499ab',
  '2011-06-20 13:50:00')
IF NOT EXISTS;

T1:

 [applied]
-----------
      True

•Check performed for record
•Paxos ensures exclusive access
•applied = true: Success

Page 6: Cassandra Community Webinar | Data Model on Fire

Solution LWT

Process 2

T2:

INSERT INTO users (username, firstname,
  lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
  ['[email protected]'],
  'ea24e13ad95a209ded8912e937d499de',
  '2011-06-20 13:51:00')
IF NOT EXISTS;

T3:

 [applied] | username | created_date             | firstname | lastname
-----------+----------+--------------------------+-----------+----------
     False | pmcfadin | 2011-06-20 13:50:00-0700 |   Patrick |  McFadin

•applied = false: Rejected
•No record stomping!
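The `[applied]` result from the two LWT slides behaves like a compare-and-set. A minimal single-process sketch of that contract follows; `insert_if_not_exists` is a hypothetical helper on a plain dict, and it deliberately leaves out the Paxos round that makes the real operation safe across nodes.

```python
def insert_if_not_exists(table, key, row):
    """Model of INSERT ... IF NOT EXISTS.

    Returns (applied, existing_row). When the key is already present
    the write is skipped and the current row is handed back, much like
    the [applied] = False result shown on the slide.
    """
    existing = table.get(key)
    if existing is not None:
        return False, existing
    table[key] = row
    return True, None

users = {}
first = insert_if_not_exists(
    users, "pmcfadin", {"firstname": "Patrick", "lastname": "McFadin"})
second = insert_if_not_exists(
    users, "pmcfadin", {"firstname": "Paul", "lastname": "McFadin"})

print(first)   # (True, None)
print(second)  # (False, {'firstname': 'Patrick', 'lastname': 'McFadin'})
```

Because the check and the write happen as one step, the second caller learns it lost instead of overwriting the row.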

Page 7: Cassandra Community Webinar | Data Model on Fire

LWT Fine Print

•Light Weight Transactions solve edge conditions
•They have latency cost.
  •Be aware
  •Load test
  •Consider in your data model
•Now go shut down that ZooKeeper mess you have!

Page 8: Cassandra Community Webinar | Data Model on Fire

Form Versioning: Revisited

Page 9: Cassandra Community Webinar | Data Model on Fire

Form Versioning Pt 1

•From “Next top data model”
•Great idea, but edge conditions

CREATE TABLE working_version (
  username varchar,
  form_id int,
  version_number int,
  locked_by varchar,
  form_attributes map<varchar,varchar>,
  PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);

•Each user has a form
•Each form needs versioning
•Need an exclusive lock on the form

Page 10: Cassandra Community Webinar | Data Model on Fire

Form Versioning Pt 1

1. Insert first version

INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'});

2. Lock for one user

UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;

3. Insert new version. Release lock

INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,2,null,
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<checkbox>':'Y'});

Danger Zone

Page 11: Cassandra Community Webinar | Data Model on Fire

Form Versioning Pt 2

1. Insert first version (exclusive lock)

INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'pmcfadin',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'})
IF NOT EXISTS;

Accepted

UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Primary Email Address: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'pmcfadin';

Rejected (sorry dude)

UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Email Adx: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'dude';
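The accepted and rejected updates on this slide come down to one rule: the write goes through only when `locked_by` matches the caller. A rough in-memory sketch, where `update_attribute_if_locked_by` and the `forms` dict are hypothetical names used only for illustration:

```python
def update_attribute_if_locked_by(table, key, owner, attribute, label):
    """Model of UPDATE ... IF locked_by = <owner> on the form row."""
    row = table.get(key)
    if row is None or row["locked_by"] != owner:
        return False  # condition failed, nothing is written
    row["form_attributes"][attribute] = label
    return True

# One form version, keyed by (username, form_id, version_number).
key = ("pmcfadin", 1138, 1)
forms = {
    key: {
        "locked_by": "pmcfadin",
        "form_attributes": {"EmailAddress<text>": "Email Address: "},
    }
}

accepted = update_attribute_if_locked_by(
    forms, key, "pmcfadin", "EmailAddress<text>", "Primary Email Address: ")
rejected = update_attribute_if_locked_by(
    forms, key, "dude", "EmailAddress<text>", "Email Adx: ")

print(accepted, rejected)  # True False
```

Only the lock holder's label survives; the other writer gets a clear rejection rather than a silent overwrite.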

Page 12: Cassandra Community Webinar | Data Model on Fire

Form Versioning Pt 2

•Old way: Edge cases with problems
  •Use external locking?
  •Take your chances?
•New way: Managed expectations (LWT)
  •Exclusive by existence check
  •Continued with IF clause
  •Downside: More latency

Page 13: Cassandra Community Webinar | Data Model on Fire

Fire: Bring it

Page 14: Cassandra Community Webinar | Data Model on Fire

Cassandra 2.0 Fire

•Great changes in both 1.2 and 2.0 for perf
•Three big changes in 2.0 I like
  •Single pass compaction
  •Hints to reduce SSTable reads
  •Faster index reads from off-heap

Page 15: Cassandra Community Webinar | Data Model on Fire

Why is this important?

•Reducing SSTable reads means fewer seeks
•Disk seeks can add up fast
•5 seeks on 7200 RPM SATA = 60ms of just disk!

Avg Access Time*  Rotation Speed
12ms              7200 RPM
7ms               10k RPM
5ms               15k RPM
.04ms             SSD

* Source: www.tomshardware.com

Shared storage == Great sadness
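The "60ms of just disk" figure is the table's access time multiplied by the number of seeks. A small sketch of that arithmetic, using only the numbers from the table above (`seek_cost_ms` is a name made up for this example):

```python
# Average access time per seek in milliseconds, copied from the table.
ACCESS_TIME_MS = {
    "7200 RPM": 12.0,
    "10k RPM": 7.0,
    "15k RPM": 5.0,
    "SSD": 0.04,
}

def seek_cost_ms(seeks, drive):
    """Time spent purely on disk access for a read needing `seeks` seeks."""
    return seeks * ACCESS_TIME_MS[drive]

for drive in ACCESS_TIME_MS:
    print(f"5 seeks on {drive}: {seek_cost_ms(5, drive):.2f} ms")
```

Dropping a single SSTable from the read path saves one full access time per read, which is why the later slides focus on the SSTables column.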

Page 16: Cassandra Community Webinar | Data Model on Fire

Quick Diversion

•cfhistograms is your friend
•Histograms of statistics per table
•Collected...
  •per read
  •per write
  •SSTable flush
  •Compaction

nodetool cfhistograms <keyspace> <table>

Page 17: Cassandra Community Webinar | Data Model on Fire

How do I even read this thing!

Page 18: Cassandra Community Webinar | Data Model on Fire

Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       0         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

•Unit-less column
•Units are assigned by each column
•Numerical buckets

Page 19: Cassandra Community Webinar | Data Model on Fire

Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

•Per read. How many seeks?
•Offset is number of SSTables read
•Fewer == lower read latency
•107 reads took 1 seek to satisfy

Page 20: Cassandra Community Webinar | Data Model on Fire

Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

•Per write. How fast?
•Offset is microseconds

Page 21: Cassandra Community Webinar | Data Model on Fire

Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

•Per read. How fast?
•Offset is microseconds
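Since each latency column is a set of buckets with counts, an approximate percentile falls out of a running total. The sketch below assumes each offset can stand in for every sample in its bucket, which makes the answer an estimate rather than an exact latency; `bucket_percentile` is a hypothetical helper, and the bucket counts are the sample values from the slide.

```python
def bucket_percentile(buckets, pct):
    """Smallest offset whose cumulative count covers pct percent of samples.

    `buckets` maps offset -> count, as in one cfhistograms column.
    """
    total = sum(buckets.values())
    if total == 0:
        raise ValueError("empty histogram")
    target = total * pct / 100.0
    running = 0
    for offset in sorted(buckets):
        running += buckets[offset]
        if running >= target:
            return offset
    return max(buckets)

# Sample values from the slide: offset in microseconds -> count.
write_latency = {250: 5, 800: 10}
read_latency = {800: 50, 1250: 300}

print(bucket_percentile(write_latency, 50))  # 800
print(bucket_percentile(read_latency, 95))   # 1250
```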

Page 22: Cassandra Community Webinar | Data Model on Fire

Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

•Per partition (storage row)
•Offset is size in bytes
•5 partitions are 1250 bytes

Page 23: Cassandra Community Webinar | Data Model on Fire

Histograms How to

•Per partition (storage row)
•Offset is count of cells in partition
•5 partitions have 10 cells

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0
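The column-by-column reading from the last few slides can be automated with a small parser. This sketch assumes the six-column layout shown in this deck (offset plus five counters per row); the column keys and `parse_histogram` are names chosen for the example, and other Cassandra versions may print a different layout.

```python
COLUMNS = ("sstables", "write_latency", "read_latency",
           "partition_size", "cell_count")

SAMPLE = """\
1 107 0 0 0 0
2 2 0 0 0 0
10 0 0 0 0 5
250 0 5 0 0 0
800 0 10 50 0 0
1250 0 0 300 5 0
"""

def parse_histogram(text):
    """Split cfhistograms rows into one {offset: count} dict per column.

    Header lines and anything that is not six whitespace-separated
    fields starting with a number are skipped. Zero counts are dropped.
    """
    hist = {name: {} for name in COLUMNS}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        offset = int(fields[0])
        for name, value in zip(COLUMNS, fields[1:]):
            count = int(value)
            if count:
                hist[name][offset] = count
    return hist

hist = parse_histogram(SAMPLE)
print(hist["sstables"])        # {1: 107, 2: 2}
print(hist["partition_size"])  # {1250: 5}
print(hist["cell_count"])      # {10: 5}
```

Each resulting dict answers one of the questions from the slides: 107 reads touched one SSTable, 5 partitions are about 1250 bytes, and 5 partitions have 10 cells.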

Page 24: Cassandra Community Webinar | Data Model on Fire

Histograms + Data Model

•Your data model is the key to success
•How do you ensure that?

Test

Measure

Repeat

Page 25: Cassandra Community Webinar | Data Model on Fire

Real World Example

•Real Customer
•Needed very tight SLA on reads

Problem

•Read response highly variable
•Loading data increases latency

Page 26: Cassandra Community Webinar | Data Model on Fire

Offset  SSTables  Write Latency (micros)  Read Latency (micros)  Partition Size (bytes)  Cell Count
1 2016550 0 0 0 0
2 2064495 0 0 0 0
3 434526 0 0 0 0
4 51084 0 0 0 0
5 0 0 0 0 0
6 0 0 0 0 0
7 0 0 0 0 0
8 0 0 0 0 0
10 0 0 0 0 1629
12 0 0 0 0 2971
14 0 0 0 0 1286
17 0 0 0 0 68
20 0 0 0 0 188
24 0 0 0 0 101
29 0 0 0 0 50799
35 0 0 0 0 269
42 0 0 0 0 132414
50 0 0 0 0 32943
60 0 0 0 0 62099
72 0 0 0 0 116855
86 0 0 0 0 41562
103 0 0 0 0 42796
124 0 0 0 0 46719
149 0 0 0 0 57693
179 0 0 3 0 27659
215 0 0 18 0 26941
258 0 0 47 0 21589
310 0 0 71 0 19494
372 0 0 141 0 8681
446 0 0 67 0 9499
535 0 0 36466 1629 9360
642 0 0 263829 0 4349
770 0 0 608488 2971 4242
924 0 0 209549 1468 2422
1109 0 0 398845 59 1685
1331 0 0 625099 45105 954
1597 0 0 462636 5731 610
1916 0 0 499920 132391 366
2299 0 0 380787 16265 303
2759 0 0 285323 20015 188
3311 0 0 202417 30980 106
3973 0 0 148920 44973 64
4768 0 0 106452 38502 55
5722 0 0 81533 69479 23
6866 0 0 55470 39218 15
8239 0 0 43512 23027 3
9887 0 0 30810 58498 2
11864 0 0 22375 73629 0
14237 0 0 15148 33444 1
17084 0 0 12047 28321 0
20501 0 0 11298 17021 0
24601 0 0 9652 13072 3
29521 0 0 6715 7790 0
35425 0 0 13788 7764 0
42510 0 0 15322 5890 0
51012 0 0 8585 4046 0
61214 0 0 5041 2973 0
73457 0 0 2892 1954 0
88148 0 0 1543 936 0
105778 0 0 900 661 0
126934 0 0 486 409 0
152321 0 0 285 289 0

• Compactions behind

• Disk IO problems

• How to optimize?

Page 27: Cassandra Community Webinar | Data Model on Fire

Offset  SSTables  Write Latency (micros)  Read Latency (micros)  Partition Size (bytes)  Cell Count
1 2045656 0 0 0 0
2 1813961 0 0 0 0
3 70496 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
6 0 0 0 0 0
7 0 0 0 0 0
8 0 0 0 0 0
10 0 0 0 0 47
12 0 0 0 0 860
14 0 0 0 0 393
17 0 0 0 0 50
20 0 0 0 0 0
24 0 0 0 0 21
29 0 0 0 0 34489
35 0 0 0 0 32
42 0 0 0 0 97226
50 0 0 0 0 24490
60 0 0 0 0 47077
72 0 0 0 0 94761
86 0 0 0 0 32559
103 0 0 0 0 33885
124 0 0 0 0 37051
149 0 0 1 0 48429
179 0 0 17 0 23272
215 0 0 95 0 22459
258 0 0 84 0 17953
310 0 0 174 0 16178
372 0 0 53082 0 7123
446 0 0 318074 0 7836
535 0 0 423140 47 7904
642 0 0 382926 0 3552
770 0 0 365670 860 3525
924 0 0 414824 392 1998
1109 0 0 442701 46 1411
1331 0 0 335862 30325 757
1597 0 0 302920 4082 518
1916 0 0 236448 97224 294
2299 0 0 171726 11843 254
2759 0 0 122880 15160 162
3311 0 0 90413 23484 89
3973 0 0 66682 34799 62
4768 0 0 53385 29619 54
5722 0 0 39121 53155 23
6866 0 0 26828 30702 12
8239 0 0 18930 18627 3
9887 0 0 12517 47739 2
11864 0 0 8269 61853 0
14237 0 0 6049 28875 1
17084 0 0 4614 24391 0
20501 0 0 5868 14450 0
24601 0 0 6167 11112 0
29521 0 0 2879 6609 0
35425 0 0 2054 6654 0
42510 0 0 8913 4986 0
51012 0 0 4429 3352 0
61214 0 0 1541 2465 0
73457 0 0 560 1607 0
88148 0 0 192 809 0
105778 0 0 59 523 0
126934 0 0 19 333 0
152321 0 0 0 262 0

2 ms!

Fewer seeks

• Tuned data disk

• Compactions better

• 1 fewer seek overall

• Further tuning made it even better!
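One way to put a number on "fewer seeks" is to compare the SSTables column of the two histograms. The counts below are copied from the before and after outputs on these two slides; the helper names are made up for this sketch.

```python
# SSTables-per-read counts from the two slides: offset -> number of reads.
BEFORE = {1: 2016550, 2: 2064495, 3: 434526, 4: 51084}
AFTER = {1: 2045656, 2: 1813961, 3: 70496}

def share_at_least(hist, n):
    """Fraction of reads that touched n or more SSTables."""
    total = sum(hist.values())
    return sum(c for offset, c in hist.items() if offset >= n) / total

def mean_sstables(hist):
    """Average number of SSTables touched per read."""
    total = sum(hist.values())
    return sum(offset * c for offset, c in hist.items()) / total

for label, hist in (("before", BEFORE), ("after", AFTER)):
    print(f"{label}: worst case {max(hist)} SSTables, "
          f"{share_at_least(hist, 3):.1%} of reads hit 3 or more, "
          f"mean {mean_sstables(hist):.2f}")
```

The worst case drops from 4 SSTables to 3, and the share of reads touching 3 or more falls from roughly a tenth to under 2 percent.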

What about the partition size?

Page 28: Cassandra Community Webinar | Data Model on Fire

Partition Size

•Tuning is an option based on size in bytes
•All about the reads
•index_interval
  •How many samples taken
  •Lower for faster access but more memory usage
•column_index_size_in_kb
  •Add column indexes to a row when the data reaches this size
  •Partial row reads? Maybe smaller.
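A back-of-the-envelope model can show the tradeoff behind both settings. Everything here is an assumption made for illustration: the default values of 128 and 64 are what I recall for Cassandra 2.0, the function names are invented, and the real index structures are more involved than a simple division.

```python
import math

def index_samples(partition_count, index_interval=128):
    """Rough count of partition-index samples held in memory.

    Assumes one sample per `index_interval` partition keys, so a lower
    interval means more samples (faster lookup, more memory).
    """
    return math.ceil(partition_count / index_interval)

def column_index_entries(partition_bytes, column_index_size_in_kb=64):
    """Rough count of column index entries for one partition.

    Assumes no column index below the threshold, then roughly one
    entry per block of `column_index_size_in_kb`.
    """
    block = column_index_size_in_kb * 1024
    if partition_bytes < block:
        return 0
    return math.ceil(partition_bytes / block)

print(index_samples(1_000_000))          # 7813
print(index_samples(1_000_000, 64))      # 15625
print(column_index_entries(152321))      # 3
print(column_index_entries(152321, 16))  # 10
```

Halving the interval doubles the samples, and lowering the column index size gives large partitions a finer index for partial reads.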

Page 29: Cassandra Community Webinar | Data Model on Fire

Tuning results

•Spent a lot of time tuning disk
•Played with
  •index_interval (Lowered)
  •concurrent_reads (Increased)
  •column_index_size_in_kb (Lowered)

220 Million Ops/Day

10000 Transactions/Sec Peak

9ms at 95th percentile. Measured at the application!
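To relate the daily volume to the peak rate, a quick conversion helps. The sketch assumes the 220 million operations are spread over a full 24-hour day, which the slide does not state.

```python
OPS_PER_DAY = 220_000_000
PEAK_PER_SEC = 10_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate if the load were perfectly even across the day.
average_per_sec = OPS_PER_DAY / SECONDS_PER_DAY
peak_to_average = PEAK_PER_SEC / average_per_sec

print(f"average: {average_per_sec:.0f} ops/sec")
print(f"peak is {peak_to_average:.1f}x the average")
```

Under that assumption the peak is close to four times the average, so load tests should target the peak rather than the daily mean.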

Page 30: Cassandra Community Webinar | Data Model on Fire

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1 27425403 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 1 0 0
5 0 0 24 0 0
6 0 0 56 0 0
7 0 0 92 0 0
8 0 0 283 0 0
10 0 0 2834 0 0
12 0 0 11954 0 0
14 0 0 32621 0 1218345
17 0 0 135311 0 0
20 0 0 314195 0 0
24 0 0 610665 0 0
29 0 0 536736 0 0
35 0 0 162541 0 0
42 0 0 25277 0 0
50 0 0 7847 0 0
60 0 0 5864 0 0
72 0 0 9580 0 0
86 0 0 5517 0 0
103 0 0 3822 0 0
124 0 0 1850 0 0
149 0 0 394 0 0
179 0 0 253 0 0
215 0 0 305 0 0
258 0 0 4657297 0 0
310 0 0 12748409 0 0
372 0 0 7475534 0 0
446 0 0 263549 0 0
535 0 0 217171 0 0
642 0 0 41908 1218345 0
770 0 0 24876 0 0
924 0 0 13566 0 0
1109 0 0 10875 0 0
1331 0 0 9379 0 0
1597 0 0 7111 0 0
1916 0 0 5333 0 0
2299 0 0 5072 0 0
2759 0 0 3987 0 0
3311 0 0 5290 0 0
3973 0 0 5169 0 0
4768 0 0 2867 0 0
5722 0 0 2093 0 0
6866 0 0 3177 0 0
8239 0 0 2161 0 0
9887 0 0 1552 0 0
11864 0 0 1200 0 0
14237 0 0 834 0 0
17084 0 0 1380 0 0
20501 0 0 6219 0 0
24601 0 0 4977 0 0
29521 0 0 2114 0 0
35425 0 0 6479 0 0
42510 0 0 18417 0 0
51012 0 0 5532 0 0

• The two hump problem

• Reads awesome until…

• Reading from disk

• Solution:

  • Throttle down compaction

  • Tune disk

  • Ignore it
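The two humps can be picked out of the read latency column programmatically. The sketch below uses a simple local-maximum test with a minimum share of all reads; that heuristic is my own choice for illustration, not something from the talk, and the buckets are a subset of the read latency values on this slide.

```python
# (offset in microseconds, read count), a subset of the slide's column.
READ_LATENCY = [
    (17, 135311), (20, 314195), (24, 610665), (29, 536736),
    (35, 162541), (42, 25277), (50, 7847), (215, 305),
    (258, 4657297), (310, 12748409), (372, 7475534),
    (446, 263549), (535, 217171),
]

def find_humps(buckets, min_share=0.01):
    """Offsets that are local maxima holding at least min_share of reads."""
    total = sum(count for _, count in buckets)
    humps = []
    for i in range(1, len(buckets) - 1):
        offset, count = buckets[i]
        higher_than_neighbors = (count > buckets[i - 1][1]
                                 and count > buckets[i + 1][1])
        if higher_than_neighbors and count >= min_share * total:
            humps.append(offset)
    return humps

print(find_humps(READ_LATENCY))  # [24, 310]
```

The first hump sits in the tens of microseconds and the second in the hundreds, which fits the slide's point that reads are awesome until they have to come from disk.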

Page 31: Cassandra Community Webinar | Data Model on Fire

Disk + Data Model

•Understand the internals
  •Size of partition
  •Compaction
•Learn how to measure
•Load test

Page 32: Cassandra Community Webinar | Data Model on Fire

More? My data modeling talks:

The Data Model is Dead, Long Live the Data Model

Become a Super Modeler

The World's Next Top Data Model

Thank you! Time for questions...