long term road test of c*
TRANSCRIPT
![Page 1: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/1.jpg)
C* Long Term Road TestKavika Vollmar & Chris Romary
![Page 2: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/2.jpg)
Introductions
● Christopher Romary● David “Kavika” Vollmar● Gnip / Twitter● Sponsor: DataStax
![Page 3: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/3.jpg)
Outline
● Current C* Usage / Size / Configurations● Issues ● Best practices● Questions
![Page 4: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/4.jpg)
Background
● 4 C* clusters across at least 2 environments (Production, staging, optional review)
● Prod Configurations○ Tango 16 nodes, 7 of 40 TB○ Uniform 20 nodes, 6 of 20 TB○ Yankee 20 nodes, 18 of 33 TB○ Whiskey 8 nodes, 0.3 of 10
![Page 5: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/5.jpg)
Different node and drive types
● EC2 and bare metal machines● Using ephemeral and EBS volumes in EC2● Dedicated SSDs on bare metal
![Page 6: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/6.jpg)
Versions
Cassandra 1.2Astyanax 1.56CentOS 6.4 (and now 7)
![Page 7: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/7.jpg)
Access Patterns
● Very different access patterns○ Rows written once versus continuously updated○ Fairly stable size (e.g. 30 days of data) vs
continuously growing○ Read once versus reading same rows over and over
again
![Page 8: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/8.jpg)
Customer Compliance Use Case
● Ever Expanding Data Set● Lots of small rows
○ 1 row per tweet, 2-5 columns of longs per row● Heavy Write Load
○ 10.000 writes / second● Heavy Read Load
○ 3.000 reads / second○ Random read distribution
![Page 9: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/9.jpg)
Picking an Instance Size
● Ephemeral (with spinning disks) did not provide required IOPS
● Moved to provisioned IOPS● Picked most inexpensive host type with
presumed sufficient IOPs and cores
![Page 10: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/10.jpg)
VNodes
● VNodes○ Easy to add/remove nodes○ Lose any 2 nodes and your cluster is down
(depending on R/F)● Token Rings (Single Token)
○ We don’t recommend it○ Hard to add more capacity
●●●
![Page 11: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/11.jpg)
Memory Usage
● Bloom Filters● Row Cache● Key Cache● Mem Tables● Indexes
![Page 12: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/12.jpg)
Bloom Filters
● Live Off Heap● Grow proportional to data size● % Chance to avoid hitting disk● Can Cause OOM
![Page 13: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/13.jpg)
Memory Settings
Small amount of RAM used by HeapLet OS manage paging/caching Tune off heap memory to prevent OOM
![Page 14: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/14.jpg)
Seed Configuration
● Point of contact for the cluster● Only important when joining the cluster● Test behavior with seeds across Data
Centers
![Page 15: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/15.jpg)
IOPs Throughput issues
● Did not achieve expected throughput● Usually performed up to expectations● New nodes (sometimes) had issues
![Page 16: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/16.jpg)
More on IO
● iostat ○ best case showed expected 3000 r/sec○ worst case showed < 1000 r/sec
● block size confusion○ Did one os “read” correspond to 1 EBS iop?
![Page 17: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/17.jpg)
Pre-warming volumes
● Read each sector to make sure it’s allocated● Recommended By Amazon● Wasn’t needed in most cases● Takes several hours so do it in advance
![Page 18: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/18.jpg)
Migration to new hardware
● Newer Instance Types○ Newer Hardware○ Less maintenance time
● Bigger Instance Types○ 2x RAM○ Ephemeral SSDs
![Page 19: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/19.jpg)
Actually MigratingSeveral ways to do it● Multiple data centers● Add/remove nodesNot Always Smooth● Relationship to consistency settings● Test Config Settings● Load on Prod System
![Page 20: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/20.jpg)
Data Per Node
More Data per Node:- Costs Less But…- Repairs Take Longer- Adding/Removing nodes takes longer- Everything takes longer
![Page 21: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/21.jpg)
General Best Practices (1)
● Generation of config files● Disk space alerts at 40%● Script Everything
○ Spin up○ Repairs
● Lock down prod with different credentials ○ Securing EC2 in general is a much larger topic
![Page 22: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/22.jpg)
General Best Practices (2)
● Verify client settings and configurations● Always use the same KPI’s● Performance testing with separate
applications
![Page 23: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/23.jpg)
General Best Practices (3)
Have a “staging” system that mirrors prod as close as possible
Test all changes in staging first
![Page 24: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/24.jpg)
Questions?
![Page 25: Long Term Road Test of C*](https://reader034.vdocuments.us/reader034/viewer/2022042701/55aa460e1a28ab7d658b48c1/html5/thumbnails/25.jpg)
The Future...
How far will this scale? As...- The size of the data grows- The number of nodes in the cluster growsWhat will the next AWS issue be?