CrowdStrike: Real World DTCS For Operators
Real World DTCS For Operators
An Introduction to CrowdStrike
We Are A Cybersecurity Technology Company
We Detect, Prevent And Respond To All Attack Types In Real Time, Protecting Organizations From Catastrophic Breaches
We Provide Next Generation Endpoint Protection, Threat Intelligence & Pre & Post IR Services
NEXT-GEN ENDPOINT
INCIDENT RESPONSE
THREAT INTEL
What Is Compaction?
• Cassandra write path:
– First the Commitlog
– Then the Memtable
– Eventually flushed to an SSTable
• Each SSTable is written exactly once
• Over time, Cassandra combines files
– Duplicate cells are merged
– Obsolete data is purged
• The algorithm Cassandra uses to determine when and how to combine files is pluggable, and choosing the right strategy may be important at scale
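To make the merge step concrete, here is a minimal sketch in Python. It assumes each SSTable can be modeled as a dict mapping a cell key to a (write_timestamp, value, expires_at) tuple. That model and the `compact` function are illustrative assumptions, not Cassandra's on-disk format or API, and the sketch ignores tombstones and gc_grace_seconds.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory model of a cell: (write_timestamp, value, expires_at).
# expires_at is None for cells written without a TTL.
Cell = Tuple[int, str, Optional[int]]
SSTable = Dict[str, Cell]


def compact(sstables: List[SSTable], now: int) -> SSTable:
    """Merge several SSTables into one new SSTable."""
    merged: SSTable = {}
    for table in sstables:
        for key, cell in table.items():
            current = merged.get(key)
            # Duplicate cells are merged: the newest write timestamp wins.
            if current is None or cell[0] > current[0]:
                merged[key] = cell
    # Obsolete data is purged: drop winning cells whose TTL has expired.
    return {
        key: cell
        for key, cell in merged.items()
        if cell[2] is None or cell[2] > now
    }


old = {"a": (1, "v1", None), "b": (2, "expired", 50)}
new = {"a": (5, "v2", None), "c": (6, "fresh", 500)}
result = compact([old, new], now=100)
```

The inputs are never modified, which mirrors the "written exactly once" property: compaction produces a new file and the old ones are then discarded.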
© 2015. All Rights Reserved.
What Is Compaction?
• SizeTieredCompactionStrategy
– Each time min_threshold (4) files of similar size appear, combine them into a new file
– Over time, you’ll naturally end up with a distribution of old data in large files, new data in small files
– Deleted data in large files stays on disk longer than desired because those files are very rarely compacted
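The grouping rule above can be sketched roughly as follows. This is a simplified illustration, not Cassandra's implementation: the function names are hypothetical, and the 0.5x to 1.5x "similar size" band is an assumption modeled on the strategy's bucket_low and bucket_high options.

```python
from typing import List


def size_tiered_buckets(
    sizes: List[int], bucket_low: float = 0.5, bucket_high: float = 1.5
) -> List[List[int]]:
    """Group file sizes into buckets of roughly similar size."""
    buckets: List[List[int]] = []
    for size in sorted(sizes):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            # A file joins a bucket if it is close to the bucket's average size.
            if bucket_low * average <= size <= bucket_high * average:
                bucket.append(size)
                break
        else:
            # No existing bucket fits, so this file starts a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes: List[int], min_threshold: int = 4) -> List[List[int]]:
    """Return only the buckets that have reached min_threshold files."""
    return [b for b in size_tiered_buckets(sizes) if len(b) >= min_threshold]


# Four small files, two medium files, and one large file (sizes in MB).
candidates = compaction_candidates([10, 11, 9, 10, 400, 410, 1600])
```

Only the four small files qualify here. The 1600 MB file sits alone in its bucket, which is why data in large files is so rarely rewritten.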
SizeTieredCompactionStrategy
If each of the smallest blocks represents 1 day of data, and each write had a 90 day TTL, when do you actually delete files and reclaim disk space?
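One rough way to reason about that question, as a back-of-the-envelope sketch: assume space is only reclaimed when a whole file can be dropped, meaning every cell in it has expired, and assume each tier holds four times as many days of writes as the one below it. Both assumptions are simplifications, and the helper name is hypothetical.

```python
def fully_expired_day(first_day: int, span_days: int, ttl_days: int) -> int:
    """Day on which the newest cell in a file has passed its TTL."""
    return first_day + span_days + ttl_days


TTL_DAYS = 90
rows = []
for tier in range(4):
    span = 4 ** tier  # 1, 4, 16, 64 days of writes in one file
    droppable = fully_expired_day(0, span, TTL_DAYS)
    oldest_cell_expired = 0 + TTL_DAYS
    # (days covered, day the file can be dropped, days the oldest cell overstays)
    rows.append((span, droppable, droppable - oldest_cell_expired))

for span, droppable, overstay in rows:
    print(f"{span:>3} days per file: droppable on day {droppable}, "
          f"oldest data lingers {overstay} days past its TTL")
```

Under these assumptions, the bigger the file, the longer its oldest expired data stays on disk.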
Why Compaction Strategy Matters
• We keep some data from sensors for a fixed time period
– Processes
– DNS queries
– Files created
• It’s a LOT of data
– Talk tomorrow morning: One million writes per second with 60 nodes
• We’re WELL past 60 nodes
• If we can’t delete it efficiently, costs go way, way up
DateTieredCompactionStrategy
• Early tickets suggested creating a way to stop compacting cold data
– CASSANDRA-5515 – track sstable coldness, stop compacting cold sstables (measured by READ counts)
• CASSANDRA-6602 – optimize for time series specifically
– Solution provided by Björn Hegerfors from Spotify
– Use sstable’s min timestamp to find a target window
– Compact sstables within the same target
– Stop compacting sstables if max timestamp is older than a specified cutoff
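The three rules in the CASSANDRA-6602 bullet can be sketched as below. This is a simplified illustration under stated assumptions, not the code that shipped in Cassandra: the `SSTable` tuple, the function names, and the unit-less integer timestamps are all hypothetical, and the window layout (each tier `base` times wider than the previous one) is an approximation of the tiering idea.

```python
from typing import Dict, Iterator, List, NamedTuple, Tuple


class SSTable(NamedTuple):
    name: str
    min_ts: int
    max_ts: int


Window = Tuple[int, int]  # half-open range [start, end)


def tiered_windows(now: int, base_time: int, base: int) -> Iterator[Window]:
    """Yield contiguous windows, newest first, growing by a factor of base."""
    size, position = base_time, now // base_time
    while True:
        yield (position * size, (position + 1) * size)
        if position % base > 0:
            position -= 1
        else:
            size, position = size * base, position // base - 1


def bucket_by_window(
    sstables: List[SSTable], now: int, base_time: int, base: int, max_age: int
) -> Dict[Window, List[str]]:
    cutoff = now - max_age
    buckets: Dict[Window, List[str]] = {}
    for table in sstables:
        # Stop compacting sstables whose max timestamp is older than the cutoff.
        if table.max_ts < cutoff:
            continue
        # Use the sstable's min timestamp to find its target window.
        for start, end in tiered_windows(now, base_time, base):
            if start <= table.min_ts:
                buckets.setdefault((start, end), []).append(table.name)
                break
    return buckets


tables = [
    SSTable("a", 17, 17),
    SSTable("b", 13, 15),
    SSTable("c", 12, 14),
    SSTable("d", 9, 11),
    SSTable("e", 1, 3),   # entirely older than the cutoff: left alone
    SSTable("f", 2, 16),  # old min timestamp, recent max timestamp
]
buckets = bucket_by_window(tables, now=17, base_time=1, base=4, max_age=10)
```

Sstables that land in the same window ("b" and "c" here) are the ones that would be compacted together. Note where "f" ends up: its recent max timestamp keeps it eligible for compaction, while its old min timestamp places it in a very old window.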
DTCS In Pictures
DTCS Parameters
• max_sstable_age_days
• base_time_seconds
• timestamp_resolution
• min_threshold
– Common to all compaction strategies
• max_threshold
– Common to all compaction strategies
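As an illustration of where these parameters go, the sketch below builds a CQL ALTER TABLE statement as a string. The helper function, the keyspace and table names, and the option values are all hypothetical examples rather than recommendations; only the option names come from the slide.

```python
from typing import Dict


def dtcs_alter_statement(keyspace: str, table: str, options: Dict[str, str]) -> str:
    """Build a CQL statement that switches a table to DTCS."""
    compaction = {"class": "DateTieredCompactionStrategy", **options}
    body = ", ".join(f"'{key}': '{value}'" for key, value in compaction.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{body}}};"


# Illustrative values only: tune them for your own data and TTLs.
statement = dtcs_alter_statement(
    "sensors",  # hypothetical keyspace
    "events",   # hypothetical table
    {
        "max_sstable_age_days": "90",
        "base_time_seconds": "3600",
        "timestamp_resolution": "MICROSECONDS",
        "min_threshold": "4",
        "max_threshold": "32",
    },
)
print(statement)
```

Building the statement as plain text keeps the sketch free of any driver dependency; the resulting string could be pasted into cqlsh.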
DTCS Benefits In Theory…
• You can stop data compacting at a point you choose!
– max_sstable_age_days
• You can adjust the window size so that you can quickly expire data when it’s approximately the size you want
– It’s not immediately intuitive, but you CAN calculate it (min_threshold and base_time_seconds)
• We know cold data won’t be recompacted, so we can potentially enable cold storage directories with cheaper disk
– CASSANDRA-8460 – patch available, I need to rebase
Do people consider DTCS Production Ready?
• It was added to 2.0 after 2.1 was out. Usually this means:
– Trivial and low risk, or
– Experimental and meant for advanced users only
– I challenge you to find documentation on which is true for DTCS
• Spotify’s intro blog notes that they use it in production
• I’ve been told by a project committer that they feel DTCS is for advanced users only, but I’ve never seen any public facing messaging that normal users should avoid it
• It seems so easy, what could possibly go wrong…
DTCS Caveats
• The initial blogs give us some insight about what type of things may not behave as intended
– “But something that works against the efforts of the strategy is writes with highly out-of-order timestamps”
  • How much is “highly out of order”?
– “Consider turning off read repairs. Anti-entropy repairs and hinted handoff don’t incur as much additional work for DTCS and may be used like usual.”
Out of order timestamps
• When an sstable gets flushed with an old timestamp in a new table:
– The max timestamp is used to determine when to stop compacting, but
– The min timestamp is used to determine which other files will be compacted with this sstable
• Windows are tiered, and they get bigger and bigger
• With default settings and 1 year of data, the largest window covers 180 days
– This means even if most of the file is past max_sstable_age_days, you can still end up compacting with a brand new sstable with read repaired data
• “DTCS never stops compacting”
– Read repairs pull old data into new windows triggering recompaction
– Does that mean we better run repair?
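The growth of the windows can be checked with a little arithmetic. The sketch below assumes the defaults are base_time_seconds = 3600 and min_threshold = 4, and that each tier is min_threshold times wider than the one before it; the function name is hypothetical.

```python
from typing import List


def window_sizes_hours(
    base_time_seconds: int = 3600, min_threshold: int = 4, horizon_days: int = 365
) -> List[float]:
    """List each tier's window width in hours, up to the data horizon."""
    sizes = []
    size = base_time_seconds
    while size <= horizon_days * 86_400:
        sizes.append(size / 3600)
        size *= min_threshold
    return sizes


sizes = window_sizes_hours()
largest_days = sizes[-1] / 24
print(sizes)
print(f"largest window: about {largest_days:.1f} days")
```

Under these assumptions the seventh tier is 4096 hours wide, or about 171 days, which is in the same range as the 180 day figure on the slide.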
Small SSTables from Repairs (and other streaming operations)
• “If an SSTable contains timestamps that don’t match the time when it was actually written to disk, it violates the size-to-age correspondence that DTCS tries to maintain.”
• The suggestions on Spotify and DataStax blogs say run repair more often than max_sstable_age_days, but that isn’t the only cause of small sstables
– Bootstrap
– Decommission
– Bulk Loader
Real Pain: If you can’t expand your cluster, what’s the point?
[Figure: SSTable Count Per Node]
Damn you, vnodes!
Well…
Small SSTables Shouldn’t Be Ignored
• If the small sstables are beyond max_sstable_age_days, they won’t be compacted
– After all, that’s the point of max_sstable_age_days, right?
• If you raise max_sstable_age_days, the ever-growing DTCS tiered windows will cause existing sstables to merge and get much larger, negating one of the benefits of DTCS
• If you don’t raise max_sstable_age_days, you have to deal with performance implications of ten thousand sstables
– Reduced somewhat by CASSANDRA-9882
– Before #9882, too many sstables could block flushing for a long time
Really Embarrassing Admission
• Our early bulk loading plan and bootstrapping procedure acknowledged and accepted that sstables will be abandoned beyond max_sstable_age_days
• We have Python scripts that check the timestamps, and manually submit compactions through JMX forceUserDefinedCompaction()
• Yes, really.
• Does it actually scale?
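The slides do not show those scripts, so the following is only a guess at their general shape, not CrowdStrike's code. It assumes each sstable's max timestamp (in microseconds) has already been read elsewhere, for example with the sstablemetadata tool, and it only builds the text of a jmxterm command rather than talking to JMX. The file names are made up, and the exact argument format of forceUserDefinedCompaction is an assumption that may differ between Cassandra versions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DAY_MICROS = 86_400 * 1_000_000


def abandoned_groups(
    sstables: List[Tuple[str, int]],
    now_micros: int,
    max_age_days: int,
    min_files: int = 2,
) -> List[List[str]]:
    """Group sstables that DTCS will no longer touch by the day they cover."""
    cutoff = now_micros - max_age_days * DAY_MICROS
    by_day: Dict[int, List[str]] = defaultdict(list)
    for path, max_ts in sstables:
        if max_ts < cutoff:
            by_day[max_ts // DAY_MICROS].append(path)
    # A day with a single file has nothing to merge, so skip it.
    return [sorted(paths) for _, paths in sorted(by_day.items()) if len(paths) >= min_files]


def jmxterm_command(paths: List[str]) -> str:
    """Build a jmxterm 'run' line for one user defined compaction."""
    bean = "org.apache.cassandra.db:type=CompactionManager"
    return f"run -b {bean} forceUserDefinedCompaction {','.join(paths)}"


# Hypothetical file names and timestamps, with "now" at day 100.
files = [
    ("ks-t-ka-1-Data.db", 5 * DAY_MICROS + 1),
    ("ks-t-ka-2-Data.db", 5 * DAY_MICROS + 2),
    ("ks-t-ka-3-Data.db", 95 * DAY_MICROS),
    ("ks-t-ka-4-Data.db", 7 * DAY_MICROS),
]
groups = abandoned_groups(files, now_micros=100 * DAY_MICROS, max_age_days=10)
commands = [jmxterm_command(group) for group in groups]
```

Each command string could then be piped into jmxterm in the same way as the `set` command shown later in the talk.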
When should you use DTCS?
• You TTL ALL of your data and writes come in order
• Fixed sized cluster and no plans for bulk loading, or rarely changing cluster size and not using vnodes
– If you plan on growing, you better have a plan for small sstables
– If you do need to add/remove nodes, vnodes will cause far more small sstables than single-token-per-node
• Extra space available for compaction
– You can’t rely on theoretical table sizes calculated with max_sstable_age_days, because read repair, hints, etc, can force those files to span much larger time ranges than you expect
Being Honest
What if?
• Do we really need max_sstable_age_days?
– The conventional logic is to use it to denote cold data, but we use it to force window sizes
– If we give up tiering, and stick with fixed sized windows, do we need max_sstable_age_days?
• Without tiering, can we swap base_time_seconds for a more intuitive configuration option?
TimeWindowCompactionStrategy
• Designed to be simple and efficient
– Group sstables into logical buckets
– STCS within each time window
– No more rolling re-compaction
– No more streaming leftovers
– No more confusing options, just Window Size + Window Unit
• “12 Hours”, “3 Days”, “6 Minutes”
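The "logical buckets" idea can be sketched as follows. This is an illustration, not the TWCS source: it assumes each sstable is assigned to a fixed-width window by its max timestamp, that timestamps are in microseconds, and that the unit names are MINUTES, HOURS and DAYS. The function names and sstable names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86_400}


def window_start_seconds(max_ts_micros: int, size: int, unit: str) -> int:
    """Round a timestamp down to the start of its fixed-width window."""
    width = size * UNIT_SECONDS[unit]
    ts_seconds = max_ts_micros // 1_000_000
    return ts_seconds - ts_seconds % width


def group_by_window(
    sstables: List[Tuple[str, int]], size: int, unit: str
) -> Dict[int, List[str]]:
    """Map each window start (in seconds) to the sstables that fall in it."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for name, max_ts_micros in sstables:
        groups[window_start_seconds(max_ts_micros, size, unit)].append(name)
    return dict(groups)


# Window Size = 12, Window Unit = HOURS, i.e. 43200 second windows.
sstables = [
    ("a", 1_000 * 1_000_000),
    ("b", 40_000 * 1_000_000),
    ("c", 50_000 * 1_000_000),
    ("d", 90_000 * 1_000_000),
]
windows = group_by_window(sstables, size=12, unit="HOURS")
```

Because every window has the same width, a file never gets pulled into a larger, older window the way it can under tiering; size-tiered compaction would then run only among the files inside one window.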
TimeWindowCompactionStrategy
• Submitted to Apache Cassandra as CASSANDRA-9666
• For now, we use it at CrowdStrike to clean up after streaming:
– echo "set -b org.apache.cassandra.db:columnfamily=table,keyspace=keyspace,type=ColumnFamilies CompactionStrategyClass org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy" | java -jar jmxterm.jar -l $IP:$PORT
– It’s not an accident that the TWCS defaults use 1 day windows with microsecond timestamp resolution: that matches our own sstable needs, but we also think it’s a good default
• Patches (and Tests) Available for 2.1, 2.2, 3.0
TimeWindowCompactionStrategy
• No more continuous compaction
• No more tiny streaming leftovers
• No more confusing options
– Just Window Size, Window Unit
– “12 Hours”, “3 Days”, “6 Minutes”
• Work is ongoing for both DTCS and TWCS
– CASSANDRA-9645 to make DTCS easier to use
– CASSANDRA-10276 to make DTCS do STCS within each window (patch available)
– CASSANDRA-10280 to make DTCS work well with old data
TimeWindowCompactionStrategy
• There’s no guarantee that TWCS will make it into the project
– TWCS is certainly easier to reason about, but DTCS was there first and is already deployed by real users
– Anecdotal evidence and preliminary benchmarks suggest TWCS comes out ahead based on current state of both strategies (at the time of these slides)
– Formal benchmarking is needed
– DTCS probably wins for reads/SELECTs in SOME data models
• Even if TWCS doesn’t make it in, the source is available now (see: CASSANDRA-9666)
– It’s likely we’ll continue to maintain it, even if it’s not accepted upstream, so pull requests are welcome
Q&A
• Talk to me about Cassandra or DTCS on Twitter: @jjirsa
• Try to stop me from talking about DTCS on IRC: #cassandra
• CrowdStrike is awesome and hiring
– www.crowdstrike.com/careers/
• Jim Plush and Dennis Opacki, tomorrow morning
– “1 Million Writes Per Second on 60 Nodes with Cassandra and EBS”
Thank you