cassandra on ec2
@mdennis
Cassandra On EC2
Matthew F. Dennis // @mdennis
Instance Sizes
● m1.xlarge is by far the most common size
● m1.large is OK for many use cases
● m2.4xlarge in some cases
● keep the entire dataset in memory
● c1.xlarge / cc1.4xlarge
● Smallish but very hot set of data
– regardless of how much data is on disk
● Extremely high request rate
● Encrypted node-node communications and high traffic
● Usually better off with many m1.xlarge instances because of the extra memory, but not always
Configuration
● Stripe All Ephemeral Drives
● data directory and commit log on same volume
● Only applies to EC2 and SSDs, not physical HW
● Why?
● 6-8 GB heap on m1.xlarge
● 3-4 GB heap on m1.large
● Phi Convict Threshold? Maybe ...
EBS versus Ephemeral
● Ephemeral drives are:
● Generally faster for C*
● More stable (no pauses/freezes; outages?)
● Cheaper
● Easier to initially configure
● Striped EBS?
● yeah, about that …
● TL;DR: don't use EBS for C* on EC2
Multi-Zone
● Alternate zones in your token topology
● No really, this is important, alternate zones
– We should probably fix this ...
● “complicated, but possible” to add new zones after initial deployment
● Never move a *token* to a different region or zone
● If you think that is what you want to do, what you really want is to bootstrap a new node at token-1 in the new region/zone and then decom the old one
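A minimal sketch of what "alternate zones in your token topology" could look like, assuming RandomPartitioner (token space 0 to 2**127) and evenly spaced tokens. The function names and the zone names are made up for illustration; none of this is a Cassandra API.

```python
# Assumption: RandomPartitioner, whose tokens run from 0 to 2**127.
RING_SIZE = 2 ** 127


def alternating_zone_tokens(node_count, zones):
    """Return (token, zone) pairs in ring order.

    Tokens are evenly spaced and zones are assigned round-robin, so
    neighbours on the ring always sit in different zones.
    """
    if len(zones) < 2:
        raise ValueError("need at least two zones to alternate")
    if node_count % len(zones) != 0:
        # Otherwise the last and first node on the ring could share a zone.
        raise ValueError("node_count must be a multiple of the zone count")
    return [
        (i * RING_SIZE // node_count, zones[i % len(zones)])
        for i in range(node_count)
    ]


def replacement_token(old_token):
    """Token for a node that takes over from old_token (the 'token-1' rule)."""
    return (old_token - 1) % RING_SIZE


if __name__ == "__main__":
    # Hypothetical zone names, for illustration only.
    for token, zone in alternating_zone_tokens(6, ["zone-a", "zone-b", "zone-c"]):
        print(zone, token)
```

The same `replacement_token` helper covers the "bootstrap at token-1" advice: the new node lands immediately before the old one on the ring, and the old one is then decommissioned.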
Multi-Region C* on EC2
● Connectivity is the complicated part
● Ec2MultiRegionSnitch is not the entire answer
– https://issues.apache.org/jira/browse/CASSANDRA-2452
● Don't try to make a “fail over” DC, just go with active-active
● If you insist, then do the fail over in your application and configure C* the same as you would active-active
● Generally requires a lot more storage
● Doesn't matter though because you're using ephemeral drives (right?) and don't want a TB of data on each node anyway
Multi-Region Connectivity Options
● VPN
● Encrypted node-node communication
● CPU utilization is often a downside
● VPNCubed / VPCPlus
● I've never deployed it, heard good things about it though
● Amazon VPC
● anyone know if a single VPC can span regions yet?
● SSH Tunnels
● EC2 security groups
● IPTables
● Encrypted node-node + public IP binding + AWS security groups + IPTables (EIPs may simplify this, never actually tried it)
Recovery From Failures
● Don't “fix” EC2 nodes, replace them
● bootstrap at token-1, remove old token
– bootstrap can be slow, but will get better
● Other than that it's the same in EC2 as not ...
Node Maintenance
● “Maintenance” On EC2?
● Usually not required (just replace the node)
● If it is, just stop C*, CL+HH/repair/RR will fix it
● Same as physical HW
● https://issues.apache.org/jira/browse/CASSANDRA-2034
● Stop Trying To Decom Nodes Just To Replace a Disk !!!
Backups
● C* snapshots and push to S3
● Directory Watcher that pushes new files to S3
● SimpleGeo: https://github.com/simplegeo/tablesnap
● Netflix: http://slidesha.re/NFOnCassBkup
● Keep a log of all incoming writes
● Not specific to S3
● Can be coupled with snapshots / S3
● Useful for other reasons as well
● Compression in transit to S3 (or wherever) can be done on a separate EC2 instance to avoid burning CPU
● Usually not worth the extra complexity / cost
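The "directory watcher that pushes new files to S3" idea can be sketched roughly as below. This is a polling sketch under stated assumptions, not how tablesnap or the Netflix tooling actually works: `upload` is a hypothetical callable you supply (for example a thin wrapper around your S3 client), and skipping file names containing `-tmp` is an assumption about how in-progress SSTables are named on your version.

```python
import os
import time


def find_new_files(directory, seen, skip_marker="-tmp"):
    """Return sorted paths of regular files in `directory` that are not in `seen`.

    Names containing `skip_marker` are ignored (assumption: those are
    SSTables still being written). `seen` is not modified here.
    """
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if skip_marker in name or path in seen:
            continue
        if os.path.isfile(path):
            found.append(path)
    return found


def watch(directory, upload, interval_seconds=5.0, max_polls=None):
    """Poll `directory` and hand each new file to `upload(path)` exactly once.

    A file is only remembered after `upload` returns, so a failed upload
    (an exception) is retried the next time `watch` is started.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(directory, seen):
            upload(path)
            seen.add(path)
        polls += 1
        time.sleep(interval_seconds)
```

SSTables are immutable once written, which is why "upload every new file once" is enough for this kind of backup; the polling loop here is only the simplest way to notice new files.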
Changing Node Sizes
● Start a new instance
● rsync data from original node to new node
● Shut down C* on original node
● rsync data from original node to new node again (picks up whatever changed since the first pass)
● Start C* on new node
● Shut down original instance
● NB: Assumes same token, region, zone, etc
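The steps above can be written down as an ordered plan, which makes the two rsync passes explicit. This is only a sketch: the host names, the data directory, and the `service cassandra stop/start` commands are assumptions about a typical packaged install, and the function only builds the commands, it does not run anything. `--delete` is included so that files compacted away between the two passes do not linger on the new node.

```python
def resize_plan(old_host, new_host,
                data_dir="/var/lib/cassandra",          # assumed location
                stop_cmd=("sudo", "service", "cassandra", "stop"),    # assumed
                start_cmd=("sudo", "service", "cassandra", "start")):  # assumed
    """Return the resize steps as (where, description, argv-or-None) tuples.

    argv is None for steps done through the EC2 console/API rather than
    on a node. The rsync commands are meant to be run on `old_host`.
    """
    src = data_dir.rstrip("/") + "/"  # trailing slash: copy directory contents
    rsync = ["rsync", "-a", "--delete", src, f"{new_host}:{src}"]
    return [
        ("operator", "start a new instance", None),
        (old_host, "first rsync pass while C* is still running", list(rsync)),
        (old_host, "shut down C* on the original node", list(stop_cmd)),
        (old_host, "second rsync pass, copies only what changed", list(rsync)),
        (new_host, "start C* on the new node", list(start_cmd)),
        ("operator", "shut down the original instance", None),
    ]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for where, what, argv in resize_plan("old-node", "new-node"):
        print(f"[{where}] {what}: {' '.join(argv) if argv else '(manual)'}")
```

The first pass moves the bulk of the data while the node keeps serving; the second pass runs with C* stopped, so it is short and the copy is consistent.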
Elastic Load Balancers
● They're awesome, use them
● Could be more awesome (e.g. better integration with Route 53)
● What I really want is TCP anycast for ELB across regions (AWS could make it work)
● Balance across regions with GeoIP / GeoDNS
● Zerigo, TZOHA, Neustar, “homegrown”, etc
● Route 53? You wish (though Route 53 itself is run over anycast)
– “in the future we plan for Route 53 to also give you greater control over … the route your users take to reach an endpoint” --Werner Vogels
● Put them in front of your app servers, not your C* instances
● Keep your app servers stateless or at least “weakly” stateless (e.g. no sticky sessions required)
AMIs versus Scripted Setup
● DataStax publishes C* AMIs
● Chef Recipes as well
● Or roll your own …
● Whatever you do, just make sure it's automated and repeatable
● *personally* I prefer scripting the setup remotely, but this is … “less than ideal”
● PSSH is, in general, awesome
WTF?!
● Your zone X is not the same as my zone X
● Consistent within an EC2 account
● Problematic across accounts
● Does not apply to regions (i.e. your region X is my region X)
● EIPs resolve to private IPs from within AWS
● EBS volumes sometimes just “freeze”
● AWS: “yeah, that happens sometimes under load”
● steal% sometimes 20% or more (1%-3% is “normal”)
● This is AWS literally stealing your money
● Thankfully not all that common, but watch out for it
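One way to watch steal% yourself, as a rough sketch: on Linux the aggregate `cpu` line in `/proc/stat` carries cumulative counters, with steal as the eighth value, so two samples give the steal share over the interval. The function names are made up for illustration, and the thresholds to alert on are up to you.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat.

    Returns the first eight counters:
    user, nice, system, idle, iowait, irq, softirq, steal.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu" or len(fields) < 9:
        raise ValueError("expected an aggregate 'cpu' line with a steal field")
    return [int(value) for value in fields[1:9]]


def steal_percent(before, after):
    """Steal time as a percentage of all CPU time between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate cpu line (Linux only; it is the first line)."""
    with open(path) as stat_file:
        return stat_file.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(5)
    print(f"steal over the last 5s: {steal_percent(first, read_cpu_line()):.1f}%")
```

Tools like `top` and `vmstat` report the same figure (the `st` column); the sketch is mainly useful if you want to feed it into your own monitoring.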
Missing AWS Features
● ELB over anycast
● Probably doable by AWS, but not others ...
● GeoDNS from Route53
● No really, WTF Doesn't Route53 Do GeoDNS ?!?!
● Multi-Region VPC
● Local SSDs
We're Hiring !
● Developers
● QA
● Community Manager
● Sales / SE
● Interns
– Dev
– Support
– QA
● Smart People Interested In Cassandra
Q? (yes, I'll post the slides on SlideShare)