always on: building highly available applications on cassandra
![Page 1: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/1.jpg)
Always On: Building Highly Available Applications on Cassandra
Robbie Strickland
![Page 3: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/3.jpg)
Who Am I?
• Contributor to C* community since 2010
• DataStax MVP 2014/15/16
• Author, Cassandra High Availability & Cassandra 3.x High Availability
• Founder, ATL Cassandra User Group
![Page 4: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/4.jpg)
What is HA?
• Three nines – 99.9% uptime?
– Roughly 9 hours of downtime per year
– … or a full work day of downtime!
– (Five nines, 99.999%, allows only about 5 minutes per year)
• Can we do better?
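As a quick sanity check on availability targets, the allowed downtime per year is (1 − uptime) × the length of a year. A minimal Scala sketch of that arithmetic follows; the object and method names are illustrative and not from the talk.

```scala
object Downtime {
  // Minutes in a non-leap year: 365 days * 24 hours * 60 minutes.
  val MinutesPerYear: Double = 365.0 * 24 * 60

  // Allowed downtime, in minutes per year, for an uptime percentage such as 99.9.
  def minutesPerYear(uptimePercent: Double): Double =
    (1.0 - uptimePercent / 100.0) * MinutesPerYear
}

object DowntimeDemo extends App {
  // Three nines allows about 8.76 hours per year; five nines about 5.26 minutes.
  println(f"99.9%%   -> ${Downtime.minutesPerYear(99.9) / 60}%.2f hours/year")
  println(f"99.999%% -> ${Downtime.minutesPerYear(99.999)}%.2f minutes/year")
}
```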
![Page 5: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/5.jpg)
Cassandra + HA
• No SPOF
• Multi-DC replication
• Incremental backups
• Client-side failure handling
• Server-side failure handling
• Lots of JMX stats
![Page 6: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/6.jpg)
HA by Design (it’s not an add-on)
• Properly designed topology
• Data model that respects C* architecture
• Application that handles failure
• Monitoring strategy with early warning
• DevOps mentality
![Page 7: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/7.jpg)
Table Stakes
• NetworkTopologyStrategy
• GossipingPropertyFileSnitch
– Or [YourCloud]Snitch
• At least 5 nodes
• RF=3
• No load balancer
![Page 8: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/8.jpg)
HA Topology
![Page 9: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/9.jpg)
Consistency Basics
• Start with LOCAL_QUORUM reads & writes
– Balances performance & availability, and provides single DC full consistency
– Experiment with eventual consistency (e.g. CL=ONE) in a controlled environment
• Avoid non-local CLs in multi-DC environments
– Otherwise it’s a crap shoot
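Why LOCAL_QUORUM reads and writes give full consistency inside one DC comes down to simple arithmetic: a quorum is floor(RF / 2) + 1 replicas, so with RF=3 any read quorum overlaps any write quorum. A small Scala sketch of that reasoning, with illustrative names only:

```scala
object Quorum {
  // Quorum size for a replication factor: floor(RF / 2) + 1.
  def size(replicationFactor: Int): Int = replicationFactor / 2 + 1

  // Replicas that may be down in the local DC while a quorum is still reachable.
  def tolerableFailures(replicationFactor: Int): Int =
    replicationFactor - size(replicationFactor)

  // A read is guaranteed to touch a replica holding the latest acknowledged
  // write only when the read and write replica counts overlap.
  def overlaps(readReplicas: Int, writeReplicas: Int, replicationFactor: Int): Boolean =
    readReplicas + writeReplicas > replicationFactor
}

object QuorumDemo extends App {
  val rf = 3
  println(s"quorum size at RF=$rf: ${Quorum.size(rf)}")
  println(s"replicas that can fail: ${Quorum.tolerableFailures(rf)}")
  println(s"quorum read + quorum write overlap: ${Quorum.overlaps(2, 2, rf)}")
  println(s"ONE read + ONE write overlap: ${Quorum.overlaps(1, 1, rf)}")
}
```

With RF=3 this gives a quorum of 2 and tolerance for one failed replica, which is why CL=ONE on both sides is only eventually consistent.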
![Page 10: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/10.jpg)
Rack Failure
• Don’t put all your nodes in one rack!
• Use rack awareness
– Places replicas in different racks
• But don’t use RackAwareSnitch
![Page 11: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/11.jpg)
Rack Awareness
[Diagram: replicas R1, R2, and R3 placed across Rack A and Rack B]
![Page 12: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/12.jpg)
Rack Awareness
[Diagram: replicas R1, R2, and R3 placed across Rack A and Rack B]
GossipingPropertyFileSnitch, cassandra-rackdc.properties:
Rack A nodes:
dc=dc1
rack=a
Rack B nodes:
dc=dc1
rack=b
![Page 13: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/13.jpg)
Rack Awareness (Cloud Edition)
[Diagram: replicas R1, R2, and R3 placed across Availability Zone A and Availability Zone B]
[YourCloud]Snitch (it’s automagic!)
![Page 14: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/14.jpg)
Data Center Replication
[Diagram: two data centers, dc=us-1 and dc=eu-1]
![Page 15: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/15.jpg)
Data Center Replication
CREATE KEYSPACE myKeyspace
WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'us-1': 3,
  'eu-1': 3
};
![Page 16: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/16.jpg)
Multi-DC Consistency?
[Diagram: two data centers, dc=us-1 and dc=eu-1]
Assumption: LOCAL_QUORUM
![Page 17: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/17.jpg)
Multi-DC Consistency?
[Diagram: dc=us-1 and dc=eu-1, each labeled "Fully consistent"]
Assumption: LOCAL_QUORUM
![Page 18: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/18.jpg)
Multi-DC Consistency?
[Diagram: dc=us-1 and dc=eu-1, each labeled "Fully consistent"; the link between them is labeled "?"]
Assumption: LOCAL_QUORUM
![Page 19: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/19.jpg)
Multi-DC Consistency?
[Diagram: dc=us-1 and dc=eu-1, each labeled "Fully consistent"; the link between them is labeled "Eventually consistent"]
Assumption: LOCAL_QUORUM
![Page 20: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/20.jpg)
Multi-DC Routing with LOCAL CL
[Diagram: a client app paired with dc us-1 and a client app paired with dc eu-1]
![Page 21: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/21.jpg)
Multi-DC Routing with LOCAL CL
[Diagram: a client app paired with dc us-1 and a client app paired with dc eu-1]
![Page 22: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/22.jpg)
Multi-DC Routing with non-LOCAL CL
[Diagram: a client app paired with dc us-1 and a client app paired with dc eu-1]
![Page 23: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/23.jpg)
Multi-DC Routing with non-LOCAL CL
[Diagram: a client app paired with dc us-1 and a client app paired with dc eu-1]
![Page 24: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/24.jpg)
Multi-DC Routing
• Use DCAwareRoundRobinPolicy wrapped by TokenAwarePolicy
– This is the default
– Prefers local DC – chosen based on host distance and seed list
– BUT this can fail for logical DCs that are physically co-located, or for improperly defined seed lists!
![Page 25: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/25.jpg)
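Multi-DC Routing
Pro tip:
import com.datastax.driver.core.policies.{DCAwareRoundRobinPolicy, TokenAwarePolicy}

val localDC: String = ??? // get from config
val dcPolicy =
  new TokenAwarePolicy(
    DCAwareRoundRobinPolicy.builder()
      .withLocalDc(localDC)
      .build())
Be explicit!!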
![Page 26: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/26.jpg)
Handling DC Failure
• Make sure backup DC has sufficient capacity
– Don’t try to add capacity on the fly!
• Try to limit updates
– Avoids potential consistency issues on recovery
• Be careful with retry logic
– Isolate it to a single point in the stack
– Don’t DDoS yourself with retries!
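One way to keep retries in a single place and stop them from amplifying an outage is a bounded retry helper with exponential backoff. The Scala sketch below uses only the standard library; `Retry.withBackoff`, its parameters, and the injectable `sleep` function are hypothetical names for illustration, not part of any driver.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object Retry {
  // Runs `op` at most `maxAttempts` times, doubling the pause after each failure.
  // `sleep` is injectable so that tests do not have to wait for real time to pass.
  def withBackoff[A](
      maxAttempts: Int,
      initialDelayMs: Long,
      sleep: Long => Unit = (ms: Long) => Thread.sleep(ms)
  )(op: () => A): Try[A] = {
    @tailrec
    def loop(attempt: Int, delayMs: Long): Try[A] =
      Try(op()) match {
        case success @ Success(_) => success
        case failure @ Failure(_) if attempt >= maxAttempts => failure // budget spent
        case Failure(_) =>
          sleep(delayMs)
          loop(attempt + 1, delayMs * 2)
      }
    loop(1, initialDelayMs)
  }
}
```

Calling this from exactly one layer (for example, the data access layer) avoids the multiplication that happens when several layers each retry on their own.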
![Page 27: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/27.jpg)
Topology Lessons
• Leverage rack awareness
• Use LOCAL_QUORUM
– Full local consistency
– Eventual consistency across DCs
• Run incremental repairs to maintain inter-DC consistency
• Explicitly route local app to local C* DC
• Plan for DC failure
![Page 28: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/28.jpg)
Data Modeling
![Page 29: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/29.jpg)
Quick Primer
• C* is a distributed hash table
– Partition key (first field in PK declaration) determines placement in the cluster
– Efficient queries MUST know the key!
• Data for a given partition is naturally sorted based on clustering columns
• Column range scans are efficient
![Page 30: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/30.jpg)
Quick Primer
• All writes are immutable
– Deletes create tombstones
– Updates do not immediately purge old data
– Compaction has to sort all this out
![Page 31: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/31.jpg)
Who Cares?
• Bad performance = application downtime & lost users
• Lagging compaction is an operations nightmare
• Some models & query patterns create serious availability problems
![Page 32: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/32.jpg)
Do
• Choose a partition key that distributes evenly
• Model your data based on common read patterns
• Denormalize using collections & materialized views
• Use efficient single-partition range queries
![Page 33: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/33.jpg)
Don’t
• Create hot spots in either data or traffic patterns
• Build a relational data model
• Create an application-side join
• Run multi-node queries
• Use batches to group unrelated writes
![Page 34: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/34.jpg)
Problem Case #1
SELECT *
FROM contacts
WHERE id IN (1,3,5,7,9)
![Page 35: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/35.jpg)
Problem Case #1
SELECT *
FROM contacts
WHERE id IN (1,3,5,7)
[Diagram: a client querying a six-node cluster; the nodes hold keys {1,2,6,5}, {4,7,2,8}, {3,6,7,8}, {1,3,5,2}, {4,5,7,8}, and {1,3,6,4}]
Must ask 4 out of 6 nodes in the cluster to satisfy quorum!
![Page 36: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/36.jpg)
Problem Case #1
SELECT *
FROM contacts
WHERE id IN (1,3,5,7)
[Diagram: a client querying the same six-node cluster, holding keys {1,2,6,5}, {4,7,2,8}, {3,6,7,8}, {1,3,5,2}, {4,5,7,8}, and {1,3,6,4}; two nodes are marked failed (X)]
“Not enough replicas available for query at consistency LOCAL_QUORUM”
1,3,5 all have sufficient replicas, yet entire query fails because of 7
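The failure on this slide can be reproduced with a small quorum check. In the Scala sketch below, the key placement is transcribed from the diagram, the node numbers 1 to 6 simply follow the order in which the key groups appear, and the choice of nodes 2 and 3 as the failed pair is an assumption (the slide does not say which two nodes are down).

```scala
object QuorumCheck {
  // Replica placement from the slide: node number -> keys it stores (RF = 3).
  val placement: Map[Int, Set[Int]] = Map(
    1 -> Set(1, 2, 6, 5),
    2 -> Set(4, 7, 2, 8),
    3 -> Set(3, 6, 7, 8),
    4 -> Set(1, 3, 5, 2),
    5 -> Set(4, 5, 7, 8),
    6 -> Set(1, 3, 6, 4)
  )

  // Quorum for RF = 3: floor(3 / 2) + 1.
  val quorum: Int = 2

  // Keys from the query that cannot reach quorum once `failedNodes` are down.
  def keysWithoutQuorum(keys: Seq[Int], failedNodes: Set[Int]): Seq[Int] =
    keys.filter { key =>
      val liveReplicas = placement.count { case (node, stored) =>
        stored.contains(key) && !failedNodes.contains(node)
      }
      liveReplicas < quorum
    }
}

object QuorumCheckDemo extends App {
  // Assumed failure: nodes 2 and 3. Only key 7 loses quorum, yet the whole IN query fails.
  println(QuorumCheck.keysWithoutQuorum(Seq(1, 3, 5, 7), failedNodes = Set(2, 3)))
}
```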
![Page 37: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/37.jpg)
Solution #1
• Option 1: Be optimistic and run it anyway
– If it fails, you can fall back to option 2
• Option 2: Run parallel queries for each key
– Return the results that are available
– Fall back to CL ONE for failed keys
– Client token awareness means coordinator does less work
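Option 2 can be sketched with plain Scala futures. This is a minimal illustration, not driver code: `fetch` is a hypothetical stand-in for a single-partition query that returns a failed `Future` when the requested consistency level cannot be met, and the `Level` type is invented for the example. A real implementation would likely fall back only on unavailable or timeout errors rather than on every failure.

```scala
import scala.concurrent.{ExecutionContext, Future}

object PerKeyFetch {
  sealed trait Level
  case object LocalQuorum extends Level
  case object One extends Level

  // Queries every key in parallel at LOCAL_QUORUM, retries a failed key once
  // at ONE, and returns whatever results are available.
  def fetchAll[K, V](keys: Seq[K])(fetch: (K, Level) => Future[V])(
      implicit ec: ExecutionContext): Future[Map[K, V]] = {
    val attempts: Seq[Future[Option[(K, V)]]] = keys.map { key =>
      fetch(key, LocalQuorum)
        .recoverWith { case _ => fetch(key, One) } // fall back for this key only
        .map(value => Option(key -> value))
        .recover { case _ => None }                // drop this key, keep the rest
    }
    Future.sequence(attempts).map(_.flatten.toMap)
  }
}
```

Because each key is its own single-partition query, a token-aware client can send it straight to a replica, and one unavailable partition no longer fails the whole request.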
![Page 38: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/38.jpg)
Problem Case #2
CREATE INDEX ON contacts(birth_year)
SELECT *
FROM contacts
WHERE birth_year=1975
![Page 39: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/39.jpg)
Problem Case #2
SELECT *
FROM contacts
WHERE birth_year=1975
[Diagram: a client querying a six-node cluster; each node's local index lists its own 1975 entries: (Jim, Sue), (Sam, Jim), (Sue, Tim), (Tim, Jim), (Sue, Sam), (Sam, Tim)]
Index lives with the source data… so 5 nodes must be queried!
![Page 40: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/40.jpg)
Problem Case #2
SELECT *
FROM contacts
WHERE birth_year=1975
[Diagram: a client querying the same six-node cluster, whose local indexes list 1975 entries (Jim, Sue), (Sam, Jim), (Sue, Tim), (Tim, Jim), (Sue, Sam), (Sam, Tim); two nodes are marked failed (X)]
“Not enough replicas available for query at consistency LOCAL_QUORUM”
Index lives with the source data… so 5 nodes must be queried!
![Page 41: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/41.jpg)
Solution #2
• Option 1: Build your own index
– App has to maintain the index
• Option 2: Use a materialized view
– Not available before 3.0
• Option 3: Run it anyway
– Ok for small amounts of data (think 10s to 100s of rows) that can live in memory
– Good for parallel analytics jobs (Spark, Hadoop, etc.)
![Page 42: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/42.jpg)
Problem Case #3
CREATE TABLE sensor_readings (
  sensorID uuid,
  timestamp int,
  reading decimal,
  PRIMARY KEY (sensorID, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
![Page 43: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/43.jpg)
Problem Case #3
• Partition will grow unbounded
– i.e. it creates wide rows
• Unsustainable number of columns in each partition
• No way to archive off old data
![Page 44: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/44.jpg)
Solution #3
CREATE TABLE sensor_readings (
  sensorID uuid,
  time_bucket int,
  timestamp int,
  reading decimal,
  PRIMARY KEY ((sensorID, time_bucket), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
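The slide does not say how `time_bucket` is derived. One common choice is a fixed-size window computed from the reading's timestamp. The Scala sketch below assumes `timestamp` holds non-negative epoch seconds and uses one bucket per UTC day; both the bucket size and the helper names are assumptions for illustration.

```scala
object TimeBucket {
  val SecondsPerDay: Int = 24 * 60 * 60

  // Day-sized bucket: whole days since the Unix epoch (UTC).
  // Assumes a non-negative epoch-seconds timestamp.
  def dayBucket(epochSeconds: Int): Int = epochSeconds / SecondsPerDay

  // Buckets that a query over [fromSeconds, toSeconds] has to read, newest first.
  // Each bucket is a separate partition and therefore a separate query.
  def bucketsBetween(fromSeconds: Int, toSeconds: Int): Seq[Int] =
    (dayBucket(fromSeconds) to dayBucket(toSeconds)).reverse
}
```

The bucket size is a trade-off: smaller buckets keep partitions small but make range reads touch more partitions, and old buckets can be dropped or archived as a unit.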
![Page 45: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/45.jpg)
Monitoring
![Page 46: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/46.jpg)
Monitoring Basics
• Enable remote JMX
• Connect a stats collector (jmxtrans, collectd, etc.)
• Use nodetool for quick single-node queries
• C* tells you pretty much everything via JMX
![Page 47: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/47.jpg)
Thread Pools
• C* is a SEDA architecture
– Essentially message queues feeding thread pools
– nodetool tpstats
• Pending messages are bad:
Pool Name               Active   Pending    Completed   Blocked   All time blocked
CounterMutationStage    0        0          0           0         0
ReadStage               0        0          103         0         0
RequestResponseStage    0        0          0           0         0
MutationStage           0        13234794   0           0         0
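A stats collector can flag this condition automatically. The Scala sketch below parses rows in the six-column layout shown on the slide (name, active, pending, completed, blocked, all time blocked) and reports pools whose pending count exceeds a threshold. The layout is taken from the slide only; other Cassandra versions may print different columns, and the `TpStats` names are hypothetical.

```scala
object TpStats {
  final case class Pool(name: String, active: Long, pending: Long, completed: Long)

  private def isNumber(s: String): Boolean = s.nonEmpty && s.forall(_.isDigit)

  // Keeps only rows with exactly six whitespace-separated fields whose
  // active, pending, and completed fields are numeric; the header and any other lines are skipped.
  def parse(output: String): Seq[Pool] =
    output.split("\n").toSeq.map(_.trim.split("\\s+")).collect {
      case Array(name, active, pending, completed, _, _)
          if Seq(active, pending, completed).forall(isNumber) =>
        Pool(name, active.toLong, pending.toLong, completed.toLong)
    }

  // Pools whose pending queue is deeper than `maxPending`.
  def backedUp(pools: Seq[Pool], maxPending: Long): Seq[Pool] =
    pools.filter(_.pending > maxPending)
}
```

Fed the sample output above with `maxPending = 0`, the only pool reported is MutationStage.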
![Page 48: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/48.jpg)
Lagging Compaction
• Lagging compaction is the reason for many performance issues
• Reads can grind to a halt in the worst case
• Use nodetool tablestats/cfstats & compactionstats
![Page 49: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/49.jpg)
Lagging Compaction
• Size-Tiered: watch for high SSTable counts:
Keyspace: my_keyspace
    Read Count: 11207
    Read Latency: 0.047931114482020164 ms.
    Write Count: 17598
    Write Latency: 0.053502954881236506 ms.
    Pending Flushes: 0
        Table: my_table
        SSTable count: 84
![Page 50: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/50.jpg)
Lagging Compaction
• Leveled: watch for SSTables remaining in L0:
Keyspace: my_keyspace
    Read Count: 11207
    Read Latency: 0.047931114482020164 ms.
    Write Count: 17598
    Write Latency: 0.053502954881236506 ms.
    Pending Flushes: 0
        Table: my_table
        SSTable Count: 70
        SSTables in each level: [50/4, 15/10, 5/100]
50 in L0 (should be 4)
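The "SSTables in each level" line is easy to check mechanically. The Scala sketch below parses the bracketed list as printed on the slide, where an entry written `count/limit` carries its expected maximum, and returns the levels that exceed it. It also accepts bare counts with no limit, which is an assumption about how other outputs might look; the `LeveledCompaction` names are hypothetical.

```scala
object LeveledCompaction {
  // Parses e.g. "[50/4, 15/10, 5/100]" into (level, count, optional limit) triples.
  // Entries that are not "count" or "count/limit" raise a MatchError.
  def parseLevels(levels: String): Seq[(Int, Int, Option[Int])] =
    levels.trim.stripPrefix("[").stripSuffix("]").split(",").toSeq
      .map(_.trim)
      .filter(_.nonEmpty)
      .zipWithIndex
      .map { case (entry, level) =>
        entry.split("/") match {
          case Array(count, limit) => (level, count.toInt, Option(limit.toInt))
          case Array(count)        => (level, count.toInt, Option.empty[Int])
        }
      }

  // Levels whose SSTable count exceeds the printed limit.
  def overLimit(levels: String): Seq[Int] =
    parseLevels(levels).collect {
      case (level, count, Some(limit)) if count > limit => level
    }
}
```

For the sample line above this reports level 0 (50 against a limit of 4) and also level 1 (15 against 10); the L0 backlog is the one the slide calls out.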
![Page 51: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/51.jpg)
Lagging Compaction Solution
• Triage:
– Check stats history to see if it’s a trend or a blip
– Increase compaction throughput using nodetool setcompactionthroughput
– Temporarily switch to SizeTiered
• Do some digging:
– I/O problem?
– Add nodes?
![Page 52: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/52.jpg)
Wide Rows / Hotspots
• Only takes one to wreak havoc
• It’s a data model problem
• Early detection is key!
• Watch partition max bytes
– Make sure it doesn’t grow unbounded
– … or become significantly larger than mean bytes
![Page 53: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/53.jpg)
Wide Rows / Hotspots
• Use nodetool toppartitions to sample reads/writes and find the offending partition
• Take action early to avoid OOM issues with:
– Compaction
– Streaming
– Reads
![Page 54: Always On: Building Highly Available Applications on Cassandra](https://reader036.vdocuments.us/reader036/viewer/2022070512/588a41361a28abc6168b748d/html5/thumbnails/54.jpg)
For More Info…
(shameless book plug)