cassandra community webinar | practice makes perfect: extreme cassandra optimization

PRACTICE MAKES PERFECT: EXTREME CASSANDRA OPTIMIZATION @AlTobey Tech Lead, Compute and Data Services #CASSANDRA Thursday, August 8, 13

Upload: datastax

Post on 26-Jan-2015


DESCRIPTION

Ooyala has been using Apache Cassandra since version 0.4. Their data ingest volume has exploded since 0.4 and Cassandra has scaled along with it. In this webinar, Al will share lessons that he has learned across an array of topics from an operational perspective, including how to manage, tune, and scale Cassandra in a production environment. Speaker: Al Tobey, Tech Lead, Compute and Data Services at Ooyala. Al Tobey is Tech Lead of the Compute and Data services team at Ooyala. His team develops and operates Ooyala's internal big data platform, consisting of Apache Cassandra, Hadoop, and internally developed tools. When not in front of a computer, Al is a father, husband, and trombonist.

TRANSCRIPT

Page 1: Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optimization

PRACTICE MAKES PERFECT: EXTREME CASSANDRA OPTIMIZATION

@AlTobey
Tech Lead, Compute and Data Services

#CASSANDRA
Thursday, August 8, 13

Page 2

Outline

⁍ About me / Ooyala
⁍ How not to manage your Cassandra clusters
⁍ Make it suck less
⁍ How to be a heuristician
⁍ Tools of the trade
⁍ More Settings
⁍ Show & Tell

Page 3

@AlTobey

⁍ Tech Lead, Compute and Data Services at Ooyala, Inc.
⁍ C&D team is #devops: 3 ops, 3 eng, me
⁍ C&D team is #bdaas: Big Data as a Service
⁍ ~100 Cassandra nodes, expanding quickly
⁍ Obligatory: we’re hiring

Page 4

Ooyala

⁍ Founded in 2007
⁍ 230+ employees globally
⁍ 200M unique users, 110+ countries
⁍ Over 1 billion videos played per month
⁍ Over 2 billion analytic events per day

Page 5

Ooyala & Cassandra

Ooyala has been using Cassandra since v0.4
Use cases:
⁍ Analytics data (real-time and batch)
⁍ Highly available K/V store
⁍ Time series data
⁍ Play head tracking (cross-device resume)
⁍ Machine Learning Data

Page 6

Ooyala: Legacy Platform

[Diagram: API, loggers and players (marked "START HERE"); ABE Service; S3; a Hadoop cluster; Cassandra clusters; annotated "read-modify-write"]

Page 7

Avoiding read-modify-write

cassandra13_drinks column family
memTable:
Albert Tuesday 6 Wednesday 0
Evan Tuesday 0 Wednesday 0
Frank Tuesday 3 Wednesday 3
Kelvin Tuesday 0 Wednesday 0
Krzysztof Tuesday 0 Wednesday 0
Phillip Tuesday 12 Wednesday 0

Page 8

Avoiding read-modify-write

cassandra13_drinks column family
memTable:
Al Tuesday 2 Wednesday 0
Phillip Tuesday 0 Wednesday 1
ssTable:
Albert Tuesday 6 Wednesday 0
Evan Tuesday 0 Wednesday 0
Frank Tuesday 3 Wednesday 3
Kelvin Tuesday 0 Wednesday 0
Krzysztof Tuesday 0 Wednesday 0
Phillip Tuesday 12 Wednesday 0

Page 9

Avoiding read-modify-write

cassandra13_drinks column family
memTable:
Albert Tuesday 22 Wednesday 0
ssTable:
Albert Tuesday 2 Wednesday 0
Phillip Tuesday 0 Wednesday 1
ssTable:
Albert Tuesday 6 Wednesday 0
Evan Tuesday 0 Wednesday 0
Frank Tuesday 3 Wednesday 3
Kelvin Tuesday 0 Wednesday 0
Krzysztof Tuesday 0 Wednesday 0
Phillip Tuesday 12 Wednesday 0

Page 10

Avoiding read-modify-write

cassandra13_drinks column family
ssTable:
Albert Tuesday 22 Wednesday 0
Evan Tuesday 0 Wednesday 0
Frank Tuesday 3 Wednesday 3
Kelvin Tuesday 0 Wednesday 0
Krzysztof Tuesday 0 Wednesday 0
Phillip Tuesday 0 Wednesday 1
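The four slides above show why updates can be blind writes: new cells land in the memtable, older versions stay in SSTables, and the newest cell wins when they are merged. A minimal sketch of that last-write-wins reconciliation, assuming a made-up flat format of one cell per line ("row column value timestamp") as a stand-in for memtable and SSTable contents. It ignores tombstones and TTLs, and breaks timestamp ties by input order, which is a simplification rather than Cassandra's own tie-break rule.

```bash
# Last-write-wins merge over flattened cells, one per line:
#   row column value timestamp
# The flat-file format is hypothetical, standing in for memtable/SSTable
# contents. Tombstones and TTLs are ignored.
lww_merge() {
  awk '
    {
      key = $1 SUBSEP $2
      t = $4 + 0
      # keep the newest cell; on a timestamp tie the later line wins here
      if (!(key in ts) || t >= ts[key]) { ts[key] = t; val[key] = $3 }
    }
    END {
      for (key in val) {
        split(key, part, SUBSEP)
        print part[1], part[2], val[key]
      }
    }' "$@" | LC_ALL=C sort
}
```

Usage: `cat old_sstable.txt new_sstable.txt memtable.txt | lww_merge` (file names are placeholders). Because the merge only compares timestamps, the writer never has to read the old value first.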

Page 11

2011: 0.6 ➜ 0.8

⁍ Migration is still a largely unsolved problem
⁍ Wrote a tool in Scala to scrub data and write via Thrift
⁍ Rebuilt indexes - faster than copying

[Diagram: hadoop, cassandra, GlusterFS P2P, cassandra; labels: Thrift, Scala Map/Reduce]

Page 12

Changes: 0.6 ➜ 0.8

⁍ Cassandra 0.8
⁍ 24GiB heap
⁍ Sun Java 1.6 update
⁍ Linux 2.6.36
⁍ XFS on MD RAID5
⁍ Disabled swap or at least vm.swappiness=1

Page 13

2012: Capacity Increase

⁍ 18 nodes ➜ 36 nodes
⁍ DSE 3.0
⁍ Stale tombstones again!
⁍ No downtime!

[Diagram: cassandra, GlusterFS P2P, DSE 3.0; labels: Thrift, Scala Map/Reduce]

Page 14

System Changes: Apache 1.0 ➜ DSE 3.0

⁍ DSE 3.0 installed via apt packages
⁍ Unchanged: heap, distro
⁍ Ran much faster this time!
⁍ Mistake: Moved to MD RAID 0
  Fix: RAID10 or RAID5, MD, ZFS, or btrfs
⁍ Mistake: Running on Ubuntu Lucid
  Fix: Ubuntu Precise

Page 15

Config Changes: Apache 1.0 ➜ DSE 3.0

⁍ Schema: compaction_strategy = LCS
⁍ Schema: bloom_filter_fp_chance = 0.1
⁍ Schema: sstable_size_in_mb = 256
⁍ Schema: compression_options = Snappy
⁍ YAML: compaction_throughput_mb_per_sec: 0
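The schema settings above can be expressed as a single CQL statement. A sketch that prints it for one table, assuming CQL3 option names as used around Cassandra 1.2 (older DSE 3.0 clusters may have needed cassandra-cli instead, and later versions spell some options differently); the table name is a placeholder.

```bash
# Print an ALTER TABLE statement applying LCS with 256 MB SSTables,
# bloom_filter_fp_chance 0.1 and Snappy compression to one table.
# Option names assume CQL3 circa Cassandra 1.2.
lcs_schema_cql() {
  local table=$1   # e.g. my_keyspace.my_table (placeholder)
  cat <<EOF
ALTER TABLE ${table}
  WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 256}
  AND bloom_filter_fp_chance = 0.1
  AND compression = {'sstable_compression': 'SnappyCompressor'};
EOF
}
```

Usage: `lcs_schema_cql my_keyspace.my_table | cqlsh`. The YAML setting `compaction_throughput_mb_per_sec: 0` (no throttling) can also be applied to a running node with `nodetool setcompactionthroughput 0`.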

Page 16

2013: Datacenter Move

⁍ 36 nodes ➜ lots more nodes
⁍ As usual, no downtime!

[Diagram: DSE 3.1 and DSE 3.1, labeled "replication"]

Page 17

Coming Soon for Cassandra at Ooyala

Upcoming use cases:
⁍ Store every event from our players at full resolution
⁍ Cache code for our Spark job server
⁍ AMPLab Tachyon backend?

Page 18

Next Generation Architecture: Ooyala Event Store

[Diagram: API, loggers, players; kafka; ingest; spark; job server; DSE 3.1; Tachyon?]

Page 19

There’s more to tuning than performance:

⁍ Security
⁍ Cost of Goods Sold
⁍ Operations / support
⁍ Developer happiness
⁍ Physical capacity (cpu/memory/network/disk)
⁍ Reliability / Resilience
⁍ Compromise

Page 20

I am not a scientist ... heuristician?

⁍ I’d love to be more scientific, but production comes first
⁍ Sometimes you have to make educated guesses
⁍ It’s not as difficult as it’s made out to be
⁍ Your brain is great at heuristics. Trust it.
⁍ Concentrate on bottlenecks
⁍ Make incremental changes
⁍ Read Malcolm Gladwell’s “Blink”

Page 21

The OODA Loop

Observe, Orient, Decide, Act:
⁍ Observe the system in production under load
⁍ Make small, safe changes
⁍ Observe
⁍ Commit or Revert
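One pass of the loop can be sketched as a function: measure, apply one small change, let it settle, measure again, then keep or revert. This assumes an integer, lower-is-better metric (for example p99 read latency in microseconds); the measure, apply and revert commands are hypothetical hooks the caller supplies, not part of any Cassandra tool.

```bash
# One OODA iteration for an integer, lower-is-better metric.
# $1 measure: prints the current metric; $2 apply: makes one change;
# $3 revert: undoes it. All three are caller-supplied (hypothetical) hooks.
ooda_step() {
  local measure=$1 apply=$2 revert=$3
  local before after
  before=$("$measure") || return 1     # Observe
  "$apply" || return 1                 # Act: one small, safe change
  sleep "${SETTLE_SECS:-0}"            # let production load settle
  after=$("$measure") || return 1      # Observe again
  if (( after <= before )); then       # Decide: commit or revert
    echo "commit: $before -> $after"
  else
    "$revert"
    echo "revert: $before -> $after"
  fi
}
```

In practice SETTLE_SECS would be minutes rather than zero, so that the second observation reflects steady-state production load.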

Page 22

Testing Shiny Things

⁍ Like kernels
⁍ And Linux distributions
⁍ And ZFS
⁍ And btrfs
⁍ And JVMs & parameters
⁍ Test them in production!

Page 23

Testing Shiny Things: In Production

[Diagram: ext4, ext4, ext4, ZFS, ext4, kernel upgrade, ext4, btrfs]

Page 24

Brendan Gregg’s Tool Chart

http://joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x

Page 25

dstat -lrvn 10

Page 26

cl-netstat.pl

https://github.com/tobert/perl-ssh-tools

Page 27

iostat -x 1
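When scanning `iostat -x` across many disks, a small filter helps spot saturated devices. A sketch that locates the `%util` column from the header (column order differs between sysstat versions) and prints devices above a threshold; the default of 80 is an arbitrary example, not a recommendation from the talk.

```bash
# Print "device %util" for devices busier than a threshold in iostat -x
# output. The %util column is found by name in each "Device" header line.
busy_devices() {
  local threshold=${1:-80}
  awk -v t="$threshold" '
    $1 ~ /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
    col && NF >= col && ($col + 0) > (t + 0) { print $1, $col }'
}
```

Usage: `iostat -x 1 5 | busy_devices 80`. The first report covers the time since boot, so the later intervals are the ones that reflect current load.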

Page 28

htop

Page 29

jconsole

Page 30

opscenter

Page 31

nodetool ring

10.10.10.10  Analytics  rack1  Up  Normal   47.73 MB   1.72%  101204669472175663702469172037896580098
10.10.10.10  Analytics  rack1  Up  Normal   63.94 MB   0.86%  102671403812352122596707855690619718940
10.10.10.10  Analytics  rack1  Up  Normal   85.73 MB   0.86%  104138138152528581490946539343342857782
10.10.10.10  Analytics  rack1  Up  Normal   47.87 MB   0.86%  105604872492705040385185222996065996624
10.10.10.10  Analytics  rack1  Up  Normal   39.73 MB   0.86%  107071606832881499279423906648789135466
10.10.10.10  Analytics  rack1  Up  Normal   40.74 MB   1.75%  110042394566257506011458285920000334950
10.10.10.10  Analytics  rack1  Up  Normal   40.08 MB   2.20%  113781420866907675791616368030579466301
10.10.10.10  Analytics  rack1  Up  Normal   56.19 MB   3.45%  119650151395618797017962053073524524487
10.10.10.10  Analytics  rack1  Up  Normal  214.88 MB  11.62%  139424886777089715561324792149872061049
10.10.10.10  Analytics  rack1  Up  Normal  214.29 MB   2.45%  143588210871399618110700028431440799305
10.10.10.10  Analytics  rack1  Up  Normal  158.49 MB   1.76%  146577368624928021690175250344904436129
10.10.10.10  Analytics  rack1  Up  Normal    40.3 MB   0.92%  148140168357822348318107048925037023042
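In the ring above one token range owns 11.62% while most own 0.86%, and its load is correspondingly higher. A sketch that summarizes ownership spread from `nodetool ring` output; it finds the Owns value as the first field ending in "%", an assumption that avoids depending on exact column positions, which changed between versions.

```bash
# Summarise ownership skew from `nodetool ring` lines on stdin: node count,
# smallest and largest "Owns" percentage, and their ratio.
ring_skew() {
  awk '
    {
      for (i = 1; i <= NF; i++) if ($i ~ /^[0-9.]+%$/) {
        own = substr($i, 1, length($i) - 1) + 0
        n++
        if (n == 1 || own > max) max = own
        if (n == 1 || own < min) min = own
        break
      }
    }
    END {
      if (n == 0) exit 1
      printf "nodes=%d min=%.2f%% max=%.2f%% ratio=%.1f\n", n, min, max, (min > 0 ? max / min : 0)
    }'
}
```

Usage: `nodetool ring | ring_skew`. Header lines carry no percentage and are skipped.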

Page 32

nodetool cfstats

Keyspace: gostress
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Tasks: 0
        Column Family: stressful
        SSTable count: 1
        Space used (live): 32981239
        Space used (total): 32981239
        Number of Keys (estimate): 128
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Bloom Filter False Positives: 0
        Bloom Filter False Ratio: 0.00000
        Bloom Filter Space Used: 336
        Compacted row minimum size: 7007507
        Compacted row maximum size: 8409007
        Compacted row mean size: 8409007

Could be using a lot of heap

Controllable by sstable_size_in_mb
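On a node with many column families the full `cfstats` dump is long. A sketch that reduces it to one line per column family with the SSTable count and the compacted row maximum size in bytes, two of the numbers worth watching on this slide. It matches the label text of this era's output; later releases renamed some of these labels, so the patterns would need adjusting.

```bash
# One line per column family from `nodetool cfstats` on stdin:
#   keyspace.cf  sstable_count  compacted_row_max_bytes
cf_summary() {
  awk -F': ' '
    { sub(/^[ \t]+/, "") }             # drop indentation, re-split on ": "
    $1 == "Keyspace"                   { ks = $2 }
    $1 == "Column Family"              { cf = $2 }
    $1 == "SSTable count"              { ss = $2 }
    $1 == "Compacted row maximum size" { print ks "." cf, ss, $2 }'
}
```

Usage: `nodetool cfstats | cf_summary | sort -k3 -n` lists the column families with the widest rows last.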

Page 33

nodetool proxyhistograms

Offset  Read Latency  Write Latency  Range Latency
    35             0             20              0
    42             0             61              0
    50             0             82              0
    60             0            440              0
    72             0           3416              0
    86             0          17910              0
   103             0          48675              0
   124             1          97423              0
   149             0         153109              0
   179             2         186205              0
   215             5         139022              0
   258           134          44058              0
   310          2656          60660              0
   372         34698         742684              0
   446        469515        7359351              0
   535       3920391       31030588              0
   642       9852708       33070248              0
   770       4487796        9719615              0
   924        651959         984889              0

Page 34

nodetool compactionstats

al@node ~ $ nodetool compactionstats
pending tasks: 3
compaction type  keyspace  column family    bytes compacted  bytes total  progress
     Compaction  hastur    gauge_archive         9819749801  16922291634    58.03%
     Compaction  hastur    counter_archive      12141850720  16147440484    75.19%
     Compaction  hastur    mark_archive           647389841   1475432590    43.88%
Active compaction remaining time : n/a
al@node ~ $ nodetool compactionstats
pending tasks: 3
compaction type  keyspace  column family    bytes compacted  bytes total  progress
     Compaction  hastur    gauge_archive        10239806890  16922291634    60.51%
     Compaction  hastur    counter_archive      12544404397  16147440484    77.69%
     Compaction  hastur    mark_archive          1107897093   1475432590    75.09%
Active compaction remaining time : n/a
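When the tool reports "remaining time : n/a", two samples like the ones above are enough for a rough estimate. A sketch that assumes a steady compaction rate and that the caller knows how many seconds separated the samples (the slide does not say).

```bash
# Estimate seconds remaining for one compaction from two samples of
# "bytes compacted" taken INTERVAL seconds apart. Assumes a steady rate.
# Usage: compaction_eta BEFORE AFTER TOTAL INTERVAL
compaction_eta() {
  local before=$1 after=$2 total=$3 interval=$4
  local delta=$(( after - before ))
  if (( delta <= 0 )); then
    echo "n/a"      # no progress between samples: no estimate possible
    return 1
  fi
  echo $(( (total - after) * interval / delta ))
}
```

The inputs are the "bytes compacted" column from each sample and the "bytes total" column for the same column family.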

Page 35

Stress Testing Tools

⁍ cassandra-stress
⁍ YCSB
⁍ Production
⁍ Terasort (DSE)
⁍ Homegrown

Page 36

/etc/sysctl.conf

kernel.pid_max = 999999
fs.file-max = 1048576
vm.max_map_count = 1048576
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 65536 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
vm.dirty_ratio = 10
vm.dirty_background_ratio = 2
vm.swappiness = 1
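Settings in /etc/sysctl.conf only take effect once loaded (for example with `sysctl -p`), so it is worth checking a node for drift. A sketch that compares each key in a sysctl.conf-style file against the live value under /proc/sys, where dots in the key become path separators. The PROC_SYS override is an assumption added so the function can be tried against a scratch directory.

```bash
# Report keys whose live value differs from a sysctl.conf-style file.
# Live values come from /proc/sys (or $PROC_SYS, for testing).
sysctl_drift() {
  local conf=$1 root=${PROC_SYS:-/proc/sys}
  local key want have path
  local -a words
  while IFS='=' read -r key want || [[ -n $key ]]; do
    key=${key//[[:space:]]/}                  # strip all whitespace
    [[ -z $key || $key == \#* ]] && continue  # skip blanks and comments
    read -r -a words <<< "$want"; want="${words[*]}"   # squeeze spaces
    path="$root/${key//./\/}"                 # vm.swappiness -> vm/swappiness
    if [[ -r $path ]]; then
      read -r -a words < "$path"; have="${words[*]}"   # tabs -> spaces
    else
      have="missing"
    fi
    [[ $have == "$want" ]] || printf '%s: want "%s", have "%s"\n' "$key" "$want" "$have"
  done < "$conf"
}
```

Usage: `sysctl_drift /etc/sysctl.conf` prints nothing when every setting is live.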

Page 37

/etc/rc.local

ra=$((2**14)) # 16k
# blockdev --setra counts 512-byte sectors regardless of the logical
# sector size reported by blockdev --getss, so divide by 512
blockdev --setra $((ra / 512)) /dev/sda

echo 256 > /sys/block/sda/queue/nr_requests
echo cfq > /sys/block/sda/queue/scheduler
echo 16384 > /sys/block/md7/md/stripe_cache_size
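To confirm what rc.local actually set, sysfs exposes each device's readahead in KiB at /sys/block/DEVICE/queue/read_ahead_kb. A sketch that lists it for every device alongside the equivalent 512-byte sector count used by `blockdev --setra`; the SYS_BLOCK override is an assumption added for testing against a scratch directory.

```bash
# List readahead per block device from sysfs, in KiB and in the 512-byte
# sectors that blockdev --setra/--getra use (1 KiB = 2 sectors).
show_readahead() {
  local root=${SYS_BLOCK:-/sys/block} f dev kb
  for f in "$root"/*/queue/read_ahead_kb; do
    [[ -r $f ]] || continue          # glob matched nothing
    dev=${f#"$root"/}; dev=${dev%%/*}
    kb=$(<"$f")
    printf '%s %s KiB (%s sectors for blockdev --setra)\n' "$dev" "$kb" "$((kb * 2))"
  done
}
```

With the 16k setting above, sda should report 16 KiB, i.e. 32 sectors.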

Page 38

JVM Args

-Xmx8G     leave it alone
-Xms8G     leave it alone
-Xmn1200M  100MiB * nCPU
-Xss180k   should be fine

-XX:+UseNUMA
numactl --interleave
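The -Xmn rule of thumb (100 MiB of young generation per CPU) is easy to compute per host, which is the same rule of thumb that cassandra-env.sh suggests for HEAP_NEWSIZE. A minimal sketch; note that `nproc` counts logical CPUs, so on hyperthreaded machines the physical core count may be the better input.

```bash
# Print an -Xmn flag sized at 100 MiB per CPU.
# Defaults to nproc (logical CPUs); pass a core count to override.
young_gen_opt() {
  local ncpu=${1:-$(nproc)}
  printf -- '-Xmn%dM\n' "$(( ncpu * 100 ))"
}
```

For example, `young_gen_opt 12` prints `-Xmn1200M`, matching the slide. When launching under numactl, the interleave option takes a node list, as in `numactl --interleave=all`.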

Page 39

cgroups

Provides fine-grained control over Linux resources
⁍ Makes the Linux scheduler better
⁍ Lets you manage systems under extreme load
⁍ Useful on all Linux machines
⁍ Can choose between determinism and flexibility

Page 40

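cgroups

# Quote 'EOF' so $cpucg and $$ are written literally and expand when the init
# script sources /etc/default/cassandra; with a bare EOF they would expand
# (to an empty string and to this shell's PID) at write time.
# Assumes the cpu and cpuset controllers are co-mounted at /sys/fs/cgroup/cpu.
cat >> /etc/default/cassandra <<'EOF'
cpucg=/sys/fs/cgroup/cpu/cassandra
mkdir -p $cpucg
cat $cpucg/../cpuset.mems >$cpucg/cpuset.mems
cat $cpucg/../cpuset.cpus >$cpucg/cpuset.cpus
echo 100 > $cpucg/cpu.shares
echo $$ > $cpucg/tasks
EOF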

Page 41

Successful Experiment: btrfs

mkfs.btrfs -m raid10 -d raid0 /dev/sd[c-h]1
mount -o compress=lzo /dev/sdc1 /data

Page 42

Successful Experiment: ZFS on Linux

zpool create data raidz /dev/sd[c-h]
zfs create data/cassandra
zfs set compression=lzjb data/cassandra
zfs set atime=off data/cassandra
zfs set logbias=throughput data/cassandra

Page 43

Conclusions

⁍ Tuning is multi-dimensional
⁍ Production load is your most important benchmark
⁍ Lean on Cassandra, experiment!
⁍ No one metric tells the whole story

Page 44

Questions?

⁍ Twitter: @AlTobey
⁍ Github: https://github.com/tobert
⁍ Email: [email protected] / [email protected]
