C* Summit 2013 - Hindsight is 20/20: MySQL to Cassandra by Michael Kjellman

Hindsight is 20/20: MySQL to Cassandra Michael Kjellman (@mkjellman) Barracuda Networks #cassandra13

Upload: planet-cassandra

Post on 05-Dec-2014

DESCRIPTION

Abstract: A brief introduction to how Barracuda Networks uses Cassandra and how it is replacing its MySQL infrastructure with Cassandra. This presentation covers the lessons learned along the way during the migration.

TRANSCRIPT

Page 1: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Hindsight is 20/20: MySQL to Cassandra

Michael Kjellman (@mkjellman)
Barracuda Networks

#cassandra13

Page 2: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

What I Do

• Build and maintain “real-time” spam detection and web filter classification
• Java/Perl/C (and bits of everything else)
• Author of perlcassa (Perl C* client)
• Frontend? Backend? Customer? Internal? Broken RAID card? Bad disk? I touch it all.

Page 3: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Our C* Cluster

• In production for ~2 years, since 0.8
• Running 1.2.5 + minor patches
• 24 nodes in 2 datacenters
• (2) 2TB hard drives (no RAID)
• (1) Small SSD for small hot CFs
• 64GB of RAM
• Puppet for management
• Cobbler for deployment
• Target max load at 600GB per node

Page 4: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

What is “real-time” exactly?

Page 5: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Page 6: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Our Rewrite by the Numbers

                                              Cassandra Based   MySQL Based
Average Application Latency                   2.41ms            5.0ms
Elements in Database                          32,836,767        3,946,713
Elements Application Handles                  32,836,767        314,974
Element Seen Prior to Tracking                1st request       Various Thresholds
Datacenters                                   2                 1
Average Latency of Automated Classification   3 seconds         8 minutes

Page 7: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Should you Rewrite?

• How To Survive a Ground-Up Rewrite Without Losing Your Sanity[1] – Joel Spolsky

• Past engineering decisions preventing implementation of new business requirements

• New threats smarter and more targeted

[1] http://onstartups.com/tabid/3339/bid/97052/How-To-Survive-a-Ground-Up-Rewrite-Without-Losing-Your-Sanity.aspx

Page 8: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Evolving Legacy Systems

• Even good developers can write sloppy code

• Too much duct tape
  – Most layers applied around the database

Page 9: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Hitting the Reset Button

• Plan for continuous failure
• Easily scalable
• No single point of failure – that you know of
• Many smaller boxes vs. one monolithic box

Page 10: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Whiteboard to Reality

• Get technical buy-in from all parties
• Migrate and rewrite in stages
  – Business requirements forced a hybrid period with the old and new systems operated in parallel

Page 11: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Page 12: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Cassandra is Not…

1. Direct MySQL replacement
2. Magic bullet to solve everything

Page 13: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Migrating

• Painful
• Painful
• Painful
• Tons of rewriting
• Tons of regressions
• Did I say painful?

Page 14: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

So Why Migrate?

• C* is the best option for the persistence tier
• Business success motivation
• Don’t let your database hold you back

Page 15: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Lessons Learned (the good)

• Carefully defining the data model up front
• Creating a flexible systems architecture that adapts well to changes during implementation
• Seriously – “Measure twice, cut once.”

Page 16: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Lessons Learned (the bad)

• Consider migration and delivery requirements from the very beginning

• Adjust expectations – we didn’t expect to rely on legacy systems for so long

• Make syncing data between systems a priority

Page 17: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Tips

1. Define requirements early
2. Start with the queries
3. Think differently regarding reads
4. Syncing and migrating data
5. Don’t use C* as a queue
6. Estimate capacity
7. Automate, Automate, Automate
8. Some maintenance required

Page 18: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

1. Define Requirements Early

• What kind of queries will your application make?

• Do you need ordered results for all of your rows?

• What is your read load? Write load?

Page 19: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

2. Start with the Queries

• C* != “#dontneedtothinkaboutmyschema”
• Counters and Composites
• Optimize for use case
  – Don’t be afraid of writes. Storage is cheap.
  – Optimize to reduce the number of tombstones
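
As a rough illustration of "start with the queries" and "don't be afraid of writes", the sketch below writes each record once per query pattern so that every read becomes a single lookup by key. It is not from the talk: the two dictionaries stand in for column families, and the names `verdicts_by_domain`, `domains_by_category` and `record_verdict` are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families; the names
# and fields are illustrative only and do not come from the talk.
verdicts_by_domain = defaultdict(list)   # query 1: all verdicts for a domain
domains_by_category = defaultdict(set)   # query 2: all domains in a category


def record_verdict(domain, category, seen_at):
    """Write once per query pattern instead of joining at read time."""
    verdicts_by_domain[domain].append((seen_at, category))
    domains_by_category[category].add(domain)


record_verdict("example.com", "news", 1)
record_verdict("example.org", "news", 2)
record_verdict("example.com", "sports", 3)

# Each read is now a single lookup by key.
print(verdicts_by_domain["example.com"])
print(sorted(domains_by_category["news"]))
```

The cost is an extra write per query pattern, which is the trade the slide suggests accepting.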

Page 20: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

3. Think Differently Regarding Reads

• Do you really need all that data at once?
• mysql> SELECT * FROM mysupercooltable WHERE foo = 'bar';
  – Slow, but eventually will work
• cqlsh> SELECT * FROM myreallybigcf WHERE foo = 'bar';
  – Won’t work. Expect RPC timeout exceptions on reads, generally after ~10,000 rows, even with paging
• Our solutions:
  – ElasticSearch
  – Hadoop/Pig
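
One way to picture "do you really need all that data at once?" is a reader that pulls bounded pages lazily and stops as soon as the caller has enough. This is only a sketch with a plain list standing in for a large column family; `fetch_page` and `iter_rows` are hypothetical helpers, not a Cassandra client API, and this is not a fix for the timeouts described above (large scans were moved to ElasticSearch and Hadoop/Pig).

```python
from itertools import islice


def fetch_page(rows, start, page_size):
    """Hypothetical stand-in for one bounded query against a large CF."""
    return rows[start:start + page_size]


def iter_rows(rows, page_size=100):
    """Yield rows lazily, one bounded page at a time."""
    start = 0
    while True:
        page = fetch_page(rows, start, page_size)
        if not page:
            return
        yield from page
        start += len(page)


big_cf = list(range(50_000))   # pretend this is myreallybigcf
first_250 = list(islice(iter_rows(big_cf), 250))
print(len(first_250))
```

Because the generator is lazy, only three pages are fetched here, not the whole data set.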

Page 21: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

4. Syncing and Migrating Data

• Sync and migration scripts – take them more seriously than production code

• Design sync to be continuous with both systems running in parallel during migration

• Prioritize the sync
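
A continuous sync can be thought of as a pass that is safe to run over and over while both systems are live. The sketch below assumes each row carries a version marker (an assumption, not something the talk specifies) and uses plain dictionaries in place of MySQL and Cassandra; `sync_once` is a hypothetical name.

```python
def sync_once(legacy, target):
    """Copy rows that are missing or stale in target; return keys copied.

    Both stores are plain dicts of key -> (version, value). The version
    field is an assumption: any monotonically increasing marker works.
    """
    copied = []
    for key, (version, value) in legacy.items():
        current = target.get(key)
        if current is None or current[0] < version:
            target[key] = (version, value)
            copied.append(key)
    return sorted(copied)


legacy_store = {"a": (1, "spam"), "b": (2, "ham"), "c": (5, "spam")}
new_store = {"a": (1, "spam"), "c": (3, "ham")}

print(sync_once(legacy_store, new_store))   # first pass copies what differs
print(sync_once(legacy_store, new_store))   # second pass has nothing to do
```

Making each pass idempotent is what lets the job run continuously during the hybrid period without special handling for restarts.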

Page 22: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

5. Don’t use C* as a Queue

• Cassandra anti-patterns: Queues and queue-like datasets[2] – Aleksey Yeschenko

• Tombstones + read performance
• Our solution:
  – Kafka (multiple publisher, multiple consumer durable queue)

[2] http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
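
The "tombstones + read performance" point can be illustrated with a toy model. This is a simplification under stated assumptions, not how Cassandra stores data: a queue is kept as one ordered list, a delete only marks the entry, and tombstone garbage collection and compaction are left out. `TombstoneQueue` is a hypothetical class.

```python
class TombstoneQueue:
    """Toy model of a queue stored as one ordered row.

    Deleting an entry does not remove it right away; it leaves a
    tombstone that later reads must still step over. Compaction and
    tombstone garbage collection are deliberately left out.
    """

    def __init__(self):
        self.cells = []   # list of [payload, deleted_flag]

    def push(self, payload):
        self.cells.append([payload, False])

    def pop(self):
        """Return (payload, cells_scanned) for the oldest live entry."""
        scanned = 0
        for cell in self.cells:
            scanned += 1
            if not cell[1]:
                cell[1] = True   # mark as deleted, keep the tombstone
                return cell[0], scanned
        return None, scanned


q = TombstoneQueue()
for i in range(1000):
    q.push(i)

scans = [q.pop()[1] for _ in range(1000)]
print(scans[0], scans[-1])
```

In this model the first pop touches one cell, while the last pop has to step over 999 tombstones before it finds a live entry, so read cost grows with the number of consumed messages.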

Page 23: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

6. Estimate Capacity

• Don’t forget the Java heap (8GB max)
• Plan capacity – today and future
• Stress tool – profile a node and multiply
• MySQL hardware != Cassandra hardware
• New bottlenecks thanks to C* being so awesome?
• I/O still an important concern with C*
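
A back-of-the-envelope version of "profile a node and multiply" might look like the following. The 600GB default comes from the per-node target mentioned earlier in the deck; the data set size and replication factor in the example are hypothetical inputs, and `nodes_needed` is an illustrative helper that ignores compaction headroom and growth.

```python
import math


def nodes_needed(dataset_gb, replication_factor, target_load_gb=600):
    """Estimate node count from total replicated data and a per-node cap."""
    total_gb = dataset_gb * replication_factor
    return math.ceil(total_gb / target_load_gb)


# Hypothetical inputs: 4 TB of unique data, replication factor 3.
print(nodes_needed(4000, 3))
```

Running the same calculation with projected data sizes gives a first estimate for the "future" half of the capacity plan.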

Page 24: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

7. Automate, Automate, Automate

• Love your inner Ops self. Distributed systems move complexity to operations.

• Puppet or something similar (really)
• Learn CCM earlier rather than later
  – www.github.com/pcmanus/ccm

Page 25: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

8. Some Maintenance Required

• Repairs & cleanup ops
  – Automate and run frequently
• Rolling restart, meet rolling repair
• Learn jconsole
• Solution:
  – Jolokia (JMX via HTTP)
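
A rolling repair can be automated by generating one repair command per node and running them one at a time. The sketch below only builds the command lines and does not execute anything; the host names are hypothetical, and the choice of `nodetool repair -pr` (primary range only) is an assumption about what suits a given cluster, not a detail from the talk.

```python
def rolling_repair_plan(hosts):
    """Return one nodetool repair command per host, in order."""
    # -pr repairs only each node's primary range, so running it on every
    # node in turn covers the ring once. Whether that suits a given
    # cluster is an assumption of this sketch.
    return [["nodetool", "-h", host, "repair", "-pr"] for host in hosts]


# Hypothetical host names.
for cmd in rolling_repair_plan(["cass01", "cass02", "cass03"]):
    print(" ".join(cmd))
```

A scheduler or configuration-management job could run these sequentially, waiting for each to finish before starting the next.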

Page 26: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Where is Barracuda Today?

• 2 years in production with Cassandra
• Definitely the right choice for our persistence tier
• 2 product lines on C* based system and another major product in beta
• Achieved “real-time” response

Page 27: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

2.0 and Beyond

• Thrift -> CQL
• CQL helps the MySQL to C* migration
  – Easier to comprehend / grasp
• Everyone understands SELECT * FROM cf WHERE key = 'foo';
• CAS and other 2.0 features make C* an even better replacement option for MySQL

Page 28: C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

C* Community

• Supercalifragilisticexpialidocious community!
• Riak, HBase, Oracle are other options. How is their dev community?
• Great client support. Great people. Great motivated developers.
• IRC: #cassandra on freenode
• Mailing List: [email protected]
