Cassandra Summit 2014: Cassandra at Instagram
DESCRIPTION
Presenter: Rick Branson, Infrastructure Engineer at Instagram. As Instagram has scaled to over 200 million users, so has our use of Cassandra. We've built new features and rebuilt old ones on Cassandra, and it has become a mission-critical foundation of our production infrastructure. Rick will deliver a refresh of our use cases and go deep on the technical challenges we faced during our expansion.
TRANSCRIPT
CASSANDRA @ INSTAGRAM 2014
Rick Branson, Software Engineer, Backend Engineering
@rbranson Cassandra Summit
September 11th, 2014
2013 in a Nutshell
• Moving some log-style data from Redis to Cassandra
• Saving $, better availability, more elasticity
• Some problems, but we worked through them
Overview
• "The Trouble with Redis"
• A whole new category of functionality
• Useful consumer advice
"THE TROUBLE WITH REDIS"
REDIS: THE GÜD PARTS
The Network Part: It's a heap on the network with a great text-based protocol.
Extremely Efficient: It does a lot with every ounce of CPU.
Stable: Very much so. Sets the bar for the vast NoSQL space.
Useful, Rich Types: You won't find them anywhere else.
REDIS: ZDARKSIDE
What's in there? Hello?
Snapshots (BGSAVE)
Single-Threaded (!)
Memory Allocator
(Lack of) Durability
Unsafe Replication
THE PHILOSOPHY
CACHE OR DATABASE? Neither.
Memory Cliff
Unlikely to Change :(
ZOOKEEPER???
ZOZO: *VERY* few writes, extremely high volume, low-latency reads without hotspots.
[Diagram: a Writer issues UPDATE /zozo/files/knobs against the ZooKeeper Ensemble; the zozo agent WATCHes /zozo/files and mirrors the data into /var/cache/zozo, which local Consumer Processes read.]
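The agent's job reduces to mirroring a ZooKeeper subtree into a local file that consumer processes read, so reads never touch the network. A minimal sketch of the local-cache half of that pattern (the ZooKeeper WATCH is omitted; the path and knob names here are invented):

```python
import json
import os
import tempfile

CACHE_PATH = "/tmp/zozo-demo/knobs.cache"  # stands in for /var/cache/zozo

def publish(data: dict) -> None:
    """Write the cache atomically: consumers see either the old file or the
    new one, never a partial write (write-to-temp, then rename)."""
    os.makedirs(os.path.dirname(CACHE_PATH), exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(CACHE_PATH))
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
    os.rename(tmp, CACHE_PATH)  # atomic replace on POSIX filesystems

def read_knobs() -> dict:
    """Consumer side: a plain local file read, no network round trip."""
    with open(CACHE_PATH) as f:
        return json.load(f)

publish({"feed_ranking_enabled": 1, "max_upload_mb": 15})
print(read_knobs()["max_upload_mb"])  # -> 15
```

The atomic rename is what makes the agent "dead simple": a crash mid-write leaves the old cache intact.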
Examples
• "Knobs" key/integer live configuration
• Small (<250K) User Lists
• Security Data Sets (i.e. Blocked IPs)
JSON sucks.
Stay tuned.
zozo postmortem
• Agent is dead simple, never needs updating
• ZooKeeper resists both machine & human errors
• Multi-region strategy coming
REDIS2CASSANDRA (omg, finally)
Some things we moved
• Ad Serving Metadata
• Account Suggestion Metadata
• Churn Tracking
• Follow Requests
Not a drop-in.
"User A has requested to follow User B" (A = source, B = target). One logical edge, written as two rows:

Outbound Requests:
Key  Columns
A    B ...

Inbound Requests:
Key  Columns
B    A ...
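Because the edge lives in two partitions, every request mutates both rows together. A toy in-memory model of the two tables (not the production schema; in Cassandra the two writes would ship in one batch):

```python
# Toy model of the two wide rows backing follow requests.
outbound = {}  # source user -> set of targets they asked to follow
inbound = {}   # target user -> set of sources who asked to follow them

def request_follow(source: int, target: int) -> None:
    """One logical edge, two row mutations (batched together in production)."""
    outbound.setdefault(source, set()).add(target)
    inbound.setdefault(target, set()).add(source)

def resolve(source: int, target: int) -> None:
    """Approving or denying removes the edge from both rows."""
    outbound.get(source, set()).discard(target)
    inbound.get(target, set()).discard(source)

request_follow(1, 2)   # user 1 asks to follow user 2
assert 2 in outbound[1] and 1 in inbound[2]
resolve(1, 2)
assert 1 not in inbound[2]
```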
INSTAGRAM DIRECT
Y U CASSANDRA?
• Elasticity + Scalability
• Write Availability
• Storage Efficiency
• Good Data Model Fit
WE CHOSE THRIFT.
fan-out-on-write
• Caching doesn't work
• Copy of full data for each recipient
• Atomic batches super useful
#nocache
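Fan-out-on-write means every recipient's inbox row gets its own full copy of the post, and the copies are written together (atomic batches in Cassandra keep them consistent). A rough in-memory sketch of the write path (all names invented):

```python
# user_id -> list of (post_id, column_type, payload); one wide row per user
inboxes = {}

def deliver(post_id: int, header: dict, recipients: list) -> None:
    """Write a full copy of the post header into each recipient's inbox row.
    In production these writes go out as one atomic batch mutation."""
    for user_id in recipients:
        inboxes.setdefault(user_id, []).append((post_id, "POST_HEADER", header))

deliver(1234, {"creator_id": 12938, "media_type": 1}, [28910, 208192])
assert len(inboxes[28910]) == 1 and len(inboxes[208192]) == 1
```

Storage is traded for read speed: each inbox read hits a single row, which is why caching becomes unnecessary (#nocache).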
CompositeType(IntegerType(reversed=true), IntegerType, IntegerType)
  1st component: postid (reversed, so newest sorts first)
  2nd component: typecode
  3rd component: entryid
POST_HEADER => creator_id: 12938, media_type: 1, asset_path: "/objects/23904801...", recipients: [28910, 208192]
(SEEN, 28910), (SEEN, 208192), (LIKE, 208192)
(COMMENT, 9302...) => [19238, "George wishes he ha..."]
(COMMENT, 3499...) => [28910, "Weird"]
IMMUTABLE (mostly) LOG
FILTER-ON-READ
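Nothing in the log is updated in place: reads walk the newest-first columns and filter out whatever the viewer shouldn't see. A sketch of that read path, emulating the reversed composite comparator with a sort key (typecodes and helper names are made up):

```python
# Columns keyed by (postid, typecode, entryid); the reversed first component
# of the comparator is emulated by sorting on -postid (newest post first).
POST, SEEN, EXCLUDE = 0, 1, 2  # invented typecodes

def read_inbox(columns, viewer, limit=10):
    ordered = sorted(columns, key=lambda c: (-c[0], c[1], c[2]))
    excluded = {c[0] for c in ordered if c[1] == EXCLUDE and c[2] == viewer}
    out = []
    for postid, typecode, entryid in ordered:
        if postid in excluded or typecode == EXCLUDE:
            continue  # filter-on-read: skip posts this viewer has hidden
        out.append((postid, typecode, entryid))
        if len(out) == limit:
            break
    return out

cols = [(1010, POST, 0), (1234, POST, 0), (1234, SEEN, 28910),
        (1234, EXCLUDE, 555)]
assert read_inbox(cols, viewer=555) == [(1010, POST, 0)]
assert read_inbox(cols, viewer=1)[0] == (1234, POST, 0)
```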
CQL3, please. no? ok. :(
THE PENDING QUEUE
Key  Columns
B    [A's PostID] ...
C    [A's PostID] ...
D    [A's PostID] ...
[Diagram: sender A fans a post out to recipients B, C, and D; each recipient's pending row holds A's PostID until that recipient's first read, which deletes the pending entry.]
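On a recipient's first read the pending row is drained: its post IDs are handed back and the pending entries deleted. A minimal sketch of that read-then-delete step (in-memory stand-ins, not the real schema):

```python
pending = {"B": [1234, 1010]}  # recipient -> post IDs awaiting first read

def drain_pending(user):
    """First read: return the pending post IDs and delete the row,
    so subsequent reads skip the pending-queue work entirely."""
    return pending.pop(user, [])

assert drain_pending("B") == [1234, 1010]
assert drain_pending("B") == []  # second read sees nothing pending
```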
2000000: (POINTER, 1000000)
1500000: (POST_HEADER, 0) => ..., (SEEN, 902394), (SEEN, 94083), (LIKE, 902394)
1000000: (POST_HEADER, 0) => ..., (BUMPED, 0) => 2000000, (SEEN, 1298333), (SEEN, 848183), ...
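This is how a thread moves to the top of the inbox without rewriting its history: a new position carries a POINTER back to the original post, and the original gains a BUMPED column pointing forward, so the reader follows one and skips the other. A rough sketch of resolving that on read (POINTER/BUMPED come from the slide; the code itself is invented):

```python
# position -> columns for one inbox row; POINTER/BUMPED modeled as dict keys
row = {
    2000000: {"POINTER": 1000000},
    1500000: {"POST_HEADER": "...", "SEEN": [902394, 94083]},
    1000000: {"POST_HEADER": "...", "BUMPED": 2000000, "SEEN": [1298333]},
}

def resolve(row):
    """Walk positions newest-first, follow POINTERs, skip BUMPED originals."""
    out = []
    for pos in sorted(row, reverse=True):
        cols = row[pos]
        if "POINTER" in cols:
            cols = row[cols["POINTER"]]  # bumped post lives at its old position
        elif "BUMPED" in cols:
            continue  # already surfaced via its POINTER higher up
        out.append(cols["POST_HEADER"])
    return out

assert len(resolve(row)) == 2  # the bumped post appears once, at the top
```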
Justin's Pending: 25000, 24000, 23000, 22000, 21000, 20000, 19000, ..., 1234
Justin's Inbox
...
(1234, POST..)
(1234, LIKE..)
(1234, SEEN..)
(1234, SEEN..)
(1010, POST..)
(1010, SEEN..)
(1010, COMM..)
...
(1234, EXCLUDE, <JUSTIN>)
Rick's Unseen: 849300, 738939, 649019, 641192, 629012, 610392, 483201, 348931
Unseen count: 8
DEPLOYMENT
• 60 x hi1.4xlarge EC2 instances
• 2X our highest estimates
• Shrunk cluster after we knew load
• Peace of mind
• Data models held together
We ran C* in production. We learned some things.
young gc fun!
Young GC has problems
• Reads = young gen garbage
• Double-collection bug in JDK 1.7+
• Best practices were for nodes with more writes
now we run: 10G Young Gen 20G Heap
<blink>DEAR GOD
ALMIGHTY!!!</blink>
This won't work out of the box.
-XX:+CMSScavengeBeforeRemark
-XX:CMSMaxAbortablePrecleanTime=60000
-XX:CMSWaitDuration=30000
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSParallelInitialMarkEnabled
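These flags would typically be set alongside the heap sizing above in cassandra-env.sh; a hedged sketch of what that might look like (exact file layout and variable names vary by Cassandra version):

```shell
# Sketch: conf/cassandra-env.sh, Cassandra 1.x/2.x era
MAX_HEAP_SIZE="20G"
HEAP_NEWSIZE="10G"   # unusually large young gen for this read-heavy workload

JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=30000"
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
```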
20-core Ivy Bridge, 144GB RAM, PCIe Flash
15,000 reads/sec, 2,000 writes/sec
TWO ISSUES
• Counters. Don't use them.
• Hundreds of thousands / millions of tombstones.
[Diagram: Async Storage Port Handler — a Client's READ 12349081 lands on the RPC Thread Pool, passes through a queue to the ReadStage Thread Pool, which issues GET 12349081 against storage.]
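The read path is staged: one pool accepts RPCs and a queue hands the actual storage read to a separate pool. A toy sketch of that two-stage (SEDA-style) handoff using stdlib executors, whose internal work queues play the role of the queues in the diagram (all names invented):

```python
from concurrent.futures import ThreadPoolExecutor

store = {12349081: b"post-header-bytes"}   # pretend storage layer

rpc_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="rpc")
read_stage = ThreadPoolExecutor(max_workers=4, thread_name_prefix="read")

def handle_read(key):
    """RPC stage: parse the request, then queue the GET on the ReadStage
    pool and wait for its result."""
    return read_stage.submit(store.get, key).result()

fut = rpc_pool.submit(handle_read, 12349081)
assert fut.result() == b"post-header-bytes"
```

Decoupling the stages means a slow storage read backs up the ReadStage queue without tying up the threads that accept new RPCs.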
THANK YOU.