![Page 1: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/1.jpg)
Building an Activity Feed with Cassandra
Mark Dunphy, Software Engineer Behance/Adobe @dunphtastic
![Page 2: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/2.jpg)
Disclaimer
Not an operations person.
Will pretend to be one for the purpose of this talk.
![Page 3: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/3.jpg)
Quick Overview
What is the Behance Activity Feed?
![Page 4: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/4.jpg)
![Page 5: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/5.jpg)
• Actions
• Comments, Appreciations, Etc
• Entities
• Projects, Works in Progress
• Actors
• Users
![Page 6: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/6.jpg)
Project Entity
Actions taken by actors
![Page 7: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/7.jpg)
Activity Fan Out
![Page 8: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/8.jpg)
User A publishes a new project
Write to Follower A’s feed
Write to Follower B’s feed
Write to Follower C’s feed
Write to Follower D’s feed
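The fan-out on this slide can be sketched in a few lines of Python. This is only an illustration: the `followers` and `feeds` dictionaries and the `fan_out` function are hypothetical in-memory stand-ins, not the Behance implementation.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the follower graph and per-user feeds.
followers = {"user_a": ["follower_a", "follower_b", "follower_c", "follower_d"]}
feeds = defaultdict(list)


def fan_out(actor, verb, entity):
    """Copy one activity into the feed of every follower of `actor`."""
    activity = {"actor": actor, "verb": verb, "entity": entity}
    targets = followers.get(actor, [])
    for follower in targets:
        # One write per follower: this is the "fan out" cost of publishing.
        feeds[follower].append(activity)
    return len(targets)


written = fan_out("user_a", "published", "project_1")
print(written, "feed writes")
```

One published project turns into one write per follower, which is why write volume grows with the size of the actor's audience.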
![Page 9: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/9.jpg)
Now that that’s over…
![Page 10: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/10.jpg)
MongoDB 2011
![Page 11: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/11.jpg)
• Smaller user base (~340,000).
• Built very quickly. Worked well at the time.
• Not well researched.
![Page 12: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/12.jpg)
Fast forward to 2014
![Page 13: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/13.jpg)
• Frequent node failures
• Heavy disk fragmentation caused by deletes
• Slow reads from disk. Started storing in RAM.
• Primary -> Secondary failovers caused downtime for some users.
• Scaled both vertically and horizontally.
![Page 14: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/14.jpg)
Why Cassandra?
![Page 15: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/15.jpg)
• Riak
• Very close. Community seemed lacking.
• Redis
• No native cluster. Too much maintenance.
• Memcached/MySQL
• Too much complex app logic.
![Page 16: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/16.jpg)
Cassandra Wins.
![Page 17: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/17.jpg)
• Fantastic community. #cassandra on Twitter
• Easy to read documentation
• Linearly scalable. Easy to grow cluster.
• Low maintenance overhead for ops team.
• Handles time series data very well.
![Page 18: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/18.jpg)
Learning
![Page 19: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/19.jpg)
• Cassandra Summit 2014
• Other team in Adobe
• Long nights reading documentation
![Page 20: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/20.jpg)
Our Data
![Page 21: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/21.jpg)
• Ephemeral
• “Source of truth” lives in a MySQL database
• Okay with *some* data loss
![Page 22: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/22.jpg)
Our Rules
![Page 23: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/23.jpg)
• User’s feed is composed of entities, each with one set of actions
• User’s feed only contains one of any given entity
• An entity’s set of actions contains up to seven of the most recent actions taken by that user’s network
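A minimal sketch of these three rules, assuming a feed is modeled as a plain dictionary from entity id to a newest-first list of actions. The names `apply_action` and `MAX_ACTIONS` are made up for illustration; only the limit of seven comes from the slide.

```python
MAX_ACTIONS = 7  # from the slide: up to seven most recent actions per entity


def apply_action(feed, entity_id, action):
    """Merge one action into a feed shaped as {entity_id: [actions, newest first]}."""
    # Keying by entity_id guarantees the feed holds each entity only once.
    actions = [action] + feed.get(entity_id, [])
    actions.sort(key=lambda a: a["created_on"], reverse=True)
    # Keep only the most recent actions for this entity.
    feed[entity_id] = actions[:MAX_ACTIONS]
    return feed


feed = {}
for ts in range(1, 10):
    apply_action(feed, entity_id=42, action={"actor_id": ts, "verb_id": 1, "created_on": ts})

print(len(feed), len(feed[42]))
```

After nine actions on the same entity the feed still holds a single entry for it, trimmed to the seven newest actions.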
![Page 24: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/24.jpg)
Planning
![Page 25: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/25.jpg)
Language Support
• Most services on Behance are PHP
• No official Datastax PHP driver
![Page 26: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/26.jpg)
–Mark Dunphy, 2014
“Looks like I’m learning python.”
![Page 27: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/27.jpg)
Go to Production
No, nothing is working yet. I didn’t skip a slide.
![Page 28: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/28.jpg)
• App/cluster in production before anything works
• Test real life load
• Fail spectacularly without anybody noticing
• Deploy risky changes without fear
• Run alongside MongoDB
![Page 29: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/29.jpg)
January 19th, 2015
![Page 30: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/30.jpg)
Query Patterns
• “Create your data models based on the queries you want to run” - Basically Everybody
• Wanted to…
• Read a user’s feed entities by type and time of most recent action…separately.
• Write/Update a user’s feed entities with new actions while knowing only user id and entity id
![Page 31: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/31.jpg)
Data Models
![Page 32: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/32.jpg)
–Mark Dunphy, January 2015
“An UPDATE in Cassandra works like an UPSERT! Let’s store the user’s entire feed in a single row in a table! It’s so simple!”
First Data Model
![Page 33: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/33.jpg)
CREATE TYPE activity.action (
    created_on timestamp,
    secondary_entity_id int,
    actor_id int,
    verb_id int
);
CREATE TYPE activity.entity (
    entity_type_id int,
    entity_id int
);
![Page 34: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/34.jpg)
CREATE TABLE activity.project_actions (
    modified_on timestamp,
    entity_id int,
    user_id int,
    actions list<frozen<action>>,
    PRIMARY KEY (user_id, entity_id)
);
![Page 35: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/35.jpg)
CREATE TABLE activity.feeds (
    modified_entities list<frozen<entity>>,
    modified_on timestamp,
    project_ids list<int>,
    user_id int,
    wip_revision_ids list<int>,
    PRIMARY KEY (user_id)
);
![Page 36: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/36.jpg)
First Data Model
![Page 37: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/37.jpg)
First Data Model
Moments Before Everything Exploded
![Page 38: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/38.jpg)
–Mark Dunphy, January 2015
“Okay let’s keep nearly the same model, but use INSERT and DELETE instead of always UPDATE. Just use batch statements.”
Second Data Model
![Page 39: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/39.jpg)
Second Data Model
This was also a very very bad idea.
![Page 40: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/40.jpg)
• Lose the benefit of Cassandra being distributed
• All queries in a batch go through the same coordinator, which puts a lot of stress and responsibility on one node.
• Use concurrency and prepared statements instead. Datastax drivers make this easy.
Second Data Model
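As a rough sketch of "concurrency and prepared statements instead of batches": prepare the statement once, then issue the individual writes asynchronously in small windows. With the DataStax Python driver, `Session.prepare` and `Session.execute_async` are the real calls involved, and `cassandra.concurrent.execute_concurrent_with_args` wraps the same pattern. The `FakeSession` and `FakeFuture` classes below are hypothetical stand-ins so the example runs without a cluster, and the INSERT text, the `write_to_followers` name, and the window size are assumptions.

```python
class FakeFuture:
    """Stand-in for the driver's response future (assumption, not driver code)."""

    def result(self):
        return None


class FakeSession:
    """In-memory stand-in exposing the two session methods the sketch relies on."""

    def __init__(self):
        self.prepared = []
        self.rows = []

    def prepare(self, query):
        self.prepared.append(query)
        return query

    def execute_async(self, statement, parameters=None):
        self.rows.append(tuple(parameters))
        return FakeFuture()


# Illustrative statement; the column list is a guess based on the projects table.
INSERT_CQL = (
    "INSERT INTO activity.projects (user_id, created_on, entity_id) VALUES (?, ?, ?)"
)


def write_to_followers(session, follower_ids, created_on, entity_id, window=2):
    """Issue one prepared INSERT per follower, `window` requests in flight at a time."""
    insert = session.prepare(INSERT_CQL)  # prepared once, reused for every write
    written = 0
    for start in range(0, len(follower_ids), window):
        chunk = follower_ids[start:start + window]
        futures = [
            session.execute_async(insert, (uid, created_on, entity_id))
            for uid in chunk
        ]
        for future in futures:
            future.result()  # wait for this window before sending the next one
            written += 1
    return written


session = FakeSession()
count = write_to_followers(session, [1, 2, 3, 4, 5], 1421625600000, 42)
print(count, "writes issued")
```

Because each write is its own request, a token-aware driver can route it to a node that owns the partition instead of funneling everything through one batch coordinator.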
![Page 41: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/41.jpg)
Second Data Model
Oops
![Page 42: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/42.jpg)
Okay…
![Page 43: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/43.jpg)
Now we’ve got it.
![Page 44: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/44.jpg)
Winning Data Model
![Page 45: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/45.jpg)
CREATE TYPE activity.action (
    created_on timestamp,
    secondary_entity_id int,
    actor_id int,
    verb_id int
);
![Page 46: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/46.jpg)
CREATE TABLE activity.projects (
    created_on timestamp,
    user_id int,
    entity_id int,
    actions list<frozen<action>>,
    PRIMARY KEY (user_id, created_on, entity_id)
);
![Page 47: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/47.jpg)
CREATE TABLE activity.project_actions (
    modified_on timestamp,
    entity_id int,
    user_id int,
    actions list<frozen<action>>,
    PRIMARY KEY (user_id, entity_id)
);
![Page 48: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/48.jpg)
Much Nicer
![Page 49: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/49.jpg)
Write Strategy
• “User A comments on Project A. User B follows User A.”
• Request out to add the comment action to User B’s feed
• Read existing actions for that entity (Project A) in B’s feed. Push new action on top.
• Write new actions list into new “row” in projects table
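The write path above can be sketched with two dictionaries keyed like the primary keys of the `project_actions` and `projects` tables. This is an in-memory illustration, not the production code. Removing the previous time-ordered row is an assumption (the slides only say a new "row" is written), made here so the feed keeps one row per entity; the `add_action` name is hypothetical.

```python
# Keyed like PRIMARY KEY (user_id, entity_id): lookup by user and entity only.
project_actions = {}
# Keyed like PRIMARY KEY (user_id, created_on, entity_id): the time-ordered feed.
projects = {}


def add_action(user_id, entity_id, action, limit=7):
    """Read the entity's current actions, push the new one on top, write a new row."""
    key = (user_id, entity_id)
    current = project_actions.get(key)
    old_actions = current["actions"] if current else []
    new_actions = ([action] + old_actions)[:limit]

    # Assumption: delete the previous feed row so each entity appears once.
    if current:
        projects.pop((user_id, current["modified_on"], entity_id), None)

    ts = action["created_on"]
    projects[(user_id, ts, entity_id)] = new_actions
    project_actions[key] = {"modified_on": ts, "actions": new_actions}
    return new_actions


# User B (id 2) receives two actions on Project A (id 100).
add_action(2, 100, {"actor_id": 1, "verb_id": 5, "created_on": 1000})
add_action(2, 100, {"actor_id": 3, "verb_id": 5, "created_on": 2000})
print(sorted(projects))
```

The lookup table answers "what are the current actions for this user and entity?" without knowing the timestamp, while the feed table stays ordered by the time of the latest action.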
![Page 50: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/50.jpg)
Read Strategy
• SELECT * FROM projects WHERE user_id = 123 AND created_on > 123214373
• Optimized for quick/easy reads. It matters more that a user’s feed loads quickly than that it updates quickly.
• Use timestamp to “page” through data.
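Timestamp paging can be sketched as follows, assuming each page is fetched with the query above plus a row limit, and the last `created_on` seen becomes the cursor for the next request. The `read_page` function, the limit, and the sample rows are hypothetical; the filter mirrors `WHERE user_id = ? AND created_on > ?`.

```python
def read_page(rows, user_id, after, limit):
    """Return up to `limit` rows for `user_id` with created_on > `after`, oldest first.

    `rows` is a list of (user_id, created_on, entity_id) tuples standing in
    for the projects table.
    """
    matching = sorted(r for r in rows if r[0] == user_id and r[1] > after)
    page = matching[:limit]
    # The last timestamp on the page is the cursor for the next request.
    cursor = page[-1][1] if page else after
    return page, cursor


rows = [(123, ts, ts * 10) for ts in (100, 200, 300, 400, 500)] + [(999, 250, 1)]

first, cursor = read_page(rows, 123, after=0, limit=2)
second, cursor2 = read_page(rows, 123, after=cursor, limit=2)
print(first, second)
```

One caveat of this sketch: if two rows share the same timestamp and a page boundary falls between them, the strict `>` comparison skips the second one, so a real implementation would need a tie-breaker.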
![Page 51: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/51.jpg)
Lessons Learned
• Duplicate your data to achieve desired queries. Storage is cheap. Writes are cheap.
• Think outside the box. Cassandra is not relational.
• Never ever ever ignore inserts/deletes in favor of an update-only workflow. Never. It is literally insane.
![Page 52: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/52.jpg)
Final Specs
• 16-node cluster on AWS EC2 c3.8xlarge
• Mix of SizeTieredCompactionStrategy and DateTieredCompactionStrategy
• NetworkTopologyStrategy
• Replication factor 3
• ConsistencyLevel = ONE for most requests
![Page 53: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/53.jpg)
Final Specs
• Bursty write volume. Consistent read volume.
• 5k to 80k writes per second
• 2k to 4k reads per second
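A back-of-the-envelope reading of these numbers, assuming writes are spread evenly across the 16 nodes (the slides do not say how even the distribution was). With replication factor 3, every client write is stored on three replicas, even at ConsistencyLevel ONE, where the coordinator only waits for one acknowledgement. The function name is made up for illustration.

```python
NODES = 16
REPLICATION_FACTOR = 3


def replica_writes_per_node(client_writes_per_second):
    """Average replica writes each node absorbs, assuming an even spread."""
    return client_writes_per_second * REPLICATION_FACTOR / NODES


low = replica_writes_per_node(5_000)
high = replica_writes_per_node(80_000)
print(low, high)
```

Under that assumption each node handles roughly 940 replica writes per second at the quiet end and 15,000 per second at the peak of a burst.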
![Page 54: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/54.jpg)
Questions?
I might have answers.
![Page 55: Building an Activity Feed with Cassandra](https://reader031.vdocuments.us/reader031/viewer/2022022200/58a99fe31a28abc2518b6357/html5/thumbnails/55.jpg)
Thank you!
Mark Dunphy, Software Engineer Behance/Adobe @dunphtastic