Building an Activity Feed with Cassandra Mark Dunphy, Software Engineer Behance/Adobe @dunphtastic

Upload: mark-dunphy

Post on 19-Feb-2017

TRANSCRIPT

Page 1: Building an Activity Feed with Cassandra

Building an Activity Feed with Cassandra

Mark Dunphy, Software Engineer Behance/Adobe @dunphtastic

Page 2: Building an Activity Feed with Cassandra

Disclaimer

Not an operations person.

Will pretend to be one for the purpose of this talk.

Page 3: Building an Activity Feed with Cassandra

Quick Overview

What is the Behance Activity Feed?

Page 4: Building an Activity Feed with Cassandra
Page 5: Building an Activity Feed with Cassandra

• Actions

• Comments, Appreciations, Etc

• Entities

• Projects, Works in Progress

• Actors

• Users

Page 6: Building an Activity Feed with Cassandra

Project Entity

Actions taken by actors

Page 7: Building an Activity Feed with Cassandra

Activity Fan Out

Page 8: Building an Activity Feed with Cassandra

User A publishes a new project

Write to Follower A’s feed

Write to Follower B’s feed

Write to Follower C’s feed

Write to Follower D’s feed
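
A minimal sketch of the fan-out-on-write idea from this slide, using in-memory dictionaries in place of a real datastore. The names `followers`, `feeds`, and `fan_out` are hypothetical and chosen only for illustration; they do not come from the talk.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the follower graph and the feeds.
followers = {"user_a": ["follower_a", "follower_b", "follower_c", "follower_d"]}
feeds = defaultdict(list)  # follower id -> list of activities, newest first


def fan_out(actor, verb, entity):
    """Copy one activity into the feed of every follower of `actor`."""
    activity = {"actor": actor, "verb": verb, "entity": entity}
    targets = followers.get(actor, [])
    for follower in targets:
        # One write per follower: this is the cost of fanning out on write.
        feeds[follower].insert(0, activity)
    return len(targets)


written = fan_out("user_a", "published", "project_1")
```

One publish by User A turns into one write per follower, which is why write volume grows with the size of the follower graph.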

Page 9: Building an Activity Feed with Cassandra

Now that that’s over…

Page 10: Building an Activity Feed with Cassandra

MongoDB 2011

Page 11: Building an Activity Feed with Cassandra

• Smaller user base (~340,000).

• Built very quickly. Worked well at the time.

• Not well researched.

Page 12: Building an Activity Feed with Cassandra

Fast forward to 2014

Page 13: Building an Activity Feed with Cassandra

• Frequent node failures

• Heavy disk fragmentation caused by deletes

• Slow reads from disk. Started storing in RAM.

• Primary -> Secondary failovers caused downtime for some.

• Scaled both vertically and horizontally.

Page 14: Building an Activity Feed with Cassandra

Why Cassandra?

Page 15: Building an Activity Feed with Cassandra

• Riak

• Very close. Community seemed lacking.

• Redis

• No native cluster. Too much maintenance.

• Memcached/MySQL

• Too much complex app logic.

Page 16: Building an Activity Feed with Cassandra

Cassandra Wins.

Page 17: Building an Activity Feed with Cassandra

• Fantastic community. #cassandra on Twitter

• Easy to read documentation

• Linearly scalable. Easy to grow cluster.

• Low maintenance overhead for ops team.

• Handles time series data very well.

Page 18: Building an Activity Feed with Cassandra

Learning

Page 19: Building an Activity Feed with Cassandra

• Cassandra Summit 2014

• Other team in Adobe

• Long nights reading documentation

Page 20: Building an Activity Feed with Cassandra

Our Data

Page 21: Building an Activity Feed with Cassandra

• Ephemeral

• “Source of truth” lives in a MySQL database

• Okay with *some* data loss

Page 22: Building an Activity Feed with Cassandra

Our Rules

Page 23: Building an Activity Feed with Cassandra

• A user’s feed is composed of entities, each with one set of actions

• User’s feed only contains one of any given entity

• An entity’s set of actions contains up to seven of the most recent actions taken by that user’s network
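
The three rules above can be sketched as a small pure function. This is an illustration under assumptions, not the production code: the feed is modeled as a dictionary keyed by entity id, and `add_action` and `MAX_ACTIONS` are hypothetical names.

```python
MAX_ACTIONS = 7  # "up to seven of the most recent actions"


def add_action(feed, entity_id, action):
    """Record `action` on `entity_id` in one user's feed.

    `feed` maps entity_id -> list of actions, newest first. Keying by
    entity id guarantees the feed holds each entity at most once.
    """
    actions = [action] + feed.get(entity_id, [])
    feed[entity_id] = actions[:MAX_ACTIONS]  # drop anything older than the cap
    return feed


feed = {}
for n in range(9):
    add_action(feed, 42, {"actor_id": n, "verb_id": 1})
```

After nine actions on the same entity, the feed still contains a single entry for entity 42, holding only the seven newest actions.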

Page 24: Building an Activity Feed with Cassandra

Planning

Page 25: Building an Activity Feed with Cassandra

Language Support

• Most services on Behance are PHP

• No official DataStax PHP driver

Page 26: Building an Activity Feed with Cassandra

–Mark Dunphy, 2014

“Looks like I’m learning Python.”

Page 27: Building an Activity Feed with Cassandra

Go to Production

No, nothing is working yet. I didn’t skip a slide.

Page 28: Building an Activity Feed with Cassandra

• App/cluster in production before anything works

• Test real life load

• Fail spectacularly without anybody noticing

• Deploy risky changes without fear

• Run alongside MongoDB

Page 29: Building an Activity Feed with Cassandra

January 19th, 2015

Page 30: Building an Activity Feed with Cassandra

Query Patterns

• “Create your data models based on the queries you want to run” - Basically Everybody

• Wanted to…

• Read a user’s feed entities by type and time of most recent action…separately.

• Write/Update a user’s feed entities with new actions while knowing only user id and entity id

Page 31: Building an Activity Feed with Cassandra

Data Models

Page 32: Building an Activity Feed with Cassandra

–Mark Dunphy, January 2015

“An UPDATE in Cassandra works like an UPSERT! Let’s store the user’s entire feed in a single row in a table! It’s so simple!”

First Data Model

Page 33: Building an Activity Feed with Cassandra

CREATE TYPE activity.action (
    created_on timestamp,
    secondary_entity_id int,
    actor_id int,
    verb_id int
);

CREATE TYPE activity.entity (
    entity_type_id int,
    entity_id int
);

Page 34: Building an Activity Feed with Cassandra

CREATE TABLE activity.project_actions (
    modified_on timestamp,
    entity_id int,
    user_id int,
    actions list<frozen<action>>,
    PRIMARY KEY (user_id, entity_id)
);

Page 35: Building an Activity Feed with Cassandra

CREATE TABLE activity.feeds (
    modified_entities list<frozen<entity>>,
    modified_on timestamp,
    project_ids list<int>,
    user_id int,
    wip_revision_ids list<int>,
    PRIMARY KEY (user_id)
);

Page 36: Building an Activity Feed with Cassandra

First Data Model

Page 37: Building an Activity Feed with Cassandra

First Data Model

Moments Before Everything Exploded

Page 38: Building an Activity Feed with Cassandra

–Mark Dunphy, January 2015

“Okay let’s keep nearly the same model, but use INSERT and DELETE instead of always UPDATE. Just use batch statements.”

Second Data Model

Page 39: Building an Activity Feed with Cassandra

Second Data Model

This was also a very very bad idea.

Page 40: Building an Activity Feed with Cassandra

• Lose the benefit of Cassandra being distributed

• All queries go through the same coordinator which puts a lot of stress and responsibility on one node.

• Use concurrency and prepared statements instead. DataStax drivers make this easy.
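
A hedged sketch of the "concurrency and prepared statements" alternative to one large batch. It relies only on `Session.prepare` and `Session.execute_async` from the DataStax Python driver; the function name `write_concurrently` and the `window` parameter are assumptions made for this example, and the talk does not show the actual implementation.

```python
def write_concurrently(session, cql, rows, window=100):
    """Run one prepared statement for many rows, `window` requests at a time.

    `session` is expected to behave like a DataStax driver Session.
    Each row is sent as its own request, so the driver can route writes
    individually instead of funnelling one big batch through a single
    coordinator.
    """
    prepared = session.prepare(cql)  # parsed once, reused for every row
    completed = 0
    for start in range(0, len(rows), window):
        futures = [
            session.execute_async(prepared, row)
            for row in rows[start:start + window]
        ]
        for future in futures:
            future.result()  # blocks until done; raises if that write failed
            completed += 1
    return completed
```

With a real cluster, `session` would come from something like `Cluster([...]).connect("activity")`, and `cql` could be `INSERT INTO projects (user_id, created_on, entity_id, actions) VALUES (?, ?, ?, ?)`. The driver also ships a ready-made helper, `cassandra.concurrent.execute_concurrent_with_args`, which covers the same pattern.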

Second Data Model

Page 41: Building an Activity Feed with Cassandra

Second Data Model

Oops

Page 42: Building an Activity Feed with Cassandra

Okay…

Page 43: Building an Activity Feed with Cassandra

Now we’ve got it.

Page 44: Building an Activity Feed with Cassandra

Winning Data Model

Page 45: Building an Activity Feed with Cassandra

CREATE TYPE activity.action (
    created_on timestamp,
    secondary_entity_id int,
    actor_id int,
    verb_id int
);

Page 46: Building an Activity Feed with Cassandra

CREATE TABLE activity.projects (
    created_on timestamp,
    user_id int,
    entity_id int,
    actions list<frozen<action>>,
    PRIMARY KEY (user_id, created_on, entity_id)
);

Page 47: Building an Activity Feed with Cassandra

CREATE TABLE activity.project_actions (
    modified_on timestamp,
    entity_id int,
    user_id int,
    actions list<frozen<action>>,
    PRIMARY KEY (user_id, entity_id)
);

Page 48: Building an Activity Feed with Cassandra

Much Nicer

Page 49: Building an Activity Feed with Cassandra

Write Strategy

• “User A comments on Project A. User B follows User A.”

• Request out to add the comment action to User B’s feed

• Read existing actions for that entity (Project A) in B’s feed. Push new action on top.

• Write new actions list into new “row” in projects table
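
The write steps above can be sketched with two dictionaries standing in for the `project_actions` and `projects` tables, keyed the same way as their primary keys. Two things here are assumptions rather than details from the talk: that `modified_on` in `project_actions` records the `created_on` of the entity's current row in `projects`, and that the previous row is deleted so the entity appears only once in the feed. The name `add_to_feed` is hypothetical.

```python
MAX_ACTIONS = 7

# In-memory stand-ins for the two tables, keyed like their primary keys.
project_actions = {}  # (user_id, entity_id) -> {"modified_on": ts, "actions": [...]}
projects = {}         # (user_id, created_on, entity_id) -> list of actions


def add_to_feed(user_id, entity_id, action, now):
    """Read the entity's existing actions, push the new one, write a new row."""
    lookup = project_actions.get((user_id, entity_id))
    existing = lookup["actions"] if lookup else []
    actions = ([action] + existing)[:MAX_ACTIONS]  # new action goes on top

    if lookup:
        # Assumption: delete the old time-ordered row so the entity
        # shows up only once in the user's feed.
        projects.pop((user_id, lookup["modified_on"], entity_id), None)

    projects[(user_id, now, entity_id)] = actions  # new "row" in projects
    project_actions[(user_id, entity_id)] = {"modified_on": now, "actions": actions}
    return actions


# User B (id 2) receives two actions on Project A (id 100).
add_to_feed(2, 100, {"actor_id": 1, "verb_id": 5}, now=1000)
add_to_feed(2, 100, {"actor_id": 3, "verb_id": 6}, now=2000)
```

The lookup table answers "what is already in this user's feed for this entity?" knowing only the user id and entity id, while the time-keyed table stays ordered for reads.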

Page 50: Building an Activity Feed with Cassandra

Read Strategy

• SELECT * FROM projects WHERE user_id = 123 AND created_on > 123214373

• Optimized for quick/easy reads. It is more important that a user’s feed loads quickly than that it updates quickly.

• Use timestamp to “page” through data.
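
A small sketch of timestamp-based paging, with a sorted in-memory list standing in for one user's partition of the `projects` table. The function `read_page` and the sample rows are hypothetical; the comparison mirrors the `created_on >` predicate in the query above.

```python
from bisect import bisect_right


def read_page(rows, after_ts, page_size):
    """Return up to `page_size` rows with created_on > after_ts, plus a cursor.

    `rows` is a list of (created_on, entity_id) tuples sorted by created_on,
    which is how a partition clustered on created_on would be ordered.
    """
    timestamps = [created_on for created_on, _ in rows]
    start = bisect_right(timestamps, after_ts)  # first row strictly newer
    page = rows[start:start + page_size]
    next_cursor = page[-1][0] if page else None  # pass back in for the next page
    return page, next_cursor


rows = [(1000, 7), (2000, 8), (3000, 9), (4000, 10)]
first_page, cursor = read_page(rows, 0, 2)
second_page, cursor_2 = read_page(rows, cursor, 2)
last_page, cursor_3 = read_page(rows, cursor_2, 2)
```

Because the cursor is a strict comparison on the timestamp alone, two rows sharing the exact same `created_on` across a page boundary could be skipped; a real implementation might also carry the entity id in the cursor.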

Page 51: Building an Activity Feed with Cassandra

Lessons Learned

• Duplicate your data to achieve desired queries. Storage is cheap. Writes are cheap.

• Think outside the box. Cassandra is not relational.

• Never ever ever ignore inserts/deletes in favor of an update only workflow. Never. It is literally insane.

Page 52: Building an Activity Feed with Cassandra

Final Specs

• 16 node cluster on AWS EC2 c3.8xlarge

• Mix of SizeTieredCompactionStrategy and DateTieredCompactionStrategy

• NetworkTopologyStrategy

• Replication factor 3

• ConsistencyLevel = ONE for most requests
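
As a rough illustration of what replication factor 3 with ConsistencyLevel ONE implies, the usual rule of thumb is that a read is guaranteed to see the latest write only when the read and write replica counts together exceed the replication factor. The helper names below are hypothetical, and the sketch assumes a single datacenter with replication factor 3.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must acknowledge a request at `level`."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(read_level, write_level, replication_factor):
    """True when read and write replica sets are guaranteed to overlap."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


one_one = reads_see_latest_write("ONE", "ONE", 3)
quorum_quorum = reads_see_latest_write("QUORUM", "QUORUM", 3)
```

With ONE for both reads and writes the replica sets need not overlap, so reads can briefly return stale data. That trade-off is consistent with the feed being ephemeral and with the source of truth living in MySQL.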

Page 53: Building an Activity Feed with Cassandra

Final Specs

• Bursty write volume. Consistent read volume.

• 5k to 80k writes per second

• 2k to 4k reads per second

Page 54: Building an Activity Feed with Cassandra

Questions?

I might have answers.

Page 55: Building an Activity Feed with Cassandra

Thank you!

Mark Dunphy, Software Engineer Behance/Adobe @dunphtastic