C*ollege Credit: Is My App A Good Fit For Cassandra?
TRANSCRIPT
Is My App A Good Fit For Cassandra?
Eric Lubow @elubow
Is My App A Good Fit For Cassandra Eric Lubow @elubow
Overview
• Planning
• Data Stores
• Comparisons
• Use Cases
• Final Thoughts
• Questions
Where am I
• Planning Stages
• MVP (Minimum Viable Product)
• Iteration
• “Final Decision”
What Am I Building
• User App
• Hobby Project
• Learning Project
• Big Data System
What Is Big Data
• Depends on the user
• Bigger Than Excel
• Bigger Than One Server
• Bigger Than One Rack
Big Data Truth Bomb
• Even with the right tools, 80% of the work of building a big data system is acquiring and refining the raw data into usable data.
Planning Questions
(Question categories on the slide: Tech, Data, Financial, Other)
• Do I have legal requirements (HIPAA/FIPS/Sarbanes-Oxley/PII)?
• What kind of enterprise support is available?
• What is the community like?
• Does the product's roadmap align with my roadmap?
• Do I need to display data in real time?
• Do I need to aggregate data on the fly?
• Is my data structured or unstructured?
• Does my data lend itself to a specific design pattern?
• What are my query patterns?
• Is my data ingestion high volume/high velocity?
• Am I batch loading data?
• Am I write heavy or read heavy?
• Are data relationships important?
• Does my data need to be immediately available everywhere?
• Am I cloud based?
• Am I hardware based?
• Am I a cloud/iron hybrid?
• How much am I willing to spend?
• How much am I willing to spend if something goes wrong?
• How fault tolerant is the system?
• What supporting tools do I need?
• Is there support for my language?
• Is the encryption/authentication/authorization support sufficient for my needs?
• Are there monitoring architectures already built?
• Are there best practices guides already
• Will the data need to be distributed?
Tools C*
Languages
Right Tool For The Job
Cassandra (C*)
• Large data volume ingestion at high velocity
• Really fast writes to many locations (eventual consistency)
• Query by column groups within rows (slicing)
• OpsCenter
• Data toolkit: more than a data storage layer
• TTLs for small group aggregation
Cassandra Data
RowKey: 1345161600000:b198fa61-833a-6e78-fb83-233ec50b356e
=> (column=facebook:1345162260136000, value={"like_count":17,"url":"http://mysite.com/586352/celebrities-with-children/"}, timestamp=1345162260136000)
=> (column=facebook:1345162260167000, value={"like_count":18,"url":"http://mysite.com/586352/celebrities-with-children/"}, timestamp=1345162260167000)
=> (column=facebook:1345162260261564, value={"like_count":21,"url":"http://mysite.com/586352/celebrities-with-children/"}, timestamp=1345162260261564)
=> (column=pageviews:1345162259307830, value={"user-agent":"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; InfoPath.1; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 2.0.50727)","languages":"es-ve","user-id":"aede5694-3eb3-4cd0-810d-99d6bc2e0cb5","ip":"186.24.6.80"}, timestamp=1345162259307830)
=> (column=pageviews:1345162259302140, value={"user-agent":"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)","languages":"en-US","user-id":"a85679ab-9fd7-4aeb-93ab-2b66eddcf66a","ip":"192.168.255.182"}, timestamp=1345162259302140)
=> (column=pageviews:1345162259302000, value={"referrer":"http://www.tv-links.eu/_gate_way.html?data=VfMjMzOTE2Nw==","user-agent":"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)","languages":"en-NZ","user-id":"ba0c6320-c4ca-4cb8-b5d4-e6ef21dbdc3c","ip":"219.89.75.163"}, timestamp=1345162259302000)
=> (column=pageviews:1345162259402000, value={"referrer":"http://foo.com/pop-culture/2012/09/40-Most-Weird-Comics-Ever","user-agent":"Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11","languages":"en-US,en;q=0.8","user-id":"899f51ab-3e08-475a-9392-7eee5446edc3","ip":"24.118.178.215"}, timestamp=1345162259402000)
=> (column=twitter:1345162260246000, value={"count":17,"url":"http://mysite.com/586352/celebrities-with-children/"}, timestamp=1345162260246000)
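The row on this slide appears to combine a time bucket and an ID in the row key, with column names of the form source:timestamp so that one source's events sit next to each other. A minimal Python sketch of that naming scheme follows. The one-day bucket width and the helper names (row_key, column_name, slice_columns) are assumptions for illustration, not something the slide states, and the slice is emulated on a plain sorted list rather than run against Cassandra.

```python
from bisect import bisect_left

# Assumed bucket width: the slide does not say whether rows are per hour or per day.
DAY_MS = 24 * 60 * 60 * 1000


def row_key(ts_micros: int, entity_id: str, bucket_ms: int = DAY_MS) -> str:
    """Build '<bucket start in ms>:<id>' from an event timestamp in microseconds."""
    ts_ms = ts_micros // 1000
    bucket_start = ts_ms - (ts_ms % bucket_ms)
    return f"{bucket_start}:{entity_id}"


def column_name(source: str, ts_micros: int) -> str:
    """Build '<source>:<timestamp in microseconds>' as seen in the slide's columns."""
    return f"{source}:{ts_micros}"


def slice_columns(sorted_names: list, prefix: str) -> list:
    """Return the contiguous run of column names starting with `prefix`.

    Emulates a column slice over names kept in sorted order.
    """
    start = bisect_left(sorted_names, prefix)
    end = start
    while end < len(sorted_names) and sorted_names[end].startswith(prefix):
        end += 1
    return sorted_names[start:end]


key = row_key(1345162260136000, "b198fa61-833a-6e78-fb83-233ec50b356e")
names = sorted([
    column_name("twitter", 1345162260246000),
    column_name("facebook", 1345162260167000),
    column_name("pageviews", 1345162259307830),
    column_name("facebook", 1345162260136000),
])
facebook_only = slice_columns(names, "facebook:")
```

Because the timestamps all have the same number of digits, sorting the names as strings also orders each source's events by time, which is what makes a prefix slice useful here.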
MongoDB
• Fast atomic increments (Node.js is native JSON)
• Sharding
• Solid ORM for Rails (Mongoid)
• Fast access for pub/sub of durable/persisted documents
• B-tree indexes
• Document based via JSON
• TTLs for ephemeral data
MongoDB Data
{
  "_id" : ObjectId("505a089275885cc53cd66520"),
  "account_id" : ObjectId("4e87f81ca782f3404200000a"),
  "day" : ISODate("2012-01-01T00:00:00Z"),
  "md5" : "54f762d1025aadd6e2687005db657dac",
  "stats" : {
    "sum" : { "fb" : 108, "fba" : 108, "fbc" : 1, "fbl" : 71, "fbr" : 326, "fbs" : 36, "gp" : 2, "gpa" : 2, "li" : 3, "lia" : 3, "p" : 1840, "pi" : 1, "pia" : 1, "pspv" : 859.384, "soca" : 173, "socr" : 493, "srchr" : 27, "srt" : 86.48748533542772, "su" : 1, "sua" : 1, "tw" : 58, "twa" : 58, "twflc" : 4025418, "twfrc" : 139758, "twp" : 50, "twpa" : 50, "twr" : 167 },
    "18" : { "sum" : { "fb" : 4, "fba" : 4, "fbl" : 2, "fbr" : 2, "fbs" : 2, "p" : 179, "pspv" : 105.1336, "soca" : 18, "socr" : 8, "srchr" : 10, "srt" : 60.337503923146954, "srtv" : 89.4550357667952, "tw" : 14, "twa" : 14, "twflc" : 107842, "twfrc" : 108111, "twg" : 8, "twp" : 7, "twpa" : 7, "twr" : 6 } },
    "19" : { "sum" : { "fb" : 63, "fba" : 63, "fbl" : 40, "fbr" : 179, "fbs" : 23, "gp" : 2, "gpa" : 2, "p" : 498, "pi" : 1, "pia" : 1, "pspv" : 278.6148999999999, "soca" : 74, "socr" : 200, "srchr" : 5, "srt" : 74.27775496525277, "srtv" : 89.71819309386892, "tw" : 8, "twa" : 8, "twflc" : 9941, "twfrc" : 4228, "twg" : 7, "twp" : 7, "twpa" : 7, "twr" : 21 } }
  }
}
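A document shaped like the one on this slide lends itself to the atomic increments mentioned earlier: one update can bump a day-level counter and a nested counter at once. The sketch below only builds the update document with MongoDB's $inc operator and dotted field paths. Reading the "18" and "19" keys as hours of the day is an assumption, and increment_update is a hypothetical helper name.

```python
def increment_update(metric: str, hour: int, amount: int = 1) -> dict:
    """Build an update document that bumps a daily total and an hourly counter.

    Assumes the numeric keys under "stats" (for example "18") are hours of the day.
    """
    return {
        "$inc": {
            f"stats.sum.{metric}": amount,          # day-level total
            f"stats.{hour}.sum.{metric}": amount,   # per-hour total
        }
    }


# One more Facebook event ("fb") during hour 18.
update = increment_update("fb", 18)
```

With a driver such as PyMongo, an update like this would typically be passed to collection.update_one(filter, update, upsert=True), where the filter selects the account, day, and md5 of the document. That call is not executed here because it needs a running server.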
Redis
• Supports hundreds of thousands of transactions per second
• Great caching engine
• Supports useful variable types like sets, sorted sets, lists
• Everything is guaranteed to be memory mapped (mmap)
• Transactional and supports bulk operations
• Centralized queueing and locking system
Cons
• Redis
  • Can only utilize a single core
  • Data must be smaller than memory
  • No clustering
• Cassandra
  • No B-tree indexes
• Mongo
  • Non-hashed shard keys
  • Indexes must fit in memory
  • Forced replica ping times
Use Cases
• Time Series
• Counters
• Feed Based Activity
• Large Amounts of Data
The Cloud
• Open source libraries into the API
• Auto-scaling for magical scalability
• Quickly test assumptions
• Spot Instances
Support and Expertise
• What happens when you need help?
• How do you become experts?
• What happens when you need more experts?
Summary
• Have answers to the important questions
• Know your data read/write patterns
• Know the tools available to you
• Know your compromises