Apache Cassandra: NoSQL in the Enterprise, today
Jonathan Ellis, CTO
@spyced
Cassandra Job Trends (indeed.com)
“Big Data” trend
Why Big Data Matters
Research by McKinsey & Company shows eye-opening differences in 10-year category growth rates between businesses that use their big data well and those that do not.
Big data
Analytics (Hadoop)
Realtime (“NoSQL”)
?
Some users
✤ Financial
✤ Social Media
✤ Advertising
✤ Entertainment
✤ Energy
✤ E-tail
✤ Health care
✤ Government
Common use cases
✤ Time series data
✤ Messaging
✤ Ad tracking
✤ Data mining
✤ User activity streams
✤ User sessions
✤ Anything requiring:
  Scalable + performant + highly available
Why Cassandra?
✤ Fully distributed, no SPOF
✤ Multi-master, multi-DC
✤ Linearly scalable
✤ Larger-than-memory datasets
✤ Best-in-class performance (not just writes!)
✤ Fully durable
✤ Integrated caching
✤ Tuneable consistency
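The "tuneable consistency" item can be made concrete with the usual quorum arithmetic. The following is a minimal sketch, not Cassandra driver code: it assumes a replication factor `n`, a write acknowledged by `w` replicas, and a read answered by `r` replicas. The function names are hypothetical and chosen only for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


def tolerated_failures(n: int, acks_required: int) -> int:
    """Number of replicas that can be down while the operation still succeeds."""
    return n - acks_required
```

With three replicas, quorum writes and quorum reads (2 + 2 > 3) overlap on at least one replica and still tolerate one node being down, while single-replica reads and writes (1 + 1) trade that overlap for lower latency.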
Classic partitioning with SPOF
[Diagram: request router; partitions 1 to 4; a master with two slaves]
Fully distributed, no SPOF
[Diagram: client; peer nodes labeled with the partitions they hold (p1, p3, p6)]
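One way to picture how a cluster can do without a request router is a hash ring on which every node computes the same replica placement for a key. This is a simplified sketch under stated assumptions: the `Ring` class, the `token` helper, the node names, and the use of SHA-256 are all illustrative choices, not Cassandra's actual partitioner or replication strategy.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (SHA-256 is an illustrative choice)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical hash ring: each node owns the range ending at its token."""

    def __init__(self, nodes, replication_factor=3):
        nodes = list(nodes)
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication_factor must be between 1 and len(nodes)")
        self.replication_factor = replication_factor
        # Sorting makes the layout independent of the order nodes were listed in.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str):
        """Walk clockwise from the key's position and take the next N nodes."""
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]
```

Because placement depends only on the key and the node list, any node that knows the membership can act as coordinator for any request, which is the property the "no SPOF" slide contrasts with the master and request-router design.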
Performance summary
“With Cassandra, we get better business agility, and we don’t have to plan capacity in advance, we don’t need to ask permission of other people to build things for us, and we don’t worry about running out of space or power.”
Adrian Cockcroft, Cloud Architect
Netflix on Cassandra
✤ Could not build datacenters fast enough
✤ Made decision to go to cloud (AWS)
✤ Applications include Netflix’s subscriber system, A/B testing, and viewing history service
✤ Over a year in, Netflix finds Cassandra to be
  ✤ Fast
  ✤ Cost-effective
  ✤ Scalable
  ✤ Flexible
  ✤ Reliable: no SPOF
“Without Cassandra, our engineers would’ve had to create something that could scale to our needs, that would’ve prevented us from focusing on building product and solving problems for Backupify’s users, which are far more important tasks.”
Matt Conway, VP Engineering
Backupify on Cassandra
✤ Cloud-based utility that enables businesses and consumers to back up, search, and restore the content of popular online applications such as Google Apps, Gmail, Facebook, Twitter, and Blogger
✤ Cassandra findings:
  ✤ Solved scaling, allowing engineers to focus on their business
  ✤ DataStax OpsCenter made it easy to monitor the health and performance of their cluster
  ✤ Reliable, redundant, and scalable data storage helped eliminate downtime
  ✤ Ability to offer not just backup and storage, but also analysis
“You can seamlessly add new nodes and expand your total capacity without deteriorating the performance of the data store. Cassandra has allowed us to scale very effectively.”
Harry Robertson, Tech Lead
Ooyala on Cassandra
✤ Ooyala provides a suite of technologies and services that support content owners in managing, analyzing, and monetizing the digital video they publish online
✤ Cassandra findings:
  ✤ Classic “Big Data” problem did not require re-architecting
  ✤ Delivered ability to respond to increasingly sophisticated analytic needs of customers
  ✤ Developers spend time building application features, not figuring out how to scale
“Cassandra has allowed us to build bigger features faster and more reliably, while using less money and without needing to expand our staff.”
Kyle Ambroff, Sr. Engineer
Formspring on Cassandra
✤ Users of Formspring engage with and learn more about each other by asking and responding to questions. Close to 4B responses in the system and 30M unique users
✤ Cassandra experience
  ✤ No sharding needed – just add nodes to scale
  ✤ Performance – the popular users with many followers saw no speed reduction. No more memcached!
  ✤ Flexibility of a schema-optional architecture is very developer friendly
Big data
Analytics (Hadoop)
Realtime (“NoSQL”)
?
The evolution of Analytics
Analytics + Realtime
The evolution of Analytics
Analytics Realtime
replication
The evolution of Analytics
ETL
Big data
Analytics (Hadoop)
Realtime (“NoSQL”)
DataStax Enterprise
DataStax Enterprise reunifies realtime and analytics
Portfolio Demo dataflow
Portfolios
Historical Prices
Intermediate Results
Largest loss
Portfolios
Live Prices for today
Largest loss
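The dataflow above combines portfolios with price data to produce a "largest loss" figure. The slide does not say how that figure is calculated, so the following is only a sketch under an assumed definition: the biggest drop in total portfolio value across any span of `window` days. The function names, the ticker symbols in the usage note, and the data layout (shares per ticker, one price list per ticker) are all hypothetical.

```python
def portfolio_values(holdings, prices):
    """Total portfolio value per day, given shares per ticker and per-ticker price series."""
    n_days = min(len(prices[ticker]) for ticker in holdings)
    return [
        sum(shares * prices[ticker][day] for ticker, shares in holdings.items())
        for day in range(n_days)
    ]


def largest_loss(values, window=1):
    """Largest drop in value over any `window`-day span; 0.0 if the value never falls."""
    changes = [values[i + window] - values[i] for i in range(len(values) - window)]
    worst = min(changes, default=0.0)
    return max(0.0, -worst)
```

For example, holdings of `{"AAA": 10, "BBB": 5}` with prices `{"AAA": [10.0, 9.0, 11.0], "BBB": [20.0, 18.0, 19.0]}` give daily values of 200.0, 180.0, and 205.0, so the largest one-day loss is 20.0. The same function could be applied to historical prices in a batch job or to today's live prices on the realtime side.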
Operations
✤ “Vanilla” Hadoop
  ✤ 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, ZooKeeper, Region Server, ...)
  ✤ Single points of failure
  ✤ Can't separate online and offline processing
✤ DataStax Enterprise
  ✤ Single, simplified component
  ✤ Self-organizes based on workload
  ✤ Peer to peer
  ✤ JobTracker failover
Managing & Monitoring Big Data
✤ DataStax OpsCenter manages and monitors all Cassandra and Hadoop operations
Questions?