Coming to Cassandra from Relational World
April 2016
Nenad Bozic
SmartCat, www.smartcat.io
@SmartCat_io
Relational Databases
• easy to start with, need only SQL
• can be used without knowledge of internals
• we choose the vendor which we know best
• limitless indexes and query patterns
We Live in a Data-Intensive Environment
• Everything is connected to the internet: expansion of IoT, social networks
• 90% of stored data was gathered in the last 2 years
• Enterprises are moving to BASE (Basically Available, Soft state, Eventual consistency)
Agenda
• NoSQL space
• Cassandra architecture
• How (not) to start with Cassandra
• Cassandra learning path
NoSQL Space
NoSQL
• no longer just hype
• not general-purpose storage; each is built to solve a specific use case
• built with performance and scalability in mind
• each type, and even each vendor, is different
http://bigdata-blog.com/key-value-database
Polyglot Persistence
http://martinfowler.com/bliki/PolyglotPersistence.html
Cassandra
Cassandra Overview
• Data is highly available and partitioned, with tunable consistency
• Both data and load are distributed across nodes
• Replication factor: how many replicas of each piece of data are kept
• Masterless architecture
• Fast, scalable and fault tolerant
• Native multi-datacenter support
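To make the partitioning and replication factor points concrete, here is a minimal Python sketch of a token ring. It is a simplification and not Cassandra's implementation: each node owns a single token, MD5 stands in for the real partitioner, and replicas are picked by walking the ring clockwise. The node names and the `Ring` class are hypothetical.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used here only for determinism)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy masterless ring: every node can compute where any key lives."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.replication_factor = replication_factor
        # Assumption: one token per node; the sorted tokens form the ring.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token and collect RF distinct nodes."""
        start = bisect_right(self._tokens, token(partition_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("user:42")
print(owners)  # three distinct nodes that hold copies of this partition
```

Because placement is a pure function of the key and the ring, no master is needed to look it up, which is the idea behind the masterless bullet above.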
Architecture
Client contact
Architecture
Client request
Consistency level 1, Replication factor 3
Architecture
Client request
response
Consistency level 1, Replication factor 3
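The interplay of consistency level and replication factor on these slides can be worked out with a little arithmetic. The sketch below assumes only three levels (ONE, QUORUM, ALL) and uses the usual overlap rule: a read is guaranteed to see the latest acknowledged write when the replicas written plus the replicas read exceed the replication factor. The function names are hypothetical.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must answer for a given consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # strict majority
        "ALL": replication_factor,
    }
    return levels[level]


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when the write set and read set must share at least one replica."""
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return w + r > replication_factor


# The slide's setup: consistency level ONE with replication factor 3.
print(required_acks("ONE", 3))                        # 1
print(reads_see_latest_write("ONE", "ONE", 3))        # False
print(reads_see_latest_write("QUORUM", "QUORUM", 3))  # True
```

With consistency level 1 the coordinator answers after a single replica responds, which is fast but only eventually consistent. Raising both sides to QUORUM trades latency for overlap.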
Architecture
DC1 DC2
Cluster
Single-Row Partitions
http://www.slideshare.net/planetcassandra/datastax-rigorous-cassandra-data-modeling-for-the-relational-data-architect
Multi-Row Partitions
http://www.slideshare.net/planetcassandra/datastax-rigorous-cassandra-data-modeling-for-the-relational-data-architect
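As a rough mental model of the two slides above, a table can be pictured as a map from partition key to rows, with the rows inside a partition kept sorted by a clustering key. The Python sketch below is only an illustration under that assumption. The `Table` class and the sensor data are hypothetical.

```python
from bisect import insort
from collections import defaultdict


class Table:
    """Toy table: partition key -> rows sorted by clustering key."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        rows = self._partitions[partition_key]
        # Writes are upserts: an existing row with the same key is replaced.
        rows[:] = [row for row in rows if row[0] != clustering_key]
        insort(rows, (clustering_key, value))

    def read_partition(self, partition_key):
        """Return every row of one partition, already in clustering order."""
        return list(self._partitions.get(partition_key, []))


# Multi-row partition: one partition per sensor, one row per timestamp.
readings = Table()
readings.insert("sensor-1", 30, 21.5)
readings.insert("sensor-1", 10, 20.0)
readings.insert("sensor-2", 20, 19.0)
readings.insert("sensor-1", 10, 20.5)  # same keys: overwrites the earlier row

print(readings.read_partition("sensor-1"))  # [(10, 20.5), (30, 21.5)]
print(readings.read_partition("sensor-2"))  # [(20, 19.0)]
```

A single-row partition is the special case where each partition key maps to exactly one row, as with `sensor-2` here.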
Data Modeling
• query based modeling
• data is denormalized
• data is duplicated
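A small Python sketch of what query-based modeling implies in practice: one structure per query, with the same data written to each. Plain dictionaries stand in for tables here. The table names, the `add_video` helper and the sample data are hypothetical.

```python
from collections import defaultdict

# One "table" per query the application needs to answer.
videos_by_user = defaultdict(list)  # query: which videos did this user upload?
videos_by_tag = defaultdict(list)   # query: which videos carry this tag?


def add_video(video_id, user, title, tags):
    """Write the same video into every table that serves a query."""
    row = {"video_id": video_id, "user": user, "title": title}
    videos_by_user[user].append(row)
    for tag in tags:
        # Duplicate the row on purpose so each query is a single lookup.
        videos_by_tag[tag].append(dict(row))


add_video("v1", "alice", "Cassandra intro", ["cassandra", "nosql"])
add_video("v2", "alice", "Spark basics", ["spark"])

print([row["video_id"] for row in videos_by_user["alice"]])  # ['v1', 'v2']
print([row["title"] for row in videos_by_tag["nosql"]])      # ['Cassandra intro']
```

The cost of the duplication is paid at write time, so that each read touches exactly one partition and needs no joins.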
Use Cases
Good fit:
• when high availability is crucial and eventual consistency is tolerable
• event sourcing
• logging continuous streams of data
• deep visitor analytics
Poor fit:
• early prototyping with significant query changes
• referential integrity required
• dynamic access patterns on data
Cassandra in Data Science
• native support for multi DC (main and analytics DC)
• Cassandra Spark connector with data locality
• Cassandra is suitable for web analytics
• SMACK stack - real time analytics
Our Two Cents on Cassandra
Red Flags - how not to start with Cassandra
• do not use Cassandra on single node
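• watch out for the write-then-read antipattern (reading data immediately after writing it)
• watch out for the read-before-write antipattern (reading data only to decide what to write)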
• change the way you model data, avoid relational data modeling
• do not be afraid of evolving data model
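One reading of the write - read red flag is that a read issued right after a write can return stale data when both use a low consistency level. The Python sketch below simulates the worst case under that assumption: the write has reached only the replicas that acknowledged it, and the read happens to contact the others. The `Replica` class and helper names are hypothetical.

```python
class Replica:
    """One copy of a single value, with the timestamp of its last write."""

    def __init__(self):
        self.value = None
        self.timestamp = 0


def write(replicas, acked, value, timestamp):
    """Worst case: only the first `acked` replicas have applied the write."""
    for replica in replicas[:acked]:
        replica.value, replica.timestamp = value, timestamp


def read(replicas, contacted):
    """Worst case: ask the last `contacted` replicas, return the newest value."""
    newest = max(replicas[-contacted:], key=lambda r: r.timestamp)
    return newest.value


def write_then_read(replication_factor, write_acks, read_acks):
    replicas = [Replica() for _ in range(replication_factor)]
    write(replicas, write_acks, "new", timestamp=1)
    return read(replicas, read_acks)


print(write_then_read(3, 1, 1))  # None: the read missed the write
print(write_then_read(3, 2, 2))  # 'new': quorum write and read overlap
```

In a running cluster the lagging replicas catch up eventually. The sketch only captures the moment in between, which is when an immediate read can go wrong.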
Learning path
• watch Martin Fowler's Introduction to NoSQL video
• read the history (the Dynamo and Bigtable papers)
• finish the DS201: Cassandra Core Concepts course
• finish the DS220: Data Modeling course
• bonus: listen to Patrick McFadin on the topic of data modeling
Links
• SmartCat Blog post - How (not) to start with Cassandra
• SmartCat Blog post - Polyglot persistence in NoSQL space
• Kristof Kovacs - NoSQL database blog post
• Martin Fowler - Introduction to NoSQL
• DS201 - Cassandra Core Concepts
• DS220 - Data Modeling
Q&A
Thank you
Nenad Bozic, @NenadBozicNs
SmartCat, www.smartcat.io
@SmartCat_io