2015-03-19 - Cassandra for .NET developers
TRANSCRIPT
Apache Cassandra: C* (Not Only) for C# developers
Grégory Ruiz (@gregoryruiz)
.NET Tech Lead at
NoSQL - Season01 Episode02 (March 19, 2015)
§1: Relational → NoSQL
§2: Firebase
§3: Cassandra
1. What is Cassandra anyway?
2. How does Cassandra work?
3. Which tools to play with Cassandra?
Agenda
● open source NoSQL distributed DBMS
● kind of a hybrid between key-value & column-oriented database
● very high availability
● painless scalability
● predictable performance
● fault resistant
1. Cassandra? / Brief Definition
datastax: http://www.datastax.com/documentation/cassandra/2.0/cassandra/gettingStartedCassandraIntro.html
Cassandra -- affectionately “C*” -- is an open source, distributed store for structured data that scales-out on cheap, commodity hardware and stays up even when things get really bad.
-- DataStax
1. Cassandra? / Academic Background
the scalable & efficient storage model from [2006] Bigtable
the fault resistant & distributed design from [2007] Dynamo
Rick Branson - How do I Cassandra?: http://fr.slideshare.net/rbranson/how-do-i-cassandra
● early 2008: developed @ Facebook
● July 2008: open source project
● March 2009: Apache Incubator
● February 2010: top-level project
1. Cassandra? / Early Timeline
wikipedia: http://en.wikipedia.org/wiki/Apache_Cassandra
● as of March 2015
  ○ 8th most popular DBMS
  ○ 2nd most popular NoSQL
  ○ 1st most popular wide-column store
● Prominent users
1. Cassandra? / Popularity Rank
db engines: http://db-engines.com/en/system/Cassandra
● node
● data center
● cluster
● gossip: p2p protocol
2. Architecture / Node, Cluster, Gossip
⇒ no SPOF
⇒ masterless architecture
● partition key
● consistent hashing
● nodes own hash ranges
2. Architecture / Partitioner, Hash Ranges
Netflix Tech Blog: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
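The three bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names, the 2^32 token space and the use of MD5 are hypothetical choices for the demo, not Cassandra's actual partitioner settings.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumption: small illustrative token space


def token_for(partition_key):
    """Hash a partition key to a position (token) on the ring."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


class Ring:
    """Each node owns the hash range that ends at its own token."""

    def __init__(self, node_tokens):
        # Keep (token, node) pairs sorted so the owner is found by binary search.
        self._tokens = sorted((tok, name) for name, tok in node_tokens.items())

    def owner(self, partition_key):
        token = token_for(partition_key)
        keys = [tok for tok, _ in self._tokens]
        # First node whose token is >= the key's token, wrapping to the start.
        index = bisect.bisect_left(keys, token) % len(self._tokens)
        return self._tokens[index][1]


# Hypothetical four-node cluster with evenly spread tokens.
ring = Ring({
    "node-a": RING_SIZE // 4,
    "node-b": RING_SIZE // 2,
    "node-c": 3 * RING_SIZE // 4,
    "node-d": RING_SIZE - 1,
})
for key in ("Geo", "Line", "Céd", "Greg"):
    print(key, "->", ring.owner(key))
```

Adding a node only splits one existing range, which is the intuition behind the scalability claim.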
2. Architecture / Painless Scalability
2. Architecture / Fault Resistance
[Figure: databases positioned by data model]
Legend: RDBMS (comparison), Key-Value, Column-oriented / Tabular, Document-oriented
Systems shown: MS-SQL, Vertica, Dynamo, Cassandra, CouchDB, Redis, HyperTable, MongoDB
C* = _AP system ⇒ Eventually Consistent
Data Replication
Replication Strategy: determines the nodes where replicas are placed
Replication Factor (RF): the total number of replicas across the cluster
2. Architecture / Fault Resistance
Data Replication: SimpleStrategy
CREATE KEYSPACE "Test" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
2. Architecture / Fault Resistance
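A rough sketch of what SimpleStrategy placement amounts to: the first replica goes to the node that owns the partition's hash range, the others to the next nodes clockwise on the ring. The node names and the `owner_index` parameter are hypothetical; a real cluster derives the owner from the partition key's token.

```python
def simple_strategy_replicas(ring_nodes, owner_index, replication_factor):
    """Return the owner plus the next nodes clockwise, up to the replication factor."""
    # A cluster cannot hold more distinct replicas than it has nodes.
    count = min(replication_factor, len(ring_nodes))
    return [ring_nodes[(owner_index + step) % len(ring_nodes)] for step in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical names, in ring order
print(simple_strategy_replicas(nodes, owner_index=2, replication_factor=3))
# -> ['node-c', 'node-d', 'node-a']
```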
Data Replication: NetworkTopologyStrategy
CREATE KEYSPACE "Test" WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DCEurope' : 3, 'DCAsia' : 4 };
2. Architecture / Fault Resistance
● ALL: ++ consistency, -- availability
● QUORUM / EACH_QUORUM / LOCAL_QUORUM: + consistency, - availability
● ONE, TWO, THREE / LOCAL_ONE: - consistency, + availability
● ANY: -- consistency, ++ availability
2. Architecture / Fault Resistance
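The trade-off in the list above comes down to simple arithmetic: a quorum is a majority of the replicas, and a read is guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor. A minimal sketch, with hypothetical function names:

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_sees_latest_write(write_replicas, read_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return write_replicas + read_replicas > replication_factor


rf = 3
print(quorum(rf))                                          # 2
print(read_sees_latest_write(quorum(rf), quorum(rf), rf))  # True  (QUORUM + QUORUM)
print(read_sees_latest_write(1, 1, rf))                    # False (ONE + ONE)
print(read_sees_latest_write(rf, 1, rf))                   # True  (ALL + ONE)
print(quorum(4))                                           # 3, e.g. LOCAL_QUORUM in a DC with 4 replicas
```

With RF = 3, QUORUM still succeeds with one replica down, while ALL does not; that is the availability side of the same trade-off.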
2. Data Modeling / shift your vocabulary
RDBMS                                  | C*
Structured Data, Fixed Schema          | Unstructured Data, Flexible Schema
Array of Arrays ⇒ “2D” (row X column)  | Nested K-V pairs ⇒ “3D” (rowKey X colKey X colValue)
Schema / Database                      | KeySpace
Table                                  | Table (formerly Column Family)
Row                                    | Partition (aka Row) ⇒ unit of replication
Column                                 | Cluster (aka Column) ⇒ unit of storage
Foreign Keys, Joins … ACID             | no Ref Integrity, Collections … CAP
2. Data Modeling / Table (formerly Column Family)
Geo age:30 car:camaro gender:M
Line age:20 car:clio gender:F
Céd age:15 gender:M
Greg age:60 gender:M
2. Data Modeling / Table (formerly Column Family)
Row Key: the first field of each row (Geo, Line, Céd, Greg)
Columns: the name:value pairs that follow (age, car, gender); a row only stores the columns it has
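The "nested K-V pairs" view of these rows maps naturally onto a dictionary of dictionaries. The sketch below only illustrates the shape of the data (rowKey → colKey → colValue); `get_column` is a hypothetical helper, not a driver API.

```python
# rowKey -> {colKey: colValue}; rows do not need to share the same columns.
coworkers = {
    "Geo": {"age": 30, "car": "camaro", "gender": "M"},
    "Line": {"age": 20, "car": "clio", "gender": "F"},
    "Céd": {"age": 15, "gender": "M"},
    "Greg": {"age": 60, "gender": "M"},
}


def get_column(table, row_key, column_key, default=None):
    """Look up one cell; a missing row or column yields the default."""
    return table.get(row_key, {}).get(column_key, default)


print(get_column(coworkers, "Line", "car"))  # clio
print(get_column(coworkers, "Céd", "car"))   # None: this row has no 'car' column
print(sorted(coworkers["Céd"]))              # ['age', 'gender']
```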
2. Data Modeling / Table (formerly Column Family)
Line (replica 1)  age:20 @ 2015/03/19 12:15  car:clio @ 2015/03/19 12:15  gender:F @ 2015/03/19 12:19
Line (replica 2)  age:20 @ 2015/03/19 12:15  car:clio @ 2015/03/19 12:15  gender:F @ 2015/03/19 12:19
Line (replica 3)  age:20 @ 2015/03/19 12:15  car:clio @ 2015/03/19 12:15  gender:F @ 2015/03/19 12:19
INSERT INTO coworker(id,car) VALUES ('Line','merco');
Write timestamp vs. stored timestamp for car: 2015/03/19 18:30 > 2015/03/19 12:15
2. Data Modeling / Table (formerly Column Family)
The write first reaches replica 2:
Line (replica 1)  age:20 @ 2015/03/19 12:15  car:clio @ 2015/03/19 12:15   gender:F @ 2015/03/19 12:19
Line (replica 2)  age:20 @ 2015/03/19 12:15  car:merco @ 2015/03/19 18:30  gender:F @ 2015/03/19 12:19
Line (replica 3)  age:20 @ 2015/03/19 12:15  car:clio @ 2015/03/19 12:15   gender:F @ 2015/03/19 12:19
Then every replica holds the newer value:
Line (replica 1)  age:20 @ 2015/03/19 12:15  car:merco @ 2015/03/19 18:30  gender:F @ 2015/03/19 12:19
Line (replica 2)  age:20 @ 2015/03/19 12:15  car:merco @ 2015/03/19 18:30  gender:F @ 2015/03/19 12:19
Line (replica 3)  age:20 @ 2015/03/19 12:15  car:merco @ 2015/03/19 18:30  gender:F @ 2015/03/19 12:19
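The timestamp comparison on these slides is a last-write-wins rule applied per column. A simplified sketch, assuming each cell is a (value, timestamp) pair; `merge_row` is a hypothetical name, and real clusters use finer-grained timestamps and extra tie-breaking rules that are left out here.

```python
from datetime import datetime


def merge_row(*replica_rows):
    """For each column, keep the (value, timestamp) pair with the newest timestamp."""
    merged = {}
    for row in replica_rows:
        for column, (value, written_at) in row.items():
            if column not in merged or written_at > merged[column][1]:
                merged[column] = (value, written_at)
    return merged


t_1215 = datetime(2015, 3, 19, 12, 15)
t_1219 = datetime(2015, 3, 19, 12, 19)
t_1830 = datetime(2015, 3, 19, 18, 30)

# Two replicas still hold the old car, one already received the new write.
stale = {"age": (20, t_1215), "car": ("clio", t_1215), "gender": ("F", t_1219)}
fresh = {"age": (20, t_1215), "car": ("merco", t_1830), "gender": ("F", t_1219)}

result = merge_row(stale, fresh, stale)
print(result["car"][0])  # merco, because 18:30 > 12:15
```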
2. Data Modeling / Table (formerly Column Family)
-- weatherstation_id is the partition key, event_time the clustering column
CREATE TABLE temperature (
  weatherstation_id text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY (weatherstation_id, event_time));
/******/
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');
/******/
-- all readings of one station
SELECT event_time,temperature FROM temperature
WHERE weatherstation_id='1234ABCD';
/******/
-- range query on the clustering column, inside one partition
SELECT temperature FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00' AND event_time < '2013-04-03 07:04:00';
2. Data Modeling / ex: Time Series, Single Device/row
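One way to picture PRIMARY KEY (weatherstation_id, event_time): the first part selects a partition, the second keeps rows sorted inside it, which is what makes the range query cheap. A toy in-memory model, with a hypothetical class name; the temperatures other than the 07:01 reading are made-up sample values.

```python
import bisect
from collections import defaultdict


class TimeSeriesTable:
    """Toy model: partition key -> rows kept sorted by clustering column."""

    def __init__(self):
        # station_id -> list of (event_time, temperature), sorted by event_time
        self._partitions = defaultdict(list)

    def insert(self, station_id, event_time, temperature):
        rows = self._partitions[station_id]
        times = [t for t, _ in rows]
        index = bisect.bisect_left(times, event_time)
        if index < len(rows) and rows[index][0] == event_time:
            rows[index] = (event_time, temperature)  # same primary key: overwrite
        else:
            rows.insert(index, (event_time, temperature))

    def select_between(self, station_id, after, before):
        """Rows with after < event_time < before, within a single partition."""
        rows = self._partitions.get(station_id, [])
        times = [t for t, _ in rows]
        start = bisect.bisect_right(times, after)
        end = bisect.bisect_left(times, before)
        return rows[start:end]


table = TimeSeriesTable()
# Timestamps are ISO-like strings, so string order matches time order.
for minute, temp in (("07:03", "74F"), ("07:01", "72F"), ("07:02", "73F"), ("07:04", "74F")):
    table.insert("1234ABCD", "2013-04-03 " + minute + ":00", temp)

print(table.select_between("1234ABCD", "2013-04-03 07:01:00", "2013-04-03 07:04:00"))
# -> [('2013-04-03 07:02:00', '73F'), ('2013-04-03 07:03:00', '74F')]
```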
-- composite partition key (weatherstation_id, date): one partition per station and per day
CREATE TABLE temperature_by_day (
  weatherstation_id text,
  date text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY ((weatherstation_id, date), event_time));
/******/
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 07:01:00','72F');
/******/
SELECT *
FROM temperature_by_day
WHERE weatherstation_id='1234ABCD'
AND date='2013-04-03';
2. Data Modeling / ex: Time Series, Partitioning
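The composite partition key keeps any single partition from growing forever: readings are bucketed per station and per day. A small sketch of that bucketing, assuming event times are strings whose first ten characters are the date; `partition_key` is a hypothetical helper and the two later readings are made-up sample values.

```python
from collections import defaultdict


def partition_key(station_id, event_time):
    """Composite key (weatherstation_id, date): the date is the day part of event_time."""
    return (station_id, event_time[:10])


partitions = defaultdict(list)
events = [
    ("1234ABCD", "2013-04-03 07:01:00", "72F"),
    ("1234ABCD", "2013-04-03 23:59:00", "70F"),  # sample value
    ("1234ABCD", "2013-04-04 00:01:00", "69F"),  # sample value
]
for station_id, event_time, temperature in events:
    partitions[partition_key(station_id, event_time)].append((event_time, temperature))

for key in sorted(partitions):
    print(key, len(partitions[key]))
# ('1234ABCD', '2013-04-03') 2
# ('1234ABCD', '2013-04-04') 1
```

The cost of this design is that a query spanning several days has to address several partitions.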
-- rows are stored newest first; each inserted row expires after its TTL (in seconds)
CREATE TABLE latest_temperatures (
  weatherstation_id text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY (weatherstation_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
/******/
INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','72F') USING TTL 20;
2. Data Modeling / ex: Time Series, DESC & TTL
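The combined effect of DESC ordering and TTL can be imitated in memory: rows come back newest first, and a row written with a TTL stops being returned once the TTL has elapsed. A toy sketch with a hypothetical class; the clock is passed in explicitly (as `now`, in seconds) so the behaviour is deterministic, and the 07:01 reading is a made-up sample value.

```python
class LatestTemperatures:
    """Toy model of a table read newest-first, with optional per-row expiry."""

    def __init__(self):
        # (station_id, event_time) -> (temperature, expires_at or None)
        self._rows = {}

    def insert(self, station_id, event_time, temperature, now, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else now + ttl_seconds
        self._rows[(station_id, event_time)] = (temperature, expires_at)

    def select(self, station_id, now):
        """Live rows of one station, newest event_time first."""
        live = [
            (event_time, temperature)
            for (sid, event_time), (temperature, expires_at) in self._rows.items()
            if sid == station_id and (expires_at is None or now < expires_at)
        ]
        return sorted(live, reverse=True)


table = LatestTemperatures()
table.insert("1234ABCD", "2013-04-03 07:01:00", "71F", now=0)                  # no TTL
table.insert("1234ABCD", "2013-04-03 07:03:00", "72F", now=0, ttl_seconds=20)  # USING TTL 20

print(table.select("1234ABCD", now=10))
# -> [('2013-04-03 07:03:00', '72F'), ('2013-04-03 07:01:00', '71F')]
print(table.select("1234ABCD", now=25))
# -> [('2013-04-03 07:01:00', '71F')]
```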
● wikipedia
  ○ http://en.wikipedia.org/wiki/Apache_Cassandra
● db rank
  ○ http://db-engines.com/en/ranking
● netflix
  ○ http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
● C* wiki
  ○ https://wiki.apache.org/cassandra
  ○ http://www.datastax.com/docs
  ○ http://planetcassandra.org
5. References
Questions?
Special thanks
● Geoffrey Garnotel
● Aline Paponaud
● Cédric Tesnière
● my sweet little bae
● you
● my coffee maker