cassandra - deep dive

CassandraA Decentralized Structured Storage System

By Sameera Nelson

Outline …

Introduction

Data Model

System Architecture

Failure Detection & Recovery

Local Persistence

Performance

Statistics

What is Cassandra ?

Distributed Storage System

Manages Structured Data

Highly available , No SPoF

Not a Relational Data Model

Handle high write throughput

◦ No impact on read efficiency

Motivation

Operational Requirements in Facebook

◦ Performance

◦ Reliability/ Dealing with Failures

◦ Efficiency

◦ Continues Growth

Application◦ Inbox Search Problem, Facebook

Similar Work

Google File System◦ Distributed FS, Single master/Slave

Ficus/ Coda

◦ Distributed FS

Farsite

◦ Distributed FS, No centralized server

Bayou◦ Distributed Relational DB System

Dynamo

◦ Distributed Storage system

Data Model

Figure from Eben Hewitt’s slides.

Supported Operations

insert(table; key; rowMutation)

get(table; key; columnName)

delete(table; key; columnName)

Query Language

CREATE TABLE users

( user_id int PRIMARY KEY,

fname text,

lname text );

INSERT INTO users

(user_id, fname, lname) VALUES (1745, 'john', 'smith');

SELECT * FROM users;

Data Structure

Log-Structured Merge Tree

System Architecture

Architecture

Fully Distributed …No Single Point of Failure

Cassandra Architecture

PartitioningData distribution across nodes

ReplicationData duplication across nodes

Cluster MembershipNode management in cluster

adding/ deleting

Partitioning

The Token Ring

Partitioning Partitions using Consistent hashing

Partitioning Assignment in to the relevant partition

Partitioning, Vnodes

Replication

Based on configured replication factor

Replication

Different Replication Policies

◦Rack Unaware

◦Rack Aware

◦Data center Aware

Cluster Membership

Based on scuttlebutt

Efficient Gossip based mechanism

Inspired for real life rumor

spreading.

Anti Entropy protocol

◦ Repair replicated data by comparing &

reconciling differences

Cluster Membership

Gossip Based

Failure Detection &

Recovery

Failure DetectionTrack state

◦ Directly, Indirectly

Accrual Detection mechanism

Permanent Node change

◦ Admin should explicitly add or remove

◦ Data to be replayed in replication

◦ Saved in system.hints table

Accrual Failure Detector

• Node is faulty, suspicion level

monotonically increases.

• Φ(t) k• k - threshold variable

• Node is correct

• Φ(t) = 0

Local Persistence

Write Request

Write Operation

Write OperationLogging data in commit log/ memtable

Flushing data from the memtable

◦Flushing data on threshold

Storing data on disk in SSTables Mark with tombstone

Compaction Remove deletes, Sorts, Merges data,

consolidation

Write Operation

Compaction

Read RequestDirect/ Background (Read repair)

Read Operation

Delete Operation

Data not removed immediately

Only Tombstone is written

Deleted in Compacting Process

Additional Features

Adding compression Snappy Compression

Secondary index support

SSL support

◦ Client/ Node

◦ Node/ Node

Rolling commit logs

SSTable data file merging

Performance

High Throughput & Low Latency

◦ Eliminating on-disk data modification

◦ Eliminate erase-block cycles

◦ No Locking for concurrency control

◦ Maintaining integrity not required

High Availability

Linear Scalability

Fault Tolerant

Statistics

Stats from Netflix

Liner scalability

Stats from Netflix

Some users

Thank you

Read Detailed

Structure

cassandra - deep dive

operation data

structured data

hints data

merges data

operation logging data

data model figure

disk data modification

relational data model

Technology

mobility deep dive

bre deep dive

deep dive into cassandra

fcoe deep dive

dynamodb deep dive

a deep dive into understanding apache cassandra

the deep dive

zfs deep dive

nosql deep dive mit cassandra -...

powerapps deep dive

docker deep dive

netwrix auditor deep dive deep€¦ · deep dive netwrix...

mts deep dive data center and csd deep dive optiq

budgeting deep dive

nehalem deep dive

wcf- basichttprelay deep dive - microsoft adapter deep dive...

cdi deep dive

starling deep dive

deep dive xamarin.android

deep dive phonegap