cassandra community webinar: apache cassandra internals

Post on 26-Jan-2015

128 Views

Category:

Technology

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Apache Cassandra solves many interesting problems to provide a scalable, distributed, fault tolerant database. Cluster wide operations track node membership, direct requests and implement consistency guarantees. At the node level, the Log Structured storage engine provides high performance reads and writes. All of this is implemented in a Java code base that has greatly matured over the past few years. In this webinar Aaron Morton will step through read and write requests, automatic processes and manual maintenance tasks. He will also discuss the general approach to solving the problem and drill down to the code responsible for implementation. Speaker: Aaron Morton, Apache Cassandra Committer Aaron Morton is a Freelance Developer based in New Zealand, and a Committer on the Apache Cassandra project. In 2010 he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.

TRANSCRIPT

CASSANDRA COMMUNITY WEBINARS AUGUST 2013

CASSANDRA INTERNALS

Aaron Morton@aaronmorton

Co-Founder & Principal Consultantwww.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

About The Last PickleWork with clients to deliver and improve

Apache Cassandra based solutions. Apache Cassandra Committer, DataStax MVP,

Hector Maintainer, 6+ years combined Cassandra experience.

Based in New Zealand & Austin, TX.

ArchitectureCode

www.thelastpickle.com

Cassandra Architecture

API's

Cluster Aware

Cluster Unaware

Clients

Disk

www.thelastpickle.com

Cassandra Cluster Architecture

API's

Cluster Aware

Cluster Unaware

Clients

Disk

API's

Cluster Aware

Cluster Unaware

Disk

Node 1 Node 2

www.thelastpickle.com

Dynamo Cluster Architecture

API's

Dynamo

Database

Clients

Disk

API's

Dynamo

Database

Disk

Node 1 Node 2

www.thelastpickle.com

ArchitectureAPI

DynamoDatabase

www.thelastpickle.com

API Transports

ThriftNative Binary

www.thelastpickle.com

Thrift Transport

//Custom TServer implementations

o.a.c.thrift.CustomTThreadPoolServero.a.c.thrift.CustomTNonBlockingServero.a.c.thrift.CustomTHsHaServer

www.thelastpickle.com

API Transports

ThriftNative Binary

www.thelastpickle.com

Native Binary Transport

Beta in Cassandra 1.2Uses Netty

Enabled with start_native_transport

(Disabled by default)

www.thelastpickle.com

o.a.c.transport.Server.run()

//Setup the Netty servernew ExecutionHandler()new NioServerSocketChannelFactory()ServerBootstrap.setPipelineFactory()

www.thelastpickle.com

o.a.c.transport.Message.Dispatcher.messageReceived()

//Process message from clientServerConnection.validateNewMessage()Request.execute()ServerConnection.applyStateTransition()Channel.write()

www.thelastpickle.com

Messages

Defined in the Native Binary Protocol

$SRC/doc/native_protocol.spec

www.thelastpickle.com

API Services

JMXThrift

CQL 3

www.thelastpickle.com

JMX Management Beans

Spread around the code base.

Interfaces named *MBean

www.thelastpickle.com

JMX Management Beans

Registered with names such as org.apache.cassandra.db:

type=StorageProxy

www.thelastpickle.com

API Services

JMXThriftCQL 3

www.thelastpickle.com

o.a.c.thrift.CassandraServer

// Implements Thrift Interface// Access control// Input validation// Mapping to/from Thrift and internal types

www.thelastpickle.com

Thrift Interface

Thrift IDL$SRC/interface/cassandra.thrift

www.thelastpickle.com

o.a.c.thrift.CassandraServer.get_slice()

// get columns for one rowTracing.begin()ClientState cState = state()cState.hasColumnFamilyAccess()multigetSliceInternal()

www.thelastpickle.com

CassandraServer.multigetSliceInternal()

// get columns for may rowsThriftValidation.validate*()// Create ReadCommandsgetSlice()

www.thelastpickle.com

CassandraServer.getSlice()

// Process ReadCommands// return Thrift types

readColumnFamily()thriftifyColumnFamily()

www.thelastpickle.com

CassandraServer.readColumnFamily()

// Process ReadCommands// Return ColumnFamilies

StorageProxy.read()

www.thelastpickle.com

API Services

JMXThrift

CQL 3

www.thelastpickle.com

o.a.c.cql3.QueryProcessor

// Prepares and executes CQL3 statements// Used by Thrift & Native transports// Access control// Input validation// Returns transport.ResultMessage

www.thelastpickle.com

CQL3 Grammar

ANTLR Grammar$SRC/o.a.c.cql3/Cql.g

www.thelastpickle.com

o.a.c.cql3.statements.ParsedStatement

// Subclasses generated by ANTLR// Tracks bound term count// Prepare CQLStatementprepare()

www.thelastpickle.com

o.a.c.cql3.statements.CQLStatement

checkAccess(ClientState state)validate(ClientState state)execute(ConsistencyLevel cl, QueryState state, List<ByteBuffer> variables)

www.thelastpickle.com

statements.SelectStatement.RawStatement

// Implements ParsedStatement// Input validationprepare()

www.thelastpickle.com

statements.SelectStatement.execute()

// Create ReadCommandsStorageProxy.read()

www.thelastpickle.com

ArchitectureAPI

DynamoDatabase

www.thelastpickle.com

Dynamo Layero.a.c.service

o.a.c.net

o.a.c.dhto.a.c.gms

o.a.c.locatoro.a.c.stream

www.thelastpickle.com

o.a.c.service.StorageProxy

// Cluster wide storage operations// Select endpoints & check CL available// Send messages to Stages// Wait for response// Store Hints

www.thelastpickle.com

o.a.c.service.StorageService

// Ring operations// Track ring state// Start & stop ring membership// Node & token queries

www.thelastpickle.com

o.a.c.service.IResponseResolver

preprocess(MessageIn<T> message)resolve() throws DigestMismatchException

RowDigestResolverRowDataResolverRangeSliceResponseResolver

www.thelastpickle.com

Response Handlers / Callback

implements IAsyncCallback<T>

response(MessageIn<T> msg)

www.thelastpickle.com

o.a.c.service.ReadCallback.get()

//Wait for blockfor & datacondition.await(timeout, TimeUnit.MILLISECONDS)

throw ReadTimeoutException()

resolver.resolve()

www.thelastpickle.com

o.a.c.service.StorageProxy.fetchRows()

getLiveSortedEndpoints()new RowDigestResolver()new ReadCallback()MessagingService.sendRR()---------------------------------------ReadCallback.get() # blockingcatch (DigestMismatchException ex)catch (ReadTimeoutException ex)

www.thelastpickle.com

Dynamo Layero.a.c.service

o.a.c.net

o.a.c.dhto.a.c.gms

o.a.c.locatoro.a.c.stream

www.thelastpickle.com

o.a.c.net.MessagingService.verb<<enum>>

MUTATIONREADREQUEST_RESPONSETREE_REQUESTTREE_RESPONSE

(And more...)

www.thelastpickle.com

o.a.c.net.MessagingService.verbHandlers

new EnumMap<Verb, IVerbHandler>(Verb.class)

www.thelastpickle.com

o.a.c.net.IVerbHandler<T>

doVerb(MessageIn<T> message, String id);

www.thelastpickle.com

o.a.c.net.MessagingService.verbStages

new EnumMap<MessagingService.Verb, Stage>(MessagingService.Verb.class)

www.thelastpickle.com

o.a.c.net.MessagingService.receive()

runnable = new MessageDeliveryTask( message, id, timestamp);

StageManager.getStage( message.getMessageType());

stage.execute(runnable);

www.thelastpickle.com

o.a.c.net.MessageDeliveryTask.run()

// If dropable and rpc_timeoutMessagingService.incrementDroppedMessag

es(verb);

MessagingService.getVerbHandler(verb)verbHandler.doVerb(message, id)

www.thelastpickle.com

ArchitectureAPI Layer

Dynamo LayerDatabase Layer

www.thelastpickle.com

Database Layero.a.c.concurrent

o.a.c.db

o.a.c.cacheo.a.c.io

o.a.c.trace

www.thelastpickle.com

o.a.c.concurrent.StageManager

stages = new EnumMap<Stage, ThreadPoolExecutor>(Stage.class);

getStage(Stage stage)

www.thelastpickle.com

o.a.c.concurrent.Stage

READMUTATIONGOSSIPREQUEST_RESPONSEANTI_ENTROPY

(And more...)www.thelastpickle.com

Database Layero.a.c.concurrent

o.a.c.db

o.a.c.cacheo.a.c.io

o.a.c.trace

www.thelastpickle.com

o.a.c.db.Table

// Keyspaceopen(String table)getColumnFamilyStore(String cfName)

getRow(QueryFilter filter)apply(RowMutation mutation, boolean writeCommitLog)

www.thelastpickle.com

o.a.c.db.ColumnFamilyStore

// Column FamilygetColumnFamily(QueryFilter filter)getTopLevelColumns(...)

apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)

www.thelastpickle.com

o.a.c.db.IColumnContainer

addColumn(IColumn column)remove(ByteBuffer columnName)

ColumnFamilySuperColumn

www.thelastpickle.com

o.a.c.db.ISortedColumns

addColumn(IColumn column, Allocator allocator)removeColumn(ByteBuffer name)

ArrayBackedSortedColumnsAtomicSortedColumnsTreeMapBackedSortedColumns

www.thelastpickle.com

o.a.c.db.Memtable

put(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)

flushAndSignal(CountDownLatch latch, Future<ReplayPosition> context)

www.thelastpickle.com

o.a.c.db.ReadCommand

getRow(Table table)

SliceByNamesReadCommandSliceFromReadCommand

www.thelastpickle.com

o.a.c.db.IDiskAtomFilter

getMemtableColumnIterator(...)getSSTableColumnIterator(...)

IdentityQueryFilterNamesQueryFilterSliceQueryFilter

www.thelastpickle.com

SummaryCustomTThreadPoolServer Message.Dispatcher

CassandraServer QueryProcessor

ReadCommand

StorageProxy

IResponseResolver

IAsyncCallback

MessagingService

IVerbHandler

Table ColumnFamilyStore IDiskAtomFilter

API

Dynamo

Database

www.thelastpickle.com

Thanks.

www.thelastpickle.com

Aaron Morton@aaronmorton

Co-Founder & Principal Consultantwww.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

top related