apache con na 2013 - cassandra internals

DESCRIPTION

Talk from ApacheCon North America 2013 on Cassandra Internals by Aaron Morton.

TRANSCRIPT

Page 1: Apache Con NA 2013 - Cassandra Internals

APACHECON NORTH AMERICA 2013

CASSANDRA INTERNALS

Aaron Morton

@aaronmorton

www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Page 2: Apache Con NA 2013 - Cassandra Internals

About Me

Freelance Cassandra Consultant

Based in Wellington, New Zealand

Apache Cassandra Committer

DataStax MVP for Apache Cassandra

Page 3: Apache Con NA 2013 - Cassandra Internals

Architecture

Code

Page 4: Apache Con NA 2013 - Cassandra Internals

Cassandra Architecture

API's

Cluster Aware

Cluster Unaware

Clients

Disk

Page 5: Apache Con NA 2013 - Cassandra Internals

Cassandra Cluster Architecture

API's

Cluster Aware

Cluster Unaware

Clients

Disk

API's

Cluster Aware

Cluster Unaware

Disk

Node 1 Node 2

Page 6: Apache Con NA 2013 - Cassandra Internals

Dynamo Cluster Architecture

API's

Dynamo

Database

Clients

Disk

API's

Dynamo

Database

Disk

Node 1 Node 2

Page 7: Apache Con NA 2013 - Cassandra Internals

Architecture

API

Dynamo

Database

Page 8: Apache Con NA 2013 - Cassandra Internals

API Transports

Thrift

Native Binary

Read Line

RMI

Page 9: Apache Con NA 2013 - Cassandra Internals

Thrift Transport

// Custom TServer implementations

o.a.c.thrift.CustomTThreadPoolServer
o.a.c.thrift.CustomTNonBlockingServer
o.a.c.thrift.CustomTHsHaServer

Page 10: Apache Con NA 2013 - Cassandra Internals

API Transports

Thrift

Native Binary

Read Line

RMI

Page 11: Apache Con NA 2013 - Cassandra Internals

Native Binary Transport

Beta in Cassandra 1.2

Uses Netty 3.5

Enabled with start_native_transport

(Disabled by default)

Page 12: Apache Con NA 2013 - Cassandra Internals

o.a.c.transport.Server.run()

// Set up the Netty server
new ExecutionHandler()
new NioServerSocketChannelFactory()
ServerBootstrap.setPipelineFactory()

Page 13: Apache Con NA 2013 - Cassandra Internals

o.a.c.transport.Message.Dispatcher.messageReceived()

// Process message from client
ServerConnection.validateNewMessage()
Request.execute()
ServerConnection.applyStateTransition()
Channel.write()

Page 14: Apache Con NA 2013 - Cassandra Internals

o.a.c.transport.messages

CredentialsMessage()
EventMessage()
ExecuteMessage()
PrepareMessage()
QueryMessage()
ResultMessage()

(And more...)

Page 15: Apache Con NA 2013 - Cassandra Internals

Messages

Defined in the Native Binary Protocol

$SRC/doc/native_protocol.spec

Page 16: Apache Con NA 2013 - Cassandra Internals

API Services

JMX

CLI

Thrift

CQL 3

Page 17: Apache Con NA 2013 - Cassandra Internals

JMX Management Beans

Spread around the code base.

Interfaces named *MBean

Page 18: Apache Con NA 2013 - Cassandra Internals

JMX Management Beans

Registered with names such as

org.apache.cassandra.db:type=StorageProxy
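The naming convention above can be tried out with plain JDK classes. The sketch below is not Cassandra code: `DemoService`, `DemoServiceMBean` and the `org.example.sketch` domain are hypothetical names chosen for illustration. It only shows the standard-MBean pattern the slide refers to, an interface named `*MBean` registered under a `domain:type=Name` object name.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: none of these names come from the Cassandra code base.
class JmxRegistrationSketch {
    // A standard MBean interface is public and named <ImplementationClass>MBean.
    public interface DemoServiceMBean {
        long getRequestCount();
    }

    public static class DemoService implements DemoServiceMBean {
        private volatile long requestCount = 7;

        @Override
        public long getRequestCount() {
            return requestCount;
        }
    }

    // Registers the bean under the given name and reads one attribute back through JMX.
    static long registerAndRead(String objectName) {
        try {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            if (mbs.isRegistered(name)) {
                mbs.unregisterMBean(name); // keeps the sketch re-runnable
            }
            mbs.registerMBean(new DemoService(), name);
            // The getter getRequestCount() is exposed as the attribute "RequestCount".
            return (Long) mbs.getAttribute(name, "RequestCount");
        } catch (Exception e) {
            throw new IllegalStateException("JMX registration failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(registerAndRead("org.example.sketch:type=DemoService"));
    }
}
```

Once registered this way, a bean shows up in JMX clients such as jconsole under its domain and type.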

Page 19: Apache Con NA 2013 - Cassandra Internals

API Services

JMX

CLI

Thrift

CQL 3

Page 20: Apache Con NA 2013 - Cassandra Internals

o.a.c.cli.CliMain.main()

// Connect to server to read input
this.connect()
this.evaluateFileStatements()
this.processStatementInteractive()

Page 21: Apache Con NA 2013 - Cassandra Internals

CLI Grammar

ANTLR Grammar

$SRC/src/java/o/a/c/cli/CLI.g

Page 22: Apache Con NA 2013 - Cassandra Internals

o.a.c.cli.CliClient.executeCLIStatement()

// Process statement
CliCompiler.compileQuery() #ANTLR
switch (tree.getType())
case...

Page 23: Apache Con NA 2013 - Cassandra Internals

API Services

JMX

CLI

Thrift

CQL 3

Page 24: Apache Con NA 2013 - Cassandra Internals

o.a.c.thrift.CassandraServer

// Implements Thrift Interface
// Access control
// Input validation
// Mapping to/from Thrift and internal types

Page 25: Apache Con NA 2013 - Cassandra Internals

Thrift Interface

Thrift IDL

$SRC/interface/cassandra.thrift

Page 26: Apache Con NA 2013 - Cassandra Internals

o.a.c.thrift.CassandraServer.get_slice()

// Get columns for one row
Tracing.begin()
ClientState cState = state()
cState.hasColumnFamilyAccess()
multigetSliceInternal()

Page 27: Apache Con NA 2013 - Cassandra Internals

CassandraServer.multigetSliceInternal()

// Get columns for many rows
ThriftValidation.validate*()
// Create ReadCommands
getSlice()

Page 28: Apache Con NA 2013 - Cassandra Internals

CassandraServer.getSlice()

// Process ReadCommands
// Return Thrift types

readColumnFamily()thriftifyColumnFamily()

Page 29: Apache Con NA 2013 - Cassandra Internals

CassandraServer.readColumnFamily()

// Process ReadCommands
// Return ColumnFamilies

StorageProxy.read()

Page 30: Apache Con NA 2013 - Cassandra Internals

API Services

JMX

CLI

Thrift

CQL 3

Page 31: Apache Con NA 2013 - Cassandra Internals

o.a.c.cql3.QueryProcessor

// Prepares and executes CQL3 statements
// Used by Thrift & Native transports
// Access control
// Input validation
// Returns transport.ResultMessage

Page 32: Apache Con NA 2013 - Cassandra Internals

CQL3 Grammar

ANTLR Grammar

$SRC/o.a.c.cql3/Cql.g

Page 33: Apache Con NA 2013 - Cassandra Internals

o.a.c.cql3.statements.ParsedStatement

// Subclasses generated by ANTLR
// Tracks bound term count
// Prepare CQLStatement
prepare()

Page 34: Apache Con NA 2013 - Cassandra Internals

o.a.c.cql3.statements.CQLStatement

checkAccess(ClientState state)
validate(ClientState state)
execute(ConsistencyLevel cl, QueryState state, List<ByteBuffer> variables)

Page 35: Apache Con NA 2013 - Cassandra Internals

o.a.c.cql3.functions.Function

argsType()
returnType()
execute(List<ByteBuffer> parameters)

Page 36: Apache Con NA 2013 - Cassandra Internals

statements.SelectStatement.RawStatement

// Implements ParsedStatement
// Input validation
prepare()

Page 37: Apache Con NA 2013 - Cassandra Internals

statements.SelectStatement.execute()

// Create ReadCommands
StorageProxy.read()

Page 38: Apache Con NA 2013 - Cassandra Internals

Architecture

API

Dynamo

Database

Page 39: Apache Con NA 2013 - Cassandra Internals

Dynamo Layer

o.a.c.service

o.a.c.net

o.a.c.dht

o.a.c.locator

o.a.c.gms

o.a.c.stream

Page 40: Apache Con NA 2013 - Cassandra Internals

o.a.c.service.StorageProxy

// Cluster wide storage operations
// Select endpoints & check CL available
// Send messages to Stages
// Wait for response
// Store Hints

Page 41: Apache Con NA 2013 - Cassandra Internals

o.a.c.service.StorageService

// Ring operations
// Track ring state
// Start & stop ring membership
// Node & token queries

Page 42: Apache Con NA 2013 - Cassandra Internals

o.a.c.service.IResponseResolver

preprocess(MessageIn<T> message)
resolve() throws DigestMismatchException

RowDigestResolver
RowDataResolver
RangeSliceResponseResolver

Page 43: Apache Con NA 2013 - Cassandra Internals

Response Handlers / Callback

implements IAsyncCallback<T>

response(MessageIn<T> msg)

Page 44: Apache Con NA 2013 - Cassandra Internals

o.a.c.service.ReadCallback.get()

// Wait for blockfor & data
condition.await(timeout, TimeUnit.MILLISECONDS)

throw ReadTimeoutException()

resolver.resolve()
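The wait-then-resolve flow on this slide can be sketched with a `CountDownLatch`. This is an illustration only, not Cassandra's `ReadCallback`: the class name, the `String` payloads and the use of `IllegalStateException` in place of `ReadTimeoutException` are assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Cassandra's ReadCallback.
class BlockForCallbackSketch {
    private final CountDownLatch remaining;   // counts down once per replica response
    private final List<String> responses = new CopyOnWriteArrayList<>();

    BlockForCallbackSketch(int blockFor) {
        this.remaining = new CountDownLatch(blockFor);
    }

    // Called by whichever thread delivers a replica response.
    void response(String payload) {
        responses.add(payload);
        remaining.countDown();
    }

    // Blocks until blockFor responses have arrived or the timeout expires.
    List<String> get(long timeoutMillis) {
        try {
            if (!remaining.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException(
                        "timed out waiting for " + remaining.getCount() + " response(s)");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        // A real resolver would merge or compare the responses here.
        return new ArrayList<>(responses);
    }

    public static void main(String[] args) {
        BlockForCallbackSketch callback = new BlockForCallbackSketch(2);
        new Thread(() -> callback.response("replica-1")).start();
        new Thread(() -> callback.response("replica-2")).start();
        System.out.println(callback.get(1000));
    }
}
```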

Page 45: Apache Con NA 2013 - Cassandra Internals

o.a.c.service.StorageProxy.fetchRows()

getLiveSortedEndpoints()
new RowDigestResolver()
new ReadCallback()
MessagingService.sendRR()
---------------------------------------
ReadCallback.get() # blocking
catch (DigestMismatchException ex)
catch (ReadTimeoutException ex)
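The digest path above can be illustrated in a few lines: compare the digest of the data response with the digests from the other replicas, and on a mismatch fall back to comparing full rows. The sketch is a simplification under stated assumptions. `Row`, `DigestMismatch` and `read` are hypothetical names, all replicas are simulated in-process, and picking the highest timestamp stands in for the real data resolver.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a digest read with a full-data fallback.
class DigestReadSketch {
    static final class Row {
        final String value;
        final long timestamp;

        Row(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    static class DigestMismatch extends RuntimeException {
        DigestMismatch(String message) {
            super(message);
        }
    }

    static byte[] digest(Row row) {
        try {
            String content = row.value + "@" + row.timestamp;
            return MessageDigest.getInstance("MD5").digest(content.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    // replicaRows.get(0) plays the data response; the rest are reduced to digests.
    static Row read(List<Row> replicaRows) {
        Row data = replicaRows.get(0);
        try {
            byte[] expected = digest(data);
            for (Row other : replicaRows.subList(1, replicaRows.size())) {
                if (!Arrays.equals(expected, digest(other))) {
                    throw new DigestMismatch("replica digest differs from data response");
                }
            }
            return data;
        } catch (DigestMismatch ex) {
            // Fallback: look at the full rows and keep the most recent write.
            Row newest = data;
            for (Row row : replicaRows) {
                if (row.timestamp > newest.timestamp) {
                    newest = row;
                }
            }
            return newest;
        }
    }

    public static void main(String[] args) {
        Row stale = new Row("v1", 1L);
        Row fresh = new Row("v2", 2L);
        System.out.println(read(Arrays.asList(stale, fresh)).value);
    }
}
```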

Page 46: Apache Con NA 2013 - Cassandra Internals

Dynamo Layer

o.a.c.service

o.a.c.net

o.a.c.dht

o.a.c.locator

o.a.c.gms

o.a.c.stream

Page 47: Apache Con NA 2013 - Cassandra Internals

o.a.c.net.MessagingService.Verb <<enum>>

MUTATION
READ
REQUEST_RESPONSE
TREE_REQUEST
TREE_RESPONSE

(And more...)

Page 48: Apache Con NA 2013 - Cassandra Internals

o.a.c.net.MessagingService.verbHandlers

new EnumMap<Verb, IVerbHandler>(Verb.class)

Page 49: Apache Con NA 2013 - Cassandra Internals

o.a.c.net.IVerbHandler<T>

doVerb(MessageIn<T> message, String id);

Page 50: Apache Con NA 2013 - Cassandra Internals

o.a.c.net.MessagingService.verbStages

new EnumMap<MessagingService.Verb, Stage>(MessagingService.Verb.class)

Page 51: Apache Con NA 2013 - Cassandra Internals

o.a.c.net.MessagingService.receive()

runnable = new MessageDeliveryTask(message, id, timestamp);

StageManager.getStage(message.getMessageType());

stage.execute(runnable);
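The last few slides describe three lookups: verb to handler, verb to stage, and stage to thread pool. A minimal sketch of that dispatch follows. It is not the Cassandra implementation: the enum values are a small subset, `VerbHandler` returns a `String` so the result is easy to inspect, and every class and method name here is hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of verb -> stage -> handler dispatch.
class VerbDispatchSketch {
    enum Verb { MUTATION, READ, REQUEST_RESPONSE }
    enum Stage { MUTATION, READ, REQUEST_RESPONSE }

    interface VerbHandler {
        String doVerb(String message, String id);
    }

    private final Map<Verb, Stage> verbStages = new EnumMap<>(Verb.class);
    private final Map<Verb, VerbHandler> verbHandlers = new EnumMap<>(Verb.class);
    private final Map<Stage, ExecutorService> stages = new EnumMap<>(Stage.class);

    VerbDispatchSketch() {
        verbStages.put(Verb.MUTATION, Stage.MUTATION);
        verbStages.put(Verb.READ, Stage.READ);
        verbStages.put(Verb.REQUEST_RESPONSE, Stage.REQUEST_RESPONSE);
        for (Stage stage : Stage.values()) {
            // One single-threaded pool per stage; daemon threads so the JVM can always exit.
            stages.put(stage, Executors.newSingleThreadExecutor(runnable -> {
                Thread thread = new Thread(runnable, "stage-" + stage);
                thread.setDaemon(true);
                return thread;
            }));
        }
    }

    void register(Verb verb, VerbHandler handler) {
        verbHandlers.put(verb, handler);
    }

    // Wraps the message in a task and hands it to the pool for the verb's stage.
    Future<String> receive(Verb verb, String message, String id) {
        VerbHandler handler = verbHandlers.get(verb);
        if (handler == null) {
            throw new IllegalArgumentException("no handler registered for " + verb);
        }
        Callable<String> task = () -> handler.doVerb(message, id);
        return stages.get(verbStages.get(verb)).submit(task);
    }

    static String await(Future<String> result) {
        try {
            return result.get();
        } catch (Exception e) {
            throw new IllegalStateException("task failed", e);
        }
    }

    void shutdown() {
        for (ExecutorService pool : stages.values()) {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        VerbDispatchSketch service = new VerbDispatchSketch();
        service.register(Verb.READ, (message, id) -> "read:" + message + ":" + id);
        System.out.println(await(service.receive(Verb.READ, "k1", "42")));
        service.shutdown();
    }
}
```

Keeping a separate pool per stage means a backlog of one kind of work (for example reads) does not starve another (for example gossip).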

Page 52: Apache Con NA 2013 - Cassandra Internals

o.a.c.net.MessageDeliveryTask.run()

// If droppable and older than rpc_timeout
MessagingService.incrementDroppedMessages(verb);

MessagingService.getVerbHandler(verb)
verbHandler.doVerb(message, id)
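The drop check on this slide amounts to: if the verb is droppable and the message has waited longer than the timeout, count it and skip the handler. The sketch below shows that decision in isolation. Which verbs are droppable, the class name and the explicit `nowMillis` parameter are assumptions made for the example.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of dropping messages that waited past the timeout.
class DroppedMessageSketch {
    enum Verb { MUTATION, READ, GOSSIP }

    // Assumption for the sketch: reads and writes may be dropped, gossip may not.
    private static final Set<Verb> DROPPABLE = EnumSet.of(Verb.MUTATION, Verb.READ);

    private final Map<Verb, AtomicInteger> dropped = new EnumMap<>(Verb.class);
    private final long timeoutMillis;

    DroppedMessageSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
        for (Verb verb : Verb.values()) {
            dropped.put(verb, new AtomicInteger());
        }
    }

    // Returns true if the handler ran, false if the message was dropped.
    boolean run(Verb verb, long createdAtMillis, long nowMillis, Runnable handler) {
        if (DROPPABLE.contains(verb) && nowMillis - createdAtMillis > timeoutMillis) {
            dropped.get(verb).incrementAndGet();
            return false;
        }
        handler.run();
        return true;
    }

    int droppedCount(Verb verb) {
        return dropped.get(verb).get();
    }

    public static void main(String[] args) {
        DroppedMessageSketch task = new DroppedMessageSketch(10_000);
        long now = System.currentTimeMillis();
        boolean ran = task.run(Verb.READ, now - 20_000, now, () -> System.out.println("handled"));
        System.out.println("ran=" + ran + " dropped=" + task.droppedCount(Verb.READ));
    }
}
```

The reasoning behind dropping is that the coordinator has already given up on a request that is older than its timeout, so doing the work would only add load.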

Page 53: Apache Con NA 2013 - Cassandra Internals

Dynamo Layer

o.a.c.service

o.a.c.net

o.a.c.dht

o.a.c.locator

o.a.c.gms

o.a.c.stream

Page 54: Apache Con NA 2013 - Cassandra Internals

o.a.c.dht.IPartitioner<T extends Token>

getToken(ByteBuffer key)
getRandomToken()

LocalPartitioner
RandomPartitioner
Murmur3Partitioner

Page 55: Apache Con NA 2013 - Cassandra Internals

o.a.c.dht.Token<T>

compareTo(Token<T> o)

BytesToken
BigIntegerToken
LongToken
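To make the partitioner and token slides concrete, here is a sketch of a hash-based `getToken()` in the spirit of RandomPartitioner, which derives a `BigInteger` token from an MD5 hash of the key. The class and helper names are hypothetical and the real classes carry more responsibilities (token factories, midpoints, ownership calculations).

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of an MD5-based partitioner.
class Md5TokenSketch {
    // Maps a row key to a non-negative token; equal keys always give equal tokens.
    static BigInteger getToken(ByteBuffer key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            md5.update(key.duplicate()); // duplicate() leaves the caller's buffer position untouched
            return new BigInteger(md5.digest()).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    static ByteBuffer key(String text) {
        return ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        BigInteger first = getToken(key("user:1"));
        BigInteger second = getToken(key("user:2"));
        // Tokens are comparable, which is what lets them be placed on a ring.
        System.out.println(first.compareTo(second));
    }
}
```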

Page 56: Apache Con NA 2013 - Cassandra Internals

Dynamo Layer

o.a.c.service

o.a.c.net

o.a.c.dht

o.a.c.locator

o.a.c.gms

o.a.c.stream

Page 57: Apache Con NA 2013 - Cassandra Internals

o.a.c.locator.IEndpointSnitch

getRack(InetAddress endpoint)
getDatacenter(InetAddress endpoint)
sortByProximity(InetAddress address, List<InetAddress> addresses)

SimpleSnitch
PropertyFileSnitch
Ec2MultiRegionSnitch

Page 58: Apache Con NA 2013 - Cassandra Internals

o.a.c.locator.AbstractReplicationStrategy

getNaturalEndpoints(RingPosition searchPosition)
calculateNaturalEndpoints(Token searchToken, TokenMetadata tokenMetadata)

SimpleStrategy
NetworkTopologyStrategy

Page 59: Apache Con NA 2013 - Cassandra Internals

o.a.c.locator.TokenMetadata

BiMultiValMap<Token, InetAddress> tokenToEndpointMap
BiMultiValMap<Token, InetAddress> bootstrapTokens
Set<InetAddress> leavingEndpoints
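A token-to-endpoint map plus a replication strategy is enough to answer "which nodes hold this key". The sketch below walks a ring clockwise from the first token at or after the search token and collects distinct endpoints, which is the idea behind a simple, topology-unaware strategy. It uses `long` tokens and `String` endpoints for brevity. `RingWalkSketch` and `naturalEndpoints` are hypothetical names and racks and data centers are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of a clockwise ring walk.
class RingWalkSketch {
    static List<String> naturalEndpoints(NavigableMap<Long, String> ring, long searchToken, int replicas) {
        List<String> endpoints = new ArrayList<>();
        if (ring.isEmpty()) {
            return endpoints;
        }
        // The first token >= searchToken owns the key; wrap to the start of the ring if none.
        Long start = ring.ceilingKey(searchToken);
        if (start == null) {
            start = ring.firstKey();
        }
        // Ring order: from the owning token to the end, then from the beginning.
        List<String> walk = new ArrayList<>(ring.tailMap(start, true).values());
        walk.addAll(ring.headMap(start, false).values());
        for (String endpoint : walk) {
            if (endpoints.size() == replicas) {
                break;
            }
            // A node may own several tokens; count it once.
            if (!endpoints.contains(endpoint)) {
                endpoints.add(endpoint);
            }
        }
        return endpoints;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> ring = new TreeMap<>();
        ring.put(0L, "node-a");
        ring.put(100L, "node-b");
        ring.put(200L, "node-c");
        System.out.println(naturalEndpoints(ring, 250L, 2));
    }
}
```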

Page 60: Apache Con NA 2013 - Cassandra Internals

Dynamo Layer

o.a.c.service

o.a.c.net

o.a.c.dht

o.a.c.locator

o.a.c.gms

o.a.c.stream

Page 61: Apache Con NA 2013 - Cassandra Internals

o.a.c.gms.VersionedValue

// VersionGenerator.getNextVersion()

public final int version;
public final String value;

Page 62: Apache Con NA 2013 - Cassandra Internals

o.a.c.gms.ApplicationState <<enum>>

STATUS
LOAD
SCHEMA
DC
RACK

(And more...)

Page 63: Apache Con NA 2013 - Cassandra Internals

o.a.c.gms.HeartBeatState

//VersionGenerator.getNextVersion();

private int generation;
private int version;
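The two fields on this slide are what let a node decide whether gossip state is newer than what it already holds: a higher generation (the node restarted) always wins, and within one generation the higher version wins. A minimal sketch of that ordering follows, assuming a shared counter in place of `VersionGenerator`. `HeartBeatSketch` and `isNewer` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of generation/version ordering.
class HeartBeatSketch {
    // Stand-in for a node-wide version generator: every call hands out a larger number.
    private static final AtomicInteger VERSION_GENERATOR = new AtomicInteger();

    final int generation;   // fixed for the lifetime of one process start
    private int version;    // bumped on every heartbeat

    HeartBeatSketch(int generation) {
        this.generation = generation;
    }

    void updateHeartBeat() {
        version = VERSION_GENERATOR.incrementAndGet();
    }

    int version() {
        return version;
    }

    // True when 'remote' carries more recent information than 'local'.
    static boolean isNewer(HeartBeatSketch remote, HeartBeatSketch local) {
        if (remote.generation != local.generation) {
            return remote.generation > local.generation;
        }
        return remote.version > local.version;
    }

    public static void main(String[] args) {
        HeartBeatSketch beforeRestart = new HeartBeatSketch(1);
        beforeRestart.updateHeartBeat();
        HeartBeatSketch afterRestart = new HeartBeatSketch(2);
        System.out.println(isNewer(afterRestart, beforeRestart));
    }
}
```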

Page 64: Apache Con NA 2013 - Cassandra Internals

o.a.c.gms.Gossiper.GossipTask.run()

// SYN -> ACK -> ACK2
makeRandomGossipDigest()
new GossipDigestSyn()

// Use MessagingService.sendOneWay()
Gossiper.doGossipToLiveMember()
Gossiper.doGossipToUnreachableMember()
Gossiper.doGossipToSeed()

Page 65: Apache Con NA 2013 - Cassandra Internals

gms.GossipDigestSynVerbHandler.doVerb()

Gossiper.examineGossiper()
new GossipDigestAck()
MessagingService.sendOneWay()

Page 66: Apache Con NA 2013 - Cassandra Internals

gms.GossipDigestAck2VerbHandler.doVerb()

Gossiper.notifyFailureDetector()
Gossiper.applyStateLocally()
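The SYN, ACK, ACK2 exchange hinges on comparing digests: for each endpoint named in the SYN, the receiver either asks for newer state or offers its own. The sketch below shows only that comparison. It is a simplification under stated assumptions: a digest is reduced to a single version number per endpoint (generations are ignored), and `toRequest` and `toSend` are hypothetical helpers, not methods of the Gossiper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the digest comparison behind SYN -> ACK -> ACK2.
class GossipDigestSketch {
    // Endpoints where the sender is ahead: endpoint -> version the receiver already has (0 if unknown).
    static Map<String, Integer> toRequest(Map<String, Integer> local, Map<String, Integer> remoteDigests) {
        Map<String, Integer> request = new TreeMap<>();
        for (Map.Entry<String, Integer> digest : remoteDigests.entrySet()) {
            int localVersion = local.getOrDefault(digest.getKey(), 0);
            if (digest.getValue() > localVersion) {
                request.put(digest.getKey(), localVersion);
            }
        }
        return request;
    }

    // Endpoints where the receiver is ahead: endpoint -> the newer local version to send back.
    static Map<String, Integer> toSend(Map<String, Integer> local, Map<String, Integer> remoteDigests) {
        Map<String, Integer> send = new TreeMap<>();
        for (Map.Entry<String, Integer> digest : remoteDigests.entrySet()) {
            Integer localVersion = local.get(digest.getKey());
            if (localVersion != null && localVersion > digest.getValue()) {
                send.put(digest.getKey(), localVersion);
            }
        }
        return send;
    }

    public static void main(String[] args) {
        Map<String, Integer> local = new HashMap<>();
        local.put("10.0.0.1", 5);
        local.put("10.0.0.2", 2);
        Map<String, Integer> syn = new HashMap<>();
        syn.put("10.0.0.1", 3);
        syn.put("10.0.0.2", 4);
        syn.put("10.0.0.3", 1);
        // The ACK carries both parts: what we want and what we can offer.
        System.out.println("request " + toRequest(local, syn));
        System.out.println("send " + toSend(local, syn));
    }
}
```

In this simplified picture the ACK2 is the sender's answer to the "request" part, after which both sides have the newer state for every endpoint named in the SYN.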

Page 67: Apache Con NA 2013 - Cassandra Internals

Architecture

API Layer

Dynamo Layer

Database Layer

Page 68: Apache Con NA 2013 - Cassandra Internals

Database Layer

o.a.c.concurrent

o.a.c.db

o.a.c.cache

o.a.c.io

o.a.c.trace

Page 69: Apache Con NA 2013 - Cassandra Internals

o.a.c.concurrent.StageManager

stages = new EnumMap<Stage, ThreadPoolExecutor>(Stage.class);

getStage(Stage stage)

Page 70: Apache Con NA 2013 - Cassandra Internals

o.a.c.concurrent.Stage

READ
MUTATION
GOSSIP
REQUEST_RESPONSE
ANTI_ENTROPY

(And more...)

Page 71: Apache Con NA 2013 - Cassandra Internals

Database Layer

o.a.c.concurrent

o.a.c.db

o.a.c.cache

o.a.c.io

o.a.c.trace

Page 72: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.Table

// Keyspace
open(String table)
getColumnFamilyStore(String cfName)

getRow(QueryFilter filter)
apply(RowMutation mutation, boolean writeCommitLog)

Page 73: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.ColumnFamilyStore

// Column Family
getColumnFamily(QueryFilter filter)
getTopLevelColumns(...)

apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)

Page 74: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.IColumnContainer

addColumn(IColumn column)
remove(ByteBuffer columnName)

ColumnFamily
SuperColumn

Page 75: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.ISortedColumns

addColumn(IColumn column, Allocator allocator)
removeColumn(ByteBuffer name)

ArrayBackedSortedColumns
AtomicSortedColumns
TreeMapBackedSortedColumns

Page 76: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.Memtable

put(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)

flushAndSignal(CountDownLatch latch, Future<ReplayPosition> context)

Page 77: Apache Con NA 2013 - Cassandra Internals

Memtable.FlushRunnable.writeSortedContents()

// SSTableWriter
createFlushWriter()

// Iterate through rows & CF’s in order
writer.append()

Page 78: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.ReadCommand

getRow(Table table)

SliceByNamesReadCommand
SliceFromReadCommand

Page 79: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.IDiskAtomFilter

getMemtableColumnIterator(...)
getSSTableColumnIterator(...)

IdentityQueryFilter
NamesQueryFilter
SliceQueryFilter
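The difference between a names filter and a slice filter is easiest to see on a sorted map of columns. The sketch below models one row as a `NavigableMap` from column name to value. It is an illustration only: `byNames` and `bySlice` are hypothetical helpers, column names are plain strings compared lexically, and `bySlice` expects `start <= finish` even when `reversed` is set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of names vs slice filtering over sorted columns.
class ColumnFilterSketch {
    // "Names" filter: fetch specific columns, skipping names that are absent.
    static List<String> byNames(NavigableMap<String, String> row, List<String> names) {
        List<String> values = new ArrayList<>();
        for (String name : names) {
            String value = row.get(name);
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }

    // "Slice" filter: columns in [start, finish], optionally reversed, capped at count.
    static List<String> bySlice(NavigableMap<String, String> row, String start, String finish,
                                boolean reversed, int count) {
        NavigableMap<String, String> range = row.subMap(start, true, finish, true);
        if (reversed) {
            range = range.descendingMap();
        }
        List<String> values = new ArrayList<>();
        for (String value : range.values()) {
            if (values.size() == count) {
                break;
            }
            values.add(value);
        }
        return values;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("a", "1");
        row.put("b", "2");
        row.put("c", "3");
        row.put("d", "4");
        System.out.println(byNames(row, Arrays.asList("b", "x", "d")));
        System.out.println(bySlice(row, "b", "d", true, 2));
    }
}
```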

Page 80: Apache Con NA 2013 - Cassandra Internals

Thanks.

Page 81: Apache Con NA 2013 - Cassandra Internals

Aaron Morton

@aaronmorton

www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License