cassandra codebase 2011
Codebase 2011
Getting to know the codebase
Gary Dusbabek (@gdusbabek)
Questions?
Outline
• How to contribute
• Internals
• Some thoughts
How to Contribute
How to Contribute
• http://wiki.apache.org/cassandra/HowToContribute
• JIRA: “lhf” label (low-hanging fruit)
• Scratch your itch
How to Contribute
• Run the tests
  – ant test
  – nosetests test/system/test_thrift_server.py
How to Contribute
• http://wiki.apache.org/cassandra/CodeStyle
• Avoid:
  – Reformatting white space
  – Renaming things everywhere
  – Unrelated changes
How to Contribute
• Use git
• Attach patches
  – Generate them with git format-patch and attach them to the JIRA ticket.
  – Group them sensibly
How to Contribute
• Someone will review your code
  – Usually a committer
  – Persistence helps
• Don’t get your feelings hurt
  – It usually takes a few rounds
How to Contribute
• Participate!
  – #cassandra-dev on freenode
  – [email protected]
Internals
Services
• Ring Operations (StorageService)
• Storage Operations (StorageProxy)
Startup Sequence
• bin/cassandra
  – Finds cassandra.in.sh
    • $CLASSPATH (mandatory)
    • $CASSANDRA_HOME
    • $CASSANDRA_CONF (mandatory)
  – Executes $CASSANDRA_CONF/cassandra-env.sh
    • Sets heap sizes (gc tuning goes here!)
o.a.c.thrift.CassandraDaemon
AbstractCassandraDaemon
• ACD.setup():
  – Reads configuration: DatabaseDescriptor
  – Loads schema: DD.loadSchemas()
  – Scrub directories
  – Initialize storage (keyspaces + CFs)
  – Commit log recovery: CL.recover()
  – StorageService.initServer() -> StorageService.joinTokenRing()
Attn Tinkerers!
• Abstracted initialization of transport.
• Handy if you’re experimenting with transports/RPC
• Just extend AbstractCassandraDaemon and make sure that class is started up via bin/cassandra.
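The pattern behind this slide (shared setup in a base class, transport supplied by a subclass) can be sketched as a toy model. This is Python rather than the Java in the codebase, and the names `AbstractDaemon`, `EchoDaemon`, `start_transport` and `activate` are hypothetical stand-ins, not Cassandra's API. The setup steps are only the ones listed for ACD.setup().

```python
from abc import ABC, abstractmethod


class AbstractDaemon(ABC):
    """Toy stand-in for AbstractCassandraDaemon: shared setup, pluggable transport."""

    # Setup steps in the order the ACD.setup() slide lists them.
    SETUP_STEPS = (
        "read configuration",
        "load schemas",
        "scrub directories",
        "initialize storage",
        "recover commit log",
        "join token ring",
    )

    def __init__(self):
        self.log = []  # records the order in which steps ran

    def setup(self):
        # Shared initialization that every transport gets for free.
        self.log.extend(self.SETUP_STEPS)

    @abstractmethod
    def start_transport(self):
        """Subclasses bring up their own RPC transport here (hypothetical hook)."""

    def activate(self):
        # Hypothetical entry point: common setup first, then the transport.
        self.setup()
        self.start_transport()


class EchoDaemon(AbstractDaemon):
    """Hypothetical experimental transport; only the transport hook is written."""

    def start_transport(self):
        self.log.append("start echo transport")


daemon = EchoDaemon()
daemon.activate()
print(daemon.log)
```

The subclass never touches configuration, schema or commit log recovery, which is the point of the abstraction for anyone experimenting with transports.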
o.a.c.thrift.CassandraServer
• Implements thrift interface methods (the API).
• Start here when trying to understand the read/write path and RPC.
Configuration
• DatabaseDescriptor
  – Initialized as a side effect of ACD.setup()
• Reads config settings from yaml
• Defines system tables
• Changes regularly
• I hate this code. Please fix it.
Main Singletons
• StorageService
• StorageProxy
• MessagingService
• CompactionManager
• StageManager
• MigrationManager
Did you just say ‘Singletons?’
JMX MBeans
• Tooling supplied by MBeans
• Anything that does measurable/configurable work is tooled
  – Thread pools
  – Compaction
  – Hinted handoff
  – Streaming
  – Storage
  – Commit log
StorageService
• initServer() -> joinTokenRing()
  – Starts gossip
  – Starts MessagingService
  – Negotiates bootstrap
• Many ring operations live here.
• Repository of ring topology
  – TokenMetadata (quasi-singleton via SS.tokenMetadata_)
  – Partitioner instance is also here
MessagingService
• Verb handlers live here (initialized from SS).
  – Main event handlers, haven’t changed much.
• Socket listener
  – 2 threads per ring node
• Message gateway
  – Emitted from MessageProducer impls
  – MS.sendRR()
  – MS.sendOneWay()
  – MS.receive()
• Messages are versioned now (0.8)
  – IncomingTCPConnection
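To make the gateway idea concrete, here is a toy, in-process model of verb handlers with request/response and one-way sends. It is a Python sketch, not the Java MessagingService: there are no sockets, versioning or serialization, and the class name, method signatures and the verb names "READ" and "MUTATION" are assumptions for illustration.

```python
class ToyMessagingService:
    """In-process stand-in for a messaging gateway with per-verb handlers."""

    def __init__(self):
        self.handlers = {}   # verb -> callable that handles the payload
        self.callbacks = {}  # message id -> callback awaiting a reply
        self.next_id = 0

    def register_verb_handler(self, verb, handler):
        self.handlers[verb] = handler

    def send_one_way(self, verb, payload, target):
        # Fire and forget: no callback is registered, no reply is expected.
        target.receive(verb, payload, reply_to=None, message_id=None)

    def send_rr(self, verb, payload, target, callback):
        # Request/response: remember the callback under a fresh message id.
        self.next_id += 1
        self.callbacks[self.next_id] = callback
        target.receive(verb, payload, reply_to=self, message_id=self.next_id)

    def receive(self, verb, payload, reply_to, message_id):
        # Dispatch to the handler registered for this verb.
        result = self.handlers[verb](payload)
        if reply_to is not None:
            reply_to.callbacks.pop(message_id)(result)


store = {}
node_a = ToyMessagingService()
node_b = ToyMessagingService()
node_b.register_verb_handler("MUTATION", lambda kv: store.update([kv]))
node_b.register_verb_handler("READ", store.get)

node_a.send_one_way("MUTATION", ("k1", "v1"), node_b)
replies = []
node_a.send_rr("READ", "k1", node_b, replies.append)
print(replies)
```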
StorageProxy
• Top level of all read/write operations
  – Called from o.a.c.thrift.CassandraServer
  – Write path changed because of counters
    • Notion of WritePerformer
• Eventually to Table and ColumnFamilyStore
• Further, to SSTable and related classes.
StageManager
• Fancy Java ThreadPoolExecutor
• SEDA: http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf
• Consumes callables from a queue.
• Manages concurrency.
• Hasn’t changed much.
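A staged design like this can be approximated with one thread pool per named stage, each draining its own queue of callables. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` instead of the Java executor. `ToyStageManager`, the pool sizes and the "MUTATION" stage name are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyStageManager:
    """One bounded thread pool per named stage; callers submit callables by stage."""

    def __init__(self, sizes):
        # sizes: stage name -> maximum number of concurrent workers.
        self.stages = {
            name: ThreadPoolExecutor(max_workers=workers)
            for name, workers in sizes.items()
        }

    def submit(self, stage, fn, *args):
        # Work waits in the stage's internal queue until a worker is free.
        return self.stages[stage].submit(fn, *args)

    def shutdown(self):
        for pool in self.stages.values():
            pool.shutdown(wait=True)


stages = ToyStageManager({"READ": 4, "MUTATION": 2})
store = {}

# A write runs on the MUTATION stage, a read on the READ stage.
stages.submit("MUTATION", store.__setitem__, "k", "v").result()
value = stages.submit("READ", store.get, "k").result()
stages.shutdown()
print(value)
```

Separate pools cap the concurrency of each kind of work independently, so a burst of reads cannot starve writes of threads.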
Adding API Methods
• Define method + structures in IDL
  – interface/cassandra.thrift
• Regenerate files
  – ant gen-thrift-java gen-thrift-py
• Implement stubs:
  – o.a.c.thrift.CassandraServer
• Create a system test
  – test/system/test_thrift_server.py
Reading
• Socket -> CassandraServer
  – Permissions
  – Request validation
  – Marshalling
• ReadCommands created in CS.multigetSliceInternal, passed to StorageProxy
  – 1 per key
Reading
• StorageProxy.read(), fetchRows()
• For each ReadCommand
  – Determine endpoints
  – Local & remote branches
Reading
• StorageProxy local
• READ stage executes a LocalReadRunnable
  – True read vs digest
  – Table, ColumnFamilyStore
  – CFS.getTopLevelColumns
    • Make QueryFilter
    • Query Memtables
    • Query SSTables
    • Coalesce in iterators
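The local read steps above (filter, query memtables, query SSTables, coalesce) can be sketched as a merge where the newest timestamp wins per column. This is a simplified Python model, not the real CFS.getTopLevelColumns: each source is assumed to be a plain dict of column name to (value, timestamp) for one row, and tombstones, iterators and on-disk formats are ignored.

```python
def get_top_level_columns(memtable, sstables, wanted):
    """Merge one row's columns from all sources; the newest timestamp wins."""
    merged = {}
    for source in [memtable] + sstables:
        for name, (value, timestamp) in source.items():
            if name not in wanted:
                continue  # the "query filter": skip columns nobody asked for
            if name not in merged or timestamp > merged[name][1]:
                merged[name] = (value, timestamp)
    return merged


# One row as seen by the memtable and by two SSTables.
memtable = {"email": ("new@example.com", 30)}
sstables = [
    {"email": ("old@example.com", 10), "name": ("Ann", 10)},
    {"name": ("Anne", 20), "age": ("41", 20)},
]

row = get_top_level_columns(memtable, sstables, {"email", "name"})
print(row)
```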
Reading
• StorageProxy remote
  – Read command
  – Response handler
  – Send to remote nodes
• Read repair happens in SP.fetchRows().
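A rough sketch of how a true read plus digest reads can trigger read repair: one replica returns data, the others return a hash, and a mismatch leads to reconciling and writing back the newest columns. This is a toy Python model under stated assumptions. Replicas are plain dicts, the digest is an MD5 over a sorted representation, and `fetch_row` is a hypothetical simplification of what SP.fetchRows() coordinates.

```python
import hashlib


def digest(row):
    """Hash a row deterministically so replicas can be compared cheaply."""
    return hashlib.md5(repr(sorted(row.items())).encode()).hexdigest()


def fetch_row(replicas, key):
    # True read from the first replica, digest reads from the rest.
    data = dict(replicas[0].get(key, {}))
    digests = [digest(replica.get(key, {})) for replica in replicas[1:]]
    if all(d == digest(data) for d in digests):
        return data

    # Mismatch: reconcile full rows (newest timestamp wins per column).
    merged = {}
    for replica in replicas:
        for name, (value, timestamp) in replica.get(key, {}).items():
            if name not in merged or timestamp > merged[name][1]:
                merged[name] = (value, timestamp)

    # Read repair: push the reconciled row back to every replica.
    for replica in replicas:
        replica[key] = dict(merged)
    return merged


replica_1 = {"k": {"c": ("old", 1)}}
replica_2 = {"k": {"c": ("new", 2)}}
result = fetch_row([replica_1, replica_2], "k")
print(result)
```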
Writing
• CS.doInsert()
  – Marshalling, creates RMs
• StorageProxy
  – Local/remote branch
  – SP.sendToHintedEndpoints()
• RowMutation
  – One key, one or more CFs
• ColumnFamily
  – Collection of column modifications
Writing
• RM.apply -> Table.apply
  – Write to CL (commit log)
  – Iterate over RM CFs
  – CFS.apply()
    • Overwrites results on pre-existing column families
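The write steps on this slide (log first, then apply each column family in the mutation) can be sketched as below. This is a toy Python model, not Table.apply: the commit log is a list, memtables are dicts, and `ToyTable` and its mutation shape (CF name to {column: (value, timestamp)}) are assumptions for illustration.

```python
class ToyTable:
    """Toy keyspace: an append-only commit log plus one memtable per CF."""

    def __init__(self, cf_names):
        self.commit_log = []
        self.memtables = {cf: {} for cf in cf_names}

    def apply(self, key, mutation):
        # 1. Durability first: append the whole mutation to the commit log.
        self.commit_log.append((key, mutation))
        # 2. Then iterate over the column families touched by the mutation.
        for cf, columns in mutation.items():
            row = self.memtables[cf].setdefault(key, {})
            for name, (value, timestamp) in columns.items():
                # Newer (or equal) timestamps overwrite what is already there.
                if name not in row or timestamp >= row[name][1]:
                    row[name] = (value, timestamp)


table = ToyTable(["users", "stats"])
table.apply("k1", {"users": {"name": ("Ann", 1)}, "stats": {"visits": ("1", 1)}})
table.apply("k1", {"users": {"name": ("Anne", 2)}})
print(table.memtables["users"]["k1"])
```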
Writing
• RM is serialized into a Message and sent to other nodes
  – Waits for ACKs depending on CL (consistency level)
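How many ACKs a write waits for follows from the consistency level and the replication factor. A minimal Python sketch covering only ONE, QUORUM and ALL; the function names are hypothetical, and other levels are left out.

```python
def required_acks(consistency_level, replication_factor):
    """Number of replica ACKs a write must collect before it succeeds."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of replicas
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def write_succeeded(acks_received, consistency_level, replication_factor):
    return acks_received >= required_acks(consistency_level, replication_factor)


# With three replicas, a QUORUM write needs two ACKs.
quorum_of_three = required_acks("QUORUM", 3)
print(quorum_of_three, write_succeeded(1, "QUORUM", 3))
```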
Challenges
• To have an in-depth understanding of everything.
  – Hard for hobbyists/part-timers
  – Outside of DataStax, little support for full-timers
  – Still changing fast
• Keeping up
Challenge: Lines of Code
• 0.4 (Sep 2009): 52 kloc
• 0.5 (Jan 2010): 59 kloc
• 0.6 (Apr 2010): 73 kloc
• 0.7 (Jan 2011): 122 kloc
• 0.8 (Jun 2011): 146 kloc
• Trunk (yesterday): 149 kloc
Average: 4,500 lines per month
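The average can be checked from the release figures. The sketch below uses only the dated releases (0.4 through 0.8), since the trunk figure has no date attached. Treating each release as falling on the listed month is an assumption.

```python
# (version, year, month, kloc) as listed on the slide.
releases = [
    ("0.4", 2009, 9, 52),
    ("0.5", 2010, 1, 59),
    ("0.6", 2010, 4, 73),
    ("0.7", 2011, 1, 122),
    ("0.8", 2011, 6, 146),
]


def months_between(earlier, later):
    """Whole months from one release to another."""
    return (later[1] - earlier[1]) * 12 + (later[2] - earlier[2])


first, last = releases[0], releases[-1]
months = months_between(first, last)
lines_per_month = (last[3] - first[3]) * 1000 / months
print(f"{months} months, about {lines_per_month:.0f} lines per month")
```

Growth of 94 kloc over 21 months comes to a little under 4,500 lines per month, which matches the rounded figure on the slide.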
Challenges
• Code-wise growing pains
  – Software maturity
  – Decisions made early on