On Cassandra Development: Past, Present and Future
DESCRIPTION
Slides from my talk at Cassandra Europe
TRANSCRIPT
On Cassandra Development: Past, Present and Future
Sylvain Lebresne
©2012 DataStax
About:me
•Sylvain Lebresne
•@pcmanus
•Be sure to check out DataStax Enterprise 2.0!
Apache Cassandra
•Apache Top-Level project since February 2010
•∼200 contributors (15 official committers)
Releases
•Current tentative release cycle: 4 months (including 1 month freeze)
•Minor releases “when appropriate”
Version    Release date
0.5        24 Jan 2010
0.6        13 Apr 2010
0.7        9 Jan 2011
0.8        2 Jun 2011
1.0        18 Oct 2011
1.1        Soon
Past: Cassandra 1.0
Cassandra 1.0
•Released October 18, 2011
•Current stable version
•Last minor revision: 1.0.8 (February 27, 2012)
Cassandra 1.0 features
•SSTables Compression (Snappy, Deflate)
•SSTables Checksumming
•Leveled Compaction
•Improved memory management:
• Simpler memtable_total_space_in_mb and commitlog_total_space_in_mb settings
• Arena allocations for memtables
• Off-heap row cache by default
•Read optimisations
Cassandra 1.0 features cont’d
•More reliable hinted handoffs
•Faster disk space reclamation
•Single-pass streaming
•Repair improvements (nodetool repair -pr)
•Multi-threaded compactions
•...
Present: Cassandra 1.1
Cassandra 1.1
•Beta 2 released yesterday (March 27, 2012)
•Final release slated for ... soon
Global caches
•Prior to 1.1, one key cache and one row cache per column family
•Now: one global key cache and one global row cache
•Motivations:
• Simpler configuration ({key,row}_cache_size_in_mb)
• Better use of the LRU list
•Per-CF option reduced to:
• caching=ALL|KEYS_ONLY|ROWS_ONLY|NONE
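One way to read the “better use of the LRU list” motivation: with one cache per column family, capacity reserved for a cold column family sits idle, whereas a single shared LRU lets the hot column family use it. The Python sketch below is a toy model only, not Cassandra's cache implementation; the `LRUCache` class, the capacities and the column family names are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used
        return value


def replay(reads, cache_for):
    """Replay (cf, key) reads; cache_for(cf) picks which cache serves a CF."""
    for cf, key in reads:
        cache_for(cf).get((cf, key), load=lambda k: "value")


# Hypothetical workload: 'hot_cf' cycles over 6 keys, 'cold_cf' is read once.
reads = [("hot_cf", i % 6) for i in range(60)] + [("cold_cf", 0)]

# Pre-1.1 style: each column family gets its own cache of 4 entries.
per_cf = {"hot_cf": LRUCache(4), "cold_cf": LRUCache(4)}
replay(reads, lambda cf: per_cf[cf])

# 1.1 style: one global cache with the same total budget of 8 entries.
global_cache = LRUCache(8)
replay(reads, lambda cf: global_cache)

per_cf_hits = sum(cache.hits for cache in per_cf.values())
print("per-CF hits:", per_cf_hits, "global hits:", global_cache.hits)
```

In this made-up workload the six hot keys do not fit in the hot column family's own share, so they keep evicting each other, but they do fit in the shared budget.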
Row level isolation
•Batched writes are atomic (for a row) since day 1
•Batched writes within a row are now isolated
•When doing
UPDATE Users SET login='tom', password='1234' WHERE id='550e8400-e29b-41d4-a716-446655440'
followed by
UPDATE Users SET login='t0m', password='abcd' WHERE id='550e8400-e29b-41d4-a716-446655440'
⇒ it is guaranteed that no reader can see (tom, abcd) or (t0m, 1234)
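The guarantee can be illustrated with a copy-on-write toy model: a writer builds the new version of the row privately and publishes it with a single reference swap, so a reader sees either the old pair or the new pair, never a mix. This Python sketch only illustrates the guarantee and makes no claim about how Cassandra implements it; `IsolatedRow` and its methods are hypothetical names.

```python
import threading


class IsolatedRow:
    """Toy row whose batched updates become visible all at once."""

    def __init__(self, columns):
        self._columns = dict(columns)
        self._write_lock = threading.Lock()

    def update(self, changes):
        # Build the new version privately, then publish it with one
        # reference assignment so no reader sees a half-applied batch.
        with self._write_lock:
            new_version = dict(self._columns)
            new_version.update(changes)
            self._columns = new_version

    def read(self):
        # Take the current version once; published versions are never mutated.
        snapshot = self._columns
        return snapshot["login"], snapshot["password"]


row = IsolatedRow({"login": "tom", "password": "1234"})
seen = set()
stop = threading.Event()


def reader():
    while not stop.is_set():
        seen.add(row.read())


threads = [threading.Thread(target=reader) for _ in range(4)]
for t in threads:
    t.start()
for i in range(2000):
    if i % 2 == 0:
        row.update({"login": "t0m", "password": "abcd"})
    else:
        row.update({"login": "tom", "password": "1234"})
stop.set()
for t in threads:
    t.join()

allowed = {("tom", "1234"), ("t0m", "abcd")}
print("only consistent pairs observed:", seen <= allowed)
```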
CQL 3.0
•Motivation:
• Better wide row syntax
• Native syntax for composite types
•Not backward compatible with CQL 2.0
•Only beta, final slated for Cassandra 1.2
CREATE TABLE timeline (
    user_id uuid,
    posted_at timestamp,
    posted_by uuid,
    content text,
    device int,
    PRIMARY KEY (user_id, posted_at)
)
SELECT * FROM timeline WHERE user_id=<some_user> AND posted_at > <some_date>
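To make the wide row connection concrete: the first primary key component (user_id) can be thought of as the storage row key, and posted_at as the leading part of a composite column name, so the SELECT reads a range of columns within a single row. The Python sketch below is a toy in-memory model of that mapping, not Cassandra code; the helpers `insert` and `select_after`, the string user ids and the integer timestamps are hypothetical simplifications.

```python
# One storage row per user_id. Each row maps a composite column name
# (posted_at, column_name) to a value, mimicking a wide row.
rows = {}


def insert(user_id, posted_at, **columns):
    """Store one CQL row as several cells of the user's wide row."""
    row = rows.setdefault(user_id, {})
    for name, value in columns.items():
        row[(posted_at, name)] = value


def select_after(user_id, some_date):
    """Toy version of: SELECT * FROM timeline
    WHERE user_id=<user_id> AND posted_at > <some_date>"""
    row = rows.get(user_id, {})
    result = {}
    # Cells are visited in composite-name order, i.e. sorted by posted_at.
    for (posted_at, name), value in sorted(row.items()):
        if posted_at > some_date:
            cql_row = result.setdefault(
                posted_at, {"user_id": user_id, "posted_at": posted_at}
            )
            cql_row[name] = value
    return list(result.values())


insert("alice", 100, posted_by="bob", content="hello", device=1)
insert("alice", 200, posted_by="carol", content="hi again", device=2)
insert("dave", 150, posted_by="alice", content="other row", device=1)

print(select_after("alice", 100))
```

Only the cells of the "alice" row are touched by the query; the row for "dave" is never read.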
Fine-grained storage control
•Old:
• /var/lib/cassandra/ks/cf-hc-1-Data.db
•New:
• /var/lib/cassandra/ks/cf/ks-cf-hc-1-Data.db
•Allows putting different CFs on different devices
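The two layouts above can be sketched as path-building functions. This is a minimal Python illustration derived only from the two example paths; the function names and the reading of "hc", "1" and "Data.db" as version, generation and component are assumptions.

```python
import posixpath


def sstable_path_old(data_dir, keyspace, cf, version, generation, component):
    """Old layout: every CF of a keyspace shares the keyspace directory."""
    filename = f"{cf}-{version}-{generation}-{component}"
    return posixpath.join(data_dir, keyspace, filename)


def sstable_path_new(data_dir, keyspace, cf, version, generation, component):
    """New layout: one directory per CF, keyspace name repeated in the file."""
    filename = f"{keyspace}-{cf}-{version}-{generation}-{component}"
    return posixpath.join(data_dir, keyspace, cf, filename)


old = sstable_path_old("/var/lib/cassandra", "ks", "cf", "hc", 1, "Data.db")
new = sstable_path_new("/var/lib/cassandra", "ks", "cf", "hc", 1, "Data.db")
print(old)
print(new)
```

Because each column family now has its own directory, that directory can be placed on a separate device independently of the other column families.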
Concurrent Schema changes
•Fixes http://wiki.apache.org/cassandra/FAQ#schema_disagreement
•Reuses the Cassandra data model to store the schema (simpler integration of ‘describe schema’ for CQL)
•Speeds up adding new nodes (the schema size is now proportional to the number of keyspaces and column families, no longer to the number of schema operations)
Hadoop improvements
•Secondary index support
•Wide row support
•New (faster) BulkOutputFormat (compatible with old ColumnFamilyOutputFormat)
• job.setOutputFormatClass(BulkOutputFormat.class)
Other features
•Off-heap cache on Windows (no more JNA)
•Write survey mode
•Commit log segment pre-allocation/recycling
•Abortable compactions
•Multi-threaded streaming
•...
Future
What may be next (subjective!)
•Wide rows speed improvements (#2319)
•Smarter compaction of expired tombstones (#3442)
•CQL3 improvements/finalization (custom protocol)
•Row cache for wide rows (#1956)
•Remove super columns internally (#3237)
•Query tracing (#1123)
•Big cluster improvements
•...
Questions?