c* summit 2013: time is money jake luciani and carl yeksigian

Time is Money Financial Time Series Jake Luciani and Carl Yeksigian BlueMountain Capital

Upload: planet-cassandra

Post on 10-May-2015


DESCRIPTION

This session will focus on our approach to building a scalable TimeSeries database for financial data using Cassandra 1.2 and CQL3. We will discuss how we deal with a heavy mix of reads and writes as well as how we monitor and track performance of the system.

TRANSCRIPT

Page 1: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Time is Money

Financial Time Series Jake Luciani and Carl Yeksigian

BlueMountain Capital

Page 2: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

About this talk
• Part 1: Our use case and architecture
• Part 2: Our deployment and tuning
• Part 3: Q&A

Page 3: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Know your problem.
1000s of consumers
..creating and reading data as fast as possible
..consistent to all readers
..and handle ad-hoc user queries
..quickly
..across data centers.

Page 4: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Know your data.

AAPL price

MSFT price

Page 5: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Know your queries.

Time Series Query

Start, End, Periodicity defines query

1 minute periods
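
A minimal sketch of how a start, end, and periodicity could define the sample points of a time series query. The function name `sample_series`, the tick units, the prices, and the rule of carrying the last known value forward are assumptions for illustration, not details from the talk.

```python
from bisect import bisect_right

def sample_series(points, start, end, period):
    """Return one (tick, value) pair per period from start to end inclusive.

    points: list of (tick, value) sorted by tick ascending.
    Each sample takes the latest value at or before the sample tick
    (an assumed rule), or None if nothing has been recorded yet.
    """
    ticks = [t for t, _ in points]
    samples = []
    t = start
    while t <= end:
        i = bisect_right(ticks, t)  # number of points with tick <= t
        samples.append((t, points[i - 1][1] if i else None))
        t += period
    return samples

# Hypothetical prices keyed by seconds; 60-second (1 minute) periods.
aapl = [(0, 100.0), (45, 100.5), (130, 101.0)]
samples = sample_series(aapl, start=0, end=180, period=60)
```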

Page 6: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Know your queries.

Cross Section Query

As Of time defines the query

As Of Time (11am)
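
A cross section can be sketched as one lookup per series at a shared as-of time. The helper name `cross_section`, the hour-based ticks, and the prices below are hypothetical; the sketch assumes each series is held in memory, sorted by time.

```python
from bisect import bisect_right

def cross_section(series_by_id, as_of):
    """For every series, return the latest value at or before as_of."""
    snapshot = {}
    for series_id, points in series_by_id.items():
        ticks = [t for t, _ in points]
        i = bisect_right(ticks, as_of)  # number of points with tick <= as_of
        snapshot[series_id] = points[i - 1][1] if i else None
    return snapshot

# Hypothetical data: ticks are hours of the day.
prices = {
    "AAPL": [(9, 440.0), (10, 441.5), (12, 443.0)],
    "MSFT": [(10, 33.0), (11, 33.2)],
}
snapshot = cross_section(prices, as_of=11)
```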

Page 7: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Know your queries.
• Cross sections are random
• Storing every possible cross section is not possible
• We also support bi-temporality

Page 8: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Let's optimize for Time Series.

Page 9: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Data Model (CQL 3)

CREATE TABLE tsdata (
    id blob,
    property text,
    asof_ticks bigint,
    knowledge_ticks bigint,
    value blob,
    PRIMARY KEY (id, property, asof_ticks, knowledge_ticks)
) WITH COMPACT STORAGE
  AND CLUSTERING ORDER BY (property ASC, asof_ticks DESC, knowledge_ticks DESC);

Page 10: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

CQL3 Queries: Time Series

SELECT * FROM tsdata
WHERE id = 0x12345
  AND property = 'lastPrice'
  AND asof_ticks >= 1234567890
  AND asof_ticks <= 2345678901;

Page 11: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

CQL3 Queries: Cross Section

SELECT * FROM tsdata
WHERE id = 0x12345
  AND property = 'lastPrice'
  AND asof_ticks = 1234567890
  AND knowledge_ticks < 2345678901
LIMIT 1;
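
The effect of the descending clustering order together with LIMIT 1 can be sketched in memory. This assumes the usual bi-temporal reading, in which asof_ticks is the time a value applies to and knowledge_ticks is the time it was recorded; the talk does not spell this out. The function name and the sample rows are hypothetical.

```python
def cross_section_lookup(rows, asof_ticks, knowledge_before):
    """Emulate: asof_ticks = X AND knowledge_ticks < K LIMIT 1.

    rows: list of (asof_ticks, knowledge_ticks, value) for one id/property.
    Rows are scanned newest-first, mirroring the DESC clustering order,
    so the first match is the most recent version known before K.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1]), reverse=True)
    for asof, knowledge, value in ordered:
        if asof == asof_ticks and knowledge < knowledge_before:
            return value
    return None

rows = [
    (100, 100, 10.0),  # original observation
    (100, 150, 10.2),  # correction recorded later for the same as-of time
    (200, 200, 11.0),
]
before_correction = cross_section_lookup(rows, 100, 120)
after_correction = cross_section_lookup(rows, 100, 160)
```

Querying with an earlier knowledge bound reproduces what a reader would have seen before the correction arrived.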

Page 12: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

A Service, not an app

[Architecture diagram. Labels: C*, Olympus (several instances), App (ten instances), Fat Client, Olympus Thrift Service.]

Page 13: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Complex Value Types
• Not every value is a double
• Some values belong together (Bid and Ask should always come back together)
• Thrift structures as values
• Typed, extensible schema
• Union types give us a way to deserialize any type
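
The union idea can be sketched without Thrift: a leading type tag tells the reader how to decode the rest of the blob. This is a stand-in built on Python's `struct` module, not the Thrift encoding the talk describes; the tag values and function names are hypothetical.

```python
import struct

# Hypothetical type tags; a real Thrift union carries field ids instead.
TAG_DOUBLE = 1
TAG_QUOTE = 2  # bid and ask stored together in one value

def encode_double(x):
    return struct.pack(">Bd", TAG_DOUBLE, x)

def encode_quote(bid, ask):
    return struct.pack(">Bdd", TAG_QUOTE, bid, ask)

def decode(blob):
    """Decode any stored value by dispatching on its leading tag byte."""
    tag = blob[0]
    if tag == TAG_DOUBLE:
        (x,) = struct.unpack_from(">d", blob, 1)
        return ("double", x)
    if tag == TAG_QUOTE:
        bid, ask = struct.unpack_from(">dd", blob, 1)
        return ("quote", bid, ask)
    raise ValueError(f"unknown tag {tag}")

last_price = decode(encode_double(1.5))
quote = decode(encode_quote(10.0, 10.25))
```

Because bid and ask live in one blob, a single read always returns them together.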

Page 14: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Ad-hoc querying UI

Page 15: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

But that's the easy part...

(cue transition)

Page 16: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Scaling...
The first rule of scaling is you do not just turn everything to 11.

Page 17: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Scaling...
Step 1 - Fast Machines for your workload
Step 2 - Avoid Java GC for your workload
Step 3 - Tune Cassandra for your workload
Step 4 - Prefetch and cache for your workload

Page 18: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Can't fix what you can't measure
• Riemann (http://riemann.io)
• Easily push application and system metrics into a single system
• We push 6k metrics per second to a single Riemann instance

Page 19: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Metrics: Riemann
Yammer Metrics with Riemann

https://gist.github.com/carlyeks/5199090

Page 20: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Metrics: Riemann
• Push stream based metrics library
• Riemann Dash for "Why is it slow?"
• Graphite for "Why was it slow?"

Page 21: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

VisualVM: The greatest tool EVER
• Many useful plugins...
• Just start jstatd on each server and go!

Page 22: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Scaling Reads: Machines
• SSDs for hot data
• JBOD config
• As many cores as possible (> 16)
• 10GbE network
• Bonded network cards
• Jumbo frames

Page 23: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

JBOD is a lifesaver
• SSDs are great until they aren't anymore
• JBOD allowed passive recovery in the face of simultaneous disk failures (SSDs had bad firmware)

Page 24: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Scaling Reads: Cassandra
Changes we've made:
• Configuration
• Compaction
• Compression

Page 25: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Leveled Compaction
• Wide rows mean data can be spread across a huge number of SSTables
• Leveled Compaction puts a bound on the worst case (*)
• Fewer SSTables to read means lower latency

[Figure: SSTables arranged in levels L0 through L5; the SSTables that get read are shown in orange.]

* In Theory
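
A minimal sketch of why leveling bounds the read cost. It assumes the usual leveled layout, where SSTables within L1 and above cover disjoint key ranges while L0 SSTables may overlap each other; the tuple layout and key ranges are hypothetical.

```python
def sstables_to_read(sstables, key):
    """Return the SSTables whose key range may contain `key`.

    sstables: list of (level, first_key, last_key).
    In L1 and above ranges do not overlap, so at most one table per
    level matches; in L0 any number of tables can match.
    """
    return [s for s in sstables if s[1] <= key <= s[2]]

tables = [
    (0, "a", "z"), (0, "c", "m"),  # L0: overlapping ranges
    (1, "a", "f"), (1, "g", "p"),  # L1: disjoint ranges
    (2, "a", "c"), (2, "d", "h"),  # L2: disjoint ranges
]
hits = sstables_to_read(tables, "e")
levels_hit = sorted(level for level, _, _ in hits)
```

Here both L0 tables must be consulted, but only one table in each higher level. When writes outpace compaction and L0 grows, the number of L0 candidates grows with it.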

Page 26: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Leveled Compaction: Breaking Bad
Under high write load, forced to read all of the L0 files

[Figure: levels L0 through L5.]

Page 27: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Hybrid Compaction: Breaking Better
• Size Tiering Level 0
• On by default in 2.0

[Figure: levels L0 through L5, labeled Hybrid Compaction: Size Tiered and Leveled.]
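
Size tiering groups SSTables of similar size and compacts a group once it is large enough. The sketch below is loosely modeled on that idea only; the bucket bounds, the minimum group size, and the function name are assumed parameters for illustration, not Cassandra's implementation.

```python
def size_tiered_buckets(sizes, low=0.5, high=1.5):
    """Group SSTable sizes into buckets of similar size.

    A size joins the first bucket whose current average it lies within
    [low * avg, high * avg]; otherwise it starts a new bucket.
    """
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets

# Hypothetical L0 SSTable sizes in MB.
l0_sizes = [10, 11, 12, 9, 100, 110, 95, 105, 400]
buckets = size_tiered_buckets(l0_sizes)
MIN_THRESHOLD = 4  # assumed minimum bucket size before compacting
compactable = [b for b in buckets if len(b) >= MIN_THRESHOLD]
```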

Page 28: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Overlapping Compaction
• Instead of forcing a combination of L0 files with L1, we can just push up files
• This allows a higher level of concurrency in compactions
• We still know the SSTables that might contain the keys
• We can force a proper compaction at any configurable level

[Figure: levels L0 through L5.]

Page 29: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

C optimized library
• Read path needs to be fast for our workload
• CRC check, composite comparison eat a lot of cycles
• CRC is implemented on chip for some architectures (why not use it?)
• We want to move some of the operations into a JNI library to reduce latency and improve throughput
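
For readers unfamiliar with the CRC check on the read path, here is a minimal sketch of the general pattern: store a checksum next to a block and verify it on every read. It uses `zlib.crc32` from the Python standard library and a made-up framing (payload followed by a 4-byte checksum); it does not reproduce Cassandra's on-disk format.

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 to the payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(4, "big")

def verify(framed: bytes) -> bytes:
    """Recompute the CRC on read and return the payload if it matches."""
    payload, stored = framed[:-4], int.from_bytes(framed[-4:], "big")
    if zlib.crc32(payload) & 0xFFFFFFFF != stored:
        raise ValueError("checksum mismatch")
    return payload

block = frame(b"compressed chunk bytes")
payload = verify(block)
```

Since the check runs on every block read, its per-byte cost adds directly to read latency.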

Page 30: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Current Stats
• 16 nodes
• 2 Data Centers
• Replication Factor 6
• 200k Writes/sec at EACH_QUORUM
• 150k Reads/sec at LOCAL_QUORUM
• > 30 Million time series
• > 15 Billion points
• 10 TB on disk (compressed)
• Read Latency 50%/95% is 1ms/5ms
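
As a worked example of what these consistency levels require, the sketch below computes quorum sizes. It assumes the replication factor of 6 is a total split evenly across the two data centers (3 replicas each); the slide does not state the split.

```python
def quorum(replicas: int) -> int:
    """A quorum is a majority of replicas: floor(n / 2) + 1."""
    return replicas // 2 + 1

rf_total = 6
data_centers = 2
rf_per_dc = rf_total // data_centers  # assumed even split: 3 per data center

local_quorum_acks = quorum(rf_per_dc)                 # replicas that must answer a LOCAL_QUORUM read
each_quorum_acks = quorum(rf_per_dc) * data_centers   # replicas that must ack an EACH_QUORUM write
```

Under that assumption a read waits on 2 replicas in the local data center, while a write waits on 2 replicas in each data center, 4 in total.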

Page 31: C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Questions?
Thank you!
@tjake and @carlyeks