
Observer: a "real life" time-series application

Kévin Lovato - @alprema

Index
• Observer introduction
• Architecture overview
• CQL schema
• Feedback
  – Schema
  – Read/Write access
• Numbers

Observer introduction

Key features

• Publish metrics from anywhere

• Track & investigate business issues

• Alert users in case of unusual behavior

• Integrate with the existing infrastructure features

Architecture overview

[Diagram slides; components and data flow:]

• Publishers send raw metrics to the Aggregators
• The Aggregators write aggregated metrics (sec, min, hour) to C*
• The WebDashboard loads metrics data from C* and serves clients over HTTP
• Clients receive live metrics data through bus push (WebSocket)
• The DataCruncher loads and computes all metrics for the day, then writes the daily computations (avg, percentiles, etc.) back to C*
• The Alertor catches up from C* on startup, receives live metrics data through the bus, and sends alerts on the bus
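All code sketches in this transcript are illustrative Python (using the DataStax Python driver where a driver is needed); the talk does not say what Observer's components are written in. As a first example, the Aggregator step above can be pictured as folding raw points into per-second buckets before they are flushed to C*; the in-memory approach and the write_row callback are assumptions, not the production code:

    from collections import defaultdict
    from datetime import datetime

    buckets = defaultdict(list)  # (metric_id, second) -> raw values

    def on_raw_metric(metric_id: str, value: float, utc_date: datetime) -> None:
        # Truncate the timestamp to the second; minute and hour aggregation
        # work the same way with coarser truncation.
        second = utc_date.replace(microsecond=0)
        buckets[(metric_id, second)].append(value)

    def flush(write_row) -> None:
        # write_row is a hypothetical callback that inserts one
        # (metric_id, utc_date, value) row into the Metric_OneSec table.
        for (metric_id, second), values in buckets.items():
            write_row(metric_id, second, sum(values) / len(values))
        buckets.clear()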

CQL schema

Metric_OneSec
• Schema: ((MetricId, Day), UtcDate), Value
• Wide-row layout: [MetricId + Day] → UtcDate: Value, UtcDate: Value, …
• TTL: 8 days
• Max columns per row: 86 400 (one per second of the day)
• Average size: 1.4 MB
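A possible CQL rendering of this layout, as a sketch: the identifiers and types are read off the slide, not the actual DDL, and the keyspace name is invented. Metric_OneMin below follows the same pattern with FirstDayOfWeek as the bucket and a 60-day TTL, Metric_OneHour with no bucket at all:

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("observer")  # keyspace name assumed

    # (MetricId, Day) is the composite partition key, so each metric gets one
    # row per day; UtcDate is the clustering column, hence at most 86 400
    # columns per row (one per second of the day).
    session.execute("""
        CREATE TABLE IF NOT EXISTS metric_onesec (
            metric_id text,
            day       timestamp,
            utc_date  timestamp,
            value     double,
            PRIMARY KEY ((metric_id, day), utc_date)
        ) WITH default_time_to_live = 691200  -- 8 days, per the slide
    """)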

Metric_OneMin
• Schema: ((MetricId, FirstDayOfWeek), UtcDate), Value
• Wide-row layout: [MetricId + FirstDayOfWeek] → UtcDate: Value, UtcDate: Value, …
• TTL: 60 days
• Max columns per row: 10 080 (one per minute of the week)
• Average size: 300 KB

Metric_OneHour
• Schema: (MetricId, UtcDate), Value
• Wide-row layout: [MetricId] → UtcDate: Value, UtcDate: Value, …
• TTL: 10 years
• Average size: 45 KB

Daily_Aggregate
• Schema: (MetricId, Date), Average, Count, Percentiles, …
• Wide-row layout: [MetricId] → Date.Average, Date.Count, …
• No TTL
• Average size: 23 KB
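Unlike the wide time-series tables, Daily_Aggregate carries one set of aggregate columns per (metric, day). A sketch with assumed names and types, using a map<> for the percentiles in the spirit of the collection-types advice below:

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("observer")

    # No TTL: daily aggregates are kept forever.
    session.execute("""
        CREATE TABLE IF NOT EXISTS daily_aggregate (
            metric_id   text,
            date        timestamp,
            average     double,
            count       bigint,
            percentiles map<double, double>,  -- e.g. {0.99: 42.0}
            PRIMARY KEY (metric_id, date)
        )
    """)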

Feedback – Schema

Row sizing
• Avoid having rows spanning over long periods
• Avoid large amounts of data per row (< 100 MB is good)
• Make buckets using another component (e.g. Day, FirstDayOfWeek); see the sketch below
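The bucketing advice boils down to deriving an extra partition-key component from the point's timestamp, so a partition never outgrows its day or week. A minimal sketch (helper names are mine, not the talk's):

    from datetime import datetime, timedelta

    def day_bucket(utc_date: datetime) -> datetime:
        # Partition component for Metric_OneSec: one partition per metric per day.
        return datetime(utc_date.year, utc_date.month, utc_date.day)

    def first_day_of_week_bucket(utc_date: datetime) -> datetime:
        # Partition component for Metric_OneMin: one partition per metric per week.
        return day_bucket(utc_date) - timedelta(days=utc_date.weekday())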

TTLs
• Don’t use them if you don’t really need them (extra space wasted)
• Make sure to set them right the first time (or you will need to reinsert your data)
• Consider changing gc_grace_seconds for your CF (tombstones are useless for TTLed time series); see the sketch below
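In CQL the two knobs look like this; the TTL value matches Metric_OneSec, while the gc_grace_seconds value is an assumption (the slide only says to consider lowering it):

    from datetime import datetime
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("observer")
    now = datetime.utcnow()
    day = datetime(now.year, now.month, now.day)

    # Per-write TTL (8 days); the TTL is baked into each cell, so getting it
    # wrong means re-inserting the data.
    session.execute(
        "INSERT INTO metric_onesec (metric_id, day, utc_date, value) "
        "VALUES (%s, %s, %s, %s) USING TTL 691200",
        ("latency.p99", day, now, 12.5),
    )

    # Expired cells become tombstones; in a time series nobody ever deletes
    # from, a short grace period lets compaction drop them sooner.
    session.execute("ALTER TABLE metric_onesec WITH gc_grace_seconds = 3600")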

General best practices
• Consider disabling inter-DC read repair on your CF (read_repair_chance); see the sketch below
• Use collection types (map<>, etc.)
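Disabling inter-DC read repair is a table option on the Cassandra 2.x line this talk targets (the option was removed in 4.0). Zeroing the global chance while keeping a small local-DC chance is one common interpretation, not necessarily the team's exact settings:

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("observer")

    # read_repair_chance may trigger repairs against remote DCs on reads;
    # dclocal_read_repair_chance stays within the coordinator's DC.
    session.execute("""
        ALTER TABLE metric_onesec
        WITH read_repair_chance = 0.0
        AND dclocal_read_repair_chance = 0.1
    """)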

Feedback – Read / Write

Obvious but…
• Avoid Thrift (it can take down your cluster on huge row reads)
• Do not disable paging (same effect as using Thrift)
• Use prepared statements; see the sketch below
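A sketch of the last two points with the DataStax Python driver (illustrative; the talk does not name the team's driver): prepare once, execute many, and leave paging on with a bounded fetch size:

    from datetime import datetime
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("observer")
    session.default_fetch_size = 5000  # paging stays on: rows arrive in pages

    # Prepared once, parsed server-side, then reused for every execution.
    select_day = session.prepare(
        "SELECT utc_date, value FROM metric_onesec "
        "WHERE metric_id = ? AND day = ?"
    )

    for row in session.execute(select_day, ("latency.p99", datetime(2016, 1, 1))):
        print(row.utc_date, row.value)  # pages fetched transparently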

Batches
• Warning: not intended for performance
• But… they can improve insert performance under adequate conditions
• Use small (< 5 KB) "unlogged" batches (see the sketch below)
• Benchmark with your own use case
• Don’t tell @PatrickMcFadin you did it
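A small unlogged batch along those lines (a sketch; grouping the writes into a single partition is the usual "adequate condition", though the slide leaves it unspecified):

    from datetime import datetime, timedelta
    from cassandra.cluster import Cluster
    from cassandra.query import BatchStatement, BatchType

    session = Cluster(["127.0.0.1"]).connect("observer")
    insert = session.prepare(
        "INSERT INTO metric_onesec (metric_id, day, utc_date, value) "
        "VALUES (?, ?, ?, ?)"
    )

    day = datetime(2016, 1, 1)
    points = [(day + timedelta(seconds=i), float(i)) for i in range(60)]

    # UNLOGGED skips the batch log: no atomicity guarantee, less overhead.
    # Same metric and same day -> every statement hits one partition.
    batch = BatchStatement(batch_type=BatchType.UNLOGGED)
    for utc_date, value in points:
        batch.add(insert, ("latency.p99", day, utc_date, value))
    session.execute(batch)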

Asynchronous queries
• Mandatory if you want to be fast (for anything over 1 query)

[Diagram slide: synchronous vs. asynchronous query timelines]

• For massive reads, send your queries in bunches and wait for them (see the sketch below)
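Firing a bunch of queries and waiting for them, with the DataStax Python driver (the bunch size here is an illustration; keep bunches bounded so a massive read does not overload the cluster):

    from datetime import datetime, timedelta
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("observer")
    select_day = session.prepare(
        "SELECT utc_date, value FROM metric_onesec "
        "WHERE metric_id = ? AND day = ?"
    )

    days = [datetime(2016, 1, 1) + timedelta(days=i) for i in range(8)]

    # The requests overlap on the wire instead of paying one round-trip each.
    futures = [session.execute_async(select_day, ("latency.p99", d)) for d in days]
    results = [f.result() for f in futures]  # blocks until each one completes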

General best practices
• Benchmark all heavy operations in terms of cluster load (a faster implementation might just be killing the cluster for everyone else)
• Watch out for CL: ONE (we experienced slowdowns as the coordinator asked a different DC under heavy load); see the sketch below
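One standard way around the cross-DC surprise is to pin reads to the local data center with LOCAL_ONE; the slide does not say this is the fix the team applied, so treat it as an option, not their conclusion:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("observer")
    select_day = session.prepare(
        "SELECT utc_date, value FROM metric_onesec "
        "WHERE metric_id = ? AND day = ?"
    )
    # Plain ONE lets the coordinator pick a replica in any DC under load;
    # LOCAL_ONE never leaves the coordinator's data center.
    select_day.consistency_level = ConsistencyLevel.LOCAL_ONE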

Numbers time

• Total number of metrics: 17K

• Metrics inserted: 10K/s

• Data points daily aggregation speed: 500K/s

• DC size: 3 nodes (spinning disks)

Future
• Use DTCS (maybe TWCS? CASSANDRA-9666 / CASSANDRA-10195); see the sketch below
• Move to SSDs everywhere
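For reference, switching a table to the TWCS mentioned above looks like this once the strategy is available (CASSANDRA-9666); the one-day window matching the Metric_OneSec day buckets is an assumption:

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("observer")

    # Time-window compaction groups SSTables by time window, so whole windows
    # of TTL-expired data can be dropped at once.
    session.execute("""
        ALTER TABLE metric_onesec
        WITH compaction = {
            'class': 'TimeWindowCompactionStrategy',
            'compaction_window_unit': 'DAYS',
            'compaction_window_size': 1
        }
    """)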

Interested? We’re hiring

Questions?

Image credits – The Noun Project • Björn Andersson • Creative Stall • Gregor Cresnar • Justin Blake • Lemon Liu • Mark Shorter • Shawn Schmidt • Stéphanie Rusch
