Timeseries data in Riak - Riak meetup Stockholm 1/11/2012

Posted on 08-May-2015


Metrics with Riak: A retrospective

Martin Törnwall

Metrics?

Many definitions, but here's ours...

Recording things that change over time

So we can visualize it and search for patterns

OS: CPU, network, memory and disk usage, ...

Application: number of requests, errors, events, ...

External events: text messages or emails sent, customer service calls, ...

What is a Metric?

● A named variable: "sys.mem.free"
● With tags: "host=sl075", "code=403", ...

avg("sys.mem.free") from 1 hour ago where host="sl075"
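The `avg(...)` query is written in an informal pseudo-syntax. The following minimal in-memory sketch shows what it computes. It assumes that each sample carries a metric name, an Epoch-second timestamp, an integer value and a tag dictionary. The `Sample` type and the `avg_since` function are hypothetical and are not Metyr's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sample:
    metric: str                      # named variable, e.g. "sys.mem.free"
    time: int                        # Epoch seconds
    value: int                       # integers only, per the design decisions
    tags: Dict[str, str] = field(default_factory=dict)


def avg_since(samples: List[Sample], metric: str, now: int, window: int,
              **tag_filter: str) -> Optional[float]:
    """Average `metric` over [now - window, now] for samples whose tags match."""
    values = [
        s.value for s in samples
        if s.metric == metric
        and now - window <= s.time <= now
        and all(s.tags.get(k) == v for k, v in tag_filter.items())
    ]
    return sum(values) / len(values) if values else None


# avg("sys.mem.free") from 1 hour ago where host="sl075"
now = 1_351_764_000  # arbitrary Epoch time for the example
samples = [
    Sample("sys.mem.free", now - 4000, 100, {"host": "sl075"}),  # older than 1 hour
    Sample("sys.mem.free", now - 1800, 300, {"host": "sl075"}),
    Sample("sys.mem.free", now - 60, 500, {"host": "sl075"}),
    Sample("sys.mem.free", now - 60, 900, {"host": "sl076"}),    # different host
]
print(avg_since(samples, "sys.mem.free", now, 3600, host="sl075"))  # 400.0
```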

Going Technical

Why not have distributed metrics?

We have distributed services

Solutions exist, but they rely on technology stacks we had no experience with (e.g., HBase)

Reinventing the wheel?

Just how hard can it be?

I mean, really...

Introducing Metyr

Our weekend hack, er, glorious metrics storage and processing software

Design Decisions

● Use familiar tools: Erlang, Riak, HTTP
● Not a critical service, but ...
● ... avoid SPOF (single point of failure)
● Write performance >> read performance
● Centralized reference clock
● Integer only
● Avoid 2i (Riak secondary indexes) if possible
● When in doubt, leave it to Riak

In Theory...

[Diagram: several Clients talk to several Metyr nodes, which all store data in one shared Riak cluster]

Storing metrics in Riak

No SQL, no schemas, no indices (?), no aggregate operations

Attempt 1

The naïve way just never works...

Make each sample an object

A bucket per metric; index by Epoch time

The Good™

Atomicity, write-once, fast range queries

The Bad

Slow, large overhead, requires 2i
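The following minimal in-memory sketch illustrates Attempt 1. It makes three substitutions, all assumptions:

- a dict per metric stands in for a Riak bucket;
- a string timestamp stands in for the object key;
- a sorted list stands in for a 2i range index on the timestamp.

`PerSampleStore` is hypothetical and does not use any Riak client library.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


class PerSampleStore:
    """Simulation of Attempt 1: one 'bucket' per metric, one object per sample.

    `index` plays the role of a secondary index (2i) on the Epoch timestamp.
    """

    def __init__(self) -> None:
        self.buckets: Dict[str, Dict[str, int]] = defaultdict(dict)  # metric -> key -> value
        self.index: Dict[str, List[int]] = defaultdict(list)          # metric -> sorted times

    def put(self, metric: str, time: int, value: int) -> None:
        key = str(time)                    # object key derived from Epoch time
        if key in self.buckets[metric]:
            return                         # write-once: this sketch ignores rewrites
        self.buckets[metric][key] = value  # one small object per sample
        bisect.insort(self.index[metric], time)

    def range(self, metric: str, start: int, end: int) -> List[Tuple[int, int]]:
        """Inclusive time-range query answered from the index, like a 2i range query."""
        ts = self.index[metric]
        lo = bisect.bisect_left(ts, start)
        hi = bisect.bisect_right(ts, end)
        return [(t, self.buckets[metric][str(t)]) for t in ts[lo:hi]]


store = PerSampleStore()
for t, v in [(100, 1), (160, 2), (220, 3)]:
    store.put("sys.mem.free", t, v)
print(store.range("sys.mem.free", 150, 220))  # [(160, 2), (220, 3)]
```

In real Riak, each of these tiny objects also carries its own metadata and is replicated. That is one plausible reading of the "slow, large overhead" verdict.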

Attempt 2

Combine samples into chunks by time

Key Points

● One bucket per metric as before
● Split into hour-sized chunks (configurable)
● Chunk key: Epoch time
● Chunk value: List of samples
● To read: Fetch chunks within interval
● To write: Fetch chunk, add sample, write back
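A minimal sketch of the chunk-key arithmetic follows. It assumes that chunks are aligned to multiples of the chunk length counted from the Epoch; the slides only say that the key is an Epoch time. The function names are hypothetical.

```python
from typing import List

CHUNK_SECONDS = 3600  # hour-sized chunks; configurable according to the talk


def chunk_key(time: int, chunk_seconds: int = CHUNK_SECONDS) -> int:
    """Key of the chunk holding `time`: the Epoch time at which that chunk starts."""
    return time - time % chunk_seconds


def chunk_keys_for_interval(start: int, end: int,
                            chunk_seconds: int = CHUNK_SECONDS) -> List[int]:
    """Keys of every chunk that must be fetched to read samples in [start, end]."""
    return list(range(chunk_key(start, chunk_seconds), end + 1, chunk_seconds))


print(chunk_key(7300))                      # 7200
print(chunk_keys_for_interval(3500, 7300))  # [0, 3600, 7200]
```

Under this scheme, reading a day of data costs 24 or 25 chunk fetches, depending on alignment. Under Attempt 1, the same read costs one object fetch per sample.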

Chunk Anatomy

One sample: Time0 (64 bits) | Value0 (64 bits) | Tags0 ...

A chunk: Time0 Value0 Tags0 ... TimeN ValueN TagsN ...
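The chunk layout fixes only the 64-bit time and value fields. The slide does not say how tags are stored.

The following minimal encode/decode sketch makes two assumptions, neither of which comes from the talk:

- big-endian signed integers;
- a hypothetical tag format: a 16-bit length followed by comma-separated `k=v` pairs in UTF-8.

```python
import struct
from typing import Dict, List, Tuple

# 64-bit time, 64-bit value, then a 16-bit tag-blob length (the length field is an assumption).
HEADER = struct.Struct(">qqH")

Sample = Tuple[int, int, Dict[str, str]]  # (time, value, tags)


def encode_chunk(samples: List[Sample]) -> bytes:
    out = bytearray()
    for time, value, tags in samples:
        # Hypothetical tag encoding: sorted "k=v" pairs joined by commas.
        blob = ",".join(f"{k}={v}" for k, v in sorted(tags.items())).encode("utf-8")
        out += HEADER.pack(time, value, len(blob))
        out += blob
    return bytes(out)


def decode_chunk(data: bytes) -> List[Sample]:
    samples: List[Sample] = []
    offset = 0
    while offset < len(data):
        time, value, n = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        blob = data[offset:offset + n].decode("utf-8")
        offset += n
        tags = dict(pair.split("=", 1) for pair in blob.split(",")) if blob else {}
        samples.append((time, value, tags))
    return samples


chunk = [(1351764000, 42, {"host": "sl075"}), (1351764010, 40, {})]
data = encode_chunk(chunk)
print(len(data))                   # 46: two 18-byte headers plus the 10-byte "host=sl075"
print(decode_chunk(data) == chunk)  # True
```

With this assumed format, each sample costs 18 bytes plus its tags. Tag keys or values containing `,` or `=` would need escaping.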

Writing just got harder

Slower, since we must fetch a chunk first; potential race conditions, ...
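The race in the fetch-add-write cycle can be shown in a few lines. The sketch below uses a dict as a last-write-wins stand-in for one bucket.

It then shows one common mitigation, merging conflicting versions by taking their union. That mitigation is not described in the talk. It assumes the store keeps both versions instead of dropping one, as Riak can when a bucket has `allow_mult=true` and returns the versions as siblings. `fetch`, `write` and `merge_siblings` are hypothetical names.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, int]  # (Epoch time, value); tags omitted for brevity

# Stand-in for one bucket: chunk key -> list of samples.
store: Dict[int, List[Sample]] = {}


def fetch(key: int) -> List[Sample]:
    return list(store.get(key, []))  # a copy, like a value fetched over the network


def write(key: int, samples: List[Sample]) -> None:
    store[key] = samples             # last write wins


# Two writers append to the same chunk concurrently.
a = fetch(3600)          # writer A reads []
b = fetch(3600)          # writer B also reads [] before A writes back
a.append((3601, 10))
b.append((3602, 20))
write(3600, a)
write(3600, b)           # B overwrites A: sample (3601, 10) is lost
print(store[3600])       # [(3602, 20)]


def merge_siblings(siblings: List[List[Sample]]) -> List[Sample]:
    """Resolve conflicting versions of a chunk by set union, sorted by time.

    Identical (time, value) pairs collapse into one, which is acceptable only
    if such duplicates are not meaningful.
    """
    return sorted(set().union(*siblings))


print(merge_siblings([a, b]))  # [(3601, 10), (3602, 20)]
```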

(Arbitrary) Goal: Write 1K samples/sec

Tests showed that the solution described so far was inadequate.

Buffer them writes

Keep per-metric write buffers, flushed every 10 seconds or so
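A minimal sketch of a per-metric write buffer follows. Samples are grouped by chunk, so each touched chunk goes through one fetch-add-write cycle per flush rather than one per sample.

`WriteBuffer` and its `write_chunk` callback are hypothetical. The injectable `clock` exists only so that the example can be run deterministically.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Sample = Tuple[int, int]  # (Epoch time, value); tags omitted for brevity


class WriteBuffer:
    """Per-metric write buffer, flushed once `flush_interval` seconds have passed.

    `write_chunk(metric, chunk_key, samples)` is a hypothetical callback that would
    do one fetch-add-write cycle against Riak for a whole batch.
    Not thread-safe: a real server would need a lock or a single owning process.
    """

    def __init__(self, write_chunk: Callable[[str, int, List[Sample]], None],
                 flush_interval: float = 10.0, chunk_seconds: int = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.write_chunk = write_chunk
        self.flush_interval = flush_interval
        self.chunk_seconds = chunk_seconds
        self.clock = clock
        self.pending: Dict[str, List[Sample]] = defaultdict(list)
        self.last_flush = clock()

    def add(self, metric: str, sample: Sample) -> None:
        self.pending[metric].append(sample)
        # The flush deadline is checked on each add; a real server would also use a timer.
        if self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self) -> None:
        for metric, samples in self.pending.items():
            # Group buffered samples by chunk so each chunk is written back once per flush.
            by_chunk: Dict[int, List[Sample]] = defaultdict(list)
            for t, v in samples:
                by_chunk[t - t % self.chunk_seconds].append((t, v))
            for key, batch in sorted(by_chunk.items()):
                self.write_chunk(metric, key, batch)
        self.pending.clear()
        self.last_flush = self.clock()


fake_now = [0.0]
calls = []
buf = WriteBuffer(lambda m, k, s: calls.append((m, k, len(s))),
                  clock=lambda: fake_now[0])
for t in range(3590, 3610):          # 20 samples straddling the chunk boundary at 3600
    buf.add("sys.mem.free", (t, 1))  # nothing is written yet
fake_now[0] = 10.0
buf.add("sys.mem.free", (3610, 1))   # 10 seconds have passed: this add triggers a flush
print(calls)  # [('sys.mem.free', 0, 10), ('sys.mem.free', 3600, 11)]
```

Consider a hypothetical deployment with 100 metrics sharing the 1K samples/sec goal. Without buffering, that is 1,000 fetch-add-write cycles per second. With a 10-second flush, it is roughly 100 cycles every 10 seconds.

The price is that unflushed samples are lost if a node dies, which fits "not a critical service". Two Metyr nodes buffering the same metric can still race on write-back.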

Some Remaining Issues

● Race condition on write
● Storage requirements
● Downsampling of old data
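A back-of-the-envelope storage estimate, using only the fixed 16 bytes per sample: at 1K samples/sec that is 16,000 bytes/s. Over a day this comes to 16,000 × 86,400 = 1,382,400,000 bytes, about 1.38 GB. That figure excludes tags and Riak's per-object overhead. With Riak's default replication factor (`n_val` = 3), it becomes roughly 4.1 GB.

Downsampling old data is one way to cap that growth. The talk leaves open how Metyr would do it. The following minimal sketch averages samples into fixed windows; `downsample` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, int]  # (Epoch time, integer value)


def downsample(samples: List[Sample], window: int) -> List[Sample]:
    """Replace raw samples with one averaged sample per `window` seconds.

    The average is rounded down so values stay integers, in line with the
    integer-only design decision. Each output sample is stamped with the
    start of its window.
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for t, v in samples:
        groups[t - t % window].append(v)
    return [(start, sum(vs) // len(vs)) for start, vs in sorted(groups.items())]


raw = [(0, 10), (20, 20), (40, 31), (60, 5), (119, 7)]
print(downsample(raw, 60))  # [(0, 20), (60, 6)]
```

Downsampling a once-per-second metric to one-minute averages this way cuts its sample count, and roughly its storage, by a factor of 60.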

Thank you!
