using riak for events storage and analysis at...

119
Using Riak for Events storage and analysis at Booking.com Damien Krotkine

Upload: others

Post on 25-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

Using Riak for Events storage and analysis at Booking.com

Damien Krotkine

Page 2: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

Damien Krotkine

• Software Engineer at Booking.com

• github.com/dams

• @damsieboy

• dkrotkine

Page 3: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 4: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

KEY FIGURES

• 600,000 hotels

• 212 countries

• 800,000 room nights every 24 hours

• guest reviews 43 million+

• offices worldwide 155+

• 8,600 people

• not a small website…

Page 5: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

INTRODUCTION

Page 6: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 7: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 8: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 9: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 10: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

www APImobi

Page 11: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

www APImobi

front

end

Page 12: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

www APIfro

nten

dba

cken

dmobi

events storage

events: info about subsystems status

Page 13: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

back

end

web mobi api

databases

caches

load balancersavailability

cluster

email

etc…

Page 14: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WHAT IS AN EVENT ?

Page 15: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

EVENT STRUCTURE

• Provides info about subsystems

• Data

• Deep HashMap

• Timestamp

• Type + Subtype

• The rest: specific data

• Schema-less

Page 16: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

{ timestamp => 12345, type => 'WEB', subtype => 'app', dc => 1, action => { is_normal_user => 1, pageview_id => '188a362744c301c2', # ... }, tuning => { the_request => 'GET /display/...' bytes_body => 35, wallclock => 111, nr_warnings => 0, # ... }, # ... }

Page 17: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

{ type => 'FAV', subtype => 'fav', timestamp => 1401262979, dc => 1, tuning => { flatav => { cluster => '205', sum_latencies => 21, role => 'fav', num_queries => 7 } } }

Page 18: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

EVENTS FLOW PROPERTIES

• Read-only

• Schema-less

• Continuous, sequential, timed

• 15 K events per sec

• 1.25 Billion events per day

• peak at 70 MB/s, min 25MB/s

• 100 GB per hour

Page 19: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

USAGE

Page 20: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

ASSESS THE NEEDS

• Before thinking about storage

• Think about the usage

Page 21: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

USAGE

1. GRAPHS 2. DECISION MAKING 3. SHORT TERM ANALYSIS 4. A/B TESTING

Page 22: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

GRAPHS

• Graph in real-time ( few seconds lag )

• Graph as many systems as possible

• General platform health check

Page 23: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

GRAPHS

Page 24: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

GRAPHS

Page 25: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DASHBOARDS

Page 26: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

META GRAPHS

Page 27: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

USAGE

1. GRAPHS 2. DECISION MAKING 3. SHORT TERM ANALYSIS 4. A/B TESTING

Page 28: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DECISION MAKING

• Strategic decision ( use facts )

• Long term or short term

• Technical / Non technical Reporting

Page 29: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

USAGE

1. GRAPHS 2. DECISION MAKING 3. SHORT TERM ANALYSIS 4. A/B TESTING

Page 30: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

SHORT TERM ANALYSIS

• From 10 sec ago -> 8 days ago

• Code deployment checks and rollback

• Anomaly Detector

Page 31: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

USAGE

1. GRAPHS 2. DECISION MAKING 3. SHORT TERM ANALYSIS 4. A/B TESTING

Page 32: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

A/B TESTING

• Our core philosophy: use facts

• It means: do A/B testing

• Concept of Experiments

• Events provide data to compare

Page 33: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

EVENT AGGREGATION

Page 34: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

EVENT AGGREGATION

• Group events

• Granularity we need: second

Page 35: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

SERIALIZATION

• JSON didn’t work for us (slow, big, lack features)

• Created Sereal in 2012

• « Sereal, a new, binary data serialization format that provides high-performance, schema-less serialization »

• https://github.com/Sereal/Sereal

Page 36: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

eventeventevents storage

eventevent

even

t

eventevent

eventevent

event

even

t

event event

Page 37: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

e ee e

ee

e e

e ee

e eee

ee

e

LOGGER

e

e

Page 38: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web apieeeeee

eee

e ee e

ee

e e

e ee

e eee

ee

ee

e

Page 39: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web api dbseeeeeeee

eeeeeeeee

eeeeeee

e ee e

ee

e e

e ee

e eee

ee

ee

e

Page 40: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web api dbseeeeeeee

eeeeeeeee

eeeeeee

e ee e

ee

e e

e ee

e eee

ee

e

1 sec

e

e

Page 41: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web api dbseeeeeeee

eeeeeeeee

eeeeeee

e ee e

ee

e e

e ee

e eee

ee

e

1 sec

e

e

Page 42: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web api dbseeeee

eeeee

eeeee

1 sec

events storage

Page 43: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web api dbs

ee

eeeee

eeeee

1 sec

events storage

ee e reserialize + compress

Page 44: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

events storage

LOGGER …LOGGER LOGGER

Page 45: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

STORAGE

Page 46: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WHAT WE WANT

• Storage security

• Mass write performance

• Mass read performance

• Easy administration

• Very scalable

Page 47: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WE CHOSE RIAK

• Security: cluster, distributed, very robust

• Good and predictable read / write performance

• The easiest to setup and administrate

• Advanced features (MapReduce, triggers, 2i, CRDTs …)

• Riak Search

• Multi Datacenter Replication

Page 48: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 49: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

CLUSTER

• Commodity hardware • All nodes serve data • Data replication

• Gossip between nodes • No master • distributed system

Ring of servers

Page 50: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 51: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

hash(key)

Page 52: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

KEY VALUE STORE

• Namespaces: bucket

• Values: opaque or CRDTs

Page 53: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

RIAK: ADVANCED FEATURES

• MapReduce

• Secondary indexes (2i)

• Riak Search

• Multi DataCenter Replication

Page 54: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

MULTI-BACKEND FOR STORAGE

• Bitcask

• Eleveldb

• Memory

Page 55: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

BACKEND: BITCASK

• Log-based storage backend

• Append-only files (AOF files)

• Advanced expiration

• Predictable performance (1 disk-seek max)

• Perfect for sequential data

Page 56: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

CLUSTER CONFIGURATION

Page 57: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DISK SPACE NEEDED

• 8 days

• 100 GB per hour

• Replication 3

• 100 * 24 * 8 * 3

• Need 60 T

Page 58: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

HARDWARE

• 12 then 16 nodes (soon 24)

• 12 CPU cores ( Xeon 2.5Ghz)

• 192 GB RAM

• network 1 Gbit/s

• 8 TB (raid 6)

• Cluster total space: 128 TB

Page 59: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DATA DESIGN

Page 60: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web api dbs

ee

eeeee

eeeee

1 sec

events storage

1 blob per EPOCH / DC / CELL / TYPE / SUBTYPE 500 KB max chunks

Page 61: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DATA

• Bucket name: “data“

• Key: “12345:1:cell0:WEB:app:chunk0“

• Value: List of events (Hashmaps), serialized & compressed

• 200 keys per seconds

Page 62: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

METADATA

• Bucket name: “metadata“

• Key: <epoch>-<dc> “1428415043-2“

• Value: list of data keys:[ “1428415043:1:cell0:WEB:app:chunk0“, “1428415043:1:cell0:WEB:app:chunk1“ … “1428415043:4:cell0:EMK::chunk3“ ]

• As pipe separated value (PSV)

Page 63: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WRITE DATA

Page 64: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

PUSH DATA IN

• In each DC, in each cell, Loggers push to Riak

• Use ProtoBuf

• Every seconds:

• Push data values to Riak, async

• Wait for success

• Push metadata

Page 65: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

JAVA

Bucket DataBucket = riakClient.fetchBucket("data").execute(); DataBucket.store("12345:1:cell0:WEB:app:chunk0", Data1).execute(); DataBucket.store("12345:1:cell0:WEB:app:chunk1", Data2).execute(); DataBucket.store("12345:1:cell0:WEB:app:chunk2", Data3).execute();

Bucket MetaDataBucket = riakClient.fetchBucket("metadata").execute(); MetaDataBucket.store("12345-1", metaData).execute(); riakClient.shutdown();

Page 66: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

Perl

my $client = Riak::Client->new(…);

$client->put(data => '12345:1:cell0:WEB:app:chunk0', $data1); $client->put(data => '12345:1:cell0:WEB:app:chunk1', $data2); $client->put(data => '12345:1:cell0:WEB:app:chunk2', $data3);

$client->put(metadata => '12345-1', $metadata, 'text/plain' );

Page 67: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

READ DATA

Page 68: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

READ ONE SECOND

• For one second (a given epoch)

• Request metadata for <epoch>-DC

• Parse value

• Filter out unwanted types / subtypes

• Fetch the keys from the “data” bucket

Page 69: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

Perl

my $client = Riak::Client->new(…); my @array = split '\|', $client->get(metadata => '1428415043-1'); @filtered_array = grep { /WEB/ } @array; $client->get(data => $_) foreach @filtered_array;

Page 70: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

READ ONE SECOND

• For an interval epoch1 -> epoch2

• Generate the list of epochs

• Fetch in parallel

Page 71: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

RIAK CLUSTER

Page 72: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

CPU

Page 73: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DISK IOPS

Page 74: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DISK IO %

Page 75: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DISK SPACE RECLAIMED

one day

Page 76: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

REAL TIME PROCESSING OUTSIDE OF RIAK

Page 77: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

STREAMING

• Fetch 1 second every second

• Or a range ( ex: last 10 min )

• Fetch all epochs from Riak

• Use data on the client side

Page 78: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

EXAMPLES

• Streaming => Graphite ( every sec )

• Streaming => Anomaly Detector ( last 2 min )

• Streaming => Experiment analysis ( last day )

• Every minute => Hadoop

• Manual request => test, debug, investigate

• Batch fetch => ad hoc analysis

• => Huge numbers of read requests

Page 79: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

events storage

graphite cluster

Anomaly detector

experimentcluster

hadoop cluster

mysql analysis

manual requests

50 MB/s

50 MB/s

50 M

B/s 50 MB/s

50 MB/s50 MB/s

Page 80: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THIS IS REALTIME

• 1 second of data

• Stored in < 1 sec

• Available after < 1 sec

• Issue : network saturation

Page 81: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

REAL TIME PROCESSING INSIDE RIAK

Page 82: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE IDEA

• Instead of

• Fetch data,

• Crunch data (ex: average),

• Produce a small result

• Do

• Bring code to data

• Crunch data on Riak

• Fetch the result

Page 83: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WHAT TAKES TIME

• Takes a lot of time

• Fetching data out: network issue

• Decompressing: CPU time issue

• Takes almost no time

• Crunching data

Page 84: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

MAPREDUCE

• Input: epoch-dc

• Map1: metadata keys => data keys

• Map2: data crunching

• Reduce: aggregate

• Realtime: OK

• network usage: OK

• CPU time: NOT OK

Page 85: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

HOOKS

• Every time metadata is written

• Post-Commit hook triggered

• Crunch data on the nodes

Page 86: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 87: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

Riak post-commit hook

REST serviceRIAK service

key keysocket

result sent for storage

decompressprocess all tasks

NODE HOST

Page 88: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

HOOK CODE

metadata_stored_hook(RiakObject) -> Key = riak_object:key(RiakObject), Bucket = riak_object:bucket(RiakObject), [ Epoch, DC ] = binary:split(Key, <<"-">>), MetaData = riak_object:get_value(RiakObject), DataKeys = binary:split(MetaData, <<"|">>, [ global ]), send_to_REST(Epoch, Hostname, DataKeys), ok.

Page 89: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

send_to_REST(Epoch, Hostname, DataKeys) -> Method = post, URL = "http://" ++ binary_to_list(Hostname) ++ ":5000?epoch=" ++ binary_to_list(Epoch), HTTPOptions = [ { timeout, 4000 } ], Options = [ { body_format, string }, { sync, false }, { receiver, fun(ReplyInfo) -> ok end } ], Body = iolist_to_binary(mochijson2:encode( DataKeys )), httpc:request(Method, {URL, [], "application/json", Body}, HTTPOptions, Options), ok.

Page 90: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

REST SERVICE

• In Perl, using PSGI (WSGI-like), Starman, preforks

• Allow to write data cruncher in Perl

• Also supports loading code on demand

Page 91: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

ADVANTAGES

• CPU usage and execution time can be capped

• Data is local to processing

• Two systems are decoupled

• REST service written in any language

• Data processing done all at once

• Data is decompressed only once

Page 92: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DISADVANTAGES

• Only for incoming data (streaming), not old data

• Can’t easily use cross-second data

• What if the companion service goes down ?

Page 93: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

FUTURE

• Use this companion to generate optional small values

• Use Riak Search to index and search those

Page 94: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE BANDWIDTH PROBLEM

Page 95: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• PUT - bad case

• n_val = 3

• inside usage = 3 x outside usage

Page 96: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• PUT - good case

• n_val = 3

• inside usage = 2 x outside usage

Page 97: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• GET - bad case

• inside usage = 3 x outside usage

Page 98: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• GET - good case

• inside usage = 2 x outside usage

Page 99: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• network usage ( PUT and GET ): • 3 x 13/16+ 2 x 3/16= 2.81 • plus gossip • inside network > 3 x outside network

Page 100: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• Usually it’s not a problem • But in our case: • big values, constant PUTs, lots of GETs • sadly, only 1 Gbit/s

• => network bandwidth issue

Page 101: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE BANDWIDTH SOLUTIONS

Page 102: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE BANDWIDTH SOLUTIONS

1. Optimize GET for network usage, not speed 2. Don’t choose a node at random

Page 103: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• GET - bad case

• n_val = 1

• inside usage = 1 x outside

Page 104: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• GET - good case

• n_val = 1

• inside usage = 0 x outside

Page 105: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WARNING

• Possible only because data is read-only

• Data has internal checksum

• No conflict possible

• Corruption detected

Page 106: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

RESULT

• practical network usage reduced by 2 !

Page 107: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE BANDWIDTH SOLUTIONS

1. Optimize GET for network usage, not speed 2. Don’t choose a node at random

Page 108: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• bucket = “metadata” • key = “12345”

Page 109: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• bucket = “metadata”

• key = “12345”

Hash = hashFunction(bucket + key)

RingStatus = getRingStatus

PrimaryNodes = Fun(Hash, RingStatus)

Page 110: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

hashFunction()

getRingStatus()

Page 111: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

hashFunction()

getRingStatus()

Page 112: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE IDEA

• Do the hashing on the client • By default:

• “chash_keyfun”:{"mod":"riak_core_util", "fun":"chash_std_keyfun"},

• look at the source on github • riak_core/src/chash.erl

Page 113: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE HASH FUNCTION

• Easy to re-implement client-side

• sha1 as a BigInt

• example: sha1(bucket + key) = 2450

Page 114: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

RING STATUS

• If hash = 2450 => nodes 16, 17, 18

$ curl -s -XGET http://$host:8098/riak_preflists/myring{ “0" :"[email protected]" “5708990770823839524233143877797980545530986496” :"[email protected]" “11417981541647679048466287755595961091061972992”:"[email protected]" “17126972312471518572699431633393941636592959488":“[email protected]” …etc…}

Page 115: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WARNING

• Possible only if • Nodes list is monitored • In case of failed node, default to random • Data is requested in an uniform way

Page 116: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

RESULT

• Network usage even more reduced ! • Especially for GETs

Page 117: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

CONCLUSION

Page 118: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

CONCLUSION

• We used only Riak Open Source

• No training, self-taught, small team

• Riak is a great solution

• Robust, fast, scalable, easy

• Very flexible and hackable

• Helps us continue scaling

Page 119: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

Q&A@damsieboy