Migrating to Riak at Shareaholic

Riak @ Shareaholic

Robby Grossman

robby@shareaholic.com

@freerobby

Agenda

Shareaholic: Product & Tech

Why Riak: The Search for a Big Data Store

Transitioning to Riak

Riak Use Cases

Deploying to EC2

What’s Shareaholic?

Browser Tools

Sharing Buttons

Recommendations

Social Analytics

Monthly @ Shareaholic

Thousands of developers hitting API

Hundreds of thousands of publishers

Tens of millions of shares & clicks

Hundreds of millions of pageviews & events

Tech @ Shareaholic

JRuby on Rails (via Torquebox)

MySQL (Master, Read Slave)

Elastic MapReduce (similar to Hadoop)

Formerly Mongo, Now Riak

Why Not Mongo?

Working set needs to fit in memory

Global write lock blocks all queries despite not having transactions/joins

Standbys not “hot”

Why Riak?

Next @ Shareaholic

Options:

HBase

Cassandra

Riak

Goals:

Linear scalability

Full-text search

Flexible indexing

Easier Devops

HBase

Pros

Battle tested

High performance

Cons

Complex architecture

Requires Hive for Indexing/Querying

Expensive to deploy at small scale

Cassandra

Pros

Native secondary indices

Linear scalability

Tunable CAP

Cons

Known users all domain experts

Search requires Lucene

Heavyweight MapReduce

Riak

Pros

Operationally simpler

Linear scalability

Integrated search

Secondary indices

Tunable CAP

Vector clocks solve time-sync problems

Cons

Multi-data center replication requires Enterprise product

leveldb puts high strain on CPU

From Mongo to Riak

Migration Goals

No time where database goes “offline”

Product parity throughout migration

Migration Process

1. App writes to Mongo and Riak

2. Verify data integrity

3. Import historical data

4. App reads from Riak

5. Decommission Mongo
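The five steps above amount to a dual-write pattern with a read switch. A minimal sketch, assuming two in-memory dictionaries stand in for the Mongo and Riak clients; the class name `DualWriteStore` and its methods are hypothetical and not taken from Shareaholic's code:

```python
class DualWriteStore:
    """Hypothetical wrapper; dicts stand in for the Mongo and Riak clients."""

    def __init__(self, old_store, new_store):
        self.old = old_store          # stand-in for Mongo
        self.new = new_store          # stand-in for Riak
        self.read_from_new = False    # flipped in step 4

    def put(self, key, doc):
        # Step 1: every write goes to both stores.
        self.old[key] = doc
        self.new[key] = doc

    def mismatched_keys(self):
        # Step 2: keys in the new store whose value differs in the old one.
        return [k for k, v in self.new.items() if self.old.get(k) != v]

    def backfill(self):
        # Step 3: copy historical documents that predate dual writes.
        for key, doc in self.old.items():
            self.new.setdefault(key, doc)

    def get(self, key):
        # Step 4: reads move to the new store once it has been verified.
        store = self.new if self.read_from_new else self.old
        return store.get(key)


mongo = {"link:1": {"url": "http://example.com/a"}}   # historical data
riak = {}
store = DualWriteStore(mongo, riak)
store.put("link:2", {"url": "http://example.com/b"})
assert store.mismatched_keys() == []
store.backfill()
store.read_from_new = True
print(store.get("link:1"))
```

Because the application keeps writing to both stores until step 5, the read switch can be flipped back if verification fails, which is what keeps the database from ever going "offline".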

Use Cases

Share API

Save shared content

Uses MapReduce to populate user dashboard

Recommendations

Sets of related pages

Generated on-demand

Publisher Analytics

Generated nightly via Hadoop

Typical stored “document” (JSON)

80 KB-1 MB

Riak Successes

MapReduce

Handy for querying

Runs at “web page speed”.

Easy to re-reduce for complex queries

Easy to test via curl

Tunable CAP @ Shareaholic

Replication: primary/secondary authority

Read failure tolerance: speed/consistency

Write failure tolerance
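Riak exposes these trade-offs through replica counts: `n_val` (N, the number of replicas stored), `r` (replicas that must answer a read) and `w` (replicas that must acknowledge a write). The arithmetic can be sketched as below. This is the textbook strict-quorum rule rather than a description of Riak internals; the helper names are illustrative and the values are not Shareaholic's settings:

```python
def quorums_overlap(n, r, w):
    """True when every read quorum shares at least one replica with every write quorum."""
    return r + w > n


def tolerated_failures(n, quorum):
    """Replicas that can be unavailable while a quorum of this size can still be met."""
    return n - quorum


n = 3
for r, w in [(1, 1), (2, 2), (3, 1)]:
    print(f"N={n} R={r} W={w}: overlap={quorums_overlap(n, r, w)}, "
          f"reads tolerate {tolerated_failures(n, r)} down, "
          f"writes tolerate {tolerated_failures(n, w)} down")
```

Lower `r` and `w` favor speed and failure tolerance; higher values favor consistency.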

Full Text Search

Built on Lucene

Make user content searchable

Make arbitrary keys queryable

“Just turn it on”

Hiccup: corrupt merge indexes

Query Example

# Input: links shared within a 60-second period
# Riak.reduceMin: [[2],[5],[9],[13]] => [[2]]
curl -XPOST http://localhost:8098/mapred \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": {
      "bucket": "links",
      "query": "timestamp:[1346350877 TO 1346350937}"
    },
    "query": [
      {"map": {"language": "javascript",
               "source": "function(riakObject) { return [[Riak.mapValuesJson(riakObject)[0].user_id]]; }"}},
      {"reduce": {"language": "javascript", "name": "Riak.reduceMin"}}
    ]
  }'

Who’s our oldest user who’s shared something in the last minute?

[[2197]]
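A request body like the one in the query example can also be generated from code, which keeps the annotations in the host language (JSON itself has no comment syntax). A sketch using Python's standard `json` module; the helper name `build_mapred_request` is hypothetical, the range syntax is copied from the slide unchanged, and nothing is sent to a server:

```python
import json

MAP_SOURCE = (
    "function(riakObject) { "
    "return [[Riak.mapValuesJson(riakObject)[0].user_id]]; }"
)


def build_mapred_request(bucket, start_ts, end_ts):
    """Return the JSON body for a search-fed MapReduce job (hypothetical helper)."""
    job = {
        "inputs": {
            "bucket": bucket,
            # Range expression copied from the slide's query syntax.
            "query": f"timestamp:[{start_ts} TO {end_ts}}}",
        },
        "query": [
            # Map: emit the user_id of each matching document.
            {"map": {"language": "javascript", "source": MAP_SOURCE}},
            # Reduce: keep the smallest user_id, i.e. the oldest user.
            {"reduce": {"language": "javascript", "name": "Riak.reduceMin"}},
        ],
    }
    return json.dumps(job)


body = build_mapred_request("links", 1346350877, 1346350877 + 60)
print(body)
```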

Riak on EC2

In a Nutshell

EC2 specs poorly proportioned for leveldb

Multiple AZs in one location works well

Scale vertically for better latency & consistency

Scale horizontally for more throughput/$

Benchmarks

Top Graph: c1.medium (1.7G, 5 CPU)

Middle: m1.large (7.5G, 4 CPU)

Bottom: cc1.4xlarge (23G, 33.5 CPU)

Throughput

Latency (Typical)

Latency (Worst Case)

Calculations

c1.medium (1.7G, 5 CPU): 1758 IOPS/$-hr; worst 1% of queries: 300ms/800ms

m1.large (7.5G, 4 CPU): 1167 IOPS/$-hr; worst 1% of queries: 110ms/200ms

cc1.4xlarge (23G, 33.5 CPU): 872 IOPS/$-hr; worst 1% of queries: 47ms/139ms
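The trade-off in these figures is easier to see as ratios. A small sketch using only the numbers on the Calculations slide; the slide does not say what the two latency values in each pair measure, so they are carried through unlabeled:

```python
# Figures copied from the Calculations slide.
results = {
    "c1.medium":   {"iops_per_dollar_hr": 1758, "worst_1pct_ms": (300, 800)},
    "m1.large":    {"iops_per_dollar_hr": 1167, "worst_1pct_ms": (110, 200)},
    "cc1.4xlarge": {"iops_per_dollar_hr": 872,  "worst_1pct_ms": (47, 139)},
}


def relative_to(baseline, metric):
    """Express one metric for every instance type as a multiple of the baseline's."""
    base = results[baseline][metric]
    return {name: r[metric] / base for name, r in results.items()}


cost_eff = relative_to("cc1.4xlarge", "iops_per_dollar_hr")
for name, ratio in cost_eff.items():
    first, second = results[name]["worst_1pct_ms"]
    print(f"{name}: {ratio:.2f}x the IOPS/$-hr of cc1.4xlarge, "
          f"worst-1% latency {first}/{second} ms")
```

The smallest instance delivers roughly twice the throughput per dollar of the largest, while the largest has several times lower worst-case latency.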

Benchmark Takeaways

You can’t go “by spec”

IO is limiting factor

RAM never the limiting factor for 1% of keyspace to be in memory

Fin. Questions?

Thanks:

Tom Santero

Justin Sheehy

Ryan Zezeski

Reid Draper

#freenode riak crew

We’re Hiring!

Robby Grossman

robby@shareaholic.com

@freerobby
