Riak Search Performance Wins

How we got > 100x improvement in query throughput

Gary Flake, Founder

gary@clipboard.com

Introduction

Architecture

[Diagram: server layout. Hosts shown: riak-01 to riak-05; web-01 (Node.js + Nginx); cache-01 to cache-03; redis-01, redis-02; thumb-01, thumb-02; job-01, job-02; admin-01.]

Riak is an awesome NoSQL data store:

• Super easy to scale up AND down

• Fault tolerant – no SPoF

• Flexible schema

• Full-text search out of the box

• Can be fixed and improved in Erlang (the Basho folks awesomely take our commits)

Riak – Basics

• Data in Riak is grouped into buckets (effectively namespaces)

• Basic operations are: get, save, delete, search, map, reduce

• Eventual consistency managed through N, R, and W bucket parameters.

• Everything we put in Riak is JSON

• We talk to Riak through the excellent riak-js node library by Francisco Treacy
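How N, R, and W interact can be illustrated with a small check. This is a sketch of the general quorum rule (with N replicas, a read of R replicas must overlap a write to W replicas whenever R + W > N), not Riak's own code; the function name is hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when any R replicas must share a member with any W replicas."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    # Pigeonhole: two subsets of N replicas with sizes R and W must intersect
    # when their sizes add up to more than N.
    return r + w > n


print(quorums_overlap(3, 2, 2))
print(quorums_overlap(3, 1, 1))
```

Lower R and W trade that overlap guarantee for latency, which is where eventual consistency comes in.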

Data Model – Clips

[Diagram labels: annotation, author, domain, mentions]

Data Model - Clips

Clips are the gateway to all of our data

[Diagram: the Clip with key 'abc', an HTML blob (<html> … </html>) also stored under key 'abc', and a Comment Cache holding the comments on clip 'abc': “F1rst”, “Nice clip yo!”, “Saw this on Reddit…”]

Other Buckets

• Users

• Blobs

• Comments

• Templates

• Counts

• Search Caches

• Transactions

Riak Search

• Gets many things out of Riak by something other than the primary key.

• You specify a schema (the types for the fields within a JSON object).

• Works great but with one big gotcha:

– Index uses term-based partitioning instead of document-based partitioning

– Implication: joins + sort + pagination suck

– We know how to work around this

Riak Search – Querying

• Query syntax based on Lucene

• Basic Query

text:funny

• Compound Query

login:greg OR (login:gary AND tags:riak)

• Range Query

ctime:[98685879630026 TO 98686484430026]

Clipboard App Flow

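Participants: Client, node.js, Riak

1. Client → node.js: go to clipboard.com/home

2. node.js → Riak: search clips bucket, query = login:greg

3. Riak → node.js: top 20 results

4. node.js → Client: top 20 results

5. Client: start rendering

6. (For each clip) Client → node.js: API request for blob

7. node.js → Riak: GET from blobs bucket

8. node.js → Client: return blob to client

9. Client: render blob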
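The flow above can be sketched with in-memory stand-ins. Everything here is hypothetical: the two dictionaries stand in for the clips and blobs buckets, and the function names, keys, and `blob_key` field are made up for illustration rather than taken from Clipboard's code.

```python
# Hypothetical in-memory stand-ins for the clips and blobs buckets.
clips = {
    "k3": {"login": "greg", "blob_key": "b3"},
    "k1": {"login": "greg", "blob_key": "b1"},
    "k2": {"login": "bob", "blob_key": "b2"},
}
blobs = {
    "b1": "<html>one</html>",
    "b2": "<html>two</html>",
    "b3": "<html>three</html>",
}


def search_clips(login, count=20):
    """Search the clips bucket for login:<login> and return the top `count` keys."""
    keys = sorted(key for key, clip in clips.items() if clip["login"] == login)
    return keys[:count]


def get_blob(clip_key):
    """One blob request per clip, answered with a GET from the blobs bucket."""
    return blobs[clips[clip_key]["blob_key"]]


def render_home(login):
    """Fetch the result list first, then one blob per clip, as in the flow."""
    return [get_blob(key) for key in search_clips(login)]


print(render_home("greg"))
```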

Clipboard Queries

(Search)

login:greg

mentions:greg

ctime:[98685879630026 TO 98686484430026]

Clipboard Queries cont.

login:greg AND tags:riak

login:greg AND text:node AND text:javascript

login:greg AND private:false

login:greg AND text:iPhone

Matches 20% of all clips!

Matches only my clips

Index Partitioning Schemes

Doc Partition Query Processing

1. x AND y (sort z, start = 990, count = 10)

2. On each node:

   1. Perform x AND y

   2. Sort on z

   3. Slice [ 0 .. 1000 ]

   4. Send to aggregator

3. On aggregator:

   1. Merge all results (N x 1000)

   2. Slice [ 990 .. 1000 ]
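A minimal simulation of document-partitioned processing, assuming each node is just a list of documents held in memory. The function and parameter names are hypothetical; the point is that each node ships at most start + count results, however many documents match.

```python
import heapq


def doc_partition_query(nodes, matches, sort_key, start, count):
    """Each node filters, sorts, and slices locally; the aggregator merges."""
    limit = start + count
    per_node = [
        sorted((doc for doc in docs if matches(doc)), key=sort_key)[:limit]
        for docs in nodes
    ]
    # The aggregator merges N already-sorted lists of at most `limit` items each.
    merged = list(heapq.merge(*per_node, key=sort_key))
    return merged[start:limit]


# 3000 documents spread over 3 nodes; y holds for even z only.
docs = [{"x": True, "y": z % 2 == 0, "z": z} for z in range(3000)]
nodes = [docs[i::3] for i in range(3)]
page = doc_partition_query(
    nodes, lambda d: d["x"] and d["y"], lambda d: d["z"], start=990, count=10
)
print([d["z"] for d in page])
```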

Term Partition Query Processing

1. x AND y (sort z, start = 990, count = 10)

2. On x node: search for x (and send all)

3. On y node: search for y (and send all)

4. On aggregator:

   1. Do x AND y

   2. Sort on z

   3. Slice to [ 990 .. 1000 ]
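For contrast, here is the same query under term partitioning, again as a hedged in-memory sketch with hypothetical names. The index maps each term to the full posting list held by that term's node, and the function also reports how many postings had to be shipped to the aggregator.

```python
def term_partition_query(index, term_x, term_y, sort_value, start, count):
    """The x and y nodes send everything; the aggregator does all the work."""
    postings_x = index[term_x]  # x node sends its full posting list
    postings_y = index[term_y]  # y node sends its full posting list
    both = set(postings_x) & set(postings_y)  # aggregator performs the AND
    ordered = sorted(both, key=sort_value)    # sort value needed for every match
    shipped = len(postings_x) + len(postings_y)
    return ordered[start:start + count], shipped


# Document keys double as their sort value in this toy example.
index = {"x": list(range(3000)), "y": list(range(0, 3000, 2))}
page, shipped = term_partition_query(index, "x", "y", lambda key: key, 990, 10)
print(page, shipped)
```

In this toy data set 4500 postings cross the network to produce a page of 10, and the count grows with the size of the posting lists rather than with the page size.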

Riak Search Issues

1. For any single term, all results must be sent back to the aggregator.

2. Incorrectly performs sort and slice (does slice then sort)

3. ANDs take time O(MAX(|x|, |y|)) instead of O(MIN(|x|, |y|)).

4. All matches must be read to get sort field.
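The cost gap in issue 3 can be illustrated by counting steps for two ways of intersecting posting lists. This is a sketch of the general idea, not Riak Search's actual merge code; it assumes the larger list is already available as a set for membership tests.

```python
def and_scan_both(xs, ys):
    """Merge-intersect two sorted posting lists, walking both of them."""
    i = j = steps = 0
    out = []
    while i < len(xs) and j < len(ys):
        steps += 1
        if xs[i] == ys[j]:
            out.append(xs[i])
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            i += 1
        else:
            j += 1
    return out, steps


def and_probe_smaller(small, large_lookup):
    """Probe each entry of the small list against the large one."""
    out = [key for key in small if key in large_lookup]
    return out, len(small)


large = list(range(100_000))
small = [10, 50_000, 99_999]
scan_result, scan_steps = and_scan_both(small, large)
probe_result, probe_steps = and_probe_smaller(small, set(large))
print(scan_steps, probe_steps)
```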

Riak Search Fixes

1. Inline fields for short and common attributes.

2. Dynamic fields for precomputed ANDs.

3. PRESORT option for sorting without document reads.

Inline Fields

Nifty feature added recently to Riak Search

Fields only used to prune result set can be made inline for a big perf win

Normal query applied first – then results filtered quickly with inline “filter” query

High storage cost – only viable for small fields!

Riak Search – Inline Fields cont.

login:greg AND private:false

becomes

Query: login:greg

Filter Query: private:false

private:false is efficiently applied only to results of login:greg. Hooray!
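A rough model of why the filter is cheap: if each posting carries its inline values, the filter runs over the results of the main query only and never touches the posting list for private:false. The storage layout below (a dictionary per posting) is an assumption made for illustration; the slides do not describe how Riak Search stores inline fields.

```python
def query_with_filter(index, query_term, filter_field, filter_value):
    """Run the main query, then filter its results using inline values only."""
    postings = index.get(query_term, [])
    return [
        key for key, inline in postings
        if inline.get(filter_field) == filter_value
    ]


# Hypothetical index: term -> list of (doc key, inline field values).
index = {
    "login:greg": [
        ("k1", {"private": "false"}),
        ("k2", {"private": "true"}),
        ("k3", {"private": "false"}),
    ],
}
print(query_with_filter(index, "login:greg", "private", "false"))
```

Copying the inline values into every posting is also what makes the storage cost high, so the approach only suits small fields.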

Fixing ANDs

But what about login:greg AND text:iPhone?

text field is too large to inline!

We had to get creative.

Dynamic Fields

Our Solution: Create a new field - text_u

(u for user)

The field name has the user’s name appended, so text_u becomes text_greg, text_bob, and so on

In greg’s clip

text:iPhone text_greg:iPhone

In bob’s clip

text:iPhone text_bob:iPhone
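At index time this amounts to emitting each text term twice, once under text and once under a per-user field. A minimal sketch, assuming a clip is a dictionary with login and text keys and that terms are split on whitespace (both assumptions; the function name is hypothetical):

```python
def index_fields(clip):
    """Return the (field, term) pairs to index for one clip."""
    user = clip["login"]
    pairs = []
    for word in clip["text"].split():
        pairs.append(("text", word))           # normal full-text field
        pairs.append((f"text_{user}", word))   # dynamic per-user field
    return pairs


print(index_fields({"login": "greg", "text": "iPhone"}))
print(index_fields({"login": "bob", "text": "iPhone"}))
```

The posting list for text_greg:iPhone then contains only greg's clips, so the AND with login:greg is already built into the index.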

Presort on Keys

• Our addition to Riak code base.

• Does sort before slice

• If PRESORT=key, then never reads the docs

• Tremendous win (> 100x compared to M/R approaches)
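The saving from sorting on keys can be sketched by counting document reads. This is an illustration under assumptions, not the patch itself: keys are strings whose lexical order equals the desired sort order, and the store is a plain dictionary.

```python
def page_by_field(store, keys, field, start, count):
    """Sort on a document field: every matching document must be read."""
    decorated = [(store[key][field], key) for key in keys]  # one read per match
    page = [key for _, key in sorted(decorated)[start:start + count]]
    return page, len(keys)


def page_by_key(keys, start, count):
    """Sort on the key itself, then slice: no document is read."""
    return sorted(keys)[start:start + count], 0


# Hypothetical keys that embed a zero-padded timestamp.
store = {f"{t:04d}-x": {"ctime": t} for t in range(1000)}
matching = list(reversed(list(store)))
print(page_by_field(store, matching, "ctime", 990, 10)[1])
print(page_by_key(matching, 990, 10)[1])
```

Both calls return the same page; the first reads 1000 documents to do so and the second reads none.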

Clip Keys

• Base-64 encode each component

• Only use first 4 characters of user & content

• Only 16 bytes

Collisions? 1 in 17M if clipped the same thing at same time.
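One way such a key could be built is sketched below. The layout is an assumption, not the authors' scheme: 8 characters of creation time followed by 4 characters each for user and content, which adds up to the 16 bytes mentioned above. The alphabet is also an assumption; it is ASCII-ordered so that keys sort by time, which the standard Base64 alphabet would not guarantee. Four base-64 characters carry 24 bits, and 64^4 = 16,777,216, which is consistent with the quoted "1 in 17M".

```python
import hashlib

# ASCII-ordered 64-character alphabet (an assumption), so that encoded strings
# sort in the same order as the numbers they encode.
ALPHABET = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"


def encode_fixed(value: int, width: int) -> str:
    """Encode a non-negative integer as exactly `width` base-64 digits."""
    if not 0 <= value < 64 ** width:
        raise ValueError("value does not fit in the requested width")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 64)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def short_hash(text: str) -> str:
    """First 4 base-64 digits (24 bits) of a SHA-1 hash of `text`."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return encode_fixed(int.from_bytes(digest[:3], "big"), 4)


def clip_key(ctime: int, user: str, content: str) -> str:
    """Hypothetical layout: 8 chars of time + 4 of user + 4 of content."""
    return encode_fixed(ctime, 8) + short_hash(user) + short_hash(content)


older = clip_key(98685879630026, "greg", "<html>a</html>")
newer = clip_key(98686484430026, "bob", "<html>b</html>")
print(len(older), older < newer)
```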

Our Query Processing

1. w AND (x AND y) (sort z, start = 990, count = 10)

2. On w_x node: search and send all w_x

3. On w_y node: search and send all w_y

4. On aggregator:

   1. Do w_x AND w_y

   2. Sort on z

   3. Slice to [ 990 .. 1000 ]
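At query time, the w AND (x AND y) form is rewritten so that w is folded into the field name. A minimal sketch, assuming w is a login and x and y are text terms; the function name is hypothetical.

```python
def rewrite_for_user(login, text_terms):
    """Turn login:<u> AND text:<a> AND text:<b> into text_<u>:<a> AND text_<u>:<b>."""
    return " AND ".join(f"text_{login}:{term}" for term in text_terms)


print(rewrite_for_user("greg", ["node", "javascript"]))
```

Each rewritten term now hits a posting list that holds only that user's clips, so the lists sent to the aggregator are small even when the plain text term is common.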

Summary

• Use inline fields for short and common bits

• Use dynamic fields for prebuilt ANDs

• Use keys that imply sort order

• Use same techniques for pagination

• Our approach yields search throughput that is 100x better than out of the box (and better as you scale outward).

Questions?

We’re hiring!

www.clipboard.com/register

Invitation Code: just4u

www.clipboard.com/jobs

Or talk to us right now!

Thanks!
