Transcript
Page 1: Introduction to Riak - Joel Jacobson

basho

Core Concepts

Introduction to Riak

basho AKQA

24th July 2013

Friday, 26 July 13

Page 2: Introduction to Riak - Joel Jacobson

WHO AM I?

Joel Jacobson

Technical Evangelist

Basho Technologies

@joeljacobson

Page 3: Introduction to Riak - Joel Jacobson

Distributed computing is

HARD.

Page 4: Introduction to Riak - Joel Jacobson

PROBLEMS?

• Concurrency and latency at scale

• Data consistency

• Uptime/failover

• Multi-tenancy

• SLAs

Page 5: Introduction to Riak - Joel Jacobson

WHAT IS RIAK?

• Key-Value store + extras

• Distributed and horizontally scalable

• Fault-tolerant

• Highly available

• Built for the web

Page 6: Introduction to Riak - Joel Jacobson

INSPIRED BY AMAZON DYNAMO

• Amazon released a white paper describing the database system used for its shopping cart

• Masterless, peer-coordinated replication

• Dynamo-inspired data stores: Riak, Cassandra, Voldemort, etc.

• Consistent hashing - no sharding :-)

• Eventually consistent
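
A minimal sketch of the consistent hashing idea above, not Riak's actual implementation. It assumes a 160-bit SHA-1 hash space split into 64 equal partitions; the way the bucket and key are joined before hashing and the round-robin `owner_of` assignment are hypothetical simplifications.

```python
import hashlib

RING_SIZE = 64          # number of partitions (assumed for this sketch)
HASH_SPACE = 2 ** 160   # SHA-1 produces 160-bit values


def partition_for(bucket, key, ring_size=RING_SIZE):
    """Map a bucket/key pair to a partition index on the ring."""
    # Joining bucket and key with "/" is an illustrative choice only.
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (HASH_SPACE // ring_size)


def owner_of(partition, nodes):
    """Assign partitions to nodes round-robin (hypothetical claim strategy)."""
    return nodes[partition % len(nodes)]
```

Because the partition depends only on the hash of the key, any node can compute where a key lives without consulting a master or a manually maintained shard map.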

Page 7: Introduction to Riak - Joel Jacobson

RIAK KEY-VALUE STORE

• Simple operations - GET, PUT, DELETE

• Value is opaque, with metadata

• Extras, e.g.

• Secondary Indexes (2i)

• MapReduce

• Full text search
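
To make the GET/PUT/DELETE model concrete, here is a toy in-memory store. It is only a sketch: `ToyKV` and `StoredObject` are hypothetical names, and nothing here talks to a real Riak node. The point is that the value is stored as opaque bytes while metadata travels alongside it.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    value: bytes                                   # opaque to the store
    metadata: dict = field(default_factory=dict)   # e.g. content type


class ToyKV:
    """In-memory stand-in for a bucket/key addressed key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, bucket, key, value, **metadata):
        self._data[(bucket, key)] = StoredObject(value, dict(metadata))

    def get(self, bucket, key):
        # Returns None when the key is absent.
        return self._data.get((bucket, key))

    def delete(self, bucket, key):
        # Returns True if something was removed.
        return self._data.pop((bucket, key), None) is not None
```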

Page 8: Introduction to Riak - Joel Jacobson

HORIZONTALLY SCALABLE

• Near linear scalability

• Query load and data are spread evenly

• Add more nodes and get more:

• ops/second

• storage capacity

• compute power (for Map/Reduce)

Page 9: Introduction to Riak - Joel Jacobson

FAULT TOLERANT

• All nodes participate equally - no single point of failure (SPOF)

• All data is replicated

• Clusters self heal - Handoff, Active Anti-Entropy

• Cluster transparently survives...

• node failure

• network partitions

• Built on Erlang/OTP (designed for fault tolerance)

Page 10: Introduction to Riak - Joel Jacobson

HIGHLY AVAILABLE

• Any node can serve client requests

• Fallbacks are used when nodes are down

• Always accepts read and write requests

• Per-request quorums

Page 11: Introduction to Riak - Joel Jacobson

QUORUMS - N/R/W

• Tunable down to bucket level

• n_val = 3 by default

• w / r = 2 by default

• w = 1 - Quicker response time, but reads could be inconsistent in the short term

• w = all - Slower response, increased data consistency
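
The trade-off between r, w and n_val can be checked with a little arithmetic: a read is guaranteed to touch at least one replica that acknowledged the latest write only when r + w > n_val. The helper names below are hypothetical, and this is a sketch of the reasoning rather than Riak code.

```python
def resolve_quorum(value, n_val):
    """Turn a symbolic quorum setting into a replica count."""
    if value == "one":
        return 1
    if value == "quorum":
        return n_val // 2 + 1   # majority of replicas
    if value == "all":
        return n_val
    return int(value)


def read_sees_latest_write(n_val, r, w):
    """True when every read quorum overlaps every write quorum."""
    return r + w > n_val
```

With the defaults from the slide (n_val = 3, r = w = 2) the quorums overlap; with w = 1 and r = 2 they may not, which is where the short-term inconsistency comes from.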

Page 12: Introduction to Riak - Joel Jacobson

CAP THEOREM

• C = Consistency

• A = Availability

• P = Partition Tolerance

• The CAP theorem states that a distributed shared-data system can support at most two of these three properties

[Diagram: clients, three DB nodes, and a network/data partition]

Page 13: Introduction to Riak - Joel Jacobson

THE RING

Page 14: Introduction to Riak - Joel Jacobson

REPLICATION

• Replicated to 3 nodes by default (n_val = 3, which is configurable)
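
One way to picture replication on the ring: once a key hashes to a partition, its replicas go to that partition and the ones that follow it, wrapping around at the end. This is a simplified sketch; the function name and the ring size of 64 are assumptions.

```python
def preference_list(partition, ring_size=64, n_val=3):
    """Return the n_val consecutive partitions that hold a key's replicas."""
    return [(partition + i) % ring_size for i in range(n_val)]
```

For example, a key that lands in partition 62 of a 64-partition ring would be stored in partitions 62, 63 and 0.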

Page 15: Introduction to Riak - Joel Jacobson

DISASTER SCENARIO

• Node fails

• Request goes to fallback

• Node comes back

• Handoff - data returned to recovered node

• Normal operations resume automatically
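
The fail, fallback, handoff sequence can be simulated in a few lines. This is an illustrative model only: the `Node` class, the `hints` structure and the way a fallback is picked are all hypothetical and far simpler than Riak's real handoff.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # keys this node owns
        self.hints = {}   # owner name -> {key: value} held for a down node


def put(nodes, primaries, key, value):
    """Write to each primary, or to a fallback when a primary is down."""
    for owner in primaries:
        if owner.up:
            owner.data[key] = value
        else:
            # First healthy non-primary node acts as the fallback.
            # Raises StopIteration if no such node exists.
            fallback = next(n for n in nodes if n.up and n not in primaries)
            fallback.hints.setdefault(owner.name, {})[key] = value


def handoff(nodes, recovered):
    """Mark a node as recovered and return the data held on its behalf."""
    recovered.up = True
    for node in nodes:
        recovered.data.update(node.hints.pop(recovered.name, {}))
```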

Page 16: Introduction to Riak - Joel Jacobson

DISASTER SCENARIO

• Node fails

• Request goes to fallback

• Node comes back

• Handoff - data returned to recovered node

• Normal operations resume automatically

[Diagram label: hash(“user_id”)]

Page 17: Introduction to Riak - Joel Jacobson

ACTIVE ANTI-ENTROPY

• Automatically repair inconsistencies in data

• Active Anti-Entropy was new in 1.3.0 and uses Merkle trees to compare data in partitions and periodically ensure consistency

• Active Anti-Entropy runs as a background process

• Can also be configured as a manual process
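
The idea behind comparing partitions with hash trees can be sketched as follows. This is a two-level hash tree, a deliberately simplified stand-in for a Merkle tree, and all function names and the segment count are assumptions. Two replicas first compare a single root hash; only if it differs do they look at segment hashes to find which keys need repair.

```python
import hashlib


def leaf_hashes(partition, segments=4):
    """Group keys into segments and hash each segment's contents."""
    groups = [[] for _ in range(segments)]
    for key in sorted(partition):
        index = hashlib.sha1(key.encode("utf-8")).digest()[0] % segments
        groups[index].append(f"{key}={partition[key]}")
    return [hashlib.sha1("|".join(g).encode("utf-8")).hexdigest() for g in groups]


def root_hash(leaves):
    """Hash of all segment hashes: one value summarising the partition."""
    return hashlib.sha1("".join(leaves).encode("utf-8")).hexdigest()


def differing_segments(replica_a, replica_b, segments=4):
    """Return indexes of segments whose contents differ between replicas."""
    leaves_a = leaf_hashes(replica_a, segments)
    leaves_b = leaf_hashes(replica_b, segments)
    if root_hash(leaves_a) == root_hash(leaves_b):
        return []   # replicas agree, nothing else to compare
    return [i for i, (x, y) in enumerate(zip(leaves_a, leaves_b)) if x != y]
```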

Page 18: Introduction to Riak - Joel Jacobson

CONFLICT RESOLUTION

• Network partitions and concurrent actors modifying the same data cause data divergence

• Riak provides two solutions to manage this that can be set on bucket level:

• Last Write Wins - keeps only the most recent write; acceptable for some use cases

• Vector Clocks - Retain “sibling” copies of data for merging

Page 19: Introduction to Riak - Joel Jacobson

VECTOR CLOCKS

• Every node has an ID

• Send last-seen vector clock in every “put” request

• Can be viewed as a ‘commit history’, e.g. Git

• Lets you decide how to resolve conflicts
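
A vector clock can be modelled as a mapping from node ID to a counter. The sketch below, with hypothetical function names, shows how two clocks are compared: one version supersedes another only if it has seen every update the other has; otherwise the two are concurrent and a conflict has to be resolved.

```python
def increment(clock, actor):
    """Return a copy of the clock with the actor's counter advanced."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify the relationship between two vector clocks."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "concurrent"
```

In this model, `{"a": 3}` and `{"a": 2, "b": 1}` are concurrent: neither has seen all of the other's updates.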

Page 20: Introduction to Riak - Joel Jacobson

SIBLING CREATION

[Diagram: ring nodes holding Object v1 with vector clocks [{a,3}] and [{a,2},{b,1}]]

• Siblings can be created by:

• Simultaneous writes (based on same object version)

• Network partitions

• Writes to existing key without submitting vector clock
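
A sketch of the first case, two writes based on the same object version. `SiblingStore` is a hypothetical toy, not Riak's storage logic, and it does not model writes submitted without a vector clock. A new write replaces only the stored versions its clock descends from; anything else is kept as a sibling.

```python
def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def merge(clocks):
    """Combine clocks by taking the highest counter seen for each actor."""
    merged = {}
    for clock in clocks:
        for actor, count in clock.items():
            merged[actor] = max(merged.get(actor, 0), count)
    return merged


class SiblingStore:
    """Holds every version of one key that has not been superseded."""

    def __init__(self):
        self.siblings = []   # list of (clock, value) pairs

    def put(self, value, context, actor):
        # The coordinating node (actor) advances its own counter.
        clock = dict(context)
        clock[actor] = clock.get(actor, 0) + 1
        # Drop versions the new clock has seen; keep concurrent ones.
        self.siblings = [(c, v) for c, v in self.siblings
                         if not descends(clock, c)]
        self.siblings.append((clock, value))
        return clock
```

If two clients both read the version with clock `{"a": 1}` and then write through nodes `a` and `b`, the store ends up with two siblings. Writing a resolved value with the merged clock as context collapses them back to one.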

Page 21: Introduction to Riak - Joel Jacobson

STORAGE BACKENDS

• Bitcask

• LevelDB

• Memory

• Multi

Page 22: Introduction to Riak - Joel Jacobson

BITCASK

• A fast, append-only key-value store

• In-memory key lookup table (key_dir), data on disk

• Closed files are immutable

• Merging cleans up old data

• Developed by Basho Technologies

• Suitable for bounded data, e.g. reference data
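
The append-only design can be illustrated with a toy store. This is a sketch of the idea, not Bitcask's file format: `ToyBitcask` is a hypothetical name, it uses a single file, and it writes raw values without the headers a real implementation would need. Writes only ever append, the in-memory key directory points at the latest value, and merging rewrites the file with live data only.

```python
import os


class ToyBitcask:
    def __init__(self, path):
        self.path = path
        self.keydir = {}              # key -> (offset, size) of latest value
        open(path, "ab").close()      # create the data file if missing

    def put(self, key, value):
        offset = os.path.getsize(self.path)   # new data goes at the end
        with open(self.path, "ab") as f:
            f.write(value)
        self.keydir[key] = (offset, len(value))

    def get(self, key):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, size = entry
        with open(self.path, "rb") as f:      # one seek, one read
            f.seek(offset)
            return f.read(size)

    def merge(self):
        """Rewrite the file keeping only the latest value for each key."""
        live = {key: self.get(key) for key in self.keydir}
        open(self.path, "wb").close()         # truncate the old data
        self.keydir = {}
        for key, value in live.items():
            self.put(key, value)
```

Since every key lives in the in-memory directory, the approach works best when the key set is bounded, which matches the slide's note about reference data.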

Page 23: Introduction to Riak - Joel Jacobson

LEVELDB

• Key-Value storage developed by Google

• Append-only for very large data sets

• Multiple levels of SSTable-like data structures

• Allows for more advanced querying (2i)

• It includes compression (Snappy algorithm)

• Suitable for unbounded data or advanced querying

Page 24: Introduction to Riak - Joel Jacobson

MEMORY

• Data is never persisted to disk

• Typically used for “test” databases (unit tests, etc.)

• Definable memory limits per vnode

• Configurable object expiry

• Useful for highly transient data

Page 25: Introduction to Riak - Joel Jacobson

MULTI

• Configure multiple storage engines for different types of data

• Configure the “default” storage engine

• Choose storage engine on per bucket basis

• No reason not to use it

Page 26: Introduction to Riak - Joel Jacobson

CLIENT APIS

• Riak supports two main client types:

• REST based HTTP Interface

• Easy to use from command line and simple scripts

• Useful with an intermediate caching layer, e.g. Varnish

• Protocol Buffers

• Optimized binary encoding standard developed by Google

• More performant than HTTP interface
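
A rough sketch of what requests against the HTTP interface look like, using only the Python standard library. It builds the requests without sending them, so no running node is needed. The base URL (a local node on port 8098) and the helper names are assumptions for illustration; the `/buckets/<bucket>/keys/<key>` path follows the HTTP API's resource layout.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8098"   # assumed local node and HTTP port


def object_url(bucket, key, base=BASE_URL):
    """Build the URL for one object, percent-encoding bucket and key."""
    return f"{base}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


def build_get(bucket, key):
    return Request(object_url(bucket, key), method="GET")


def build_put(bucket, key, body, content_type="application/json"):
    return Request(object_url(bucket, key), data=body, method="PUT",
                   headers={"Content-Type": content_type})


def build_delete(bucket, key):
    return Request(object_url(bucket, key), method="DELETE")
```

Passing one of these objects to `urllib.request.urlopen` would send it to the node. Because objects are plain HTTP resources, a cache such as Varnish can sit in front of GET requests.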

Page 27: Introduction to Riak - Joel Jacobson

CLIENT LIBRARIES

• Client libraries supported by Basho:

• Community supported languages and frameworks:

• C/C++, Clojure, Common Lisp, Dart, Django, Go, Grails, Griffon, Groovy, Erlang, Haskell, Java, .NET, Node.js, OCaml, Perl, PHP, Play, Python, Racket, Ruby, Scala, Smalltalk

Page 28: Introduction to Riak - Joel Jacobson

• Using Riak as datastore for all back-end systems supporting Angry Birds

• Game-state storage, ID/Login, Payments, Push notifications, analytics, advertisements

• 9 clusters in use with over 100 nodes

• 263 million active monthly users

Page 29: Introduction to Riak - Joel Jacobson

• Spine2 project - storing patient data (80 million+)

• 500 complex messages per second

• 20,000 integrated end points

• 0 data loss

• 99.9% availability SLA

Page 30: Introduction to Riak - Joel Jacobson

• Push to talk application

• Billions of requests daily

• > 50 dedicated servers

• Everything stored in Riak

• https://github.com/mranney/node_riak

Page 31: Introduction to Riak - Joel Jacobson

MULTI DATACENTER REPLICATION (MDC)

• Allows data to be replicated between clusters in different data centers. Can handle larger latencies.

• Two synchronization modes that can be used together: real-time and full sync

• Set up as uni-directional or bi-directional replication

• Can be used for global load-balancing, business continuity and back-ups

Page 32: Introduction to Riak - Joel Jacobson

RIAK-CS

• Built on top of Riak and supports MDC

• S3 compatible object storage

• Supports multi-tenancy

• Per-tenant usage data and statistics on network I/O

• Supports objects of arbitrary content type, up to 5 TB

• Often used to build private cloud storage

Page 33: Introduction to Riak - Joel Jacobson

PLAY AROUND WITH RIAK?

• https://github.com/joeljacobson/riak-dev-cluster

• https://github.com/joeljacobson/vagrant-riak-cluster

Page 34: Introduction to Riak - Joel Jacobson

THANK YOU

basho
