introduction to riak - joel jacobson
Post on 27-Jan-2015
124 Views
Preview:
DESCRIPTION
TRANSCRIPT
basho
Core Concepts
Introduction to Riak
bashoAKQA
24th July 2013
Friday, 26 July 13
WHO AM I?
Joel Jacobson
Technical Evangelist
Basho Technologies
@joeljacobson
Friday, 26 July 13
Distributed computing is
HARD.
Friday, 26 July 13
PROBLEMS?
• Concurrency and latency at scale
•Data consistency
• Uptime/failover
•Multi Tenancy
• SLA’s
Friday, 26 July 13
WHAT IS RIAK?
• Key-Value store + extras
•Distributed and horizontally scalable
• Fault-tolerant
• Highly available
• Built for the web
Friday, 26 July 13
INSPIRED BY AMAZON DYNAMO
•White paper released to describe a database system to be used for their shopping cart
•Masterless, peer-coordinated replication
•Dynamo inspired data-stores; Riak, Cassandra, Voldemort etc.
• Consistent hashing - no sharding :-)
• Eventually consistent
Friday, 26 July 13
RIAK KEY-VALUE STORE
• Simple operations - GET, PUT, DELETE
• Value is opaque, with metadata
• Extras, e.g.
• Secondary Indexes (2i)
• MapReduce
• Full text search
Friday, 26 July 13
HORIZONTALLY SCALABLE
•Near linear scalability
•Query load and data are spread evenly
• Add more nodes and get more:
• ops/second
• storage capacity
• compute power (for Map/Reduce)
Friday, 26 July 13
FAULT TOLERANT• All nodes participate equally - no single point of failure (SPOF)
• All data is replicated
• Clusters self heal - Handoff, Active Anti-Entropy
• Cluster transparently survives...
• node failure
• network partitions
• Built on Erlang/OTP (designed for FT)
Friday, 26 July 13
HIGHLY AVAILABLE
• Any node can serve client requests
• Fallbacks are used when nodes are down
• Always accepts read and write requests
• Per-request quorums
Friday, 26 July 13
QUORUMS - N/R/W
• Tunable down to bucket level
• n_val = 3 by default
• w / r = 2 by default
• w = 1 - Quicker response time, read could be inconsistent in short term
• w = all - Slower response, increased data consistency
Friday, 26 July 13
CAP THEOREM
• C = Consistency
• A = Availability
• P = Partition Tolerance
• Cap theorem states that a distributed shared data system can at most support 2 out of these 3 properties
DB DB DB
Client Client
Network/Data Partition
Friday, 26 July 13
THE RING
Friday, 26 July 13
REPLICATION• Replicated to 3 nodes by default (n_val =3, which is
configurable)
Friday, 26 July 13
DISASTER SCENARIO•Node fails
• Request goes to fallback
•Node comes back
• Handoff - data retuned to recovered node
•Normal operations resume automatically
Friday, 26 July 13
DISASTER SCENARIO•Node fails
• Request goes to fallback
•Node comes back
• Handoff - data retuned to recovered node
•Normal operations resume automatically hash(“user_id”)
Friday, 26 July 13
ACTIVE ANTI-ENTROPY
• Automatically repair inconsistencies in data
• Active Anti-Entropy was new in 1.3.0 and uses Merkle trees to compare data in partitions and periodically ensure consistency
• Active Anti-Entropy runs as a background process
• Can also be configured as a manual process
Friday, 26 July 13
CONFLICT RESOLUTION
•Network partitions and concurrent actors modifying the same data cause data divergence
• Riak provides two solutions to manage this that can be set on bucket level:
• Last Write Wins - an approach used for some use cases
• Vector Clocks - Retain “sibling” copies of data for merging
Friday, 26 July 13
VECTOR CLOCKS
• Every node has an ID
• Send last-seen vector clock in every “put” request
• Can be viewed as ‘commit history’ e.g Git
• Lets you decide conflicts
Friday, 26 July 13
SIBLING CREATION
0
32
1Objectv1
Objectv1
[{a,3}]
[{a,2},{b,1}]
1) 2)[{a,3}]
[{a,2},{b,1}]
0
32
1Objectv1
Object v1
Object v1
• Siblings can be created by:
• Simultaneous writes (based on same object version)
• Network partitions
• Writes to existing key without submitting vector clock
Friday, 26 July 13
STORAGE BACKENDS
• Bitcask
• LevelDB
•Memory
•Multi
Friday, 26 July 13
BITCASK
• A fast, append-only key-value store
• In memory key lookup table (key_dir) data on disk
• Closed files are immutable
•Merging cleans up old data
•Developed by Basho Technologies
• Suitable for bounded data, e.g. reference data
Friday, 26 July 13
LEVELDB
• Key-Value storage developed by Google
• Append-only for very large data sets
•Multiple levels of SSTable-like data structures
• Allows for more advanced querying (2i)
• It includes compression (Snappy algorithm)
• Suitable for unbounded data or advanced querying
Friday, 26 July 13
MEMORY
•Data is never persisted to disk
• Typically used for “test” databases (unit tests... etc)
•Definable memory limits per vnode
• Configurable object expiry
• Useful for highly transient data
Friday, 26 July 13
MULTI
• Configure multiple storage engines for different types of data
• Configure the “default” storage engine
• Choose storage engine on per bucket basis
•No reason not to use it
Friday, 26 July 13
CLIENT APIS
• Riak supports two main client types:
• REST based HTTP Interface
• Easy to use from command line and simple scripts
• Useful if using intermediate caching layer, e.g. Varnish
• Protocol Buffers
• Optimized binary encoding standard developed by Google
• More performant than HTTP interface
Friday, 26 July 13
CLIENT LIBRARIES
• Client libraries supported by Basho:
• Community supported languages and frameworks:
• C/C++, Clojure, Common Lisp, Dart, Django, Go, Grails, Griffon, Groovy, Erlang, Haskell, Java, .NET, Node.js, OCaml , Perl, PHP, Play, Python, Racket, Ruby, Scala, Smalltalk
Friday, 26 July 13
• Using Riak as datastore for all back-end systems supporting Angry Birds
• Game-state storage, ID/Login, Payments, Push notifications, analytics, advertisements
• 9 clusters in use with over 100 nodes
• 263 million active monthly users
Friday, 26 July 13
• Spine2 project - storing patient data (80 million+)
• 500 complex messages per second
• 20,000 integrated end points
• 0 data loss
• 99.9% availability SLA
Friday, 26 July 13
• Push to talk application
• Billions of requests daily
• > 50 dedicated servers
• Everything stored in Riak
• https://github.com/mranney/node_riak
Friday, 26 July 13
MULTI DATACENTER REPLICATION (MDC)
• Allows data to be replicated between clusters in different data centers. Can handle larger latencies.
• Two synchronization modes that can be used together: real-time and full sync
• Set up as uni-directional or bi-directional replication
• Can be used for global load-balancing, business continuity and back-ups
Friday, 26 July 13
RIAK-CS
• Built on top of Riak and supports MDC
• S3 compatible object storage
• Supports multi-tenancy
• Per-tenant usage data and statistics on network I/O
• Supports Objects of Arbitrary Content Type Up to 5TB
•Often used to build private cloud storage
Friday, 26 July 13
PLAY AROUND WITH RIAK?
• https://github.com/joeljacobson/riak-dev-cluster
• https://github.com/joeljacobson/vagrant-riak-cluster
Friday, 26 July 13
THANK YOUjoel@basho.com
bashobasho
Friday, 26 July 13
top related