
Memory is the new disk, disk is the new tape

Bela Ban, JBoss / Red Hat

Motivation

● We want to store our data in memory

– Memory access is faster than disk access

– Even across a network

– A DB requires network communication, too

● The disk is used for archival purposes

● Not a replacement for DBs!

– Only a key-value store

– NoSQL

Problems

● #1: How do we provide memory large enough to store the data (e.g. 2 TB of memory)?

● #2: How do we guarantee persistence?

– Survival of data between reboots / crashes

#1: Large memory

● We aggregate the memory of all nodes in a cluster into a large virtual memory space

– 100 nodes of 10 GB == 1 TB of virtual memory

#2: Persistence

● We store keys redundantly on multiple nodes

– Unless all nodes on which key K is stored crash at the same time, K is persistent

● We can also store the data on disk

– To prevent data loss in case all cluster nodes crash

– This can be done asynchronously, on a background thread (see the sketch below)
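Such an asynchronous, write-behind disk store could look roughly like the following sketch (an illustration only, not the actual ReplCache code; the BiConsumer stands in for a hypothetical disk backend):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

// Write-behind sketch: the in-memory map is updated immediately; a single
// background thread persists the change to disk afterwards, so callers
// never block on I/O.
public class WriteBehindStore<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final ExecutorService diskWriter = Executors.newSingleThreadExecutor();
    private final BiConsumer<K, V> persistToDisk; // hypothetical disk backend

    public WriteBehindStore(BiConsumer<K, V> persistToDisk) {
        this.persistToDisk = persistToDisk;
    }

    public void put(K key, V value) {
        memory.put(key, value);                                    // fast, in-memory
        diskWriter.submit(() -> persistToDisk.accept(key, value)); // async disk write
    }

    public V get(K key) {
        return memory.get(key);
    }

    public void shutdown() {
        diskWriter.shutdown();
    }
}
```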

How do we provide redundancy?

Store every key on every node

[Diagram: four nodes A, B, C, D; every key K1–K4 is stored on every node]

● RAID 1

● Pro: data is available everywhere

– No network round trip

– Data loss only when all nodes crash

● Con: we can only use 25% of our memory

Store every key on 1 node only

[Diagram: four nodes A, B, C, D holding K1, K2, K3 and K4 respectively, one key per node]

● RAID 0, JBOD

● Pro: we can use 100% of our memory

● Con: data loss on node crash

– No redundancy

Store every key on K nodes

[Diagram: four nodes A, B, C, D; each key K1–K4 is stored on two of the nodes]

● K is configurable (2 in the example)

● Variable RAID

● Pro: we can use a variable % of our memory

– User determines tradeoff between memory consumption and risk of data loss
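For example, with 100 nodes of 10 GB each (1 TB of aggregate memory) and every key stored twice, roughly 500 GB of distinct data fits, and a key is lost only if both of its owners crash before rebalancing can run.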

So how do we determine on which nodes the keys are stored?

Consistent hashing

● Given a key K and a set of nodes, CH(K) will always pick the same node P for K

– We can also pick a list {P,Q} for K

● Anyone 'knows' that K is on P

● If P leaves, CH(K) will pick another node Q and rebalance the affected keys

● A good CH will rebalance at most 1/N of the keys, where N = number of cluster nodes (see the sketch below)
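A minimal consistent-hashing sketch (an illustration, not the JGroups or Infinispan implementation): nodes are placed on a hash ring, and a key's owners are the first numOwners distinct nodes found clockwise from the key's hash position. Removing a node only changes ownership for the keys that mapped to it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

// Consistent hashing sketch: nodes live on a hash ring (a sorted map of
// hash -> node); the owners of a key are the first 'numOwners' distinct
// nodes found clockwise from the key's hash position.
public class ConsistentHash {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node)    { ring.put(hash(node), node); }
    public void removeNode(String node) { ring.remove(hash(node)); }

    // Primary owner first, then (numOwners - 1) backup owners.
    public List<String> owners(String key, int numOwners) {
        List<String> result = new ArrayList<>();
        if (ring.isEmpty())
            return result;
        Integer start = ring.ceilingKey(hash(key));
        if (start == null)
            start = ring.firstKey();                   // wrap around the ring
        Iterator<String> it = ring.tailMap(start).values().iterator();
        int wanted = Math.min(numOwners, ring.size());
        while (result.size() < wanted) {
            if (!it.hasNext())
                it = ring.values().iterator();         // wrap around
            String node = it.next();
            if (!result.contains(node))
                result.add(node);
        }
        return result;
    }

    private static int hash(String s) {
        return s.hashCode() & 0x7fffffff; // non-negative; a real impl uses a stronger hash
    }
}
```

With nodes A–D added, owners("K2", 2) might return, say, [B, C]; after removeNode("B") the same call returns two surviving nodes, and only the keys that had B as an owner need to be copied elsewhere.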

Example

[Diagram: keys K1–K4 distributed over nodes A, B, C, D; each key has a primary and a backup owner]

● K2 is stored on B (primary owner) and C (backup owner)

Example

[Diagram: the same key distribution; node B is marked as crashed]

● Node B now crashes

Example

● C (the backup owner of K2) copies K2 to D

– C is now the primary owner of K2

● A copies K1 to C

– C is now the backup owner of K1

[Diagram: after rebalancing, K1 is stored on A and C, K2 on C and D; every key again has two owners among A, C, D]

Rebalancing

● Unless all N owners of a key K crash exactly at the same time, K is always stored redundantly

● When fewer than N owners crash, rebalancing copies/moves keys to other nodes, so that we have N owners again

Enter ReplCache

● ReplCache is a distributed hashmap spanning the entire cluster

● Operations: put(K,V), get(K), remove(K)

● For every key, we can define how many times we'd like it to be stored in the cluster (see the usage sketch below)

– 1: RAID 0 (one copy)

– -1: RAID 1 (a copy on every node)

– N: variable RAID (N copies)
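A usage sketch based on the ReplCache class shipped with JGroups (org.jgroups.blocks.ReplCache); constructor arguments and method signatures vary slightly between JGroups versions, so treat this as an outline rather than definitive code:

```java
import org.jgroups.blocks.ReplCache;

public class ReplCacheExample {
    public static void main(String[] args) throws Exception {
        // "udp.xml" is a stock JGroups protocol stack; the cluster name is arbitrary
        ReplCache<String, String> cache = new ReplCache<>("udp.xml", "replcache-demo");
        cache.start();

        // repl_count semantics from the slide:
        //   1  -> store on exactly one node (RAID 0)
        //  -1  -> store on every node       (RAID 1)
        //   N  -> store on N nodes          (variable RAID)
        cache.put("session-42", "some value", (short) 2, 0); // store the value on 2 nodes
        String value = cache.get("session-42");
        System.out.println("session-42 = " + value);

        cache.remove("session-42");
        cache.stop();
    }
}
```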

Use of ReplCache

[Diagram: HTTP requests reach Apache with mod_jk, which dispatches them to several JBoss instances; each instance hosts a servlet and a ReplCache, and the ReplCache instances together form the ReplCache cluster; a DB is also shown]

Demo

Use cases

● JBoss AS: session distribution using Infinispan

– For data scalability, sessions are stored only N times in a cluster

● GridFS (Infinispan)

– I/O over the grid

– Files are chunked into slices; each slice is stored in the grid (redundantly if needed, see the sketch below)

– Store a 4 GB DVD in a grid where each node has only 2 GB of heap
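The chunking idea can be sketched as follows (an illustration of the principle, not Infinispan's actual GridFS code; the plain Map stands in for the distributed cache, and the chunk size is an arbitrary choice):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Map;

// Chunking sketch: a file is cut into fixed-size slices and each slice is put
// into the grid under its own key, so a large file spreads over many nodes.
public class ChunkedFileStore {
    private static final int CHUNK_SIZE = 64 * 1024; // arbitrary chunk size for the example

    private final Map<String, byte[]> grid; // stands in for the distributed cache

    public ChunkedFileStore(Map<String, byte[]> grid) {
        this.grid = grid;
    }

    // Returns the number of chunks written.
    public int store(String name, InputStream in) throws IOException {
        byte[] buf = new byte[CHUNK_SIZE];
        int chunk = 0, n;
        while ((n = in.read(buf)) > 0) {
            grid.put(name + "#" + chunk++, Arrays.copyOf(buf, n)); // one key per slice
        }
        return chunk;
    }

    public byte[] readChunk(String name, int index) {
        return grid.get(name + "#" + index);
    }
}
```

Reading the file back is just a matter of getting the chunks in order and concatenating them; no single node ever has to hold the whole file in its heap.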

Use cases

● Hibernate Over Grid (OGM)

– Replaces the DB backend with an Infinispan-backed grid

Conclusion

● Given enough nodes in a cluster, we can provide persistence for data

● Unlike RAID, where everything is stored fully redundantly (even /tmp), we can define persistence guarantees per key

● Ideal for data sets which need to be accessed quickly

– For the paranoid we can still stream to disk

Conclusion

● Data is distributed over a grid

– Cache is closer to clients

– No bottleneck to the DBMS

– Keys are on different nodes

Conclusion

[Diagram: many clients, each talking to one of the caches in the grid]

Questions?

● Demo (JGroups)

– http://www.jgroups.org

● Infinispan

– http://www.infinispan.org

● OGM

– http://community.jboss.org/en/hibernate/ogm
