![Page 1: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/1.jpg)
Memory is the new disk, disk is the new tape
Bela Ban, JBoss / Red Hat
![Page 2: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/2.jpg)
Motivation
● We want to store our data in memory
– Memory access is faster than disk access
– Even across a network
– A DB requires network communication, too
● The disk is used for archival purposes
● Not a replacement for DBs!
– Only a key-value store
– NoSQL
![Page 3: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/3.jpg)
Problems
● #1: How do we provide memory large enough to store the data (e.g. 2 TB of memory)?
● #2: How do we guarantee persistence?
– Survival of data between reboots / crashes
![Page 4: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/4.jpg)
#1: Large memory
● We aggregate the memory of all nodes in a cluster into a large virtual memory space
– 100 nodes of 10 GB == 1 TB of virtual memory
![Page 5: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/5.jpg)
#2: Persistence
● We store keys redundantly on multiple nodes
– Unless all nodes on which key K is stored crash at the same time, K is persistent
● We can also store the data on disk
– To prevent data loss in case all cluster nodes crash
– This can be done asynchronously, on a background thread
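The asynchronous disk write could look roughly like the following. This is a minimal sketch, not how ReplCache or Infinispan implement it: the class `WriteBehindSketch`, its methods, and the `key=value` log format are all hypothetical. The in-memory map is updated first, and a single background thread appends to a log file later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: in-memory map plus an asynchronous append-only log on disk.
final class WriteBehindSketch {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    // A single background thread keeps disk writes in submission order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final Path log;

    WriteBehindSketch(Path log) {
        this.log = log;
    }

    void put(String key, String value) {
        memory.put(key, value);                                  // visible to readers immediately
        writer.execute(() -> append(key + "=" + value + "\n"));  // disk write happens later
    }

    String get(String key) {
        return memory.get(key);
    }

    private void append(String line) {
        try {
            Files.write(log, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stops accepting writes and waits for queued disk writes to finish. */
    void close() {
        writer.shutdown();
        try {
            writer.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("write-behind", ".log");
        WriteBehindSketch store = new WriteBehindSketch(file);
        store.put("K1", "V1");
        System.out.println("in memory: " + store.get("K1"));
        store.close();
        System.out.println("on disk:   " + Files.readAllLines(file));
    }
}
```

The caller never waits for the disk, so a crash of all nodes can still lose the writes that were queued but not yet flushed. That is the price of keeping the disk off the critical path.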
![Page 6: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/6.jpg)
How do we provide redundancy?
![Page 7: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/7.jpg)
Store every key on every node
A  B  C  D
K1 K1 K1 K1
K2 K2 K2 K2
K3 K3 K3 K3
K4 K4 K4 K4
● RAID 1
● Pro: data is available everywhere
– No network round trip
– Data loss only when all nodes crash
● Con: with 4 nodes, we can only use 25% of our memory
![Page 8: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/8.jpg)
Store every key on 1 node only
A  B  C  D
K1 K2 K3 K4
● RAID 0, JBOD
● Pro: we can use 100% of our memory
● Con: data loss on node crash
– No redundancy
![Page 9: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/9.jpg)
Store every key on K nodes
A  B  C  D
K1 K1
K2 K2
K3 K3
K4 K4
● K is configurable (2 in the example)
● Variable RAID
● Pro: we can use a variable % of our memory
– User determines tradeoff between memory consumption and risk of data loss
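The tradeoff can be put into numbers with a small sketch. It assumes that keys are spread evenly and ignores per-entry overhead; the class and method names are hypothetical. With K copies of every key, the usable capacity is the total memory divided by K.

```java
// Hypothetical sketch: usable capacity as a function of the number of copies per key.
final class CapacitySketch {
    /** Usable capacity in GB when every key is stored on `copies` of `nodes` nodes. */
    static double usableGb(int nodes, double gbPerNode, int copies) {
        if (copies < 1 || copies > nodes) {
            throw new IllegalArgumentException("copies must be between 1 and nodes");
        }
        return nodes * gbPerNode / copies;
    }

    public static void main(String[] args) {
        int nodes = 4;
        double gbPerNode = 10.0;
        // copies = 1 is the RAID 0 case, copies = nodes is the RAID 1 case
        for (int copies = 1; copies <= nodes; copies++) {
            System.out.printf("copies=%d usable=%.1f GB%n",
                    copies, usableGb(nodes, gbPerNode, copies));
        }
    }
}
```

For the 4-node example, one copy uses all 40 GB, two copies leave 20 GB, and four copies (every key on every node) leave 10 GB, which is the 25% from the RAID 1 slide.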
![Page 10: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/10.jpg)
So how do we determine on which nodes the keys are stored?
![Page 11: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/11.jpg)
Consistent hashing
● Given a key K and a set of nodes, CH(K) will always pick the same node P for K
– We can also pick a list {P,Q} for K
● Anyone 'knows' that K is on P
● If P leaves, CH(K) will pick another node Q and rebalance affected keys
● A good CH will rebalance at most 1/N of the keys (where N = number of cluster nodes)
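One common way to build such a CH function is a hash ring. The sketch below is an illustration under that assumption, not necessarily the function JGroups or Infinispan uses. The class name, the 64 ring points per node, and the hash mixing step are all hypothetical choices. `owners(key, 2)` returns the list {P,Q}: the primary owner first, then the backup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of consistent hashing on a ring.
final class ConsistentHashSketch {
    private static final int POINTS_PER_NODE = 64;  // assumption: smooths the key distribution
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node) {
        for (int i = 0; i < POINTS_PER_NODE; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        ring.values().removeIf(node::equals);
    }

    /** Returns up to `count` distinct owners of `key`: primary first, then backups. */
    List<String> owners(String key, int count) {
        List<String> result = new ArrayList<>();
        int h = hash(key);
        collect(ring.tailMap(h, true).values(), result, count);   // clockwise from the key
        collect(ring.headMap(h, false).values(), result, count);  // wrap around the ring
        return result;
    }

    private static void collect(Iterable<String> nodes, List<String> result, int count) {
        for (String node : nodes) {
            if (result.size() >= count) {
                return;
            }
            if (!result.contains(node)) {
                result.add(node);
            }
        }
    }

    // String.hashCode() is stable across JVMs; the extra mixing spreads the bits.
    private static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        ConsistentHashSketch ch = new ConsistentHashSketch();
        for (String node : new String[] {"A", "B", "C", "D"}) {
            ch.addNode(node);
        }
        System.out.println("owners of K2: " + ch.owners("K2", 2));
        ch.removeNode("B");
        System.out.println("after B left: " + ch.owners("K2", 2));
    }
}
```

Because every node runs the same function over the same membership, no lookup table is needed. When a node leaves, only the keys it owned get a new owner list, and a former backup moves up to primary.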
![Page 12: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/12.jpg)
Example
A  B  C  D
K1 K1
K2 K2
K3 K3
K4 K4
● K2 is stored on B (primary owner) and C (backup owner)
![Page 13: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/13.jpg)
Example
A  B  C  D
K1 K1
K2 K2
K3 K3
K4 K4
● Node B now crashes
![Page 14: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/14.jpg)
Example
● C (the backup owner of K2) copies K2 to D
– C is now the primary owner of K2
● A copies K1 to C
– C is now the backup owner of K1
A  B  C  D
K1 K1 K1
K2 K2 K2
K3 K3
K4 K4
![Page 15: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/15.jpg)
Rebalancing
● Unless all N owners of a key K crash at exactly the same time, K is always stored redundantly
● When fewer than N owners crash, rebalancing will copy/move keys to other nodes, so that we have N owners again
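The crash of node B from the example can be replayed with a small simulation. This is a sketch under simplifying assumptions: the primary owner is passed in explicitly instead of being computed by a hash function, backups are the next nodes in ring order, and no data is actually copied; only the owner lists are tracked. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tracks which nodes own each key and refills owners after a crash.
final class RebalanceSketch {
    private final List<String> ring;   // live nodes in ring order
    private final int copies;          // desired number of owners per key
    private final Map<String, List<String>> owners = new LinkedHashMap<>();

    RebalanceSketch(List<String> nodes, int copies) {
        this.ring = new ArrayList<>(nodes);
        this.copies = copies;
    }

    /** Stores `key` on `primary` and on the following nodes in ring order. */
    void put(String key, String primary) {
        if (!ring.contains(primary)) {
            throw new IllegalArgumentException("unknown node " + primary);
        }
        List<String> list = new ArrayList<>();
        list.add(primary);
        owners.put(key, list);
        refill(list);
    }

    /** Removes `node` and restores the owner count of every key it held. */
    void crash(String node) {
        ring.remove(node);
        for (List<String> list : owners.values()) {
            if (list.remove(node)) {
                refill(list);
            }
        }
    }

    // Appends the next live nodes after the last surviving owner until `copies` is reached.
    private void refill(List<String> list) {
        if (list.isEmpty()) {
            return;  // all owners crashed at once: the key is lost
        }
        int pos = ring.indexOf(list.get(list.size() - 1));
        for (int step = 1; step < ring.size() && list.size() < copies; step++) {
            String candidate = ring.get((pos + step) % ring.size());
            if (!list.contains(candidate)) {
                list.add(candidate);
            }
        }
    }

    /** Owners of `key`, primary first. */
    List<String> ownersOf(String key) {
        return owners.get(key);
    }

    public static void main(String[] args) {
        RebalanceSketch cluster = new RebalanceSketch(Arrays.asList("A", "B", "C", "D"), 2);
        cluster.put("K1", "A");
        cluster.put("K2", "B");
        cluster.crash("B");
        System.out.println("K1: " + cluster.ownersOf("K1"));
        System.out.println("K2: " + cluster.ownersOf("K2"));
    }
}
```

After `crash("B")`, K1 is owned by A and C, and K2 by C (now primary) and D, which matches the example slides.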
![Page 16: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/16.jpg)
Enter ReplCache
● ReplCache is a distributed hashmap spanning the entire cluster
● Operations: put(K,V), get(K), remove(K)
● For every key, we can define how many times we'd like it to be stored in the cluster
– 1: RAID 0 (one copy)
– -1: RAID 1 (a copy on every node)
– N: variable RAID (N copies)
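The per-key replication count can be illustrated with a single-process model. This is not the ReplCache API: `ReplCountSketch` and its signatures are hypothetical, each "node" is just a local map, and placement uses a plain modulo instead of consistent hashing to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process model of put/get/remove with a per-key replication count.
final class ReplCountSketch {
    private final List<Map<String, String>> nodes = new ArrayList<>();  // one map per simulated node

    ReplCountSketch(int clusterSize) {
        if (clusterSize < 1) {
            throw new IllegalArgumentException("clusterSize must be positive");
        }
        for (int i = 0; i < clusterSize; i++) {
            nodes.add(new HashMap<>());
        }
    }

    /** replCount: 1 = one copy, -1 = a copy on every node, N = N copies. */
    void put(String key, String value, int replCount) {
        int copies = replCount == -1 ? nodes.size() : replCount;
        if (copies < 1) {
            throw new IllegalArgumentException("replCount must be -1 or >= 1");
        }
        copies = Math.min(copies, nodes.size());
        remove(key);  // drop copies left over from an earlier put with a larger count
        int first = Math.floorMod(key.hashCode(), nodes.size());
        for (int i = 0; i < copies; i++) {
            nodes.get((first + i) % nodes.size()).put(key, value);
        }
    }

    String get(String key) {
        for (Map<String, String> node : nodes) {
            String value = node.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    void remove(String key) {
        for (Map<String, String> node : nodes) {
            node.remove(key);
        }
    }

    /** Number of simulated nodes that currently hold `key`. */
    int copiesOf(String key) {
        int count = 0;
        for (Map<String, String> node : nodes) {
            if (node.containsKey(key)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ReplCountSketch cache = new ReplCountSketch(4);
        cache.put("session", "s1", 2);   // variable RAID
        cache.put("config", "c1", -1);   // RAID 1
        cache.put("scratch", "t1", 1);   // RAID 0
        System.out.println(cache.copiesOf("session") + " " + cache.copiesOf("config")
                + " " + cache.copiesOf("scratch"));
    }
}
```

The point of the per-key count is that cheap-to-recompute data can be stored once while important data is stored on several or all nodes, within the same cache.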
![Page 17: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/17.jpg)
Use of ReplCache
[Diagram: HTTP, Apache with mod_jk, three JBoss instances (each running a Servlet with a ReplCache) forming a cluster, and a DB]
![Page 18: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/18.jpg)
Demo
![Page 19: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/19.jpg)
Use cases
● JBoss AS: session distribution using Infinispan
– For data scalability, sessions are stored only N times in a cluster
● GridFS (Infinispan)
– I/O over grid
– Files are chunked into slices, each slice is stored in the grid (redundantly if needed)
– Store a 4GB DVD in a grid where each node has only 2GB of heap
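The chunking idea can be sketched as follows. This is an illustration only, not Infinispan's grid file system API: the class name, the `name#index` key scheme, and the local `HashMap` standing in for the distributed cache are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a file is split into slices, each stored under its own key.
final class ChunkingSketch {
    private final Map<String, byte[]> grid = new HashMap<>();        // stand-in for the distributed cache
    private final Map<String, Integer> chunkCounts = new HashMap<>(); // per-file metadata
    private final int chunkSize;

    ChunkingSketch(int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Stores `data` as slices of at most `chunkSize` bytes; returns the number of slices. */
    int write(String name, byte[] data) {
        int chunks = (data.length + chunkSize - 1) / chunkSize;  // round up
        for (int i = 0; i < chunks; i++) {
            int from = i * chunkSize;
            int to = Math.min(from + chunkSize, data.length);
            grid.put(name + "#" + i, Arrays.copyOfRange(data, from, to));
        }
        chunkCounts.put(name, chunks);
        return chunks;
    }

    /** Reassembles the file from its slices. */
    byte[] read(String name) {
        Integer chunks = chunkCounts.get(name);
        if (chunks == null) {
            throw new IllegalArgumentException("unknown file " + name);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < chunks; i++) {
            byte[] chunk = grid.get(name + "#" + i);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        ChunkingSketch fs = new ChunkingSketch(4);
        byte[] data = "memory is the new disk".getBytes();
        int chunks = fs.write("talk.txt", data);
        System.out.println(chunks + " chunks, round trip ok: "
                + Arrays.equals(data, fs.read("talk.txt")));
    }
}
```

Because every slice is an ordinary key, the slices end up on different nodes, so no single node has to hold the whole file. A real implementation would stream slices instead of materializing the full byte array.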
![Page 20: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/20.jpg)
Use cases
● Hibernate Over Grid (OGM)
– Replaces DB backend with Infinispan-backed grid
![Page 21: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/21.jpg)
Conclusion
● Given enough nodes in a cluster, we can provide persistence for data
● Unlike RAID, where everything is stored fully redundantly (even /tmp), we can define persistence guarantees per key
● Ideal for data sets which need to be accessed quickly
– For the paranoid we can still stream to disk
![Page 22: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/22.jpg)
Conclusion
● Data is distributed over a grid
– Cache is closer to clients
– No bottleneck to the DBMS
– Keys are on different nodes
![Page 23: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/23.jpg)
Conclusion
[Diagram: clients accessing a grid of cache nodes]
![Page 24: Memory is the new disk, disk is the new tape, Bela Ban (JBoss by RedHat)](https://reader034.vdocuments.us/reader034/viewer/2022051817/547aad83b379594e2b8b4ad3/html5/thumbnails/24.jpg)
Questions?
● Demo (JGroups)
– http://www.jgroups.org
● Infinispan
– http://www.infinispan.org
● OGM
– http://community.jboss.org/en/hibernate/ogm