Post on 28-Sep-2020
Ø Attributed Consistent Hashing Ring
Ø Add two attributes to nodes on the hashing ring
  • Capacity: the overall data capacity of a node
  • Bandwidth: the maximum bandwidth of a node
• Given a set of n nodes V = {v1, …, vn} with attributes vi = ⟨ci, bi⟩. E.g., node A1 has 1 TB capacity and 146 MB/s bandwidth (a1 = ⟨1000, 146⟩), and node B1 has 256 GB capacity and 540 MB/s bandwidth (b1 = ⟨256, 540⟩)
• A virtual node has the same attributes as its physical node, e.g., A1 and A2 on the ring
Ø Consider Capacity Attribute
Ø Making virtual nodes according to node capacity
  • Make ⌈ci / minj∈V{cj}⌉ virtual nodes for each physical node vi, where ci is the capacity of node vi
  • Proportionally scale down the virtual node numbers if they are too large, to avoid declustering the data movement on each node
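The virtual-node count rule above can be sketched directly; a minimal Python example, using the capacities from the poster's example ring (1000, 500, and 256 GB) as illustrative values:

```python
import math

def virtual_node_counts(capacities):
    """Give node v_i ceil(c_i / min_j c_j) virtual nodes, so the virtual-node
    count is proportional to capacity (the smallest node gets one)."""
    c_min = min(capacities.values())
    return {node: math.ceil(c / c_min) for node, c in capacities.items()}

# Capacities in GB, taken from the poster's example ring
counts = virtual_node_counts({"A": 1000, "B": 500, "C": 256})
# A: ceil(1000/256) = 4, B: ceil(500/256) = 2, C: ceil(256/256) = 1
```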
Ø Consider Bandwidth Attribute
Ø Define a range for each virtual node according to node bandwidth
  • Each node has a range on the hashing ring; a virtual node has the same range length as its physical node
  • Range length = node bandwidth / p, where p is a predefined parameter chosen from the node configuration (e.g., with p = 146, a node with 540 MB/s bandwidth gets a range of length 540/146 ≈ 3.7)
Ø Data Placement Algorithm
  • Step 1: data is hashed to a position on the ring with hash value s
  • Step 2: data is assigned to a node with the pseudo-random number algorithm val = hash(x, s, max), where x is the data ID, and s and max are the lower and upper range limits of the hashing ring
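Putting the two steps together, the placement path can be sketched as below. The node ranges come from the poster's example (p = 146); the MD5 stand-in hash and the `prand` helper are illustrative assumptions, since the actual system uses FNV-1 and SFMT:

```python
import hashlib

# Node ranges from the poster's example: A owns [0, 1) and [2.5, 3.5),
# B owns [1, 2.5), C owns [3.5, 7.1)
RANGES = [("A", 0.0, 1.0), ("B", 1.0, 2.5), ("A", 2.5, 3.5), ("C", 3.5, 7.1)]
RING_MAX = 7.1

def prand(x, lo, hi):
    """Deterministic stand-in for the paper's hash(x, s, max): maps the
    data ID x to a pseudo-random value in [lo, hi)."""
    h = int(hashlib.md5(x.encode()).hexdigest(), 16)
    return lo + (h % 10_000) / 10_000 * (hi - lo)

def place(x):
    # Step 1: hash the data to a position s on the ring
    s = prand(x + "#pos", 0.0, RING_MAX)
    # Step 2: draw val = hash(x, s, max) in [s, RING_MAX) and assign the
    # data to the node whose range contains val
    val = prand(x, s, RING_MAX)
    for node, lo, hi in RANGES:
        if lo <= val < hi:
            return node
```

Because range lengths are proportional to bandwidth, nodes with faster devices (like C here) receive proportionally more of the placed data.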
Ø Objectives
  • To design a data placement algorithm for heterogeneous storage systems composed of hard disk drives (HDDs) and solid state drives (SSDs)
  • Based on consistent hashing, keeping its inherent features while making better utilization of heterogeneous storage devices
Ø Background
Ø Cloud-scale storage systems are an important building block of the cloud infrastructure and demand efficient, flexible data distribution for performance improvement
Ø Heterogeneous storage systems are a promising trend in data centers
Ø Contributions
  • A novel distributed data placement algorithm based on consistent hashing for heterogeneous storage systems
  • Improved I/O performance while keeping storage space load-balanced and well utilized
Methodology
Attributed Consistent Hashing for Heterogeneous Storage System
Abstract
Ø Heterogeneous device characteristics
  A heterogeneous storage system, which contains nodes with both HDDs and SSDs, is often desired to combine the merits of different storage devices.
Ø I/O bandwidth comparison of the HDD (Seagate ST9500620NS) and SSD (Intel SSDSC2BA200G3T)
Ø SSD favors workloads with small I/O sizes
  • The bandwidth of the SSD is 65 times higher than that of the HDD when the request size is 4KB
Ø SSD generally excels in random I/Os
  • The SSD achieves nearly 2 to 9 times higher throughput than the HDD when the request size varies from 64KB to 4MB
Ø Consistent hashing algorithm
  Consistent hashing (used in Dynamo, Chord, and Sheepdog) conceptually assigns nodes and data to a ring. Data is assigned to the K "closest" nodes by hashing value, and virtual nodes can be added to improve load balance. For example, data x and its replicas are mapped clockwise to the 3 "closest" nodes (A, B, C); the virtual node A2 is skipped so that replicas land on distinct physical nodes, preserving data reliability.
Ø Challenge
  Consistent hashing is mainly designed for a homogeneous environment and lacks an efficient method to distinguish heterogeneous device characteristics.
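The clockwise replica selection with virtual-node skipping described above can be sketched as follows (a minimal Python example; SHA-1 stands in for the FNV-1 hash the actual system uses):

```python
import hashlib
from bisect import bisect_right

def h32(key):
    # Stand-in uniform hash; any well-distributed hash works for this sketch
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    def __init__(self, nodes, vnodes_per_node=2):
        # Place several virtual nodes per physical node on the ring
        self.points = sorted(
            (h32(f"{n}#{i}"), n) for n in nodes for i in range(vnodes_per_node)
        )
        self.keys = [p for p, _ in self.points]

    def closest_k(self, key, k=3):
        """Walk clockwise from the key's position, skipping virtual nodes
        whose physical node was already chosen, so the k replicas land on
        k distinct physical nodes."""
        idx = bisect_right(self.keys, h32(key))
        chosen = []
        for j in range(len(self.points)):
            node = self.points[(idx + j) % len(self.points)][1]
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == k:
                break
        return chosen

replicas = Ring(["A", "B", "C", "D"]).closest_k("x", k=3)  # 3 distinct nodes
```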
Motivation
Ø Simulation tests
  • Tests run in libch-placement, a modular consistent hashing library
  • Results show our algorithm achieves calculation time and memory usage similar to traditional consistent hashing
Table 1. The calculation time of different algorithms (s)
Table 2. The memory usage in different configurations (MB)
Ø Performance evaluation on the Sheepdog storage system
  • On a cluster of 30 nodes (half with SSDs), using FIO and Filebench
  • Uses the FNV-1 hash function for consistent hashing and the SIMD-oriented Fast Mersenne Twister (SFMT) to generate pseudo-random numbers
  • Results show the performance can be improved by more than 3 times
Performance Results
We propose a consistent-hashing-based data placement algorithm to promote better utilization of heterogeneous storage devices. This work is supported by the National Science Foundation under grants CNS-1338078 and IIP-1362134 through the Nimboxx membership contribution.
Conclusion
Figure: consistent hashing rings (a) without virtual nodes and (b) using virtual nodes; data x and its replicas r1–r3 are mapped clockwise to distinct physical nodes (A, B, C, D; virtual nodes A1, A2, B1, C1, D1).
Figure: the attributed ring. Given nodes V = {v1, …, vn} with vi = ⟨ci, bi⟩, e.g., a1 = ⟨1000, 146⟩ and b1 = ⟨256, 540⟩; each virtual node (A1/A2, B1/B2, C1/C2) carries the attributes ⟨ci, bi⟩ of its physical node, and node vi gets ⌈ci / minj∈V{cj}⌉ virtual nodes.
Node number   Our algorithm   Consistent hashing   CRUSH
64            0.0131          0.0124               1.4591
512           0.0218          0.0204               9.1039
4096          0.0361          0.0332               70.2185
32768         0.1783          0.1948               582.4153
Node No. × Virtual Node No.   Our algorithm   Consistent hashing
100 × 256                     0.98            0.78
1000 × 256                    9.77            7.81
10000 × 256                   97.65           78.13
1000 × 1024                   39.06           31.25
10000 × 4096                  1953.0          1562.4
Figure: performance results, each panel comparing our algorithm, consistent hashing, and CRUSH (straw buckets). (a) Random read aggregated bandwidth (MB/s) with different request sizes (1K–4M); (b) random read aggregated bandwidth (MB/s) with different numbers of clients, 1–15 (scalability); (c) average latency (ms) for fileserver, varmail, webserver, and videoserver; (d) IOPS (op/s) for the same applications.
Figure: data placement example on the attributed ring [0, Max). Virtual nodes A1/A2 ⟨1000, 146⟩, B1/B2 ⟨500, 219⟩, C1/C2 ⟨256, 540⟩; range length = node bandwidth / p with p = 146, giving node ranges A: [0, 1) and [2.5, 3.5); B: [1, 2.5); C: [3.5, 7.1). Data x is assigned with f(x, s, max).