
Attributed Consistent Hashing for Heterogeneous Storage System

Methodology

- Attributed Consistent Hashing Ring
  - Add two attributes to nodes on the hashing ring:
    - Capacity: the overall data capacity of a node
    - Bandwidth: the maximum bandwidth of a node

  - Given a set of $n$ nodes $V = \{v_1, \ldots, v_n\}$ with attributes $v_i = \langle c_i, b_i \rangle$ (capacity, bandwidth). E.g., node A1 has 1 TB capacity and 146 MB/s bandwidth ($a_1 = \langle 1000, 146 \rangle$), and node B1 has 256 GB capacity and 540 MB/s bandwidth ($b_1 = \langle 256, 540 \rangle$).
  - A virtual node has the same attributes as its physical node, e.g., A1 and A2 on the ring.
- Consider Capacity Attribute

  - Make virtual nodes according to node capacity: create $\left\lceil c_i / \min_{j \in V} c_j \right\rceil$ virtual nodes for each physical node $v_i$, where $c_i$ is the capacity of node $v_i$.
  - Proportionally scale the virtual node counts down if they are too large, to avoid declustering of data movement on each node.

- Consider Bandwidth Attribute
  - Define a range for each virtual node according to node bandwidth.
  - Each node has a range on the hashing ring, and a virtual node has the same range as its physical node. The range length is computed from the node configuration with a predefined parameter p: range length = bandwidth / p.

- Data Placement Algorithm (see the sketch after this list)
  - Step 1: data is hashed to a position on the ring, with hash value s.
  - Step 2: data is assigned to a node with a pseudo-random number algorithm, val = hash(x, s, max), where x is the data ID and s and max are the lower and upper range limits on the hashing ring.
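The sketch below pulls the three pieces together: virtual node counts from the capacity rule, range lengths from bandwidth / p, and the two placement steps. It is a minimal illustration under stated assumptions, not the authors' implementation: the class name AttributedRing, the use of MD5 to position data, and the concrete form of the pseudo-random step (the poster does not define hash(x, s, max), so this sketch draws a value uniformly in [s, max) seeded by x and s) are all ours.

```python
import hashlib
import math
import random
from bisect import bisect_right

class AttributedRing:
    """Minimal sketch of the attributed ring: virtual node counts come from
    capacity, and each virtual node's range length is bandwidth / p."""

    def __init__(self, nodes, p):
        # nodes: {name: (capacity, bandwidth)}, e.g. {"A": (1000, 146)}
        min_cap = min(cap for cap, _ in nodes.values())
        self.starts, self.owners = [], []
        pos = 0.0
        for name, (cap, bw) in sorted(nodes.items()):
            n_vnodes = math.ceil(cap / min_cap)  # capacity attribute
            length = bw / p                      # bandwidth attribute
            for _ in range(n_vnodes):
                self.starts.append(pos)
                self.owners.append(name)
                pos += length
        self.max = pos                           # "Max" on the ring

    def place(self, data_id: str) -> str:
        # Step 1: hash the data to a position s on the ring.
        h = int.from_bytes(hashlib.md5(data_id.encode()).digest()[:8], "big")
        s = h / 2**64 * self.max
        # Step 2 (assumed reading): draw val in [s, max) deterministically
        # from the data ID and s, then assign the data to the owner of the
        # virtual node whose range contains val.
        val = random.Random(f"{data_id}:{s}").uniform(s, self.max)
        i = bisect_right(self.starts, val) - 1
        return self.owners[i]

ring = AttributedRing({"A": (1000, 146), "B": (256, 540), "C": (500, 219)}, p=146)
print(ring.place("object-42"))  # higher-bandwidth nodes own longer ranges
```

Note the design consequence: a node's share of the ring, and hence of the data, grows with both its capacity (more virtual nodes) and its bandwidth (longer ranges), which is how the algorithm biases load toward faster, larger devices.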

Abstract

- Objectives
  - Design a data placement algorithm for heterogeneous storage composed of hard disk drives (HDDs) and solid-state drives (SSDs).
  - Base it on consistent hashing to keep consistent hashing's inherent features while making better utilization of heterogeneous storage devices.
- Background

  - Cloud-scale storage systems are an important building block of cloud infrastructure and demand efficient, flexible data distribution for good performance.
  - Heterogeneous storage systems are a very promising trend in data centers.
- Contributions
  - A novel distributed data placement algorithm based on consistent hashing for heterogeneous storage systems.
  - Improved I/O performance while keeping storage space load-balanced and well utilized.


Motivation

- Heterogeneous device characteristics
  - A heterogeneous storage system, which contains nodes with both HDDs and SSDs, is often desired in order to combine the merits of different storage devices.
  - I/O bandwidth comparison of an HDD (Seagate ST9500620NS) and an SSD (Intel SSDSC2BA200G3T):
    - The SSD favors workloads with small I/O sizes: its bandwidth is 65 times higher than the HDD's when the request size is 4 KB.
    - The SSD generally excels at random I/O: it achieves roughly 2 to 9 times higher throughput than the HDD as the request size varies from 64 KB to 4 MB.
- Consistent hashing algorithm
  - Consistent hashing (used in Dynamo, Chord, and Sheepdog) conceptually assigns nodes and data to a ring, and data is assigned to the K "closest" nodes by hash value. Virtual nodes can be added to improve load balance.
  - Data x and its replicas are mapped to the 3 "closest" nodes (A, B, C) in clockwise order; a virtual node of an already-chosen physical node (e.g., A2) is skipped for data reliability. A sketch follows this list.
- Challenge
  - Consistent hashing is mainly designed for a homogeneous environment and lacks an efficient way to distinguish heterogeneous device characteristics.
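To make the replica rule concrete, here is a minimal, generic consistent hashing sketch (ours, not the authors' code): virtual nodes are hashed points on the ring, and a lookup walks clockwise collecting distinct physical nodes, so a second virtual node of an already-chosen machine (like A2) is passed over.

```python
import hashlib
from bisect import bisect

def _h(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, nodes, n_vnodes=4):
        # Each physical node contributes n_vnodes points on the ring.
        self.points = sorted(
            (_h(f"{node}#{i}"), node) for node in nodes for i in range(n_vnodes)
        )

    def lookup(self, key: str, replicas: int = 3):
        """Return the `replicas` closest distinct physical nodes clockwise
        from the key's hash; extra virtual nodes of an already-chosen
        physical node are skipped so replicas land on different machines."""
        start = bisect(self.points, (_h(key),))
        chosen = []
        for i in range(len(self.points)):
            node = self.points[(start + i) % len(self.points)][1]
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == replicas:
                break
        return chosen

ring = Ring(["A", "B", "C", "D"])
print(ring.lookup("x"))  # three distinct physical nodes for data x
```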


Performance Results

- Simulation tests
  - Tests were run in libch-placement, a modular consistent hashing library.
  - Results show our algorithm achieves calculation time and memory usage similar to traditional consistent hashing (see Tables 1 and 2 below).

- Performance evaluation on the Sheepdog storage system
  - Run on a cluster of 30 nodes (half with SSDs) with FIO and Filebench.
  - Uses the FNV-1 hash function for consistent hashing and the SIMD-oriented Fast Mersenne Twister (SFMT) to generate pseudo-random numbers; a sketch of FNV-1 follows this list.
  - Results show performance can be improved by more than 3 times.
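Since FNV-1 is small enough to show inline, here is a reference sketch of the 64-bit variant, using the published FNV offset basis and prime; the poster does not state which bit width is used, so 64-bit is an assumption.

```python
# 64-bit FNV-1: multiply by the prime first, then XOR in the next byte
# (FNV-1a reverses these two steps).
FNV64_OFFSET = 0xcbf29ce484222325
FNV64_PRIME = 0x100000001b3

def fnv1_64(data: bytes) -> int:
    h = FNV64_OFFSET
    for b in data:
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
        h ^= b
    return h

print(hex(fnv1_64(b"object-42")))  # deterministic ring position for a data ID
```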


Conclusion

We propose a consistent-hashing-based data placement algorithm that promotes better utilization of heterogeneous storage devices. This work is supported by the National Science Foundation under grants CNS-1338078 and IIP-1362134, through the Nimboxx membership contribution.


[Figure: Consistent hashing rings. (a) Without virtual nodes: data x and its replicas r1, r2, r3 map clockwise to nodes A, B, C, D. (b) Using virtual nodes: virtual nodes A1, A2, B1, C1, D1 sit on the ring, and the replica that would land on A2 is skipped in favor of the next distinct node.]

[Figure: Attributed consistent hashing ring. Virtual nodes A1, A2, B1, B2, C1, C2 carry the attribute pairs <c1, b1>, <c2, b2>, <c3, b3> of their physical nodes.]

Node attributes: $V = \{v_1, \ldots, v_n\}$, $v_i = \langle c_i, b_i \rangle$; e.g., $a_1 = \langle 1000, 146 \rangle$, $b_1 = \langle 256, 540 \rangle$.

Virtual node count: node $v_i$ with capacity $c_i$ receives $\left\lceil c_i \,/\, \min_{j \in V} c_j \right\rceil$ virtual nodes.

Table 1. Calculation time of different algorithms (s)

Node number   Our algorithm   Consistent hashing   CRUSH
64            0.0131          0.0124                 1.4591
512           0.0218          0.0204                 9.1039
4096          0.0361          0.0332                70.2185
32768         0.1783          0.1948               582.4153

Table 2. Memory usage in different configurations (MB)

Node no. × virtual node no.   Our algorithm   Consistent hashing
100 × 256                              0.98                0.78
1000 × 256                             9.77                7.81
10000 × 256                           97.65               78.13
1000 × 1024                           39.06               31.25
10000 × 4096                        1953.0              1562.4

[Figure: Sheepdog performance of our algorithm vs. consistent hashing vs. CRUSH (straw buckets). (a) Random read aggregated bandwidth (MB/s) for request sizes from 1K to 4M. (b) Random read aggregated bandwidth (MB/s) for 1 to 15 clients (scalability). (c) Average latency (ms) for the fileserver, varmail, webserver, and videoserver applications. (d) IOPS (op/s) for the same applications.]

[Figure: Data placement on the attributed ring [0, Max). Virtual nodes A1, A2, B1, B2, C1, C2 carry attribute pairs <1000, 146>, <500, 219>, and <256, 540>; data x is assigned via f(x, s, max).]

Node   Range
A      [0, 1), [2.5, 3.5)
B      [1, 2.5)
C      [3.5, 7.1)

range length = node bandwidth / p, with p = 146 in this example.
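As a quick arithmetic check on the range-length rule (the pairing of bandwidths to nodes is our reading of the figure): with p = 146, a virtual node with bandwidth 146 covers 146 / 146 = 1 unit, matching A's ranges [0, 1) and [2.5, 3.5); bandwidth 219 gives 219 / 146 ≈ 1.5 units, matching B's [1, 2.5); and bandwidth 540 gives 540 / 146 ≈ 3.7 units, matching C's [3.5, 7.1) up to rounding.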