CDRM: A Cost-effective Dynamic Replication Management Scheme for Cloud Storage Cluster

The IEEE International Conference on Cluster Computing 2010

Qingsong Wei, Data Storage Institute, A-STAR, Singapore
Bharadwaj Veeravalli, Bozhao Gong, National University of Singapore
Lingfang Zeng, Dan Feng, Huazhong University of Science & Technology, China


DESCRIPTION

Cost-effective Dynamic Replication Management (CDRM). Outline: Introduction; Problem Statement; Cost-effective Dynamic Replication Management (CDRM); Evaluation; Conclusion.

TRANSCRIPT

Page 1

CDRM: A Cost-effective Dynamic Replication Management Scheme for Cloud Storage Cluster

The IEEE International Conference on Cluster Computing 2010

Qingsong Wei, Data Storage Institute, A-STAR, Singapore

Bharadwaj Veeravalli, Bozhao Gong, National University of Singapore

Lingfang Zeng, Dan Feng, Huazhong University of Science & Technology, China

Page 2

Outline

1. Introduction

2. Problem Statement

3. Cost-effective Dynamic Replication Management (CDRM)

4. Evaluation

5. Conclusion


Page 3

1. Introduction

HDFS Architecture

[Figure: HDFS architecture — clients send control requests over the network to the name node, which holds metadata; data blocks reside on data nodes, each with multiple disks.]

Page 4

1. Introduction

Data Striping

[Figure: blocks B1, B2, …, Bm striped across Node1, Node2, …, Noden.]

In HDFS, files are striped into data blocks across multiple data nodes to enable parallel access.

However, a block may become inaccessible when its data node is unavailable, and if any one block is unavailable, so is the whole file.

Failure is the norm rather than the exception in a large-scale cloud storage system, so fault tolerance is required.
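To see why striping alone is fragile, here is a minimal illustrative sketch (not from the slides), assuming each block is independently available with probability block_avail; with striping and no replication, the file needs every one of its blocks:

```python
# Minimal sketch: availability of a striped, unreplicated file, assuming
# each block is independently available with probability block_avail.
def file_availability(block_avail: float, num_blocks: int) -> float:
    # The file is readable only if every one of its blocks is readable.
    return block_avail ** num_blocks

# Example: 100 blocks on nodes that are each 99% available.
print(file_availability(0.99, 100))  # ~0.366 -- the file is unreadable most of the time
```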

Page 5

1. Introduction

[Figure: blocks 1–5 and their replicas spread across data nodes, serving multiple clients.]

Replication is used in HDFS: when one data node fails, the data is still accessible from the replicas, and the storage service need not be interrupted. Besides fault tolerance, replicas among data nodes can be used to balance workload.

Page 6

2. Problem Statement

[Figure: blocks 1–5 each replicated the same number of times across the data nodes.]

Current replication management schemes treat all data the same (the same replica number for all data), treat all storage nodes the same, and are fixed and static. The result is high cost and poor load balance.

Page 7

2. Problem Statement

Replica number is critical to management cost: more replicas mean more cost.

[Figure: when block 5 is modified, every replica of block 5 must be updated to maintain consistency.]

Because a large number of blocks are stored in the system, even a small increase in replica number results in a significant increase in the overall management cost.

How many replicas, at minimum, should be kept in the system to satisfy the availability requirement?

Page 8

2. Problem Statement

Replica placement influences intra-request parallelism.

[Figure: a client requests file (B1, B2, B3); Data Node1 (Session_max=3, Session_free=1), Data Node2 (Session_max=3, Session_free=2), and Data Node3 (Session_max=2, Session_free=0) each serve one block, and the request for the block on the node with no free session is blocked.]

Page 9

2. Problem Statement

Replica placement also influences inter-request parallelism.

[Figure: Client1 and Client2 issue concurrent requests for blocks B1, B2, B3 replicated over Data Node1 (Session_max=3, Session_free=0), Data Node2 (Session_max=3, Session_free=1), and Data Node3 (Session_max=2, Session_free=0).]

How should these replicas be placed among the data nodes in a balanced way to improve access parallelism?

Page 10

3. Cost-effective Dynamic Replication Management

System Model

Requests arrive at data nodes Node1, …, NodeN with total arrival rate λ. Each block Bj (j = 1, …, M) has attributes (pj, sj, rj, tj): pj is its popularity, sj its size, rj its replica number, and tj its access latency requirement. Each data node i has attributes (λi, τi, fi, ci): λi is its request arrival rate, τi its average service time, fi its failure rate, and ci its maximum number of sessions.

Data has different attributes; data nodes are different.
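As an illustration only, the model's parameters can be written down as plain data types; the class and field names below are mine, not from the CDRM implementation:

```python
from dataclasses import dataclass

@dataclass
class Block:
    popularity: float           # p_j
    size: int                   # s_j, in bytes
    replica_number: int         # r_j
    latency_requirement: float  # t_j, access latency requirement

@dataclass
class DataNode:
    arrival_rate: float         # lambda_i, request arrival rate
    service_time: float         # tau_i, average service time per request
    failure_rate: float         # f_i
    max_sessions: int           # c_i, maximum concurrent sessions
```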

Page 11

3. Cost-effective Dynamic Replication Management

Availability

Suppose file F is striped into m blocks {b1, b2, …, bm}. To retrieve the whole file F, we must get all m blocks, so availability is modeled as a function of the replica numbers. If the i-th replica of block bj is stored on a data node with failure rate fi, the availability of F is

$$P(A(F)) = \prod_{j=1}^{m}\left(1 - \prod_{i=1}^{r_j} f_i\right)$$

Suppose the expected availability for file F is A_expect, as defined by the user. To satisfy the availability requirement for a given file, we need

$$\prod_{j=1}^{m}\left(1 - \prod_{i=1}^{r_j} f_i\right) \ge A_{expect}$$

The minimal replica numbers can be calculated from this inequality for a given expected availability.
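As a worked example of solving this inequality, here is a minimal sketch assuming a common failure rate f on every node and the same replica number r for all m blocks, so the bound simplifies to (1 - f^r)^m >= A_expect; the m = 10 used below is an assumed file size, not a number from the slides:

```python
import math

def min_replicas(f: float, m: int, a_expect: float) -> int:
    # Each block must individually reach availability a_expect**(1/m)
    # so that the product over all m blocks reaches a_expect.
    per_block = a_expect ** (1.0 / m)
    # Solve 1 - f**r >= per_block for the smallest integer r.
    return math.ceil(math.log(1.0 - per_block) / math.log(f))

# With the slides' parameters (A_expect = 0.8, failure rates 0.1 and 0.2)
# and an assumed file of m = 10 blocks:
print(min_replicas(0.1, 10, 0.8))  # -> 2
print(min_replicas(0.2, 10, 0.8))  # -> 3
```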

Page 12

3. Cost-effective Dynamic Replication Management

Blocking Probability

Blocking probability is used as the criterion for placing replicas among data nodes to improve load balance. A data node Si is modeled as an M/G/ci system with arrival rate λi and service time τi; accordingly, the blocking probability of data node Si is

$$B_i = \frac{(\lambda_i \tau_i)^{c_i} / c_i!}{\sum_{k=0}^{c_i} (\lambda_i \tau_i)^k / k!}$$

Replica placement policy: a replica is placed on the data node with the lowest blocking probability, which dynamically maintains overall load balance.
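A minimal sketch of this criterion, reusing the DataNode fields from the system-model sketch above; the function names are mine, but the formula is the standard Erlang B expression with offered load λi·τi:

```python
import math

def blocking_probability(node) -> float:
    # Erlang B for an M/G/c_i loss system with offered load rho = lambda_i * tau_i.
    rho = node.arrival_rate * node.service_time
    numerator = rho ** node.max_sessions / math.factorial(node.max_sessions)
    denominator = sum(rho ** k / math.factorial(k)
                      for k in range(node.max_sessions + 1))
    return numerator / denominator

def place_replica(nodes):
    # CDRM's placement rule: put the new replica on the data node with the
    # lowest blocking probability, keeping load balanced as replicas are added.
    return min(nodes, key=blocking_probability)
```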

Page 13

3. Cost-effective Dynamic Replication Management

Framework of cost-effective dynamic replication management in HDFS. The client creates a file of blocks B1, B2, …, Bm through the name node:

1. The client requests to create a file with <Availability, Block Number>.
2. The name node calculates the replication factor and searches the Datanode B+Tree to obtain a Datanode list.
3. The name node returns the replication policy <Bi, Replication factor, DataNode list>.
4. The client flushes and replicates blocks to the selected Datanodes (replication pipelining).
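Putting the pieces together, here is a hypothetical sketch of the name node's side of steps 2 and 3, reusing min_replicas and blocking_probability from the earlier sketches; the real HDFS/CDRM interfaces differ:

```python
def build_replication_policy(nodes, a_expect: float, num_blocks: int):
    # Step 2: derive the replication factor from the availability model,
    # conservatively using the worst failure rate in the cluster.
    worst_f = max(n.failure_rate for n in nodes)
    r = min_replicas(worst_f, num_blocks, a_expect)

    # Step 2 (cont.): for each block, pick the r least-blocked data nodes.
    # (A real implementation would update each node's load after every
    # placement instead of re-ranking against unchanged statistics.)
    policy = []
    for block_id in range(num_blocks):
        ranked = sorted(nodes, key=blocking_probability)
        policy.append((block_id, r, ranked[:r]))

    # Step 3: return <B_i, replication factor, DataNode list> entries;
    # step 4 (pipelined replication to those nodes) is done by the client.
    return policy
```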

Page 14

4. Evaluation

Setup

Our test platform is built on a cluster with one name node and twenty data nodes of commodity computers. The operating system is Red Hat AS4.4 with kernel 2.6.20. The Hadoop version is 0.16.1 and the Java version is 1.6.0. The AUSPEX file system trace is used. A synthesizer was developed to create workloads with different characteristics, such as data sets of different sizes, varying data rates, and different popularities; these characteristics reflect the differences among various workloads to the cloud storage cluster.

Page 15

4. Evaluation

Cost-effective Availability: initially there is one replica per object, and CDRM maintains only the minimal replicas needed to satisfy the availability requirement. The higher the failure rate, the more replicas are required.

Figure: dynamic replication with data node failure rates of 0.1 and 0.2 and A_expect = 0.8 (replica number vs. time in minutes; curves: Failure Rate=0.2, Failure Rate=0.1).

Page 16

4. Evaluation

Performance

CDRM vs. the HDFS default Replication Management (HDRM) under different popularity and workload intensity: CDRM performs much better than HDRM when popularity is low, and CDRM outperforms HDRM under heavy workload.

Figure: effect of popularity and access arrival rate with 20 data nodes (average latency in ms vs. popularity in %; curves: HDRM λ=0.6, CDRM λ=0.6, HDRM λ=0.2, CDRM λ=0.2).

Page 17

4. Evaluation

Load Balance

The figure shows the difference between each data node's system utilization and the cluster's average system utilization. CDRM dynamically distributes workload across the whole cluster.

Figure: system utilization among the 20 data nodes with popularity = 10% and λ = 0.6 (difference of each node's utilization from the cluster average, in %, for CDRM and HDRM).

Page 18

5. Conclusion

Current replication management policies | CDRM
----------------------------------------|--------------------------------------------
Data is the same                        | Data is different
Storage nodes are the same              | Storage nodes are different
Same replica number for all data        | Different replica number for different data
Static placement                        | Dynamic placement
High cost                               | Cost-effective
Poor load balance                       | Good load balance
Low performance                         | High performance

Page 19

For more questions, please contact Dr. Qingsong Wei by email: [email protected]