springfs: bridging agility and performance in elastic ......• hadoop/hdfs deployments at facebook...

26
SpringFS: Bridging Agility and Performance in Elastic Distributed Storage Lianghong Xu Jim Cipar, Elie Krevat, Alexey Tumanov, Nitin Gupta, Mike Kozuch (Intel), Greg Ganger Carnegie Mellon University http://www.pdl.cmu.edu/ Lianghong Xu © Feb 14

Upload: others

Post on 07-Nov-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

SpringFS: Bridging Agility and Performance in Elastic Distributed Storage

Lianghong Xu Jim Cipar, Elie Krevat, Alexey Tumanov, Nitin Gupta,

Mike Kozuch (Intel), Greg Ganger

Carnegie Mellon University

http://www.pdl.cmu.edu/ Lianghong Xu © Feb 14!

Page 2: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Elasticity in Distributed Storage •  “Elasticity” in distributed storage:

•  ability to resize dynamically as workload varies •  More difficult than elastic computing

•  Benefits •  Re-use for other purposes or reduce energy usage •  Save machine hours (operating cost)

•  Most distributed storage is not elastic •  Designed for load balancing, not elasticity •  E.g., GFS and HDFS •  Deactivating servers may make data unavailable

http://www.pdl.cmu.edu/ 2 Lianghong Xu © Feb 14!

Page 3: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

SpringFS Contributions

http://www.pdl.cmu.edu/ 3

“Agility”: speed of elastic resizing

Lianghong Xu © Feb 14!

Sierra

Some elasticity Max peak write performance

Rabbit

Max elasticity Sub-max peak write performance

SpringFS

minimize data migration minimize machine hours

Key metrics

1. Fill the gap

2. Propose and address agility

Page 4: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Agility is Important

<your name here> © 2/24/14!http://www.pdl.cmu.edu/ 4

“Burstiness” in the Facebook HDFS trace

Page 5: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Agility is Important

<your name here> © 2/24/14!http://www.pdl.cmu.edu/ 5

“Burstiness” in the Facebook HDFS trace

Page 6: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Agility is Important

http://www.pdl.cmu.edu/ 6

> 50% potential saving of machine hours

Lianghong Xu © Feb 14!

Agility allows close tracking of workload variation

“Burstiness” in the Facebook HDFS trace

Page 7: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Outline •  Introduction

•  Background and motivation

•  SpringFS design

•  Evaluation

•  Conclusion

http://www.pdl.cmu.edu/ 7 Lianghong Xu © Feb 14!

Page 8: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Non-elastic Example: HDFS

http://www.pdl.cmu.edu/ 8

Almost all servers must be “on” to ensure 100% availability •  Little potential for elastic resizing

Primary replicas

Tertiary replicas

Secondary replicas

Assumption: 3-way replication

Lianghong Xu © Feb 14!

Server number Dat

a st

ored

on

serv

er X

N 1

Pseudo-random placement, even data layout

Page 9: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Data Layout in Elastic Storage •  General rule:

•  Take advantage of replication •  Always keep the first (primary) replicas “on” •  The other replicas can be activated on demand

•  Notable examples: Sierra [1] and Rabbit [2]

[1] E. Thereska et al. Sierra: Practical Power-proportionality for Data Center Storage. Eurosys 2011.

[2] J. Cipar et al. Robust and flexible power-proportional storage. SoCC 2010.

<your name here> © 2/24/14!http://www.pdl.cmu.edu/ 9

Page 10: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Sierra Data Layout

http://www.pdl.cmu.edu/ 10

N/3

•  Peak write performance: N/3 (as good as HDFS)

Primary servers

Secondary and tertiary servers

Lianghong Xu © Feb 14!

Server number

Dat

a st

ored

on

serv

er X

N 1

•  #servers with primary replicas: N/3

Page 11: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Dat

a st

ored

on

serv

er X

Server number N 1

Rabbit Equal-work Data Layout

http://www.pdl.cmu.edu/ 11

Primary servers p = N / e2 ≈ 0.13N

Lianghong Xu © Feb 14!

Peak write performance: p

#servers with primary replicas: p (theoretically the best)

Page 12: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Rabbit When Using Offloading

http://www.pdl.cmu.edu/ 12

Dat

a st

ored

on

the

serv

er

Server number N

Lianghong Xu © Feb 14!

Primary replicas spread across all active servers

•  Peak write performance: N/3

•  #servers with primary replicas: N

Page 13: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Tradeoff Space

http://www.pdl.cmu.edu/ 13

#ser

vers

with

prim

ary

repl

icas

Peak write performance (normalized to 1 server) 0

better

bette

r N HDFS

Rabbit (offloaded)

N/3

N/3 Sierra (all cases)

p

p Rabbit (no offload)

SpringFS (depending on workload)

Lianghong Xu © Feb 14!

Page 14: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Outline •  Introduction

•  Background and motivation

•  SpringFS design

•  Evaluation

•  Conclusion

http://www.pdl.cmu.edu/ 14 Lianghong Xu © Feb 14!

Page 15: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

SpringFS Design •  Bounded write offloading

•  Dynamically constrain distribution of primary replicas

•  Read offloading •  Preferentially offload reads from write-heavy servers

•  Passive migration •  Delay migration on server re-integration

http://www.pdl.cmu.edu/ 15 Lianghong Xu © Feb 14!

Page 16: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Bounded Write Offloading

http://www.pdl.cmu.edu/ 16

Dat

a st

ored

on

serv

er X

Server number N p m

Offload set: automatically adapts to workload Lianghong Xu © Feb 14!

Page 17: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Bounded Write Offloading

http://www.pdl.cmu.edu/ 17

N m

Lianghong Xu © Feb 14!

m = N/3 m = p Rabbit Sierra

Page 18: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

SpringFS Implementation •  Modified instance of HDFS

•  Written in Java and Python

•  Built and used a Scriptable HDFS interface •  Data placement •  Load balancing •  Data Migration •  Implement SpringFS and Rabbit in the same system

•  Resizing agent •  Activate/deactivate servers according to workload

<your name here> © 2/24/14!http://www.pdl.cmu.edu/ 18

Page 19: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Outline •  Introduction and motivation

•  Background

•  SpringFS design

•  Evaluation

•  Conclusion

http://www.pdl.cmu.edu/ 19 Lianghong Xu © Feb 14!

Page 20: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Evaluation Overview •  Experiments with SpringFS prototype

•  Analysis with real-world traces •  Hadoop/HDFS deployments at Facebook (FB) and

Cloudera Customers (CC)

•  Summary of results •  SpringFS improves over state-of-the-art designs

– Reduces data migration – Reduces machine hour usage (often near-ideal) – Provides range of options between Rabbit and

Sierra <your name here> © 2/24/14!http://www.pdl.cmu.edu/ 20

Page 21: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Data Migration in SpringFS Prototype

http://www.pdl.cmu.edu/ 21

0

50

100

150

200

250

300

350

400

450

0 5 10 15 20 25 30

Num

ber o

f blo

cks

to m

ove

Target active servers

Rabbit (no offload) Rabbit (offload=30) SpringFS(offload=6) SpringFS(offload=8) SpringFS(offload=10)

Lianghong Xu © Feb 14!

Experimental setup: •  30 DataNodes •  4 primary servers •  2GB file per server •  128MB block size

Same performance

Page 22: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Policy Analysis with the Facebook Trace

http://www.pdl.cmu.edu/ 22

“Ideal” system: no migration for resizing

Area under curve: aggregate machine hour usage

Lianghong Xu © Feb 14!

Page 23: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Machine Hours (SpringFS vs. Ideal)

http://www.pdl.cmu.edu/ 23

No cleanup work

Extra servers activated due to passive migration

Lianghong Xu © Feb 14!

#primary servers

Page 24: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Machine Hour Usage

http://www.pdl.cmu.edu/ 24

SpringFS: 6-120% improvement

SpringFS within 5% off “Ideal”

Lianghong Xu © Feb 14!

Page 25: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Data Migration

http://www.pdl.cmu.edu/ 25

SpringFS: 9-208X improvement

Lianghong Xu © Feb 14!

Page 26: SpringFS: Bridging Agility and Performance in Elastic ......• Hadoop/HDFS deployments at Facebook (FB) and Cloudera Customers (CC) • Summary of results • SpringFS improves over

Conclusion •  SpringFS: a new elastic distributed storage

•  Fills the gap in state-of-the-art designs

•  Agility is important •  Ability to track workload burstiness

•  Address agility by minimizing data migration •  Bounded write offloading •  Read offloading + passive migration

•  Much lower machine hour usage

<your name here> © 2/24/14!http://www.pdl.cmu.edu/ 26