Michael Sevilla · Mantle, Symposium ’15

Mantle: A Programmable Metadata Load Balancer for the Ceph File System

Michael A. Sevilla, Noah Watkins, Carlos Maltzahn, Ike Nassi, Scott A. Brandt, Sage A. Weil*, Greg Farnum*, Sam Fineberg^

UC Santa Cruz, *Red Hat, ^HP Storage

Published at Supercomputing 2015

Outline: 1. FS Metadata Mngmt 2. CephFS Background 3. Complexity of DSP 4. Mantle 5. Evaluation

Separating Metadata & Data IO

File System

Distributed File System

[Diagram labels: metadata service, object store]

History: A Simple Solution

• 1 MDS is insufficient [McKusick et al., login; '10], [Beaver et al., OSDI '10], [Thusoo et al., SIGMOD '10]

• How do we distribute metadata?

History: Scalable Solutions

1. Hash file identifier
2. Subtree partitioning
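
The two approaches named above can be contrasted with a minimal sketch. Everything in it is an assumption made for illustration: the cluster size, the use of the full path as the file identifier, the CRC32 hash, and the subtree table are hypothetical and are not taken from the slides.

```python
import zlib
from typing import Dict

NUM_MDS = 4  # hypothetical cluster size


def mds_by_hash(path: str, num_mds: int = NUM_MDS) -> int:
    """Approach 1: hash the file identifier (here, the full path)."""
    return zlib.crc32(path.encode("utf-8")) % num_mds


# Hypothetical subtree table: directory root -> MDS rank. Must contain "/".
SUBTREES: Dict[str, int] = {"/": 0, "/home": 1, "/home/alice/src": 2}


def mds_by_subtree(path: str, subtrees: Dict[str, int] = SUBTREES) -> int:
    """Approach 2: the MDS owning the deepest subtree that contains `path`."""
    best = "/"
    for root in subtrees:
        prefix = root.rstrip("/") + "/"
        inside = path == root or path.startswith(prefix)
        if inside and len(root) > len(best):
            best = root
    return subtrees[best]


if __name__ == "__main__":
    for p in ("/home/alice/src/main.c", "/home/alice/src/util.c", "/etc/passwd"):
        print(p, "hash ->", mds_by_hash(p), "subtree ->", mds_by_subtree(p))
```

In this sketch, files in the same directory always land on the same server under subtree partitioning, while hashing may scatter them across servers; that is the locality trade-off between the two.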

Outline

1. File System Metadata Management
2. CephFS Background
3. Complexity of Dynamic Subtree Partitioning
4. Mantle
5. Evaluation

CephFS Background

Example File System Workload

• Linux kernel compile locality

• Shade of Red: locality

[Figure: x-axis: Time; color scale from Fewer Inode Read/Writes to Many Inode Read/Writes]

CephFS Hotspot Detection!

Migration!

Does CephFS work?

[Figure annotations: "what we want" (once); "bad" (three times)]

Complexity of Dynamic Subtree Partitioning

MDS Cluster

[Diagram labels: recv HB, rebalance, migrate?, partition cluster, partition namespace, migrate, fragment, RADOS, Hierarchical Namespace]

Why not?

Migration Policies
• How to calculate load?
• When to move load?
• Where to move load?
• How much to move?

CephFS’s Policies

“weighted ∑ […]”
“weighted ∑ […]”
“greater than average”
“underload MDS”
“equal load across cluster”
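
A minimal sketch of how policies like these might fit together is given here. It reads the quoted phrases as answers to four questions: how to calculate load, when to move it, where to move it, and how much to move. The counter names, the weights, and the function names are hypothetical; the slides say only that load is a weighted sum, and this is not CephFS's actual code.

```python
from typing import Dict, List

# Hypothetical counters and weights; the slides only say "weighted sum".
WEIGHTS = {"inode_reads": 1.0, "inode_writes": 2.0}


def calculate_load(counters: Dict[str, float]) -> float:
    """How to calculate load? A weighted sum of per-MDS counters."""
    return sum(w * counters.get(name, 0.0) for name, w in WEIGHTS.items())


def should_migrate(loads: List[float], me: int) -> bool:
    """When to move load? When my load is greater than the average."""
    return loads[me] > sum(loads) / len(loads)


def pick_targets(loads: List[float], me: int) -> List[int]:
    """Where to move load? To ranks whose load is below the average."""
    avg = sum(loads) / len(loads)
    return [r for r, load in enumerate(loads) if r != me and load < avg]


def plan_migration(loads: List[float], me: int) -> Dict[int, float]:
    """How much to move? Enough to lift each target to the average."""
    if not should_migrate(loads, me):
        return {}
    avg = sum(loads) / len(loads)
    excess = loads[me] - avg  # what I can give away without going under
    plan: Dict[int, float] = {}
    for rank in pick_targets(loads, me):
        amount = min(avg - loads[rank], excess)
        if amount <= 0:
            break
        plan[rank] = amount
        excess -= amount
    return plan
```

For example, with loads `[75.0, 25.0, 0.0, 0.0]`, rank 0 is above the average of 25, ranks 2 and 3 are below it, and `plan_migration(loads, 0)` returns `{2: 25.0, 3: 25.0}`, which would leave every rank at 25.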

Different Balancers for Different Workloads

• Which heuristics should we use? [Weil et al., SuperComputing ’04] [Patil et al., FAST ’11] [Pai et al., ASPLOS ’98]

Good for mixed workloads

Good for create-heavy workloads

Simple implementation

Mantle

http://synapostasy.blogspot.com/2007/10/cephalopod-awareness-day.html

Different Balancers for Different Workloads

• Which heuristics should we use? [Weil et al., SuperComputing ’04] [Patil et al., FAST ’11] [Pai et al., ASPLOS ’98]

MDS Cluster

Mantle API

Implementation: API + Environment

[Diagram labels: MDS Cluster, rebalance]

Balancers

• Greedy Spill Balancer

• Fill & Spill Balancer

• Adaptable Balancer
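
One way to picture interchangeable balancers is as policy functions plugged into a fixed migration mechanism. The sketch below is a guess at what two of the names above could mean: the behaviours, the capacity threshold, and the "spill half" fraction are invented for illustration and are not taken from the slides or from Mantle's real API.

```python
from typing import Callable, Dict, List

# A policy maps (all loads, my rank) -> {target rank: amount to send}.
Policy = Callable[[List[float], int], Dict[int, float]]

CAPACITY = 100.0  # hypothetical per-MDS capacity, same units as load


def greedy_spill(loads: List[float], me: int) -> Dict[int, float]:
    """Assumed reading: give half my load to the next rank if it is idle."""
    nxt = me + 1
    if nxt < len(loads) and loads[me] > 0 and loads[nxt] == 0:
        return {nxt: loads[me] / 2}
    return {}


def fill_and_spill(loads: List[float], me: int) -> Dict[int, float]:
    """Assumed reading: keep load until over capacity, then spill the excess."""
    nxt = me + 1
    if nxt < len(loads) and loads[me] > CAPACITY:
        return {nxt: loads[me] - CAPACITY}
    return {}


def apply_migration(loads: List[float], me: int, policy: Policy) -> List[float]:
    """Mechanism: move what the policy asked for, without knowing why."""
    new_loads = list(loads)
    for target, amount in policy(loads, me).items():
        new_loads[me] -= amount
        new_loads[target] += amount
    return new_loads
```

Swapping `greedy_spill` for `fill_and_spill` changes when and how much load moves, while `apply_migration` stays untouched.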

Evaluation

Evaluation: Creates Workload

• % of total load:
  25  25  25  25
  25   0   0  75
  25  13  13  50

Workload: Creates in Same Directory

[Figure annotations: best speedup; distribution not worthwhile; stable; Overloaded MDS; better than 1 MDS; worse than 1 MDS. Axis label: Strategy]

Workload: Compiling Code

[Figure annotations: system not saturated; best speedup; most stable; better than 1 MDS; worse than 1 MDS; Adaptable Balancer; too aggressive = bad perf.]

Conclusion: Separate Policy and Mechanism

• Benefits of understanding server capacity
  • less resource utilization
  • better performance/stability

• Distribution can hurt performance/stability

• Being too aggressive thrashes workload

Thanks! Questions?

Acknowledgements:

Co-authors: Noah Watkins, Carlos Maltzahn, Ike Nassi, Scott A. Brandt, Sage A. Weil*, Greg Farnum*, Sam Fineberg^

Collaborators: Ivo Jimenez, Adam Crume

Funding: HP Enterprise; storage division

Extra Slides

Why is Locality Important?

More Recent History: Distributed Metadata

Mechanisms for migrating load

Heuristics for migrating resources

Evaluation: Compile Workload

Background CephFS

• Why layering a file system over RADOS is effective
  • Random access
  • Significant engineering effort
  • Specialized subsystem for handling the namespace
