Azor: Using Two-level Block Selection to Improve SSD-based I/O Caches
Yannis Klonatos, Thanos Makatos, Manolis Marazakis, Michail D. Flouris, Angelos Bilas
{klonatos, makatos, maraz, flouris, bilas}@ics.forth.gr
Foundation for Research and Technology - Hellas (FORTH), Institute of Computer Science (ICS)
July 23, 2011
Table of contents
1 Introduction
2 System Design
3 Experimental Platform
4 Evaluation
5 Conclusions
Background
Increased need for high-performance storage I/O
1. Larger file-set sizes ⇒ more I/O time
2. Server virtualization and consolidation ⇒ more I/O pressure
SSDs can mitigate I/O penalties
                         SSD            HDD
Throughput (R/W) (MB/s)  277/202        100/90
Response time (ms)       0.17           12.6
IOPS (R/W)               30,000/3,500   150/150
Price/capacity ($/GB)    $3             $0.3
Capacity per device      32 - 120 GB    Up to 3 TB
Mixed SSD and HDD environments are necessary
Cost-effectiveness: deploy SSDs as HDD caches
Previous Work
Flash as a secondary file cache for web servers [Kgil et al., 2006]
- Requires application knowledge and intervention
ReadyBoost feature in Windows
- Static file preloading
- Requires user interaction
bcache module in the Linux kernel
- Has no admission control
NetApp's Performance Acceleration Module
- Needs specialized hardware
Our goal
Design Azor, a transparent SSD cache
- Move SSD caching to the block level
- Hide the address space of SSDs
Thorough analysis of design parameters
1. Dynamic differentiation of blocks
2. Cache associativity
3. I/O concurrency
Overall design space
[Figure: overall design space diagram]
Writeback Cache Design Issues
1. Requires synchronous metadata updates for write I/Os
   - HDDs may not have the up-to-date blocks
   - Must know the location of each block in case of failure
2. Reduces system resilience to failures
   - A failing SSD results in data loss
   - SSDs are hidden, so other layers can't handle these failures
→ Our write-through design avoids these issues (a minimal sketch follows)
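To make the contrast concrete, here is a minimal user-space sketch of a write-through path, under our own assumptions (the helper names hdd_write, ssd_write, and cache_lookup are hypothetical, not Azor's actual API):

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t sector_t;

struct cline { long slot; };            /* SSD slot holding a cached block */
struct cache { int hdd_fd, ssd_fd; };

/* Assumed I/O helpers, declared for illustration only. */
void hdd_write(int fd, sector_t blk, const void *buf);
void ssd_write(int fd, long slot, const void *buf);
struct cline *cache_lookup(struct cache *c, sector_t blk);

/* Write-through: the HDD is updated synchronously on every write, so
 * cache metadata never has to be persisted. After a crash the HDD is
 * authoritative and the SSD cache can simply be rebuilt cold. */
void cache_write(struct cache *c, sector_t blk, const void *buf)
{
    hdd_write(c->hdd_fd, blk, buf);            /* HDD always holds the latest data */

    struct cline *line = cache_lookup(c, blk);
    if (line)
        ssd_write(c->ssd_fd, line->slot, buf); /* keep any cached copy coherent */
    /* On a write miss, whether to also admit the block into the SSD is a
     * separate policy decision (write-hdd-ssd vs. write-hdd-update). */
}
```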
Dynamic block differentiation
Blocks are not equally important to performance
- Makes sense to differentiate during admission to the SSD cache
Introduce a 2-Level Block Selection scheme (2LBS)
First level: Prioritize filesystem metadata over data
- Many more small files → more FS metadata
- Additional FS metadata introduced for data protection
- Cannot rely on DRAM for effective metadata caching
- Metadata requests represent 50% - 80% of total I/O accesses [1]
Second level: Prioritize between data blocks (see the admission sketch after the reference below)
- Some data blocks are accessed more frequently than others
- Some data blocks are used for faster access to other data
[1] D. Roselli and T. E. Anderson, "A comparison of file system workloads", USENIX ATC 2000
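As a rough illustration of how the two levels could compose at admission time, consider the following sketch (our own reconstruction; struct and function names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

struct blk_info {
    bool     is_fs_meta;  /* tagged as filesystem metadata by the FS layer */
    uint32_t accesses;    /* per-block access estimate kept in DRAM        */
};

/* Two-level admission: (1) FS metadata always outranks plain data;
 * (2) among data blocks, admit only if the incoming block has been
 * accessed more often than the block it would replace. */
bool admit_to_ssd(const struct blk_info *incoming,
                  const struct blk_info *victim)
{
    if (incoming->is_fs_meta != victim->is_fs_meta)   /* first level  */
        return incoming->is_fs_meta;

    return incoming->accesses > victim->accesses;     /* second level */
}
```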
Two-level Block Selection
Modify the XFS filesystem to tag FS metadata requests
- Transparent metadata detection is also possible
Keep in DRAM an estimate of each HDD block's accesses (a sketch follows)
- Static allocation: 256 MB of DRAM required per TB of HDD space
- The DRAM space required is amortized by better performance
- Dynamic allocation of counters reduces the DRAM footprint
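The 256 MB/TB figure is consistent with, for example, one single-byte counter per 4 KB block: 1 TB / 4 KB = 2^28 blocks, and 2^28 one-byte counters occupy 256 MB. A sketch of such a statically allocated counter array (our assumptions, not necessarily Azor's exact layout):

```c
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SHIFT 12   /* assume 4 KB cache blocks */

struct hotness {
    uint8_t *cnt;        /* one saturating counter per HDD block */
    uint64_t nblocks;
};

struct hotness *hotness_alloc(uint64_t hdd_bytes)
{
    struct hotness *h = malloc(sizeof(*h));
    if (!h)
        return NULL;
    h->nblocks = hdd_bytes >> BLOCK_SHIFT;        /* 2^28 blocks per TB   */
    h->cnt = calloc(h->nblocks, sizeof(uint8_t)); /* 256 MB of DRAM per TB */
    return h;
}

/* Called on every HDD block access; saturates instead of wrapping. */
void hotness_touch(struct hotness *h, uint64_t blk)
{
    if (h->cnt[blk] < UINT8_MAX)
        h->cnt[blk]++;
}
```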
Cache Associativity
Associativity: performance and metadata footprint tradeoff
Higher-way associativities need more DRAM space for metadata
Direct-Mapped cache
- Minimizes metadata requirements
- Suffers from conflict misses
Fully-Set-Associative cache (see the mapping sketch below)
- 4.7× more metadata than the direct-mapped cache
- Proper choice of replacement policy is important
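The tradeoff can be seen in how a lookup works in each design; a schematic comparison under our own naming (not Azor's code):

```c
#include <stdint.h>

/* Direct-mapped: one candidate slot per block, so per-slot metadata is
 * just a single tag -- minimal DRAM, but two hot blocks that map to the
 * same slot keep evicting each other (conflict misses). */
static inline uint64_t dm_slot(uint64_t blk, uint64_t nslots)
{
    return blk % nslots;
}

/* Fully set-associative: a block may live in any slot, so lookups need a
 * full map from block number to slot (e.g., a hash table keyed by block
 * number), plus per-line replacement state -- hence the larger
 * metadata footprint. */
struct fa_cache;                                   /* opaque sketch     */
long fa_lookup(struct fa_cache *c, uint64_t blk);  /* -1 on a miss      */
```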
Cache Associativity - Replacement policy
Large variety of replacement algorithms used in CPUs/DRAM
- Prohibitively expensive in terms of metadata size
- Assume knowledge of the workload I/O patterns
- May cause up to 40% performance variance
We choose the LRU replacement policy (a minimal sketch follows)
- Good reference point for more sophisticated policies
- Reasonable choice, since the buffer cache also uses LRU
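For reference, the bookkeeping LRU needs is small: a doubly linked list per cache (or per set), touched on every hit. A minimal self-contained sketch:

```c
#include <stddef.h>

struct lru_node { struct lru_node *prev, *next; };

/* head.next is the MRU line, head.prev the LRU line. */
struct lru_list { struct lru_node head; };

static void lru_init(struct lru_list *l)
{
    l->head.prev = l->head.next = &l->head;
}

static void lru_remove(struct lru_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* On every cache hit, move the line to the MRU position. */
static void lru_touch(struct lru_list *l, struct lru_node *n)
{
    lru_remove(n);
    n->next = l->head.next;
    n->prev = &l->head;
    l->head.next->prev = n;
    l->head.next = n;
}

/* On a miss that requires eviction, the victim is the LRU line. */
static struct lru_node *lru_victim(struct lru_list *l)
{
    return l->head.prev;
}
```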
I/O Concurrency
A high degree of I/O concurrency:
- Allows overlapping I/O with computation
- Effectively hides I/O latency
1. Allow concurrent read accesses on the same cache line
   - Track only pending I/O requests
   - Reader-writer locks per cache line are prohibitively expensive
2. Hide the SSD write I/Os of read misses (sketched below)
   - Copy the filled buffers to a new request
   - Introduces a memory copy
   - Must maintain state of pending I/Os
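A sketch of the second point, with hypothetical helper names (hdd_read, ssd_write_async, and the pending-line calls are our assumptions): the caller gets its data as soon as the HDD read completes, while a private copy feeds the SSD in the background.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t sector_t;

/* Assumed helpers, declared for illustration only. The completion
 * callback is expected to free the private copy and clear the line's
 * pending state, letting queued accesses to the same line proceed. */
void hdd_read(sector_t blk, void *buf, size_t len);
void ssd_write_async(long slot, void *buf, size_t len,
                     void (*done)(void *buf));
void mark_line_pending(sector_t blk);
void fill_done(void *buf);

/* Read miss: serve the caller immediately, hide the SSD fill. */
void read_miss(sector_t blk, long slot, void *user_buf, size_t len)
{
    hdd_read(blk, user_buf, len);   /* caller can proceed after this */

    void *copy = malloc(len);       /* the extra memory copy */
    if (!copy)
        return;                     /* skip the fill under memory pressure */
    memcpy(copy, user_buf, len);

    mark_line_pending(blk);         /* state so later I/Os to this line wait */
    ssd_write_async(slot, copy, len, fill_done);
}
```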
Experimental Setup
- Dual-socket, quad-core Intel Xeon 5400 (64-bit)
- Twelve 500 GB SATA-II disks with write-through caching
- Areca 1680D-IX-12 SAS/SATA RAID storage controller
- Four 32 GB Intel SLC SSDs (NAND Flash)
- HDDs and SSDs in a RAID-0 setup, 64 KB chunks
- CentOS 5.5, kernel version 2.6.18-194
- XFS filesystem
- 64 GB DRAM, varied per experiment
Benchmarks
I/O-intensive workloads; each run takes between hours and days
Benchmark  Type              Properties                          File set size  RAM    SSD cache sizes (GB)
TPC-H      Data warehouse    Read-only                           28 GB          4 GB   7, 14, 28
SPECsfs    CIFS file server  Write-dominated, latency-sensitive  Up to 2 TB     32 GB  128
TPC-C      OLTP workload     Highly concurrent                   155 GB         4 GB   77.5
Experimental Questions
Which is the best static decision for handling I/O misses?
Does dynamically differentiating blocks improve performance?
How does cache associativity impact performance?
Can our design options cope with a "black box" workload?
Static decision for I/O misses (SPECsfs2008)
[Figure: CIFS ops/sec under each cache write policy for Native-12HDDs, write-hdd-ssd, and write-hdd-update (left); latency (msec/op) vs. CIFS ops/sec for the same configurations (right)]
- 11% to 66% better performance than HDDs
- Huge file set, only 30% accessed → the write-hdd-ssd policy evicts useful blocks
- Up to 5,000 CIFS ops/sec difference at the same latency
Differentiating filesystem metadata (SPECsfs2008)
The amount of FS metadata continuously increases during execution
[Figure: maximum sustained load (CIFS ops/sec) and number of metadata requests for RAM sizes of 4-32 GB (left); latency (msec/op) vs. CIFS ops/sec for Native-12HDDs, Base Fully-Set-Associative, and 2LBS Fully-Set-Associative (right)]
- Metadata DRAM misses ⇒ up to 71% performance impact
- DRAM data hit ratio less than 5%
- 3,000 more CIFS ops/sec between HDDs and Azor
- ~23% latency reduction when using 2LBS in Azor
Differentiating filesystem data blocks (TPC-H)
- Filesystem data, such as indices, are important for databases
- Data differentiation improves performance
[Figure: speedup over native (left) and hit ratio % (right) for Base and 2LBS policies on DM and FA caches, with 14 GB and 28 GB cache sizes]
- 1.95× and 1.53× improvement for the DM and FA caches
- The medium-size DM cache is 20% better than the large-size DM cache
  → with a 10% lower hit ratio
Importance of cache associativity (TPC-H)
[Figure: speedup over HDDs (left) and hit ratio % (right) for DM and FA caches of 7 GB, 14 GB, and 28 GB]
FA is better than DM for all cache sizes
- The large-size FA cache is 1.36× better than its DM counterpart
- Up to 15% fewer conflict misses than DM
- The medium-size FA cache is 32% better than the large-size DM cache
A black box workload (TPC-C)
We choose the best parameters found so far
- Fully-set-associative cache design
- SSD cache size of half the workload size
[Figure: NOTPM for Native 12HDDs, Base, and 2LBS cache policies (left); hit ratio % (right)]
- Base cache: 55% improvement over native
- 2LBS cache: 34% additional improvement
- The hit ratio remains the same in both versions
- Disk utilization is 100%, SSD utilization under 7%
Conclusions
We use SSD-based I/O caches to increase storage performance
Performance is improved with higher-way associativities
- At the cost of a 4.7× higher metadata footprint
Write policy can make up to 50% performance difference
We explore differentiation of HDD blocks
- According to their expected importance to system performance
- Design and evaluation of a two-level block selection scheme
Overall, our work shows that differentiation of blocks is a promising technique for improving SSD-based I/O caches