Azor: Using Two-level Block Selection to Improve SSD-based I/O Caches
Yannis Klonatos, Thanos Makatos, Manolis Marazakis, Michail D. Flouris, Angelos Bilas
{klonatos, makatos, maraz, flouris, bilas}@ics.forth.gr
Foundation for Research and Technology - Hellas (FORTH), Institute of Computer Science (ICS)
July 23, 2011
Table of contents
1 Introduction
2 System Design
3 Experimental Platform
4 Evaluation
5 Conclusions
Background
Increased need for high-performance storage I/O
1. Larger file-set sizes ⇒ more I/O time
2. Server virtualization and consolidation ⇒ more I/O pressure
SSDs can mitigate I/O penalties
                         SSD            HDD
Throughput (R/W) (MB/s)  277/202        100/90
Response time (ms)       0.17           12.6
IOPS (R/W)               30,000/3,500   150/150
Price/capacity ($/GB)    $3             $0.3
Capacity per device      32 - 120 GB    Up to 3 TB
Mixed SSD and HDD environments are necessary
Cost-effectiveness: deploy SSDs as HDD caches
Previous Work
Flash as a secondary file cache for web servers [Kgil et al., 2006]
- Requires application knowledge and intervention
ReadyBoost feature in Windows
- Static file preloading
- Requires user interaction
bcache module in the Linux kernel
- Has no admission control
NetApp's Performance Acceleration Module
- Needs specialized hardware
Our goal
Design Azor, a transparent SSD cache
- Move SSD caching to the block level
- Hide the address space of SSDs
Thorough analysis of design parameters
1. Dynamic differentiation of blocks
2. Cache associativity
3. I/O concurrency
Overall design space
[Figure: overall design space diagram]
Writeback Cache Design Issues
1. Requires synchronous metadata updates for write I/Os
   - HDDs may not have the up-to-date blocks
   - Must know the location of each block in case of failure
2. Reduces system resilience to failures
   - A failing SSD results in data loss
   - SSDs are hidden, so other layers can't handle these failures
→ Our write-through design avoids these issues (a minimal sketch follows)
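To make the contrast concrete, here is a minimal user-space sketch of a write-through path, under our own assumptions (the helper names hdd_write, ssd_write, and cache_lookup are hypothetical, not Azor's actual API):

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t sector_t;

struct cline { long slot; };            /* SSD slot holding a cached block */
struct cache { int hdd_fd, ssd_fd; };

/* Assumed I/O helpers, declared for illustration only. */
void hdd_write(int fd, sector_t blk, const void *buf);
void ssd_write(int fd, long slot, const void *buf);
struct cline *cache_lookup(struct cache *c, sector_t blk);

/* Write-through: the HDD is updated synchronously on every write, so
 * cache metadata never has to be persisted. After a crash the HDD is
 * authoritative and the SSD cache can simply be rebuilt cold. */
void cache_write(struct cache *c, sector_t blk, const void *buf)
{
    hdd_write(c->hdd_fd, blk, buf);            /* HDD always holds the latest data */

    struct cline *line = cache_lookup(c, blk);
    if (line)
        ssd_write(c->ssd_fd, line->slot, buf); /* keep any cached copy coherent */
    /* On a write miss, whether to also admit the block into the SSD is a
     * separate policy decision (write-hdd-ssd vs. write-hdd-update). */
}
```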
Dynamic block differentiation
Blocks are not equally important to performance
- Makes sense to differentiate during admission to the SSD cache
Introduce a 2-Level Block Selection scheme (2LBS)
First level: Prioritize filesystem metadata over data
- Many more small files → more FS metadata
- Additional FS metadata introduced for data protection
- Cannot rely on DRAM for effective metadata caching
- Metadata requests represent 50% - 80% of total I/O accesses [1]
Second level: Prioritize between data blocks (see the admission sketch after the reference below)
- Some data blocks are accessed more frequently than others
- Some data blocks are used for faster access to other data
[1] D. Roselli and T. E. Anderson, "A comparison of file system workloads", USENIX ATC 2000
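As a rough illustration of how the two levels could compose at admission time, consider the following sketch (our own reconstruction; struct and function names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

struct blk_info {
    bool     is_fs_meta;  /* tagged as filesystem metadata by the FS layer */
    uint32_t accesses;    /* per-block access estimate kept in DRAM        */
};

/* Two-level admission: (1) FS metadata always outranks plain data;
 * (2) among data blocks, admit only if the incoming block has been
 * accessed more often than the block it would replace. */
bool admit_to_ssd(const struct blk_info *incoming,
                  const struct blk_info *victim)
{
    if (incoming->is_fs_meta != victim->is_fs_meta)   /* first level  */
        return incoming->is_fs_meta;

    return incoming->accesses > victim->accesses;     /* second level */
}
```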
Two-level Block Selection
Modify the XFS filesystem to tag FS metadata requests
- Transparent metadata detection is also possible
Keep in DRAM an estimate of each HDD block's accesses (a sketch follows)
- Static allocation: 256 MB of DRAM required per TB of HDD space
- The DRAM space required is amortized by better performance
- Dynamic allocation of counters reduces the DRAM footprint
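The 256 MB/TB figure is consistent with, for example, one single-byte counter per 4 KB block: 1 TB / 4 KB = 2^28 blocks, and 2^28 one-byte counters occupy 256 MB. A sketch of such a statically allocated counter array (our assumptions, not necessarily Azor's exact layout):

```c
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SHIFT 12   /* assume 4 KB cache blocks */

struct hotness {
    uint8_t *cnt;        /* one saturating counter per HDD block */
    uint64_t nblocks;
};

struct hotness *hotness_alloc(uint64_t hdd_bytes)
{
    struct hotness *h = malloc(sizeof(*h));
    if (!h)
        return NULL;
    h->nblocks = hdd_bytes >> BLOCK_SHIFT;        /* 2^28 blocks per TB   */
    h->cnt = calloc(h->nblocks, sizeof(uint8_t)); /* 256 MB of DRAM per TB */
    return h;
}

/* Called on every HDD block access; saturates instead of wrapping. */
void hotness_touch(struct hotness *h, uint64_t blk)
{
    if (h->cnt[blk] < UINT8_MAX)
        h->cnt[blk]++;
}
```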
Cache Associativity
Associativity: performance and metadata footprint tradeoff
Higher-way associativities need more DRAM space for metadata
Direct-Mapped cache
- Minimizes metadata requirements
- Suffers from conflict misses
Fully-Set-Associative cache (see the mapping sketch below)
- 4.7× more metadata than the direct-mapped cache
- Proper choice of replacement policy is important
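The tradeoff can be seen in how a lookup works in each design; a schematic comparison under our own naming (not Azor's code):

```c
#include <stdint.h>

/* Direct-mapped: one candidate slot per block, so per-slot metadata is
 * just a single tag -- minimal DRAM, but two hot blocks that map to the
 * same slot keep evicting each other (conflict misses). */
static inline uint64_t dm_slot(uint64_t blk, uint64_t nslots)
{
    return blk % nslots;
}

/* Fully set-associative: a block may live in any slot, so lookups need a
 * full map from block number to slot (e.g., a hash table keyed by block
 * number), plus per-line replacement state -- hence the larger
 * metadata footprint. */
struct fa_cache;                                   /* opaque sketch     */
long fa_lookup(struct fa_cache *c, uint64_t blk);  /* -1 on a miss      */
```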
Cache Associativity - Replacement policy
Large variety of replacement algorithms used in CPUs/DRAM
- Prohibitively expensive in terms of metadata size
- Assume knowledge of the workload I/O patterns
- May cause up to 40% performance variance
We choose the LRU replacement policy (a minimal sketch follows)
- Good reference point for more sophisticated policies
- Reasonable choice, since the buffer cache also uses LRU
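For reference, the bookkeeping LRU needs is small: a doubly linked list per cache (or per set), touched on every hit. A minimal self-contained sketch:

```c
#include <stddef.h>

struct lru_node { struct lru_node *prev, *next; };

/* head.next is the MRU line, head.prev the LRU line. */
struct lru_list { struct lru_node head; };

static void lru_init(struct lru_list *l)
{
    l->head.prev = l->head.next = &l->head;
}

static void lru_remove(struct lru_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* On every cache hit, move the line to the MRU position. */
static void lru_touch(struct lru_list *l, struct lru_node *n)
{
    lru_remove(n);
    n->next = l->head.next;
    n->prev = &l->head;
    l->head.next->prev = n;
    l->head.next = n;
}

/* On a miss that requires eviction, the victim is the LRU line. */
static struct lru_node *lru_victim(struct lru_list *l)
{
    return l->head.prev;
}
```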
I/O Concurrency
A high degree of I/O concurrency:
- Allows overlapping I/O with computation
- Effectively hides I/O latency
1. Allow concurrent read accesses on the same cache line
   - Track only pending I/O requests
   - Reader-writer locks per cache line are prohibitively expensive
2. Hide the SSD write I/Os of read misses (sketched below)
   - Copy the filled buffers to a new request
   - Introduces a memory copy
   - Must maintain state of pending I/Os
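A sketch of the second point, with hypothetical helper names (hdd_read, ssd_write_async, and the pending-line calls are our assumptions): the caller gets its data as soon as the HDD read completes, while a private copy feeds the SSD in the background.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t sector_t;

/* Assumed helpers, declared for illustration only. The completion
 * callback is expected to free the private copy and clear the line's
 * pending state, letting queued accesses to the same line proceed. */
void hdd_read(sector_t blk, void *buf, size_t len);
void ssd_write_async(long slot, void *buf, size_t len,
                     void (*done)(void *buf));
void mark_line_pending(sector_t blk);
void fill_done(void *buf);

/* Read miss: serve the caller immediately, hide the SSD fill. */
void read_miss(sector_t blk, long slot, void *user_buf, size_t len)
{
    hdd_read(blk, user_buf, len);   /* caller can proceed after this */

    void *copy = malloc(len);       /* the extra memory copy */
    if (!copy)
        return;                     /* skip the fill under memory pressure */
    memcpy(copy, user_buf, len);

    mark_line_pending(blk);         /* state so later I/Os to this line wait */
    ssd_write_async(slot, copy, len, fill_done);
}
```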
Experimental Setup
- Dual-socket, quad-core Intel Xeon 5400 (64-bit)
- Twelve 500 GB SATA-II disks with write-through caching
- Areca 1680D-IX-12 SAS/SATA RAID storage controller
- Four 32 GB Intel SLC SSDs (NAND Flash)
- HDDs and SSDs in a RAID-0 setup, 64 KB chunks
- CentOS 5.5, kernel version 2.6.18-194
- XFS filesystem
- 64 GB DRAM, varied per experiment
Benchmarks
I/O-intensive workloads; each run takes between hours and days
Benchmark  Type              Properties                          File set size  RAM    SSD cache sizes (GB)
TPC-H      Data warehouse    Read-only                           28 GB          4 GB   7, 14, 28
SPECsfs    CIFS file server  Write-dominated, latency-sensitive  Up to 2 TB     32 GB  128
TPC-C      OLTP workload     Highly concurrent                   155 GB         4 GB   77.5
Experimental Questions
Which is the best static decision for handling I/O misses?
Does dynamically differentiating blocks improve performance?
How does cache associativity impact performance?
Can our design options cope with a "black box" workload?
Static decision for I/O misses (SPECsfs2008)
[Figure: CIFS ops/sec under each cache write policy for Native-12HDDs, write-hdd-ssd, and write-hdd-update (left); latency (msec/op) vs. CIFS ops/sec for the same configurations (right)]
- 11% to 66% better performance than HDDs
- Huge file set, only 30% accessed → the write-hdd-ssd policy evicts useful blocks
- Up to 5,000 CIFS ops/sec difference at the same latency
Differentiating filesystem metadata (SPECsfs2008)
The amount of FS metadata continuously increases during execution
[Figure: maximum sustained load (CIFS ops/sec) and number of metadata requests for RAM sizes of 4-32 GB (left); latency (msec/op) vs. CIFS ops/sec for Native-12HDDs, Base Fully-Set-Associative, and 2LBS Fully-Set-Associative (right)]
- Metadata DRAM misses ⇒ up to 71% performance impact
- DRAM data hit ratio less than 5%
- 3,000 more CIFS ops/sec between HDDs and Azor
- ~23% latency reduction when using 2LBS in Azor
Differentiating filesystem data blocks (TPC-H)
- Filesystem data, such as indices, are important for databases
- Data differentiation improves performance
[Figure: speedup over native (left) and hit ratio % (right) for Base and 2LBS policies on DM and FA caches, with 14 GB and 28 GB cache sizes]
- 1.95× and 1.53× improvement for the DM and FA caches
- The medium-size DM cache is 20% better than the large-size DM cache
  → with a 10% lower hit ratio
Importance of cache associativity (TPC-H)
[Figure: speedup over HDDs (left) and hit ratio % (right) for DM and FA caches of 7 GB, 14 GB, and 28 GB]
FA is better than DM for all cache sizes
- The large-size FA cache is 1.36× better than its DM counterpart
- Up to 15% fewer conflict misses than DM
- The medium-size FA cache is 32% better than the large-size DM cache
A black box workload (TPC-C)
We choose the best parameters found so far
- Fully-set-associative cache design
- SSD cache size of half the workload size
[Figure: NOTPM for Native 12HDDs, Base, and 2LBS cache policies (left); hit ratio % (right)]
- Base cache: 55% improvement over native
- 2LBS cache: 34% additional improvement
- The hit ratio remains the same in both versions
- Disk utilization is 100%, SSD utilization under 7%
Conclusions
We use SSD-based I/O caches to increase storage performance
Performance is improved with higher-way associativities
- At the cost of a 4.7× higher metadata footprint
Write policy can make up to 50% performance difference
We explore differentiation of HDD blocks
- According to their expected importance to system performance
- Design and evaluation of a two-level block selection scheme
Overall, our work shows that differentiation of blocks is a promising technique for improving SSD-based I/O caches