Differentiated Storage Services
Michael Mesnier, Jason Akers, Feng Chen (Intel Corporation)
Tian Luo (The Ohio State University)
23rd ACM Symposium on Operating Systems Principles (SOSP)
October 23-26, 2011, Cascais, Portugal
An analogy: moving & shipping
Why should computer storage be any different?
Technology overview
Classification, policy assignment, policy enforcement
Differentiated Storage Services
Policy assignment (offline)

Classifier     QoS policy
Metadata       Low latency
Boot files     Low latency
Small files    High throughput
Media files    High bandwidth
…              …
[Figure: a computer system (applications or DB, file system, operating system), with I/O classification at each layer, on top of a storage system (management firmware, storage controller with QoS policies and QoS mechanisms, storage pools A, B, and C). A legend marks the areas of current & future research.]
Classify each I/O in-band
The SCSI CDB
5 bits = 32 classes
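As a rough illustration of how a 5-bit class could ride along with each request, the sketch below packs a class into a 10-byte WRITE CDB. It assumes the class is carried in the 5-bit GROUP NUMBER field (low bits of byte 6), which matches the "grp" values shown in the traces later in the talk; the helper names `build_write10` and `cdb_class` are hypothetical and are not taken from the prototype.

```c
#include <stdint.h>
#include <string.h>

#define WRITE_10_OPCODE 0x2A
#define GROUP_MASK      0x1F   /* 5 bits, so 32 possible classes */

/* Fill a 10-byte WRITE(10) CDB; io_class is truncated to 5 bits. */
static void build_write10(uint8_t cdb[10], uint32_t lba,
                          uint16_t nblocks, uint8_t io_class)
{
    memset(cdb, 0, 10);
    cdb[0] = WRITE_10_OPCODE;
    cdb[2] = (uint8_t)(lba >> 24);      /* LBA, big-endian */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[6] = io_class & GROUP_MASK;     /* assumed home of the classifier */
    cdb[7] = (uint8_t)(nblocks >> 8);   /* transfer length, big-endian */
    cdb[8] = (uint8_t)nblocks;
}

/* Storage-side view: recover the class from a received CDB. */
static uint8_t cdb_class(const uint8_t cdb[10])
{
    return cdb[6] & GROUP_MASK;
}
```

For example, `build_write10(cdb, 231495, 8, 9)` would describe an 8-block write tagged with class 9, and the storage system would read the class back with `cdb_class(cdb)`.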
Motivation: disk caching with SSDs
Universal challenges in the industry
  - Keeping the right data cached
  - Avoiding thrash under cache pressure
Conventional approaches
  - Cache bypass for large/sequential requests
  - Evict cold data (LRU commonly used)
How I/O classification can help
  - Identify cacheable I/O classes
  - Assign relative caching priorities
Filesystem prototypes (Ext3 & NTFS)
Classify each I/O in-band
Classifier       Cache priority
Metadata         0
Journal          0
Directories      0
Files <= 4KB     1
Files <= 16KB    2
Files <= 64KB    3
…                …
Files > 1GB      Lowest
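The policy table above can be read as a small lookup from I/O kind and file size to a cache priority. The sketch below is only an illustration of that reading: the `io_kind` enum and the function name are hypothetical, and because the rows between 64KB and the largest files are elided in the table, larger sizes return a placeholder rather than a guessed value.

```c
#include <stdint.h>

enum io_kind { IO_METADATA, IO_JOURNAL, IO_DIRECTORY, IO_FILE };

#define PRIO_UNLISTED (-1)   /* rows elided ("…") in the source table */

/* Lower number means higher caching priority. */
static int cache_priority(enum io_kind kind, uint64_t file_size)
{
    if (kind != IO_FILE)
        return 0;                          /* metadata, journal, directories */
    if (file_size <= 4u * 1024)
        return 1;
    if (file_size <= 16u * 1024)
        return 2;
    if (file_size <= 64u * 1024)
        return 3;
    return PRIO_UNLISTED;                  /* not specified on the slide */
}
```

Keeping the table in one function (or one array) reflects the point of the slide: classification stays in the file system, while the priority assigned to each class is a policy that can change without touching the classifier.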
FS classification, FS policy assignment, FS policy enforcement
[Figure: the system diagram from the technology overview, with the storage system shown as a disk and an SSD.]
Database prototype (PostgreSQL)
Classify each I/O in-band

Classifier                 Cache priority
System tables              0
Temp. tables (on write)    1
Random tables              2
Temp. tables (on read)     3
Sequential tables          Bypass
Index files                Bypass
DB classification, DB policy assignment, DB policy enforcement
[Figure: the system diagram from the technology overview, with the storage system shown as a disk and an SSD.]
Selective cache algorithms
Selective allocation
  - Always allocate high-priority classes
      - E.g., FS metadata and DB system tables are always allocated
  - Conditionally allocate low-priority classes
      - Depends on cache pressure, cache contents, etc.
      - The high/low cutoff is a tunable parameter
Selective eviction
  - Evict in priority order (lowest priority first)
      - E.g., temporary DB tables are evicted before system tables
      - Trivially implemented by managing one LRU list per class
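The two algorithms above can be sketched in a few lines. This is a toy model under stated assumptions, not the prototype's code: a linear scan over a four-slot cache stands in for the per-class LRU lists, "cache pressure" is simplified to "no free slot", and the names `cache_access` and `HIGH_PRIO_CUTOFF` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS      4
#define HIGH_PRIO_CUTOFF 0   /* priority <= cutoff: always allocated (tunable) */

struct slot  { int used; uint64_t lba; int prio; uint64_t tick; };
struct cache { struct slot s[CACHE_SLOTS]; uint64_t clock; };

static struct slot *find(struct cache *c, uint64_t lba)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (c->s[i].used && c->s[i].lba == lba)
            return &c->s[i];
    return NULL;
}

static struct slot *free_slot(struct cache *c)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (!c->s[i].used)
            return &c->s[i];
    return NULL;
}

/* Selective eviction: lowest-priority class first (largest number),
 * least recently used within that class. Call only when the cache is full. */
static struct slot *victim(struct cache *c)
{
    struct slot *v = &c->s[0];
    for (size_t i = 1; i < CACHE_SLOTS; i++) {
        struct slot *s = &c->s[i];
        if (s->prio > v->prio || (s->prio == v->prio && s->tick < v->tick))
            v = s;
    }
    return v;
}

/* Returns 1 if the block is cached after the call, 0 if it bypassed the cache. */
static int cache_access(struct cache *c, uint64_t lba, int prio)
{
    struct slot *s = find(c, lba);
    if (!s) {
        s = free_slot(c);
        if (!s) {
            /* Selective allocation: under pressure, low-priority
             * classes are not allocated. */
            if (prio > HIGH_PRIO_CUTOFF)
                return 0;
            s = victim(c);
        }
        s->used = 1;
        s->lba  = lba;
        s->prio = prio;
    }
    s->tick = ++c->clock;    /* mark as most recently used */
    return 1;
}
```

With a full cache, a priority-0 request (for example, file system metadata) displaces the least recently used block of the lowest-priority class, while a low-priority request is simply not cached.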
Technology development
Ext3 prototype
OS changes (block layer)
  - Add classifier to I/O requests
  - Only coalesce like-class requests
  - Copy classifier into SCSI CDB
Ext3 changes
  - 18 classes identified
  - Optimized for a file server
      - Small files & metadata
A small kernel patch
A one-time change to the FS

Ext3 class        Group number    Cache priority
Unclassified      0               12
Superblock        1               0
Group desc.       2               0
Bitmap            3               0
Inode             4               0
Indirect block    5               0
Directories       6               0
Journal           7               0
File <= 4KB       8               1
File <= 16KB      9               2
File <= 64KB      10              3
…                 …               …
File > 1GB        18              11
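The "only coalesce like-class requests" rule in the block-layer changes above amounts to one extra condition in the merge test. The sketch below shows that condition in isolation; the `io_req` struct and `can_merge` function are hypothetical stand-ins and do not reflect the actual Linux block-layer data structures.

```c
#include <stdbool.h>
#include <stdint.h>

struct io_req {
    uint64_t lba;        /* starting block */
    uint32_t nblocks;    /* length in blocks */
    bool     is_write;
    uint8_t  io_class;   /* 5-bit classifier attached to the request */
};

/* b may be appended to a only if it is contiguous, has the same
 * direction, and carries the same class. */
static bool can_merge(const struct io_req *a, const struct io_req *b)
{
    return a->is_write == b->is_write
        && a->io_class == b->io_class
        && a->lba + a->nblocks == b->lba;
}
```

Without the class check, a merged request would have to carry a single classifier for blocks that belong to different classes, and the storage system could no longer apply per-class policy to them.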
Ext3 classification illustrated
echo 'Hello, world!' >> foo; sync
  - READ_10(lba 231495 len 8 grp 9)       <=4KB
  - WRITE_10(lba 231495 len 8 grp 9)      <=4KB
  - WRITE_10(lba 16519223 len 8 grp 8)    Journal
  - WRITE_10(lba 16519231 len 8 grp 8)    Journal
  - WRITE_10(lba 16519239 len 8 grp 8)    Journal
  - WRITE_10(lba 16519247 len 8 grp 8)    Journal
  - WRITE_10(lba 8279 len 8 grp 5)        Inode
7 I/Os (28KB) to write 13 bytes
  - Metadata accounts for most of the overhead
I/O classification shows read-modify-write and metadata updates
NTFS classification is implemented with Windows filter drivers
PostgreSQL prototype
Classification API: scatter/gather I/O
OS changes (block layer)
  - Add O_CLASSIFIED file flag
  - Extract classifier from SG I/O
A small OS & DB patch
A one-time change to the OS & DB

Preliminary DB classes

PostgreSQL class    Group number
Unclassified        0
Transaction log     19
System table        20
Free space map      21
Temporary table     22
Random table        23
Sequential table    24
Index file          25
Reserved            26-31

    /* O_CLASSIFIED is the file flag added by the prototype's OS patch */
    unsigned char class = 19;               /* transaction log */
    struct iovec myiov[2];
    int fd = open("foo", O_RDWR | O_CLASSIFIED, 0666);
    myiov[0].iov_base = &class;             /* first element: one-byte classifier */
    myiov[0].iov_len = 1;
    myiov[1].iov_base = "Hello, world!";    /* second element: the data */
    myiov[1].iov_len = 13;
    writev(fd, myiov, 2);
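The "extract classifier from SG I/O" step on the OS side can be pictured as peeling the first scatter/gather element off the vector. The sketch below is a user-space illustration of that idea only; `extract_class` is a hypothetical helper, and the convention it checks (a leading one-byte element holding a class in the range 0-31) is inferred from the writev example and the class table above.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Returns the class (0-31) and points *payload at the remaining elements,
 * or returns -1 if the vector does not follow the assumed convention. */
static int extract_class(const struct iovec *iov, int iovcnt,
                         const struct iovec **payload, int *payload_cnt)
{
    if (iovcnt < 2 || iov[0].iov_base == NULL || iov[0].iov_len != 1)
        return -1;

    unsigned char cls = *(const unsigned char *)iov[0].iov_base;
    if (cls > 31)                 /* must fit in 5 bits */
        return -1;

    *payload     = &iov[1];       /* data starts at the second element */
    *payload_cnt = iovcnt - 1;
    return cls;
}
```

Carrying the class as an extra vector element keeps the existing writev() signature unchanged, which is consistent with the slide's claim that only a small, one-time OS and DB patch is needed.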
Cache implementations
Fully associative read/write LRU cache
  - Insert(), Lookup(), Delete(), etc.
  - Hash table maps disk LBA to SSD LBA
  - Syncer daemon asynchronously cleans cache
Monitors cache pressure for selective allocation
Maintains multiple LRU lists for selective eviction
Front-ends: iSCSI (OS independent) and Linux MD
  - MD cache module (RAID-9)

Striping:  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdd /dev/sde
Mirroring: mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdd /dev/sde
RAID-9:    mdadm --create /dev/md0 --level=9 --raid-devices=2 <cache> <base>
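The disk-LBA-to-SSD-LBA hash table listed among the cache components could look roughly like the sketch below. It is a minimal open-addressing table with linear probing, written for illustration only: the fixed size, the hash constant, and the `map_insert`/`map_lookup`/`map_delete` names are assumptions and are not taken from the prototype.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SLOTS 64   /* illustrative size; must match the 6-bit hash below */

enum state { EMPTY = 0, FULL, TOMB };
struct entry   { enum state st; uint64_t disk_lba; uint64_t ssd_lba; };
struct lba_map { struct entry e[MAP_SLOTS]; };

/* Multiplicative hash; the top 6 bits select one of 64 slots. */
static unsigned slot_of(uint64_t lba)
{
    return (unsigned)((lba * 0x9E3779B97F4A7C15ull) >> 58);
}

static struct entry *map_find(struct lba_map *m, uint64_t disk_lba)
{
    unsigned i = slot_of(disk_lba);
    for (unsigned n = 0; n < MAP_SLOTS; n++, i = (i + 1) % MAP_SLOTS) {
        if (m->e[i].st == EMPTY)
            return NULL;                       /* end of probe chain */
        if (m->e[i].st == FULL && m->e[i].disk_lba == disk_lba)
            return &m->e[i];
    }
    return NULL;
}

static bool map_insert(struct lba_map *m, uint64_t disk_lba, uint64_t ssd_lba)
{
    struct entry *e = map_find(m, disk_lba);
    if (e) {                                   /* already cached: remap */
        e->ssd_lba = ssd_lba;
        return true;
    }
    unsigned i = slot_of(disk_lba);
    for (unsigned n = 0; n < MAP_SLOTS; n++, i = (i + 1) % MAP_SLOTS) {
        if (m->e[i].st != FULL) {              /* reuse empty or deleted slot */
            m->e[i] = (struct entry){ FULL, disk_lba, ssd_lba };
            return true;
        }
    }
    return false;                              /* table full */
}

static bool map_lookup(struct lba_map *m, uint64_t disk_lba, uint64_t *ssd_lba)
{
    struct entry *e = map_find(m, disk_lba);
    if (!e)
        return false;
    *ssd_lba = e->ssd_lba;
    return true;
}

static bool map_delete(struct lba_map *m, uint64_t disk_lba)
{
    struct entry *e = map_find(m, disk_lba);
    if (!e)
        return false;
    e->st = TOMB;                              /* keep probe chains intact */
    return true;
}
```

A lookup hit means the request can be served from the SSD at the returned block address; a miss means the request goes to the base device and, depending on its class, may be inserted afterwards.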
Evaluation
Experimental setup
Host OS (Xeon, 2-way, quad-core, 12GB RAM)
  - Linux 2.6.34 (patched as described)
Target storage system
  - HW RAID array + X25-E cache
Workloads and cache sizes
  - SPECsfs: 18GB (10% of 184GB working set)
  - TPC-H: 8GB (28% of 29GB working set)
Comparison
  - LRU versus LRU-S (LRU with selective caching)
SPECsfs I/O breakdown
[Figure: I/O breakdown with LRU. Large files pollute the LRU cache (metadata and small files are evicted).]
[Figure: I/O breakdown with LRU-S. LRU-S fences off large file I/O.]
17
SPECsfs performance metrics
[Figure: four panels (hit rate, I/O throughput, syncer overhead, running time) with bars for LRU, LRU-S, and HDD. Callout: 1.8x speedup.]
18
SPECsfs file latencies
[Figure: reduction in write latency over HDD, LRU vs. LRU-S. LRU suffers from write outliers (from eviction overheads).]
[Figure: reduction in read latency over HDD, LRU vs. LRU-S. LRU-S reduces read latency (most small files are cached).]
19
TPC-H I/O breakdown
[Figure: I/O breakdown with LRU. Indexes pollute the LRU cache (user tables are evicted).]
[Figure: I/O breakdown with LRU-S. LRU-S fences off index files.]
20
TPC-H performance metrics
[Figure: four panels (hit rate, I/O throughput, syncer overhead, running time) with bars for LRU, LRU-S, and HDD. Callout: 1.2x speedup.]
Conclusion & future work
Intelligent caching is just the beginning
  - Other types of performance differentiation
  - Security, reliability, retention, …
Other applications we're looking at
  - Databases
  - Hypervisors
  - Cloud storage
  - Big Data (NoSQL DB)
Work already underway in T10
Open source coming soon…
Thank you!
Questions?