Supporting High Performance I/O with Effective Caching and Prefetching
Song Jiang
The ECE Department, Wayne State University
Disk I/O performance is critical to data-intensive applications
Data-intensive applications demand efficient I/O support.
Databases;
Multimedia;
Scientific simulations;
Other desktop applications.
Hard disk is the most cost-effective online storage device.
Workstations;
Distributed systems;
Parallel systems.
[Figure: "Latencies of Cache, DRAM and Disk in CPU Cycles", 1980–2000, comparing SRAM access time, DRAM access time, and disk seek time; disk seek latency grows to roughly 5,000,000 CPU cycles by 2000, while SRAM access remains around one cycle.]
Unbalanced system improvements
Bryant and O’Hallaron, “Computer Systems: A Programmer’s Perspective”,
Prentice Hall, 2003.
Measured in CPU cycles, the disks of 2000 are more than 57 times "slower" than their ancestors in 1980.
Disk performance and disk arm seeks
Disk performance is limited by mechanical constraints;
Seek
Rotation
Disk seek is the Achilles’ heel of disk access performance;
Access of sequential blocks is much faster.
An example: IBM Ultrastar 18ZX specification *
Seq. Read: 4,700 IO/s
Rand. Read: < 200 IO/s
* Taken from IBM “ULTRASTAR 9LZX/18ZX Hardware/Functional Specification” Version 2.4
Our goal: to increase the sequentiality for a larger transfer rate.
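Taking the spec numbers above, a quick back-of-the-envelope calculation in Python shows the gap (the 4 KB request size is an assumption for illustration, not from the spec):

```python
# Throughput implied by the Ultrastar figures above, assuming 4 KB per I/O.
BLOCK_KB = 4
seq_iops, rand_iops = 4700, 200
seq_mb_s = seq_iops * BLOCK_KB / 1024    # ~18.4 MB/s sequential
rand_mb_s = rand_iops * BLOCK_KB / 1024  # ~0.78 MB/s random
print(seq_mb_s / rand_mb_s)  # sequential delivers 23.5x the bandwidth
```

Whatever transfer size is assumed, the ratio stays 23.5x, which is why turning random accesses into sequential ones pays off so directly.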
[Figure: annual growth rates (%): disk access time ~8%, peak disk transfer rate ~40%, processor speed ~60%.]
Towards peak disk transfer rate
Random accesses are common
Workloads on popular operating systems
UNIX: most accessed files are short in length (80% are smaller than 26 Kbytes) [Ousterhout 1991];
Windows NT: 40% of I/O operations are to files shorter than 2KBytes [Vogels 1999].
Scientific computing
The SIO study: in many applications the majority of requests are for a small amount of data (less than a few KBytes) [Reed 1997];
The CHARISMA study: large, regular data structures are distributed among processes with interleaved accesses of shared files [Kotz 1996].
Improving disk performance by reducing random accesses
Two approaches:
Aggregate small random accesses into large sequential accesses;
Hide random accesses.
Software levels:
Applications: Some database systems;
Run-time systems: Collective-I/O;
Operating systems: TIP Informed prefetching;
Disk scheduler: CSCAN, SSTF.
My objective: improve access sequentiality in a general-purpose OS without any aid from programs.
Outline
Inadequacy of current buffer cache management in OS;
Managing disk layout information;
Sequence-based and history-aware prefetching;
Miss-penalty aware caching;
Performance evaluation in a Linux implementation;
Related work;
Summary.
Critical role of buffer cache in OS
[Diagram: application I/O requests pass through the buffer cache (caching & prefetching), then the I/O scheduler and the disk driver, before reaching the disk.]
Buffer cache can significantly influence request streams to disk.
Only unsatisfied requests are passed to disk;
Prefetching requests are generated to disk.
The opportunity for improving access sequentiality is not well exploited.
Buffer cache ignores disk layout information;
Buffer cache is an agent between two sides;
Only the behavior of one side (application) is considered.
The effectiveness of the I/O scheduler relies on the output of buffer cache.
Problems of caching without disk layout information
Minimizing cache miss ratio by only exploiting temporal locality;
Avg. access time = (1 - miss_ratio)*hit_time + miss_ratio*miss_penalty
A small number of misses can still mean a long access time.
Should minimize access time by also considering on-disk data layout:
Distinguish sequentially and randomly accessed blocks;
Give "expensive" random blocks a high caching priority to suppress future random accesses;
Disk accesses become more sequential.
Sequentially accessed blocks have a small miss penalty; randomly accessed blocks have a large miss penalty.
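To make the point concrete, here is a small Python sketch of the access-time formula above; the miss ratios and latencies are illustrative numbers, not measurements from the talk:

```python
def avg_access_time(miss_ratio, hit_time_us, miss_penalty_us):
    """Average access time = (1 - miss_ratio)*hit_time + miss_ratio*miss_penalty."""
    return (1 - miss_ratio) * hit_time_us + miss_ratio * miss_penalty_us

# Policy A: fewer misses, but each miss is a random access (~8 ms seek).
a = avg_access_time(0.05, hit_time_us=1, miss_penalty_us=8000)
# Policy B: twice the misses, but each miss is a sequential access (~0.5 ms).
b = avg_access_time(0.10, hit_time_us=1, miss_penalty_us=500)
print(a, b)  # B wins by a wide margin despite the higher miss ratio
```

Policy B averages roughly 51 µs per access against roughly 401 µs for Policy A: the lower miss ratio loses to the lower miss penalty, which is exactly why caching should be penalty-aware.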
[Figure: blocks A–D and X1–X4 placed on the disk tracks of a hard disk drive, contrasting sequentially and randomly accessed blocks.]
Problems of prefetching without disk layout information
Conducting prefetching only on the file abstraction works only inside an individual file.
Performance constraints due to prefetching on the logical file abstraction:
Sequential accesses spanning multiple small files cannot be detected;
Sequentiality at the logical abstraction may not correspond to sequentiality on the physical disk;
Previously detected sequences of blocks are not remembered;
Metadata blocks cannot be prefetched.
[Figure: blocks of files X (X1, X2), Y (Y1, Y2), Z (Z1, Z2), and R (A–D) spread across the disk tracks of a hard disk drive.]
Should conduct prefetching directly on data disk layout.
Outline
Inadequacy of current buffer cache management in OS;
Managing disk layout information;
Sequence-based and history-aware prefetching;
Miss-penalty aware caching;
Performance evaluation in a Linux implementation;
Related work;
Summary.
Using disk layout information in buffer cache management
What disk layout information to use?
Logical block number (LBN): logical disk geometry provided by firmware;
Accesses of contiguous LBNs have a performance close to accesses of contiguous blocks on disk;
The LBN interface is highly portable across platforms.
How to efficiently manage the disk layout information?
Currently LBN is only used to identify disk locations for read/write;
To track access times of disk blocks and search for access sequences via LBNs;
Disk block table: a data structure for efficient disk block tracking.
Disk block table: a new data structure for tracking disk blocks
[Figure: a three-level disk block table. An LBN is split into three indices of 512 entries each, e.g. LBN 5140 = 0*512^2 + 10*512 + 20, which select entries in successive table nodes (N1 → N2/N3 → N4); each leaf entry records recent access timestamps (time1, time2).]
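The lookup scheme above can be sketched in Python. This is a simplified model under stated assumptions: the dict-based nodes, the method names, and the choice to keep the two most recent timestamps per block are illustrative, not the kernel data structure:

```python
BASE = 512  # entries per table node; three levels cover 512**3 LBNs

class BlockTable:
    """Radix-tree-like table mapping an LBN to its recent access timestamps."""
    def __init__(self):
        self.root = {}
        self.clock = 0  # logical access time

    @staticmethod
    def indices(lbn):
        # e.g. LBN 5140 = 0*512**2 + 10*512 + 20  ->  (0, 10, 20)
        return (lbn // BASE**2, (lbn // BASE) % BASE, lbn % BASE)

    def record_access(self, lbn):
        i1, i2, i3 = self.indices(lbn)
        leaf = self.root.setdefault(i1, {}).setdefault(i2, {})
        self.clock += 1
        # keep the two most recent timestamps for this block
        leaf[i3] = (leaf.get(i3, (None, None))[1], self.clock)

    def timestamps(self, lbn):
        i1, i2, i3 = self.indices(lbn)
        return self.root.get(i1, {}).get(i2, {}).get(i3)

print(BlockTable.indices(5140))  # (0, 10, 20)
```

Splitting the LBN into fixed-width digits keeps both lookup and insertion O(1) per access, which matters because every disk access is tracked.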
DiskSeen: a two-side-oriented buffer cache scheme
[Diagram: the buffer cache is divided into a prefetching area, a caching area, and a destaging area, with block transfers between the areas; the disk serves asynchronous prefetching and on-demand reads, and receives delayed write-backs.]
Outline
Inadequacy of current buffer cache management in OS;
Managing disk layout information;
Sequence-based and history-aware prefetching;
Miss-penalty aware caching;
Performance evaluation in a Linux implementation;
Related work;
Summary.
Sequence-based prefetching
[Figure: accesses plotted as LBN vs. timestamp, with the current window and readahead window ahead of the access stream.]
Prefetching is based on LBN;
The algorithm is similar to the Linux 2.6.11 readahead algorithm:
When 32 contiguous blocks are accessed, create 16-block current window and 32-block readahead window;
The two windows are shifted forward while sequential accesses are detected.
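The window mechanism above can be sketched in Python. This is a simplified model of the described behavior, not the Linux or DiskSeen code; the class shape and the per-access return value are illustrative assumptions:

```python
INIT_SEQ, CUR_WIN, RA_WIN = 32, 16, 32  # thresholds from the description above

class Readahead:
    def __init__(self):
        self.run = 0                 # length of the current contiguous run
        self.last = None             # last LBN accessed
        self.cur = self.ra = None    # (start, end) windows, or None

    def access(self, lbn):
        """Record one access; return the list of LBNs to prefetch (if any)."""
        self.run = self.run + 1 if self.last == lbn - 1 else 1
        self.last = lbn
        if self.cur is None and self.run >= INIT_SEQ:
            # 32 contiguous blocks seen: create the two windows
            self.cur = (lbn + 1, lbn + 1 + CUR_WIN)
            self.ra = (self.cur[1], self.cur[1] + RA_WIN)
            return list(range(self.cur[0], self.ra[1]))
        if self.ra and self.ra[0] <= lbn < self.ra[1]:
            # the sequential stream reached the readahead window: shift both
            self.cur, self.ra = self.ra, (self.ra[1], self.ra[1] + RA_WIN)
            return list(range(self.ra[0], self.ra[1]))
        return []
```

Feeding it LBNs 0, 1, 2, … triggers window creation on the 32nd contiguous access and a shift each time the stream enters the readahead window, so prefetch I/O stays ahead of the application.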
History-aware prefetching
[Figure: a recorded history trail of (LBN, timestamp) pairs in block table entries B1–B37, matched against the current on-going trail of accesses.]
Prefetching:
Spatial window size: 32 [ block: 5 … (5+32) ]
Temporal window size: 256 [ timestamp: 43515 … 43515+256 ]
History-aware prefetching (Cont'd)
[Figure: LBN vs. timestamp plot of the current window and readahead window, annotated with the temporal and spatial window sizes.]
The two windows are shifted forward to mask disk access times;
To set the parameters for the next readahead window:
Increase the spatial window size if many prefetched blocks are requested;
Increase the temporal window size if only a small number of blocks are prefetchable.
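The two adaptation rules can be sketched as a small Python function. The thresholds, doubling policy, and caps here are hypothetical choices for illustration; only the grow-on-hits and grow-when-few-prefetchable rules come from the description above:

```python
def adapt(spatial, temporal, hit_ratio, prefetchable_ratio,
          max_spatial=256, max_temporal=4096):
    """Grow the spatial window when prefetched blocks get used,
    and the temporal window when few history blocks were prefetchable."""
    if hit_ratio > 0.75:            # many prefetched blocks were requested
        spatial = min(spatial * 2, max_spatial)
    if prefetchable_ratio < 0.25:   # few blocks found inside the windows
        temporal = min(temporal * 2, max_temporal)
    return spatial, temporal

print(adapt(32, 256, hit_ratio=0.9, prefetchable_ratio=0.1))  # (64, 512)
```

The two dimensions adapt independently: a useful but sparse history trail widens the temporal window, while an accurate trail widens the spatial one.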
Outline
Inadequacy of current buffer cache management in OS;
Managing disk layout information;
Sequence-based and history-aware prefetching;
Miss-penalty aware caching;
Performance evaluation in a Linux implementation;
Related work;
Summary.
Miss-penalty-aware management of the destaging area
[Diagram: an on-demand queue and a prefetch queue, each ending in a ghost segment.]
Two FIFO queues: on-demand blocks in the on-demand queue, prefetched blocks in the prefetch queue;
Memory allocation between the queues determines block replacement;
Benefit with the addition of a segment = miss penalty in the ghost segment.
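A minimal Python sketch of this idea follows, assuming hypothetical details: dict-of-deque queues, a one-segment ghost per queue, and a rebalance step that moves one segment of capacity toward the queue whose ghost hits accumulated more miss penalty. It models the mechanism described above, not the DiskSeen implementation:

```python
from collections import deque

SEG = 32  # blocks per segment (128 KB of 4 KB blocks, as in the prototype)

class DestagingArea:
    """Two FIFO queues; ghost-segment hits estimate the miss penalty one
    extra segment of memory would have avoided for each queue."""
    def __init__(self, demand_segs, prefetch_segs):
        self.cap = {"demand": demand_segs * SEG, "prefetch": prefetch_segs * SEG}
        self.q = {k: deque() for k in self.cap}
        self.ghost = {k: deque() for k in self.cap}    # recently evicted LBNs
        self.benefit = {k: 0.0 for k in self.cap}

    def insert(self, which, lbn, penalty):
        if lbn in self.ghost[which]:
            # would have been a hit with one more segment: credit its penalty
            self.benefit[which] += penalty
        q = self.q[which]
        q.append(lbn)
        while len(q) > self.cap[which]:
            self.ghost[which].append(q.popleft())
            while len(self.ghost[which]) > SEG:
                self.ghost[which].popleft()

    def rebalance(self):
        # shift one segment of capacity to the queue with the larger benefit
        winner = max(self.benefit, key=self.benefit.get)
        loser = "prefetch" if winner == "demand" else "demand"
        if self.benefit[winner] > self.benefit[loser] and self.cap[loser] > SEG:
            self.cap[winner] += SEG
            self.cap[loser] -= SEG
        self.benefit = {k: 0.0 for k in self.benefit}
```

Because the benefit is weighted by miss penalty rather than miss count, a queue whose evictions cost expensive random accesses attracts memory away from one whose evictions cost only cheap sequential reads.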
Summary: the DiskSeen scheme
[Diagram: the buffer cache divided into prefetching, caching, and destaging areas.]
Using block table, disk layout information is easily examined;
Prefetching works directly on disk layout and is history-aware;
Block replacement is miss-penalty aware.
Outline
Inadequacy of current buffer cache management in OS;
Managing disk layout information;
Sequence-based and history-aware prefetching;
Miss-penalty aware caching;
Performance evaluation in a Linux implementation;
Related work;
Summary.
The DiskSeen prototype in Linux 2.6.11
Prefetch blocks directly on block device;
Blocks without disk mappings are treated as random blocks;
The size of segment for memory allocation is 128KB;
An Intel P4 3.0 GHz processor, 512 MB of memory, and a 7200 RPM, 160 GB Western Digital hard disk;
The file system is Ext3.
Benchmarks
grep: search a collection of files for lines containing a match to a given regular expression.
CVS: a version control system;
TPC-H: a decision support benchmark;
BLAST: a tool searching multiple databases to match nucleotide or protein sequences.
Execution times of benchmarks
[Figure: execution times (seconds, 0–100) of grep, CVS, TPC-H, and BLAST under Linux 2.6.11, disk-based prefetch, DiskSeen (1st run), and DiskSeen (2nd run), with annotated improvements of 25% and 74%.]
Disk head movements (CVS)
[Figure: disk head movements without and with DiskSeen, showing accesses to the working directory and the CVS repository, and prefetching under DiskSeen.]
Disk head movement (grep)
[Figure: disk head movements without and with DiskSeen.]
Disk head movement (grep)
[Figure: disk head movements without and with DiskSeen, distinguishing inode blocks from data blocks.]
Disk head movement (grep)
[Figure: disk head movements without and with DiskSeen, highlighting prefetching.]
Outline
Inadequacy of current buffer cache management in OS;
Managing disk layout information;
Sequence-based and history-aware prefetching;
Miss-penalty aware caching;
Performance evaluation in a Linux implementation;
Related work;
Summary.
Related work
Caching
Numerous replacement algorithms have been proposed:
LRU, FBR [Robinson, 90], LRU-K [O'Neil, 93], 2Q [Johnson, 94], LRFU [Lee, 99], EELRU [Smaragdakis, 99], MQ [Zhou, 01], LIRS [Jiang, 02], ARC [Modha, 03], CLOCK-Pro [Jiang, 05], …
They all aim only to minimize the number of misses, rather than the disk time incurred by misses.
Related work (Cont'd)
Prefetching
Based on complex models
probability graph model [Vellanki, 99]
Information-theoretic Lempel-Ziv algorithm [Curewitz, 93]
Time series model [Tran, 01]
Commercial systems have rarely used sophisticated prediction schemes.
Related work (Cont'd)
Prefetching
Hint-based prefetching ([Patterson 95, Cao 96, Brown 01])
Across-file prefetching ([Griffioen 94, Kroeger 96, Kroeger 01])
Using file as prefetching unit;
High overhead and inconsistent performance;
Unsuitable for general-purpose OS;
Applicable only in application-level prefetching such as web proxy/server file prefetching [Chen 03].
Related work (Cont'd)
Improving disk placement
Cluster related data and metadata into the same cylinder group [McKusick 84, Ganger 97];
Reorganize disk block placement [Black 91, Hsu 03];
Dynamically create block replicas [Huang 05];
DiskSeen is more flexible and responsive to changing access patterns.
Summary
Disk performance is constrained by random accesses;
The buffer cache plays a critical role in reducing random accesses;
The absence of disk layout information in the buffer cache limits its performance;
DiskSeen improves disk performance by considering both temporal and spatial locality.
A Linux kernel implementation shows significant performance improvements for various types of workloads.