Conquest: Preparing for Life After Disks
An-I Andy Wang
Geoff Kuenning, Peter Reiher, Gerald Popek

Conquest Overview
- File systems are optimized for disks
  - Performance problem
  - Complexity
- Now we have tons of inexpensive RAM
  - What can we do with that RAM?

Conquest Approach
- Combine disk and persistent RAM (e.g., battery-backed RAM) in a novel way
- Simplification
  - More than 20% fewer semicolons than ext2, reiserfs, and SGI XFS
- Performance (under popular benchmarks)
  - 24% to 1900% faster than LRU disk caching

Outline of the Talk
- Motivation
- Conquest design (high level)
- Conquest components
- Performance evaluation
- Conclusion
Motivation – Conquest Design – Conquest Components – Performance Evaluation – Conclusion

Motivation
- Most file systems are built for disks
- Problems with the disk assumption:
  - Performance
  - Complexity

Hardware Evolution
[Figure: accesses per second (log scale, 1 KHz to 1 GHz) versus year (1990 to 2000). CPU and memory improve at 50% per year; disk improves at 15% per year. Gap annotations: 10^5 (1 sec : 6 days) and 10^6 (1 sec : 3 months).]
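
The growth rates on the chart compound year over year. A minimal sketch of that arithmetic, taking the chart's 50% and 15% annual rates at face value (the function name is illustrative, not from the talk):

```python
def relative_gap(fast_rate, slow_rate, years):
    """Factor by which the faster-improving component pulls ahead."""
    return ((1 + fast_rate) / (1 + slow_rate)) ** years


# CPU/memory at 50% per year versus disk at 15% per year, 1990 to 2000.
gap_after_decade = relative_gap(0.50, 0.15, 10)
print(f"gap widens by a factor of about {gap_after_decade:.1f}")
```

Even a modest difference in annual rates becomes more than an order of magnitude over a decade.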

Inside Pandora’s Box
[Figure: disk internals, labeling the disk arm and the disk platters]
Access time = seek time (disk arm) + rotational delay (disk platter) + transfer time
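
A worked instance of the access-time formula. The drive parameters below are illustrative assumptions, not figures from the talk, and the rotational delay is taken as the usual average of half a revolution:

```python
def disk_access_time_ms(seek_ms, rpm, request_bytes, transfer_mb_per_s):
    """Access time = seek time + rotational delay + transfer time (in ms)."""
    # Assumption: on average the platter must turn half a revolution.
    rotational_delay_ms = 0.5 * 60_000 / rpm
    transfer_ms = request_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotational_delay_ms + transfer_ms


# Hypothetical drive: 8 ms seek, 7200 RPM, 30 MB/s, one 4 KB request.
example_ms = disk_access_time_ms(seek_ms=8.0, rpm=7200,
                                 request_bytes=4096, transfer_mb_per_s=30.0)
print(f"{example_ms:.2f} ms")
```

With these numbers the mechanical terms (seek and rotation) dominate; the transfer itself is a small fraction of the total.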

Disk Optimization Methods
- Disk arm scheduling
- Group information on disk
- Disk readahead
- Buffered writes
- Disk caching
- Data mirroring
- Hardware parallelism

Complexity
[Figure labels: Bytes, synchronization, predictive readahead, cache replacement, elevator algorithm, data clustering, data consistency, asynchronous write]

Storage Media Alternatives
[Figure: accesses/sec (log scale) versus $/MB (log scale) for tape, disk, (write once) flash memory, battery-backed DRAM, persistent RAM, and magnetic RAM (marked with a question mark)]
[Caceres et al., 1993; Hillyer et al., 1996; Qualstar 1998; Tanisys 1999; Micron Semiconductor Products 2000; Quantum 2000]

Price Trend of Persistent RAM
[Figure: $/MB (log scale, 10^-2 to 10^2) versus year (1995 to 2005) for paper/film, 3.5" HDD, 2.5" HDD, 1" HDD, and persistent RAM. Annotations: booming of digital photography; 4 to 10 GB of persistent RAM.]
[Grochowski 2000]

Old Order; New World
- Disk will stay around
  - Cost, capacity, power, heat
- RAM as a viable storage alternative
  - PDAs, digital cameras, MP3 players
- More architectural changes due to RAM
  - A big assumption change from disk
  - Rethink data structures, interfaces, applications

Getting a Fresh Start
What does it take to design and build a system that assumes ample persistent RAM as the primary storage medium?

Conquest Design
- Design and build a disk/persistent-RAM hybrid file system
- Deliver all file system services from memory, with the exception of high-capacity storage
- Two separate data paths to memory and disk
- Benefits:
  - Simplicity
  - Performance

Simplicity
- Remove disk-related complexities for most files
- Make things simpler for disk as well
- Less complexity
  - Fewer bugs
  - Easier maintenance
  - Shorter data paths

Performance
- Overall
  - All management performed in memory
- Memory data path
  - No disk-related overhead
- Disk data path
  - Faster speed due to simpler access models

Conquest Components
- Media management
- Metadata representation
- Directory service
- Allocation service
- Persistence support
- Resiliency support

User Access Patterns
- Small files
  - Take little space (10%)
  - Represent most accesses (90%)
- Large files
  - Take most space
  - Mostly sequential accesses
- Not characteristic of database applications
[Iram 1993; Douceur et al., 1999; Roselli et al., 2000]

Files Stored in Persistent RAM
- Small files (< 1 MB)
  - No seek time or rotational delays
  - Fast byte-level accesses
  - Contiguous allocation
- Metadata
  - Fast synchronous update
  - No dual representations
- Executables and shared libraries
  - In-place execution
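
The placement rule on this slide can be sketched as a small decision function. This is an illustration under stated assumptions, not Conquest's actual code: the names `Medium` and `choose_medium` are hypothetical, and treating every executable as RAM-resident regardless of size is a reading of the slide rather than a confirmed detail.

```python
from enum import Enum

SMALL_FILE_LIMIT = 1 << 20  # the 1 MB threshold given on the slide


class Medium(Enum):
    PERSISTENT_RAM = "persistent RAM"
    DISK = "disk"


def choose_medium(size_bytes, is_metadata=False, is_executable=False):
    """Pick a storage medium for an object (hypothetical helper)."""
    # Metadata, executables, and shared libraries stay in persistent RAM.
    if is_metadata or is_executable:
        return Medium.PERSISTENT_RAM
    # Small files stay in persistent RAM; everything else goes to disk.
    if size_bytes < SMALL_FILE_LIMIT:
        return Medium.PERSISTENT_RAM
    return Medium.DISK


print(choose_medium(4096).value)
print(choose_medium(50 * SMALL_FILE_LIMIT).value)
```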

Memory Data Path of Conquest
[Figure: two data paths side by side. Conventional file systems: storage requests, IO buffer management, IO buffer, persistence support, disk management, disk. Conquest memory data path: storage requests, persistence support, battery-backed RAM (small file and metadata storage).]

Large-File-Only Disk Storage
- Allocate in big chunks
  - Lower access overhead
  - Reduced management overhead
- No fragmentation management
- No tricks for small files
  - Storing data in metadata
- No elaborate data structures
  - Wrapping a balanced tree onto disk cylinders
[Devlinux.com 2000]

Sequential-Access Large Files
- Sequential disk accesses
  - Near-raw bandwidth
- Well-defined readahead semantics
- Read-mostly
  - Little synchronization overhead (between memory and disk)

Disk Data Path of Conquest
[Figure: two data paths side by side. Conventional file systems: storage requests, IO buffer management, IO buffer, persistence support, disk management, disk. Conquest disk data path: storage requests, IO buffer management, IO buffer, disk management, disk (large-file-only file system), with battery-backed RAM holding small file and metadata storage.]

Random-Access Large Files
- Random access?
  - Common definition: nonsequential access
  - A typical movie has 150 scene changes
  - MP3 stores the title at the end of the files
- Near sequential access?
  - Simplifies large-file metadata representation significantly

Logical File Representation
[Figure: a file consists of name(s), an i-node holding file attributes, and data]

Physical File Representation
[Figure: a file consists of name(s), an i-node holding file attributes and data locations, and data blocks]

Ext2 Data Representation
[Figure: an ext2 i-node with 12 direct data block locations plus index block locations, which point to further index blocks and data block locations]

Disadvantages of the Ext2 Design
- Designed for disk storage
- Optimization for small files makes things complex
- Random-access data structure for large files that are accessed mostly sequentially
- Data access time dependent on the byte position in a file
- Maximum file size is limited

Conquest Representation
- Persistent RAM
  - Hash(file name) = location of data
  - Offset(location of data)
- Disk storage
  - Per-file, doubly linked list of disk block segments (stored in persistent RAM)

Advantages of the Conquest Design
- Direct data access for in-core files
- Worst case: sequential memory search for random disk locations
- Maximum file size limited by physical storage
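
The two representations can be sketched as follows. This is a simplified illustration, not the kernel implementation: the class names, the 4 KB block size, and the use of a Python dict as the name hash are all assumptions. In-core files are reached by hashing the name and indexing into a contiguous buffer; on-disk files are located by walking a doubly linked list of segments kept in memory, which is the sequential memory search named as the worst case.

```python
class Segment:
    """One contiguous run of disk blocks; a node in a doubly linked list."""

    def __init__(self, start_block, num_blocks):
        self.start_block = start_block
        self.num_blocks = num_blocks
        self.prev = None
        self.next = None


class LargeFile:
    """On-disk file described by an in-memory list of segments."""

    def __init__(self, block_size=4096):  # block size is an assumption
        self.block_size = block_size
        self.head = None
        self.tail = None

    def append_segment(self, start_block, num_blocks):
        seg = Segment(start_block, num_blocks)
        if self.tail is None:
            self.head = self.tail = seg
        else:
            seg.prev = self.tail
            self.tail.next = seg
            self.tail = seg

    def locate(self, offset):
        """Return (disk block, byte within block) for a file offset."""
        remaining = offset
        seg = self.head
        while seg is not None:  # sequential search through memory
            seg_bytes = seg.num_blocks * self.block_size
            if remaining < seg_bytes:
                block = seg.start_block + remaining // self.block_size
                return block, remaining % self.block_size
            remaining -= seg_bytes
            seg = seg.next
        raise ValueError("offset beyond end of file")


# In-core files: hash(file name) gives the data, then index by offset.
in_core = {"notes.txt": bytearray(b"hello conquest")}


def read_in_core(name, offset, length):
    data = in_core[name]
    return bytes(data[offset:offset + length])


movie = LargeFile()
movie.append_segment(start_block=100, num_blocks=2)
movie.append_segment(start_block=500, num_blocks=3)
print(movie.locate(8192 + 10))
print(read_in_core("notes.txt", 6, 8))
```

The list walk touches only memory, so even a poor segment layout costs no extra disk accesses to find a block.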

Directory Service Requirements
- Fast sequential traversal (e.g., ls)
- Fast random lookup (e.g., locate file x)
- Hard links (apply multiple names to data)

First Design
- A doubly hashed table for each directory
  - Conserves space
- Problems:
  - Dynamic resizing of directories
  - Need to handle the current file position
    - Important for rm -fr

Second Design
- A variant of extensible hash table for each directory
  - An old data structure fits nicely
[Figure: example directory hash tables. A small table holds 0100 | file1 and 1001 | file2 with the remaining slots empty; a larger table also holds 0011 | dir1 and 1110 | file2_hardlink.]
[Fagin et al., 1979]
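
For readers unfamiliar with the data structure, here is a minimal sketch of textbook extendible hashing applied to a directory. It is not Conquest's variant: the class names, the bucket capacity, and the use of `zlib.crc32` as the hash are assumptions, and collision handling is reduced to a simple depth guard. The table is indexed by the low bits of the hash and doubles only when a full bucket can no longer be split on its own.

```python
import zlib

MAX_DEPTH = 32  # crc32 yields 32 bits; stop splitting beyond that


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = {}  # file name -> metadata ID


class ExtendibleDirectory:
    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 1
        self.table = [Bucket(1), Bucket(1)]

    @staticmethod
    def _hash(name):
        return zlib.crc32(name.encode("utf-8"))

    def _bucket(self, name):
        # Index the table by the low global_depth bits of the hash.
        return self.table[self._hash(name) & ((1 << self.global_depth) - 1)]

    def lookup(self, name):
        return self._bucket(name).entries.get(name)

    def insert(self, name, meta_id):
        while True:
            bucket = self._bucket(name)
            has_room = len(bucket.entries) < self.capacity
            if name in bucket.entries or has_room or bucket.local_depth >= MAX_DEPTH:
                bucket.entries[name] = meta_id
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            self.table = self.table + self.table  # double the table
            self.global_depth += 1
        bucket.local_depth += 1
        high_bit = 1 << (bucket.local_depth - 1)
        sibling = Bucket(bucket.local_depth)
        # Slots whose new distinguishing bit is set move to the sibling.
        for i, b in enumerate(self.table):
            if b is bucket and (i & high_bit):
                self.table[i] = sibling
        old_entries, bucket.entries = bucket.entries, {}
        for n, m in old_entries.items():
            target = sibling if self._hash(n) & high_bit else bucket
            target.entries[n] = m

    def names(self):
        """Sequential traversal (as for ls): visit each bucket once."""
        seen = set()
        for b in self.table:
            if id(b) not in seen:
                seen.add(id(b))
                yield from b.entries


d = ExtendibleDirectory()
d.insert("file1", 101)
d.insert("file2", 102)
d.insert("dir1", 103)
d.insert("file2_hardlink", 102)  # a hard link: second name, same metadata ID
print(sorted(d.names()))
```

Growth touches one bucket at a time, so a directory can be resized without rehashing every entry.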

Additional Engineering Details
- Popular hash functions randomize lower bits
- Dynamic file positioning
- Need to handle collisions
- Memory overhead and complexity tradeoffs

Metadata Allocation Requirements
- Keep track of usage status of metadata entries
- Avoid duplicate allocation with unique IDs
- Fast retrieval of metadata with a given ID
[Figure: metadata entry table. ID 1: free; ID 2: in use; ID 3: free; ID 4: free; ID 5: in use; ID 6: free.]

Existing Memory Allocation Services
- Keep track of unallocated memory
- No duplicate allocation of physical addresses
- Hmm…
[Figure: memory allocation table. ADDR 0xe000000: free; 0xe000038: in use; 0xe000070: free; 0xe0000A8: free; 0xe0000E0: free; 0xe000118: in use.]

Conquest Metadata Management
- Metadata = memory allocated by memory manager
- Metadata ID = physical address of metadata
[Figure: the metadata ID table (IDs 1 to 6) shown beside the memory manager's address table (0xe000000 to 0xe000118). Labels: usage status; unique IDs and fast retrieval.]
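
The idea that an address can double as a metadata ID can be illustrated with a toy allocator. This is a sketch only: `MetadataArena` is a hypothetical name, a `bytearray` stands in for persistent RAM, and the base address and 0x38-byte stride are borrowed from the addresses shown on the slides.

```python
class MetadataArena:
    """Fixed-size metadata slots; a slot's address serves as its ID."""

    def __init__(self, base=0xE000000, entry_size=0x38, count=6):
        self.base = base
        self.entry_size = entry_size
        self.memory = bytearray(entry_size * count)  # stand-in for persistent RAM
        self.free_addrs = [base + i * entry_size for i in range(count)]
        self.in_use = set()

    def allocate(self):
        """Return a unique ID; uniqueness follows from address uniqueness."""
        if not self.free_addrs:
            raise MemoryError("no free metadata entries")
        addr = self.free_addrs.pop(0)
        self.in_use.add(addr)
        return addr

    def release(self, meta_id):
        self.in_use.remove(meta_id)
        self.free_addrs.append(meta_id)

    def entry(self, meta_id):
        """Retrieve metadata by ID with plain address arithmetic."""
        if meta_id not in self.in_use:
            raise KeyError(hex(meta_id))
        offset = meta_id - self.base
        return memoryview(self.memory)[offset:offset + self.entry_size]


arena = MetadataArena()
first = arena.allocate()
second = arena.allocate()
arena.entry(second)[0] = 7  # write through the ID, no lookup table needed
print(hex(first), hex(second))
```

The allocator already tracks usage status and never hands out the same address twice, so no separate ID table is required.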

Persistence Support
- Restore file system states after a reboot
  - Data
  - Metadata
- Memory manager
  - Keep track of metadata allocation

Linux Memory Manager (1)
- Page allocator maintains individual pages
[Figure: page allocator]

Linux Memory Manager (2)
- Zone allocator allocates memory in power-of-two sizes
[Figure: page allocator and zone allocator]

Linux Memory Manager (3)
- Slab allocator groups allocations by sizes to reduce internal memory fragmentation
[Figure: page allocator, zone allocator, and slab allocator]

Linux Memory Manager (4)
- Difficult to restore the persistent states
  - Three layers of pointer-rich mappings
  - Mixing of persistent and temporary allocations
[Figure: page allocator, slab allocator, and zone allocator]

Conquest Persistence
- Create memory zones with their own instantiations of memory managers
[Figure: page allocator, slab allocator, and zone allocator]

Conquest Persistence
- Encapsulate all pointers within each zone
  - Pointers can survive reboots
  - No serialization and deserialization
- Swapping and paging
  - Disabled for Conquest memory zones
  - Enabled for non-Conquest zones

Resiliency Support
- Instantaneous metadata commit
  - No fsck (ad hoc metadata consistency check)
- Built-in checkpointing
- Pointer-switch commit semantics
[Figure: pointer switch]
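
The slides name pointer-switch commit semantics without spelling out the mechanism. One common way to realize the idea is a shadow copy: prepare the new version off to the side, then publish it with a single reference assignment. The sketch below illustrates that general technique under that assumption; `CommittedState` and its methods are hypothetical names, not Conquest code.

```python
class CommittedState:
    """Holds one committed version; updates are published by one switch."""

    def __init__(self, metadata):
        self._current = dict(metadata)

    @property
    def current(self):
        return self._current

    def commit(self, mutate):
        shadow = dict(self._current)  # work on a private copy
        mutate(shadow)                # may raise; committed version untouched
        self._current = shadow        # the commit is a single reference switch


def grow(meta):
    meta["size"] = 42


def failing_update(meta):
    meta["size"] = -1
    raise RuntimeError("crash before commit")


state = CommittedState({"size": 0})
state.commit(grow)
try:
    state.commit(failing_update)
except RuntimeError:
    pass
print(state.current)
```

Because readers see either the old version or the new one, an interrupted update leaves nothing half-written to repair afterwards.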

Implementation Status
- Kernel module under Linux 2.4.2
- Fully functional and POSIX compliant
- Modified memory manager to support Conquest persistence
- Need to overcome BIOS limitations for distribution
- Looking for licensing opportunities

Performance Evaluation
- Architectural simplification
  - Feature count
- Performance improvement
  - Memory-only workload
  - Memory and disk workload

Conventional Data Path
- Buffer allocation management
- Buffer garbage collection
- Data caching
- Metadata caching
- Predictive readahead
- Write behind
- Cache replacement
- Metadata allocation
- Metadata placement
- Metadata translation
- Disk layout
- Fragmentation management
[Figure: conventional file systems. Storage requests, IO buffer management, IO buffer, persistence support, disk management, disk.]

Memory Path of Conquest
- Buffer allocation management
- Buffer garbage collection
- Data caching
- Metadata caching
- Predictive readahead
- Write behind
- Cache replacement
- Metadata allocation
- Metadata placement
- Metadata translation
- Disk layout
- Fragmentation management
- Memory manager encapsulation
[Figure: Conquest memory data path. Storage requests, persistence support, battery-backed RAM (small file and metadata storage).]

Disk Path of Conquest
- Buffer allocation management
- Buffer garbage collection
- Data caching
- Metadata caching
- Predictive readahead
- Write behind
- Cache replacement
- Metadata allocation
- Metadata placement
- Metadata translation
- Disk layout
- Fragmentation management
[Figure: Conquest disk data path. Storage requests, IO buffer management, IO buffer, disk management, disk (large-file-only file system), with battery-backed RAM holding small file and metadata storage.]

PostMark Benchmark (1)
- ISP workload (emails, web-based transactions)
- Conquest is comparable to ramfs
- At least 24% faster than the LRU disk cache
- 40 to 250 MB working set with 2 GB physical RAM
[Figure: trans / sec (0 to 9000) versus number of files (5000 to 30000) for SGI XFS, reiserfs, ext2fs, ramfs, and Conquest]
[Katcher 1997; Sweeney et al., 1996; Card et al., 1999; Namesys 2002]

PostMark Benchmark (2)
- When both memory and disk components are exercised, Conquest can be several times faster than ext2fs, reiserfs, and SGI XFS
- 10,000 files, 80 MB to 3.5 GB working set with 2 GB physical RAM
[Figure: trans / sec (0 to 5000) versus percentage of large files (0.0 to 10.0) for SGI XFS, reiserfs, ext2fs, and Conquest, with regions marked working set <= RAM and > RAM]

PostMark Benchmark (3)
- When working set > RAM, Conquest is 1.4 to 2 times faster than ext2fs, reiserfs, and SGI XFS
- 10,000 files, 80 MB to 3.5 GB working set with 2 GB physical RAM
[Figure: trans / sec (0 to 120) versus percentage of large files (6.0 to 10.0) for SGI XFS, reiserfs, ext2fs, and Conquest]

Sprite LFS Microbenchmarks (1)
- Small-file benchmark
  - Operates on 10,000 1-KB files in three phases
[Figure: op / sec (0 to 180000) for the create, read, and delete phases; SGI XFS, reiserfs, ext2fs, ramfs, and Conquest]

Sprite LFS Microbenchmarks (2)
- Modified large-file microbenchmark: 10 1-MB files (Conquest in-core files)
[Figure: MB / sec (0 to 700) for seq write, seq read, rand write, rand read, and seq read; SGI XFS, reiserfs, ext2fs, ramfs, and Conquest]

Sprite LFS Microbenchmarks (3)
- Modified large-file microbenchmark: 10 1.01-MB files (Conquest on-disk files)
[Figure: MB / sec (0 to 700) for seq write, seq read, rand write, rand read, and seq read; SGI XFS, reiserfs, ext2fs, ramfs, and Conquest]

Sprite LFS Microbenchmarks (4)
- Large-file microbenchmark: 40 100-MB files (Conquest on-disk files)
[Figure: MB / sec (0 to 30) for seq write, seq read, rand write, rand read, and seq read; SGI XFS, reiserfs, ext2fs, and Conquest]

History’s Mystery
Puzzling Microbenchmark Numbers…
Geoffrey Kuenning: “If Conquest is slower than ext2, I will toss you off of the balcony…”

With me hanging off a balcony…
- Original large-file microbenchmark: 1-MB file (Conquest in-core file)
[Figure: MB / sec (0 to 700) for seq write, seq read, rand write, rand read, and seq read; SGI XFS, reiserfs, ext2fs, ramfs, and Conquest]

Odd Microbenchmark Numbers
- Why are random reads slower than sequential reads?
[Figure: MB / sec (0 to 700) for seq write, seq read, rand write, rand read, and seq read; SGI XFS, reiserfs, ext2fs, ramfs, and Conquest]

Odd Microbenchmark Numbers
- Why are RAM-based file systems slower than disk-based file systems?
[Figure: MB / sec (0 to 700) for seq write, seq read, rand write, rand read, and seq read; SGI XFS, reiserfs, ext2fs, ramfs, and Conquest]
![Page 62: Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek](https://reader031.vdocuments.us/reader031/viewer/2022032702/56649cec5503460f949b85a5/html5/thumbnails/62.jpg)
62
A Series of Hypotheses
Warm-up effect? Maybe
  Why do RAM-based systems warm up slower?
Bad initial states? No
Pentium III streaming IO option? No
![Page 63: Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek](https://reader031.vdocuments.us/reader031/viewer/2022032702/56649cec5503460f949b85a5/html5/thumbnails/63.jpg)
63
Effects of Cache Footprint Sizes
[Diagram, two panels: "Large cache footprint" and "Small cache footprint". Each panel shows two steps, "write a file sequentially" and "read the same file sequentially", with labels marking the footprint, the file end, the flush, and the read position within the file.]
![Page 64: Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek](https://reader031.vdocuments.us/reader031/viewer/2022032702/56649cec5503460f949b85a5/html5/thumbnails/64.jpg)
64
LFS Sprite Microbenchmarks
Modified large-file microbenchmark: 10 1-MB files (Conquest in-core files)
[Chart: throughput in MB/sec (axis 0 to 700) for seq write, seq read, rand write, rand read, and seq read, comparing SGI XFS, reiserfs, ext2fs, ramfs, and Conquest]
Chart annotation: faster random over sequential accesses due to cache reuse
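The cache-reuse effect can be illustrated with a toy model. The sketch below assumes a fully associative LRU cache, which real L2 caches are not, and the block counts and the function name `read_hits` are hypothetical. It only shows the general mechanism: when the data written is larger than the cache, a sequential re-read evicts each block just before it is needed, while random reads can still land on blocks that remain cached.

```python
import random
from collections import OrderedDict


def read_hits(warmup, reads, capacity):
    """Replay `warmup` then `reads` on an LRU cache; count hits among the reads."""
    cache = OrderedDict()
    hits = 0
    for phase, accesses in (("warmup", warmup), ("read", reads)):
        for block in accesses:
            if block in cache:
                if phase == "read":
                    hits += 1
                cache.move_to_end(block)       # mark as most recently used
            else:
                cache[block] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return hits


NUM_BLOCKS = 100  # blocks written by the sequential write (hypothetical)
SMALL_CACHE = 50  # cache smaller than the data (hypothetical)
LARGE_CACHE = 128 # cache larger than the data (hypothetical)

write_pass = list(range(NUM_BLOCKS))
sequential_reads = list(range(NUM_BLOCKS))
rng = random.Random(0)  # fixed seed so the run is repeatable
random_reads = [rng.randrange(NUM_BLOCKS) for _ in range(NUM_BLOCKS)]

seq_small = read_hits(write_pass, sequential_reads, SMALL_CACHE)
rand_small = read_hits(write_pass, random_reads, SMALL_CACHE)
seq_large = read_hits(write_pass, sequential_reads, LARGE_CACHE)

print("sequential reads, data larger than cache:", seq_small, "hits")
print("random reads, data larger than cache:    ", rand_small, "hits")
print("sequential reads, data fits in cache:    ", seq_large, "hits")
```

In this model the sequential re-read gets no hits at all once the data exceeds the cache, and every read hits once the data fits, which is one reason the state left behind by each benchmark stage matters.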
![Page 65: Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek](https://reader031.vdocuments.us/reader031/viewer/2022032702/56649cec5503460f949b85a5/html5/thumbnails/65.jpg)
66
Lessons Learned
Faster than LRU caching, unexpected
  Heavyweight disk handling
  Severe penalty for accessing memory content
Matching user access patterns to storage media offers considerable simplification and better performance
  Not an automatic result
  Need careful design
![Page 66: Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek](https://reader031.vdocuments.us/reader031/viewer/2022032702/56649cec5503460f949b85a5/html5/thumbnails/66.jpg)
67
More Lessons Learned
Effects of L2 caching become highly visible in memory workloads (modern workloads)
Cannot blindly apply existing disk-based microbenchmarks to measure memory performance of file systems
  Need to consider states of L2 cache and memory behaviors at each stage of microbenchmarking
![Page 67: Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek](https://reader031.vdocuments.us/reader031/viewer/2022032702/56649cec5503460f949b85a5/html5/thumbnails/67.jpg)
68
Additional Lessons Learned
Don’t discuss your performance numbers next to a balcony…unless…
![Page 68: Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek](https://reader031.vdocuments.us/reader031/viewer/2022032702/56649cec5503460f949b85a5/html5/thumbnails/68.jpg)
[McKusick et al., 1990; Ganger et al., 2000; Roselli et al., 2000; Seltzer et al., 2000]
69
Related Work (1)
Disk caching
  Assumption of scarce memory
  Complex mechanisms to maintain consistency
    Especially with the presence of metadata
RAM drives and RAM file systems
  Not meant to be persistent
  Use disk-related mechanisms
  Limitations on storage capacity
![Page 69: Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek](https://reader031.vdocuments.us/reader031/viewer/2022032702/56649cec5503460f949b85a5/html5/thumbnails/69.jpg)
[Riedel 1998; ZDNet 1999] 70
Related Work (2)
Disk emulators
  RAM storage accessed through SCSI interface
Ad hoc approaches
  Manual transferring of files to and from ramfs
    Capacity limitation
  Background daemon to stage RAM files to a disk
    Semantic and name space problems
![Page 70: Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek](https://reader031.vdocuments.us/reader031/viewer/2022032702/56649cec5503460f949b85a5/html5/thumbnails/70.jpg)
71
Going Beyond Conquest (1)
Matching usage patterns with heterogeneous machines in the distributed domain
  Specialized tasks for machines within a cluster
  Preferably self-organizing and self-evolving
State-rich computing
  Caching of runtime data structures
  Similar to /tmp
![Page 71: Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek](https://reader031.vdocuments.us/reader031/viewer/2022032702/56649cec5503460f949b85a5/html5/thumbnails/71.jpg)
72
Going Beyond Conquest (2)
Separate storage of metadata from data
  Association of metadata with data of different fidelity
  Opportunity for hierarchical replication across devices with different calibers
Benchmarking memory performance of file systems
  Developing new memory benchmarks
![Page 72: Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek](https://reader031.vdocuments.us/reader031/viewer/2022032702/56649cec5503460f949b85a5/html5/thumbnails/72.jpg)
73
Contributions
Demonstrated the feasibility of disk-memory hybrid file systems
Showed performance does not preclude simplicity
Pinpointed cache-related problems with modern benchmarks
Opened doors to many exciting areas of research
![Page 73: Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek](https://reader031.vdocuments.us/reader031/viewer/2022032702/56649cec5503460f949b85a5/html5/thumbnails/73.jpg)
74
Conclusion
Conquest demonstrates how rethinking changes in underlying assumptions can lead to significant architectural and performance improvements
Radical changes in hardware, applications, and user expectations in the past decade should lead us to rethink other aspects of OS as well.