Web Caching File System
Jonathan Ledlie, Matt McCormick
Outline
• Motivation: why design a new file system?
• Current state of affairs
• Design of web caching file system
• Performance comparison: WCFS vs. Unix
• Future work
• Conclusions
Two points
• Optimizing on invariants
• Impending I/O bottleneck
Motivation
• Disks are slow
• Communication rates are increasing rapidly
• Web cache anomalies:
– files are written only when they are created
– file permissions stay constant
– all files have copies at the original server
Web Caching File System vs. Unix File System
[Bar chart: normalized time (0 to 1.2) for create, read, and delete operations, CFS vs. Unix]
• 50,000 of each operation
• CFS is using one thread
• Unix file I/O is synchronous
• 500 MHz PIII, 8.5 GB disk
Current State of Affairs: Internet Topology
[Diagram: clients connect through client-side caches to the network; a server-side cache sits in front of the server]
Current State of Affairs: Unix File System
• Life of a file in a web cache:
– create, write, close
– open, read, close (multiple times)
– delete
• Using i-nodes:
– lots of flexibility that is not needed
– extra disk access for each file reference
• Directory structure and name lookup
Design of WCFS: Specializations
• Life of a file:
– create
– read (multiple times)
– delete
• No i-nodes or permanent file status data
– faster create and file access
• In-memory hash table stores file locations
– faster file lookup and delete
• All file data written to consecutive blocks
– faster reads and writes
Design of WCFS: Object Diagram
[Diagram: Cache objects call getNewCacheObject on the CacheDisk; Request objects flow through the RequestQueue to the Disk, which also holds the FileTable and BitMap]
Design of WCFS: Disk Initialization
• First, create a CacheDisk object
– creates a Disk object to represent the physical disk
– starts a disk thread running
• Disk object and physical disk
– utilize an SGI raw I/O patch for Linux
– bypass the kernel and kernel buffers
[Diagram: the CacheDisk owns the Disk object]
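The slides rely on an SGI raw-I/O patch; a rough modern analogue (an assumption, not what the original code does) is Linux O_DIRECT, which likewise bypasses the kernel page cache but requires block-aligned buffers and transfer sizes. The path and block size below are placeholders.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>

/* Hypothetical sketch of unbuffered disk access: open the device
 * with O_DIRECT so reads and writes skip kernel buffering, and
 * allocate buffers aligned to the device block size. */
#define BLOCK_SIZE 4096

int open_raw_disk(const char *path) {
    return open(path, O_RDWR | O_DIRECT);
}

/* O_DIRECT transfers fail unless the buffer is block-aligned,
 * hence posix_memalign instead of malloc. */
void *alloc_aligned_blocks(size_t nblocks) {
    void *buf = NULL;
    if (posix_memalign(&buf, BLOCK_SIZE, nblocks * BLOCK_SIZE) != 0)
        return NULL;
    return buf;
}
```

The alignment requirement is exactly the "proper memory alignment" item that the future-work slide lists.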
Design of WCFS: Disk Object
• FileTable
– stores names and locations of files on disk
– MD5 conversion of the URL
• RequestQueue
– stores read and write requests from process threads
– whenever anything is in the queue, the disk thread runs
• BitMap
– keeps the status of each block on disk
– locates and marks a spot on disk for files to be placed
[Diagram: the Disk object contains the FileTable, RequestQueue, and BitMap]
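The BitMap's job of "locating and marking a spot" for a file amounts to finding a run of consecutive free blocks, so file data stays contiguous. A minimal sketch, assuming a byte-per-block map for clarity (a real bitmap would pack bits; sizes and names are placeholders):

```c
#include <string.h>

/* Hypothetical sketch of the BitMap: scan for a run of n consecutive
 * free blocks, mark them allocated, and return the starting block. */
#define NUM_BLOCKS 1024

static char used[NUM_BLOCKS];   /* 0 = free, 1 = allocated */

/* Returns the first block of a free run of length n, or -1 if none. */
long bitmap_alloc(long n) {
    long run = 0;
    for (long i = 0; i < NUM_BLOCKS; i++) {
        run = used[i] ? 0 : run + 1;
        if (run == n) {
            long start = i - n + 1;
            memset(used + start, 1, n);   /* mark the run allocated */
            return start;
        }
    }
    return -1;
}

/* Delete just clears the run; no metadata on disk to update. */
void bitmap_free(long start, long n) {
    memset(used + start, 0, n);
}
```

Contiguous allocation is what makes the single-seek reads and writes on the specializations slide possible; the cost is fragmentation as runs are freed, which the future-work slide flags as an open question.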
Design of WCFS: Request Objects
• Request
– write: starting block, length, buffer to write from
– read: starting block, length, buffer to write to
– (implies files must be smaller than virtual memory)
• Currently queued FIFO (soon to be a one-way elevator)
[Diagram: Request objects queued in the RequestQueue]
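A request as described above can be sketched as a small struct on a singly linked FIFO queue; all names are hypothetical. The planned one-way elevator would change only the dequeue policy, pulling requests in ascending start-block order instead of arrival order.

```c
#include <stddef.h>

/* Hypothetical sketch of a request: operation type, starting block,
 * length, and the user buffer to copy from (write) or into (read).
 * The whole file is buffered in virtual memory, hence the size limit
 * the slide notes. */
typedef enum { REQ_READ, REQ_WRITE } req_type_t;

typedef struct request {
    req_type_t type;
    long   start_block;
    long   num_blocks;
    void  *buffer;
    struct request *next;
} request_t;

typedef struct { request_t *head, *tail; } request_queue_t;

/* FIFO enqueue at the tail; the disk thread pops from the head. */
void queue_push(request_queue_t *q, request_t *r) {
    r->next = NULL;
    if (q->tail) q->tail->next = r; else q->head = r;
    q->tail = r;
}

request_t *queue_pop(request_queue_t *q) {
    request_t *r = q->head;
    if (r) {
        q->head = r->next;
        if (!q->head) q->tail = NULL;
    }
    return r;
}
```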
Design of WCFSCache Objects for Threading
• Multiple threads for handling clients• Each thread gets a single Cache object• Cache Object
– create, read, remove, length, sync• Thread create and read Asynchronous
– turned into request objects– placed in request queue for disk
• Thread calls sync to guarantee its operations are done
CacheDisk
CacheCache Cache
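Since create and read only enqueue work, each Cache object must track its outstanding requests so that sync can block until the disk thread has drained them. A minimal sketch using a pthread condition variable (names and structure are assumptions, not the original code):

```c
#include <pthread.h>

/* Hypothetical sketch of the per-thread sync() barrier. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  done;
    int pending;            /* requests enqueued but not yet completed */
} cache_t;

void cache_enqueue(cache_t *c) {     /* called by create/read */
    pthread_mutex_lock(&c->lock);
    c->pending++;
    pthread_mutex_unlock(&c->lock);
}

void cache_complete(cache_t *c) {    /* called by the disk thread */
    pthread_mutex_lock(&c->lock);
    if (--c->pending == 0)
        pthread_cond_broadcast(&c->done);
    pthread_mutex_unlock(&c->lock);
}

void cache_sync(cache_t *c) {        /* returns once all work is done */
    pthread_mutex_lock(&c->lock);
    while (c->pending > 0)
        pthread_cond_wait(&c->done, &c->lock);
    pthread_mutex_unlock(&c->lock);
}
```

This is what lets WCFS overlap client handling with disk I/O while still giving each thread a point at which its writes are known durable.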
Design of WCFS: Code Snippet
• Common web caching operations:
    create("url", buffer, size);
    read("url", buffer);
    remove("url");
    sync();
• Equivalent operations in Unix:
    fd = creat("url", permissions);
    write(fd, buffer, size);
    close(fd);
    fd = open("url", mode);
    read(fd, buffer, size);
    close(fd);
    unlink("url");
Design of WCFS: Basic File System Layout
[Diagram, same as the object diagram: Cache objects call getNewCacheObject on the CacheDisk; Requests flow through the RequestQueue to the Disk with its FileTable and BitMap]
Design of WCFS: Feature Recap
• Raw I/O
• Multi-threading
• Asynchronous I/O
• Quick name lookup
• File data on consecutive blocks
Performance Comparisons: Trace
[Line chart: time per operation (µsec, 0 to 4500) over a 14,400-operation trace, CFS vs. ext2]
Performance Comparisons: Create
[Line chart: cumulative milliseconds (0 to 4500) vs. operations (x 100, 0 to 450), two runs each of ext2 and CFS]
Performance Comparisons: Read
[Line chart: cumulative milliseconds (0 to 18,000) vs. operations (x 100, 0 to 475), four runs of ext2 and two of CFS]
Performance Comparisons: Delete
[Line chart: cumulative milliseconds (0 to 90,000) vs. operations (x 100, 0 to 450), two runs each of ext2 and CFS]
Two points, revisited
• Optimizing on invariants
• Impending I/O bottleneck
What’s coming...
• Real raw I/O and proper memory alignment
• Testing with more threads
• Trace testing
• Determining optimal fragmentation and cleaning
• Is MD5 a bottleneck?
• Elevator algorithm
• Adding save on clean shutdown
• Examine memory requirements for the FileTable
Conclusions
• The Unix file system induces unnecessary overhead for this workload
• Possible to take advantage of application specific traits
• Specialization works