Key Ideas
A network file system for slow or wide-area networks
Exploits similarities between files or versions of the same file
    Avoids sending data that can be found in the server’s file system or the client’s cache
Also uses conventional compression and caching
Requires 90% less bandwidth than traditional network file systems

Working on slow networks
Make local copies
    Must worry about update conflicts
Use remote login
    Only for text-based applications
Use LBFS instead
    Better than remote login
    Must deal with issues like auto-saves blocking the editor for the duration of the transfer

LBFS
Exploits cross-file similarities, especially with previous versions of the same file
    Auto-save files, …
LBFS file server divides the files it stores into chunks and indexes the chunks by hash value
LBFS client similarly indexes a large persistent file cache
LBFS never transfers chunks that the recipient already has

Previous Work (I)
AFS callbacks require the server to notify clients when a cached file has been modified
Leases achieve the same goal but have an expiration time
Coda supports slow networks and even disconnected operation
    Defers some updates to save bandwidth
OceanStore applies Bayou’s conflict resolution mechanisms to a file system

Previous Work (II)
Operation-based updates (Lee et al.)
    A proxy client close to the server duplicates the client’s computations in the hope of duplicating its output files
Spring and Wetherall propose using two large cooperating caches that store identical copies of the last n megabytes of network traffic
Rsync mirrors a directory tree between client and server

LBFS
LBFS provides close-to-open consistency
    Similar to AFS session consistency
LBFS assumes clients will have a cache large enough to contain a user’s entire working set of files
When possible, LBFS reconstitutes files using chunks of existing data in the file system and client cache instead of transmitting those chunks over the network

Indexing Issues
Major challenge is keeping the index a reasonable size while dealing with shifting offsets
    Indexing conventional file blocks would not work: inserting a single byte shifts the contents of every later block
    Indexing and hashing overlapping file blocks at all offsets would require too much space

LBFS Solution
Considers only non-overlapping chunks of files
Sets chunk boundaries based on file contents to avoid sensitivity to shifting file offsets
Examines every overlapping 48-byte region of the file to select boundary regions, or breakpoints, using Rabin fingerprints
Expected chunk size is 8 KB plus the size of the 48-byte breakpoint window
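
The breakpoint idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple polynomial rolling hash in place of true Rabin fingerprints, and the names `breakpoints`, `chunks`, `MAGIC`, `BASE`, and `MOD` are hypothetical. A window is treated as a breakpoint when the low-order 13 bits of its hash equal a fixed value, which gives one breakpoint per 2^13 = 8192 window positions on average.

```python
WINDOW = 48            # size of the sliding window, as in the slides
MASK = (1 << 13) - 1   # test the low-order 13 bits: about one hit per 8 KB
MAGIC = 0x78           # hypothetical target value for those 13 bits
BASE = 257             # parameters of the stand-in rolling hash
MOD = (1 << 61) - 1    # (not a real Rabin fingerprint)


def breakpoints(data: bytes) -> list[int]:
    """Return the offsets just past every 48-byte window that is a breakpoint."""
    if len(data) < WINDOW:
        return []
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    h = 0
    for b in data[:WINDOW]:
        h = (h * BASE + b) % MOD
    cuts = []
    for end in range(WINDOW, len(data) + 1):
        # h is the hash of data[end - WINDOW:end]
        if (h & MASK) == MAGIC:
            cuts.append(end)
        if end < len(data):
            # slide the window one byte to the right
            h = ((h - data[end - WINDOW] * top) * BASE + data[end]) % MOD
    return cuts


def chunks(data: bytes) -> list[bytes]:
    """Split data into non-overlapping chunks that end at breakpoints."""
    out, start = [], 0
    for cut in breakpoints(data):
        out.append(data[start:cut])
        start = cut
    if start < len(data):
        out.append(data[start:])
    return out
```

Because each boundary depends only on the 48 bytes before it, inserting bytes near the start of a file changes the first chunk or two and leaves every later chunk, and therefore its hash, unchanged.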

More Indexing Issues
Pathological cases
    Very small chunks
        Sending hashes of chunks would consume as much bandwidth as just sending the file
    Very large chunks
        Cannot be sent in a single RPC
LBFS imposes minimum and maximum chunk sizes
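
One way to apply such limits is to post-process the candidate breakpoints. The sketch below is an assumption-laden illustration, not the LBFS code: the function name `bounded_cuts` is hypothetical, and the 2 KB and 64 KB limits are illustrative parameters that these notes do not state.

```python
MIN_CHUNK = 2 * 1024    # assumed lower limit, for illustration only
MAX_CHUNK = 64 * 1024   # assumed upper limit, for illustration only


def bounded_cuts(candidates: list[int], length: int) -> list[int]:
    """Filter content-defined breakpoints so chunk sizes stay within limits.

    candidates: offsets proposed by the breakpoint test (each <= length)
    length:     total size of the file in bytes
    """
    cuts, start = [], 0
    for cut in sorted(candidates) + [length]:
        # No usable breakpoint for too long: force a boundary every MAX_CHUNK bytes.
        while cut - start > MAX_CHUNK:
            start += MAX_CHUNK
            cuts.append(start)
        # Skip breakpoints that would produce a chunk shorter than MIN_CHUNK.
        if cut - start >= MIN_CHUNK and cut < length:
            cuts.append(cut)
            start = cut
    return cuts
```

The final chunk of a file can still be shorter than the minimum, since the end of the file always ends a chunk.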

The Chunk Database
Indexes each chunk by the first 64 bits of its SHA-1 hash
To avoid synchronization problems, LBFS always recomputes the SHA-1 hash of any data chunk before using it
    Simplifies crash recovery
Recomputed SHA-1 values are also used to detect hash collisions in the database
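
A toy in-memory version of this lookup might look as follows. The class `ChunkIndex` and its methods are hypothetical names, and a plain dictionary stands in for both the on-disk database and the file system. The point of the sketch is that an index entry is only a hint: the chunk is read back and its full SHA-1 hash is recomputed before the data is trusted.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


class ChunkIndex:
    """Toy stand-in for the chunk database (hypothetical, in memory only)."""

    def __init__(self, files: dict[str, bytes]):
        self.files = files   # path -> contents, stands in for the file system
        self.index = {}      # first 8 bytes of SHA-1 -> (path, offset, length)

    def record(self, path: str, offset: int, length: int) -> None:
        """Index one chunk by the first 64 bits of its SHA-1 hash."""
        data = self.files[path][offset:offset + length]
        self.index[sha1(data)[:8]] = (path, offset, length)

    def lookup(self, digest: bytes):
        """Return the chunk with this full SHA-1 digest, or None."""
        loc = self.index.get(digest[:8])
        if loc is None:
            return None
        path, offset, length = loc
        data = self.files.get(path, b"")[offset:offset + length]
        # Recompute the hash: catches stale entries and 64-bit key collisions.
        return data if sha1(data) == digest else None
```

If a file changes after it was indexed, `lookup` simply misses instead of returning wrong data, which is why the index never has to be kept exactly in sync with the file system.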

Protocol
Based on NFS version 3
Adds
    Extensions to exploit inter-file commonality (GETHASH)
    Leases
Compresses all traffic using conventional gzip
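
As a rough illustration of how a GETHASH-style reply could be used on the read path, the sketch below rebuilds a file from a list of chunk hashes and fetches only the chunks missing from the client cache. The function `fetch_file`, its parameters, and the shape of the reply (a list of digest and size pairs) are assumptions made for this example, and `read_chunk` is a stand-in for an ordinary read RPC.

```python
import hashlib


def fetch_file(chunk_hashes, cache, read_chunk):
    """Rebuild a file from a GETHASH-style list of (digest, size) pairs.

    chunk_hashes: list of (sha1_digest, size) describing the file in order
    cache:        dict mapping sha1_digest -> chunk bytes (the client cache)
    read_chunk:   callable(i) -> bytes, stand-in for reading chunk i remotely
    Returns (file contents, number of bytes fetched from the server).
    """
    parts, fetched = [], 0
    for i, (digest, _size) in enumerate(chunk_hashes):
        data = cache.get(digest)
        # Trust a cached chunk only if its recomputed hash still matches.
        if data is None or hashlib.sha1(data).digest() != digest:
            data = read_chunk(i)
            fetched += len(data)
            cache[digest] = data
        parts.append(data)
    return b"".join(parts), fetched
```

A second read of the same file finds every chunk in the cache and fetches nothing.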

File Consistency (I)
Whenever a client makes any RPC on an LBFS file, it gets back a read lease on the file
If a user opens a file whose lease has expired, the client asks the server for the attributes of the file
    This request also grants the client a lease on the file
    Client can check whether it has the current version of the file in its cache
If the file times have changed, the client must obtain the new contents of the file from the server
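
The open-time check described above can be sketched as follows. Everything here is illustrative: `CachedFile`, `open_cached`, the two callables standing in for RPCs, and the 60-second lease length are assumptions, and a single modification time stands in for the file times the client would compare.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    data: bytes          # cached contents
    mtime: float         # modification time seen when the data was cached
    lease_expiry: float  # time at which the read lease runs out


def open_cached(entry, now, getattr_rpc, fetch_rpc, lease_seconds=60.0):
    """Return up-to-date contents for an open(), using the lease when valid.

    getattr_rpc: callable() -> server modification time (stand-in RPC)
    fetch_rpc:   callable() -> current file contents (stand-in RPC)
    lease_seconds is an assumed lease length, not a value from these notes.
    """
    if now >= entry.lease_expiry:
        server_mtime = getattr_rpc()              # ask for current attributes
        entry.lease_expiry = now + lease_seconds  # the request renews the lease
        if server_mtime != entry.mtime:           # cached copy is stale
            entry.data = fetch_rpc()
            entry.mtime = server_mtime
    return entry.data
```

While the lease is valid, an open is served from the cache with no network traffic at all.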

File Consistency (II)
No need for write leases
    LBFS provides close-to-open consistency
    Server never demands back a dirty file
If multiple clients are writing the same file, the last one to close the file will overwrite changes from the others
File updates are atomic
    Limits damage caused by concurrent updates

Security Issues
LBFS uses the SFS security infrastructure
    Servers have public keys
    Messages are encrypted
Specific security issue
    A user could check whether the file system contains a particular chunk of data by observing subtle timing differences in the server’s answers to CONDWRITE requests

Implementation (II)
Uses NFS
Two NFS-related issues
    When the server commits a temporary file to a target file, it must copy the contents of the temporary file onto the target file to preserve the target file’s i-node
    Hard to preserve previous contents of a truncated file
Message order is guaranteed by TCP

Key
First four bars of each workload show upstream bandwidth, the second four downstream bandwidth
CIFS is the native Windows network file system
“Leases+Gzip” uses LBFS file caching, leases, and data compression but not its chunking scheme
“LBFS, new DB” is LBFS starting with a new database

Key
Execution times were normalized
Measurements made over a cable modem link with a 384 Kb/s uplink and a 1.5 Mb/s downlink
LAN data were obtained on a 100 Mb/s full-duplex LAN