Scale and Performance in a Distributed File System
Jinyong Yoon, 2010. 10. 18.
Outline
- Andrew File System
- The Prototype
- Changes for Performance
- Effect of Changes for Performance
- Comparison with a Remote-Open File System
- Conclusion
Andrew File System
- Developed at Carnegie Mellon University
- A distributed file system designed with scale as a primary consideration
- Relies on locality of file references
- Presents a homogeneous, location-transparent file name space to all client workstations
- Uses 4.2 BSD
- Server
  ▪ A set of trusted servers: Vice
- Clients
  ▪ User-level processes: Venus
  ▪ Hooks file system calls
  ▪ Contacts servers only on opens and closes, for whole-file transfer
  ▪ Caches files from Vice
  ▪ Stores modified copies of files back on the servers
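The open/close behavior listed above can be illustrated with a minimal sketch. The class names `ViceServer` and `Venus`, the method names, and the request counter are hypothetical and chosen only for illustration; they are not the actual Vice-Venus interface. The sketch assumes one server message per whole-file fetch and one per store.

```python
class ViceServer:
    """Hypothetical stand-in for a Vice server: stores whole files by pathname."""

    def __init__(self):
        self.files = {}
        self.requests = 0  # number of client-server interactions

    def fetch(self, path):
        self.requests += 1
        return self.files.get(path, b"")

    def store(self, path, data):
        self.requests += 1
        self.files[path] = data


class Venus:
    """Hypothetical client cache manager: contacts the server on open/close only."""

    def __init__(self, server):
        self.server = server
        self.cache = {}     # path -> bytearray holding the whole file locally
        self.dirty = set()  # paths modified since they were opened

    def open(self, path):
        # Whole-file transfer: the entire file is copied into the local cache.
        self.cache[path] = bytearray(self.server.fetch(path))

    def read(self, path):
        return bytes(self.cache[path])  # served locally, no server contact

    def write(self, path, data):
        self.cache[path] += data        # appended locally, no server contact
        self.dirty.add(path)

    def close(self, path):
        # Modified copies are stored back on the server at close time.
        if path in self.dirty:
            self.server.store(path, bytes(self.cache[path]))
            self.dirty.discard(path)


server = ViceServer()
server.files["/vice/usr/notes.txt"] = b"hello"
venus = Venus(server)
venus.open("/vice/usr/notes.txt")
venus.write("/vice/usr/notes.txt", b" world")
venus.write("/vice/usr/notes.txt", b"!")
venus.close("/vice/usr/notes.txt")
```

In this sketch the two writes never reach the server individually; only the open and the close do.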
[Figure: client workstations (user program, Venus, Unix kernel, disk) connected over the network to servers (Vice, Unix kernel, disk)]
- Venus on each client is paired with a dedicated process on the server
- The dedicated process persists on the server
- Each server stored a directory hierarchy mirroring the structure of the Vice files
  ▪ .admin directory: Vice file status information
  ▪ Stub directory: location database
- The Vice-Venus interface names files by their full pathname
  ▪ There is no notion of a low-level name such as an inode
- Before using a cached file, Venus verifies its timestamp
  ▪ Each open of a file thus results in at least one interaction with a server, even if the file is already in the cache and up to date
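A small sketch of the validate-on-open behavior described above follows. The names `PrototypeServer`, `PrototypeVenus`, and `test_valid` are hypothetical, and integer timestamps are an assumption made for brevity; the point is only that every open reaches the server, even on a cache hit.

```python
class PrototypeServer:
    """Hypothetical prototype server keyed by full pathname."""

    def __init__(self):
        self.files = {}  # full pathname -> (timestamp, data)
        self.calls = 0   # server interactions observed

    def test_valid(self, path, cached_timestamp):
        self.calls += 1
        return self.files[path][0] == cached_timestamp

    def fetch(self, path):
        self.calls += 1
        return self.files[path]


class PrototypeVenus:
    """Hypothetical client that checks the timestamp on every open."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # full pathname -> (timestamp, data)

    def open(self, path):
        entry = self.cache.get(path)
        # A cached copy is used only after the server confirms its timestamp.
        if entry is None or not self.server.test_valid(path, entry[0]):
            self.cache[path] = self.server.fetch(path)
        return self.cache[path][1]


proto_server = PrototypeServer()
proto_server.files["/vice/src/main.c"] = (1, b"int main(void) { return 0; }")
proto_venus = PrototypeVenus(proto_server)
for _ in range(3):
    proto_venus.open("/vice/src/main.c")
```

Three opens of an unchanged file cost three server calls here: one fetch and two validity checks.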
- stat primitive
  ▪ Used to test for the presence of files
  ▪ Used to obtain status information before opening files
  ▪ Each stat call involved a cache validity check
  ▪ Increased total running time and the load on servers
- Dedicated process
  ▪ Excessive context-switching overhead
  ▪ Critical resource limits exceeded
  ▪ High virtual memory paging demands
- Remote Procedure Call (RPC)
  ▪ Simplified the implementation
  ▪ Caused network-related resources in the kernel to be exceeded
- Location database
  ▪ Difficult to move users' directories between servers
- Etc.
  ▪ Vice files could be used without recompilation or relinking
- Benchmark
  ▪ Command scripts that operate on a collection of files
  ▪ 70 files (source code of an application program), about 200 KB
  ▪ Stand-alone benchmark with 5 phases
- Skewed distribution of Vice calls
  ▪ TestAuth: validates cache entries
  ▪ GetFileStat: obtains status information about files absent from the cache
- Load unit
  ▪ The load placed on a server by a single client workstation running this benchmark
  ▪ One load unit corresponds to about 5 Andrew users
- CPU/disk utilization profiling
- The performance bottleneck is the server CPU
  ▪ Frequent context switches
  ▪ Time spent by the servers traversing full pathnames
Cache management
- Previous
  ▪ Status cache (in virtual memory) and data cache (on local disk)
  ▪ Only open and close operations are intercepted
  ▪ Modifications to a cached file are reflected back to Vice when the file is closed
- Callback: the server promises to notify Venus before allowing a modification
  ▪ This reduces cache validation traffic
  ▪ Both servers and clients must maintain callback state information
  ▪ There is a potential for inconsistency
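The callback idea can be sketched as follows. The names `CallbackServer`, `CallbackVenus`, and `break_callback` are hypothetical, and the sketch assumes in-process method calls in place of real network messages. Both sides keep callback state, as noted above.

```python
class CallbackServer:
    """Hypothetical server that remembers which clients hold a callback promise."""

    def __init__(self):
        self.files = {}      # path -> data
        self.callbacks = {}  # path -> set of clients holding a callback
        self.calls = 0       # client-initiated server interactions

    def fetch(self, client, path):
        self.calls += 1
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer, path, data):
        self.calls += 1
        # Notify every other callback holder before the modification is applied.
        for client in self.callbacks.get(path, set()) - {writer}:
            client.break_callback(path)
        self.files[path] = data
        self.callbacks[path] = {writer}


class CallbackVenus:
    """Hypothetical client that trusts its cache while it holds a callback."""

    def __init__(self, server):
        self.server = server
        self.cache = {}            # path -> data
        self.has_callback = set()  # client-side callback state

    def open(self, path):
        if path not in self.has_callback:
            self.cache[path] = self.server.fetch(self, path)
            self.has_callback.add(path)
        return self.cache[path]    # no validation traffic while the callback holds

    def store(self, path, data):
        self.cache[path] = data
        self.has_callback.add(path)
        self.server.store(self, path, data)

    def break_callback(self, path):
        self.has_callback.discard(path)


srv = CallbackServer()
srv.files["/vice/doc/plan.txt"] = b"v1"
client_a, client_b = CallbackVenus(srv), CallbackVenus(srv)
for _ in range(3):
    client_a.open("/vice/doc/plan.txt")     # one fetch, then two local hits
calls_after_reads = srv.calls
client_b.store("/vice/doc/plan.txt", b"v2")  # breaks client_a's callback
latest = client_a.open("/vice/doc/plan.txt")  # refetches the new version
```

The inconsistency risk mentioned above shows up if a break notification is lost: in this sketch, a client that never receives `break_callback` would keep serving its stale copy.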
Name resolution
- Previous
  ▪ inode: unique, fixed-length name
  ▪ pathname: one or more per file, variable-length
  ▪ namei routine: maps a pathname to an inode
  ▪ Each Vice pathname involves an implicit namei operation
  ▪ CPU overhead on the servers
- fid: unique, fixed-length, two-level name
  ▪ Maps a component of a pathname to a fid
  ▪ Three 32-bit fields: Volume number, Vnode number, Uniquifier
  ▪ Volume number: identifies a volume on one server
  ▪ Vnode number: index into a file storage information array
  ▪ Uniquifier: allows reuse of Vnode numbers
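As a rough illustration of the fid layout, the sketch below packs the three 32-bit fields into 12 bytes. The `Fid` class, its method names, and the big-endian byte order are assumptions made for this example, not the actual AFS wire format.

```python
import struct
from typing import NamedTuple


class Fid(NamedTuple):
    """Hypothetical fixed-length file identifier with three 32-bit fields."""

    volume: int      # identifies a volume on one server
    vnode: int       # index into the volume's file storage information array
    uniquifier: int  # distinguishes successive uses of the same vnode number

    def pack(self) -> bytes:
        # Three unsigned 32-bit integers (byte order assumed): 12 bytes in total.
        return struct.pack(">III", self.volume, self.vnode, self.uniquifier)

    @classmethod
    def unpack(cls, raw: bytes) -> "Fid":
        return cls(*struct.unpack(">III", raw))


old_fid = Fid(volume=7, vnode=42, uniquifier=1)
new_fid = Fid(volume=7, vnode=42, uniquifier=2)  # same vnode slot, reused later
raw = old_fid.pack()
```

Because the uniquifier differs, a stale reference to `old_fid` cannot be mistaken for the file that later reuses vnode 42.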
Communication and server process structure
- Uses Lightweight Processes (LWPs) within a single server process instead of one process per client
- An LWP is bound to a particular client only for the duration of a single server operation
Low-level storage representation
- Files are accessed by their inodes
  ▪ vnode on the servers
  ▪ inode on the clients
[Figure: workstation with user program, Unix kernel, Unix file system, and local disk; labels: Unix file system calls, non-local file operations]
- If D is in the cache and has a callback on it
- If D is in the cache but has no callback on it
- If D is not in the cache
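The three cases for a directory D can be sketched as a pathname walk. The slide gives only the conditions; the actions used here (use the cached copy, revalidate with the server, fetch from the server) follow the usual description of the revised AFS design and should be read as an assumption. The function name `resolve_directories` and the use of integer directory versions are hypothetical.

```python
def resolve_directories(path, cache, callbacks, server_dirs):
    """Walk each directory D along `path` and return the action taken for each."""
    actions = []
    components = [c for c in path.split("/") if c][:-1]  # drop the file name
    current = ""
    for name in components:
        current += "/" + name
        if current in cache and current in callbacks:
            # Cached with a callback: usable without contacting the server.
            actions.append((current, "use cached copy"))
        elif current in cache:
            # Cached without a callback: check with the server, refresh if stale.
            if cache[current] != server_dirs[current]:
                cache[current] = server_dirs[current]
            callbacks.add(current)
            actions.append((current, "revalidate with server"))
        else:
            # Not cached: fetch from the server and establish a callback.
            cache[current] = server_dirs[current]
            callbacks.add(current)
            actions.append((current, "fetch from server"))
    return actions


server_dirs = {"/vice": 1, "/vice/usr": 1, "/vice/usr/alice": 3}  # dir -> version
dir_cache = {"/vice": 1, "/vice/usr": 1}
dir_callbacks = {"/vice"}
target = "/vice/usr/alice/notes.txt"
first = resolve_directories(target, dir_cache, dir_callbacks, server_dirs)
second = resolve_directories(target, dir_cache, dir_callbacks, server_dirs)
```

On the second walk every directory is cached with a callback, so no step contacts the server.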
Scalability
- An Andrew workstation runs the benchmark 19% slower than a stand-alone workstation
- The prototype was 70% slower
Remote open
- The data in a file are not fetched en masse
- Instead, the remote site potentially participates in each individual read and write operation
- The file is actually opened on the remote site rather than the local site
- Example: NFS
Advantage of a remote-open file system
- Low latency
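The contrast between the two approaches can be made concrete by counting server interactions for one open-use-close session. This is a simplified upper-bound model, assuming no client-side block caching for remote open and no callback hit for whole-file transfer; the function names are hypothetical.

```python
def whole_file_messages(num_reads: int, num_writes: int) -> int:
    """Server interactions when the whole file is fetched at open."""
    fetch_on_open = 1
    store_on_close = 1 if num_writes > 0 else 0  # store back only if modified
    return fetch_on_open + store_on_close


def remote_open_messages(num_reads: int, num_writes: int) -> int:
    """Server interactions when the remote site takes part in every operation."""
    open_and_close = 2
    return open_and_close + num_reads + num_writes


read_only_whole = whole_file_messages(100, 0)
read_only_remote = remote_open_messages(100, 0)
```

The counts say nothing about bytes moved: with remote open, the first read does not wait for the whole file to arrive, which is the latency advantage noted above.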
Scale impacts Andrew in areas besides performance and operability