UNIX Internals – the New Frontiers
Distributed File Systems
Difference between DOS and DFS
- A distributed OS (DOS) looks like a centralized OS but runs simultaneously on multiple machines. It may provide a file system shared by all its host machines.
- A distributed FS (DFS) is a software layer that manages communication between conventional operating systems and file systems.
General Characteristics of DFS
- Network transparency
- Location transparency and location independence
- User mobility
- Fault tolerance
- Scalability
- File mobility
Design Considerations
- Name space
- Stateful or stateless
- Semantics of sharing
  - UNIX semantics
  - Session semantics
- Remote access method
Network File System (NFS)
- Based on the client-server model
- Client and server communicate via remote procedure calls (RPC)
User Perspective
- An NFS server exports one or more file systems
- Hard mount: the client must get a reply, so it keeps retrying until the server responds
- Soft mount: the client returns an error if the server does not respond
- Spongy mount: hard for mount, soft for I/O
- Commands:
    mount -t nfs nfssrv:/usr /usr
    mount -t nfs nfssrv:/usr/u1 /u1
    mount -t nfs nfssrv:/usr /users
    mount -t nfs nfssrv:/usr/local /usr/local
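The difference between hard and soft mounts can be sketched as a retry loop. This is a minimal illustration, not the NFS client code: `nfs_call`, `send`, and `max_tries` are hypothetical names, and a timeout is modeled as `send()` returning `None`.

```python
def nfs_call(send, hard, max_tries=3):
    """Call send() until it returns a reply.

    send      -- hypothetical function that returns a reply, or None on timeout
    hard      -- True for a hard mount, False for a soft mount
    max_tries -- assumed retry limit; it only applies to soft mounts
    """
    tries = 0
    while True:
        tries += 1
        reply = send()
        if reply is not None:
            return reply
        if not hard and tries >= max_tries:
            # Soft mount: give up and report an error to the caller.
            raise TimeoutError("NFS server not responding")
        # Otherwise retransmit. A hard mount never leaves this loop
        # without a reply.
```

With a hard mount the caller blocks for as long as the server is down; with a soft mount the caller sees an error after the assumed number of tries.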
Design goals
- Not restricted to UNIX
- Not dependent on any particular hardware
- Simple recovery mechanisms
- Transparent access to remote files
- UNIX semantics
- NFS performance must be comparable to that of a local disk
- Transport-independent
NFS components
- NFS protocol
- RPC protocol
- XDR (External Data Representation)
- NFS server code
- NFS client code
- Mount protocol
- Daemon processes (nfsd, mountd, biod)
- NLM (Network Lock Manager) and NSM (Network Status Monitor)
Statelessness
- Each request is independent
- It makes crash recovery simple
  - Client crash
  - Server crash
- Problem: the server must commit all modifications to stable storage before replying to a request.
10.4 The protocol suite
Why XDR?
- Differences among internal representations of data elements: byte order, sizes of types
- Opaque (byte stream)
- Typed
- Little-endian
- Big-endian
XDR
- Integers: 32 bits, byte 0 leftmost and most significant; signed integers use two's complement
- Variable-length opaque data: length (4 bytes), then the data, NULL-padded to a 4-byte boundary
- Strings: length (4 bytes), then the ASCII string, NULL-padded to a 4-byte boundary
- Arrays: size (4 bytes), then elements of the same type
- Structures: components in natural order
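The encoding rules above can be sketched with Python's standard `struct` module. The function names (`xdr_int`, `xdr_opaque`, `xdr_string`, `xdr_int_array`) are illustrative only and are not part of any NFS or RPC library.

```python
import struct

def xdr_int(value: int) -> bytes:
    # 32-bit signed integer, most significant byte first (big-endian).
    return struct.pack(">i", value)

def xdr_opaque(data: bytes) -> bytes:
    # 4-byte length, then the bytes, NULL-padded to a multiple of 4.
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad

def xdr_string(text: str) -> bytes:
    # A string is encoded like opaque data holding ASCII characters.
    return xdr_opaque(text.encode("ascii"))

def xdr_int_array(values) -> bytes:
    # 4-byte element count, then each element in order.
    return struct.pack(">I", len(values)) + b"".join(xdr_int(v) for v in values)
```

For example, `xdr_string("nfs")` yields the 4-byte length 3, the three ASCII bytes, and one NULL pad byte, 8 bytes in total.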
RPC
- Specifies the format of communications between the client and the server
- Sun RPC: synchronous requests only
- Implemented on UDP/IP
- Authentication to identify callers: AUTH_NULL, AUTH_UNIX, AUTH_SHORT, AUTH_DES, and AUTH_KERB
- RPC language compiler: rpcgen
10.5 NFS Implementation
- Control flow
- Vnode
- Rnode
File Handle
- The server assigns a file handle on lookup, create, or mkdir; subsequent I/O operations use it
- A file handle is an opaque 32-byte object: <file system ID, inode number, generation number>
- The generation number is used to detect a stale handle, that is, one whose file was deleted and whose inode has been allocated to another file
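A minimal sketch of how a generation number exposes a stale handle. The byte layout (three 4-byte fields, zero-padded to 32 bytes) and the `inode_table` dictionary are assumptions made for illustration; the real layout is private to the server, which is why clients treat the handle as opaque.

```python
import struct

FH_SIZE = 32
_FMT = ">III"  # assumed layout: file system ID, inode number, generation

def make_handle(fsid, inode, generation):
    # Pack the three fields and pad with zero bytes to 32 bytes.
    body = struct.pack(_FMT, fsid, inode, generation)
    return body.ljust(FH_SIZE, b"\x00")

def parse_handle(handle):
    # Only the server interprets the contents of a handle.
    return struct.unpack(_FMT, handle[:struct.calcsize(_FMT)])

def is_stale(handle, inode_table):
    # inode_table (hypothetical) maps (fsid, inode) -> current generation.
    fsid, inode, generation = parse_handle(handle)
    return inode_table.get((fsid, inode)) != generation
```

If inode 42 is freed and reused, the server bumps its generation, so an old handle that still carries the previous generation no longer matches and is rejected as stale.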
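The mount operation
- nfs_mount(): sends an RPC request with the pathname as its argument
- The mountd daemon translates the pathname and performs its checks
- On success, it replies with a file handle
- The client initializes the vfs and records the server's name and address
- The client allocates an rnode and a vnode
- The server must check access rights on each request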
Pathname Lookup
- Client:
  - Initiates lookup during open, create, and stat
  - Starting from the current or root directory, proceeds one component at a time
  - Sends a request to the server if the directory is an NFS directory
- Server:
  - From file handle -> FS ID -> vfs -> VGET -> vnode -> VOP_LOOKUP -> vnode & pointer
  - VOP_GETATTR -> VOP_FID -> file handle
  - Reply message = status + file handle + file attributes
- Client:
  - Gets the reply, allocates an rnode and a vnode, copies the information, and proceeds to search for the next component
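The component-at-a-time lookup can be sketched with a toy in-memory server. `ToyServer`, `resolve`, and the string handles are hypothetical; the point is only that each pathname component costs one lookup request.

```python
class ToyServer:
    def __init__(self, tree):
        # tree maps (directory handle, component name) -> (file handle, attributes)
        self.tree = tree
        self.calls = 0  # number of lookup requests received

    def lookup(self, dir_handle, name):
        self.calls += 1
        entry = self.tree.get((dir_handle, name))
        if entry is None:
            return ("ENOENT", None, None)
        handle, attrs = entry
        # Reply = status + file handle + file attributes.
        return ("OK", handle, attrs)

def resolve(server, root_handle, path):
    # Client side: walk the path, sending one lookup per component.
    handle, attrs = root_handle, None
    for component in path.strip("/").split("/"):
        if not component:
            continue
        status, handle, attrs = server.lookup(handle, component)
        if status != "OK":
            raise FileNotFoundError(component)
    return handle, attrs
```

Resolving `/usr/local` against this toy server takes two requests, one for `usr` and one for `local`.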
10.6 UNIX Semantics
- Because it is stateless, NFS has a few incompatibilities with UNIX.
- Open file permission
  - UNIX checks permissions at open
  - NFS checks permissions on each read and write
  - In NFS, the server always allows the owner of the file to read or write the file
  - What about writes to a write-protected file?
  - The client saves the attributes containing the file permissions when the file is opened
Deletion of open files
- The server has no idea which files are open
- The client renames the file to be deleted, and deletes it when the file is closed
- What if the file is deleted from a different machine?
Reads and Writes
- UNIX locks the vnode at the start of I/O
- An NFS client can lock the vnode, but only against processes on the same machine
- NFS offers no protection against overlapping I/O requests
- The NLM (Network Lock Manager) protocol can be used, but its locking is only advisory
10.7 NFS Performance
Bottlenecks
- Writes must be committed to stable storage
- Fetching file attributes requires one RPC call per file
- Processing retransmitted requests adds to the load on the server
Client-side caching
- Caches both file blocks and file attributes
- To avoid using invalid data:
  - The kernel keeps an expiry time for cached entries
  - After 60 seconds, the client rechecks the file's modify time
- This reduces but does not eliminate the problem
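A minimal sketch of an attribute cache with an expiry time, showing why the stale-data window shrinks but does not disappear. `AttrCache`, `fetch`, and the injectable `clock` are assumptions made for illustration; the 60-second default comes from the slide.

```python
import time

class AttrCache:
    def __init__(self, fetch, ttl=60.0, clock=time.monotonic):
        self.fetch = fetch      # hypothetical function: handle -> attributes (one RPC)
        self.ttl = ttl          # seconds a cached entry is trusted
        self.clock = clock      # injectable so the behaviour can be tested
        self.entries = {}       # handle -> (attributes, expiry time)

    def getattr(self, handle):
        now = self.clock()
        entry = self.entries.get(handle)
        if entry is not None and now < entry[1]:
            # Still within the expiry time: answer from the cache,
            # even if the file has changed on the server meanwhile.
            return entry[0]
        attrs = self.fetch(handle)
        self.entries[handle] = (attrs, now + self.ttl)
        return attrs
```

Until the entry expires, the client keeps returning the cached attributes without contacting the server, so a change made by another client stays invisible for up to the expiry time.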
Deferral of writes
- Asynchronous writes for full blocks
- Delayed writes for partial blocks
- Delayed writes are flushed when the file is closed, or every 30 seconds by the biod daemon
- The server uses an NVRAM buffer and flushes the buffer to disk
- Write-gathering:
  - The server waits, processes more than one write to the same file together, and replies to each
  - The server processes the gathered write requests
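The retransmissions cache
- Idempotent requests
- Nonidempotent requests
- Problem:
  - The client sends a remove request
  - The server removes the file and sends a success reply, but the reply is lost
  - The client retransmits the remove request
  - The server processes the remove request again
  - The remove fails, and the server sends a failure reply
  - The client receives the error message
- Retransmissions (xid) cache (server):
  - Checks xid, procedure number, and client ID
  - Checks the cache only when a request fails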
New implementation
- Caches all requests
- Checks xid, procedure number, client ID, state field, and timestamp
- If the request is in progress, discards the retransmission
- If the request is done, discards the retransmission if the timestamp shows the request is in the throwaway window (3-6 s)
- Otherwise, processes the request if it is idempotent
- For a nonidempotent request, checks whether the file has been modified: if not, sends success; otherwise, retries it
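The decision rules above can be sketched as a small table lookup. `RetransCache`, the returned action strings, the `IDEMPOTENT` set, and the `file_modified_since` flag are all hypothetical names; a real server would obtain that flag by comparing the file's modify time with the cached timestamp.

```python
IN_PROGRESS, DONE = "in_progress", "done"
IDEMPOTENT = {"read", "getattr", "lookup"}  # assumed examples

class RetransCache:
    def __init__(self, window=3.0):
        self.window = window   # throwaway window in seconds
        self.entries = {}      # (xid, procedure, client) -> (state, timestamp)

    def classify(self, xid, proc, client, now, file_modified_since=False):
        key = (xid, proc, client)
        entry = self.entries.get(key)
        if entry is None:
            # First time this request is seen: record it and process it.
            self.entries[key] = (IN_PROGRESS, now)
            return "process"
        state, stamp = entry
        if state == IN_PROGRESS:
            return "discard"
        if now - stamp < self.window:
            # Finished very recently: the reply is probably still in flight.
            return "discard"
        if proc in IDEMPOTENT:
            return "process"
        # Nonidempotent: follow the slide's rule on file modification.
        return "retry" if file_modified_since else "reply_success"

    def finish(self, xid, proc, client, now):
        self.entries[(xid, proc, client)] = (DONE, now)
```

In the remove scenario, the retransmitted request finds a completed entry, so the client gets a success reply instead of a spurious failure.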
10.9 NFS Security
NFS Access Control
- Applied on mount and on each request
- Controlled by an exports list
  - Mount: checks the list and denies ineligible clients
  - Request: authentication information in AUTH_UNIX form (UID, GID)
- Loophole: an impostor can use a forged <UID, GID> to access the files of others
UID Remapping
- A translation map for each client
- The same UID may map to a different UID on the server
- A UID with no match in the map is mapped to nobody
- Can be implemented at the RPC level
- Can be implemented at the NFS level
  - By merging the map with the /etc/exports file
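A per-client translation map can be sketched as a nested dictionary. `UidMap` and the client names are hypothetical, and the numeric value chosen for nobody is an assumption (it varies between systems).

```python
NOBODY = 65534  # assumed UID for "nobody"; the actual value is system-specific

class UidMap:
    def __init__(self, per_client):
        # per_client maps client name -> {client UID: server UID}
        self.per_client = per_client

    def remap(self, client, uid):
        # Any UID without an entry for this client becomes nobody.
        return self.per_client.get(client, {}).get(uid, NOBODY)
```

With a map such as `{"clientA": {100: 500}}`, UID 100 from clientA becomes 500 on the server, while the same UID from any other client, or any unlisted UID, becomes nobody.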
Root Remapping
- Maps the superuser to nobody
- Limits the client's superuser in accessing files on the server
- The UNIX framework is designed for an isolated, multi-user environment in which the users trust each other
10.10 NFS Version 3
- Commit request
  - The client writes; the kernel sends an asynchronous write
  - The server saves the data to its local cache and replies immediately
  - The client holds its copy of the data until the process closes the file, then sends a commit request
  - The server flushes the data to disk
- File length: from 32 bits (4 GB) to 64 bits (2^34 GB)
- READDIRPLUS = (LOOKUP + GETATTR)
  - Returns names, file handles, and file attributes
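The write and commit sequence can be sketched with two toy classes. `V3Server`, `V3Client`, and their methods are hypothetical names, and the sketch leaves out how a real client detects that the server has restarted. It shows only why the client keeps its copy until the commit succeeds.

```python
class V3Server:
    def __init__(self):
        self.cache = {}  # volatile: lost on a crash
        self.disk = {}   # stable storage

    def write_async(self, handle, offset, data):
        # Save to the local cache and reply immediately.
        self.cache.setdefault(handle, {})[offset] = data
        return "OK"

    def commit(self, handle):
        # Flush everything cached for this file to disk.
        self.disk.setdefault(handle, {}).update(self.cache.pop(handle, {}))
        return "OK"

    def crash(self):
        self.cache.clear()

class V3Client:
    def __init__(self, server):
        self.server = server
        self.pending = {}  # data held until the commit succeeds

    def write(self, handle, offset, data):
        self.pending.setdefault(handle, {})[offset] = data
        self.server.write_async(handle, offset, data)

    def close(self, handle):
        # Commit on close, then release the client's copy.
        self.server.commit(handle)
        self.pending.pop(handle, None)
```

If the server crashes between the write and the commit, the data is gone from its cache but still present in the client's `pending` copy, so it can be sent again.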
Other DFS
The Andrew File System (10.15 – 10.17)
The DCE Distributed File System (10.18 – 10.18.5)