distributedfilesys.ppt
-
7/28/2019 DISTRIBUTEDFILESYS.ppt
1/16
University of Pennsylvania
11/21/00
Distributed File Systems
-
Remote Files
File service vs. file server
File service interface: the specification of what the file system offers to its clients.
File server: a process that runs on some machine and helps implement the file service.
File Service Model (Fig 13-1)
upload/download model
remote access model
Comparison between the two models
The directory service
creating and deleting directories
naming and renaming files
moving files
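To make the contrast between the two file service models concrete, here is a minimal sketch assuming a hypothetical in-memory `FileServer` class (not from the slides). In the upload/download model the client fetches the whole file and works on it locally; in the remote access model each operation (`read`, `write`) is executed at the server and only the requested bytes travel. The `bytes_sent` counter is an illustrative addition used to compare network traffic.

```python
class FileServer:
    """Hypothetical in-memory file server that counts bytes sent to clients."""

    def __init__(self):
        self.files = {}        # file name -> bytes
        self.bytes_sent = 0    # traffic from server to clients

    # Upload/download model: the whole file moves in each direction.
    def download(self, name):
        data = self.files.get(name, b"")
        self.bytes_sent += len(data)
        return data

    def upload(self, name, data):
        self.files[name] = data

    # Remote access model: each operation is executed at the server.
    def read(self, name, offset, count):
        data = self.files.get(name, b"")[offset:offset + count]
        self.bytes_sent += len(data)
        return data

    def write(self, name, offset, data):
        old = self.files.get(name, b"").ljust(offset, b"\0")  # pad if writing past EOF
        self.files[name] = old[:offset] + data + old[offset + len(data):]


def head_upload_download(server, name, n):
    local_copy = server.download(name)   # fetch the entire file first
    return local_copy[:n]                # then read it locally


def head_remote_access(server, name, n):
    return server.read(name, 0, n)       # ask the server for just n bytes


srv = FileServer()
srv.upload("report.txt", b"x" * 10_000)
head_upload_download(srv, "report.txt", 5)
after_download = srv.bytes_sent                  # whole file was transferred
head_remote_access(srv, "report.txt", 5)
after_remote = srv.bytes_sent - after_download   # only 5 bytes were transferred
```

Reading 5 bytes costs the full 10,000 bytes under upload/download but only 5 under remote access; the trade-off reverses when the client goes on to touch most of the file.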
-
Goals
1 Network transparency: users do not have to be aware of the location of files to access them.
location transparency: the name of a file does not reveal any hint of the file's physical storage location.
/server1/dir1/dir2/X
server1 can be moved anywhere (e.g., from CIS to SEAS).
location independence: the name of a file does not need to be changed when the file's physical storage location changes.
The above file X cannot be moved to server2 (e.g., when server1 is full and server2 is not) without changing its name, so this scheme is location transparent but not location independent.
2 High availability: files should remain accessible despite system failures or scheduled activities such as backups and the addition of nodes.
-
Architecture
Computation model
file servers -- machines dedicated to storing files and performing storage and retrieval operations (for high performance)
clients -- machines used for computational activities; may have a local disk for caching remote files
Two most important services
name server -- maps user-specified names to stored objects, files and directories
cache manager -- reduces network delay and disk delay
problem: inconsistency
Typical data access actions: open, close, read, write, etc.
-
Design Issues
Naming and name resolution
Semantics of file sharing (Fig 13-4, Fig 13-5)
Stateless versus stateful servers (Fig 13-8)
Caching -- where to store files (Fig 13-9)
Cache consistency (Fig 13-11)
Replication (Fig 13-12)
-
Naming and Name Resolution
a name space -- a collection of names
name resolution -- mapping a name to an object
same or different view of a directory hierarchy (Fig. 13-3)
3 traditional ways to name files in a distributed environment
concatenate the host name to the names of files stored on that host, e.g., /machine/usr/foo:
system-wide uniqueness guaranteed and simple to locate a file; however, not network transparent and not location independent
mount remote directories onto local directories:
once mounted, files can be referenced in a location-transparent manner
provide a single global directory:
requires a unique file name for every file; location independent, but cannot encompass heterogeneous environments and wide geographical areas
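The mount approach can be sketched as a client-side mount table consulted during name resolution. This is an illustrative sketch only: the table contents, server names, and the `resolve` helper are hypothetical, and longest-prefix matching is an assumed (though common) resolution rule. The point is that the user-visible path contains no host name, so remounting a directory from another server changes only the table.

```python
# Hypothetical mount table: local mount point -> (server, exported directory).
MOUNTS = {
    "/usr/share": ("server1", "/export/share"),
    "/home": ("server2", "/export/home"),
}


def resolve(path, mounts=MOUNTS):
    """Map a local path name to (server, remote path) using the longest matching mount point."""
    best = None
    for point in mounts:
        if path == point or path.startswith(point.rstrip("/") + "/"):
            if best is None or len(point) > len(best):
                best = point
    if best is None:
        return ("local", path)              # not under any mount point
    server, export = mounts[best]
    return (server, export + path[len(best):])


# Moving /home to another server changes the table, not the names users type.
remounted = dict(MOUNTS, **{"/home": ("server3", "/vol/home")})
```

By contrast, a host-prefixed name such as /machine/usr/foo would have to change whenever the file moved to a different machine.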
-
Semantics of File Sharing
Consistency Semantics Problem (Fig 13-4): read after write
Assume open; reads/writes; close
1 UNIX semantics: the value read is the value stored by the last write. Writes to an open file are visible immediately to others that have this file open at the same time. Easy to implement if there is one server and no cache.
2 Session semantics: writes to an open file by a user are not visible immediately to other users that already have the file open. Once a file is closed, the changes made to it are visible to sessions started later.
3 Immutable-Shared-Files semantics:
A sharable file cannot be modified. File names cannot be reused and file contents may not be altered. Simple to implement.
4 Transactions: all changes have the all-or-nothing property. For P1 = W1;W2 and P2 = R1;R2, the interleaving W1,R1,R2,W2 is not allowed.
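The difference between UNIX semantics and session semantics can be sketched with a shared dictionary standing in for the server's master copies. The classes `UnixFile` and `SessionFile` are hypothetical, and the sketch simplifies every write to a whole-file replacement. Under session semantics each open takes a private snapshot and `close` publishes it; under UNIX semantics every write goes straight to the single shared copy.

```python
class SessionFile:
    """Hypothetical sketch of session semantics over a shared master copy."""

    def __init__(self, store, name):
        self.store, self.name = store, name
        self.copy = store[name]              # private snapshot taken at open

    def read(self):
        return self.copy

    def write(self, data):
        self.copy = data                     # invisible to other open sessions

    def close(self):
        self.store[self.name] = self.copy    # visible only to sessions opened later


class UnixFile:
    """UNIX semantics: every write updates the single shared copy immediately."""

    def __init__(self, store, name):
        self.store, self.name = store, name

    def read(self):
        return self.store[self.name]

    def write(self, data):
        self.store[self.name] = data


master = {"f": "v0"}
a, b = SessionFile(master, "f"), SessionFile(master, "f")
a.write("v1")
b_before_close = b.read()        # b still sees its own snapshot
a.close()
b_after_close = b.read()         # b opened before the close, so still the old value
c_later = SessionFile(master, "f").read()   # a session started later sees the change

shared = {"f": "v0"}
x, y = UnixFile(shared, "f"), UnixFile(shared, "f")
x.write("v1")
y_sees = y.read()                # visible immediately
```

Note that with session semantics, if two sessions modify the same file, the last one to close overwrites the other's changes.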
-
Stateful versus Stateless Service
Two approaches to server-side information
1 stateful file server
a client performs open on a file
the server keeps file information (e.g., file descriptor entry, offset)
Adv: increased performance
On server crash, it loses all its volatile state information
On client crash, the server needs to learn of the failure in order to reclaim state space
2 stateless file server -- each request is self-contained
each request identifies the file, the position, and the operation (read/write)
server failure is identical to a slow server (the client retries...)
each request must be idempotent.
NFS employs this.
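Why idempotence matters can be shown with two toy servers. Both classes are hypothetical and are not NFS's actual protocol. The stateless server receives the file, offset, and count in every request, so a retried request returns the same bytes. The stateful server keeps an offset per descriptor, so a retry after a lost reply silently advances the offset again, and a restarted server no longer recognizes the descriptor.

```python
class StatelessServer:
    """Each request names the file, the position, and the count; nothing is remembered."""

    def __init__(self, files):
        self.files = files

    def read(self, name, offset, count):
        return self.files[name][offset:offset + count]


class StatefulServer:
    """Keeps an open-file table with a current offset per descriptor."""

    def __init__(self, files):
        self.files = files
        self.table = {}          # fd -> [name, offset]; lost if the server crashes
        self.next_fd = 0

    def open(self, name):
        fd = self.next_fd
        self.next_fd += 1
        self.table[fd] = [name, 0]
        return fd

    def read(self, fd, count):
        name, offset = self.table[fd]
        data = self.files[name][offset:offset + count]
        self.table[fd][1] = offset + len(data)   # every read changes server state
        return data


files = {"f": b"abcdefgh"}

stateless = StatelessServer(files)
first = stateless.read("f", 0, 4)
retry = stateless.read("f", 0, 4)    # same request resent after a lost reply

stateful = StatefulServer(files)
fd = stateful.open("f")
lost = stateful.read(fd, 4)          # suppose this reply is lost in the network
retried = stateful.read(fd, 4)       # the retry returns the next 4 bytes instead
```

A fresh `StatelessServer(files)` after a crash can serve the same requests unchanged, which is why the client cannot distinguish a crashed server from a slow one.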
-
Caching
Four places to store files (Fig. 13-9)
server's disk: slow performance
server caching: in main memory
cache management issues: how much to cache, replacement strategy
still slow due to network delay
used in high-performance web search engine servers
client caching in main memory
can be used by diskless workstations
faster to access from main memory than disk
competes with the virtual memory system for physical memory space
Three options (Fig. 13-10)
client cache on a local disk
large files can be cached
the virtual memory management is simpler
a workstation can function even when it is disconnected from the network
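The server-side cache management issues listed above ("how much to cache, replacement strategy") can be sketched with a block cache in main memory in front of a slow disk. The slides do not name a replacement strategy; least-recently-used (LRU) is assumed here purely for illustration, and `BlockCache` is a hypothetical class.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical server-side block cache in main memory, using LRU replacement."""

    def __init__(self, capacity, disk):
        self.capacity = capacity       # how much to cache, in blocks
        self.disk = disk               # slow backing store: block id -> data
        self.blocks = OrderedDict()    # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)    # mark as most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.disk[block_id]               # slow disk access
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used block
        return data


disk = {i: f"block{i}" for i in range(10)}
cache = BlockCache(capacity=2, disk=disk)
for block_id in [0, 1, 0, 2, 1]:
    cache.read(block_id)
```

Even with every hit served from memory, a client still pays the network round trip, which is the "still slow due to network delay" point above.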
-
A Comparison of Caching and Remote Service
1 Caching reduces remote accesses (especially when locality is exploited), which reduces network traffic and server load.
2 Total network overhead is lower when transferring big chunks of data (caching) than for a series of responses to specific requests (remote service).
3 Disk access can be optimized better for large requests than for random disk blocks.
4 The cache-consistency problem is the major drawback of caching. If there are frequent writes, the overhead due to the consistency problem is significant.
5 The OS is simpler for remote service.
-
Cache Consistency
Reflecting changes in a local cache to the master copy
Reflecting changes in the master copy to the local caches
[Figure: master copy and cached copies (Copy 1, Copy 2), with "write" and "update" arrows]
-
Update algorithms for client caching
write-through: all writes are carried out at the server immediately
Reliable: little information is lost in the event of a client crash
Slow: the cache is not that useful
delayed-write: delays writing at the server
possible to perform many writes to a block in the cache before it is written back
if data is written and then deleted immediately, it need not be written at all (20-30% of new data is deleted within 30 secs)
write-on-close: delay writing until the file is closed at the client
if the file is open for a short duration, works fine
if the file is open for long, susceptible to losing data in the event of a client crash
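The three update policies can be compared by counting how many writes reach the server. The sketch below assumes a hypothetical `ClientCache` class with a `flush` method standing in for the periodic (e.g., every 30 seconds) write-back of delayed-write; real clients would also batch, time, and retry these writes. Deletion messages to the server are not counted.

```python
class ClientCache:
    """Hypothetical client block cache supporting three update policies."""

    def __init__(self, server, policy):
        assert policy in ("write-through", "delayed-write", "write-on-close")
        self.server = server          # master copies: block id -> data
        self.policy = policy
        self.cache = {}
        self.dirty = set()            # blocks modified locally but not yet sent
        self.server_writes = 0        # number of writes sent to the server

    def write(self, block, data):
        self.cache[block] = data
        if self.policy == "write-through":
            self._send(block)         # carried out at the server immediately
        else:
            self.dirty.add(block)     # written back later

    def delete(self, block):
        self.cache.pop(block, None)
        self.dirty.discard(block)     # a dirty block that is deleted is never written
        self.server.pop(block, None)

    def flush(self):
        """Periodic write-back of all dirty blocks (delayed-write)."""
        for block in sorted(self.dirty):
            self._send(block)
        self.dirty.clear()

    def close(self):
        if self.policy == "write-on-close":
            self.flush()

    def _send(self, block):
        self.server[block] = self.cache[block]
        self.server_writes += 1


wt = ClientCache({}, "write-through")
for i in range(3):
    wt.write(0, f"v{i}")

dw_server = {}
dw = ClientCache(dw_server, "delayed-write")
for i in range(3):
    dw.write(0, f"v{i}")          # many writes to one block are absorbed by the cache
dw.write(1, "tmp")
dw.delete(1)                      # temporary data deleted before it is written back
dw.flush()

wc = ClientCache({}, "write-on-close")
wc.write(0, "a")
wc.write(0, "b")
writes_before_close = wc.server_writes
wc.close()
```

Write-through sends all 3 writes; delayed-write sends only the final value of block 0 and nothing for the deleted block; write-on-close sends nothing until `close`, which is exactly the window in which a client crash would lose data.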
-
Cache Coherence
How can locally cached data be kept consistent with the master data when the data has been modified by another client?
1 Client-initiated approach -- the client checks validity with the server:
on every access: too much overhead
on first access to a file (e.g., file open)
at every fixed time interval
2 Server-initiated approach -- the server records, for each client, the (parts of) files it caches. When the server detects a potential inconsistency, it reacts.
3 Disallow caching when concurrent-write sharing occurs. Allow many readers. If a client opens a file for writing, inform all the clients to purge their cached data.
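The client-initiated approach with a check at file open can be sketched as follows. The per-file version counter is an assumption made for illustration (real systems often compare modification timestamps), and `Server`, `Client`, `get_version`, and `fetch` are hypothetical names. The validity check is cheap; the full fetch happens only when the cached copy is missing or stale.

```python
class Server:
    """Hypothetical server that stamps each file with a version counter."""

    def __init__(self):
        self.data = {}
        self.version = {}

    def write(self, name, data):
        self.data[name] = data
        self.version[name] = self.version.get(name, 0) + 1

    def get_version(self, name):          # cheap validity check
        return self.version[name]

    def fetch(self, name):                # expensive full transfer
        return self.data[name], self.version[name]


class Client:
    """Client-initiated validation, performed on every open."""

    def __init__(self, server):
        self.server = server
        self.cache = {}                   # name -> (data, version)
        self.fetches = 0

    def open_and_read(self, name):
        cached = self.cache.get(name)
        if cached is None or cached[1] != self.server.get_version(name):
            self.cache[name] = self.server.fetch(name)   # missing or stale: refetch
            self.fetches += 1
        return self.cache[name][0]


srv = Server()
srv.write("f", "v1")
client = Client(srv)
first = client.open_and_read("f")
second = client.open_and_read("f")       # still valid: served from the cache
fetches_before_update = client.fetches
srv.write("f", "v2")                     # another client modifies the file
third = client.open_and_read("f")        # stale copy detected at open
```

Checking only at open means a file that stays open can still be read stale after another client's write; checking on every access closes that gap at the cost of a server round trip per access.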
-
Cache consistency, cont.
Potential inconsistency: in session semantics, it arises when a client closes a modified file.
In UNIX semantics, the server must be notified whenever a file is opened, and the intended mode (read or write) must be indicated for every open. Caching is disabled when a file is opened in conflicting modes.
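The UNIX-semantics rule above can be sketched as server-side bookkeeping of open modes. `OpenTracker` is a hypothetical class, and "conflicting modes" is taken here to mean a writer plus at least one other client. A full server would also notify clients that are already caching the file when a conflict appears; this sketch only returns the caching decision for each open.

```python
from collections import defaultdict


class OpenTracker:
    """Hypothetical server bookkeeping of the mode of every open, per file."""

    def __init__(self):
        self.opens = defaultdict(list)    # file name -> list of (client, mode)

    def open(self, client, name, mode):
        assert mode in ("r", "w")
        self.opens[name].append((client, mode))
        return self.cacheable(name)       # tell the client whether it may cache

    def close(self, client, name, mode):
        self.opens[name].remove((client, mode))

    def cacheable(self, name):
        modes = [m for _, m in self.opens[name]]
        clients = {c for c, _ in self.opens[name]}
        # Conflict: some client writes while another client also has the file open.
        conflicting = "w" in modes and len(clients) > 1
        return not conflicting


tracker = OpenTracker()
reader_a = tracker.open("A", "f", "r")
reader_b = tracker.open("B", "f", "r")   # many readers may all cache
writer_c = tracker.open("C", "f", "w")   # a writer joins: caching disabled
tracker.close("C", "f", "w")
after_writer_closed = tracker.cacheable("f")
```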
-
Replication
Reasons: increase reliability, improve availability, balance the servers' workload
how to make replication transparent (Fig. 13-12)
how to keep the replicas consistent
Problems -- mainly with updates:
1 a replica is not updated because its server has failed
2 the network is partitioned
Replication management:
1 weighted voting for reads and writes
2 a current synchronization site for each file group to control access
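Weighted voting can be sketched in the style of Gifford's scheme, which is assumed here since the slide gives no details. Each replica holds some votes (N in total); a read must gather r votes and a write must gather w votes, with r + w > N so every read quorum overlaps the latest write quorum, and 2w > N so two writes cannot proceed on disjoint quorums. The `Replica` fields and helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    votes: int
    version: int = 0
    data: str = ""
    up: bool = True          # False models a failed server or a partitioned-away site


def check_quorums(replicas, r, w):
    """r + w > N: reads see the latest write. 2w > N: write quorums always overlap."""
    n = sum(rep.votes for rep in replicas)
    return r + w > n and 2 * w > n


def gather(replicas, needed):
    """Collect reachable replicas until their votes reach the quorum; None if impossible."""
    chosen, total = [], 0
    for rep in replicas:
        if rep.up:
            chosen.append(rep)
            total += rep.votes
            if total >= needed:
                return chosen
    return None


def read(replicas, r):
    quorum = gather(replicas, r)
    if quorum is None:
        raise RuntimeError("read quorum not available")
    newest = max(quorum, key=lambda rep: rep.version)   # highest version is current
    return newest.data


def write(replicas, w, data):
    quorum = gather(replicas, w)
    if quorum is None:
        raise RuntimeError("write quorum not available")
    new_version = max(rep.version for rep in quorum) + 1
    for rep in quorum:
        rep.version, rep.data = new_version, data


replicas = [Replica(votes=2), Replica(votes=1), Replica(votes=1)]   # N = 4
quorums_ok = check_quorums(replicas, r=2, w=3)
write(replicas, 3, "v1")          # reaches the first two replicas (3 votes)
replicas[0].up = False            # a server fails
value = read(replicas, 2)         # the remaining 2 votes still include a current copy
try:
    write(replicas, 3, "v2")      # only 2 votes reachable: the write must wait
    write_blocked = False
except RuntimeError:
    write_blocked = True
```

The choice of r and w trades read availability against write availability: a small r makes reads cheap and tolerant of failures, but forces a large w, so updates block sooner when replicas fail or the network partitions.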
-
Current research issues
Scalability
Mobile Users
disconnected operation
low bandwidth communication
Security