distributedfilesys.ppt

Upload: mb4u

Post on 03-Apr-2018

TRANSCRIPT

  • 7/28/2019 DISTRIBUTEDFILESYS.ppt

    1/16

    University of Pennsylvania

    11/21/00

    Distributed File Systems

    Remote Files

    File service vs. file server
        File service interface: the specification of what the file system offers to its clients.
        File server: a process that runs on some machine and helps implement the file service.

    File service model (Fig 13-1)
        upload/download model
        remote access model
        comparison between the two models

    The directory service
        creating and deleting directories
        naming and renaming files
        moving files
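
The two file service models can be contrasted with a minimal sketch. Everything here (the `FileServer` class, its method names, the in-memory storage) is hypothetical and only illustrates the idea: in the upload/download model whole files cross the network, while in the remote access model the file stays on the server and only the requested bytes move.

```python
class FileServer:
    """Hypothetical in-memory file server, for illustration only."""

    def __init__(self):
        self.files = {}      # path -> bytearray holding the file contents
        self.messages = 0    # number of requests served

    # Upload/download model: whole files are transferred.
    def download(self, path):
        self.messages += 1
        return bytes(self.files[path])

    def upload(self, path, data):
        self.messages += 1
        self.files[path] = bytearray(data)

    # Remote access model: operations are executed at the server.
    def read(self, path, offset, length):
        self.messages += 1
        return bytes(self.files[path][offset:offset + length])

    def write(self, path, offset, data):
        self.messages += 1
        buf = self.files[path]
        buf[offset:offset + len(data)] = data


def patch_by_download(server, path, offset, data):
    """Change a few bytes by fetching, editing and returning the whole file."""
    downloaded = server.download(path)            # whole file travels to the client
    local = bytearray(downloaded)
    local[offset:offset + len(data)] = data       # edit the local copy
    server.upload(path, local)                    # whole file travels back
    return len(downloaded) + len(local)           # bytes moved over the network


def patch_by_remote_access(server, path, offset, data):
    """Change a few bytes by asking the server to do the write."""
    server.write(path, offset, data)
    return len(data)                              # only the new bytes are sent
```

For an 11-byte file, changing one byte moves 22 bytes with the upload/download model and 1 byte with the remote access model; the gap reverses when a client reads or rewrites most of a file many times from a local copy.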

    Goals

    1 Network transparency: users do not have to be aware of the location of files to access them
        location transparency: the name of a file does not reveal any hint of the file's physical storage location.
            /server1/dir1/dir2/X
            server1 can be moved anywhere (e.g., from CIS to SEAS).
        location independence: the name of a file does not need to be changed when the file's physical storage location changes.
            The above file X cannot be moved to server2, even if server1 is full and server2 is not so full, because its name contains server1.

    2 High availability: system failures or scheduled activities such as backups or the addition of nodes should not make files unavailable
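
The difference between location transparency and location independence can be shown with a small sketch. The table `server_hosts`, the host names, and both functions are assumptions made for illustration; they are not part of the slides.

```python
# Hypothetical name-server table: logical server name -> physical host.
server_hosts = {
    "server1": "host-a.cis.example",
    "server2": "host-b.seas.example",
}


def locate(path):
    """Resolve '/<server>/<rest>' to (physical host, path on that host)."""
    _, server, rest = path.split("/", 2)
    return server_hosts[server], "/" + rest


def rename_for_move(path, new_server):
    """Name a file must take if it moves to another logical server."""
    _, _, rest = path.split("/", 2)
    return "/" + new_server + "/" + rest
```

Moving server1 to another machine only changes the table entry, so `/server1/dir1/dir2/X` stays valid: the name is location transparent. Moving X itself to server2 forces the new name `/server2/dir1/dir2/X`, so the name is not location independent.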

    Architecture

    Computation model
        file servers -- machines dedicated to storing files and performing storage and retrieval operations (for high performance)
        clients -- machines used for computational activities; may have a local disk for caching remote files

    Two most important services
        name server -- maps user-specified names to stored objects, files and directories
        cache manager -- to reduce network delay and disk delay
            problem: inconsistency

    Typical data access actions
        open, close, read, write, etc.

    Design Issues

    Naming and name resolution

    Semantics of file sharing (Fig 13-4, Fig 13-5)

    Stateless versus stateful servers (Fig 13-8)

    Caching -- where to store files (Fig 13-9)

    Cache consistency (Fig 13-11)

    Replication (Fig 13-12)

    Naming and Name Resolution

    a name space -- a collection of names
    name resolution -- mapping a name to an object

    same or different view of a directory hierarchy (Fig. 13-3)

    3 traditional ways to name files in a distributed environment

    concatenate the host name to the names of files stored on that host:
        system-wide uniqueness guaranteed, simple to locate a file;
        however, not network transparent and not location independent, e.g., /machine/usr/foo

    mount remote directories onto local directories:
        once mounted, files can be referenced in a location-transparent manner

    provide a single global directory:
        requires a unique file name for every file; location independent;
        cannot encompass heterogeneous environments and wide geographical areas
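
Name resolution under the mount approach can be sketched as a longest-prefix lookup in a per-client mount table. The table contents, server names, and the `resolve` function are hypothetical and chosen only for illustration.

```python
# Hypothetical mount table on one client:
# local mount point -> (server name, directory exported by that server).
MOUNTS = {
    "/usr/remote": ("server1", "/export/usr"),
    "/home": ("server2", "/export/home"),
}


def resolve(path):
    """Map a local path to (server, remote path), or ("local", path) if not mounted."""
    best = None
    for mount_point in MOUNTS:
        # A mount point matches the path itself or any path below it.
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point          # keep the longest matching prefix
    if best is None:
        return ("local", path)
    server, remote_dir = MOUNTS[best]
    return (server, remote_dir + path[len(best):])
```

A user who opens `/home/alice/notes.txt` never names server2; the mount table supplies it. Because each client has its own table, two clients may see different views of the directory hierarchy.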

    Semantics of File Sharing

    Consistency semantics problem (Fig 13-4): read after write

    Assume open; reads/writes; close

    1 UNIX semantics: the value read is the value stored by the last write.
        Writes to an open file are visible immediately to others that have this file open at the same time.
        Easy to implement if there is one server and no cache.

    2 Session semantics:
        Writes to an open file by a user are not visible immediately to other users that already have the file open.
        Once a file is closed, the changes made to it are visible only to sessions started later.

    3 Immutable-shared-files semantics:
        A sharable file cannot be modified.
        File names cannot be reused and file contents may not be altered.
        Simple to implement.

    4 Transactions: all changes have the all-or-nothing property.
        W1,R1,R2,W2 is not allowed where P1 = W1;W2 and P2 = R1;R2
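
A minimal sketch of the difference between UNIX semantics and session semantics follows. The classes `SharedFile`, `UnixHandle`, and `SessionHandle` are hypothetical and model a whole file as a single string to keep the example short.

```python
class SharedFile:
    """Master copy held by a hypothetical server."""

    def __init__(self, data=""):
        self.data = data


class UnixHandle:
    """UNIX semantics: every read and write goes straight to the master copy."""

    def __init__(self, shared):
        self.shared = shared

    def write(self, data):
        self.shared.data = data

    def read(self):
        return self.shared.data

    def close(self):
        pass                            # nothing to publish, writes were immediate


class SessionHandle:
    """Session semantics: work on a private copy and publish it on close."""

    def __init__(self, shared):
        self.shared = shared
        self.copy = shared.data         # snapshot taken at open
        self.dirty = False

    def write(self, data):
        self.copy = data                # visible only inside this session
        self.dirty = True

    def read(self):
        return self.copy

    def close(self):
        if self.dirty:
            self.shared.data = self.copy    # visible to sessions opened later
```

With two `SessionHandle` objects open on the same file, a write through one is not seen by the other even after the writer closes; only a handle opened after the close reads the new value. With `UnixHandle`, the second reader sees the write at once.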

    Stateful versus Stateless Service

    Two approaches to server-side information

    1 stateful file server
        a client performs open on a file
        the server keeps file information (e.g., file descriptor entry, offset)
        Adv: increased performance
        On server crash, it loses all its volatile state information
        On client crash, the server needs to know so that it can reclaim state space

    2 stateless file server -- each request is self-contained
        each request identifies the file, the position, and read/write.
        server failure is identical to a slow server (client retries...)
        each request must be idempotent.
        NFS employs this.
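
The two server styles can be sketched as follows. Both classes and their method names are hypothetical; in particular this is not the NFS protocol, only an illustration of self-contained, idempotent requests versus server-side open-file state.

```python
class StatelessServer:
    """Every request names the file and the position, so it can be retried safely."""

    def __init__(self, files):
        self.files = {name: bytearray(data) for name, data in files.items()}

    def read(self, name, offset, length):
        return bytes(self.files[name][offset:offset + length])

    def write(self, name, offset, data):
        buf = self.files[name]
        if offset > len(buf):                       # pad a hole with zero bytes
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data       # same result if repeated
        return len(data)


class StatefulServer:
    """The server remembers the current offset of each open file."""

    def __init__(self, files):
        self.files = {name: bytearray(data) for name, data in files.items()}
        self.open_table = {}                        # fd -> [name, offset], volatile
        self.next_fd = 0

    def open(self, name):
        fd = self.next_fd
        self.next_fd += 1
        self.open_table[fd] = [name, 0]
        return fd

    def read(self, fd, length):
        name, offset = self.open_table[fd]
        data = bytes(self.files[name][offset:offset + length])
        self.open_table[fd][1] = offset + len(data)     # advance server-side offset
        return data

    def crash(self):
        self.open_table.clear()                     # volatile state is lost
```

Repeating a stateless `read` or `write` gives the same result, so a client can simply retry after a timeout. Repeating a stateful `read` returns the next bytes instead, and after `crash()` the old descriptor is no longer known to the server.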

    Caching

    Four places to store files (Fig. 13-9)

    server's disk: slow performance

    server caching: in main memory
        cache management issues: how much to cache, replacement strategy
        still slow due to network delay
        used in high-performance web search engine servers

    client caching in main memory
        can be used by diskless workstations
        faster to access from main memory than from disk
        competes with the virtual memory system for physical memory space
        three options (Fig. 13-10)

    client cache on a local disk
        large files can be cached
        virtual memory management is simpler
        a workstation can function even when it is disconnected from the network
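
The cache management issues listed above (how much to cache, which block to replace) can be illustrated with a small block cache. The slides do not name a replacement strategy; least-recently-used (LRU) is assumed here, and the `BlockCache` class and its `fetch` callback are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """Fixed-capacity block cache with LRU replacement (an assumed policy)."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity        # how much to cache, in blocks
        self.fetch = fetch              # called on a miss, e.g. a disk or network read
        self.blocks = OrderedDict()     # block id -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)       # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        data = self.fetch(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)         # evict the least recently used
        return data
```

The same structure fits a server cache in front of a disk or a client cache in front of the network; only the cost of `fetch` differs.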

    A Comparison of Caching and Remote Service

    1 Caching reduces remote accesses (especially when locality is exploited), which reduces network traffic and server load

    2 total network overhead is lower for transmitting big chunks of data (caching) than for a series of responses to specific requests.

    3 disk access can be optimized better for large requests than for random disk blocks

    4 the cache-consistency problem is the major drawback. If there are frequent writes, the overhead due to the consistency problem is significant.

    5 the OS is simpler for remote service.

    Cache Consistency

    Reflecting changes to a local cache in the master copy
    Reflecting changes to the master copy in the local caches

    [Figure: Copy 1, Copy 2 and the master copy, with arrows labeled "write" and "update"]

    Update algorithms for client caching

    write-through: all writes are carried out immediately
        Reliable: little information is lost in the event of a client crash
        Slow: cache not that useful

    delayed-write: delays writing at the server
        possible to perform many writes to a block in the cache before it is written
        if data is written and then deleted immediately, the data need not be written at all (20-30% of new data is deleted within 30 secs)

    write-on-close: delay writing until the file is closed at the client
        if the file is open for a short duration, works fine
        if the file is open for long, susceptible to losing data in the event of a client crash
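
The three update policies can be compared with one small client-side cache. The `CachedFile` class is a hypothetical sketch: a plain dictionary stands in for the master copy at the server, and `flush` stands in for whatever timer a real delayed-write cache would use.

```python
class CachedFile:
    """Client cache for one file, with a selectable update policy."""

    def __init__(self, server, policy):
        self.server = server        # dict: block number -> data (the master copy)
        self.policy = policy        # "write-through", "delayed-write" or "write-on-close"
        self.cache = {}             # block number -> data held at the client
        self.dirty = set()          # blocks changed locally but not yet sent
        self.server_writes = 0      # number of blocks sent to the server

    def write(self, block, data):
        self.cache[block] = data
        if self.policy == "write-through":
            self._send(block)               # every write reaches the server at once
        else:
            self.dirty.add(block)           # remember it and send later

    def flush(self):
        """Send all dirty blocks; a periodic timer would call this for delayed-write."""
        for block in sorted(self.dirty):
            self._send(block)
        self.dirty.clear()

    def close(self):
        self.flush()                        # write-on-close sends everything here

    def _send(self, block):
        self.server[block] = self.cache[block]
        self.server_writes += 1
```

Three writes to the same block cost three server writes under write-through but only one under delayed-write, since the later values overwrite the earlier ones in the cache. Anything still in `dirty` when the client crashes is lost, which is the risk the slide notes for files held open for a long time.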

    Cache Coherence

    How to maintain consistency between locally cached data and the master data when the data has been modified by another client?

    1 Client-initiated approach -- the client checks validity
        on every access: too much overhead
        on first access to a file (e.g., file open)
        at every fixed time interval

    2 Server-initiated approach -- the server records, for each client, the (parts of) files it caches.
        After the server detects a potential inconsistency, it reacts.

    3 Do not allow caching when concurrent-write sharing occurs.
        Allow many readers.
        If a client opens a file for writing, inform all the clients to purge their cached data.
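
The client-initiated approach, with the validity check done at file open, might look like the sketch below. Version numbers are an assumption used here to detect stale copies (a modification timestamp would serve the same purpose), and both class names are hypothetical.

```python
class VersionedServer:
    """Server that bumps a version number on every write to a file."""

    def __init__(self):
        self.data = {}
        self.version = {}

    def write(self, name, data):
        self.data[name] = data
        self.version[name] = self.version.get(name, 0) + 1

    def get_version(self, name):
        return self.version[name]                   # cheap validity check

    def fetch(self, name):
        return self.data[name], self.version[name]  # full transfer


class ValidatingClient:
    """Client that validates its cached copy each time a file is opened."""

    def __init__(self, server):
        self.server = server
        self.cache = {}         # name -> (data, version)
        self.fetches = 0        # number of full transfers from the server

    def open(self, name):
        cached = self.cache.get(name)
        if cached is None or cached[1] != self.server.get_version(name):
            self.cache[name] = self.server.fetch(name)  # missing or stale: refetch
            self.fetches += 1
        return self.cache[name][0]
```

Checking on every read instead of every open would close the window in which a stale copy can be used, at the cost of one server round trip per access.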

    Cache consistency, cont.

    Potential inconsistency:
        In session semantics, when a client closes a modified file.
        In UNIX semantics, the server must be notified whenever a file is opened, and the intended mode (read or write) must be indicated for every open.
            Disable caching when a file is opened in conflicting modes.
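
For session semantics, the server-initiated reaction to a closed, modified file can be sketched as an invalidation callback. The classes and method names are hypothetical; the server records which clients cache each file and tells the others to purge their copy when a writer closes.

```python
class CallbackServer:
    """Server that records, for each file, the clients caching it."""

    def __init__(self):
        self.data = {}
        self.cachers = {}       # file name -> set of clients holding a copy

    def fetch(self, name, client):
        self.cachers.setdefault(name, set()).add(client)
        return self.data[name]

    def store(self, name, data, writer):
        """Called when `writer` closes a modified file."""
        self.data[name] = data
        for client in self.cachers.get(name, set()) - {writer}:
            client.invalidate(name)             # react to the potential inconsistency
        self.cachers[name] = {writer}           # only the writer's copy is current


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.fetch(name, self)
        return self.cache[name]

    def write_and_close(self, name, data):
        self.cache[name] = data
        self.server.store(name, data, self)

    def invalidate(self, name):
        self.cache.pop(name, None)              # purge the stale copy
```

This keeps clients from paying a validity check on every open, but it makes the server stateful: the `cachers` table must be rebuilt or conservatively reset after a server crash.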

    Replication

    Reasons:
        increase reliability
        improve availability
        balance the servers' workload

    how to make replication transparent (Fig. 13-12)

    how to keep the replicas consistent

    Problems -- mainly with updates
        1 a replica is not updated due to its server's failure
        2 the network is partitioned

    Replication management:
        1 weighted voting for reads and writes
        2 a current synchronization site for each file group to control access
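
Weighted voting can be sketched as follows, assuming the usual quorum conditions: read quorum plus write quorum exceeds the total number of votes, and twice the write quorum exceeds the total. The `Replica` class, the helper functions, and the vote assignment in the usage note are hypothetical.

```python
class Replica:
    def __init__(self, votes):
        self.votes = votes      # weight of this replica
        self.version = 0        # version number of the copy it holds
        self.data = None
        self.up = True          # False models a failed or unreachable server


def gather(replicas, quorum):
    """Collect reachable replicas until their votes reach the quorum."""
    chosen, votes = [], 0
    for replica in replicas:
        if replica.up:
            chosen.append(replica)
            votes += replica.votes
            if votes >= quorum:
                return chosen
    raise RuntimeError("quorum not reachable")


def read(replicas, read_quorum):
    """Return the newest copy found in a read quorum."""
    chosen = gather(replicas, read_quorum)
    return max(chosen, key=lambda r: r.version).data


def write(replicas, write_quorum, data):
    """Install a new version on a write quorum.

    Any two write quorums overlap when 2 * write_quorum > total votes,
    so the highest version in the quorum is the current one.
    """
    chosen = gather(replicas, write_quorum)
    version = max(r.version for r in chosen) + 1
    for replica in chosen:
        replica.version = version
        replica.data = data
```

With votes 2, 1, 1 (total 4), a read quorum of 2 and a write quorum of 3, a write reaches the first two replicas. If the first replica then fails, the other two still form a read quorum and return the latest data, but no write quorum can be assembled until it recovers.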

    Current research issues

    Scalability

    Mobile users
        disconnected operation
        low-bandwidth communication

    Security