
Page 1: Disconnected Operation in the Coda File System

Disconnected Operation in the Coda File System

James J. Kistler and M. Satyanarayanan
Carnegie Mellon University

Presented by Cong

Page 2: Disconnected Operation in the Coda File System

Content
• I Motivation
• II Coda Overview
• III Implementation
• IV Evaluation
• V Conclusion

Page 3: Disconnected Operation in the Coda File System

I Motivation

In the 1980s, AFS served about 1000 clients at CMU

The network was slow and unstable

Page 4: Disconnected Operation in the Coda File System

In the 1990s, people had

“Powerful” clients: 33MHz CPU, 16MB RAM, 100MB hard drive

Mobile users appeared: first IBM ThinkPad in 1992 (ThinkPad 700C)

Clients can do useful work without the network!

Page 5: Disconnected Operation in the Coda File System

Birth of Coda: Disconnected Operation

Page 6: Disconnected Operation in the Coda File System

II Coda Overview (I) – Purpose & Features

• Successor of the Andrew File System (AFS)
– AFS was the first DFS aimed at a campus-sized user community

• Features
– open-to-close consistency
– callbacks

Page 7: Disconnected Operation in the Coda File System

Coda Overview (II) – How it works

• Clients view Coda as a single location-transparent shared Unix file system

• Coda namespace is mapped to individual file servers at the granularity of subtrees called volumes

• Each client has a cache manager (Venus)

Page 8: Disconnected Operation in the Coda File System

Coda Overview (III) – How it works

All clients see “/coda”, e.g. # cat /coda/tmp/foo

[Figure: path of a file request. Labels: Client Program, Venus, Coda Server, User level, Kernel, VFS, Coda FS, System Call, Return from syscall, Read/Write, Network, RPC]

Page 9: Disconnected Operation in the Coda File System

• Continue critical work when network/server/repository is inaccessible.

• Key idea: caching data
– Performance
– Availability

Page 10: Disconnected Operation in the Coda File System

High availability is achieved through

– Server replication

– Client replication (Cache)

– Disconnected Operations

Page 11: Disconnected Operation in the Coda File System

Server replication

The set of replicas of a volume is its VSG (Volume Storage Group)

At any time, a client can reach only the AVSG (Available Volume Storage Group), the currently accessible subset of the VSG

+ Persistent, physically secure
- Expensive

Client replication

- Relatively low quality
+ Cheap

Page 12: Disconnected Operation in the Coda File System

Design Rationale – Replica Control

• Pessimistic
– Disables all partitioned writes
- Requires a client to acquire control of a cached object prior to disconnection
+ Acceptable for voluntary disconnections

• Optimistic
– Assumes no one else is touching the file
- More sophisticated: needs conflict detection
+ Fact: low write-sharing in Unix
+ High availability: access anything in range

Page 13: Disconnected Operation in the Coda File System

III IMPLEMENTATION

• Venus states
• Hoarding
– Hoard walking
– Prioritized algorithm
• Emulation
• Reintegration
– Conflict handling

Page 14: Disconnected Operation in the Coda File System

Client Structure

Page 15: Disconnected Operation in the Coda File System

Venus States (I)

1. Hoarding: normal operation mode

2. Emulating: disconnected operation mode

3. Reintegrating: propagates changes and detects inconsistencies
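
The three states above can be read as a small state machine. The sketch below is only an illustration: the event names ("disconnect", "reconnect", "reintegration_done") are hypothetical labels chosen here, not identifiers from Coda itself.

```python
from enum import Enum, auto


class VenusState(Enum):
    HOARDING = auto()       # normal, connected operation
    EMULATING = auto()      # disconnected operation
    REINTEGRATING = auto()  # propagating changes after reconnection


# Event names are illustrative labels, not Coda identifiers.
TRANSITIONS = {
    (VenusState.HOARDING, "disconnect"): VenusState.EMULATING,
    (VenusState.EMULATING, "reconnect"): VenusState.REINTEGRATING,
    (VenusState.REINTEGRATING, "reintegration_done"): VenusState.HOARDING,
}


def next_state(state, event):
    """Return the state after `event`; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

For example, `next_state(VenusState.HOARDING, "disconnect")` yields `VenusState.EMULATING`, while an event that does not apply in the current state is ignored.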

Page 16: Disconnected Operation in the Coda File System

Venus States (II)

Page 17: Disconnected Operation in the Coda File System

Hoarding

• Hoard useful data in anticipation of disconnection
• How useful is the data?
– Prioritized algorithm: cache management
• How to keep data up to date?
– Hoard walking: reevaluate objects
• Balance the needs of connected and disconnected operation
– Cache size is restricted
– Disconnections are unpredictable

Page 18: Disconnected Operation in the Coda File System

Prioritized algorithm

• User-defined hoard priority p: how interesting is the object?
• Recent usage q
• Object priority = f(p, q)
• Evict the object with the lowest priority
+ Fully tunable: everything can be customized
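
A minimal sketch of the idea, assuming f(p, q) is a weighted sum. The slide does not give the form of f, so the linear combination, the weight `alpha`, and the class and function names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    name: str
    hoard_priority: int   # p: assigned by the user
    recent_usage: float   # q: higher means used more recently


def object_priority(obj, alpha=0.5):
    """f(p, q) as a weighted sum; alpha and the linear form are assumptions."""
    return alpha * obj.hoard_priority + (1 - alpha) * obj.recent_usage


def evict_one(cache, alpha=0.5):
    """Remove and return the cached object with the lowest priority."""
    victim = min(cache, key=lambda o: object_priority(o, alpha))
    cache.remove(victim)
    return victim
```

Raising `alpha` makes the user's hoard priorities dominate; lowering it makes the cache behave more like a plain recency-based cache.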

Page 19: Disconnected Operation in the Coda File System

Hoard Walking

• Equilibrium
– No uncached object has a higher priority than a cached object
• Walking: restore equilibrium
– Reevaluate name bindings in the HDB (hoard database) to reflect changes made by others
– Reevaluate priorities in the HDB and cache
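
The equilibrium condition can be sketched as follows. This is a simplification: it assumes a cache with a fixed number of equal-sized slots and disjoint sets of cached and uncached names, and the function names are hypothetical.

```python
def in_equilibrium(cached, uncached):
    """cached, uncached: dicts mapping name -> priority.

    True if no uncached object outranks any cached object.
    """
    if not cached or not uncached:
        return True
    return max(uncached.values()) <= min(cached.values())


def restore_equilibrium(cached, uncached):
    """Swap the best uncached object with the worst cached one until equilibrium holds."""
    while not in_equilibrium(cached, uncached):
        best = max(uncached, key=uncached.get)
        worst = min(cached, key=cached.get)
        cached[best] = uncached.pop(best)    # fetch the higher-priority object
        uncached[worst] = cached.pop(worst)  # evict the lower-priority object
```

Each swap replaces a cached object with a strictly higher-priority one, so the loop terminates.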

Page 20: Disconnected Operation in the Coda File System
Page 21: Disconnected Operation in the Coda File System

Emulation

• Act like a server
• Record modified objects
• Prepare to replay update activity
– Log-based, per volume
• Persistence
– Metadata kept in recoverable virtual memory (RVM)
– Exhaustion
• Compress?
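
One way to picture the per-volume log and the "compress" question is the sketch below. It assumes a single optimization, that a newer store of a file makes an earlier store record for the same file redundant; the class name and the record layout are hypothetical, and persistence in RVM is not modeled.

```python
from collections import defaultdict


class ReplayLog:
    """Per-volume log of mutating operations recorded while disconnected."""

    def __init__(self):
        self.records = defaultdict(list)  # volume -> list of (op, path)

    def append(self, volume, op, path):
        log = self.records[volume]
        if op == "store":
            # A newer store of the same file makes the earlier store record redundant.
            log[:] = [r for r in log if r != ("store", path)]
        log.append((op, path))
```

Repeatedly saving the same file then costs one log record instead of one per save, which delays exhaustion of the space reserved for the log.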

Page 22: Disconnected Operation in the Coda File System

Reintegration

• Replay algorithm
– Executed in parallel at all AVSG members
– Transaction-based
– Succeeded?
• Yes: free the logs, reset priorities
• No: save the logs to a tar file and ask the user for help

Page 23: Disconnected Operation in the Coda File System

Conflict Handling

• Only write/write conflicts matter
• File vs. directory
– File: halt the entire reintegration process
– Dir: investigate further
– Manual repair
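
The file case can be sketched as an all-or-nothing replay. The integer version counters below are a stand-in chosen for illustration, not Coda's actual mechanism, and directory handling and manual repair are not modeled.

```python
def reintegrate(server_versions, log):
    """Replay a client's log against the server, all or nothing.

    server_versions: dict mapping path -> version currently on the server.
    log: list of (path, base_version) pairs, where base_version is the
         version the client had cached when it disconnected.
    Returns (ok, conflicts). On any conflict nothing is applied.
    """
    # A write/write conflict: the server copy changed since the client cached it.
    conflicts = [path for path, base in log
                 if server_versions.get(path, base) != base]
    if conflicts:
        return False, conflicts
    for path, base in log:
        server_versions[path] = base + 1
    return True, []
```

A single conflicting file is enough to reject the whole log, which mirrors the "halt the entire reintegration process" rule above.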

Page 24: Disconnected Operation in the Coda File System

Coda Evaluation

• Hardware
– 386 laptops, DECstation 3100s
– 350MB disk
• How …?
– How long does reintegration take?
– How large a local disk does one need?
– How likely are conflicts?

Page 25: Disconnected Operation in the Coda File System

Answers

• Duration of reintegration
– Requires very large data transfers
– A few hours of disconnection -> about 1 min
• Cache size
– 100MB at the client is enough for a “typical” workday
• Conflicts
– Over 99% of modifications are by the same person
– Two users modify the same object within a day: <0.75%

Page 26: Disconnected Operation in the Coda File System

Conclusion

• Disconnected operation is a simple idea
• Hard to implement at each stage
• An extended version of a write-back cache
• Feasible, efficient and usable

Page 27: Disconnected Operation in the Coda File System

Q1 Can Coda be easily extended to become a code repository?

Q2 They do not handle any conflict resolution between simultaneously modified files, and state that even on collaborative projects most files were modified by the same person. Wouldn't this fail to work with rigorously change-logged code, as in many projects today? Also, is aborting the entire reintegration because of a single file a good idea?

Page 28: Disconnected Operation in the Coda File System

Q3 Coda's conflict resolution mechanism seems complex and time-consuming, even though it provides a tool for this purpose. What kinds of conflicts may be difficult for Coda to resolve without user intervention?

Page 29: Disconnected Operation in the Coda File System

Q4 Do we need to address extreme cases like this in file systems? The system is worth nothing during disconnection if a user cannot work with the set of files he already has.

E.g.: Suppose a developer continues development with his local copy of the code, but to compile it he needs shared libraries and other statically linked libraries on the server. In this situation he cannot continue, because part of the source code and the libraries are on the server. (I am assuming the libraries and source files are large in both size and number.)

Page 30: Disconnected Operation in the Coda File System

Q5 Why is back-fetching special? Why is it done in a separate stage? If the metadata is immediately updated to point at the shadow file (as the text suggests), why can't writes be pushed to final storage immediately?

Page 31: Disconnected Operation in the Coda File System

Thanks!