![Page 1: SPECULATIVE EXECUTION IN A DISTRIBUTED FILE SYSTEM E. B. Nightingale P. M. Chen J. Flint University of Michigan](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f145503460f94c28daf/html5/thumbnails/1.jpg)
SPECULATIVE EXECUTION IN A DISTRIBUTED FILE SYSTEM
E. B. Nightingale, P. M. Chen, J. Flint
University of Michigan
Motivation
• Distributed file systems are often much slower than local file systems
  – Due to synchronous operations required for cache coherence and data safety
  – Even true for file systems that weaken consistency and safety guarantees
    • Close-to-open consistency for AFS and most versions of NFS
A better solution
• Most of these synchronous operations have predictable outcomes
  – We can bet on the outcome and let the client process go forward (speculation)
    • Makes the operation asynchronous
  – Must first take a checkpoint of the process
    • Can restart the operation if speculation fails
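The bet-and-checkpoint idea can be sketched in user-space Python. This is only an illustrative model under stated assumptions: `speculate`, `record`, `predicted` and `remote_call` are hypothetical names, a plain dict stands in for process state, and `copy.deepcopy` stands in for a real process checkpoint.

```python
import copy

def speculate(state, predicted, remote_call, continuation):
    """Run `continuation` on a predicted result; roll back if the bet was wrong.

    All names are hypothetical. `state` is a dict standing in for process
    state, and deepcopy stands in for taking a process checkpoint.
    """
    checkpoint = copy.deepcopy(state)   # take the checkpoint first
    continuation(state, predicted)      # go forward on the predicted outcome
    actual = remote_call()              # a real system would overlap this call
    if actual == predicted:
        return state, "committed"       # bet paid off: checkpoint is discarded
    state = checkpoint                  # bet failed: restore the checkpoint
    continuation(state, actual)         # redo the work with the real outcome
    return state, "rolled back"

def record(state, result):
    state["seen"].append(result)

ok_state, ok_status = speculate({"seen": []}, "fresh", lambda: "fresh", record)
bad_state, bad_status = speculate({"seen": []}, "fresh", lambda: "stale", record)
print(ok_status, ok_state)
print(bad_status, bad_state)
```

In the failing case the work done on the wrong prediction leaves no trace: the restored state only contains the result of the re-executed operation.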
Why it works
1. Clients can correctly predict the outcome of many operations
   • Few concurrent accesses to files
2. Time to take a lightweight checkpoint is often less than network round-trip time
   • 52 µs for a small process thanks to copy-on-write
3. Most clients have free cycles
Speculator
• File system controls when speculations start, succeed and fail
• Speculator provides a mechanism to ensure correct execution of speculative code
• No application changes are required
• Speculative state is never visible from the outside
Correctness rules (I)
• A process that executes in speculative mode cannot externalize output
  – Speculator blocks the process
• Speculator tracks causal dependencies between kernel objects
  – Kernel objects modified by a speculative process will be put in a speculative state
Correctness rules (II)
• Speculator tracks causal dependencies between processes
  – Processes receiving a message or a signal from a speculative process will be checkpointed and become speculative
• In case of doubt, Speculator will block the execution of the speculative process
An example: conventional NFS
• Linux 2.4.21 NFSv3 implements close-to-open consistency
  – At close time, client sends to server:
    1. Asynchronous write calls with the modified data
    2. A synchronous commit call once it has received replies for all write calls
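The two-step close sequence can be mimicked with `asyncio`. This is a rough sketch, not the Linux client code: `write_rpc`, `commit_rpc` and `close_file` are hypothetical stand-ins, and `asyncio.sleep(0)` merely marks where a network round trip would occur.

```python
import asyncio

async def write_rpc(log, block):
    """Hypothetical stand-in for one NFS write call."""
    await asyncio.sleep(0)              # yield, as a network round trip would
    log.append(("write-reply", block))

async def commit_rpc(log):
    """Hypothetical stand-in for the commit call."""
    await asyncio.sleep(0)
    log.append(("commit-reply",))

async def close_file(dirty_blocks):
    log = []
    # 1. Issue all write calls without waiting between them.
    await asyncio.gather(*(write_rpc(log, b) for b in dirty_blocks))
    # 2. Only after every write reply has arrived, send the commit and wait for it.
    await commit_rpc(log)
    return log

close_log = asyncio.run(close_file([0, 1, 2]))
print(close_log)
```

The caller of `close_file` cannot continue until the commit reply is logged, which is the synchronous wait that speculation aims to hide.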
An example: SpecNFS
• All calls are non-blocking but force the calling process to become speculative
• If a call returns an unexpected result, the calling process is rolled back to its checkpoint and the call is executed again
  – A new speculation starts
Speculation interface
• Three new system calls:
  – Create_speculation():
    • Returns unique spec_id and a list of previous speculations on which the speculation depends
  – Commit_speculation(spec_id)
  – Fail_speculation(spec_id)
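To make the shape of the three calls concrete, here is a toy user-space model. It is not the kernel interface: the `SpeculationTable` class and its `status` dict are hypothetical, and the rule that a failure also fails every later pending speculation is an assumption made for this sketch.

```python
import itertools

class SpeculationTable:
    """Toy user-space model of the three calls; not the kernel implementation."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.status = {}    # spec_id -> "pending" | "committed" | "failed"

    def create_speculation(self):
        # Return a fresh id plus the earlier, still unresolved speculations
        # that the new one is assumed to depend on.
        depends_on = [s for s, st in self.status.items() if st == "pending"]
        spec_id = next(self._ids)
        self.status[spec_id] = "pending"
        return spec_id, depends_on

    def commit_speculation(self, spec_id):
        self.status[spec_id] = "committed"

    def fail_speculation(self, spec_id):
        # Assumption: later pending speculations were created while this one
        # was unresolved, so they fail along with it.
        for s, st in self.status.items():
            if s >= spec_id and st == "pending":
                self.status[s] = "failed"

table = SpeculationTable()
a, a_deps = table.create_speculation()
b, b_deps = table.create_speculation()
table.commit_speculation(a)
c, c_deps = table.create_speculation()
table.fail_speculation(b)
print(table.status)
```

In this run the third speculation depends only on the second, because the first had already committed when it was created.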
Implementing checkpoints
• Checkpoints are implemented through copy-on-write fork
  – Speculator also saves the state of any open file descriptor and copies all pending signals
• Forked child is not placed on the ready queue
  – It just waits
• If speculation fails, forked child assumes the identity of the failed parent
New kernel structures
• Speculation structure:
  – Created during create_speculation()
  – Tracks the set of kernel objects that depend on the speculation
• Undo log:
  – Associated with each kernel object that has a speculative state
  – Ordered list of speculative modifications
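An undo log of this kind can be illustrated with a small Python class. This is a sketch under assumptions, not the kernel structure: `SpeculativeObject`, its method names and the `(spec_id, field, old_value)` entry layout are all hypothetical, and rollback is assumed to undo later speculations as well.

```python
class SpeculativeObject:
    """Toy model of a kernel object carrying an undo log (hypothetical names)."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self.undo_log = []      # ordered list of (spec_id, field, old_value)

    def modify(self, spec_id, field, new_value):
        # Record the old value before applying the speculative change.
        self.undo_log.append((spec_id, field, self.fields[field]))
        self.fields[field] = new_value

    def commit(self, spec_id):
        # The changes become permanent, so their undo entries are dropped.
        self.undo_log = [e for e in self.undo_log if e[0] != spec_id]

    def rollback(self, spec_id):
        # Undo newest-first, back to the first entry made under spec_id.
        # Assumption: later speculations depend on it and are undone too.
        first = next((i for i, e in enumerate(self.undo_log) if e[0] == spec_id), None)
        if first is None:
            return
        for _, field, old_value in reversed(self.undo_log[first:]):
            self.fields[field] = old_value
        del self.undo_log[first:]

    @property
    def is_speculative(self):
        return bool(self.undo_log)

inode = SpeculativeObject(size=0, mode=0o644)
inode.modify(1, "size", 100)
inode.modify(2, "size", 250)
inode.modify(2, "mode", 0o600)
inode.rollback(2)
fields_after_rollback = dict(inode.fields)
still_speculative = inode.is_speculative
inode.commit(1)
print(fields_after_rollback, still_speculative, inode.is_speculative)
```

Keeping the log ordered is what lets the rollback restore values newest-first, so each field ends up with the value it had before the failed speculation touched it.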
Sharing checkpoints
• Letting successive speculations share the same checkpoint reduces the speculation overhead
• Two limitations
  – Speculator limits the amount of rollback work by not letting a speculation share a checkpoint that is more than 500 ms old
  – Cannot let a speculation share a checkpoint with a previous speculation that changes the state of the file system
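The two limitations amount to a small decision rule, sketched below. The 500 ms threshold comes from the slide; everything else (`PreviousSpeculation`, its fields, `can_share_checkpoint`, millisecond timestamps) is a hypothetical framing chosen for illustration.

```python
from dataclasses import dataclass

MAX_CHECKPOINT_AGE_MS = 500     # age limit stated on the slide

@dataclass
class PreviousSpeculation:
    checkpoint_time_ms: float   # when its checkpoint was taken
    mutates_file_system: bool   # True if it changes file system state

def can_share_checkpoint(prev, now_ms):
    """Return True if a new speculation may reuse `prev`'s checkpoint."""
    if prev is None:
        return False            # nothing to share
    if now_ms - prev.checkpoint_time_ms > MAX_CHECKPOINT_AGE_MS:
        return False            # too much work to redo on rollback
    if prev.mutates_file_system:
        return False            # second limitation
    return True

print(can_share_checkpoint(PreviousSpeculation(0, False), 200))
print(can_share_checkpoint(PreviousSpeculation(0, False), 501))
print(can_share_checkpoint(PreviousSpeculation(0, True), 200))
```

When the function returns False, the new speculation would need its own checkpoint.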
Correctness invariants
1. Speculative state should never be visible to the user or to any external device
   – Speculator prevents all speculative processes from externalizing output to any interface
2. A process should never view speculative state unless it is already speculatively dependent upon that state.
Invariant implementations (I)
• First implementation: Block speculative processes whenever they try to perform a system call
  – Always correct
  – Limits the amount of work that can be done by a process in a speculative state
Invariant implementations (II)
• Second implementation: Allow speculative processes to perform system calls that
  – Do not modify state
    • "Read-only" calls such as getpid()
  – Only modify state that is private to the calling process
    • It will be rolled back if speculation fails
Invariant implementations (III)
• Third implementation: Allow speculative processes to perform operations on files in speculative file systems
  – With VFS, can have multiple file systems on the same machine
    • Typically NFS plus FFS or ext3
    • Must check type of file system
      – Have a special bit in superblock
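Taken together, the three implementations behave like a layered admission check on each call made by a speculative process. The sketch below is one hypothetical way to express that: the `Effect` categories and `may_proceed` are invented for illustration, and `fs_is_speculative` stands in for the superblock bit.

```python
from enum import Enum

class Effect(Enum):
    READ_ONLY = "read-only"     # e.g. getpid()
    PRIVATE_STATE = "private"   # changes only the caller's own state
    FILE_OP = "file"            # operates on a file in some file system
    OTHER = "other"             # anything else

def may_proceed(effect, fs_is_speculative=False):
    """Decide whether a speculative process may run a call or must block."""
    if effect in (Effect.READ_ONLY, Effect.PRIVATE_STATE):
        return True                 # second implementation
    if effect is Effect.FILE_OP:
        return fs_is_speculative    # third implementation: check the superblock bit
    return False                    # first implementation: block by default

print(may_proceed(Effect.READ_ONLY))
print(may_proceed(Effect.FILE_OP, fs_is_speculative=True))
print(may_proceed(Effect.FILE_OP, fs_is_speculative=False))
```

Blocking remains the default, so any call the check does not recognize falls back to the always-correct first implementation.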
Multiprocess speculation (I)
• Whenever a speculative process P participates in interprocess communication with a process Q
• Process Q must become speculatively dependent on the speculative state of process P and get checkpointed
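The propagation rule for interprocess communication can be modelled in a few lines. This is an illustrative sketch only: the `Process` class, the `deliver` function, the checkpoint counter and the speculation id 7 are hypothetical, and real checkpointing is reduced to incrementing a counter.

```python
class Process:
    """Toy process record; `depends_on` holds ids of unresolved speculations."""

    def __init__(self, name):
        self.name = name
        self.depends_on = set()
        self.checkpoints = 0

    @property
    def is_speculative(self):
        return bool(self.depends_on)

def deliver(sender, receiver):
    """Model P sending a message or signal to Q."""
    new_deps = sender.depends_on - receiver.depends_on
    if new_deps:
        receiver.checkpoints += 1           # checkpoint Q before it sees speculative data
        receiver.depends_on |= new_deps     # Q inherits P's dependencies

p, q, r = Process("P"), Process("Q"), Process("R")
p.depends_on.add(7)     # P is speculative on a hypothetical speculation 7
deliver(p, q)           # Q is checkpointed and becomes speculative
deliver(p, q)           # nothing new to inherit, so no second checkpoint
deliver(r, q)           # R is not speculative, so Q is unaffected
print(q.depends_on, q.checkpoints, q.is_speculative)
```

If speculation 7 later failed, both P and Q would have to be rolled back, which is why Q needs its own checkpoint at the moment of delivery.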
Multiprocess speculation (II)
• Whenever a speculative process P modifies an object X
• Object X must become speculatively dependent on the speculative state of process P and get an undo list
You are not responsible for the implementation details
Performance: PostMark benchmark
• SpecNFS is
  – 2.5 times faster than NFS with no latency between client and server
  – 41 times faster than NFS with a 30 ms round-trip time delay between client and server
• A version of BlueFS providing single-copy semantics is 49 times faster than NFS with the same 30 ms round-trip time delay
Performance: Apache benchmark
• Building Apache server from a tarred file
• SpecNFS is
  – 2 times faster than NFS with no latency between client and server
  – 14 times faster than NFS with a 30 ms round-trip time delay between client and server
  – Always better than BlueFS and Coda
Performance: impact of rollbacks
• Repeated Apache benchmark marking a varying fraction of the files out-of-date
  – Will result in speculation failures
  – Percentage of out-of-date files has little impact on SpecNFS performance
Performance: other
• Impact of group commits and sharing state
  – Mostly affects BlueFS
    • When speculative processes cannot propagate their state, BlueFS performs worse than NFS with no latency between client and server
    • Impact magnified at 30 ms latency
Conclusion
• Speculation enables the development of distributed file systems that are
  – Safe
  – Consistent
  – Fast
• Generic kernel support for speculative execution and causal dependency tracking could have many other applications