Petal and Frangipani

Petal/Frangipani
Petal
Frangipani
NFS
“SAN”
“NAS”

Petal/Frangipani
NFS
• Untrusted
• OS-agnostic
Frangipani
• FS semantics
• Sharing/coordination
Petal
• Disk aggregation (“bricks”)
• Filesystem-agnostic
• Recovery and reconfiguration
• Load balancing
• Chained declustering
• Snapshots
• Does not control sharing
Each “cloud” may resize or reconfigure independently. What indirection is required to make this happen, and where is it?

Remaining Slides
The following slides have been borrowed from the Petal and Frangipani presentations, which were available on the Web until Compaq SRC dissolved. This material is owned by Ed Lee, Chandu Thekkath, and the other authors of the work. The Frangipani material is still available through Chandu Thekkath’s site at www.thekkath.org.
For CPS 212, several issues are important:
• Understand the role of each layer in the previous slides, and the strengths and limitations of each layer as a basis for innovating behind its interface (NAS/SAN).
• Understand the concepts of virtual disks and a cluster file system embodied in Petal and Frangipani.
• Understand the similarities/differences between Petal and the other reconfigurable cluster service work we have studied: DDS and Porcupine.
• Understand how the features of Petal simplify the design of a scalable cluster file system (Frangipani) above it.
• Understand the nature, purpose, and role of the three key design elements added for Frangipani: leased locks, a write-ownership consistent caching protocol, and server logging for recovery.

Petal: Distributed Virtual Disks
Systems Research Center
Digital Equipment Corporation
Edward K. Lee
Chandramohan A. Thekkath

Logical System View
/dev/vdisk1 /dev/vdisk2 /dev/vdisk3 /dev/vdisk4 /dev/vdisk5
AdvFS NT FS PC FS UFS
Scalable Network
Petal

Physical System View
Scalable Network
Petal Server Petal Server Petal Server Petal Server
Parallel Database or Cluster File System
/dev/shared1

Virtual Disks
Each disk provides a 2^64-byte address space.
Created and destroyed on demand.
Allocates disk storage on demand.
Snapshots via copy-on-write.
Online incremental reconfiguration.
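The properties above can be illustrated with a minimal sketch of a sparse virtual disk. It is not Petal's implementation: the class name `VirtualDisk`, the 8 KB `BLOCK_SIZE`, and the in-memory dictionary are all assumptions made for illustration. Storage is consumed only when a block is first written, and a snapshot copies just the block map while sharing block contents until the live disk overwrites them.

```python
BLOCK_SIZE = 8192  # assumed block size for this sketch


class VirtualDisk:
    """Hypothetical sparse virtual disk: a block exists only once written."""

    def __init__(self, blocks=None):
        # Maps block number -> immutable bytes; absent blocks read as zeros.
        self.blocks = {} if blocks is None else blocks

    def write(self, block_no, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("write must cover exactly one block")
        # Storage is allocated here, on the first write to the block.
        self.blocks[block_no] = bytes(data)

    def read(self, block_no):
        return self.blocks.get(block_no, bytes(BLOCK_SIZE))

    def snapshot(self):
        # Copy only the block map. Contents are immutable and stay shared
        # until the live disk replaces a block with new data.
        return VirtualDisk(dict(self.blocks))

    def allocated_bytes(self):
        return len(self.blocks) * BLOCK_SIZE


disk = VirtualDisk()
disk.write(2**40, b"a" * BLOCK_SIZE)   # a far-away address costs one block
snap = disk.snapshot()
disk.write(2**40, b"b" * BLOCK_SIZE)   # the snapshot still sees the old data
```

After these calls the snapshot still reads the original block, while the live disk reads the new one, and only one block of storage is allocated on each.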

Virtual to Physical Translation
[Figure: a virtual address (vdiskID, offset) is translated to (server, disk, diskOffset). The Virtual Disk Directory and the GMap are global structures; Server 0 through Server 3 each hold a local map, PMap0 through PMap3, that yields (disk, diskOffset).]
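The translation path can be sketched as a lookup through the three structures named on this slide. This is an illustration under stated assumptions, not Petal's code: the function `translate`, the 64 KB `CHUNK` granularity, the round-robin choice of server, and the dictionary layouts are all hypothetical.

```python
from dataclasses import dataclass

CHUNK = 64 * 1024  # assumed translation granularity in bytes


@dataclass(frozen=True)
class PhysicalLocation:
    server: int
    disk: int
    disk_offset: int


def translate(vdir, gmaps, pmaps, vdisk_id, offset):
    """Map (vdiskID, offset) to (server, disk, diskOffset)."""
    # Global step 1: the virtual disk directory names the GMap for this disk.
    gmap = gmaps[vdir[vdisk_id]]
    # Global step 2: the GMap picks the server; round-robin striping is assumed.
    chunk = offset // CHUNK
    server = gmap[chunk % len(gmap)]
    # Local step: that server's PMap gives the physical disk and chunk base.
    disk, base = pmaps[server][(vdisk_id, chunk)]
    return PhysicalLocation(server, disk, base + offset % CHUNK)


# Example: vdisk 7 striped over servers 0..3; chunk 5 lands on server 1.
vdir = {7: "gmap-a"}
gmaps = {"gmap-a": [0, 1, 2, 3]}
pmaps = [{}, {(7, 5): (2, 1_048_576)}, {}, {}]
location = translate(vdir, gmaps, pmaps, 7, 5 * CHUNK + 100)
```

Only the last step touches per-server state, which is why the global maps can be replicated everywhere while each PMap stays local to its server.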

Global State Management
Based on Leslie Lamport’s Paxos algorithm.
Global state is replicated across all servers.
Consistent in the face of server & network failures.
A majority is needed to update global state.
Any server can be added/removed in the presence of failed servers.
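The majority requirement is simple arithmetic, sketched below. This shows only the quorum check, not Paxos itself; the function names are hypothetical.

```python
def majority(total_servers):
    """Smallest number of servers that forms a majority."""
    return total_servers // 2 + 1


def can_update_global_state(total_servers, reachable_servers):
    """Updates proceed only when a majority of servers can be reached."""
    return reachable_servers >= majority(total_servers)


# With 8 servers, 5 must be reachable; a 4/4 network split blocks updates
# on both sides, so the two halves cannot diverge.
quorum_of_eight = majority(8)
```

Because any two majorities overlap in at least one server, two conflicting updates cannot both succeed.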

Fault-Tolerant Global Operations
Create/Delete virtual disks.
Snapshot virtual disks.
Add/Remove servers.
Reconfigure virtual disks.

Data Placement & Redundancy
Supports non-redundant and chained-declustered virtual disks.
Parity can be supported if desired.
Chained-declustering tolerates any single component failure.
Tolerates many common multiple failures.
Throughput scales linearly with additional servers.
Throughput degrades gracefully with failures.

Chained Declustering
Server0  Server1  Server2  Server3
D0       D1       D2       D3
D3       D0       D1       D2
D4       D5       D6       D7
D7       D4       D5       D6

Chained Declustering
Server0  Server1  Server2  Server3
D0       D1       D2       D3
D3       D0       D1       D2
D4       D5       D6       D7
D7       D4       D5       D6
[Figure: same layout, with Server1 and its blocks D1, D0, D5, D4 drawn apart from the other servers.]
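The layout in the two figures, where Server0 holds D0, D3, D4, and D7, follows from placing each block on one server and its copy on the next server in the chain. The sketch below reproduces that placement and shows how a read is redirected when one server is down. The function names and the "primary/secondary" terminology are assumptions for illustration.

```python
def replicas(block, num_servers):
    """Return (primary, secondary): the copy sits on the next server."""
    primary = block % num_servers
    secondary = (primary + 1) % num_servers
    return primary, secondary


def layout(num_blocks, num_servers):
    """List the blocks stored on each server, in block order."""
    servers = [[] for _ in range(num_servers)]
    for block in range(num_blocks):
        for server in replicas(block, num_servers):
            servers[server].append(f"D{block}")
    return servers


def read_server(block, num_servers, failed=frozenset()):
    """Pick a live server for a read, preferring the primary copy."""
    for server in replicas(block, num_servers):
        if server not in failed:
            return server
    raise RuntimeError(f"both copies of D{block} are unavailable")


placement = layout(8, 4)
# With Server1 down, reads of D1 and D5 move to Server2, while D0 and D4
# are still served by Server0.
fallback = read_server(1, 4, failed={1})
```

Any single server failure leaves one copy of every block reachable. Losing two adjacent servers in the chain makes some blocks unavailable, whereas losing two non-adjacent servers does not.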

The Prototype
Digital ATM network.
• 155 Mbit/s per link.
8 AlphaStation Model 600.
• 333 MHz Alpha running Digital Unix.
72 RZ29 disks.
• 4.3 GB, 3.5 inch, fast SCSI (10 MB/s).
• 9 ms avg. seek, 6 MB/s sustained transfer rate.
Unix kernel device driver.
User-level Petal servers.

The Prototype
src-ss1 src-ss2 … src-ss8
petal1 petal2 … petal8
Digital ATM Network (AN2)
/dev/vdisk1 /dev/vdisk1 /dev/vdisk1 /dev/vdisk1

Throughput Scaling
[Plot: throughput scale-up (0 to 8) versus number of servers (0 to 8), with a LINEAR reference line and series for 512B, 8KB, and 64KB reads and writes.]

Virtual Disk Reconfiguration
[Plot: throughput in MB/s (0 to 30) versus elapsed time in minutes (0 to 6), labeled “6 servers” and “8 servers”. Virtual disk with 1 GB of allocated storage; 8KB reads and writes.]

Frangipani: A Scalable Distributed File System
C. A. Thekkath, T. Mann, and E. K. Lee
Systems Research Center
Digital Equipment Corporation

Why Not An Old File System on Petal?
Traditional file systems (e.g., UFS, AdvFS) cannot share a block device
The machine that runs the file system can become a bottleneck

Frangipani
Behaves like a local file system
• multiple machines cooperatively manage a Petal disk
• users on any machine see a consistent view of data
Exhibits good performance, scaling, and load balancing
Easy to administer

Ease of Administration
Frangipani machines are modular
• can be added and deleted transparently
Common free space pool
• users don’t have to be moved
Automatically recovers from crashes
Consistent backup without halting the system

Components of Frangipani
File system core
• implements the Digital Unix vnode interface
• uses the Digital Unix Unified Buffer Cache
• exploits Petal’s large virtual space
Locks with leases
Write-ahead redo log

Locks
Multiple reader/single writer
Locks are moderately coarse-grained
• protects entire file or directory
Dirty data is written to disk before lock is given to another machine
Each machine aggressively caches locks
• uses lease timeouts for lock recovery
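The lock rules above can be sketched as a small multiple-reader/single-writer lock table with leases. This is a simplified illustration, not Frangipani's lock service: the class `LockService`, its method names, and the 30-second default lease are assumptions, time is passed in explicitly to keep the example deterministic, and the sketch omits both the revocation callback (which would flush dirty data first) and the log recovery that must run before a failed machine's locks are handed out.

```python
class LockService:
    """Hypothetical lock table: many readers or one writer per lock."""

    def __init__(self, lease_seconds=30.0):
        self.lease_seconds = lease_seconds
        self.leases = {}    # machine -> lease expiry time
        self.readers = {}   # lock name -> set of machines holding it shared
        self.writer = {}    # lock name -> machine holding it exclusively

    def renew(self, machine, now):
        self.leases[machine] = now + self.lease_seconds

    def expire(self, now):
        """Drop every lock held by machines whose lease has run out."""
        dead = [m for m, expiry in self.leases.items() if expiry <= now]
        for machine in dead:
            del self.leases[machine]
            for holders in self.readers.values():
                holders.discard(machine)
            for name in [n for n, w in self.writer.items() if w == machine]:
                del self.writer[name]
        return dead

    def acquire(self, machine, name, mode, now):
        """Try to take a lock; return False when it conflicts."""
        self.expire(now)
        self.renew(machine, now)
        readers = self.readers.setdefault(name, set())
        writer = self.writer.get(name)
        if mode == "read":
            if writer not in (None, machine):
                return False
            readers.add(machine)
            return True
        if writer not in (None, machine) or readers - {machine}:
            return False
        self.writer[name] = machine
        return True


service = LockService(lease_seconds=30.0)
a_writes = service.acquire("A", "/home/file", "write", now=0.0)
b_blocked = service.acquire("B", "/home/file", "read", now=10.0)
b_after_expiry = service.acquire("B", "/home/file", "read", now=31.0)
```

Machine B is refused while A holds the write lock, then succeeds once A's lease has lapsed without renewal, which is how a crashed machine's locks eventually become available again.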

Logging
Frangipani uses a write-ahead redo log for metadata
• log records are kept on Petal
Data is written to Petal
• on sync, fsync, or every 30 seconds
• on lock revocation or when the log wraps
Each machine has a separate log
• reduces contention
• independent recovery

Recovery
Recovery is initiated by the lock service
Recovery can be carried out on any machine
• log is distributed and available via Petal
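Per-machine redo logging and recovery by any machine can be sketched together. This is an illustration under stated assumptions rather than Frangipani's on-disk format: `SharedDisk` stands in for a Petal virtual disk, and the per-block version number used to decide whether a log record still needs replaying is an assumption of this sketch.

```python
class SharedDisk:
    """Stand-in for shared storage visible to every machine."""

    def __init__(self):
        self.metadata = {}  # block id -> (version, value)
        self.logs = {}      # machine name -> list of (block id, version, value)


class Machine:
    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.dirty = {}     # updates cached in memory, not yet on disk
        disk.logs.setdefault(name, [])

    def _version(self, block):
        if block in self.dirty:
            return self.dirty[block][0]
        return self.disk.metadata.get(block, (0, None))[0]

    def update(self, block, value):
        version = self._version(block) + 1
        # Write-ahead: the log record reaches shared storage first.
        self.disk.logs[self.name].append((block, version, value))
        self.dirty[block] = (version, value)

    def flush(self):
        # For example on sync, fsync, lock revocation, or a periodic timer.
        self.disk.metadata.update(self.dirty)
        self.dirty.clear()


def recover(disk, crashed_machine):
    """Replay a crashed machine's log; any surviving machine can run this."""
    applied = 0
    for block, version, value in disk.logs[crashed_machine]:
        if version > disk.metadata.get(block, (0, None))[0]:
            disk.metadata[block] = (version, value)
            applied += 1
    disk.logs[crashed_machine] = []
    return applied


disk = SharedDisk()
machine_a = Machine("A", disk)
machine_a.update("inode7", "size=1")
machine_a.flush()
machine_a.update("inode7", "size=2")   # logged, but A crashes before flushing
replayed = recover(disk, "A")          # run by some other machine
```

Replay applies only the record that never reached the disk, and running recovery a second time changes nothing, so it is safe to repeat after a failure during recovery.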