A Highly Available Network File Server
- 1. A Highly Available Network File Server
Bhide et al. (1991)
Presented By: Anand Janjal (CS 8631)
- 2. Network File Server Reliability
The problem of network file server reliability is divided into three sub-problems:
1) Server reliability
2) Disk reliability
3) Network reliability
- 3. Contd.
1) Server reliability: Dual-ported disks and impersonation
Dual-ported disks: allow the drive to continue functioning when one port becomes nonfunctional, eliminating a single point of failure
2) Disk reliability: Disk mirroring
Disk mirroring: the replication of logical disk volumes onto separate physical disks in real time to ensure continuous availability
3) Network reliability: Network replication
Source: http://wiki.answers.com/Q/Single_port_hard_drive_vs_dual_port_hard_drive
http://en.wikipedia.org/wiki/Disk_mirroring
- 4. Mirroring
Fast recovery from disk failures is achieved by mirroring the files on different disks
All copies of the same file are on disks controlled by the same server, eliminating the overhead of ensuring consistency and coherence between two servers
Mirroring is used for applications that require continuous availability; otherwise, archival backup is used
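The mirrored write path described above can be sketched in a few lines of Python; this is only an illustration under stated assumptions (ordinary files stand in for physical disks, and `mirrored_write` is a hypothetical helper, not the AIXv3 logical-volume code):

```python
import os
import tempfile

def mirrored_write(paths, offset, data):
    # Write the same block to every mirror copy; the write is only
    # acknowledged after each copy has been forced to disk.
    for path in paths:
        with open(path, "r+b") as f:
            f.seek(offset)
            f.write(data)
            f.flush()
            os.fsync(f.fileno())

# Demo: two files standing in for two physical disks.
mirrors = []
for _ in range(2):
    fd, path = tempfile.mkstemp()
    os.write(fd, b"\x00" * 16)
    os.close(fd)
    mirrors.append(path)

mirrored_write(mirrors, 4, b"DATA")
copies = [open(p, "rb").read() for p in mirrors]
print(copies[0] == copies[1])  # True: both mirrors hold identical blocks
```

Because both copies belong to the same server, no cross-server coherence protocol is needed; a crashed mirror is simply re-synchronized from its twin.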
- 5. Network Failures
Network failures are tolerated by optional replication of network components, including the transmission medium
Packets are NOT replicated over the two networks
Network load is distributed over the networks
- 6. Contd.
Reliability in NFS through server replication suffers from resource overhead, performance degradation, and increased complexity
Replicated servers use expensive protocols to maintain consistency and coherence, leading to performance degradation
Complex protocols are needed to update the state of a stale replica when it is repaired after a failure
Handling network partitions requires quorum management, which increases system complexity
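The quorum-management cost mentioned above comes from a simple rule: a partition may only serve requests if it holds a strict majority of replicas. A minimal illustrative sketch (not from the paper, which avoids quorums entirely):

```python
def have_quorum(responding, total):
    # A strict majority of replicas must respond before an
    # operation can proceed; a minority partition must refuse service.
    return responding > total // 2

print(have_quorum(2, 3))  # True: majority partition keeps serving
print(have_quorum(1, 3))  # False: minority side blocks until the partition heals
```

HA-NFS sidesteps this machinery because its two servers share disks rather than replicating state over the network.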
- 7. HA-NFS
HA-NFS adheres to the semantics of Sun NFS.
Server failures are tolerated by using dual-ported disks accessible to two servers, each acting as backup for the other.
Disks are divided into sets, each served by one server during normal operation
Each server maintains on its disk enough information to reconstruct its current volatile state
Servers exchange liveness checking messages
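The liveness-checking exchange can be sketched as a timeout monitor on each server; this is a hypothetical sketch (the timeout value and class name are assumptions, not the paper's implementation):

```python
import time

class LivenessMonitor:
    # Each server tracks when it last heard from its peer; silence
    # beyond the timeout means the peer is presumed failed.
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heard = time.monotonic()

    def on_message(self):
        # Called whenever a liveness message arrives from the peer.
        self.last_heard = time.monotonic()

    def peer_alive(self):
        return time.monotonic() - self.last_heard < self.timeout

mon = LivenessMonitor(timeout=0.1)
print(mon.peer_alive())  # True: a message was just seen
time.sleep(0.2)
print(mon.peer_alive())  # False: timeout elapsed, so take-over would begin
```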
- 8. Design goals of HA-NFS
1) Failure and recovery are transparent to applications running on the file server's clients; a failure must not force an operation in progress to terminate
2) Failure-free performance must not be penalized to provide high availability
3) The NFS client protocol implementation should not require modification to use HA-NFS servers
- 9. Contd.
HA-NFS is implemented on top of the AIXv3 journaled file system
AIXv3 provides serializable and atomic modification of the file system metadata by using transactional locking and logging techniques.
In the event of a failure, metadata are restored to a consistent state by applying the changes contained in the log
Reliability of files is ensured by NFS semantics: forcing data to disk before sending an acknowledgement to the client
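The force-to-disk-before-acknowledging rule can be sketched as follows; this is an illustrative sketch (the `nfs_write_rpc` helper and reply format are assumptions, not the real NFS server code):

```python
import os
import tempfile

def nfs_write_rpc(fd, offset, data):
    # Force the data to disk *before* replying, so an acknowledged
    # write is never lost to a server crash.
    os.lseek(fd, offset, os.SEEK_SET)
    os.write(fd, data)
    os.fsync(fd)              # durability first...
    return {"status": "OK"}   # ...acknowledgement second

fd, path = tempfile.mkstemp()
reply = nfs_write_rpc(fd, 0, b"hello")
print(reply["status"])  # OK
```

The ordering is the whole point: if the server crashed between the write and the fsync, the client would never have seen an acknowledgement and would simply retry.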
- 10. Contd.
AIXv3 supports logical volumes, which can be mirrored to provide disk reliability.
Though NFS is a stateless file server protocol, most implementations maintain a small amount of state information
The NFS server maintains a reply cache to record successful non-idempotent RPCs
HA-NFS records changes to its volatile state in the AIXv3 disk log, so the reply cache can be reconstructed in the event of a failure
- 11. HA-NFS Architecture
Consists of two NFS servers sharing a number of SCSI buses
Each shared SCSI bus and the disks connected to it have a designated primary server (to balance the load across the servers)
During normal operation, disks are served by their corresponding primary server
Each server has two network interfaces and IP addresses
Primary interface: normal operation; secondary interface: impersonation of the other server during a failure
[Figure 1: HA-NFS architecture (figure not included in the transcript)]
- 12. Normal operation
Server performs the operation described in each NFS RPC it receives
Upon success, the meta data changes are recorded to the AIXv3 log
An entry is added to the reply cache for the RPC
Upon failure, the server checks whether there is an entry in the reply cache corresponding to the RPC
If an entry is found, the RPC is a retry of a non-idempotent operation that succeeded earlier; otherwise the server replies with an error code to the client
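The reply-cache idea can be sketched as below; this simplified Python sketch consults the cache up front on every request (the slide describes checking it only when the operation fails), and all names (`NFSServer`, the `(client, xid)` key) are illustrative assumptions:

```python
class NFSServer:
    # Non-idempotent RPCs (e.g. MKDIR) are cached by (client, xid)
    # so a retransmitted request gets the original reply instead of
    # failing a second time.
    def __init__(self):
        self.reply_cache = {}
        self.dirs = set()

    def mkdir(self, client, xid, name):
        key = (client, xid)
        if key in self.reply_cache:       # retry of a completed RPC
            return self.reply_cache[key]
        if name in self.dirs:
            return {"status": "EEXIST"}   # genuinely duplicate request
        self.dirs.add(name)
        reply = {"status": "OK"}
        self.reply_cache[key] = reply     # record the success
        return reply

srv = NFSServer()
first = srv.mkdir("clientA", 42, "/export/x")
retry = srv.mkdir("clientA", 42, "/export/x")  # same xid: a retransmission
print(first["status"], retry["status"])  # OK OK
```

Without the cache, the retry would see the directory already exists and wrongly report an error to a client whose original request actually succeeded.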
- 13. Take-over
If a server fails, disks are taken over by the other server
The server uses the log to retrieve the reply cache entries of the failed server
The live server impersonates the failed one by assigning the failed server's primary address to its own secondary network interface
Packets destined for the failed server will be received by the live server on its secondary interface
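The take-over steps above can be sketched as one function; the data structures here (dicts for server state, a list of log entries) are illustrative stand-ins, not the paper's on-disk format:

```python
def take_over(backup, failed_log, failed_ip):
    # Replay the failed server's disk log to rebuild its reply cache,
    # then impersonate it on the secondary network interface.
    for entry in failed_log:
        backup["reply_cache"][entry["key"]] = entry["reply"]
    backup["secondary_ip"] = failed_ip
    return backup

log = [{"key": ("clientA", 7), "reply": {"status": "OK"}}]
live = {"reply_cache": {}, "secondary_ip": None}
live = take_over(live, log, "192.0.2.10")
print(live["secondary_ip"])                   # 192.0.2.10
print(("clientA", 7) in live["reply_cache"])  # True
```

From the client's point of view nothing changed: retransmitted RPCs reach the same IP address and hit the reconstructed reply cache.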
- 14. Alternative method
If network interfaces that can change their hardware addresses are not available, the ARP protocol is used
HA-NFS sends an ARP request to query a hardware address
The query appears to have been sent from the failed server's IP address, but carries the hardware source address of the live server's secondary interface, so hosts that see it update their ARP caches accordingly.
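Such an ARP request can be sketched with `struct`; this builds only the 28-byte ARP payload (not a full Ethernet frame), and the MAC/IP values are illustrative assumptions:

```python
import socket
import struct

def impersonation_arp(live_mac: bytes, failed_ip: str, queried_ip: str) -> bytes:
    # Sender protocol address = the FAILED server's IP, but sender
    # hardware address = the live server's secondary interface, so
    # any host that processes the request re-maps the failed server's
    # IP onto the live server's MAC.
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,              # htype: Ethernet
        0x0800,         # ptype: IPv4
        6, 4,           # hlen, plen
        1,              # oper: ARP request
        live_mac,                      # sha: live server's secondary interface
        socket.inet_aton(failed_ip),   # spa: the impersonated address
        b"\x00" * 6,                   # tha: unknown, being queried
        socket.inet_aton(queried_ip),  # tpa
    )

pkt = impersonation_arp(b"\x02\x00\x00\x00\x00\x01", "192.0.2.10", "192.0.2.99")
print(len(pkt))  # 28
```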
- 15. Re-integration
When a server comes up, it has its primary network interface turned off and sends a reintegration request from its secondary network interface to the backup server
The two servers periodically check each other's status through liveness messages until the recovering server reintegrates itself into the system
- 16. Network Failure
The network is replicated to tolerate network failures
Recovery from the server failure does not require any changes to the client
Recovery from the network failure requires a daemon to run on the client to observe the status of each network and reroute the requests to the operational network
[Figure 2 not included in the transcript]
- 17. Contd.
The server broadcasts heartbeat messages
When the daemon on the client does not receive a heartbeat message within a timeout period, it concludes that the path to the server's primary interface is broken
The daemon updates the client's routing table to use the alternative path to the server
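The client daemon's detect-and-reroute loop can be sketched as below; a plain dict stands in for the kernel routing table, and the class/network names are illustrative assumptions:

```python
import time

class NetworkWatcher:
    # Track the last heartbeat seen on each network; once the primary
    # path goes quiet past the timeout, reroute to the backup path.
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_beat = {}
        self.routes = {}

    def heartbeat(self, network):
        self.last_beat[network] = time.monotonic()

    def check(self, server, primary, backup):
        quiet = time.monotonic() - self.last_beat.get(primary, 0.0)
        self.routes[server] = backup if quiet > self.timeout else primary

w = NetworkWatcher(timeout=0.05)
w.heartbeat("net0")
w.check("serverA", "net0", "net1")
print(w.routes["serverA"])   # net0: heartbeats still arriving
time.sleep(0.1)
w.check("serverA", "net0", "net1")
print(w.routes["serverA"])   # net1: primary path presumed broken
```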
- 18. Performance
Performance of HA NFS measured by running a set of experiments on a number of RISC System/6000 family workstations connected by 10 Mbit/sec Ethernet
The underlying system uses 4 Kbyte disk blocks
- 19. Effect of disk logging
Comparison between HA-NFS and a traditional implementation of NFS that doesn't use disk logging
NFS forces data and metadata to the disk before responding to an RPC
HA-NFS records metadata modifications as log records; sequential log writes require little disk arm movement
Reply cache entries are piggybacked on the normal disk log writes, so saving the volatile state to disk incurs no additional overhead
Disk logging improves the response time of all RPCs that modify the file system structure
- 20. Performance of HA-NFS
- 21. Contd.
Disk logging improvements range from 33% for the SETATTR and WRITE RPCs up to 75% for the MKDIR RPC.
Placing the log on the same disk as the data reduces performance due to additional disk arm movement
- 22. Contd.
The overhead introduced by mirroring is a 17% slowdown for the WRITE RPC
This is due to variation in the disk arm position among the mirrors
It takes 15 seconds for a backup to perform all tasks related to take over (excluding failure detection), and 30 seconds including the failure detection
It takes 60 seconds for a server to reintegrate into the system after repair/maintenance
- 23. Conclusions/Future work
Replicated file servers are well suited to WANs, where a client can access a file from the nearest replica
HA-NFS offers server reliability by using dual-ported disks and impersonation, disk reliability by using mirroring, and network reliability by network replication
Impersonation prevents the client from hanging during a failure
HA-NFS is not flexible; it cannot tolerate more than one server failure
During a failure, disks are unavailable for a period of about 30 seconds
Servers must be physically close due to the restriction on SCSI bus length (the use of optical links is under consideration)
- 24. Questions?
Thank You!