Clusters
Paul Krzyzanowski
Distributed Systems
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
Designing highly available systems
Incorporate elements of fault-tolerant design
– Replication, TMR
A fully fault-tolerant system would offer non-stop availability
– You can’t achieve this!
Problem: expensive!
Designing highly scalable systems
SMP architecture
Problem: performance gain as f(# processors) is sublinear
– Contention for resources (bus, memory, devices)
– Also … the solution is expensive!
Clustering
Achieve reliability and scalability by interconnecting multiple independent systems
Cluster: group of standard, autonomous servers configured so they appear on the network as a single machine
approach single system image
Ideally…
• Bunch of off-the-shelf machines
• Interconnected on a high speed LAN
• Appear as one system to external users
• Processes are load-balanced
– May migrate
– May run on different systems
– All IPC mechanisms and file access available
• Fault tolerant
– Components may fail
– Machines may be taken down
we don’t get all that (yet)
(at least not in one package)
Clustering types
• Supercomputing (HPC)
• Batch processing
• High availability (HA)
• Load balancing
High Performance Computing (HPC)
The evolution of supercomputers
• Target complex applications:
– Large amounts of data
– Lots of computation
– Parallelizable application
• Many custom efforts
– Typically Linux + message passing software + remote exec + remote monitoring
Clustering for performance
Example: One popular effort
– Beowulf
• Initially built to address problems associated with large data sets in Earth and Space Science applications
• From the Center of Excellence in Space Data & Information Sciences (CESDIS), a division of the Universities Space Research Association at the Goddard Space Flight Center
What makes it possible
• Commodity off-the-shelf computers are cost effective
• Publicly available software:
– Linux, GNU compilers & tools
– MPI (message passing interface)
– PVM (parallel virtual machine)
• Low cost, high speed networking
• Experience with parallel software
– Difficult: solutions tend to be custom
What can you run?
• Programs that do not require fine-grain communication
• Nodes are dedicated to the cluster
– Performance of nodes not subject to external factors
• Interconnect network isolated from external network
– Network load is determined only by application
• Global process ID provided
– Global signaling mechanism
Beowulf configuration
Includes:
– BPROC: Beowulf distributed process space
• Start processes on other machines
• Global process ID, global signaling
– Network device drivers
• Channel bonding, scalable I/O
– File system (file sharing is generally not critical)
• NFS root
• unsynchronized
• synchronized periodically via rsync
Programming tools: MPI
• Message Passing Interface
• API for sending/receiving messages
– Optimizations for shared memory & NUMA
– Group communication support
• Other features:
– Scalable file I/O
– Dynamic process management
– Synchronization (barriers)
– Combining results
Programming tools: PVM
• Software that emulates a general-purpose heterogeneous computing framework on interconnected computers
• Present a view of virtual processing elements
– Create tasks
– Use global task IDs
– Manage groups of tasks
– Basic message passing
Beowulf programming tools
• PVM and MPI libraries
• Distributed shared memory
– Page based: software-enforced ownership and consistency policy
• Cluster monitor
• Global ps, top, uptime tools
• Process management
– Batch system
– Write software to control synchronization and load balancing with MPI and/or PVM
– Preemptive distributed scheduling: not part of Beowulf (two packages: Condor and Mosix)
Another example
• Rocks Cluster Distribution
– Based on CentOS Linux
– Mass installation is a core part of the system
• Mass re-installation for application-specific configurations
– Front-end central server + compute & storage nodes
– Rolls: collection of packages
• Base roll includes: PBS (portable batch system), PVM (parallel virtual machine), MPI (message passing interface), job launchers, …
Another example
• Microsoft HPC Server 2008
– Windows Server 2008 + clustering package
– Systems Management
• Management Console: plug-in to System Center UI with support for Windows PowerShell
• RIS (Remote Installation Service)
– Networking
• MS-MPI (Message Passing Interface)
• ICS (Internet Connection Sharing): NAT for cluster nodes
• Network Direct RDMA (Remote DMA)
– Job scheduler
– Storage: iSCSI SAN and SMB support
– Failover support
Batch Processing
Batch processing
• Common application: graphics rendering
– Maintain a queue of frames to be rendered
– Have a dispatcher to remotely exec process
• Virtually no IPC needed
• Coordinator dispatches jobs
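The queue-and-dispatcher pattern above can be sketched in a few lines. This is a minimal illustration only: threads stand in for remote machines, and `render_frame` and `run_farm` are hypothetical names, not part of any batch system mentioned in these slides.

```python
import queue
import threading


def render_frame(frame_number):
    # Hypothetical stand-in for an expensive, independent rendering job.
    return f"frame-{frame_number:04d}.png"


def run_farm(frame_numbers, worker_count=4):
    """Drain a single shared job queue with a pool of workers."""
    jobs = queue.Queue()
    for n in frame_numbers:
        jobs.put(n)

    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker pulls the next frame until the queue is empty.
        # Workers never talk to each other: virtually no IPC is needed.
        while True:
            try:
                n = jobs.get_nowait()
            except queue.Empty:
                return
            output = render_frame(n)
            with lock:
                results[n] = output

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because every worker takes from the same queue, a slow frame delays only the worker that picked it up; the others keep pulling work.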
Single-queue work distribution
Render Farms:
– Pixar:
• 1,024 2.8 GHz Xeon processors running Linux and RenderMan
• 2 TB RAM, 60 TB disk space
• Custom Linux software for articulating, animating/lighting (Marionette), scheduling (Ringmaster), and rendering (RenderMan)
• Cars: each frame took 8 hours to render. Consumes ~32 GB storage on a SAN
– DreamWorks:
• >3,000 servers and >1,000 Linux desktops: HP xw9300 workstations and HP DL145 G2 servers with 8 GB/server
• Shrek 3: 20 million CPU render hours. Platform LSF used for scheduling, Maya for modeling, Avid for editing, and Python for pipelining; the movie uses 24 TB storage
Single-queue work distribution
Render Farms:
– ILM:
• 3,000 processor (AMD) renderfarm; expands to 5,000 by harnessing desktop machines
• 20 Linux-based SpinServer NAS storage systems and 3,000 disks from Network Appliance
• 10 Gbps ethernet
– Sony Pictures’ Imageworks:
• Over 1,200 processors
• Dell and IBM workstations
• almost 70 TB data for Polar Express
Batch Processing
OpenPBS.org:
– Portable Batch System
– Developed by Veridian MRJ for NASA
• Commands
– Submit job scripts
• Submit interactive jobs
• Force a job to run
– List jobs
– Delete jobs
– Hold jobs
Load Balancing for the web
Functions of a load balancer
Load balancing
Failover
Planned outage management
Redirection
Simplest technique
HTTP REDIRECT status code (3xx)
Client requests www.mysite.com; the server answers with REDIRECT www03.mysite.com; the client then connects to www03.mysite.com
Redirection
• Trivial to implement
• Successive requests automatically go to the same web server
– Important for sessions
• Visible to customer
– Some don’t like it
• Bookmarks will usually tag a specific site
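A redirecting front end can be sketched with Python's standard `http.server` module. This is an illustrative sketch, not a production balancer: the host names in `SERVERS`, the round-robin choice of target, and the `serve` helper are all assumptions.

```python
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical pool of real web servers behind www.mysite.com.
SERVERS = ["www01.mysite.com", "www02.mysite.com", "www03.mysite.com"]
_next_server = itertools.cycle(SERVERS)


def redirect_location(path):
    """Pick the next server round-robin and build the redirect URL."""
    return f"http://{next(_next_server)}{path}"


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 302 tells the browser to retry the request at the Location URL.
        # Later requests go straight to that server, which keeps a
        # session on one machine but exposes its name to the user.
        self.send_response(302)
        self.send_header("Location", redirect_location(self.path))
        self.end_headers()


def serve(port=8080):
    # Blocks forever; call this only from a script run as the front end.
    HTTPServer(("", port), RedirectHandler).serve_forever()
```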
Software load balancer
e.g.: IBM Interactive Network Dispatcher Software
Forwards request via load balancing
– Leaves original source address
– Load balancer not in path of outgoing traffic (high bandwidth)
– Kernel extensions for routing TCP and UDP requests
• Each back-end server accepts connections on its own address and the dispatcher’s address
• Dispatcher changes MAC address of packets.
Software load balancer
Client (bobby) sends a request to www.mysite.com; the dispatcher forwards it unchanged (src=bobby, dest=www03); www03 sends the response directly to the client
Load balancing router
Routers have been getting smarter
– Most support packet filtering
– Add load balancing
Cisco LocalDirector, Alteon, F5 Big-IP
Load balancing router
• Assign one or more virtual addresses to physical address
– Incoming request gets mapped to physical address
• Special assignments can be made per port
– e.g. all FTP traffic goes to one machine
Balancing decisions:
– Pick machine with least # TCP connections
– Factor in weights when selecting machines
– Pick machines round-robin
– Pick fastest connecting machine (SYN/ACK time)
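Three of the balancing decisions above can be written as small selection functions. This is a sketch under assumptions: the `Backend` record and the rule of dividing connection count by weight are illustrative choices, not the behavior of any specific product named here.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int = 1        # larger weight = more capable machine (assumed)
    connections: int = 0   # currently open TCP connections


def least_connections(backends):
    """Pick the machine with the fewest open TCP connections."""
    return min(backends, key=lambda b: b.connections)


def weighted_least_connections(backends):
    """Factor in weights: compare connections per unit of capacity."""
    return min(backends, key=lambda b: b.connections / b.weight)


def round_robin(backends):
    """Return an endless iterator that visits machines in turn."""
    return itertools.cycle(backends)
```

Picking the fastest-connecting machine would need a live SYN/ACK timing measurement, so it is left out of this sketch.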
High Availability (HA)
High availability (HA)

| Class | Level | Annual Downtime |
|---|---|---|
| Continuous | 100% | 0 |
| Six nines (carrier class switches) | 99.9999% | 30 seconds |
| Fault Tolerant (carrier-class servers) | 99.999% | 5 minutes |
| Fault Resilient | 99.99% | 53 minutes |
| High Availability | 99.9% | 8.8 hours |
| Normal availability | 99-99.5% | 44-87 hours |
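The downtime column follows from simple arithmetic: annual downtime is the unavailable fraction multiplied by the length of a year. A small sketch, assuming a 365-day year (the function name is illustrative):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a non-leap year


def annual_downtime_seconds(availability_percent):
    """Expected downtime per year for a given availability level."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    for level in (99.0, 99.5, 99.9, 99.99, 99.999, 99.9999):
        hours = annual_downtime_seconds(level) / 3600
        print(f"{level}% available: {hours:.4f} hours of downtime per year")
```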
Clustering: high availability
Fault tolerant design
Stratus, NEC, Marathon Technologies
– Applications run uninterrupted on a redundant subsystem
• NEC and Stratus have applications running in lockstep synchronization
– Two identical connected systems
– If one server fails, other takes over instantly
Costly and inefficient
– But does what it was designed to do
Clustering: high availability
• Availability addressed by many:
– Sun, IBM, HP, Microsoft, SteelEye Lifekeeper, …
• If one server fails
– Fault is isolated to that node
– Workload spread over surviving nodes
– Allows scheduled maintenance without disruption
– Nodes may need to take over IP addresses
Example: Windows Server 2003 clustering
• Network load balancing
– Address web-server bottlenecks
• Component load balancing
– Scale middle-tier software (COM objects)
• Failover support for applications
– 8-node failover clusters
– Applications restarted on surviving node
– Shared disk configuration using SCSI or fibre channel
– Resource group: {disk drive, IP address, network name, service} can be moved during failover
Example: Windows Server 2003 clustering
Top tier: cluster abstractions
– Failover manager, resource monitor, cluster registry
Middle tier: distributed operations
– Global status update, quorum (keeps track of who’s in charge), membership
Bottom tier: OS and drivers
– Cluster disk driver, cluster network drivers
– IP address takeover
Clusters
Architectural models
HA issues
How do you detect failure?
How long does it take to detect?
How does a dead application move/restart?
Where does it move to?
Heartbeat network
• Machines need to detect faulty systems
– “ping” mechanism
• Need to distinguish system faults from network faults
– Useful to maintain redundant networks
– Send a periodic heartbeat to test a machine’s liveness
– Watch out for split-brain!
• Ideally, use a network with a bounded response time
– Lucent RCC used a serial line interconnect
– Microsoft Cluster Server supports a dedicated “private network”
• Two network cards connected with a pass-through cable or hub
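A heartbeat monitor over redundant networks can be sketched as follows. The class, its method names, and the three status strings are assumptions made for illustration; timestamps are passed in explicitly so the logic is easy to follow.

```python
class HeartbeatMonitor:
    """Track the last heartbeat seen from each node on each network."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence tolerated
        self.last_seen = {}      # (node, network) -> timestamp

    def record(self, node, network, now):
        self.last_seen[(node, network)] = now

    def status(self, node, networks, now):
        # A network counts as alive if a heartbeat arrived recently on it.
        alive = [
            net for net in networks
            if now - self.last_seen.get((node, net), float("-inf")) <= self.timeout
        ]
        if len(alive) == len(networks):
            return "up"
        if alive:
            # Heartbeats still arrive on some path: the node is running,
            # so the silent path points to a network fault.
            return "network fault"
        # Silence on every path: the node may be down, or every link
        # failed at once (the split-brain risk).
        return "suspected down"
```

With a single network the monitor cannot tell the last two cases apart, which is why redundant heartbeat paths are useful.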
Failover Configuration Models
Active/Passive (N+M nodes)
– M dedicated failover node(s) for N active nodes
Active/Active
– Failed workload goes to remaining nodes
Design options for failover
Cold failover
– Application restart
Warm failover
– Application checkpoints itself periodically
– Restart last checkpointed image
– May use write-ahead log (tricky)
Hot failover
– Application state is lockstep synchronized
– Very difficult, expensive (resources), prone to software faults
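Warm failover depends on the application saving state that a surviving node can reload. A minimal sketch, assuming the state is JSON-serializable and the checkpoint file sits on storage that the takeover node can read (both are assumptions, and the function names are illustrative):

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the checkpoint atomically so a crash never leaves half a file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())     # force the data to disk before renaming
    os.replace(tmp_path, path)   # atomic rename over the old checkpoint


def load_checkpoint(path, default):
    """On restart, resume from the last checkpoint if one exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default           # no checkpoint yet: this is a cold start
```

Work done after the last checkpoint is lost on failover; recovering it is what a write-ahead log is for.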
Design options for failover
With either type of failover …
Multi-directional failover
– Failed applications migrate to / restart on available systems
Cascading failover
– If the backup system fails, application can be restarted on another surviving system
System support for HA
• Hot-pluggable devices
– Minimize downtime for component swapping
• Redundant devices
– Redundant power supplies
– Parity on memory
– Mirroring on disks (or RAID for HA)
– Switchover of failed components
• Diagnostics
– On-line serviceability
Shared resources (disk)
Shared disk
– Allows multiple systems to share access to disk drives
– Works well if applications do not generate much disk I/O
– Disk access must be synchronized
Synchronization via a distributed lock manager (DLM)
Shared resources (disk)
Shared nothing
– No shared devices
– Each system has its own storage resources
– No need to deal with DLMs
– If a machine A needs resources on B, A sends a message to B
• If B fails, storage requests have to be switched over to a live node
Cluster interconnects
Traditional WANs and LANs may be slow as cluster interconnect
– Connecting server nodes, storage nodes, I/O channels, even memory pages
– Storage Area Network (SAN)
• Fibre channel connectivity to external storage devices
• Any node can be configured to access any storage through a fibre channel switch
– System Area Network (SAN)
• Switched interconnect to switch cluster resources
• Low-latency I/O without processor intervention
• Scalable switching fabric
• (Compaq, Tandem’s ServerNet)
• Microsoft Windows 2000 supports Winsock Direct for SAN communication
Achieving High Availability
[Figure: Server A and Server B, each attached to Local Area Networks (switch A, switch B) and to a Storage Area Network through two fibre channel switches (Fabric A, Fabric B). Links labeled heartbeat, heartbeat 2, heartbeat 3.]
Achieving High Availability
[Figure: the same layout with a Storage Area Network (iSCSI): Ethernet switch A’ and Ethernet switch B’ (ethernet A, ethernet B) in place of the fibre channel switches. Links labeled heartbeat, heartbeat 2, heartbeat 3.]
HA Storage: RAID
Redundant Array of Independent (Inexpensive) Disks
RAID 0: Performance
Striping
• Advantages:
– Performance
– All storage capacity can be used
• Disadvantage:
– Not fault tolerant
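Striping spreads consecutive blocks across the disks so that several disks can work in parallel. A sketch of the address mapping, assuming block-level round-robin striping with a stripe unit of one block (real arrays often use larger stripe units):

```python
def locate_block(logical_block, disk_count):
    """Map a logical block number to (disk index, block offset on that disk)."""
    disk = logical_block % disk_count
    offset = logical_block // disk_count
    return disk, offset
```

Every block lives on exactly one disk, which is why all capacity is usable and why losing any one disk loses data.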
RAID 1: HA
Mirroring
• Advantages:
– Double read speed
– No rebuild necessary if a disk fails: just copy
• Disadvantage:
– Only half the space
RAID 3: HA
Separate parity disk
• Advantages:
– Very fast reads
– High efficiency: low ratio of parity/data
• Disadvantages:
– Slow random I/O performance
– Only one I/O at a time
RAID 5
Interleaved parity
• Advantages:
– Very fast reads
– High efficiency: low ratio of parity/data
• Disadvantages:
– Slower writes
– Complex controller
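Parity-based RAID levels rely on XOR: the parity block is the XOR of the data blocks in a stripe, so any single missing block equals the XOR of all the others. A minimal sketch with equal-length byte blocks (the function names are illustrative):

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity(data_blocks):
    """Compute the parity block for one stripe."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks, parity_block):
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])
```

A small write must update parity as well as data, which is one reason writes are slower than reads.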
RAID 1+0
Combine mirroring and striping
– Striping across a set of disks
– Mirroring of the entire set onto another set
The end