©1996, 1997 Microsoft Corp.
FT NT: A Tutorial on Microsoft Cluster Server™
(formerly “Wolfpack”)
Joe Barrera
Jim Gray
Microsoft Research
{joebar, gray} @ microsoft.com
http://research.microsoft.com/barc

Outline
• Why FT and Why Clusters
• Cluster Abstractions
• Cluster Architecture
• Cluster Implementation
• Application Support
• Q&A

DEPENDABILITY: The 3 ITIES
• RELIABILITY / INTEGRITY: Does the right thing. (also large MTTF)
• AVAILABILITY: Does it now. (also small MTTR)
    Availability = MTTF / (MTTF + MTTR)
• System Availability: If 90% of terminals are up & 99% of the DB is up?
    (=> 89% of transactions are serviced on time.)
• Holistic vs. Reductionist view
[Figure labels: Security, Integrity, Reliability, Availability]
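
A minimal sketch of the arithmetic on this slide, assuming the terminals and the database fail independently so that their availabilities multiply. The function names are illustrative and not part of the original tutorial.

```python
def availability(mttf: float, mttr: float) -> float:
    """Fraction of time a component is up: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def series_availability(*parts: float) -> float:
    """Availability of a chain in which every part must be up.

    Assumes the parts fail independently of one another.
    """
    result = 1.0
    for a in parts:
        result *= a
    return result


# The slide's example: 90% of terminals up and 99% of the DB up.
system = series_availability(0.90, 0.99)
print(f"{system:.3f}")  # about 0.89, matching the slide's "89%"
```

The same helper shows why a small MTTR matters: availability(99, 1) is 0.99, while doubling the repair time to 2 lowers it to roughly 0.98.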

Case Study - Japan
"Survey on Computer Security", Japan Info Dev Corp., March 1986. (trans: Eiichi Watanabe).

Outage cause                      MTTF
Vendor (hardware and software)    5 Months
Application software              9 Months
Communications lines              1.5 Years
Operations                        2 Years
Environment                       2 Years
All causes combined               10 Weeks

1,383 institutions reported (6/84 - 7/85)
7,517 outages, MTTF ~ 10 weeks, avg duration ~ 90 MINUTES
To Get 10 Year MTTF, Must Attack All These Areas
[Pie chart: share of outages by cause. Slices: Vendor, Environment, Operations, Application Software, Tele Comm lines. Values shown: 42%, 12%, 25%, 9.3%, 11.2%]
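
The per-cause MTTF figures can be combined into the overall figure by adding failure rates. The sketch below assumes the causes are independent and that each MTTF is the reciprocal of a constant failure rate; the dictionary and function names are illustrative.

```python
def combined_mttf(mttfs):
    """Overall MTTF when independent failure causes act on the same system.

    Failure rates (1 / MTTF) add, so the combined MTTF is the
    reciprocal of the summed rates.
    """
    total_rate = sum(1.0 / m for m in mttfs)
    return 1.0 / total_rate


# Per-cause MTTF from the survey, converted to months.
MTTF_MONTHS = {
    "vendor": 5,
    "application software": 9,
    "communications lines": 18,
    "operations": 24,
    "environment": 24,
}

overall_months = combined_mttf(MTTF_MONTHS.values())
overall_weeks = overall_months * 52 / 12
print(f"{overall_months:.2f} months, about {overall_weeks:.1f} weeks")
```

Under these assumptions the result is a little over two months, consistent with the "~ 10 weeks" on the slide.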

Case Studies - Tandem Trends
• MTTF improved
• Shift from Hardware & Maintenance (from 50% to 10%)
  to Software (62%) & Operations (15%)
• NOTE: Systematic under-reporting of
    • Environment
    • Operations errors
    • Application Software
[Charts, 1985 / 1987 / 1989: "Outages / 1000 System Years by Primary Cause" and "% of Outages by Primary Cause". Series: unknown, environment, operations, maintenance, hardware, software]

Summary of FT Studies
• Current Situation: ~4-year MTTF => Fault Tolerance Works.
• Hardware is GREAT (maintenance and MTTF).
• Software masks most hardware faults.
• Many hidden software outages in operations:
    • New Software.
    • Utilities.
• Must make all software ONLINE.
• Software seems to define a 30-year MTTF ceiling.
• Reasonable Goal: 100-year MTTF.
  class 4 today => class 6 tomorrow.

Fault Tolerance vs Disaster Tolerance
• Fault-Tolerance: masks local faults
    • RAID disks
    • Uninterruptible Power Supplies
    • Cluster Failover
• Disaster Tolerance: masks site failures
    • Protects against fire, flood, sabotage, ...
    • Redundant system and service at remote site.

The Microsoft “Vision”: Plug & Play Dependability
• Transactions for reliability
• Clusters: for availability
• Security
• All built into the OS
[Figure labels: Integrity / Security, Integrity, Reliability, Availability]

Cluster Goals
• Manageability
    • Manage nodes as a single system
    • Perform server maintenance without affecting users
    • Mask faults, so repair is non-disruptive
• Availability
    • Restart failed applications & servers
        • un-availability ~ MTTR / MTBF, so quick repair.
    • Detect/warn administrators of failures
• Scalability
    • Add nodes for incremental
        • processing
        • storage
        • bandwidth

Fault Model
• Failures are independent
    • So, single fault tolerance is a big win
• Hardware fails fast (blue-screen)
• Software fails fast (or goes to sleep)
• Software often repaired by reboot:
    • Heisenbugs
• Operations tasks: major source of outage
    • Utility operations
    • Software upgrades

Cluster: Servers Combined to Improve Availability & Scalability
• Cluster: A group of independent systems working together as a single system. Clients see scalable & FT services (single system image).
• Node: A server in a cluster. May be an SMP server.
• Interconnect: Communications link used for intra-cluster status info such as “heartbeats”. Can be Ethernet.
[Diagram labels: Client PCs, Printers, Server A, Server B, Disk array A, Disk array B, Interconnect]

Microsoft Cluster Server™
• 2-node availability Summer 97 (20,000 Beta Testers now)
    • Commoditize fault-tolerance (high availability)
    • Commodity hardware (no special hardware)
    • Easy to set up and manage
    • Lots of applications work out of the box.
• 16-node scalability later (next year?)

Failover Example
[Diagram labels: Browser, Server 1, Server 2, Web site, Database, Web site files, Database files]

MS Press Failover Demo
• Client/Server
• Software failure
• Admin shutdown
• Server failure
Resource States
- Pending
- Partial
- Failed
- Offline

Demo Configuration
Windows NT Server Cluster
• Server “Alice”
    • SMP Pentium® Pro Processors
    • Windows NT Server with Wolfpack
    • Microsoft Internet Information Server
    • Microsoft SQL Server
    • Local Disks
• Server “Betty”
    • SMP Pentium® Pro Processors
    • Windows NT Server with Wolfpack
    • Microsoft Internet Information Server
    • Microsoft SQL Server
    • Local Disks
• Interconnect: standard Ethernet
• SCSI Disk Cabinet: Shared Disks
• Client
    • Windows NT Workstation
    • Internet Explorer
    • MS Press OLTP app
• Administrator
    • Windows NT Workstation
    • Cluster Admin
    • SQL Enterprise Mgr

Demo Administration
Windows NT Server Cluster (SCSI Disk Cabinet with Shared Disks; Local Disks on each server)
• Server “Alice”
    • Runs SQL Trace
    • Runs Globe
• Server “Betty”
    • Runs SQL Trace
• Client
• Cluster Admin Console
    • Windows GUI
    • Shows cluster resource status
    • Replicates status to all servers
    • Define apps & related resources
    • Define resource dependencies
    • Orchestrates recovery order
• SQL Enterprise Mgr
    • Windows GUI
    • Shows server status
    • Manages many servers
    • Start, stop, manage DBs

Generic Stateless Application: Rotating Globe
• Mplay32 is generic app.
• Registered with MSCS
• MSCS restarts it on failure
• Move/restart ~ 2 seconds
• Fail-over if 4 failures (= process exits) in 3 minutes
    • settable default
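
The "4 failures in 3 minutes" rule can be read as a sliding-window counter. The sketch below is one possible reading, not the MSCS implementation: the class name, method name, and the choice to reset the counter after a failover are assumptions made for illustration.

```python
from collections import deque


class RestartPolicy:
    """Restart locally until too many failures land inside one time window."""

    def __init__(self, threshold: int = 4, period_s: float = 180.0):
        self.threshold = threshold      # failures that trigger a failover
        self.period_s = period_s        # window length in seconds (3 minutes)
        self._failures = deque()        # timestamps of recent failures

    def on_failure(self, now_s: float) -> str:
        """Record a failure at time now_s and return 'restart' or 'failover'."""
        self._failures.append(now_s)
        # Forget failures that fell out of the window.
        while self._failures and now_s - self._failures[0] > self.period_s:
            self._failures.popleft()
        if len(self._failures) >= self.threshold:
            self._failures.clear()      # assumption: start fresh on the new node
            return "failover"
        return "restart"


policy = RestartPolicy()
decisions = [policy.on_failure(t) for t in (0, 10, 20, 30)]
print(decisions)
```

Four process exits within 30 seconds end in a failover, while the same four exits spread 100 seconds apart would only ever cause local restarts.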

Demo: Moving or Failing Over An Application
• Alice fails or Operator requests move
[Diagram labels: Windows NT Server Cluster, SCSI Disk Cabinet with Shared Disks, Local Disks, AVI Application on each server, X marks]

Generic Stateful Application: NotePad
• Notepad saves state on shared disk
• Failure before save => lost changes
• Failover or move (disk & state move)

Demo Step 1: Alice Delivering Service
Demo Step 2: Request Move to Betty
Demo Step 3: Betty Delivering Service
Demo Step 4: Power Fail Betty, Alice Takeover
Demo Step 5: Alice Delivering Service
Demo Step 6: Reboot Betty, now can takeover
[Diagram for each step. Labels: Windows NT Server Cluster, SCSI Disk Cabinet with Shared Disks, Local Disks, IIS, SQL, ODBC, HTTP, IP, "SQL Activity" / "No SQL Activity"]

Outline
• Why FT and Why Clusters
• Cluster Abstractions
• Cluster Architecture
• Cluster Implementation
• Application Support
• Q&A

Cluster and NT Abstractions
• Cluster Abstractions: Cluster, Group, Resource
• NT Abstractions: Domain, Node, Service

Basic NT Abstractions
• Service: program or device managed by a node
    • e.g., file service, print service, database server
    • can depend on other services (startup ordering)
    • can be started, stopped, paused, failed
• Node: a single (tightly-coupled) NT system
    • hosts services; belongs to a domain
    • services on node always remain co-located
    • unit of service co-location; involved in naming services
• Domain: a collection of nodes
    • cooperation for authentication, administration, naming

Cluster Abstractions
• Resource: program or device managed by a cluster
    • e.g., file service, print service, database server
    • can depend on other resources (startup ordering)
    • can be online, offline, paused, failed
• Resource Group: a collection of related resources
    • hosts resources; belongs to a cluster
    • unit of co-location; involved in naming resources
• Cluster: a collection of nodes, resources, and groups
    • cooperation for authentication, administration, naming

Resources
Resources have...
• Type: what it does (file, DB, print, web…)
• An operational state (online/offline/failed)
• Current and possible nodes
• Containing Resource Group
• Dependencies on other resources
• Restart parameters (in case of resource failure)

Resource Types
• Built-in types
    • Generic Application
    • Generic Service
    • Internet Information Server (IIS) Virtual Root
    • Network Name
    • TCP/IP Address
    • Physical Disk
    • FT Disk (Software RAID)
    • Print Spooler
    • File Share
• Added by others
    • Microsoft SQL Server
    • Message Queues
    • Exchange Mail Server
    • Oracle
    • SAP R/3
    • Your application? (use developer kit wizard).

[Screenshots of resource property pages: Physical Disk, TCP/IP Address, Network Name, File Share, IIS (WWW/FTP) Server, Print Spooler]
38
Resource StatesResource States Resources states:Resources states:
OfflineOffline:: exists, not offering serviceexists, not offering service OnlineOnline:: offering serviceoffering service Failed:Failed: not able to offer servicenot able to offer service
Resource failure may cause:Resource failure may cause: local local restartrestart other resources to goother resources to go offlineoffline resource group to resource group to movemove (all subject to group and resource parameters)(all subject to group and resource parameters)
Resource failure detected by:Resource failure detected by: Polling failurePolling failure Node failureNode failure
OnlinePending
Online
Failed
Offline
OfflinePending
Go
Online!
I’m
Online!
I’m
Off-line!
Go
Off-line!
I’mhere!
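
The state diagram can be approximated as a small transition table. This is a sketch under assumptions: the event names are invented for illustration, a failure is accepted from any state, and a failed resource is restarted by asking it to go online again. The slide does not spell out those last two rules.

```python
from enum import Enum


class State(Enum):
    OFFLINE = "Offline"
    ONLINE_PENDING = "Online Pending"
    ONLINE = "Online"
    OFFLINE_PENDING = "Offline Pending"
    FAILED = "Failed"


# (current state, event) -> next state. Event names are hypothetical
# stand-ins for the "Go Online!" / "I'm Online!" messages on the slide.
TRANSITIONS = {
    (State.OFFLINE, "go_online"): State.ONLINE_PENDING,
    (State.ONLINE_PENDING, "im_online"): State.ONLINE,
    (State.ONLINE, "go_offline"): State.OFFLINE_PENDING,
    (State.OFFLINE_PENDING, "im_offline"): State.OFFLINE,
    (State.FAILED, "go_online"): State.ONLINE_PENDING,  # assumed local restart
}


def step(state: State, event: str) -> State:
    """Apply one event; 'fail' is accepted from any state (an assumption)."""
    if event == "fail":
        return State.FAILED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.value}")


state = State.OFFLINE
for event in ("go_online", "im_online", "fail", "go_online", "im_online"):
    state = step(state, event)
print(state.value)
```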

Resource Dependencies
• Similar to NT Service Dependencies
• Orderly startup & shutdown
    • A resource is brought online after any resources it depends on are online.
    • A resource is taken offline before any resources it depends on.
• Interdependent resources
    • Form dependency trees
    • move among nodes together
    • failover together
    • as per resource group
[Diagram labels: Network Name, IP Address, IIS Virtual Root, File Share, Resource DLL]
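
The online and offline ordering rules amount to a topological sort of the dependency graph and its reverse. The sketch below uses Python's standard graphlib module (Python 3.9+); the particular edges in DEPENDS_ON are an assumed example, since the transcript lists the diagram's resource names but not its arrows.

```python
from graphlib import TopologicalSorter

# resource -> set of resources it depends on (hypothetical edges).
DEPENDS_ON = {
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "File Share": {"Network Name"},
    "IIS Virtual Root": {"Network Name"},
}


def online_order(depends_on):
    """Bring each resource online only after everything it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def offline_order(depends_on):
    """Take each resource offline before anything it depends on."""
    return list(reversed(online_order(depends_on)))


up = online_order(DEPENDS_ON)
down = offline_order(DEPENDS_ON)
print("online: ", up)
print("offline:", down)
```

TopologicalSorter raises CycleError if the dependencies contain a cycle, which fits the slide's description of dependencies forming trees.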

[Screenshot: Dependencies Tab]

NT Registry
• Stores all configuration information
    • Software
    • Hardware
• Hierarchical (name, value) map
• Has an open, documented interface
• Is secure
• Is visible across the net (RPC interface)
• Typical Entry:
    \Software\Microsoft\MSSQLServer\MSSQLServer\
        DefaultLogin = “GUEST”
        DefaultDomain = “REDMOND”

Cluster Registry
• Separate from local NT Registry
• Replicated at each node
    • Algorithms explained later
• Maintains configuration information:
    • Cluster members
    • Cluster resources
    • Resource and group parameters (e.g. restart)
• Stable storage
• Refreshed from “master” copy when node joins cluster

Other Resource Properties
• Name
• Restart policy (restart N times, failover…)
• Startup parameters
• Private configuration info (resource type specific)
    • Per-node as well, if necessary
• Poll Intervals (LooksAlive, IsAlive, Timeout)
• These properties are all kept in Cluster Registry

[Screenshots: General Resource Tab, Advanced Resource Tab]

Resource Groups
• Every resource belongs to a resource group.
• Resource groups move (failover) as a unit
• Dependencies NEVER cross groups. (Dependency trees contained within groups.)
• Group may contain forest of dependency trees
[Diagram: Payroll Group containing Web Server, SQL Server, IP Address, Drive E:, Drive F:]

[Screenshot: Moving a Resource Group]

Group Properties
• CurrentState: Online, Partially Online, Offline
• Members: resources that belong to group
    • members determine which nodes can host group.
• Preferred Owners: ordered list of host nodes
• FailoverThreshold: How many faults cause failover
• FailoverPeriod: Time window for failover threshold
• FailbackWindowStart: When can failback happen?
• FailbackWindowEnd: When can failback happen?
• Everything (except CurrentState) is stored in registry
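
A failback window check can be sketched as follows, assuming the two window properties hold hours of the day (0 to 23) and that the window may wrap past midnight. The slide does not state the units or whether the end hour is inclusive, so the half-open interval here is an assumption, as is the function name.

```python
def in_failback_window(hour: int, window_start: int, window_end: int) -> bool:
    """Return True if failback is allowed at the given hour of day.

    Treats the window as the half-open interval [window_start, window_end),
    wrapping past midnight when window_start > window_end.
    """
    if window_start <= window_end:
        return window_start <= hour < window_end
    return hour >= window_start or hour < window_end


# A window from 22:00 to 04:00 keeps failback out of business hours.
print(in_failback_window(23, 22, 4), in_failback_window(12, 22, 4))
```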

Failover and Failback
• Failover parameters
    • timeout on LooksAlive, IsAlive
    • # local restarts in failure window
      after this, offline.
• Failback to preferred node
    • (during failback window)
• Do resource failures affect group?
[Diagram labels: Node \\Alice, Node \\Betty, Cluster Service on each node, name, IP addr, Failover, Failback]

Cluster Concepts
[Diagram: Clusters contain Groups; Groups contain Resources]

Cluster Properties
• Defined Members: nodes that can join the cluster
• Active Members: nodes currently joined to cluster
• Resource Groups: groups in a cluster
• Quorum Resource:
    • Stores copy of cluster registry.
    • Used to form quorum.
• Network: Which network used for communication
• All properties kept in Cluster Registry

Cluster API Functions
(operations on nodes & groups)
• Find and communicate with Cluster
• Query/Set Cluster properties
• Enumerate Cluster objects
    • Nodes
    • Groups
    • Resources and Resource Types
• Cluster Event Notifications
    • Node state and property changes
    • Group state and property changes
    • Resource state and property changes

[Screenshot: Cluster Management]

Demo
• Server startup and shutdown
• Installing applications
• Changing status
• Failing over
• Transferring ownership of groups or resources
• Deleting Groups and Resources

Outline
• Why FT and Why Clusters
• Cluster Abstractions
• Cluster Architecture
• Cluster Implementation
• Application Support
• Q&A

Architecture
• Top tier provides cluster abstractions
• Middle tier provides distributed operations
• Bottom tier is NT and drivers
[Layer diagram: Failover Manager, Resource Monitor, Cluster Registry, Global Update, Quorum, Membership, ClusterDisk Driver, ClusterNet Drivers, Windows NT Server]

Membership and Regroup
• Membership:
    • Used for orderly addition and removal from { active nodes }
• Regroup:
    • Used for failure detection (via heartbeat messages)
    • Forceful eviction from { active nodes }
[Layer diagram: Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, Windows NT Server]

Membership
• Defined cluster = all nodes
• Active cluster:
    • Subset of defined cluster
    • Includes Quorum Resource
    • Stable (no regroup in progress)
[Layer diagram: Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, Windows NT Server]

Quorum Resource
• Usually (but not necessarily) a SCSI disk
• Requirements:
    • Arbitrates for a resource by supporting the challenge/defense protocol
    • Capable of storing cluster registry and logs
• Configuration Change Logs
    • Tracks changes to configuration database when any defined member missing (not active)
    • Prevents configuration partitions in time

Challenge/Defense Protocol
• SCSI-2 has reserve/release verbs
    • Semaphore on disk controller
• Owner gets lease on semaphore
• Renews lease once every 3 seconds
• To preempt ownership:
    • Challenger clears semaphore (SCSI bus reset)
    • Waits 10 seconds
        • 3 seconds for renewal + 2 seconds bus settle time
        • x2 to give owner two chances to renew
    • If still clear, then former owner loses lease
    • Challenger issues reserve to acquire semaphore
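
The timing argument can be checked with a small model. This is a simplified sketch, not driver code: it assumes the defender renews at exact multiples of 3 seconds starting from time 0, ignores bus settle delays beyond their contribution to the wait, and uses invented names throughout.

```python
RENEW_INTERVAL_S = 3      # owner renews its reservation every 3 seconds
BUS_SETTLE_S = 2          # allowance for the bus to settle after a reset
CHALLENGE_WAIT_S = 2 * (RENEW_INTERVAL_S + BUS_SETTLE_S)  # 10 seconds


def challenge(reset_time_s: int, defender_dies_at_s=None) -> str:
    """Return which node owns the quorum disk after a challenge.

    The challenger resets the bus at reset_time_s (clearing the
    reservation), waits CHALLENGE_WAIT_S, then checks the reservation.
    defender_dies_at_s is the time the defender stops renewing, or None
    if it stays alive.
    """
    check_time_s = reset_time_s + CHALLENGE_WAIT_S
    # First renewal the defender would attempt after the reset.
    next_renewal_s = (reset_time_s // RENEW_INTERVAL_S + 1) * RENEW_INTERVAL_S
    defender_alive = (
        defender_dies_at_s is None or next_renewal_s < defender_dies_at_s
    )
    if next_renewal_s <= check_time_s and defender_alive:
        return "defender"      # reservation detected: challenge fails
    return "challenger"        # still clear: challenger reserves the disk


print(challenge(reset_time_s=4))                         # live defender
print(challenge(reset_time_s=4, defender_dies_at_s=2))   # dead defender
```

A live defender always gets at least one renewal in before the 10-second check, so it keeps the disk; a defender that died before the reset never re-reserves, and the challenger wins.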

Challenge/Defense Protocol: Successful Defense
[Timeline, 0 to 16 seconds. Rows: Defender Node, Challenger Node. Events: repeated Reserve, Bus Reset, "Reservation detected"]

Challenge/Defense Protocol: Successful Challenge
[Timeline, 0 to 16 seconds. Rows: Defender Node, Challenger Node. Events: Reserve, Bus Reset, "No reservation detected", Reserve]

Regroup
• Invariant: All members agree on { members }
• Regroup re-computes { members }
• Each node sends heartbeat message to a peer (default is one per second)
• Regroup if two lost heartbeat messages
    • suspicion that sender is dead
    • failure detection in bounded time
• Uses a 5-round protocol to agree.
    • Checks communication among nodes.
    • Suspected missing node may survive.
• Upper levels (global update, etc.) informed of regroup event.
[Layer diagram: Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, Windows NT Server]
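
The heartbeat rule (one message per second, suspicion after two lost messages) can be sketched as a timeout check. The class and method names are illustrative, and the sketch covers only the detection step that would trigger a regroup, not the 5-round agreement protocol itself.

```python
class HeartbeatMonitor:
    """Track the last heartbeat from each peer and flag silent ones."""

    def __init__(self, interval_s: float = 1.0, max_missed: int = 2):
        self.interval_s = interval_s    # expected heartbeat period
        self.max_missed = max_missed    # lost heartbeats before suspicion
        self.last_seen = {}             # peer name -> time of last heartbeat

    def heartbeat(self, peer: str, now_s: float) -> None:
        """Record a heartbeat received from peer at time now_s."""
        self.last_seen[peer] = now_s

    def suspects(self, now_s: float):
        """Peers silent for longer than max_missed heartbeat intervals."""
        limit = self.max_missed * self.interval_s
        return sorted(p for p, t in self.last_seen.items() if now_s - t > limit)


monitor = HeartbeatMonitor()
monitor.heartbeat("alice", 0.0)
monitor.heartbeat("betty", 0.0)
monitor.heartbeat("alice", 1.0)
monitor.heartbeat("alice", 2.0)
print(monitor.suspects(2.5))  # betty has missed the heartbeats due at 1.0 and 2.0
```

Because the check is a fixed multiple of the heartbeat interval, failure detection time is bounded, as the slide notes. A suspected node is only a candidate for eviction; the regroup protocol may still find it alive.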

Membership State Machine
[State diagram. States: Initialize, Sleeping, Member Search, Quorum Disk Search, Joining, Forming, Online, Regroup. Transition labels: Start Cluster; Found Online Member; Acquire (reserve) Quorum Disk; Join Succeeds; Synchronize Succeeds; Search or Reserve Fails; Search Fails; Minority or no Quorum; Non-Minority and Quorum; Lost Heartbeat]

Joining a Cluster
• When a node starts up, it mounts and configures only local, non-cluster devices
• Starts Cluster Service which
    • looks in local (stale) registry for members
    • Asks each member in turn to sponsor new node’s membership. (Stop when sponsor found.)
• Sponsor (any active member)
    • Sponsor authenticates applicant
    • Broadcasts applicant to cluster members
    • Sponsor sends updated registry to applicant
    • Applicant becomes a cluster member

Forming a Cluster (when Joining fails)
• Use registry to find quorum resource
• Attach to (arbitrate for) quorum resource
• Update cluster registry from quorum resource
    • e.g. if we were down when it was in use
• Form new one-node cluster
• Bring other cluster resources online
• Let others join your cluster

Leaving A Cluster (Gracefully)
• Pause:
    • Move all groups off this member.
    • Change to paused state (remains a cluster member)
• Offline:
    • Move all groups off this member.
    • Sends ClusterExit message to all cluster members
        • Prevents regroup
        • Prevents stalls during departure transitions
    • Close Cluster connections (now not an active cluster member)
    • Cluster service stops on node
• Evict: remove node from defined member list

Leaving a Cluster (Node Failure)
• Node (or communication) failure triggers Regroup
• If after regroup:
    • Minority group OR no quorum device:
        • group does NOT survive
    • Non-minority group AND quorum device:
        • group DOES survive
• Non-Minority rule:
    • Number of new members >= 1/2 old active cluster
    • Prevents minority from seizing quorum device at the expense of a larger potentially surviving cluster
• Quorum guarantees correctness
    • Prevents “split-brain”
        • e.g. with newly forming cluster containing a single node
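
The survival rule after a regroup reduces to a two-part predicate. A minimal sketch, with a function name and parameters chosen for illustration:

```python
def group_survives(new_members: int, old_active_members: int,
                   owns_quorum: bool) -> bool:
    """Apply the non-minority rule together with quorum ownership.

    A group survives only if it holds at least half of the old active
    cluster AND it owns the quorum device.
    """
    non_minority = 2 * new_members >= old_active_members
    return non_minority and owns_quorum


# Two-node cluster split in half: only the side holding the quorum
# device survives, which is what prevents split-brain.
print(group_survives(1, 2, owns_quorum=True))   # True
print(group_survives(1, 2, owns_quorum=False))  # False
```

In a two-node cluster each half is a non-minority, so the quorum device is the tiebreaker. In a three-node cluster a lone node is a minority and does not survive even if it could reach the quorum device.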

Global Update
• Propagates updates to all nodes in cluster
• Used to maintain replicated cluster registry
• Updates are atomic and totally ordered
• Tolerates all benign failures.
• Depends on membership
    • all are up
    • all can communicate
• R. Carr, Tandem Systems Review, V1.2, 1985, sketches regroup and global update protocol.
[Layer diagram: Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, Windows NT Server]

Global Update Algorithm
• Cluster has locker node that regulates updates.
    • Oldest active node in cluster
• Send Update to locker node
• Update other (active) nodes
    • in seniority order (e.g. locker first)
    • this includes the updating node
• Failure of all updated nodes:
    • Update never happened
    • Updated nodes will roll back on recovery
• Survival of any updated nodes:
    • New locker is oldest and so has update if any do.
    • New locker restarts update
[Diagram labels: sender S, locker L, message "X=100!", ack]
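
The recovery argument (the oldest survivor has the update if anyone does) can be illustrated with a toy simulation. This is a sketch under assumptions, not the MSCS protocol: nodes are plain in-memory objects listed oldest first, a crash is modelled by stopping the update loop early, and all names are hypothetical.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.registry = {}     # this node's copy of the replicated registry
        self.alive = True


def apply_update(nodes, key, value, stop_after=None):
    """Update active nodes in seniority order (locker first).

    stop_after simulates the sender crashing after that many nodes
    were updated. Returns the names of the nodes that got the update.
    """
    updated = []
    for node in nodes:
        if not node.alive:
            continue
        if stop_after is not None and len(updated) >= stop_after:
            break
        node.registry[key] = value
        updated.append(node.name)
    return updated


def recover(nodes, key):
    """The new locker (oldest survivor) restarts the update if it saw it."""
    survivors = [n for n in nodes if n.alive]
    if not survivors:
        return []
    locker = survivors[0]
    if key not in locker.registry:
        return []              # no survivor has it: the update never happened
    return apply_update(survivors, key, locker.registry[key])


nodes = [Node("a"), Node("b"), Node("c")]          # "a" is oldest, so locker
print(apply_update(nodes, "X", 100, stop_after=1))  # only the locker updated
print(recover(nodes, "X"))                          # locker re-drives update
```

Because updates are applied oldest first, any surviving node that holds the update implies that every older surviving node holds it too, so checking the new locker alone is enough.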

Cluster Registry
• Separate from local NT Registry
• Maintains cluster configuration
    • members, resources, restart parameters, etc.
• Stable storage
• Replicated at each member
    • Global Update protocol
    • NT Registry keeps local copy
[Layer diagram: Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, Windows NT Server]

Cluster Registry Bootstrapping
Membership uses Cluster Registry for list of nodes
  • …Circular dependency
Solution:
  • Membership uses stale local cluster registry
  • Refresh after joining or forming cluster
  • Master is either
    • quorum device, or
    • active members
[Figure: cluster component stack on Windows NT Server: Membership, Regroup, Global Update, Cluster Registry, Failover Manager, Resource Monitor, ClusterDisk Driver, ClusterNet Drivers]

Resource Monitor
Polls resources:
  • IsAlive and LooksAlive
Detects failures
  • polling failure
  • failure event from resource
Higher levels tell it
  • Online, Offline
  • Restart
[Figure: cluster component stack on Windows NT Server: Membership, Regroup, Global Update, Cluster Registry, Failover Manager, Resource Monitor, ClusterDisk Driver, ClusterNet Drivers]
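A polling loop that mixes the two checks might look like the sketch below. The names (`FakeResource`, `first_failed_tick`) and the ratio of one thorough check per three polls are assumptions for illustration; the slides do not give polling intervals.

```python
class FakeResource:
    """Stand-in for a resource DLL; both flags are illustrative."""

    def __init__(self, process_running, responding):
        self.process_running = process_running
        self.responding = responding

    def looks_alive(self):   # quick check
        return self.process_running

    def is_alive(self):      # thorough check
        return self.process_running and self.responding


def first_failed_tick(resource, ticks, is_alive_every=3):
    """Poll once per tick; return the tick that detected a failure, or None."""
    for tick in range(1, ticks + 1):
        thorough = tick % is_alive_every == 0
        healthy = resource.is_alive() if thorough else resource.looks_alive()
        if not healthy:
            return tick
    return None
```

A hung process passes the quick check and is caught only when the next thorough check runs, which is the trade-off between polling cost and detection latency.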

Failover Manager
Assigns groups to nodes based on
  • Failover parameters
  • Possible nodes for each resource in group
  • Preferred nodes for resource group
[Figure: cluster component stack on Windows NT Server: Membership, Regroup, Global Update, Cluster Registry, Failover Manager, Resource Monitor, ClusterDisk Driver, ClusterNet Drivers]

Failover (Resource Goes Offline)
[Flowchart; box labels:]
  • Resource Manager detects resource error.
  • Attempt to restart resource.
  • Has the Resource Retry limit been exceeded? (Yes / No)
  • Switch resource (and Dependants) Offline.
  • Notify Failover Manager.
  • Failover Manager checks: Failover Window and Failover Threshold
  • Are Failover conditions within Constraints? (Yes / No)
  • Can another owner be found? (Arbitration) (Yes / No)
  • Notify Failover Manager on the new system to bring resource Online.
  • Leave Group in partially Online state.
  • Wait for Failback Window
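One plausible reading of the flowchart, written as a decision function. The branch targets are assumptions, since the arrows did not survive extraction: a failed restart is retried until the retry limit is hit, and either a "No" on the failover constraints or a failed arbitration leaves the group partially online. All names are hypothetical.

```python
def on_resource_error(try_restart, retry_limit, recent_failovers,
                      failover_threshold, other_owners):
    """Return what happens to a failed resource under the assumed branches."""
    for _ in range(retry_limit):
        if try_restart():
            return "restarted"
    # Retry limit exceeded: the resource and its dependents go Offline
    # and the Failover Manager is notified.
    if recent_failovers >= failover_threshold:
        # Too many failovers inside the Failover Window (assumed meaning).
        return "partially online"
    if not other_owners:
        # Arbitration found no other node able to own the group.
        return "partially online"
    return "failed over to " + other_owners[0]
```

Here `recent_failovers` stands for the number of failovers already counted inside the current Failover Window, which is an interpretation of how the window and threshold interact.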

Pushing a Group (Resource Failure)
[Flowchart; box labels in their apparent order:]
  • Resource Monitor notifies Resource Manager of resource failure.
  • Resource Manager enumerates all objects in the Dependency Tree of the failed resource.
  • Resource Manager takes each depending resource Offline.
  • Any resource has “Affect the Group” True? (Yes / No)
    • No: Leave Group in partially Online state.
  • Resource Manager notifies Failover Manager that the Dependency Tree is Offline and needs to fail over.
  • Failover Manager performs Arbitration to locate a new owner for the group.
  • Failover Manager on the new owner node brings the resources Online.
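Enumerating the dependency tree and choosing an offline order can be sketched as a depth-first walk. The function names, the dictionary layout, and the example resources are hypothetical; the only property relied on is that a resource goes offline after everything that depends on it.

```python
def offline_order(failed, depends_on):
    """List the failed resource and its transitive dependents, dependents first.

    `depends_on` maps each resource to the resources it depends on.
    """
    dependents = {}
    for resource, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, []).append(resource)

    order, seen = [], set()

    def visit(resource):
        if resource in seen:
            return
        seen.add(resource)
        for dependent in dependents.get(resource, []):
            visit(dependent)
        order.append(resource)  # appended only after all of its dependents

    visit(failed)
    return order


def group_must_fail_over(tree, affects_group):
    """True if any resource in the tree is marked “Affect the Group”."""
    return any(affects_group.get(resource, False) for resource in tree)
```

Reversing the returned list gives an order in which the same resources can be brought back online on the new owner.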

Pulling a Group (Node Failure)
[Flowchart; box labels in their apparent order:]
  • Cluster Service notifies Failover Manager of node failure.
  • Failover Manager determines which groups were owned by the failed node.
  • Resource Manager notifies Failover Manager that the node is Offline and the groups it owned need to fail over.
  • Failover Manager performs Arbitration to locate a new owner for the groups.
  • Failover Manager on the new owner(s) brings the resources Online in dependency order.

Failback to Preferred Owner Node
Group may have a Preferred Owner
Preferred Owner comes back online
Will only occur during the Failback Window (time slot, e.g. at night)
[Flowchart; box labels in their apparent order:]
  • Preferred owner comes back Online.
  • Is the time within the Failback Window?
  • Resource Manager takes each resource on the current owner Offline.
  • Resource Manager notifies Failover Manager that the Group is Offline and needs to fail over to the Preferred Owner.
  • Failover Manager performs Arbitration to locate the Preferred Owner of the group.
  • Failover Manager on the Preferred Owner brings the resources Online.
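The Failback Window test is a time-of-day range check that has to cope with windows spanning midnight. A minimal sketch, assuming the window is given as whole hours with an inclusive start and an exclusive end (the function and parameter names are hypothetical):

```python
def in_failback_window(hour, start_hour, end_hour):
    """Return True if `hour` (0-23) falls inside the failback window."""
    if start_hour <= end_hour:
        return start_hour <= hour < end_hour
    # The window wraps past midnight, e.g. 22:00 to 04:00.
    return hour >= start_hour or hour < end_hour
```

For a night-time window from 22 to 4, hours 23 and 3 are inside and noon is outside.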

Outline
Why FT and Why Clusters
Cluster Abstractions
Cluster Architecture
Cluster Implementation
Application Support
Q&A

Process Structure
Cluster Service
  • Failover Manager
  • Cluster Registry
  • Global Update
  • Quorum
  • Membership
Resource Monitor
  • Resource Monitor
  • Resource DLLs
Resources
  • Services
  • Applications
[Figure: a node running the Cluster Service and Resource Monitors; each Resource Monitor hosts a DLL that reaches its Resource through private calls]

Resource Control
Commands
  • CreateResource()
  • OnlineResource()
  • OfflineResource()
  • TerminateResource()
  • CloseResource()
  • ShutdownProcess()
And resource events
[Figure: on a node, the Cluster Service reaches each Resource Monitor through private calls; the Resource Monitor hosts a DLL that reaches its Resource through private calls]

Resource DLLs
Calls to Resource DLL
  • Open: get handle
  • Online: start offering service
  • Offline: stop offering service
    • as a standby or
    • pair-is offline
  • LooksAlive: Quick check
  • IsAlive: Thorough check
  • Terminate: Forceful Offline
  • Close: release handle
[Figure: resource state diagram with states Offline, Online Pending, Online, Offline Pending, and Failed; messages “Go Online!”, “I’m Online!”, “Go Off-line!”, “I’m Off-line!”, and “I’m here!”; the Resource Monitor makes standard calls to the DLL, which makes private calls to the Resource]
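The resource states named in the diagram can be modeled as a small transition table. The arrows are not recoverable from the extracted text, so the transitions below are an assumed reading of the “Go Online!” / “I’m Online!” / “Go Off-line!” / “I’m Off-line!” messages, and the event names are hypothetical.

```python
TRANSITIONS = {
    ("Offline", "go_online"): "OnlinePending",
    ("OnlinePending", "im_online"): "Online",
    ("Online", "go_offline"): "OfflinePending",
    ("OfflinePending", "im_offline"): "Offline",
}


def next_state(state, event):
    """Return the state that follows `event`; a failure is accepted anywhere."""
    if event == "failure":
        return "Failed"
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state!r}") from None
```

The two pending states let a slow resource report progress without blocking the caller: the command returns at once and the resource announces completion later.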

Cluster Communications
Most communication via DCOM / RPC
UDP used for membership heartbeat messages
Standard (e.g. Ethernet) interconnects
[Figure: management apps reach the Cluster Service via DCOM; the Cluster Services on two nodes exchange admin traffic via DCOM / RPC and heartbeats via UDP; each Cluster Service talks to its Resource Monitors via DCOM / RPC]

Outline
Why FT and Why Clusters
Cluster Abstractions
Cluster Architecture
Cluster Implementation
Application Support
Q&A

Application Support
Virtual Servers
Generic Resource DLLs
Resource DLL VC++ Wizard
Cluster API

Virtual Servers
Problem:
  • Client and Server Applications do not want node name to change when server app moves to another node.
A Virtual Server simulates an NT Node
  • Resource Group (name, disks, databases,…)
  • NetName and IP address (node: \\a keeps name and IP address as it moves)
  • Virtual Registry (registry “moves” (is replicated))
  • Virtual Service Control
  • Virtual RPC service
Challenges:
  • Limit app to virtual server’s devices and services.
  • Client reconnect on failover (easy if connectionless, e.g. web clients)
[Figure: Virtual Server \\a: 1.2.3.4 (shown twice)]

Virtual Servers (before failover)
Nodes \\Y and \\Z support virtual servers \\A and \\B
Things that need to fail over transparently
  • Client connection
  • Server dependencies
  • Service names
  • Binding to local resources
  • Binding to local servers
[Figure: nodes \\Y and \\Z hosting virtual servers \\A (“SAP on A”) and \\B (“SAP on B”), each with SAP and SQL; disks S:\ and T:\]

Virtual Servers (just after failover)
\\Y resources and groups (i.e. Virtual Server \\A) moved to \\Z
A resources bind to each other and to local resources (e.g., local file system)
  • Registry
  • Physical resource
  • Security domain
  • Time
Transactions used to make DB state consistent.
To “work”, local resources on \\Y and \\Z have to be similar
  • E.g. time must remain monotonic after failover
[Figure: nodes \\Y and \\Z and virtual servers \\A (“SAP on A”) and \\B (“SAP on B”), each with SAP and SQL; disks S:\ and T:\]

Address Failover and Client Reconnection
Name and Address rebind to new node
  • Details later
Clients reconnect
  • Failure not transparent
  • Must log on again
  • Client context lost (encourages connectionless)
  • Applications could maintain context
[Figure: nodes \\Y and \\Z and virtual servers \\A (“SAP on A”) and \\B (“SAP on B”), each with SAP and SQL; disks S:\ and T:\]

Mapping Local References to Group-Relative References
Send client requests to correct server
  • \\A\SAP refers to \\.\SQL
  • \\B\SAP refers to \\.\SQL
Must remap references:
  • \\A\SAP to \\.\SQL$A
  • \\B\SAP to \\.\SQL$B
Also handles namespace collision
Done via
  • modifying server apps, or
  • DLLs to transparently rename
[Figure: nodes \\Y and \\Z and virtual servers \\A (“SAP on A”) and \\B (“SAP on B”), each with SAP and SQL; disks S:\ and T:\]
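The remapping step amounts to qualifying each node-local name with the virtual server that owns it. A minimal sketch of that rewrite, where the function name and the choice to pass the virtual server as a bare letter are assumptions:

```python
LOCAL_PREFIX = "\\\\.\\"  # the four characters \\.\ that mark a node-local name


def remap(local_reference, virtual_server):
    """Turn \\.\\SQL into \\.\\SQL$A for virtual server A; leave other names alone."""
    if not local_reference.startswith(LOCAL_PREFIX):
        return local_reference
    return f"{local_reference}${virtual_server}"
```

After the rewrite, the two SQL instances have distinct names even when both virtual servers end up on the same node, which is the namespace collision the slide mentions.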

Naming and Binding and Failover
Services rely on the NT node name and/or IP address to advertise Shares, Printers, and Services.
  • Applications register names to advertise services
  • Example: \\Alice\SQL (i.e. <node><service>)
  • Example: 128.2.2.2:80 (=http://www.foo.com/)
Binding
  • Clients bind to an address (e.g. name->IP address)
Thus the node name and IP address must fail over along with the services (preserve client bindings)

Client to Cluster Communications
IP address mobility based on MAC rebinding
Cluster Clients
  • Must use IP (TCP, UDP, NBT,... )
  • Must Reconnect or Retry after failure
Cluster Servers
  • All cluster nodes must be on same LAN segment
IP rebinds to failover MAC addr
  • Transparent to client or server
  • Low-level ARP (address resolution protocol) rebinds IP addr to new MAC addr.
[Figure: a client reaches the cluster’s local network across a WAN through a router.
  Cluster table: Alice <-> 200.110.120.4, Virtual Alice <-> 200.110.120.5, Betty <-> 200.110.120.6, Virtual Betty <-> 200.110.120.7
  Client table: Alice <-> 200.110.12.4, Virtual Alice <-> 200.110.12.5, Betty <-> 200.110.12.6, Virtual Betty <-> 200.110.12.7
  Router table: 200.110.120.4 -> AliceMAC, 200.110.120.5 -> AliceMAC, 200.110.120.6 -> BettyMAC, 200.110.120.7 -> BettyMAC]
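The effect of the ARP rebinding on the router's table can be shown with a plain dictionary. This models only the table contents from the figure, not the ARP packets themselves; the function name `rebind` is hypothetical.

```python
def rebind(arp_table, ip_address, new_mac):
    """Return a copy of the table with one IP address bound to a new MAC."""
    updated = dict(arp_table)
    updated[ip_address] = new_mac
    return updated


router = {
    "200.110.120.4": "AliceMAC",
    "200.110.120.5": "AliceMAC",  # Virtual Alice
    "200.110.120.6": "BettyMAC",
    "200.110.120.7": "BettyMAC",  # Virtual Betty
}

# Virtual Alice fails over to the node Betty; only its entry changes.
after_failover = rebind(router, "200.110.120.5", "BettyMAC")
```

The physical node addresses keep their bindings, so clients that address the virtual server by IP need only reconnect or retry.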

Time
Time must increase monotonically
  • Otherwise applications get confused
  • e.g. make/nmake/build
Time is maintained within failover resolution
  • Not hard, since failover on order of seconds
Time is a resource, so one node owns time resource
  • Other nodes periodically correct drift from owner’s time
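One simple way to keep time monotonic while correcting drift is to refuse to step backward. The class below is a sketch of that idea only; the slides do not say how Cluster Server enforces monotonicity, and `ClusterClock` is a hypothetical name.

```python
class ClusterClock:
    """Follow the time owner's clock but never report an earlier time."""

    def __init__(self):
        self.last_reported = 0.0

    def correct(self, owner_time):
        # A correction that would move time backward is held at the last
        # reported value until the owner's clock catches up.
        self.last_reported = max(self.last_reported, owner_time)
        return self.last_reported
```

A build tool comparing file timestamps therefore never sees a file written later carry an earlier time, even if the new time owner's clock is slightly behind the old one.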

Application Local NT Registry Checkpointing
Resources can request that local NT registry sub-trees be replicated
Changes written out to quorum device
  • Uses registry change notification interface
Changes read and applied on fail-over
[Figure: the registry of \\A on \\X is written to the Quorum Device on each update; after failover it is applied to the registry of \\A on \\B]
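The checkpoint-and-restore cycle can be sketched with dictionaries standing in for the registry sub-tree and the quorum device. Everything here is illustrative: `checkpoint` and `restore` are hypothetical names, and a real implementation would be driven by the registry change notification interface rather than explicit calls.

```python
import copy


def checkpoint(quorum_store, resource, subtree):
    """Write the resource's registry sub-tree to the quorum device on a change."""
    quorum_store[resource] = copy.deepcopy(subtree)


def restore(quorum_store, resource, local_registry):
    """On fail-over, apply the saved sub-tree to the new node's local registry."""
    if resource in quorum_store:
        local_registry[resource] = copy.deepcopy(quorum_store[resource])
```

Because every change is written out as it happens, the new owner starts from the last value the application stored, not from whatever stale copy its own registry held.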
Registry Replication

Application Support
Virtual Servers
Generic Resource DLLs
Resource DLL VC++ Wizard
Cluster API

Generic Resource DLLs
Generic Application DLL
  • Simplest: just starts, stops application, and makes sure process is alive
Generic Service DLL
  • Translates DLL calls into equivalent NT Server calls
    • Online => Service Start
    • Offline => Service Stop
    • Looks/IsAlive => Service Status
[Figure: the Resource Monitor makes standard calls to the DLL, which makes private calls to the Resource]
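The behavior of the generic application DLL (start, stop, and check that the process is alive) can be imitated with the Python standard library. This is an analogy, not the MSCS DLL: the class and method names are hypothetical, and treating the thorough check as identical to the quick one is an assumption.

```python
import subprocess
import sys


class GenericApplication:
    """Start, stop, and liveness-check one process."""

    def __init__(self, command):
        self.command = command
        self.process = None

    def online(self):
        self.process = subprocess.Popen(self.command)

    def looks_alive(self):
        # poll() returns None while the child process is still running.
        return self.process is not None and self.process.poll() is None

    # Assumed: a generic wrapper has no deeper check than "process exists".
    is_alive = looks_alive

    def offline(self):
        if self.looks_alive():
            self.process.terminate()
            self.process.wait(timeout=5)
```

For example, `GenericApplication([sys.executable, "-c", "import time; time.sleep(30)"])` wraps a child interpreter that just sleeps; `online()` starts it and `offline()` stops it. An application-specific DLL would replace `is_alive` with a real request to the application.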
Generic Application
Generic Service

Application Support
Virtual Servers
Generic Resource DLLs
Resource DLL VC++ Wizard
Cluster API

Resource DLL VC++ Wizard
Asks for resource type name
Asks for optional service to control
Asks for other parameters (and associated types)
Generates DLL source code
Source can be modified as necessary
  • E.g. additional checks for Looks/IsAlive

Creating a New Workspace

Specifying Resource Type Name

Specifying Resource Parameters

Automatic Code Generation

Customizing The Code

Application Support
Virtual Servers
Generic Resource DLLs
Resource DLL VC++ Wizard
Cluster API

Cluster API
Allows resources to:
  • Examine dependencies
  • Manage per-resource data
  • Change parameters (e.g. failover)
  • Listen for cluster events
  • etc.
Specs & API became public Sept 1996
  • On all MSDN Level 3
  • On web site: http://www.microsoft.com/clustering.htm

Cluster API Documentation

Outline
Why FT and Why Clusters
Cluster Abstractions
Cluster Architecture
Cluster Implementation
Application Support
Q&A

Research Topics?
Even easier to manage
Transparent failover
Instant failover
Geographic distribution (disaster tolerance)
Server pools (load-balanced pool of processes)
Process pair (active/backup process)
10,000 nodes?
Better algorithms
Shared memory or shared disk among nodes
  • a truly bad idea?

References
Microsoft NT site: http://www.microsoft.com/ntserver/
BARC site: http://research.microsoft.com/BARC
These slides: http://research.microsoft.com/~joebar/ftcs-27/ftcs20.ppt
Inside Windows NT, H. Custer, Microsoft Pr, ISBN: 155615481
Tandem Global Update Protocol, R. Carr, Tandem Systems Review. V1.2 1985, sketches regroup and global update protocol.
VAXclusters: a Closely Coupled Distributed System, Kronenberg, N., Levey, H., Strecker, W., ACM TOCS, V 4.2 1986. A (the) shared disk cluster.
In Search of Clusters : The Coming Battle in Lowly Parallel Computing, Gregory F. Pfister, Prentice Hall, 1995, ISBN: 0134376250. Argues for shared nothing
Transaction Processing Concepts and Techniques, Gray, J., Reuter A., Morgan Kaufmann, 1994. ISBN 1558601902, survey of outages, transaction techniques.