
Page 1: FT NT: A Tutorial on Microsoft Cluster Server™ (formerly “Wolfpack”)

©1996, 1997 Microsoft Corp.

Joe Barrera
Jim Gray
Microsoft Research
{joebar, gray}@microsoft.com
http://research.microsoft.com/barc

Page 2: Outline

- Why FT and Why Clusters
- Cluster Abstractions
- Cluster Architecture
- Cluster Implementation
- Application Support
- Q&A

Page 3: DEPENDABILITY: The 3 ITIES

- RELIABILITY / INTEGRITY: Does the right thing. (also large MTTF)
- AVAILABILITY: Does it now. (also small MTTR)

  Availability = MTTF / (MTTF + MTTR)

- System Availability: if 90% of terminals are up and 99% of the DB is up, then about 89% of transactions are serviced on time.
- Holistic vs. Reductionist view

[Figure labels: Security, Integrity, Reliability, Availability]
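The 90% / 99% example can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the terminal and the database fail independently and that a transaction needs both; the function names are illustrative.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def series_availability(*parts: float) -> float:
    """Availability of a system that needs every part up at once,
    assuming the parts fail independently."""
    result = 1.0
    for a in parts:
        result *= a
    return result


# The slide's example: 90% of terminals up and 99% of the DB up.
system = series_availability(0.90, 0.99)
print(f"{system:.3f}")  # 0.891, i.e. roughly 89%
```

The product is lower than either factor, which is the "holistic" point: the user sees the availability of the whole path, not of the best component.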

Page 4: Case Study - Japan

"Survey on Computer Security", Japan Info Dev Corp., March 1986. (trans: Eiichi Watanabe).

MTTF by cause of outage:

Vendor (hardware and software)    5 Months
Application software              9 Months
Communications lines              1.5 Years
Operations                        2 Years
Environment                       2 Years
Overall                           10 Weeks

1,383 institutions reported (6/84 - 7/85)
7,517 outages, MTTF ~ 10 weeks, avg duration ~ 90 MINUTES

To Get 10 Year MTTF, Must Attack All These Areas

[Pie chart: share of outages by cause. Labels: Vendor, Environment, Operations, Application Software, Tele Comm lines. Slice values: 42%, 12%, 25%, 9.3%, 11.2%]
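The overall figure of about 10 weeks follows from the per-cause MTTFs if the causes are treated as independent, so that their failure rates add. The sketch below only assumes that model and a 52-week year; the variable names are illustrative.

```python
# MTTF per outage cause, in months, as listed on the slide.
MTTF_MONTHS = {
    "vendor": 5.0,
    "application software": 9.0,
    "communications lines": 18.0,  # 1.5 years
    "operations": 24.0,            # 2 years
    "environment": 24.0,           # 2 years
}

WEEKS_PER_MONTH = 52.0 / 12.0


def combined_mttf(mttfs):
    """Failure rates (1/MTTF) of independent causes add up."""
    total_rate = sum(1.0 / m for m in mttfs)
    return 1.0 / total_rate


overall_months = combined_mttf(MTTF_MONTHS.values())
overall_weeks = overall_months * WEEKS_PER_MONTH
print(f"overall MTTF ~ {overall_weeks:.1f} weeks")
```

Because the rates add, removing any single cause still leaves the others dominating, which is why the slide says all areas must be attacked to reach a 10-year MTTF.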

Page 5: Case Studies - Tandem Trends

- MTTF improved
- Shift from Hardware & Maintenance (down from 50% to 10%) to Software (62%) & Operations (15%)
- NOTE: Systematic under-reporting of
  - Environment
  - Operations errors
  - Application Software

[Charts for 1985, 1987, 1989: "Outages / 1000 System Years by Primary Cause" and "% of Outages by Primary Cause". Categories: unknown, environment, operations, maintenance, hardware, software]

Page 6: Summary of FT Studies

- Current Situation: ~4-year MTTF => Fault Tolerance Works.
- Hardware is GREAT (maintenance and MTTF).
- Software masks most hardware faults.
- Many hidden software outages in operations:
  - New Software.
  - Utilities.
- Must make all software ONLINE.
- Software seems to define a 30-year MTTF ceiling.
- Reasonable Goal: 100-year MTTF. Class 4 today => class 6 tomorrow.

Page 7: Fault Tolerance vs Disaster Tolerance

- Fault-Tolerance: masks local faults
  - RAID disks
  - Uninterruptible Power Supplies
  - Cluster Failover
- Disaster Tolerance: masks site failures
  - Protects against fire, flood, sabotage, ...
  - Redundant system and service at remote site.

Page 8: The Microsoft “Vision”: Plug & Play Dependability

- Transactions for reliability
- Clusters: for availability
- Security
- All built into the OS

[Figure labels: Integrity / Security, Integrity, Reliability, Availability]

Page 9: Cluster Goals

- Manageability
  - Manage nodes as a single system
  - Perform server maintenance without affecting users
  - Mask faults, so repair is non-disruptive
- Availability
  - Restart failed applications & servers
    • un-availability ~ MTTR / MTBF, so quick repair.
  - Detect/warn administrators of failures
- Scalability
  - Add nodes for incremental
    • processing
    • storage
    • bandwidth
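The remark "un-availability ~ MTTR / MTBF, so quick repair" can be made concrete with a small calculation. The MTBF and repair times below are made-up figures chosen only to show the effect; MTBF is treated here as mean up-time between failures.

```python
def unavailability(mtbf_hours, mttr_hours):
    """Exact fraction of time down, treating MTBF as mean up-time."""
    return mttr_hours / (mtbf_hours + mttr_hours)


def unavailability_approx(mtbf_hours, mttr_hours):
    """The slide's approximation, good when MTTR << MTBF."""
    return mttr_hours / mtbf_hours


MTBF = 1000.0  # hours between failures (made-up figure)
slow = unavailability_approx(MTBF, 2.0)       # two-hour manual repair
fast = unavailability_approx(MTBF, 2.0 / 60)  # two-minute automatic restart
print(f"manual repair: {slow:.6f}, automatic restart: {fast:.6f}")
```

With the failure rate unchanged, shortening repair from hours to minutes cuts unavailability by the same factor, which is the case for automatic restart and failover.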

Page 10: Fault Model

- Failures are independent
  - So, single fault tolerance is a big win
- Hardware fails fast (blue-screen)
- Software fails fast (or goes to sleep)
- Software often repaired by reboot:
  - Heisenbugs
- Operations tasks: major source of outage
  - Utility operations
  - Software upgrades

Page 11: Cluster: Servers Combined to Improve Availability & Scalability

- Cluster: A group of independent systems working together as a single system. Clients see scalable & FT services (single system image).
- Node: A server in a cluster. May be an SMP server.
- Interconnect: Communications link used for intra-cluster status info such as “heartbeats”. Can be Ethernet.

[Figure labels: Client PCs, Printers, Server A, Server B, Disk array A, Disk array B, Interconnect]

Page 12: Microsoft Cluster Server™

- 2-node availability Summer 97 (20,000 Beta Testers now)
  - Commoditize fault-tolerance (high availability)
  - Commodity hardware (no special hardware)
  - Easy to set up and manage
  - Lots of applications work out of the box.
- 16-node scalability later (next year?)

Page 13: Failover Example

[Figure labels: Browser, Server 1, Server 2, Web site, Database, Web site files, Database files; the Web site, Database, Server 1 and Server 2 labels appear twice]

Page 14: MS Press Failover Demo

- Client/Server
- Software failure
- Admin shutdown
- Server failure

Resource States
- Pending
- Partial
- Failed
- Offline

Page 15: Demo Configuration

Windows NT Server Cluster

- Server “Alice”
  - SMP Pentium® Pro Processors
  - Windows NT Server with Wolfpack
  - Microsoft Internet Information Server
  - Microsoft SQL Server
  - Local Disks
- Server “Betty”
  - SMP Pentium® Pro Processors
  - Windows NT Server with Wolfpack
  - Microsoft Internet Information Server
  - Microsoft SQL Server
  - Local Disks
- SCSI Disk Cabinet
  - Shared Disks
- Interconnect
  - standard Ethernet
- Client
  - Windows NT Workstation
  - Internet Explorer
  - MS Press OLTP app
- Administrator
  - Windows NT Workstation
  - Cluster Admin
  - SQL Enterprise Mgr

Page 16: Demo Administration

Windows NT Server Cluster

- Server “Alice”
  - Runs SQL Trace
  - Runs Globe
- Server “Betty”
  - Runs SQL Trace
- Client
- Cluster Admin Console
  - Windows GUI
  - Shows cluster resource status
  - Replicates status to all servers
  - Define apps & related resources
  - Define resource dependencies
  - Orchestrates recovery order
- SQL Enterprise Mgr
  - Windows GUI
  - Shows server status
  - Manages many servers
  - Start, stop, manage DBs

[Figure labels: SCSI Disk Cabinet, Shared Disks, Local Disks]

Page 17: Generic Stateless Application: Rotating Globe

- Mplay32 is generic app.
- Registered with MSCS
- MSCS restarts it on failure
- Move/restart ~ 2 seconds
- Fail-over if 4 failures (= process exits) in 3 minutes
  - settable default
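The "4 failures in 3 minutes" rule amounts to a sliding-window counter. The sketch below is one way to express it, not MSCS code: the class and method names are hypothetical, and it assumes the failover is triggered by the fourth failure itself rather than after four restarts.

```python
from collections import deque


class RestartPolicy:
    """Decide between local restart and failover.

    Defaults follow the slide: fail over on the 4th failure within 3 minutes.
    The class and method names are illustrative, not MSCS APIs.
    """

    def __init__(self, threshold=4, period_seconds=180.0):
        self.threshold = threshold
        self.period = period_seconds
        self._failures = deque()  # timestamps of recent failures

    def on_failure(self, now):
        """Record a failure at time `now` (seconds); return the action."""
        self._failures.append(now)
        # Drop failures that fell out of the sliding window.
        while self._failures and now - self._failures[0] > self.period:
            self._failures.popleft()
        if len(self._failures) >= self.threshold:
            self._failures.clear()
            return "failover"
        return "restart"


policy = RestartPolicy()
actions = [policy.on_failure(t) for t in (0, 30, 60, 90)]
print(actions)  # ['restart', 'restart', 'restart', 'failover']
```

Failures spaced further apart than the window never accumulate, so an application that crashes occasionally keeps being restarted locally.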

Page 18: Demo Moving or Failing Over An Application

Alice Fails or Operator Requests move

[Figure labels: Windows NT Server Cluster, SCSI Disk Cabinet, Shared Disks, Local Disks, AVI Application (shown twice, each with an X mark)]

Page 19: Generic Stateful Application: NotePad

- Notepad saves state on shared disk
- Failure before save => lost changes
- Failover or move (disk & state move)

Page 20: Demo Step 1: Alice Delivering Service

[Figure: Windows NT Server Cluster with SCSI Disk Cabinet, Shared Disks, Local Disks. Component labels: IIS, SQL, ODBC, HTTP, IP. Status labels: "No SQL Activity", "SQL Activity"]

Page 21: Demo Step 2: Request Move to Betty

[Figure: Windows NT Server Cluster with SCSI Disk Cabinet, Shared Disks, Local Disks. Component labels: IIS, SQL, ODBC, HTTP, IP. Status labels: "No SQL Activity", "SQL Activity"]

Page 22: Demo Step 3: Betty Delivering Service

[Figure: Windows NT Server Cluster with SCSI Disk Cabinet, Shared Disks, Local Disks. Component labels: IIS, SQL, ODBC, IP. Status labels: "No SQL Activity", "SQL Activity"]

Page 23: Demo Step 4: Power Fail Betty, Alice Takeover

[Figure: Windows NT Server Cluster with SCSI Disk Cabinet, Shared Disks, Local Disks. Component labels: IIS, SQL, ODBC, IP. Status labels: "No SQL Activity", "SQL Activity"]

Page 24: Demo Step 5: Alice Delivering Service

[Figure: Windows NT Server Cluster with SCSI Disk Cabinet, Shared Disks, Local Disks. Component labels: IIS, SQL, ODBC, HTTP, IP. Status labels: "No SQL Activity", "SQL Activity"]

Page 25: Demo Step 6: Reboot Betty, now can takeover

[Figure: Windows NT Server Cluster with SCSI Disk Cabinet, Shared Disks, Local Disks. Component labels: IIS, SQL, ODBC, HTTP, IP. Status labels: "No SQL Activity", "SQL Activity"]

Page 26: Outline

- Why FT and Why Clusters
- Cluster Abstractions
- Cluster Architecture
- Cluster Implementation
- Application Support
- Q&A

Page 27: Cluster and NT Abstractions

Cluster Abstractions: Cluster, Group, Resource
NT Abstractions: Domain, Node, Service

Page 28: Basic NT Abstractions

- Service: program or device managed by a node
  - e.g., file service, print service, database server
  - can depend on other services (startup ordering)
  - can be started, stopped, paused, failed
- Node: a single (tightly-coupled) NT system
  - hosts services; belongs to a domain
  - services on node always remain co-located
  - unit of service co-location; involved in naming services
- Domain: a collection of nodes
  - cooperation for authentication, administration, naming

[Figure labels: Domain, Node, Service]

Page 29: Cluster Abstractions

- Resource: program or device managed by a cluster
  - e.g., file service, print service, database server
  - can depend on other resources (startup ordering)
  - can be online, offline, paused, failed
- Resource Group: a collection of related resources
  - hosts resources; belongs to a cluster
  - unit of co-location; involved in naming resources
- Cluster: a collection of nodes, resources, and groups
  - cooperation for authentication, administration, naming

[Figure labels: Cluster, Resource Group, Resource]

Page 30: Resources

Resources have...
- Type: what it does (file, DB, print, web…)
- An operational state (online/offline/failed)
- Current and possible nodes
- Containing Resource Group
- Dependencies on other resources
- Restart parameters (in case of resource failure)

[Figure labels: Cluster, Group, Resource]
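The attributes a resource carries map naturally onto a small record type. The following is a sketch of such a record, not the MSCS data structure: field names, the default restart limit, and the example resources are all made up for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ResourceState(Enum):
    ONLINE = "online"
    OFFLINE = "offline"
    FAILED = "failed"


@dataclass
class Resource:
    name: str
    type: str                                   # what it does: file, DB, print, web...
    group: str                                  # containing resource group
    state: ResourceState = ResourceState.OFFLINE
    current_node: Optional[str] = None          # where it runs now, if anywhere
    possible_nodes: List[str] = field(default_factory=list)
    depends_on: List[str] = field(default_factory=list)
    restart_limit: int = 3                      # hypothetical restart parameter


disk = Resource("Disk E:", "Physical Disk", group="Payroll",
                possible_nodes=["Alice", "Betty"])
share = Resource("PayrollShare", "File Share", group="Payroll",
                 possible_nodes=["Alice", "Betty"], depends_on=["Disk E:"])
print(share.state.value, share.depends_on)
```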

Page 31: Resource Types

- Built-in types
  - Generic Application
  - Generic Service
  - Internet Information Server (IIS) Virtual Root
  - Network Name
  - TCP/IP Address
  - Physical Disk
  - FT Disk (Software RAID)
  - Print Spooler
  - File Share
- Added by others
  - Microsoft SQL Server
  - Message Queues
  - Exchange Mail Server
  - Oracle
  - SAP R/3
  - Your application? (use developer kit wizard)

Page 32: Physical Disk

Page 33: TCP/IP Address

Page 34: Network Name

Page 35: File Share

Page 36: IIS (WWW/FTP) Server

Page 37: Print Spooler

Page 38: Resource States

- Resource states:
  - Offline: exists, not offering service
  - Online: offering service
  - Failed: not able to offer service
- Resource failure may cause:
  - local restart
  - other resources to go offline
  - resource group to move
  - (all subject to group and resource parameters)
- Resource failure detected by:
  - Polling failure
  - Node failure

[State diagram. States: Offline, OnlinePending, Online, OfflinePending, Failed. Message labels: “Go Online!”, “I’m Online!”, “Go Off-line!”, “I’m Off-line!”, “I’m here!”]
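The five states and the request/acknowledge messages on this slide suggest a small state machine. The transition table below is a guess at the diagram's arrows, which did not survive extraction; in particular, the "fail" event and the restart path out of Failed are assumptions, and the event names are invented.

```python
# (state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    ("Offline", "go_online"): "OnlinePending",       # "Go Online!"
    ("OnlinePending", "im_online"): "Online",        # "I'm Online!"
    ("Online", "go_offline"): "OfflinePending",      # "Go Off-line!"
    ("OfflinePending", "im_offline"): "Offline",     # "I'm Off-line!"
    ("OnlinePending", "fail"): "Failed",             # assumed
    ("Online", "fail"): "Failed",                    # assumed
    ("OfflinePending", "fail"): "Failed",            # assumed
    ("Failed", "go_online"): "OnlinePending",        # assumed local restart
}


def step(state, event):
    """Return the next state, or raise if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal event {event!r} in state {state!r}") from None


def run(state, events):
    """Apply a sequence of events starting from `state`."""
    for event in events:
        state = step(state, event)
    return state


print(run("Offline", ["go_online", "im_online"]))  # Online
```

The pending states exist because bringing a resource online or offline takes time; the cluster issues the request and waits for the resource to report completion.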

Page 39: Resource Dependencies

- Similar to NT Service Dependencies
- Orderly startup & shutdown
  - A resource is brought online after any resources it depends on are online.
  - A resource is taken offline before any resources it depends on are taken offline.
- Interdependent resources
  - Form dependency trees
  - move among nodes together
  - failover together
  - as per resource group

[Figure labels: Network Name, IP Address, Resource DLL, IIS Virtual Root, File Share]
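The ordering rule (dependencies come online first and go offline last) is a topological sort of the dependency graph. A minimal sketch using Python's standard `graphlib` module follows; the particular dependency edges are made up for illustration and are not read off the slide's figure.

```python
from graphlib import TopologicalSorter

# resource -> resources it depends on (illustrative edges only)
DEPENDS_ON = {
    "IP Address": [],
    "Physical Disk": [],
    "Network Name": ["IP Address"],
    "File Share": ["Network Name", "Physical Disk"],
}


def online_order(depends_on):
    """Order in which resources can be brought online: dependencies first."""
    return list(TopologicalSorter(depends_on).static_order())


def offline_order(depends_on):
    """Order in which resources are taken offline: dependents first."""
    return list(reversed(online_order(depends_on)))


print(online_order(DEPENDS_ON))
print(offline_order(DEPENDS_ON))
```

`TopologicalSorter` raises `graphlib.CycleError` on a cyclic graph, which matches the requirement that dependencies form trees.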

Page 40: Dependencies Tab

Page 41: NT Registry

- Stores all configuration information
  - Software
  - Hardware
- Hierarchical (name, value) map
- Has an open, documented interface
- Is secure
- Is visible across the net (RPC interface)
- Typical Entry:

  \Software\Microsoft\MSSQLServer\MSSQLServer\
      DefaultLogin = “GUEST”
      DefaultDomain = “REDMOND”
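A hierarchical (name, value) map can be modelled as nested dictionaries, with a backslash-separated path selecting the key. The sketch below is an in-memory stand-in that mirrors the slide's sample entry; it is not the registry API, and `query_value` is an invented helper. On Windows, real access would go through the registry interface (for example Python's standard `winreg` module).

```python
# In-memory stand-in for a registry hive, holding the slide's sample entry.
REGISTRY = {
    "Software": {
        "Microsoft": {
            "MSSQLServer": {
                "MSSQLServer": {
                    "DefaultLogin": "GUEST",
                    "DefaultDomain": "REDMOND",
                }
            }
        }
    }
}


def query_value(root, key_path, value_name):
    """Walk a backslash-separated key path, then look up one named value."""
    node = root
    for part in key_path.strip("\\").split("\\"):
        node = node[part]  # KeyError if the key does not exist
    return node[value_name]


path = r"\Software\Microsoft\MSSQLServer\MSSQLServer"
print(query_value(REGISTRY, path, "DefaultLogin"))  # GUEST
```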

Page 42: Cluster Registry

- Separate from local NT Registry
- Replicated at each node
  - Algorithms explained later
- Maintains configuration information:
  - Cluster members
  - Cluster resources
  - Resource and group parameters (e.g. restart)
- Stable storage
- Refreshed from “master” copy when node joins cluster

Page 43: Other Resource Properties

- Name
- Restart policy (restart N times, failover…)
- Startup parameters
- Private configuration info (resource type specific)
  - Per-node as well, if necessary
- Poll Intervals (LooksAlive, IsAlive, Timeout)
- These properties are all kept in Cluster Registry

Page 44: General Resource Tab

Page 45: Advanced Resource Tab

Page 46: Resource Groups

- Every resource belongs to a resource group.
- Resource groups move (failover) as a unit
- Dependencies NEVER cross groups. (Dependency trees contained within groups.)
- Group may contain forest of dependency trees

[Figure: Payroll Group containing Web Server, SQL Server, IP Address, Drive E:, Drive F:. Hierarchy labels: Cluster, Group, Resource]
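The rule that dependencies never cross groups is easy to check mechanically. The sketch below reuses the resource names from the Payroll Group figure, but the dependency edges and the second group are invented for the example, and `cross_group_dependencies` is a hypothetical helper.

```python
def cross_group_dependencies(group_of, depends_on):
    """Return (resource, dependency) pairs whose groups differ."""
    return [
        (res, dep)
        for res, deps in depends_on.items()
        for dep in deps
        if group_of[res] != group_of[dep]
    ]


GROUP_OF = {
    "Web Server": "Payroll",
    "SQL Server": "Payroll",
    "IP Address": "Payroll",
    "Drive E:": "Payroll",
    "Drive F:": "Payroll",
    "Print Spooler": "Printing",   # made-up second group
}

# Illustrative edges; the slide's figure does not survive in enough detail.
VALID = {
    "Web Server": ["IP Address", "Drive E:"],
    "SQL Server": ["IP Address", "Drive F:"],
}
INVALID = {"Print Spooler": ["Drive E:"]}

print(cross_group_dependencies(GROUP_OF, VALID))    # []
print(cross_group_dependencies(GROUP_OF, INVALID))  # [('Print Spooler', 'Drive E:')]
```

Keeping every dependency inside one group is what lets the group move as a unit: nothing it needs is left behind on the old node.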

Page 47: Moving a Resource Group

Page 48: Group Properties

- CurrentState: Online, Partially Online, Offline
- Members: resources that belong to group
  - members determine which nodes can host group.
- Preferred Owners: ordered list of host nodes
- FailoverThreshold: How many faults cause failover
- FailoverPeriod: Time window for failover threshold
- FailbackWindowStart: When can failback happen?
- FailbackWindowEnd: When can failback happen?
- Everything (except CurrentState) is stored in registry
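How the failback properties might combine can be sketched as follows. This assumes the window start and end are hours of the day (0 to 23), that the window is half-open and may wrap past midnight, and that a group fails back to the first preferred owner that is currently active; none of these details are stated on the slide, and both function names are hypothetical.

```python
def in_failback_window(hour, start, end):
    """True if `hour` lies in [start, end), allowing wrap past midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def failback_target(preferred_owners, active_nodes, current_node):
    """First preferred owner that is active, unless the group is already there."""
    for node in preferred_owners:
        if node in active_nodes:
            return node if node != current_node else None
    return None


# Group prefers Alice, currently runs on Betty, window is 22:00 to 04:00.
hour = 23
if in_failback_window(hour, start=22, end=4):
    print(failback_target(["Alice", "Betty"], {"Alice", "Betty"}, "Betty"))
```

Restricting failback to a window lets the group return to its preferred node at a quiet time instead of interrupting users a second time right after the failed node reboots.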

Page 49: Failover and Failback

- Failover parameters
  - timeout on LooksAlive, IsAlive
  - # local restarts in failure window
    • after this, offline.
- Failback to preferred node
  - (during failback window)
- Do resource failures affect group?

[Figure labels: Node \\Alice, Node \\Betty, Cluster Service (on each node), name, IP addr, Failover, Failback]

Page 50: Cluster Concepts

[Figure labels: Clusters, Cluster, Group (repeated), Resource (repeated)]

Page 51: Cluster Properties

- Defined Members: nodes that can join the cluster
- Active Members: nodes currently joined to cluster
- Resource Groups: groups in a cluster
- Quorum Resource:
  - Stores copy of cluster registry.
  - Used to form quorum.
- Network: Which network used for communication
- All properties kept in Cluster Registry

Page 52: Cluster API Functions (operations on nodes & groups)

- Find and communicate with Cluster
- Query/Set Cluster properties
- Enumerate Cluster objects
  - Nodes
  - Groups
  - Resources and Resource Types
- Cluster Event Notifications
  - Node state and property changes
  - Group state and property changes
  - Resource state and property changes

Page 53: Cluster Management

Page 54: Demo

- Server startup and shutdown
- Installing applications
- Changing status
- Failing over
- Transferring ownership of groups or resources
- Deleting Groups and Resources

Page 55: Outline

• Why FT and Why Clusters
• Cluster Abstractions
• Cluster Architecture
• Cluster Implementation
• Application Support
• Q&A

Page 56: Architecture

• Top tier provides cluster abstractions
• Middle tier provides distributed operations
• Bottom tier is NT and drivers

[Figure: architecture diagram with boxes Failover Manager, Resource Monitor, Cluster Registry, Global Update, Quorum, Membership, ClusterDisk Driver, ClusterNet Drivers, and Windows NT Server]

Page 57: Membership and Regroup

• Membership:
  • Used for orderly addition and removal from { active nodes }
• Regroup:
  • Used for failure detection (via heartbeat messages)
  • Forceful eviction from { active nodes }

[Figure: architecture diagram with boxes Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, and Windows NT Server]

Page 58: Membership

• Defined cluster = all nodes
• Active cluster:
  • Subset of defined cluster
  • Includes Quorum Resource
  • Stable (no regroup in progress)

[Figure: architecture diagram with boxes Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, and Windows NT Server]

Page 59: Quorum Resource

• Usually (but not necessarily) a SCSI disk
• Requirements:
  • Arbitrates for a resource by supporting the challenge/defense protocol
  • Capable of storing cluster registry and logs
• Configuration Change Logs
  • Tracks changes to configuration database when any defined member is missing (not active)
  • Prevents configuration partitions in time

Page 60: Challenge/Defense Protocol

• SCSI-2 has reserve/release verbs
  • Semaphore on disk controller
• Owner gets lease on semaphore
• Renews lease once every 3 seconds
• To preempt ownership:
  • Challenger clears semaphore (SCSI bus reset)
  • Waits 10 seconds
    • 3 seconds for renewal + 2 seconds bus settle time
    • x2 to give owner two chances to renew
  • If still clear, then former owner loses lease
  • Challenger issues reserve to acquire semaphore
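The timing rule above can be checked with a short sketch. This is an illustrative Python model, not MSCS code: the three constants come from the slide, while the function names and the idea of passing the owner's reserve times as a list are assumptions made for the example.

```python
RENEW_INTERVAL_S = 3  # owner renews its reservation every 3 seconds (from the slide)
BUS_SETTLE_S = 2      # bus settle time after a reset (from the slide)
OWNER_CHANCES = 2     # the owner gets two chances to renew (from the slide)


def challenge_wait_s():
    """Seconds the challenger waits after the bus reset: 2 x (3 + 2) = 10."""
    return OWNER_CHANCES * (RENEW_INTERVAL_S + BUS_SETTLE_S)


def challenge_succeeds(reset_time_s, owner_reserve_times_s):
    """Hypothetical model: the challenge fails if the owner re-reserves
    at any time after the reset and up to the end of the wait."""
    deadline = reset_time_s + challenge_wait_s()
    defended = any(reset_time_s < t <= deadline for t in owner_reserve_times_s)
    return not defended
```

In this model, a live owner that reserves at 0, 3, 6, 9, ... seconds defends against a reset at t = 4, while an owner that stopped renewing after t = 3 loses the lease.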

Page 61: Challenge/Defense Protocol: Successful Defense

[Figure: timeline from 0 to 16 seconds. The Defender Node issues Reserve repeatedly; the Challenger Node issues a Bus Reset and later finds "Reservation detected".]

Page 62: Challenge/Defense Protocol: Successful Challenge

[Figure: timeline from 0 to 16 seconds. The Defender Node issues Reserve; the Challenger Node issues a Bus Reset, finds "No reservation detected", and issues Reserve.]

Page 63: Regroup

• Invariant: All members agree on { members }
• Regroup re-computes { members }
• Each node sends heartbeat message to a peer (default is one per second)
• Regroup if two lost heartbeat messages
  • suspicion that sender is dead
  • failure detection in bounded time
• Uses a 5-round protocol to agree.
  • Checks communication among nodes.
  • Suspected missing node may survive.
• Upper levels (global update, etc.) informed of regroup event.

[Figure: architecture diagram with boxes Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, and Windows NT Server]
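The heartbeat rule (one heartbeat per second, regroup after two are lost) can be sketched as a small check. The helper names are hypothetical, and counting whole elapsed intervals is a simplifying assumption; the slides do not say how lost heartbeats are counted.

```python
HEARTBEAT_INTERVAL_S = 1.0  # default: one heartbeat per second (from the slide)
MISSED_LIMIT = 2            # regroup after two lost heartbeats (from the slide)


def missed_heartbeats(last_heartbeat_s, now_s):
    """Number of whole heartbeat intervals elapsed since the last heartbeat."""
    return int((now_s - last_heartbeat_s) // HEARTBEAT_INTERVAL_S)


def should_regroup(last_heartbeat_s, now_s):
    """True once the peer is suspected dead (hypothetical helper)."""
    return missed_heartbeats(last_heartbeat_s, now_s) >= MISSED_LIMIT
```

Under these assumptions, suspicion arises a bounded time after the last heartbeat (here two seconds), which is the "failure detection in bounded time" property.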

Page 64: Membership State Machine

[Figure: state diagram.
States: Initialize, Sleeping, Member Search, Quorum Disk Search, Joining, Forming, Online, Regroup.
Transition labels: Start Cluster; Found Online Member; Search Fails; Acquire (reserve) Quorum Disk; Search or Reserve Fails; Join Succeeds; Synchronize Succeeds; Lost Heartbeat; Minority or no Quorum; Non-Minority and Quorum.]
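One way to read the state machine is as a transition table. The state and event names below are taken from the slide, but the extracted text does not show which arrow connects which states, so every (state, event) -> state entry here is an assumption inferred from the labels.

```python
# Assumed edges: the source figure's arrows are not recoverable from the text.
TRANSITIONS = {
    ("Initialize", "Start Cluster"): "Member Search",
    ("Member Search", "Found Online Member"): "Joining",
    ("Member Search", "Search Fails"): "Quorum Disk Search",
    ("Quorum Disk Search", "Acquire (reserve) Quorum Disk"): "Forming",
    ("Quorum Disk Search", "Search or Reserve Fails"): "Sleeping",
    ("Joining", "Join Succeeds"): "Online",
    ("Forming", "Synchronize Succeeds"): "Online",
    ("Online", "Lost Heartbeat"): "Regroup",
    ("Regroup", "Non-Minority and Quorum"): "Online",
    ("Regroup", "Minority or no Quorum"): "Sleeping",
}


def run(events, state="Initialize"):
    """Follow a sequence of events through the assumed transition table."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state
```

For example, a node that finds no online member but wins the quorum disk ends up Online by the forming path.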

Page 65: Joining a Cluster

• When a node starts up, it mounts and configures only local, non-cluster devices
• Starts Cluster Service which
  • looks in local (stale) registry for members
  • Asks each member in turn to sponsor new node’s membership. (Stop when sponsor found.)
• Sponsor (any active member)
  • Sponsor authenticates applicant
  • Broadcasts applicant to cluster members
  • Sponsor sends updated registry to applicant
  • Applicant becomes a cluster member

Page 66: Forming a Cluster (when Joining fails)

• Use registry to find quorum resource
• Attach to (arbitrate for) quorum resource
• Update cluster registry from quorum resource
  • e.g. if we were down when it was in use
• Form new one-node cluster
• Bring other cluster resources online
• Let others join your cluster
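The join-then-form sequence can be summarized in a few lines. This is a sketch under stated assumptions: `start_node`, its callbacks, and the returned tuples are all hypothetical names, and the authentication and registry transfer steps are left out.

```python
def start_node(stale_members, is_active, reserve_quorum):
    """Try to join via a sponsor; if none is found, try to form a cluster.

    stale_members:  node names from the local (possibly stale) registry.
    is_active:      callable(name) -> bool, True if that node can sponsor us.
    reserve_quorum: callable() -> bool, True if we won the quorum resource.
    """
    for member in stale_members:
        if is_active(member):
            return ("joined", member)  # stop when a sponsor is found
    if reserve_quorum():
        return ("formed", None)        # new one-node cluster
    return ("failed", None)            # neither join nor form succeeded
```

Forming is attempted only after every listed member has declined or failed to answer, which matches the "when Joining fails" condition in the slide title.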

Page 67: Leaving a Cluster (Gracefully)

• Pause:
  • Move all groups off this member.
  • Change to paused state (remains a cluster member)
• Offline:
  • Move all groups off this member.
  • Sends ClusterExit message to all cluster members
    • Prevents regroup
    • Prevents stalls during departure transitions
  • Close Cluster connections (now not an active cluster member)
  • Cluster service stops on node
• Evict: remove node from defined member list

Page 68: Leaving a Cluster (Node Failure)

• Node (or communication) failure triggers Regroup
• If after regroup:
  • Minority group OR no quorum device:
    • group does NOT survive
  • Non-minority group AND quorum device:
    • group DOES survive
• Non-Minority rule:
  • Number of new members >= 1/2 old active cluster
  • Prevents minority from seizing quorum device at the expense of a larger potentially surviving cluster
• Quorum guarantees correctness
  • Prevents “split-brain”
    • e.g. with newly forming cluster containing a single node
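The survival rule combines two conditions, which a one-function sketch makes explicit. The function and parameter names are assumptions; the rule itself (at least half of the old active cluster, and ownership of the quorum device) is from the slide.

```python
def group_survives(new_members, old_active_members, owns_quorum):
    """A group survives regroup only if it is a non-minority AND holds quorum."""
    non_minority = 2 * new_members >= old_active_members  # new >= 1/2 old
    return non_minority and owns_quorum
```

In a two-node cluster each half is a non-minority, so the quorum device is what decides which node continues.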

Page 69: Global Update

• Propagates updates to all nodes in cluster
• Used to maintain replicated cluster registry
• Updates are atomic and totally ordered
• Tolerates all benign failures.
• Depends on membership
  • all are up
  • all can communicate
• R. Carr, Tandem Systems Review, V1.2, 1985, sketches regroup and global update protocol.

[Figure: architecture diagram with boxes Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, and Windows NT Server]

Page 70: Global Update Algorithm

• Cluster has locker node that regulates updates.
  • Oldest active node in cluster
• Send Update to locker node
• Update other (active) nodes
  • in seniority order (e.g. locker first)
  • this includes the updating node
• Failure of all updated nodes:
  • Update never happened
  • Updated nodes will roll back on recovery
• Survival of any updated nodes:
  • New locker is oldest and so has update if any do.
  • New locker restarts update

[Figure: nodes S and L exchanging "X=1" and "ack"]
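The key property above is that updates are applied in seniority order, so the oldest survivor has the update if any survivor does. The sketch below models each node's registry as a dict; the function names and the `crash_before` fault-injection parameter are hypothetical, and locking and acknowledgements are omitted.

```python
def global_update(nodes, key, value, crash_before=None):
    """Apply key=value to nodes in seniority order (nodes[0] is the locker).

    crash_before: hypothetical fault injection; the sender stops before
    updating the node at this index. Returns the indexes that were updated.
    """
    updated = []
    for index, registry in enumerate(nodes):
        if crash_before is not None and index >= crash_before:
            break
        registry[key] = value
        updated.append(index)
    return updated


def restart_update(surviving_nodes, key):
    """The oldest survivor becomes locker and re-sends the update if it has it."""
    locker = surviving_nodes[0]
    if key in locker:
        for registry in surviving_nodes[1:]:
            registry[key] = locker[key]
```

If the sender crashes partway, calling `restart_update` on the survivors completes the update; if no survivor received it, the registries are left unchanged, as if the update never happened.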

Page 71: Cluster Registry

• Separate from local NT Registry
• Maintains cluster configuration
  • members, resources, restart parameters, etc.
• Stable storage
• Replicated at each member
  • Global Update protocol
  • NT Registry keeps local copy

[Figure: architecture diagram with boxes Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, and Windows NT Server]

Page 72: Cluster Registry Bootstrapping

• Membership uses Cluster Registry for list of nodes
  • …Circular dependency
• Solution:
  • Membership uses stale local cluster registry
  • Refresh after joining or forming cluster
  • Master is either
    • quorum device, or
    • active members

[Figure: architecture diagram with boxes Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, and Windows NT Server]

Page 73: Resource Monitor

• Polls resources:
  • IsAlive and LooksAlive
• Detects failures
  • polling failure
  • failure event from resource
• Higher levels tell it
  • Online, Offline
  • Restart

[Figure: architecture diagram with boxes Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, and Windows NT Server]
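A polling loop that mixes the two checks might look like the sketch below. It is illustrative only: the method names mirror LooksAlive and IsAlive, but the Python interface, the `thorough_every` ratio, and the `HungResource` example class are assumptions.

```python
def poll_resource(resource, tick, thorough_every=10):
    """Run one poll: the cheap LooksAlive check on most ticks, the thorough
    IsAlive check every `thorough_every` ticks (the ratio is an assumption)."""
    if tick % thorough_every == 0:
        healthy = resource.is_alive()
    else:
        healthy = resource.looks_alive()
    return "Online" if healthy else "Failed"


class HungResource:
    """Example resource whose process exists but no longer answers requests."""

    def looks_alive(self):
        return True   # quick check: the process is still there

    def is_alive(self):
        return False  # thorough check: it does not respond
```

The example shows why both checks exist: a hung resource passes the quick check on ordinary ticks and is caught only when the thorough check runs.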

Page 74: Failover Manager

• Assigns groups to nodes based on
  • Failover parameters
  • Possible nodes for each resource in group
  • Preferred nodes for resource group

[Figure: architecture diagram with boxes Failover Manager, Resource Monitor, Cluster Registry, Global Update, Regroup, Membership, ClusterDisk Driver, ClusterNet Drivers, and Windows NT Server]

Page 75: Failover (Resource Goes Offline)

[Figure: flowchart with the following boxes and decisions]
• Resource Manager detects resource error.
• Attempt to restart resource.
• Decision: Has the Resource Retry limit been exceeded? (Yes / No)
• Switch resource (and Dependants) Offline.
• Notify Failover Manager.
• Failover Manager checks: Failover Window and Failover Threshold
• Decision: Are Failover conditions within Constraints? (Yes / No)
• Decision: Can another owner be found? (Arbitration) (Yes / No)
• Notify Failover Manager on the new system to bring resource Online.
• Leave Group in partially Online state.
• Wait for Failback Window
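The decisions in the flowchart can be read as a chain of checks. The sketch below is one plausible reading, not the documented control flow: the extracted text does not preserve which branch leads where, so the branch targets, the function name, and the parameters are all assumptions.

```python
def next_action(restart_count, retry_limit, failover_count,
                failover_threshold, possible_owners):
    """Return the next step after a resource error (one reading of the flowchart)."""
    if restart_count < retry_limit:
        return "restart resource"
    # Retry limit exceeded: the resource and its dependents go Offline
    # and the Failover Manager checks its constraints.
    if failover_count >= failover_threshold:
        return "leave group partially online"
    if not possible_owners:
        return "leave group partially online"
    return "bring online on " + possible_owners[0]
```

Local restart is tried first, and failover to another node happens only when both the failover constraints and arbitration allow it.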

Page 76: Pushing a Group (Resource Failure)

[Figure: flowchart with the following boxes and decisions]
• Resource Monitor notifies Resource Manager of resource failure.
• Resource Manager enumerates all objects in the Dependency Tree of the failed resource.
• Resource Manager takes each depending resource Offline.
• Decision: Any resource has “Affect the Group” True?
  • No: Leave Group in partially Online state.
  • Yes: continue
• Resource Manager notifies Failover Manager that the Dependency Tree is Offline and needs to fail over.
• Failover Manager performs Arbitration to locate a new owner for the group.
• Failover Manager on the new owner node brings the resources Online.
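Enumerating the dependency tree of a failed resource amounts to finding everything that depends on it, directly or transitively. The sketch below assumes dependencies are given as a dict from each resource to the resources it depends on; that representation and the function name are illustrative choices.

```python
def affected_by(failed, depends_on):
    """Return the failed resource plus every resource that depends on it,
    directly or transitively.

    depends_on: dict mapping a resource to the list of resources it needs.
    """
    affected = {failed}
    changed = True
    while changed:
        changed = False
        for resource, needs in depends_on.items():
            if resource not in affected and affected.intersection(needs):
                affected.add(resource)
                changed = True
    return affected
```

With a hypothetical group where SAP needs SQL, SQL needs a disk and a network name, and the network name needs an IP address, a failed IP address takes the network name, SQL, and SAP offline but leaves the disk alone.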

Page 77: Pulling a Group (Node Failure)

[Figure: flowchart with the following boxes]
• Cluster Service notifies Failover Manager of node failure.
• Failover Manager determines which groups were owned by the failed node.
• Resource Manager notifies Failover Manager that the node is Offline and the groups it owned need to fail over.
• Failover Manager performs Arbitration to locate a new owner for the groups.
• Failover Manager on the new owner(s) brings the resources Online in dependency order.

Page 78: Failback to Preferred Owner Node

• Group may have a Preferred Owner
• Preferred Owner comes back online
• Will only occur during the Failback Window (time slot, e.g. at night)

[Figure: flowchart with the following boxes and decisions]
• Preferred owner comes back Online.
• Decision: Is the time within the Failback Window?
• Resource Manager takes each resource on the current owner Offline.
• Resource Manager notifies Failover Manager that the Group is Offline and needs to fail over to the Preferred Owner.
• Failover Manager performs Arbitration to locate the Preferred Owner of the group.
• Failover Manager on the Preferred Owner brings the resources Online.
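Because a night-time window usually crosses midnight, the window test needs a wrap-around case. The sketch below assumes the window is expressed in whole hours with an exclusive end; the slides do not specify the format, so both the representation and the function name are hypothetical.

```python
def in_failback_window(hour, start_hour, end_hour):
    """True if `hour` (0-23) falls in [start_hour, end_hour), where the
    window may wrap past midnight (e.g. 22 to 4)."""
    if start_hour <= end_hour:
        return start_hour <= hour < end_hour
    return hour >= start_hour or hour < end_hour
```

With a window of 22 to 4, a preferred owner that returns at noon would wait, while one that returns at 23:00 or 03:00 could take its group back immediately.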

Page 79: Outline

• Why FT and Why Clusters
• Cluster Abstractions
• Cluster Architecture
• Cluster Implementation
• Application Support
• Q&A

Page 80: Process Structure

• Cluster Service
  • Failover Manager
  • Cluster Registry
  • Global Update
  • Quorum
  • Membership
• Resource Monitor
  • Resource Monitor
  • Resource DLLs
• Resources
  • Services
  • Applications

[Figure: a node containing the Cluster Service, Resource Monitors, a DLL, and a Resource, connected by private calls]

Page 81: Resource Control

• Commands
  • CreateResource()
  • OnlineResource()
  • OfflineResource()
  • TerminateResource()
  • CloseResource()
  • ShutdownProcess()
• And resource events

[Figure: a node containing the Cluster Service, Resource Monitors, a DLL, and a Resource, connected by private calls]

Page 82: Resource DLLs

• Calls to Resource DLL
  • Open: get handle
  • Online: start offering service
  • Offline: stop offering service
    • as a standby or
    • pair-is offline
  • LooksAlive: Quick check
  • IsAlive: Thorough check
  • Terminate: Forceful Offline
  • Close: release handle

[Figure: state diagram with states Offline, Online Pending, Online, Offline Pending, and Failed; messages "Go Online!", "I’m Online!", "Go Off-line!", "I’m Off-line!", and "I’m here!" pass between the Resource Monitor, the DLL (std calls), and the Resource (private calls)]

Page 83: Cluster Communications

• Most communication via DCOM / RPC
• UDP used for membership heartbeat messages
• Standard (e.g. Ethernet) interconnects

[Figure: management apps reach the Cluster Service via DCOM; on each node the Cluster Service and its Resource Monitors communicate via DCOM / RPC; between nodes, DCOM / RPC carries admin traffic and UDP carries heartbeats]

Page 84: Outline

• Why FT and Why Clusters
• Cluster Abstractions
• Cluster Architecture
• Cluster Implementation
• Application Support
• Q&A

Page 85: Application Support

• Virtual Servers
• Generic Resource DLLs
• Resource DLL VC++ Wizard
• Cluster API

Page 86: Virtual Servers

• Problem:
  • Client and Server Applications do not want node name to change when server app moves to another node.
• A Virtual Server simulates an NT Node
  • Resource Group (name, disks, databases,…)
  • NetName and IP address (node: \\a keeps name and IP address as it moves)
  • Virtual Registry (registry “moves” (is replicated))
  • Virtual Service Control
  • Virtual RPC service
• Challenges:
  • Limit app to virtual server’s devices and services.
  • Client reconnect on failover (easy if connectionless, e.g. web clients)

[Figure: two boxes labeled Virtual Server \\a: 1.2.3.4]

Page 87: Virtual Servers (before failover)

• Nodes \\Y and \\Z support virtual servers \\A and \\B
• Things that need to fail over transparently
  • Client connection
  • Server dependencies
  • Service names
  • Binding to local resources
  • Binding to local servers

[Figure: virtual servers \\A (“SAP on A”) and \\B (“SAP on B”), each with SAP, SQL, and a disk (S:\, T:\), on nodes \\Y and \\Z]

Page 88: Virtual Servers (just after failover)

• \\Y resources and groups (i.e. Virtual Server \\A) moved to \\Z
• A resources bind to each other and to local resources (e.g., local file system)
  • Registry
  • Physical resource
  • Security domain
  • Time
• Transactions used to make DB state consistent.
• To “work”, local resources on \\Y and \\Z have to be similar
  • E.g. time must remain monotonic after failover

[Figure: virtual servers \\A (“SAP on A”) and \\B (“SAP on B”), each with SAP, SQL, and a disk (S:\, T:\), on nodes \\Y and \\Z]

Page 89: Address Failover and Client Reconnection

• Name and Address rebind to new node
  • Details later
• Clients reconnect
  • Failure not transparent
  • Must log on again
  • Client context lost (encourages connectionless)
  • Applications could maintain context

[Figure: virtual servers \\A (“SAP on A”) and \\B (“SAP on B”), each with SAP, SQL, and a disk (S:\, T:\), on nodes \\Y and \\Z]

Page 90: Mapping Local References to Group-Relative References

• Send client requests to correct server
  • \\A\SAP refers to \\.\SQL
  • \\B\SAP refers to \\.\SQL
• Must remap references:
  • \\A\SAP to \\.\SQL$A
  • \\B\SAP to \\.\SQL$B
• Also handles namespace collision
• Done via
  • modifying server apps, or
  • DLLs to transparently rename

[Figure: virtual servers \\A (“SAP on A”) and \\B (“SAP on B”), each with SAP, SQL, and a disk (S:\, T:\), on nodes \\Y and \\Z]
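The renaming rule (\\.\SQL becomes \\.\SQL$A for virtual server \\A) can be written as a small string transformation. The function name and the decision to leave non-local references untouched are assumptions; only the `$<server>` suffix convention comes from the slide.

```python
LOCAL_PREFIX = "\\\\.\\"  # the literal text \\.\ that marks a local reference


def remap_local_reference(reference, virtual_server):
    """Make a local reference group-relative by appending $<virtual server>.

    References that do not start with \\.\ are returned unchanged.
    """
    if not reference.startswith(LOCAL_PREFIX):
        return reference
    return f"{reference}${virtual_server}"
```

After remapping, the two SAP instances refer to distinct names even when both virtual servers run on the same node, which is how the scheme avoids the namespace collision.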

Page 91: Naming and Binding and Failover

• Services rely on the NT node name and/or IP address to advertise Shares, Printers, and Services.
  • Applications register names to advertise services
  • Example: \\Alice\SQL (i.e. \\<node>\<service>)
  • Example: 128.2.2.2:80 (=http://www.foo.com/)
• Binding
  • Clients bind to an address (e.g. name->IP address)
• Thus the node name and IP address must fail over along with the services (preserve client bindings)

Page 92: Client to Cluster Communications: IP address mobility based on MAC rebinding

• Cluster Clients
  • Must use IP (TCP, UDP, NBT, ...)
  • Must Reconnect or Retry after failure
• Cluster Servers
  • All cluster nodes must be on same LAN segment
• IP rebinds to failover MAC addr
  • Transparent to client or server
  • Low-level ARP (address resolution protocol) rebinds IP address to new MAC addr.

[Figure: a client reaches the cluster's local network across a WAN through a router.
Name table (shown at the client and in the cluster): Alice <-> 200.110.120.4, Virtual Alice <-> 200.110.120.5, Betty <-> 200.110.120.6, Virtual Betty <-> 200.110.120.7.
Router table: 200.110.120.4 -> AliceMAC, 200.110.120.5 -> AliceMAC, 200.110.120.6 -> BettyMAC, 200.110.120.7 -> BettyMAC.]
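The effect of MAC rebinding on the router's table can be shown with a dictionary. This is a toy model: the addresses and MAC labels are the ones on the slide, while the function name and the copy-on-update style are assumptions, and no real ARP traffic is involved.

```python
ROUTER_TABLE = {
    "200.110.120.4": "AliceMAC",  # Alice (physical node)
    "200.110.120.5": "AliceMAC",  # Virtual Alice, currently hosted on Alice
    "200.110.120.6": "BettyMAC",  # Betty (physical node)
    "200.110.120.7": "BettyMAC",  # Virtual Betty, currently hosted on Betty
}


def rebind_ip(table, ip_address, new_mac):
    """Return a copy of the IP -> MAC table with one address rebound."""
    updated = dict(table)
    updated[ip_address] = new_mac
    return updated
```

Failing Virtual Alice over to Betty changes only the entry for 200.110.120.5; clients keep using the same IP address and simply reconnect or retry.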

Page 93: Time

• Time must increase monotonically
  • Otherwise applications get confused
  • e.g. make/nmake/build
• Time is maintained within failover resolution
  • Not hard, since failover on order of seconds
• Time is a resource, so one node owns time resource
  • Other nodes periodically correct drift from owner’s time

Page 94: Application Local NT Registry Checkpointing

• Resources can request that local NT registry sub-trees be replicated
• Changes written out to quorum device
  • Uses registry change notification interface
• Changes read and applied on fail-over

[Figure: each update to the registry of \\A on \\X is written to the registry copy on the Quorum Device; after failover it is read into the registry of \\A on \\B]
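The write-on-change, replay-on-failover pattern can be sketched with a list standing in for the quorum device. Everything here is a simplification under stated assumptions: the function names are hypothetical, registry sub-trees are flattened to key/value pairs, and the change notification mechanism is not modeled.

```python
def checkpoint_change(quorum_log, key, value):
    """Record one registry change on the (simulated) quorum device."""
    quorum_log.append((key, value))


def apply_on_failover(quorum_log, local_registry):
    """Replay recorded changes, in order, into the new owner's registry."""
    for key, value in quorum_log:
        local_registry[key] = value
    return local_registry
```

Replaying in write order means the last value written for a key is the one the new owner ends up with.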

Page 95: Registry Replication

Page 96: Application Support

• Virtual Servers
• Generic Resource DLLs
• Resource DLL VC++ Wizard
• Cluster API

Page 97: Generic Resource DLLs

• Generic Application DLL
  • Simplest: just starts, stops application, and makes sure process is alive
• Generic Service DLL
  • Translates DLL calls into equivalent NT Server calls
    • Online => Service Start
    • Offline => Service Stop
    • Looks/IsAlive => Service Status

[Figure: Resource Monitor, DLL (std calls), and Resource (private calls)]

Page 98: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Generic Application
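
The Generic Application behavior described earlier (start the application, stop it, and check that its process is alive) can be sketched as follows. This is an illustrative model, not the actual resource DLL: the class name `GenericApplication` and its method names are assumptions, and it uses Python's `subprocess` module in place of the NT process calls.

```python
import subprocess


class GenericApplication:
    """Hypothetical model of a generic application resource."""

    def __init__(self, command):
        self.command = command  # argument list for the application to run
        self.process = None

    def online(self):
        """Bring the resource online by starting the application."""
        self.process = subprocess.Popen(self.command)

    def offline(self):
        """Take the resource offline by stopping the application."""
        if self.process is not None:
            self.process.terminate()
            self.process.wait()
            self.process = None

    def is_alive(self):
        """The only health check: is the process still running?"""
        return self.process is not None and self.process.poll() is None

    # In this simplest resource, the quick and thorough checks are the same.
    looks_alive = is_alive
```

Because the check only looks at the process, a hung application would still be reported as alive. A custom resource DLL can add checks for that case.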

Page 99: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Generic Service
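
The Generic Service translation (Online to Service Start, Offline to Service Stop, Looks/IsAlive to Service Status) is essentially an adapter. The sketch below assumes a hypothetical `FakeServiceController` standing in for the NT service control interface; none of these names come from the real API.

```python
from enum import Enum


class ServiceState(Enum):
    STOPPED = "stopped"
    RUNNING = "running"


class FakeServiceController:
    """Hypothetical stand-in for the NT service control interface."""

    def __init__(self):
        self._states = {}

    def start(self, name):
        self._states[name] = ServiceState.RUNNING

    def stop(self, name):
        self._states[name] = ServiceState.STOPPED

    def status(self, name):
        return self._states.get(name, ServiceState.STOPPED)


class GenericService:
    """Adapter that maps resource entry points onto service calls."""

    def __init__(self, controller, service_name):
        self.controller = controller
        self.service_name = service_name

    def online(self):  # Online => Service Start
        self.controller.start(self.service_name)

    def offline(self):  # Offline => Service Stop
        self.controller.stop(self.service_name)

    def looks_alive(self):  # LooksAlive => Service Status
        return self.controller.status(self.service_name) is ServiceState.RUNNING

    def is_alive(self):  # IsAlive => Service Status
        return self.looks_alive()
```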

Page 100: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Application Support
  Virtual Servers
  Generic Resource DLLs
  Resource DLL VC++ Wizard
  Cluster API

Page 101: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Resource DLL VC++ Wizard

Asks for resource type name
Asks for optional service to control
Asks for other parameters (and associated types)
Generates DLL source code
Source can be modified as necessary
  E.g. additional checks for Looks/IsAlive
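
As a rough illustration of "additional checks for Looks/IsAlive", the sketch below extends a generated skeleton so that the thorough check also tries to connect to the application's TCP port. The wizard actually emits C/C++ DLL source, so the Python classes `GeneratedResource` and `PortCheckedResource` are hypothetical stand-ins, and the port check is just one assumed example of a custom test.

```python
import socket


class GeneratedResource:
    """Hypothetical stand-in for the wizard-generated skeleton."""

    def __init__(self, process_running):
        # process_running: callable returning True while the process exists
        self._process_running = process_running

    def looks_alive(self):
        """Quick, cheap check."""
        return self._process_running()

    def is_alive(self):
        """Thorough check; the skeleton only repeats the quick one."""
        return self._process_running()


class PortCheckedResource(GeneratedResource):
    """Customized resource: IsAlive also requires a successful TCP connect."""

    def __init__(self, process_running, host, port, timeout=1.0):
        super().__init__(process_running)
        self.host = host
        self.port = port
        self.timeout = timeout

    def is_alive(self):
        if not super().is_alive():
            return False
        try:
            with socket.create_connection((self.host, self.port), timeout=self.timeout):
                return True
        except OSError:
            return False
```

Keeping `looks_alive` cheap and putting the more expensive probe in `is_alive` follows the usual split between the two calls.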

Page 102: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Creating a New Workspace

Page 103: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Specifying Resource Type Name

Page 104: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Specifying Resource Parameters

Page 105: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Automatic Code Generation

Page 106: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Customizing The Code

Page 107: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Application Support
  Virtual Servers
  Generic Resource DLLs
  Resource DLL VC++ Wizard
  Cluster API

Page 108: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Cluster API

Allows resources to:
  Examine dependencies
  Manage per-resource data
  Change parameters (e.g. failover)
  Listen for cluster events
  etc.

Specs & API became public Sept 1996
  On all MSDN Level 3
  On web site: http://www.microsoft.com/clustering.htm
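
Two of the capabilities listed above, examining dependencies and listening for cluster events, can be illustrated with a small model. The real Cluster API is a C interface. The `ClusterModel` class, its methods, and the resource names used in the usage note are hypothetical and exist only to show the idea.

```python
from collections import defaultdict


class ClusterModel:
    """Hypothetical in-memory model of resource dependencies and events."""

    def __init__(self):
        self.depends_on = defaultdict(set)
        self.listeners = []

    def add_dependency(self, resource, dependency):
        """Record that `resource` needs `dependency` online first."""
        self.depends_on[resource].add(dependency)

    def online_order(self, resource):
        """Return resources in an order that brings dependencies up first."""
        order, seen = [], set()

        def visit(name):
            if name in seen:
                return
            seen.add(name)
            for dep in sorted(self.depends_on[name]):
                visit(dep)
            order.append(name)  # appended only after all its dependencies

        visit(resource)
        return order

    def subscribe(self, callback):
        """Register a listener for cluster events."""
        self.listeners.append(callback)

    def notify(self, event, resource):
        """Deliver an event to every registered listener."""
        for callback in self.listeners:
            callback(event, resource)
```

For example, if a database resource depends on a disk and a network name, and the network name depends on an IP address, `online_order` lists the disk and the IP address before the network name, and the database last.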

Page 109: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Cluster API Documentation

Page 110: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Outline
  Why FT and Why Clusters
  Cluster Abstractions
  Cluster Architecture
  Cluster Implementation
  Application Support
  Q&A

Page 111: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


Research Topics?
  Even easier to manage
  Transparent failover
  Instant failover
  Geographic distribution (disaster tolerance)
  Server pools (load-balanced pool of processes)
  Process pair (active/backup process)
  10,000 nodes?
  Better algorithms
  Shared memory or shared disk among nodes
    a truly bad idea?

Page 112: ©1996, 1997 Microsoft Corp. 1 FT NT: A Tutorial on Microsoft Cluster Server (formerly Wolfpack) Joe Barrera Jim Gray Microsoft Research {joebar, gray}


References

Microsoft NT site: http://www.microsoft.com/ntserver/

BARC site (e.g. these slides): http://research.microsoft.com/~joebar/wolfpack

Inside Windows NT, H. Custer, Microsoft Pr, ISBN: 155615481

Tandem Global Update Protocol, R. Carr, Tandem Systems Review, V1.2, 1985. Sketches regroup and global update protocol.

VAXclusters: a Closely Coupled Distributed System, Kronenberg, N., Levey, H., Strecker, W., ACM TOCS, V 4.2, 1986. A (the) shared disk cluster.

In Search of Clusters: The Coming Battle in Lowly Parallel Computing, Gregory F. Pfister, Prentice Hall, 1995, ISBN: 0134376250. Argues for shared nothing

Transaction Processing Concepts and Techniques, Gray, J., Reuter A., Morgan Kaufmann, 1994, ISBN 1558601902. Survey of outages, transaction techniques.