introduction to windows cluster


Upload: dinesh-moorthy

Post on 10-Jul-2016


Page 1: Introduction to Windows Cluster

Introduction to Clustering

Page 2: Introduction to Windows Cluster


Prerequisites

Before starting this session, you should understand what fault tolerance and load balancing mean.

Page 3: Introduction to Windows Cluster

Industry Definition of Cluster

• Cluster definition:
  – A group of computers and storage devices that work together yet can be accessed as a single system.
• A cluster provides:
  – Distribution of processing load
  – Automatic recovery from failure of one or more components in the cluster

Page 4: Introduction to Windows Cluster

Availability, Scalability and Manageability

• Availability:
  – Measure of the amount of time a system or component performs its specified function.
• Scalability:
  – The ability to incrementally add smaller, standard systems as needed to meet overall processing power requirements.
• Manageability:
  – The ease of administering a cluster solution, including configuration, updates and/or patches, and new additions.
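The availability definition above can be turned into a number. The following is a minimal sketch, assuming availability is expressed as the fraction of observed time during which the system performed its function; the function name and the downtime figure are hypothetical and not taken from the slides.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return availability as a fraction of total observed time (0.0 to 1.0)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


# Hypothetical figures: one year of operation with 8.76 hours of outages.
HOURS_PER_YEAR = 8760.0
downtime = 8.76
print(f"Availability: {availability(HOURS_PER_YEAR - downtime, downtime):.3%}")
```

Expressing the measure as a ratio makes it easy to compare a stand-alone server with a cluster over the same observation period.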

Page 5: Introduction to Windows Cluster

Availability Overview

[Figure: a four-node cluster solution (Node1 to Node4) shown before and after Node1 fails. Before the failure, Node1 services Web clients and Node4 provides access to the SQL database. After Node1 fails, Node2 services Web clients and Node4 still provides access to the SQL database.]
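The before/after picture can be sketched as a small simulation. This is an illustration only, assuming a simple "next online node takes over" rule; the `Node` class and `fail_node` function are hypothetical and do not represent how the Cluster service chooses a failover target.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    online: bool = True
    roles: list[str] = field(default_factory=list)


def fail_node(nodes: list[Node], failed_name: str) -> None:
    """Mark a node as failed and move its roles to the next online node."""
    failed = next(n for n in nodes if n.name == failed_name)
    failed.online = False
    start = nodes.index(failed)
    # Walk the remaining nodes in ring order to find a survivor.
    for offset in range(1, len(nodes)):
        candidate = nodes[(start + offset) % len(nodes)]
        if candidate.online:
            candidate.roles.extend(failed.roles)
            failed.roles = []
            return
    raise RuntimeError("no surviving node can take over the roles")


# Mirror the slide: Node1 serves Web clients, Node4 serves the SQL database.
cluster = [
    Node("Node1", roles=["Web clients"]),
    Node("Node2"),
    Node("Node3"),
    Node("Node4", roles=["SQL database"]),
]
fail_node(cluster, "Node1")
for node in cluster:
    print(node.name, "online" if node.online else "failed", node.roles)
```

After the call, Node2 holds the Web client role and Node4 is unchanged, which matches the "After Node1 Failure" half of the figure.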

Page 6: Introduction to Windows Cluster

Scalability Overview

• Scaling up:
  – Scaling up is achieved by adding more resources, such as memory, processors, and disk drives, to a system.
• Scaling out:
  – Scaling out delivers high performance when the throughput requirements of an application exceed the capabilities of an individual system.
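As a rough worked example of scaling out, the number of systems needed can be estimated from the required throughput and the throughput of one system. This sketch assumes throughput adds up linearly across nodes, which real workloads rarely achieve exactly; the function name and figures are hypothetical.

```python
import math


def nodes_needed(required_throughput: float, per_node_throughput: float) -> int:
    """Estimate how many equal nodes are needed, assuming linear scaling."""
    if per_node_throughput <= 0:
        raise ValueError("per-node throughput must be positive")
    return max(1, math.ceil(required_throughput / per_node_throughput))


# Hypothetical load: 4500 requests/s required, 1200 requests/s per node.
print(nodes_needed(4500, 1200))
```

If the nodes must also survive the loss of one member, add at least one more node to the estimate.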

Page 7: Introduction to Windows Cluster

Manageability Overview

The following questions must be answered:

• Setup
  – How easy is it to install the cluster solution?
• Configuration
  – How easy is it to install applications into the cluster and administer the different aspects of the clustering software?
  – How easy is it to dynamically increase or scale up the cluster solution when your business requirements exceed the current capacity?
• Disaster recovery
  – After a complete disaster, how quickly and easily can you bring the cluster solution back into production?
• Application
  – For the applications that you install into the cluster, what additional maintenance and administration is required beyond that of a stand-alone version of the application?
• Application updates
  – How easy is it to update the applications when the time comes for new features or security updates?
• Operating system patch management
  – How easy is it to update the core OS on which the cluster server runs, or the Cluster service itself, when security patches or bug fixes are released?

Page 8: Introduction to Windows Cluster

Cluster Solution Benefits

Factors to be considered while planning a cluster deployment:

• Cost of hardware
  – Cost of the computers or nodes
  – Networking devices such as switches or routers
  – Shared or external storage (SAN)
• Cost of the cluster software product or suite
  – This includes the OS, the clustering software and the applications that will run on the cluster.
• Cost of ownership
  – The hardware might be cheaper, but designing, implementing and maintaining the cluster solution may take more man-hours from your administrative and developer staff.

Page 9: Introduction to Windows Cluster

Cluster Models and Their Configurations

• Active/Active
• Active/Passive

Page 10: Introduction to Windows Cluster

Active/Active

[Figure: a cluster of two servers. One server runs File/Print Group 1 and has capacity to fail over Group 2; the other runs File/Print Group 2 and has capacity to fail over Group 1.]

Page 11: Introduction to Windows Cluster

Active/Passive

[Figure: a cluster of two servers. One server runs File/Print Group 1; the other server has capacity to fail over Group 1.]

Page 12: Introduction to Windows Cluster

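Active/Active Configuration

[Figure: Node A and Node B, each running the Cluster service, attached to a Quorum disk, Disk 1 and Disk 2. Group 1 contains virtual server \\Engineering and Group 2 contains virtual server \\Accounting. Each node hosts one group and has capacity to fail over the other group.]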
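In an Active/Active configuration each node needs enough headroom to run both groups after a failover. The check below is a minimal sketch of that sizing rule, assuming load and capacity are measured in the same unit (for example, percent of CPU); the function name and the figures are hypothetical.

```python
def can_absorb_failover(own_load: float, partner_load: float, capacity: float) -> bool:
    """Return True if one node can run its own group plus its partner's group."""
    return own_load + partner_load <= capacity


# Hypothetical loads as a percentage of one node's capacity.
print(can_absorb_failover(own_load=40, partner_load=45, capacity=100))
print(can_absorb_failover(own_load=60, partner_load=55, capacity=100))
```

The second case shows why two heavily loaded active nodes do not give full protection: the surviving node would be overloaded after a failover.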

Page 13: Introduction to Windows Cluster

Active/Passive Configuration

[Figure: Node A and Node B, each running the Cluster service, attached to a Quorum disk and Disk 1. Group 1 contains virtual server \\Accounting and Disk 1.]

Node A manages virtual server \\Accounting. Node B is configured as a hot spare and will take ownership of \\Accounting if Node A goes offline.

Page 14: Introduction to Windows Cluster

Microsoft Technologies for Clustering

• Two Microsoft technologies for clustering:
  – Network Load Balancing (NLB)
  – Server Cluster (MSCS)
• NLB and MSCS must be installed on separate machines
• Example
  – Front-end NLB servers hosting IIS and communicating with a back-end MSCS cluster for database information

[Figure: Client connects to NLB hosts running IIS, which connect to an MSCS cluster hosting the database.]
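To illustrate how a front-end tier can spread clients across several hosts, the sketch below maps each client address to one host with a hash. This is a generic hash-based distribution shown for illustration only; it is not NLB's actual algorithm, and the function name and host names are hypothetical.

```python
import hashlib


def pick_host(client_ip: str, hosts: list[str]) -> str:
    """Map a client address to one host deterministically."""
    if not hosts:
        raise ValueError("at least one host is required")
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer bucket selector.
    index = int.from_bytes(digest[:8], "big") % len(hosts)
    return hosts[index]


# Hypothetical front-end hosts running IIS.
front_end = ["IIS-1", "IIS-2", "IIS-3"]
for ip in ("192.0.2.10", "192.0.2.11", "192.0.2.12"):
    print(ip, "->", pick_host(ip, front_end))
```

Because the mapping depends only on the client address and the host list, the same client reaches the same host for as long as the host list stays unchanged.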

Page 15: Introduction to Windows Cluster

Microsoft Windows Server 2003 Server Cluster (MSCS)

Additional capabilities provided by MSCS:

• Every node has full connectivity and communication with the other nodes in the cluster through the following:
  – One or more shared SCSI, iSCSI or Fibre Channel buses for block-level storage.
  – A private network, or interconnect, that carries only internal cluster communication.
  – One or more public networks.
• Every node in the cluster is aware when another system joins or leaves the cluster.
• Every node in the cluster is aware of the resources that are running locally as well as the resources that are running on all other cluster nodes.

Page 16: Introduction to Windows Cluster

Server Cluster and NLB Comparison

• Typical workloads
  – Server Cluster: Databases, e-mail services, line of business (LOB) applications, and custom applications
  – NLB: Network services such as Web servers, FTP servers, firewalls, and other networking services
• Availability by edition
  – Server Cluster: Included with Windows Server 2003, Enterprise Edition, and Windows Server 2003, Datacenter Edition
  – NLB: Included with all four versions of Windows Server 2003
• Benefits
  – Server Cluster: High availability, scalability and server consolidation
  – NLB: High availability and scalability
• Deployment
  – Server Cluster: Can be deployed on a single network or geographically distributed
  – NLB: Generally deployed on a single network but can span multiple networks if properly configured
• Maximum cluster size
  – Server Cluster: Up to eight nodes
  – NLB: Up to 32 nodes
• Requirements
  – Server Cluster: Requires shared or replicated storage
  – NLB: Does not require any special hardware or software and works “out of the box”

Page 17: Introduction to Windows Cluster

Microsoft Server Cluster Terminology and Definitions (1)

CLUSTER: A group of independent network servers that present themselves to a network as a single system.

NODE: A cluster node is a Microsoft Windows Server 2003 system that has a working installation of the Cluster service.

RESOURCES: Resources are physical or logical entities, such as a file share, that are managed by the Cluster service.

RESOURCE STATES: All resources can have the following states: Online, Offline, Online pending, Offline pending and Failed.

RESOURCE DEPENDENCIES: A dependency is a reliance of one resource on another; a dependent resource can come online only after the resources it depends on are online.

GROUPS: Groups are collections of resources that need to be managed as a single unit for configuration and recovery purposes.

FAILOVER: Failover is the process of moving a group of resources from one node to another in the case of a failure or for administrative tasks.
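Resource dependencies determine the order in which the resources of a group are brought online. The sketch below derives such an order with a topological sort from the Python standard library (Python 3.9 or later); the group contents are a hypothetical file-share example and the function names are not part of any cluster API.

```python
from graphlib import TopologicalSorter

# Hypothetical group: each resource maps to the resources it depends on.
dependencies = {
    "File Share": {"Network Name", "Physical Disk"},
    "Network Name": {"IP Address"},
    "IP Address": set(),
    "Physical Disk": set(),
}


def online_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps: dict[str, set[str]]) -> list[str]:
    """Take resources offline in the reverse of the online order."""
    return list(reversed(online_order(deps)))


print("Online: ", online_order(dependencies))
print("Offline:", offline_order(dependencies))
```

Resources with no dependency between them (here the IP address and the physical disk) may appear in either order; only the relative order of dependent resources is guaranteed.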

Page 18: Introduction to Windows Cluster

Microsoft Server Cluster Terminology and Definitions (2)

VIRTUAL SERVER: Groups that contain an IP Address resource and a Network Name resource and appear as individual servers to clients.

CLUSTER NETWORK: All nodes must have a network link between them that they can use to communicate with each other.

SHARED DISKS: The shared disks are logical devices that all the cluster nodes are attached to via the shared bus.

QUORUM RESOURCE: The quorum resource holds the configuration data needed to recover the cluster and is used to decide which node may form or keep ownership of the cluster.

CLUSTER SERVICE: The Cluster service is the collection of software on each node that manages all cluster-specific activity.

CLUSTER-AWARE AND CLUSTER-ENABLED APPLICATIONS: A “cluster-aware” application is any application that has been designed to function on a cluster and ships with a resource DLL.

FAILBACK: Failback is the process of returning a group of resources to the node on which it was running before a failover occurred.
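Failover and failback can be pictured as two moves of the same group. The following is a minimal sketch under the definition above, where failback returns the group to the node it ran on before the failover; the `Group` class and both functions are hypothetical and do not model the Cluster service's real policies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    name: str
    owner: str
    previous_owner: Optional[str] = None


def failover(group: Group, target_node: str) -> None:
    """Move the group to target_node and remember where it ran before."""
    group.previous_owner = group.owner
    group.owner = target_node


def failback(group: Group, online_nodes: set[str]) -> bool:
    """Return the group to its pre-failover node if that node is back online."""
    if group.previous_owner is None or group.previous_owner not in online_nodes:
        return False
    group.owner = group.previous_owner
    group.previous_owner = None
    return True


accounting = Group(name="Group 1", owner="Node A")
failover(accounting, "Node B")                     # Node A went offline
print(accounting.owner)
print(failback(accounting, {"Node A", "Node B"}))  # Node A is back
print(accounting.owner)
```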

Page 19: Introduction to Windows Cluster

New Cluster Setup Features (1)

Installed by Default: Clustering is installed by default, which reduces administrative overhead and does not require a reboot.

Node Eviction: Node eviction does not require a reboot. This results in increased availability and easier disaster recovery when there is a node failure.

Rolling Upgrades: Allow other nodes in the cluster to keep functioning while a node's OS is upgraded to a newer version.

Queued Changes: The Cluster service can queue up changes that need to be completed if a node is offline.

Simpler Uninstallation: Uninstalling the Cluster service from a node is now a one-step process of evicting the node.

Remote Administration: Remote administration allows full remote creation and configuration of the server cluster.

Page 20: Introduction to Windows Cluster

New Cluster Setup Features (2)

Pre-configuration Analysis: A pre-configuration analysis ensures that any known incompatibilities are detected prior to configuration.

Multi-Node Addition: Installation of the Cluster service now allows multiple nodes to be added to a server cluster in a single operation.

Quorum Selection: The Quorum Resource is automatically placed on the smallest disk that is larger than 50 MB and formatted with NTFS.

Local Quorum: If a node is not attached to a shared disk, a "Local Quorum" resource is configured automatically.
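The Quorum Selection rule stated above (smallest disk that is larger than 50 MB and formatted with NTFS) can be sketched as a simple filter and minimum. The `Disk` type, the function name and the sample disks are hypothetical; the sketch only restates the rule and does not call any Windows API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Disk:
    name: str
    size_mb: int
    filesystem: str


def pick_quorum_disk(disks: list[Disk]) -> Optional[Disk]:
    """Return the smallest NTFS disk larger than 50 MB, or None if none qualifies."""
    candidates = [
        d for d in disks
        if d.size_mb > 50 and d.filesystem.upper() == "NTFS"
    ]
    return min(candidates, key=lambda d: d.size_mb, default=None)


# Hypothetical shared disks.
shared = [
    Disk("Disk Q", 512, "NTFS"),
    Disk("Disk R", 40, "NTFS"),      # too small
    Disk("Disk S", 100, "FAT32"),    # wrong file system
    Disk("Disk T", 2048, "NTFS"),
]
print(pick_quorum_disk(shared))
```

When no disk qualifies the function returns `None`, which loosely corresponds to the case where no suitable shared disk is available.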

Page 21: Introduction to Windows Cluster

Administrative Enhancements

Password Change: In Windows Server 2003, you can change the Cluster service account password without having to take the cluster offline.

Enhanced Node Failover: The Cluster service now includes enhanced logic for group failover when you have a cluster with three or more nodes.

Group Affinity Support: Group Affinity Support allows an application to describe itself as N+I (N active nodes and I “spare” nodes).

WMI Support: WMI allows server clusters to be managed as part of an overall WMI environment.

Resource Deletion: Resources can be deleted in Cluster Administrator or with Cluster.exe without taking the resources offline first.

Page 22: Introduction to Windows Cluster

Supporting and Troubleshooting Enhancements (1)

Software Tracing: Software Tracing is a new method for debugging the Cluster service without loading checked build versions of the DLLs.

Event Log: The use of the Event Log allows event log parsing and management tools to track successful failovers rather than just catastrophic failures.

Clcfgsrv.log: During configuration of the Cluster service, a separate setup log (%SystemRoot%\system32\Logfiles\Cluster\ClCfgSrv.log) is created to assist in troubleshooting.

Chkdsk logging: Logging of Chkdsk utility runs enables easier monitoring and troubleshooting.

Page 23: Introduction to Windows Cluster

Supporting and Troubleshooting Enhancements (2)

Cluster.log (new information): The cluster.log file now adds logging levels (ERR, INFO, WARN) to its entries, making it easier to locate problem sections in the log.

Cluster.obj: The cluster.obj file eliminates the need to open the registry to figure out the friendly name of a resource.

Offline/Failure Reason Codes: The Offline/Failure Reason Codes allow an application to behave differently depending on whether the application itself has failed or one of its dependencies has failed.

Clusdiag: The Cluster diagnostic tool greatly assists in the analysis of cluster logs by capturing the Cluster.log file from each node.

Page 24: Introduction to Windows Cluster

Disaster Recovery Enhancements

• NT-Backup / ASR
• Confdisk and Clusterrecovery

Page 25: Introduction to Windows Cluster

Confdisk

Confdisk.exe is a tool that can be used to recover failed disks in a cluster. Because of the nature of cluster troubleshooting, use Confdisk.exe in conjunction with the Cluster Recovery and Cluster Administrator tools.

Page 26: Introduction to Windows Cluster


Clusterrecovery

Page 27: Introduction to Windows Cluster

Microsoft Windows Server Cluster

Benefits of Microsoft clusters:

• Support for automatic recovery of services in the event of failure of one or more computers within the cluster.
• Provision of data consistency across all nodes in the cluster.
• Standard, cross-platform application programming interface (API) for developing and supporting “cluster-aware” and “cluster-enabled” applications.
• Standard set of clustering services for clusters from many different hardware vendors.
• Increased scalability by allowing new components to be added as system load increases without taking existing cluster services offline.
• Ability to allow administrators to manage a cluster as a single system and to manage applications as if they were running on a single server.
• Improved availability of client/server applications through increased availability of server resources.
• Protection of your investment in both hardware and software: by clustering existing hardware with new computers, you can simply add another computer of equal capacity instead of replacing an existing computer with a new one of twice the capacity.

Page 28: Introduction to Windows Cluster

Additional References

The following Microsoft articles provide information on Cluster, SAN and Disk Management.

• http://technet.microsoft.com/en-us/library/aa996161%28v=exchg.65%29.aspx
• http://blogs.technet.com/b/askcore/archive/2007/11/12/so-what-does-cluster-recovery-actually-recover-anyway.aspx
• http://support.microsoft.com/kb/323437
• 280297: How to Configure Volume Mount Points on a Clustered Server
• 304736: How to Extend the Partition of a Cluster Shared Disk
• 301647: Cluster Service improvements for Storage Area Networks (SANs)

Page 29: Introduction to Windows Cluster

Q & A

Thank you