Module 7: Server Cluster Maintenance and Troubleshooting

Upload: liseli

Post on 11-Jan-2016



TRANSCRIPT

Page 1: Module 7: Server Cluster Maintenance and Troubleshooting

Module 7: Server Cluster Maintenance and Troubleshooting

Page 2: Module 7: Server Cluster Maintenance and Troubleshooting

Overview

Cluster Maintenance

Troubleshooting Cluster Service

Page 3: Module 7: Server Cluster Maintenance and Troubleshooting

Cluster Maintenance

Backup

Restoring the First Node

Restoring Cluster Disks

Restoring the Second Node

Evicting a Node

Page 4: Module 7: Server Cluster Maintenance and Troubleshooting

Backup

Backing Up the System State

Backing Up the Local Disk

Backing Up the Cluster Disk

Page 5: Module 7: Server Cluster Maintenance and Troubleshooting

Restoring the First Node

Steps for Restoring a Server Cluster:

1. Restore the first node

2. Restore the cluster disks

3. Restore the second node

4. Perform node testing

Page 6: Module 7: Server Cluster Maintenance and Troubleshooting

Restoring Cluster Disks

Restoring Disk Signature Files

Restoring the Data on the Cluster Disk

Restoring the Cluster Configuration Files

Page 7: Module 7: Server Cluster Maintenance and Troubleshooting

Restoring the Second Node

Restoring the Remaining Node(s) of a Cluster

Perform Node Testing

Page 8: Module 7: Server Cluster Maintenance and Troubleshooting

Evicting a Node

Steps for Evicting a Node

1. Back up both nodes

2. Verify backup

3. Move all groups to the remaining node

4. Stop Cluster service on the node to be removed

5. Evict the node

6. Unplug the server from the shared bus
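Steps 3 to 5 above can be sketched as an ordered command list. This is a minimal illustration, not the course's own procedure: it assumes the `cluster.exe` command-line tool and `net stop clussvc` are available on the nodes, the function name `eviction_plan` is hypothetical, and the node and group names are taken from the Cluster Administrator slide. The function only builds the commands; it does not run them. Backing up, verifying the backup, and unplugging the server (steps 1, 2, and 6) remain manual.

```python
from typing import List


def eviction_plan(node: str, remaining: str, groups: List[str]) -> List[str]:
    """Return the commands for steps 3-5 in order, without executing them.

    node      -- the node to be evicted (assumed name, e.g. "SERVER2")
    remaining -- the node that stays in the cluster
    groups    -- the cluster groups to move off the node first
    """
    # Step 3: move every group to the remaining node.
    plan = [f'cluster group "{g}" /moveto:{remaining}' for g in groups]
    # Step 4: stop the Cluster service (run this on the node being removed).
    plan.append("net stop clussvc")
    # Step 5: evict the node (run this on the remaining node).
    plan.append(f"cluster node {node} /evict")
    return plan


if __name__ == "__main__":
    for command in eviction_plan("SERVER2", "SERVER1", ["Cluster Group", "Mygroup", "SQL Group"]):
        print(command)
```

Printing the plan first makes it easy to review the order before anything is run against a live cluster.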

Page 9: Module 7: Server Cluster Maintenance and Troubleshooting

Troubleshooting Cluster Service

Troubleshooting Tools

Examining the Cluster Log

Troubleshooting Network Communications

SCSI Configuration Problems

Group and Resource Failures

Quorum Log Corruption

Page 10: Module 7: Server Cluster Maintenance and Troubleshooting

Troubleshooting Tools

Disk Manager

Task Manager

Performance Monitor

Network Monitor

Dr. Watson

Services Snap-in

Page 11: Module 7: Server Cluster Maintenance and Troubleshooting

Examining the Cluster Log

Copy of cluster - Wordpad

Creates a new cluster group

000003b8.000003b4::2000/10/02-19:44:12.946 [CS] Cluster Service started – Cluster Node Vers
000003b8.000003b4::2000/10/02-19:44:12.946 OS Version 5.0.21
000003b8.000002f0::2000/10/02-19:44:12.957 [CS] Service Starting…
000003b8.000002f0::2000/10/02-19:44:13.007 [EP] Initialization…
000003b8.000002f0::2000/10/02-19:44:13.057 [DM]: Initialization
000003b8.000002f0::2000/10/02-19:44:13.097 [DM]: Loading cluster database form D:\WINNT\clu
000003b8.000002f0::2000/10/02-19:44:13.397 [DM] DmpStartFlusher: Entry
000003b8.000002f0::2000/10/02-19:44:13.397 [DM] DmpStartFlusher: thread created
000003b8.000002f0::2000/10/02-19:44:13.427 [NM] Initializing…
000003b8.000002f0::2000/10/02-19:44:13.427 [NM] Local node name = SERVER1.
000003b8.000002f0::2000/10/02-19:44:13.427 [NM] Local node ID = 1.
000003b8.000002f0::2000/10/02-19:44:13.427 [NM] Creating object for node 1 (SERVER1)
000003b8.000002f0::2000/10/02-19:44:13.437 [NM] Initializing networks.
000003b8.000002f0::2000/10/02-19:44:13.447 [NM] Initializing network interfaces.
000003b8.000002f0::2000/10/02-19:44:13.788 [NM] Initializing complete.
000003b8.000002f0::2000/10/02-19:44:13.848 [NM] Starting worker thread…
000003b8.000002f0::2000/10/02-19:44:13.848 [API] Initializing
000003b8.000002f0::2000/10/02-19:44:13.848 [FM] Worker thread running
000003b8.000002f0::2000/10/02-19:44:13.878 [LM] :LMInitialize Entry.
000003b8.000002f0::2000/10/02-19:44:13.878 [LM] :TimerActInitialize Entry.
000003b8.000002f0::2000/10/02-19:44:13.878 [CS] Service Domain Account = [email protected]
::2000/10/02-19:44:13.878 [CS] Initializing RPC server.
000003b8.000002f0::2000/10/02-19:44:14.038 [INIT] Attempting to join cluster MYCLUSTER
000003b8.000002f0::2000/10/02-19:44:14.048 [JOIN] Spawning thread to connect to sponsor 10.
000003b8.000002f0::2000/10/02-19:44:14.048 [JOIN] Spawning thread to connect to sponsor 169

Each log entry consists of:

The IDs of the process and thread issuing the log entry

Timestamp

Event description
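The entry layout described on this slide (process and thread IDs, timestamp, event description) can be picked apart with a short parser. This is a minimal sketch based only on the sample lines shown here: the names `LogEntry` and `parse_line` are hypothetical, and treating the bracketed tag (for example `[NM]`) as a separate component field is an assumption drawn from the sample, not a documented format.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    process_id: int            # hex value before the dot
    thread_id: int             # hex value after the dot
    timestamp: datetime
    component: Optional[str]   # bracketed tag such as "NM", if present
    message: str


# Pattern inferred from the sample lines: pid.tid::date-time [TAG] text
_LINE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]{8})\.(?P<tid>[0-9a-fA-F]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?:\[(?P<comp>[A-Za-z]+)\]\s*:?\s*)?"
    r"(?P<msg>.*)$"
)


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one cluster log line; return None if it does not match."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f")
    return LogEntry(
        process_id=int(m.group("pid"), 16),
        thread_id=int(m.group("tid"), 16),
        timestamp=ts,
        component=m.group("comp"),
        message=m.group("msg"),
    )


if __name__ == "__main__":
    sample = "000003b8.000002f0::2000/10/02-19:44:13.427 [NM] Local node name = SERVER1."
    print(parse_line(sample))
```

Filtering parsed entries by `component` (for example `JOIN` or `NM`) is one way to narrow a long log to the subsystem under investigation.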

Page 12: Module 7: Server Cluster Maintenance and Troubleshooting

Troubleshooting Network Communications

Troubleshooting Node-to-Node Communication

Verify RPC Communications

Verify Cluster Heartbeats

Troubleshooting Client-to-Node Communications

Check NetBT Cache with Nbtstat

Ping IP Address

WINS Static Mappings
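A first check for the node-to-node items above is whether one node can reach another over TCP at all. The sketch below is an illustration under stated assumptions, not part of the course material: the helper name `tcp_port_open` is hypothetical, and using port 135 (the RPC endpoint mapper) as the target is an assumption about where to probe. A successful connection shows only that the port is reachable; it does not prove that RPC calls or cluster heartbeats are working.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # SERVER2 is the node name from the slides; 135 is the RPC endpoint mapper port.
    print(tcp_port_open("SERVER2", 135))
```

If the name fails but the node's IP address succeeds, the problem is more likely name resolution than connectivity, which is where the Nbtstat and WINS checks come in.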

Page 13: Module 7: Server Cluster Maintenance and Troubleshooting

SCSI Configuration Problems

SCSI Controllers

SCSI Termination

SCSI Cabling

Page 14: Module 7: Server Cluster Maintenance and Troubleshooting

Group and Resource Failures

Cluster Administrator – [MYCLUSTER (MYCLUSTER)]

MYCLUSTER
  Groups
    Cluster Group
    Mygroup
    SQL Group
  Resources
  Cluster Configuration
  SERVER1
  SERVER2

Name                State    Owner     Reso
Cluster IP Address  Online   SERVER2   IP Ad
Cluster Name        Online   SERVER2   Netw
Disk W:             Online   SERVER2   Physi
Printer Spooler     Online   SERVER2   Print
Public              Failed   SERVER2   File S

Page 15: Module 7: Server Cluster Maintenance and Troubleshooting

Quorum Log Corruption

Reset the Quorum Log

Clussvc -debug -resetquorumlog

Delete the Quorum Log

-noquorumlogging

Page 16: Module 7: Server Cluster Maintenance and Troubleshooting

Lab A: Cluster Maintenance

Page 17: Module 7: Server Cluster Maintenance and Troubleshooting

Review

Cluster Maintenance

Troubleshooting Cluster Service