  • IBM Spectrum Scale Version 4 Release 2.2

    Problem Determination Guide

    GA76-0443-20

    IBM

  • Note: Before using this information and the product it supports, read the information in "Notices" on page 599.

    This edition applies to version 4 release 2 modification 2 of the following products, and to all subsequent releases and modifications until otherwise indicated in new editions:
    - IBM Spectrum Scale ordered through Passport Advantage (product number 5725-Q01)
    - IBM Spectrum Scale ordered through AAS/eConfig (product number 5641-GPF)
    - IBM Spectrum Scale for Linux on z Systems (product number 5725-S28)

    Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the change.

    IBM welcomes your comments; see the topic "How to send your comments" on page xiv. When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

    Copyright IBM Corporation 2014, 2017.
    US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

  • Contents

    Tables . . . ix

    About this information . . . xi
    Prerequisite and related information . . . xiii
    Conventions used in this information . . . xiii
    How to send your comments . . . xiv

    Summary of changes . . . xv

    Chapter 1. Performance monitoring . . . 1
    Network performance monitoring . . . 1
    Monitoring GPFS I/O performance with the mmpmon command . . . 3

    Overview of mmpmon . . . 3
    Specifying input to the mmpmon command . . . 3
    Display I/O statistics per mounted file system . . . 5
    Display I/O statistics for the entire node . . . 6
    Understanding the node list facility . . . 8
    Reset statistics to zero . . . 15
    Understanding the request histogram facility . . . 16
    Understanding the Remote Procedure Call (RPC) facility . . . 28
    Displaying mmpmon version . . . 31
    Example mmpmon scenarios and how to analyze and interpret their results . . . 31
    Other information about mmpmon output . . . 40

    Performance monitoring tool overview . . . 41
    Configuring the performance monitoring tool . . . 43
    Restarting the performance monitoring tool . . . 67
    Configuring the metrics to collect performance data . . . 67
    Viewing and analyzing the performance data . . . 67

    Performance monitoring using IBM Spectrum Scale GUI . . . 76

    Configuring performance monitoring options in GUI . . . 78
    Configuring performance metrics and display options in the Statistics page of the GUI . . . 79
    Configuring the dashboard to view performance charts . . . 82
    Querying performance data shown in the GUI through CLI . . . 84
    Monitoring performance of nodes . . . 84
    Monitoring performance of file systems . . . 85
    Monitoring performance of NSDs . . . 86

    Performance monitoring limitations . . . 87

    Chapter 2. Monitoring system health using IBM Spectrum Scale GUI . . . 89
    Monitoring events using GUI . . . 89
    Set up event notifications . . . 90

    Configuring email notifications . . . 90
    Configuring SNMP manager . . . 91

    Chapter 3. Monitoring system health by using the mmhealth command . . . 97

    Chapter 4. Monitoring events through callbacks . . . 107

    Chapter 5. Monitoring capacity through GUI . . . 109

    Chapter 6. Monitoring AFM and AFM DR . . . 111
    Fileset states for AFM . . . 111
    Fileset states for AFM DR . . . 114
    Callback events for AFM and AFM DR . . . 118
    Monitoring with mmpmon . . . 119
    Monitor prefetch . . . 119
    Policies used for monitoring AFM and AFM DR . . . 120

    Chapter 7. GPFS SNMP support . . . 123
    Installing Net-SNMP . . . 123
    Configuring Net-SNMP . . . 124
    Configuring management applications . . . 124
    Installing MIB files on the collector node and management node . . . 125
    Collector node administration . . . 125
    Starting and stopping the SNMP subagent . . . 126
    The management and monitoring subagent . . . 126

    SNMP object IDs . . . 127
    MIB objects . . . 127
    Cluster status information . . . 127
    Cluster configuration information . . . 127
    Node status information . . . 128
    Node configuration information . . . 128
    File system status information . . . 129
    File system performance information . . . 130
    Storage pool information . . . 130
    Disk status information . . . 131
    Disk configuration information . . . 131
    Disk performance information . . . 132
    Net-SNMP traps . . . 132

    Chapter 8. Monitoring the IBM Spectrum Scale system by using call home . . . 135
    Understanding call home . . . 135
    Configuring call home to collect details . . . 138
    Sharing collected details with IBM Support . . . 143
    Use case . . . 144

  • Chapter 9. Monitoring the health of cloud services . . . 147

    Chapter 10. Best practices for troubleshooting . . . 149
    How to get started with troubleshooting . . . 149
    Set up event notifications . . . 149
    Back up your data . . . 150
    Resolve events in a timely manner . . . 151
    Keep your software up to date . . . 151
    Subscribe to the support notification . . . 151
    Know your IBM warranty and maintenance agreement details . . . 151
    Know how to report a problem . . . 151
    Other problem determination hints and tips . . . 152

    Which physical disk is associated with a logical volume in AIX systems? . . . 153
    Which nodes in my cluster are quorum nodes? . . . 153
    What is stored in the /tmp/mmfs directory and why does it sometimes disappear? . . . 154
    Why does my system load increase significantly during the night? . . . 154
    What do I do if I receive message 6027-648? . . . 154
    Why can't I see my newly mounted Windows file system? . . . 154
    Why is the file system mounted on the wrong drive letter? . . . 154
    Why does the offline mmfsck command fail with "Error creating internal storage"? . . . 155
    Why do I get timeout executing function error message? . . . 155
    Questions related to active file management . . . 155

    Chapter 11. Understanding the system limitations . . . 157

    Chapter 12. Collecting details of the issues . . . 159
    Collecting details of issues by using logs, dumps, and traces . . . 159

    Time stamp in GPFS log entries . . . 159
    Logs . . . 160
    Setting up core dumps on a client system . . . 181
    Configuration changes required on protocol nodes to collect core dump data . . . 182
    Trace facility . . . 183

    Collecting diagnostic data through GUI . . . 196
    CLI commands for collecting issue details . . . 197

    Using the gpfs.snap command . . . 197
    mmdumpperfdata command . . . 207
    mmfsadm command . . . 209
    Commands for GPFS cluster state information . . . 210
    GPFS file system and disk information commands . . . 214

    Collecting details of the issues from performance monitoring tools . . . 228
    Other problem determination tools . . . 229

    Chapter 13. Managing deadlocks . . . 231
    Debug data for deadlocks . . . 231
    Automated deadlock detection . . . 232
    Automated deadlock data collection . . . 233
    Automated deadlock breakup . . . 234
    Deadlock breakup on demand . . . 235

    Chapter 14. Installation and configuration issues . . . 237
    Resolving most frequent problems related to installation, deployment, and upgrade . . . 237

    Finding deployment related error messages more easily and using them for failure analysis . . . 237
    Problems due to missing prerequisites . . . 242
    Problems due to mixed operating system levels in the cluster . . . 245
    Problems due to using the spectrumscale installation toolkit for functions or configurations not supported . . . 246
    Understanding supported upgrade functions with spectrumscale installation toolkit . . . 249

    Post installation and configuration problems . . . 250
    Cluster is crashed after reinstallation . . . 250
    Node cannot be added to the GPFS cluster . . . 251
    Problems with the /etc/hosts file . . . 251
    Linux configuration considerations . . . 251
    Problems with running commands on other nodes . . . 252

    Authorization problems . . . 252
    Connectivity problems . . . 253
    GPFS error messages for rsh problems . . . 253

    Cluster configuration data file issues . . . 253
    GPFS cluster configuration data file issues . . . 253
    GPFS error messages for cluster configuration data file problems . . . 254
    Recovery from loss of GPFS cluster configuration data file . . . 254
    Automatic backup of the GPFS cluster data . . . 255

    GPFS application calls . . . 255
    Error numbers specific to GPFS applications calls . . . 255

    GPFS modules cannot be loaded on Linux . . . 256
    GPFS daemon issues . . . 256

    GPFS daemon will not come up . . . . . . 25