
White Paper

EMC® VPLEX™ WITH IBM® DB2® PURESCALE™

Abstract

This white paper provides best practices, planning, and use cases for implementing EMC VPLEX with the IBM DB2 pureScale feature.

March 2015


Copyright © 2015 EMC Corporation. All rights reserved. Published in the USA.

Published March 2015

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

EMC2, EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.


Part Number h14034


Contents

 


Chapter 1  Overview
    Executive Summary
    Audience
    What Is EMC VPLEX?
    What Is IBM DB2 pureScale?
    VPLEX Distributed Volumes and DB2 pureScale
    Validated Configuration

Chapter 2  Configuring VPLEX for use with DB2 pureScale
    Site Requirements
    Host Configuration
    SAN Boot Considerations
    Configuring IBM AIX to Use VPLEX Volumes
        ODM fileset installation
    VPLEX Provisioning
        Steps to Provision with VPLEX
    Additional VPLEX Documentation
    Best Practices for Deploying DB2 pureScale with VPLEX
    DB2 pureScale Setup with VPLEX
        Location of Cluster Nodes
        Quorum Setup
        GPFS Setup

Chapter 3  Site Preferences with VPLEX Distributed Virtual Volumes
    Detach Rules
    VPLEX Witness

Chapter 4  Failure Scenarios
    Terminology
    Conclusion

Appendix A  References
    References


Figures
    Figure 1. Typical DB2 pureScale Cluster
    Figure 2. Stretched DB2 pureScale Cluster with VPLEX Metro
    Figure 3. Provision from Storage Volume Wizard
    Figure 4. VPLEX Witness Configuration

Tables
    Table 1. Failure Scenarios and Impacts


Chapter 1 Overview

This chapter presents the following topics:

Executive Summary

Audience

What Is EMC VPLEX?

What Is IBM DB2 pureScale?

VPLEX Distributed Volumes and DB2 pureScale

Validated Configuration


Executive Summary

This white paper provides information about using EMC VPLEX Metro distributed virtual volumes with IBM DB2 pureScale clusters running on the IBM AIX operating system. Topics discussed include configuration, failure scenarios, and an in-depth description of VPLEX distributed virtual volumes.

Audience

This white paper is intended for EMC field personnel, partners, and customers who configure, install, and support VPLEX. Readers should have a working knowledge of the following:

SAN technology and network design

Fibre Channel block storage concepts

VPLEX concepts and components

IBM DB2 pureScale

The AIX operating system

PowerVM virtualization

What Is EMC VPLEX?

EMC VPLEX is a federation solution that can be stretched across two geographically dispersed data centers separated by synchronous distances with a maximum round trip latency of 5 milliseconds. It provides simultaneous access to storage devices at two sites through creation of VPLEX distributed virtual volumes, supported on each side by a VPLEX Cluster.

Distributed virtual volumes are synchronized copies of data (RAID1 mirrors), exposed through two geographically separated VPLEX Clusters.

Each VPLEX Cluster is itself highly available, scaling from two directors per VPLEX Cluster up to eight directors per VPLEX Cluster. Furthermore, each director is supported by independent power supplies, fans, and interconnects. Each VPLEX Cluster has no single point of failure.

What Is IBM DB2 pureScale?

IBM DB2 pureScale™ is a feature that clusters the IBM DB2 RDBMS in an active-active, shared-disk database implementation based on the DB2 for z/OS data-sharing architecture.


Figure 1. Typical DB2 pureScale Cluster

A typical DB2 pureScale cluster consists of:

Two or more DB2 pureScale members

Two cluster caching facilities (CFs), also referred to as PowerHA pureScale servers

SAN-attached cluster storage running IBM® General Parallel File System (GPFS™)

A high-speed, low-latency cluster interconnect such as InfiniBand (IB).

Each DB2 member represents a DB2 processing engine. The members cooperate with each other and with the PowerHA pureScale servers to provide coherent access to the database from any member. The PowerHA pureScale caching facilities (also referred to as CFs) provide a scalable and centralized locking mechanism to ensure data coherency. The caching facilities also act as a fast cache for DB2 pages.


VPLEX Distributed Volumes and DB2 pureScale

EMC VPLEX distributed devices transparently provide two-site data availability, removing the need for GPFS synchronous replication. Placing pureScale data on VPLEX distributed devices allows a regular pureScale cluster to be extended geographically without the Geographically Dispersed DB2 pureScale Cluster (GDPC) feature, so there is no need to license, install, and maintain that additional feature.

Validated Configuration

Figure 2 illustrates a configuration validated by EMC using DB2 pureScale with a VPLEX Metro deployment.

Figure 2. Stretched DB2 pureScale Cluster with VPLEX Metro


Chapter 2 Configuring VPLEX for use with DB2 pureScale

This chapter presents the following topics:

Site Requirements

Host Configuration

SAN Boot Considerations

Configuring IBM AIX to Use VPLEX Volumes

VPLEX Provisioning

Additional VPLEX Documentation

Best Practices for Deploying DB2 pureScale with VPLEX

DB2 pureScale Setup with VPLEX


Site Requirements

The maximum round-trip latency on the Fibre Channel network between the two sites must not exceed 5 ms. This Fibre Channel network carries the inter-cluster links connecting the two VPLEX Clusters in a VPLEX Metro.

In addition to the host cluster requirements, the maximum round-trip latency on the IP network between the two sites should not exceed 5 ms. The IP network supports the hosts and the VPLEX Management Console.

All latency and connectivity requirements specified for DB2 pureScale still apply, particularly for RDMA.

The use of VPLEX Witness is required to meet HA standards for Continuous Data Availability.

Host Configuration

Specific requirements:

AIX native MPIO is the only supported multipathing solution. EMC PowerPath and Veritas DMP are not supported.

VPLEX ODM fileset version must be 6.0.0.5 or higher.

Both standalone hosts and PowerVM LPARs are supported.

NPIV HBA virtualization is supported.

For full details, consult the latest EMC® Host Connectivity Guide for IBM AIX (P/N 300-000-608).

Note: Configuration of PowerVM is beyond the scope of this document; see IBM’s PowerVM documentation for details.

SAN Boot Considerations

A preliminary configuration decision is whether to use SAN boot or local boot; that is, whether the root volume group (rootvg) is on SAN-attached or internal storage. Having rootvg on a VPLEX local or distributed device is supported for most use cases. However, for DB2 pureScale implementations it is recommended to have rootvg on an internal SCSI or vSCSI device. By having the rootvg on an internal device, the host can continue operation during an all-paths-down event, since GPFS will route I/O requests through other hosts in the GPFS cluster used by pureScale. If a host is booted from SAN, an all-paths-down event will take the system offline.

Configuring IBM AIX to Use VPLEX Volumes


VPLEX presents its volumes to the host with the device type "Invista." Therefore, to be consistent with the example outputs in this section, the terms "Invista devices" or "Invista" are used when referring to devices that VPLEX presents to the hosts.

ODM fileset installation

You must configure each IBM AIX host to recognize VPLEX virtual volumes. To do this, install the VPLEX ODM filesets by completing the following steps:

1. Log in to the host as root.

2. Remove all devices from the host configuration.

3. Obtain the ODM package from the EMC-FTP server:

ftp ftp.emc.com

cd /pub/elab/aix/ODM_DEFINITIONS

bin

get EMC.AIX.6.0.0.5.tar.Z

Note: Use the most recent fileset available. The ODM readme file lists the supported versions of AIX.

4. In the /tmp directory, uncompress the tar package:

uncompress EMC.AIX.6.0.0.5.tar.Z

5. Extract the tar package:

tar -xvf EMC.AIX.6.0.0.5.tar

6. Perform the remaining steps using either the command line or SMIT:

To use the command line:

installp -ac -gX -d . EMC.Invista.aix.rte

installp -ac -gX -d . EMC.Invista.fcp.MPIO.rte

Once completed, verify that the installation summary reports SUCCESS.

Note: The period between -d and EMC.Invista specifies the current directory.

To use SMIT, type:

smitty installp

a. Select Install and Update from Latest Available Software.

b. Select the input device to be the current working directory (./).

c. Select F4=List to list the available software.



d. Select F7=Select to select

EMC INVISTA AIX Support Software and

EMC INVISTA FCP MPIO Support Software

Note: Do not attempt to install the “EMC INVISTA FCP Support Software” package

e. Select Enter.

f. Accept the default options and press Enter to initiate the installation.

g. Scroll to the bottom of the installation summary screen to verify that the SUCCESS message is displayed.

h. Exit SMIT.
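As an optional check before configuring devices (not part of the original procedure), you can confirm the installed ODM fileset level with the lslpp command. The output should show level 6.0.0.5 or higher and a state of COMMITTED or APPLIED:

# lslpp -l EMC.Invista.aix.rte EMC.Invista.fcp.MPIO.rte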

7. Configure the Disks

# cfgmgr

8. Verify that all hdisks have been configured as Invista devices, with the expected paths:

# lsdev -Cc disk

hdisk0 Available 25-T1-01 EMC INVISTA FCP MPIO Disk

hdisk1 Available 25-T1-01 EMC INVISTA FCP MPIO Disk

hdisk2 Available 25-T1-01 EMC INVISTA FCP MPIO Disk

hdisk3 Available 25-T1-01 EMC INVISTA FCP MPIO Disk

hdisk4 Available 25-T1-01 EMC INVISTA FCP MPIO Disk

hdisk5 Available 25-T1-01 EMC INVISTA FCP MPIO Disk

hdisk6 Available 25-T1-01 EMC INVISTA FCP MPIO Disk

hdisk7 Available 25-T1-01 EMC INVISTA FCP MPIO Disk

# lspath -l hdisk0

Enabled hdisk0 fscsi0

Enabled hdisk0 fscsi0

Enabled hdisk0 fscsi1

Enabled hdisk0 fscsi1

Enabled hdisk0 fscsi1

Enabled hdisk0 fscsi1

9. Verify that the Fibre Channel fscsi devices have dynamic tracking enabled and FC Fabric Event Error RECOVERY Policy set to fast fail:

# lsattr -El fscsi0

dyntrk       yes        Dynamic Tracking of FC Devices         True

fc_err_recov fast_fail  FC Fabric Event Error RECOVERY Policy  True

If the parameters are not set, use the chdev command to set them correctly. This will require a system reboot.


# chdev -l fscsiN -a fc_err_recov=fast_fail -P

# chdev -l fscsiN -a dyntrk=yes -P

where N is the number of the controller (0, 1, 2, and so on), and the -P parameter makes the change take effect at the next reboot.
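If multiple fscsi devices are present, both attributes can be set on each of them in a single pass. The following loop is an illustrative sketch that assumes the standard fscsiN device naming; as above, the -P flag defers the change until the next reboot:

# for ADAPTER in $(lsdev -C -F name | grep "^fscsi"); do
>   chdev -l $ADAPTER -a dyntrk=yes -a fc_err_recov=fast_fail -P
> done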

VPLEX Provisioning

The Provision from Storage Volumes wizard shown in Figure 3 allows you to provision a virtual volume directly from a storage volume or preserve data on an existing storage volume that you want to expose to hosts. The wizard simplifies the provisioning process by automatically claiming storage (if not already claimed) and creating all of the underlying storage objects (extents and devices), and then creating a local or distributed virtual volume that is the total capacity of the selected storage volume. When provisioning from storage volumes, you can create only one virtual volume at a time. Each virtual volume created maps to a storage volume on the array. You can provision from storage volumes using integrated or non-integrated storage arrays. Refer to Provisioning Overview for more information on integrated and non-integrated storage.

Figure 3. Provision from Storage Volume Wizard

Steps to Provision with VPLEX

Creating a new virtual volume is as simple as six steps:

1. Select or create consistency group(s) for the volume(s)

2. Select protection or mirroring options (optional)



3. Select storage volume(s) to use to create the virtual volume(s)

4. Export the virtual volume(s) to host(s) by selecting or creating a storage view

5. Review your selections and submit the provisioning request

6. View Results from finished provisioning job(s)

Additional VPLEX Documentation

See the VPLEX Procedure Generator for information on array configuration and best practices, and for provisioning failure and troubleshooting information.

See the Unisphere for VPLEX Online Help for information on using the GUI to provision storage.

See the VPLEX CLI Guide for information on provisioning related commands and their usage.

See the EMC Simplified Support Matrix for VPLEX for information on the supported arrays, and Array Management Providers (AMPs).

See Chapter 3, Site Preferences with VPLEX Distributed Virtual Volumes, for best practices on distributed virtual volumes with VPLEX.

Best Practices for Deploying DB2 pureScale with VPLEX

The following additional points should be noted when provisioning VPLEX storage for DB2 pureScale use:

Register all host initiator-ports in VPLEX to use the “aix” host-type.

Distributed Storage used for DB2 pureScale must be provisioned to all hosts in the DB2 pureScale cluster.

All distributed storage used for DB2 pureScale should be placed in consistency groups; membership in a consistency group is required for proper VPLEX Witness functionality.

To simplify administration, it is recommended that all VPLEX distributed storage used for DB2 pureScale data be in the same consistency group.

If a VPLEX distributed device is used as the tiebreaker quorum device, it should be in the same consistency group as the data devices.

If Majority Node Set quorum is used, the site with the larger number of nodes should be defined as the preferred site for the VPLEX distributed virtual volumes or consistency groups used by the pureScale cluster.

Note: See the next section for more information on quorum.


DB2 pureScale Setup with VPLEX

Location of Cluster Nodes

For maximum availability, the component nodes in a cluster must be distributed as follows:

One CF and at least one member server must be at VPLEX Cluster-1 site

One CF and at least one member server must be at VPLEX Cluster-2 site

Quorum Setup

Host clusters have mechanisms for establishing a quorum of hosts during a cluster partition to prevent split-brain; DB2 pureScale has the user-selectable choice of tiebreaker disk or majority node set. Both have been tested successfully with VPLEX; the type appropriate for the customer's specific setup should be determined in consultation with IBM support.

If majority node set quorum is used, the VPLEX preferred site for the pureScale distributed devices should be the site with the greater number of DB2 pureScale hosts. That is, if the site served by VPLEX cluster-1 has 3 total pureScale nodes and the site served by cluster-2 has 2 pureScale nodes, cluster-1 should be the preferred site for the VPLEX consistency group with the pureScale data devices.

If a tiebreaker disk is used, the tiebreaker disk should be a VPLEX distributed device in the same consistency group as the data devices.
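Where a tiebreaker disk is chosen, it is typically assigned with the db2cluster command. The following sketch is illustrative only; the hdisk name is a placeholder, and the exact procedure should be confirmed against IBM's DB2 pureScale documentation:

# db2cluster -cm -set -tiebreaker -disk /dev/hdiskN

# db2cluster -cm -list -tiebreaker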

GPFS Setup

For better control and insight into the configuration process, it is recommended to set up the GPFS cluster first, independently of the DB2 pureScale installation. Once the GPFS file system setup is complete, the pureScale instance can be installed as documented by IBM for a "User Managed File System."

After VPLEX provisioning and host configuration is complete, VPLEX volumes will appear as hdisks on each host. At this point, GPFS setup is the same as for any other storage device.

The key steps required at this stage are:

Creating the GPFS NSDs (Network Shared Disks) for all hosts in the cluster acting as servers for the NSD.

Enabling SCSI-3 persistent reserve support on the NSDs



Full details can be found in the GPFS Administration and Programming Reference or in the online IBM Knowledge Center for GPFS. In summary, the following steps are needed:

a. Create the GPFS cluster with the mmcrcluster command and a node file. For example, the following node file will create a five-node GPFS cluster:

Sample nodefile.txt:

dsveh130.lss.emc.com:quorum

dsveh156.lss.emc.com:quorum

dsveh157.lss.emc.com

dsveh195.lss.emc.com

dsveh150.lss.emc.com

# mmcrcluster -N nodefile.txt
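As an optional check (not part of the original procedure), the resulting cluster definition, including node roles and quorum designations, can be reviewed with the mmlscluster command:

# /usr/lpp/mmfs/bin/mmlscluster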

b. Define the VPLEX devices as NSDs (Network Shared Disk) with the mmcrnsd command and a stanza file.

For example, the following stanza file will create an NSD called nsd_data2 on hdisk6. This will format the hdisks for GPFS use.

Note: All hosts in the cluster will act as servers.

Sample stanza file nsd02_stanza

%nsd: device=/dev/hdisk6

nsd=nsd_data2

servers=dsveh130,dsveh195,dsveh156,dsveh157

usage=dataAndMetadata

# mmcrnsd -F nsd02_stanza
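As a further optional check, the NSD-to-device mapping can be confirmed before the file system is created; mmlsnsd -m is a standard GPFS command and is shown here for illustration:

# /usr/lpp/mmfs/bin/mmlsnsd -m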

c. Create the file system

This example will create a GPFS file system at mount point /gpfs2 using the stanza from the previous step:

# mmcrfs -F nsd02_stanza -T /gpfs2

d. Enable SCSI-3 Persistent Reserve

Create the /var/mmfs/etc/prcapdevices file on each host in the DB2 pureScale environment with the following line; if the file already exists, add the line to it.

EMC:Invista:5400


e. Stop the GPFS cluster with the mmshutdown -a command. Execute the following command to enable persistent reserve:

/usr/lpp/mmfs/bin/mmchconfig usePersistentReserve=yes

f. Restart the GPFS cluster with the mmstartup -a command, and verify that the devices show pr=yes by using the mmlsnsd command:

# mmlsnsd -X

Disk name   NSD volume ID      Device        Devtype  Node name  Remarks
--------------------------------------------------------------------------
nsd_data2   0A6C478254007F91   /dev/hdisk6   hdisk    dsveh130   server node,pr=yes
nsd_data2   0A6C478254007F91   /dev/hdisk6   hdisk    dsveh150   server node,pr=yes
nsd_data2   0A6C478254007F91   /dev/hdisk6   hdisk    dsveh156   server node,pr=yes
nsd_data2   0A6C478254007F91   /dev/hdisk6   hdisk    dsveh157   server node,pr=yes
nsd_data2   0A6C478254007F91   /dev/hdisk6   hdisk    dsveh195   server node,pr=yes
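Before proceeding to the next step, the GPFS file system can be mounted on all nodes and its mount state confirmed. These are standard GPFS administration commands, shown here as an illustrative check rather than as part of the original procedure:

# mmmount /gpfs2 -a

# mmlsmount all -L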

g. Proceed with DB2 pureScale installation and instance creation as documented by IBM.


Chapter 3 Site Preferences with VPLEX Distributed Virtual Volumes

This chapter presents the following topics:

Detach Rules

VPLEX Witness


Detach Rules

For each distributed virtual volume, VPLEX defines a detach rule. When there is a communication failure between the two clusters in VPLEX Metro, this detach rule identifies which VPLEX Cluster in a VPLEX Metro should detach its mirror leg, thereby allowing service to continue. The detach rule effectively defines a preferred site if VPLEX Clusters lose communication with each other. The purpose of having a defined preferred site is to ensure that there is no possibility of a “split brain” caused by both VPLEX Clusters continuing to allow I/O during communication failure.

After a complete communication failure between the two VPLEX Clusters, the preferred site continues to provide service to the distributed virtual volume. The other VPLEX Cluster will suspend I/O service to the volume and is referred to as the non-preferred site. The detach rule is at the distributed virtual volume level and hence any given site could be the preferred site for some distributed virtual volume and the non-preferred site for others. A VPLEX Metro instance can support several thousand distributed virtual volumes (see the EMC VPLEX with GeoSynchrony and Point Releases Release Notes for current limits), and each such volume has its own detach rule. It is therefore possible for the same VPLEX Cluster (and therefore the hosts connected to it) to be on the preferred site with respect to one distributed virtual volume but to be on the non-preferred site with respect to another distributed virtual volume.

It is a best practice to configure the host cluster disk resource definitions and the VPLEX detach rules in parallel: the highest-priority (or owning) host cluster node for a host cluster disk resource should be at the same site as the VPLEX preferred site for the distributed virtual volume(s) in that disk resource. For example, if the preferred (owning) host for a host cluster disk resource is located at Site 1, all the distributed virtual volumes defined in that cluster disk resource should have their detach rules set so that Site 1 is the preferred site. Failure to follow this best practice will increase the number of situations in which manual intervention is required to restore data availability after a system outage.

There are two conditions that can cause the VPLEX Clusters to lose communication:

Total VPLEX Cluster failure at one site (failure of all directors in a VPLEX Cluster): A complete VPLEX Cluster failure triggers the detach rule behaviors since the surviving VPLEX Cluster does not have the ability to distinguish between interlink communication loss and VPLEX Cluster failure.

As a result, distributed virtual volumes whose preferred site is the surviving VPLEX Cluster will continue to service I/O without interruption. The distributed virtual volumes, whose preferred site is the failed VPLEX Cluster site, will enter into I/O suspension until manual intervention is performed. That is, all I/O activity to the virtual volume will be suspended by the VPLEX Cluster.


Failure of the inter-cluster communication links (VPLEX Cluster partition): The VPLEX Cluster partition case will also trigger the execution of the detach rule. Each distributed virtual volume will allow I/O to continue on its preferred site and suspend I/O on its non-preferred site.

When the VPLEX Cluster failure or VPLEX Cluster partition condition is resolved, the VPLEX Metro distributed virtual volume is re-established, enabling I/O at both VPLEX Metro sites.

VPLEX Witness

Starting in GeoSynchrony 5.0, VPLEX Witness helps multi-cluster VPLEX configurations automate the response to cluster failures and inter-cluster link outages. VPLEX Witness is a component installed as a VM on a customer host. The customer host must be deployed in a separate failure domain from either VPLEX cluster to eliminate the possibility of a single fault affecting both a cluster and VPLEX Witness.

VPLEX Witness connects to both VPLEX clusters over the management IP network. VPLEX Witness observes the state of the clusters, and thus can distinguish between an outage of the inter-cluster link and a cluster failure. VPLEX Witness uses this information to guide the clusters to either resume or suspend I/O.

Note: VPLEX Witness works in conjunction with consistency groups. VPLEX Witness guidance does not apply to local volumes or to distributed volumes that are not members of a consistency group. In Metro systems, VPLEX Witness provides seamless, zero-RTO failover for storage volumes in synchronous consistency groups.

Figure 4. VPLEX Witness Configuration


Chapter 4 Failure Scenarios

This chapter presents the following topics:

Terminology

Conclusion


Terminology

In Table 1, the following terminology is used:

Hosts running in the preferred site refers to hosts running on the preferred site for the Metro distributed virtual volume supporting the cluster disk resource for those hosts.

Hosts running in the non-preferred site refers to hosts running on the non-preferred site for the Metro distributed virtual volume supporting the cluster disk resource for those hosts.

All of these scenarios assume that each host in the cluster has been configured with supported multipathing and cluster software with the required settings for failover according to documented high-availability requirements.

Table 1. Failure Scenarios and Impacts

Scenario: Single VPLEX back-end (BE) path failure
VPLEX behavior: VPLEX will switch to alternate paths to the same BE array and continue to provide access to the Metro distributed virtual volumes exposed to the hosts.
pureScale cluster impact: None.

Scenario: Single VPLEX front-end (FE) path failure
VPLEX behavior: VPLEX will continue to provide access to the Metro distributed virtual volume through alternate paths to the same VPLEX Cluster from the cluster host. The cluster host multipathing software is expected to fail over to the alternate paths.
pureScale cluster impact: None.

Scenario: BE array failure (preferred site for a Metro distributed virtual volume)
VPLEX behavior: VPLEX will continue to provide access to the Metro distributed virtual volume through the non-preferred site BE array. When access to the array is restored, the storage volumes from the preferred site BE array will be resynchronized automatically.
pureScale cluster impact: None.

Scenario: BE array failure (non-preferred site for a Metro distributed virtual volume)
VPLEX behavior: VPLEX will continue to provide access to the Metro distributed virtual volume using the preferred site BE array. When access to the array is restored, the storage volumes from the non-preferred site BE array will be resynchronized automatically.
pureScale cluster impact: None.

Scenario: VPLEX director failure
VPLEX behavior: VPLEX will continue to provide access to the Metro distributed virtual volume through front-end paths available through other directors on the same VPLEX Cluster.
pureScale cluster impact: None.

Scenario: Complete VPLEX site failure at the preferred site, without VPLEX Witness
VPLEX behavior: VPLEX will suspend I/O on the Metro distributed virtual volume at the non-preferred site. Once the administrator determines that the site has failed, and that this is not a case of inter-site communication failure, the volumes on the non-preferred site can be unsuspended ("resumed") using the choose-winner command. This process is intentionally manual: automated resumption of I/O would be safe in the site-failure case, but not in the VPLEX Cluster partition case. Warning: Automatically unsuspending the non-preferred site would allow both sites to become simultaneously read-writable, creating a potential split-brain condition.
pureScale cluster impact: This scenario results in cluster-wide data unavailability, requiring manual intervention to resolve. Hosts running in the preferred site: I/O will fail. Hosts running in the non-preferred site: I/O will fail until the administrator resumes access. GPFS and pureScale must be restarted after access is restored at the non-preferred site.

Scenario: Complete VPLEX site failure at the preferred site, with VPLEX Witness
VPLEX behavior: With a Witness implementation, VPLEX will not suspend I/O on the Metro distributed virtual volumes at the non-preferred site. All I/O continues at the non-preferred site.
pureScale cluster impact: Hosts running in the preferred site: FC paths will fail; I/O continues via GPFS NSD servers at the non-preferred site. Hosts running in the non-preferred site: Normal I/O continues without interruption.

Scenario: Complete VPLEX site failure at the non-preferred site
VPLEX behavior: VPLEX will continue to provide I/O access at the preferred site.
pureScale cluster impact: Hosts running in the preferred site: No impact. Hosts running in the non-preferred site: I/O continues via GPFS NSD servers at the preferred site.

Scenario: Single cluster host and a VPLEX director failure at the same site
VPLEX behavior: The surviving VPLEX directors on the VPLEX Cluster with the failed director will continue to provide access to the Metro distributed virtual volumes.
pureScale cluster impact: The surviving hosts will lose a path, but I/O will continue.

Scenario: Single director and back-end path failure at the same site
VPLEX behavior: The surviving VPLEX directors on the VPLEX Cluster with the failed director will continue to provide access to the virtual volumes. VPLEX will switch to alternate paths (if available) to the same back end and continue to provide access to the Metro distributed virtual volumes.
pureScale cluster impact: The surviving hosts will lose a path, but I/O will continue. Path loss will be logged by errpt on the affected hosts.

Scenario: Cluster host all paths down (the cluster host loses access to its storage volumes, that is, VPLEX volumes in this case)
VPLEX behavior: None.
pureScale cluster impact: I/O continues on all hosts; on the host with all paths down, I/O is routed through GPFS NSD servers on hosts that still have connectivity.

Scenario: VPLEX inter-site link failure, host cluster heartbeat network intact
VPLEX behavior: VPLEX will transition distributed virtual volumes at the non-preferred site to the I/O suspension state. At the preferred site, the distributed virtual volumes will continue to provide access. In this case, I/O at the non-preferred site should not be manually unsuspended: because both VPLEX Clusters survive, the preferred site continues to allow I/O, and unsuspending the non-preferred site would make the same distributed virtual volume read-writable on both legs, creating a potential split-brain condition. When the inter-site links are restored, the distributed virtual volume is unsuspended at the non-preferred site.
pureScale cluster impact: Hosts running in the preferred site: No impact. Hosts running in the non-preferred site: I/O continues via GPFS NSD servers at the preferred site.

Scenario: Complete dual-site failure
VPLEX behavior: Upon power-on of a single VPLEX Cluster, VPLEX will intentionally keep all distributed virtual volumes in the suspended state, even at the preferred site, until it is able to reconnect to the other site or until the administrator manually resumes I/O on these volumes using the device resume-link-down command. This behavior accounts for the possibility that I/O has continued at the other site (either automatically, if the other site was preferred, or manually, if it was non-preferred) and thereby protects against data corruption.
pureScale cluster impact: VPLEX storage returns to a normal running state when both sites are up. The DB2 cluster must be restarted manually, for example by executing mmmount <filesystem> for the GPFS file systems and then executing db2start.

Scenario: Director failure at one site (preferred site for a given distributed virtual volume) and BE array failure at the other site (secondary site for the same distributed virtual volume)
VPLEX behavior: The surviving VPLEX directors within the VPLEX Cluster with the failed director will continue to provide access to the Metro distributed virtual volumes. VPLEX will continue to provide access to the Metro distributed virtual volumes using the preferred site BE array.
pureScale cluster impact: None.

Scenario: Host cluster network partition (in the InfiniBand network) while the VPLEX WAN links remain intact
VPLEX behavior: None.
pureScale cluster impact: The nodes in the pureScale cluster will go offline; the CFs will go into an Error state and the member servers will attempt to restart on other nodes. When the network is restored, the nodes will return to normal.

Scenario: Host cluster inter-site network partition as well as VPLEX inter-site network partition
VPLEX behavior: VPLEX will suspend I/O at the non-preferred site for a given distributed virtual volume; the distributed virtual volume continues to provide access at its preferred site. In this case, I/O at the non-preferred site should not be manually unsuspended: because both VPLEX Clusters survive, the preferred site continues to allow I/O, and unsuspending the non-preferred site would make the same Metro distributed virtual volume read-writable on both legs, creating a potential split-brain condition. When the inter-site networks are restored, the distributed virtual volume is unsuspended at the non-preferred site.
pureScale cluster impact: The nodes in the pureScale cluster will go offline; the CFs will go into an Error state and the member servers will attempt to restart on other nodes. When the network is restored, the nodes will return to normal. Hosts running in the non-preferred site: Paths will fail; I/O is still possible via GPFS NSD servers at the non-preferred site. Hosts running in the preferred site: I/O will continue.

In the failure modes described involving VPLEX Cluster failures, after the cluster is recovered, it joins back into the VPLEX Metro instance. If a distributed virtual volume was running I/O on the peer site (either because this was the preferred site or because the administrator had manually chosen to resume I/Os), the joining VPLEX Cluster will recognize this and immediately provide the latest data back to the hosts accessing the same distributed virtual volume through the joining VPLEX Cluster. Any stale data in the joining VPLEX Cluster is discarded and/or overwritten.

Conclusion

By using VPLEX distributed virtual volumes, data can be transparently made available to all nodes in a DB2 pureScale cluster divided across two physical locations. The host cluster can be defined as a local cluster, without needing the GDPC add-on feature.


Appendix A References

This appendix presents the following topics:

References


References

VPLEX white papers can be found on EMC.com and EMC Online Support.

Other resources include:

EMC Host Connectivity Guide for AIX

EMC VPLEX with IBM AIX Virtualization and Clustering (EMC white paper)

IBM developerWorks: Deploying the DB2 pureScale Feature

GPFS V3.5.0.11: Administration and Programming Reference (Document SA23-2221-08)

GPFS V3.5.0.11: Concepts, Planning, and Installation Guide (Document GA76-0413-08)

IBM Knowledge Center:

http://www-01.ibm.com/support/knowledgecenter/