
Page 1: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


A DB2 DBA's Guide to pureScale
Kelly Schlamb, Executive IT Specialist, IBM Canada Ltd.
Session Code: G05
Wednesday, May 14, 2014, 8:00am - 9:00am | Platform: DB2 for LUW

If you’ve been considering moving forward with the pureScale feature then I’m sure you’re asking yourself some questions about what’s involved. Sure, DB2 is DB2, but how similar is a pureScale environment from a non-pureScale one? Do your current operational practices still apply? Is there anything new that you have to know or do to manage the system? In this presentation you’ll be assured by the many similarities, but you will also learn what’s new or slightly different in the pureScale world.

Biography: Kelly Schlamb has worked with DB2 for LUW for over 19 years in various roles at IBM. He is currently an Executive IT Specialist within the Worldwide Information Management Technical Sales organization focusing on DB2 High Availability, DB2 pureScale, and IBM PureData System for Transactions. Prior to this, Kelly was a long time member of the DB2 Kernel development group in the IBM Toronto Lab. He started on the Buffer Pool Services & Storage Management development team, working on various new features and capabilities in this area. This included leading the group that introduced Automatic Storage and adding various enhancements to it in the DB2 releases that followed. Subsequently, Kelly spent over five years as the technical lead and manager of the DB2 pureScale Recovery development team. This team was responsible for designing and implementing logging, backup/restore, crash recovery, rollforward, and transaction management in the DB2 pureScale environment.

Page 2: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Topics Covered Today

• pureScale Overview
• Prerequisites and Preparing for pureScale
• Configuration
• Backup, Logging & Recovery
• Storage Management
• Monitoring


These are the topics that we'll be covering in today's presentation.

Page 3: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


pureScale Overview


<no speaker notes>

Page 4: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


DB2 pureScale: Scalability, Performance, and Always Available Transactions

• DB2 pureScale
  • Robust infrastructure for OLTP workloads
  • Provides improved availability, performance and scalability
  • Application transparency
  • Scales to >100 members
  • Leverages z/OS cluster technology

• Highlights of pureScale enhancements in DB2 10.5
  • Rich disaster recovery options, now including integrated HADR support
  • Backup and restore between pureScale and non-pureScale environments
  • Online database fix pack updates (in addition to system & OS updates)
  • Add members online for additional capacity
  • Included in Advanced Workgroup and Advanced Enterprise editions


pureScale is a DB2 feature that reduces the risk and cost of business growth by providing extreme capacity, continuous availability, and application transparency. DB2 pureScale uses the same architecture as the undisputed gold standard of reliability – System Z. This is a shared data, active/active architecture and businesses trust this architecture to run their most critical systems.

Page 5: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Main thing to remember… pureScale is DB2

• pureScale looks and feels very much like "regular DB2"
  • Same code base shared by DB2, DPF, and pureScale
  • In DB2 10.1 and 10.5, pureScale is just an installable feature of DB2
• Immediate productivity from DBAs and application developers
  • Single system view for utilities
    • Act and behave exactly like they do in non-pureScale
    • Backup, restore, rollforward, reorg, load, …
  • Applications don't need to know about or care about the fact that there are multiple members
    • In general, can run SQL statements or commands on any member
  • SQL, data access methods, and isolation levels are the same
  • Backup/recovery processes are the same
  • Database security is managed in the same way
  • Environment (even the CFs) still managed by database manager and database configuration parameters

I've sometimes had people comment to me that by introducing pureScale into their environment, their DB2 DBAs have to learn a whole new database platform. That couldn't be further from the truth. As the slide title says, pureScale is DB2. pureScale is just a deployment option of DB2, like the Database Partitioning Feature (DPF) is. It's all one code base and as part of the installation of DB2 you choose whether it's going to be a pureScale environment or not.

All of the skills that a DBA has with DB2 are immediately transferable to pureScale. It has the same look and feel, the utilities work in the same way, the same SQL and commands exist, security is the same, and so on.

Even though there are new concepts in pureScale (which will be discussed over the next few slides), things have been made to work as seamlessly as possible within the DB2 framework. For instance, you have the concept of the CF (Cluster Caching Facility) in pureScale and this new structure can be configured to suit your needs. However, rather than introducing a new interface for configuring the CF, the existing database manager configuration and database configuration methods are used. So, if you know how to view and change these configuration parameters, you'll easily be able to do the same for the new CF-related ones.
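For example, a quick way to list the CF-related parameters from the command line (a minimal sketch assuming a database named PRODDB; the parameter names are the ones covered later in this presentation, and grep is used only to filter the output):

  db2 get dbm cfg | grep -i "cf_"            # CF server parameters (CF_MEM_SZ, CF_NUM_WORKERS, ...)
  db2 get db cfg for proddb | grep -i "cf_"  # CF structure parameters (CF_GBP_SZ, CF_LOCK_SZ, ...)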

Page 6: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


DB2 pureScale Architecture

• Multiple DB2 members for a scalable and available database environment
• Client applications connect into any DB2 member to execute transactions
  • Automatic workload balancing
• Shared storage for database data and transaction logs
• Cluster caching facilities (CF) provide centralized global locking and page cache management for the highest levels of availability and scalability
  • Duplexed, for no single point of failure
• High speed, low latency interconnect for efficient and scalable communication between members and CFs
• DB2 Cluster Services provides integrated failure detection, recovery automation and the clustered file system

[Diagram: a DB2 pureScale cluster (instance). Clients connect over the cluster interconnect to multiple members (each running DB2 Cluster Services), with a primary and secondary CF, and shared storage holding the database and each member's logs.]

DB2 pureScale keeps your critical systems available all the time, giving you uninterrupted access to your data, making sure that your business is up all the time, and your line of business gets the high level of availability they need, through planned and unplanned outages. DB2 pureScale provides continuous availability of data through the use of the highly reliable cluster caching facility (for redundancy purposes and no single point of failure, there are two of these "CFs"). The CF provides centralized locking and cache management and is very different from architectures in competitive products that use a distributed model. With the distributed locking architecture, lock ownership is distributed across multiple nodes and recovery from a node failure requires lock redistribution across the surviving nodes. Also during node failure and recovery, I/O is frozen until the database system is able to determine what data pages need recovery. This can be a relatively lengthy process, which has a significant impact on data availability. None of this is required with DB2 pureScale because that information is centrally managed and is not impacted by DB2 member failures (and there are two CFs, in case a node hosting a CF fails or needs to be brought down for maintenance).

This slide describes the architecture of a DB2 pureScale cluster (also referred to as a pureScale instance). It is an active/active data sharing environment in which multiple nodes – called DB2 members – handle the transactional workload of the system and they also have equal and shared access to a single copy of the database on disk. Clients are able to connect into any of the members and there are default workload balancing capabilities that will distribute the workload across all of the members. If a member fails, client connections are automatically rerouted to healthy members. The members communicate with the aforementioned CFs for the purpose of global locking and data page caching. This communication takes place over a high speed, low latency interconnect. Specifically, it is done via the RDMA (Remote Direct Memory Access) protocol over either InfiniBand or RDMA-capable 10 Gigabit Ethernet.

Integrated into DB2 pureScale are various other IBM products, including the General Parallel File System (GPFS), Reliable Scalable Cluster Technology (RSCT), and Tivoli System Automation for Multiplatforms (TSA MP). Collectively, they are known within pureScale as DB2 Cluster Services. These products provide the clustered file system on which the database data and transaction logs reside, as well as the cluster domain management, heart-beating, and recovery automation. Together, all of these components allow for very fast recovery when node failures occur. These products are fully integrated into DB2 pureScale, in that they are all installed together as part of a single installation process, they are configured as part of installation, and they are managed and maintained together all within pureScale.

Page 7: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


What in pureScale is new to a DB2 DBA?

• CFs provide centralized locking and cache management
  • Global Lock Manager and Group Buffer Pool
• Other components automatically installed and configured as part of the pureScale install
  • RSCT provides heart beating, domain management
  • TSA defines resources and dependencies, drives recovery
  • GPFS provides a clustered file system on which shared data resides
• Online, automated member/CF recovery
• Automatic workload balancing across all members
• Multiple log streams are maintained, one per member
  • Automatically merged when necessary (e.g. rollforward, group crash recovery)
• Monitoring includes members and CFs
• Storage management primarily done through DB2 interfaces to GPFS
• Cluster management done through DB2 interfaces to RSCT/TSA

This slide talks to some of the new components and concepts, as well as some slight differences between pureScale and traditional DB2 environments. As you can see here – and will learn more about later – they're really not that big a deal from a skills perspective.

pureScale is a multiple-server environment made up of members and CFs. Within each CF are components such as the group buffer pool and the global lock manager.

DB2 keeps track of changes to data in the database through its transaction logs and in a pureScale environment each member has its own set of logs (referred to as a log stream). And if DB2 needs to perform some sort of operation that requires multiple log streams, it will automatically merge them as necessary, invisible to the user.

In addition to the database manager itself, a pureScale environment also includes other IBM technologies such as RSCT (heart beating, domain management), TSA (recovery automation), and GPFS (the clustered file system). These components are all a part of a single integrated install (as is the case with fix packs as well). And rather than having to learn the commands and operations for these different products, we've abstracted out many of the common operations into a DB2 command (db2cluster). Through this command you can configure/manage/view the storage and cluster manager.
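As a sketch of what this looks like in practice (these are the two db2cluster invocations I would point to first; confirm the exact option syntax against your DB2 level):

  db2cluster -cfs -list -filesystem   # list the GPFS file systems managed by the instance
  db2cluster -cm -list -alert         # list any outstanding cluster manager alerts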

Page 8: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Centralized Lock Management

• Local lock manager (LLM) exists on each member
  • Responsible for granting locks locally to individual application transactions
• Global lock manager (GLM) exists on the CF
  • Grants locks to members upon request (if not already held by another member, or currently held by the member in a compatible mode)
  • Maintains global lock state (what member has what lock, in what mode, who's waiting)
• When a member needs a lock it doesn't already hold, the LLM coordinates with the GLM to get it
  • Via fast RDMA requests and lock negotiation
• Locks can be:
  • Logical/transactional (e.g. row locks and table locks)
  • Physical (e.g. page locks)

In a pureScale environment there are two locking components that work together to manage data access across the cluster. The first is the Local Lock Manager (LLM) that resides on each member and the second is the Global Lock Manager (GLM) that resides on the CF. The LLM handles things at a local level within a member and understands what locks are being held and requested by applications/transactions within a member. The GLM has a cluster-wide understanding of the locks being held and requested within the cluster. It doesn’t know specifics about applications or transactions, it only knows members are involved.

If a transaction running on a member requires a lock (e.g. a row lock) then the LLM will first see if the member is already holding a lock that can satisfy the request. If it doesn't then it must request it from the GLM. Communication between the LLM and GLM involves "set lock state" requests and responses, as well as notifications and negotiations if the GLM can't immediately satisfy a request. The messages and responses take place using RDMA. RDMA (Remote Direct Memory Access) allows one server to reach into the memory of another server and read or write bytes of data. No interrupt processing takes place, there is no context switching, and the target server spends no CPU cycles on the operation. This is increasingly important as the size of your cluster grows.

In pureScale, locks are categorized as either logical or physical. Logical locks are simply transactional locks, such as the row or table locks you might get while executing a transaction. These are the types of locks you have in non-pureScale. Physical locks are unique to pureScale and are used for concurrency control on "physical" objects, most notably pages (sometimes you'll hear them referred to as "P-Locks"). These physical locks are not transactional in nature. That means that one of these locks can be given up prior to the end of a transaction (i.e. you don't have to commit the transaction to give up the lock). You'll see more about page negotiation (a.k.a. page reclaim) via the page locks in the next few slides.

Page 9: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Physical Locks vs. Logical Locks

• Locks that are held on rows, tables, etc. are referred to as logical locks
  • Associated with applications and transactions
• Physical locks (P-Locks) are used for serialization of a physical resource, like a page
  • For example, a member will hold a P-Lock in X mode when it is in the middle of updating a page and can release the lock when it is done
• P-Locks are not transactional in nature
  • Associated with a member
  • Requested as needed, freed when no longer needed
  • Typically held until end of transaction, but the protocol doesn't strictly require them being held until commit or rollback time
  • Can be negotiated away if another member wants them
• Use of P-Locks is mostly invisible within the database
  • Not included in monitor metrics
  • Cannot result in deadlocks
  • Simply an internal database primitive that is built on the locking infrastructure

[Cartoon on slide: "Give me your lunch money!"]

Physical Locks (P-Locks) are not something that you typically hear much about, as their use by DB2 is mostly invisible to users. However, they are commonly talked about when describing pureScale internals (such as page reclaim/negotiation) and so they are explained here.

A physical lock is used for concurrency/serialization of physical resources, such as a page. They are not transactional in nature (unlike logical locks, which are locks that are associated with rows, tables, etc.). They are requested and freed as necessary.

Although internal, the “physical lock” concept and term is commonly used when describing page access within a pureScale database and therefore it is described here.

Page 10: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Two Levels of Data Page Caching

• Local buffer pools (LBP) exist on each member
  • LBPs cache modified pages and clean pages
• A single group buffer pool (GBP) exists in the CF per database
  • Global cache, containing modified pages written from member LBPs
  • Page registry tracks which pages exist in the member LBPs
    • Allows for invalidation of old versions of a page on other members when a page is updated and those changes have been committed
• Pages are written from the LBP to the GBP at:
  • Transaction commit time (referred to as "force at commit")
  • During page reclaim (a.k.a. page negotiation)
    • Member has modified a page, another member wants to use that page
• Pages can be read by members very quickly from the GBP
  • Avoids costly reads from disk
  • Speeds up member recovery in case of a failure

There are two levels of data page caching that occur within a DB2 pureScale cluster. This first is the caching of data in the buffer pools found on the members. These are referred to as local buffer pools (LBP) in a pureScale environment. These are the buffer pools you're familiar with in non-pureScale DB2.

With pureScale, there is also a global buffer pool (GBP), which resides within the CF (and is duplexed across both the primary and secondary CF).

When transactions are executing on a member, pages are being read into the local buffer pool and modifications are made locally. When a transaction commits, all of the pages that were modified by the transaction get sent to the GBP (on both the primary and secondary CF). This is pureScale's "force at commit" protocol. When this happens, older copies of the page that might exist in local buffer pools on other members are invalidated. This is accomplished via the page registry. This registry resides with the GBP and keeps track of all of the pages that exist in the local buffer pools and the GBP. RDMA is used to do this invalidation in a very efficient and scalable manner.

Pages may also be written to the GBP under other circumstances. For instance, during page negotiation. This is when a page has been modified on one member (as part of an uncommitted transaction that updates a row, for example) and another member wants to update the same page (perhaps to update a different row on that page). The page is locked exclusively (X) by the first member and when the request comes in from the second member, the lock is used to negotiate the page away from the first member. As previously mentioned, page locks are not transactional in nature and so they don't need to be held until commit time. Therefore, concurrency is not an issue.

For those pages that are in the GBP, if a member wants to read or update them, they can be quickly read from the GBP into an LBP using RDMA. Reading it from the GBP is orders of magnitude faster than reading a page from disk (tens of microseconds vs. single digit milliseconds).

Various monitor metrics can be used to monitor the caching activity that is occurring at both levels in the LBPs and GBP.
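For example, a minimal monitoring query along these lines shows local and group buffer pool activity per member (the GBP/LBP column names are as I recall them from the DB2 10.5 MON_GET_BUFFERPOOL documentation, so verify them against your level):

  SELECT MEMBER, BP_NAME,
         POOL_DATA_LBP_PAGES_FOUND,
         POOL_DATA_GBP_L_READS,
         POOL_DATA_GBP_P_READS,
         POOL_DATA_GBP_INVALID_PAGES
  FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS T
  ORDER BY MEMBER, BP_NAME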

Page 11: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Page Reclaim

[Diagram: members 0-3, each with a local buffer pool (LBP), and the CF with its GLM, GBP, and page registry, walking through the reclaim of page P.]

1. An uncommitted update modifies page P on member 0 (page locked X)
2. A transaction on member 3 wants to modify page P and requests the page lock in X
3. Page P is negotiated away (via the page lock); page P is written to the GBP and the lock is given up
4. Old copies of page P on the other members are invalidated
5. The X lock on page P is granted to member 3
6. With the lock acquired, page P is found to be invalid, so the latest copy is requested from the GBP

All lock requests, page transfers, and invalidation are done using Remote Direct Memory Access (RDMA). This allows for high performance and high scalability, even as the cluster size grows.

Although difficult to fully describe through graphics on a single slide, the intention here is to give you an idea of how page reclaim (a.k.a. page negotiation) works. The different steps are described with the text and the graphics on this slide.

As mentioned at the bottom of the slide, all of the communication between the members and CFs – including lock requests, writing pages to the GBP, reading pages from the GBP, and page invalidation (sometimes referred to as silent invalidation or cross invalidation) – is done via Remote Direct Memory Access (RDMA). RDMA allows one server to reach into the memory of another server and read or write bytes of data (it's actually accomplished using the network adapter cards). With RDMA, no interrupt processing takes place, there is no context switching, and the target server spends no CPU cycles on the operation. This is increasingly important as the size of your cluster grows.

pureScale's exploitation of RDMA is unique and is one of the reasons that pureScale performs so well and can scale up without needing to make applications cluster aware.

Page 12: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


pureScale Client Configuration

• Workload Balancing (WLB)
  • Application requests balanced across all members or subsets of members
  • Takes server load of members into consideration
  • Connection-level or transaction-level balancing
• Client Affinity
  • Direct different groups of clients or workloads to specific members in the cluster
  • Consolidate separate workloads/applications on the same database infrastructure
  • Define list of members for failover purposes
• Automatic Client Reroute (ACR)
  • Client automatically connected to a healthy member in case of member failure
  • May be seamless in that no error messages are returned to the client
  • Application may have to re-execute the transaction

Database applications running in a DB2 pureScale environment can use the DB2 transaction-level or connection-level workload balancing (WLB) functionality. WLB balances application requests among all members of the DB2 pureScale cluster. When WLB is enabled, the DB2 clients distribute application requests based on the capacity (that is, the priority or weight) values in a server list that the DB2 pureScale server returns. These capacity values indicate the current load on a DB2 pureScale member. A member with a capacity value below that of the other members in the server list is considered busier than other members.

The “db2pd –serverlist” command can be used to see the relative load (priority or weight) of a member. A member (A) having a higher value compared with another member (B) indicates to the client that more work should be directed at member A. CPU load average (over recent time) and memory utilization (based on looking at swap space and paging) is used to determine the relative load of a member.

The client affinities feature allows you to define an ordered list of DB2 pureScale members to which a DB2 client can connect; different clients can implement a different ordered list. In certain situations, you might want to direct application requests from a DB2 client to a particular DB2 pureScale member on the list. If that DB2 pureScale member goes down because of a planned or unplanned outage, the DB2 client can direct the client application requests to another DB2 pureScale member on the list. If that member is unavailable, the DB2 client can work through the list of all DB2 pureScale members to find an available member. This feature is typically used in an environment where the applications and data are inherently segregated and particular servers are targeted to service requests of particular applications. With client affinities, you can also control whether application requests fail back to the failed primary server after it comes back online. The primary server is the DB2 pureScale member that the application originally connected to. If you set up the client in this manner, you can choose how often the DB2 client should check whether the primary server is back online.

Automatic client reroute (ACR) is a feature in DB2 clients that takes application requests that are directed toward an offline DB2 pureScale member and reroutes them to active DB2 pureScale members. ACR is automatically enabled with WLB or client affinities so no additional steps are required to specify which member the application should connect to upon encountering an outage. In some cases, after an outage, clients are seamlessly routed to another DB2 pureScale member, and no error messages are returned to the application because the failure is transparent to the application. For more details on when failures are seamless, see the DB2 Information Center information about seamless client reroute. However, in some situations, the DB2 client cannot replay the statement execution environment on the new connection after automatic client reroute occurs. In such a situation, the transaction is rolled back, and SQLCODE -30108 (or SQLCODE -4498 for Java applications) is returned to the application after the connection is rerouted from the failing member to a surviving member. If this occurs, applications must replay the statement execution environment and redo the statements, but applications do not have to explicitly reconnect to the database because ACR automatically reconnects the applications.
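A minimal db2dsdriver.cfg sketch that enables WLB and seamless ACR for a client might look like the following (the host name, port, and database alias are placeholders, and the parameter names are the ones I would expect from the DB2 10.5 client documentation, so confirm them against your driver level):

  <configuration>
    <dsncollection>
      <dsn alias="PRODDB" name="PRODDB" host="member1.example.com" port="50000"/>
    </dsncollection>
    <databases>
      <database name="PRODDB" host="member1.example.com" port="50000">
        <wlb>
          <parameter name="enableWLB" value="true"/>
        </wlb>
        <acr>
          <parameter name="enableACR" value="true"/>
          <parameter name="enableSeamlessACR" value="true"/>
        </acr>
      </database>
    </databases>
  </configuration>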

Page 13: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Online System and Database Maintenance

• Transparently perform maintenance to the cluster in an online rolling fashion
  • DB2 pureScale fix packs (DB2 10.5)
  • System updates such as operating system fixes, firmware updates, etc.
• No outage experienced by applications
• DB2 fix pack install involves a single installFixPack command to be run on each member/CF
  • Quiesces the member
    • Existing transactions allowed to finish
    • New transactions sent to other members
  • Installs binaries
  • Updates the instance
    • Member still behaves as if running on the previous fix pack level
  • Unquiesces the member
• Final installFixPack command to complete and commit updates
  • Instance now running at the new fix pack level

System maintenance, such as hardware updates, operating system fixes, firmware updates, etc. can be performed in an online rolling fashion.

Previously, when applying DB2 pureScale fix packs it was necessary to completely stop the entire cluster so that the fix pack could be applied to all of the members and CFs. This is no longer the case in DB2 10.5 and you can perform this kind of maintenance without bringing the cluster down.

It is termed a "rolling update" because you can perform the maintenance on one host at a time, rolling through each of the members and CFs. During the application of the fix pack to a particular host, only one member or CF is offline but at least one other member and CF should still be online and so the cluster itself is still online and able to perform work on behalf of applications.

The newly enhanced installFixPack command is used to do the update of the binaries on an individual host by host basis. The installFixPack command also has new options for committing the changes and for doing a pre-commit check first.

Page 14: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Disaster Recovery Options for pureScale

• HADR
  • Replication between a database in a primary pureScale cluster and a standby pureScale cluster with a matching member topology
  • Active/passive DR
• Storage Replication
  • Synchronous or asynchronous disk-based replication between a database in a primary pureScale cluster and a standby pureScale cluster with a matching member topology
  • Active/passive DR
• Q Replication / InfoSphere Change Data Capture (CDC)
  • Logical replication between a pureScale database and a pureScale or non-pureScale standby (bidirectional supported)
  • Can be active/active DR
• Geographically Dispersed pureScale Cluster (GDPC)
  • Single pureScale cluster "stretched" over two sites with half of the members/CFs at each site
  • Active/active DR

These are various different disaster recovery options that are currently supported in pureScale. Given the variety of the options, there is a solution to meet the needs of all pureScale users.

Page 15: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Prerequisites and Preparing for pureScale


<no speaker notes>

Page 16: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Setting Up Your DB2 pureScale Environment
Like the Boy Scouts always say… "Be Prepared"

• The DB2 Information Center provides plenty of information, including
  • Prerequisites for hardware and software
  • Preparation and planning steps
  • Pre-installation "cheat sheets"
  • Installation instructions

In this next section we'll cover some topics around installation and instance/database movement to pureScale. When preparing for and then subsequently performing these tasks, it is highly recommended that you read through the relevant sections of the Information Center. As shown in the screen capture on the slide, there is a great deal of information on these topics in there.

Page 17: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


DB2 pureScale Supported Hardware and OS

• Servers: IBM POWER6, POWER7/7+, and Flex systems, or IBM and non-IBM rack-mounted servers and BladeCenter HS22/HS23
• Linux distributions: RHEL 5.9, RHEL 6.1, SLES 10 SP4, SLES 11 SP2
• High speed, low latency interconnect: InfiniBand or 10 GE (RoCE)
• GPFS compatible storage (ideally storage that supports SCSI-3 PR fast I/O fencing)

pureScale is supported on both IBM Power and Intel-x86 based systems. This chart shows the hardware and software requirements at a high level (current as of April 2014). For more information on the specific hardware requirements, please see the following sections of the Information Center:

Power/AIX: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/r0054850.html
Intel x86/Linux: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/r0057441.html

Page 18: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Installing DB2 pureScale

• Components installed
  • Database members
  • Cluster caching facilities (CFs)
  • DB2 cluster services, which includes TSA, RSCT, and GPFS
• Methods
  • DB2 Setup Wizard (db2setup): User friendly GUI for interactive deployment
    • Allows setup of a DB2 pureScale environment across multiple hosts
    • Ensures that all necessary components are installed and configured on all hosts according to best practices
  • db2_install command: Command line install process (deprecated)
  • Response file: Automated install
• Install and instance creation includes
  • Installing binaries
    • Occurs across all hosts – does not require installation separately on each one
    • All components are installed and configured as part of the single install process
  • Cluster domain created
  • sqllib_shared file system created and mounted on all hosts

When installing DB2 pureScale it will do the installation and setup of the database member(s), CF(s), and DB2 Cluster Services (which includes TSA, RSCT, and GPFS).

You can install via the db2setup tool (interactive setup wizard GUI or response file) or the db2_install command. Note that the db2_install command has been deprecated and so it might be removed in a future release. It is suggested to use db2setup with a response file as an alternative.

If you want to setup your instance with more than one CF and member to start with then the GUI is a good method to use. It is also very easy to use.

Install processing involves installing the binaries, which are copied to all hosts in the cluster, creating the cluster domain, and creating the shared file system on which the instance shared files go (assuming you're performing an install that is also creating an instance at the same time).

Page 19: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Check Prerequisites using db2prereqcheck

• Checks whether your system meets prerequisites for the installation of a specific version of DB2 before installing it
  • Including the pureScale prerequisites
  • Includes OS level, Linux distribution, AIX technology level, C library, and uDAPL
• Prerequisites contained within an XML file
  • Contains prerequisites for DB2 9.8, 10.1, and 10.5
  • Command and file located in <installPath>/cfg/DB2prereqs.xml
• Examples
  db2prereqcheck -i -p             (check pureScale requirements for the latest version of DB2 described in the XML file)
  db2prereqcheck -p -v 10.5.0.0    (check pureScale requirements for DB2 10.5.0.0)
  db2prereqcheck -u -v 10.5.0.0    (check only the pureScale uDAPL requirements for DB2 10.5.0.0)

The db2prereqcheck command checks whether your system meets the prerequisites for the installation of a specific version of DB2 for LUW, including the prerequisites for the pureScale feature. By using this command, you can determine whether your system satisfies the prerequisites before you start the installation process. The prerequisites checked include the operating system level, Linux distribution, AIX Technology Level, C library and runtime patches, uDAPL, and other DB2 pureScale specific requirements. Note that it isn't exhaustive. Some things may still need to be checked manually.

The db2prereqcheck command uses a resource XML file that contains the prerequisites, covering DB2 9.8 FP2 up to the latest fix pack of DB2 10.5. The file will be updated with each version and fix pack. The default path of the XML file is located in <DB2 installation path>/cfg/DB2prereqs.xml. You should never modify the contents of the XML file.

You use the –p option to check the prerequisites for pureScale. You can specify the –i option to specify that you want to check the prerequisites for the latest DB2 version that is defined in the resource XML file. If you want to check for a specific version then you can specify that via the –v option.

Page 20: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Pre-Installation Cheat Sheets

AIX: http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.qb.server.doc/doc/r0056077.html
Linux: http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.qb.server.doc/doc/r0057204.html


Within the pre-installation checklist sections of the Information Center (links shown here) there are a couple of cheat sheets that can be used to help identify and record installation-related information prior to the installation process.

Page 21: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Converting an Existing DB2 Instance to pureScale

• Update the instance to a DB2 pureScale instance using one of db2iupdt (command line) or db2isetup (GUI)
  • e.g. db2iupdt -d -m coralpib154 -mnet coralpib154-ib0
                  -cf coralpib153 -cfnet coralpib153-ib0
                  -instance_shared_dev /dev/hdisk2
                  -tbdev /dev/hdisk4 demoin1

  > cat sqllib/db2nodes.cfg
  0 coralpib154.torolab.ibm.com 0 coralpib154-ib0 - MEMBER
  128 coralpib153.torolab.ibm.com 0 coralpib153-ib0 - CF

• If using db2iupdt, the instance will start with one CF and one member
  • Add additional members and a second CF using db2iupdt -add

To convert the instance to a DB2 pureScale instance, you can use either the db2iupdt command or db2isetup (GUI). You must run them as root.

The example on this slide shows how you could use db2iupdt. Note that you can specify either “-instance_shared_dev” or “-instance_shared_dir”. If you have not yet created a GPFS file system for the instance shared file system (sqllib_shared) then specify the first option and a disk device. If you have already created a GPFS file system for it using db2cluster_prepare then use the second option along with the path name.

At this point, after having used db2iupdt, the instance will have one CF and one member. Typically you will want two CFs and at least two members, so add members and a second CF as necessary (see the sketch below).
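As a sketch, following the pattern of the example on the slide (the additional host and network names are hypothetical, and the -add syntax should be confirmed against the db2iupdt documentation for your level):

  db2iupdt -add -m coralpib155 -mnet coralpib155-ib0 demoin1     # add another member
  db2iupdt -add -cf coralpib156 -cfnet coralpib156-ib0 demoin1   # add a second (secondary) CF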

Page 22: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Moving Existing Databases to pureScale

• DB2 pureScale database prerequisites
  • All table spaces must be managed by Automatic Storage
  • Database and logs must reside on GPFS
• Can set up file systems in advance and move the database to them
  • If pureScale is not yet installed, use the db2cluster_prepare command to create DB2 managed GPFS file systems
• Run db2checkSD to verify that a database is ready to be moved to pureScale
  • Verifies that unsupported features are not used, all table spaces are automatic storage, etc.
  • For example: db2checkSD SAMPLE -l db2checkSD.log
• Able to backup a non-pureScale database and restore into pureScale
  • Source and target version of DB2 must both be at 10.5

In pureScale, the database and logs must reside on a GPFS file system (so that it can be shared across multiple hosts). The database needs to be moved to GPFS before performing the upgrade – and can be done at any point in time prior to this. The db2cluster_prepare tool can be used to create a DB2 managed GPFS file system in advance of installing pureScale. Create as many file systems as is necessary (for database storage paths and the log files). Note that the GPFS licensing that comes included with pureScale only allows its use with pureScale. You are not licensed to create GPFS file systems for non-pureScale purposes.

You can run the db2checkSD command against a database before moving it to pureScale to determine whether you must take actions before using the database in a DB2 pureScale environment. The db2checkSD command generates a script file containing information about any issues found.

In DB2 10.5 you can backup and restore between non-pureScale and pureScale instances (at the same version level – you cannot restore a DB2 10.1 non-pureScale backup image into a DB2 10.5 pureScale instance).
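A minimal sketch of that path, assuming a database named PRODDB and a backup location visible to both systems (names and paths are only illustrative):

  # On the DB2 10.5 non-pureScale source system
  db2 backup database proddb to /shared/backups

  # On any member of the DB2 10.5 pureScale target instance
  db2 restore database proddb from /shared/backups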

Page 23: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Migrating Databases to Automatic Storage

• Enable database for automatic storage by creating a storage group
  • Existing table spaces are not impacted at this point
• Drop temporary table spaces and recreate them as AS
• Convert DMS table spaces to AS using one of
  • ALTER TABLESPACE
  • Redirected restore of table spaces
• Note that SMS cannot be converted to AS
  • Fundamentally different architectures
  • Must rebuild the table space (e.g. ADMIN_MOVE_TABLE)
• If SYSCATSPACE is SMS then two options exist
  • Rebuild the database (could use db2look and db2move)
  • Might be able to use the Transportable Schema feature to move all existing table spaces into a new automatic storage database (with an AS SYSCATSPACE)

DB2 pureScale requires that all databases are using automatic storage (AS) exclusively. That means that every table space in a pureScale database must be using automatic storage.

This slide is for those who plan on moving a database to pureScale but the database isn't using automatic storage yet. If the database was created in DB2 9.1 or any release since then, automatic storage is on by default – even if no table spaces are using it. For those databases that were created earlier than that or where automatic storage was specifically not enabled at database creation time, it is very simple to enable it. Starting with DB2 10.1, this can be done by creating a storage group for the database (which becomes the default storage group for the database).

For all temporary table spaces, you must drop them and recreate them as automatic storage.

DMS table spaces can be converted to automatic storage very easily – and in an online way. The ALTER TABLESPACE statement is used to do the conversion and to subsequently rebalance the data from the old non-AS containers to the new AS ones. Alternatively, if a restore is being performed for a database where automatic storage is enabled, you can do a redirected restore of the non-AS table spaces, converting them to AS in the process.
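A minimal sketch of that conversion (storage group name, paths, and table space name are placeholders):

  -- Enable automatic storage by creating a storage group (the first one becomes the default)
  CREATE STOGROUP sg_data ON '/db/fs1', '/db/fs2'

  -- Convert an existing DMS table space to automatic storage, then move its data
  -- off the old containers online
  ALTER TABLESPACE tbsp_dms MANAGED BY AUTOMATIC STORAGE
  ALTER TABLESPACE tbsp_dms REBALANCE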

SMS table spaces cannot be converted to automatic storage as they have fundamentally different architectures under the covers. For these table spaces you must rebuild them, potentially using something like online table move (ADMIN_MOVE_TABLE).

However, if your system catalog table space (SYSCATSPACE) is SMS then you can't just rebuild it. In this case you must recreate the database, potentially using tools like db2look and db2move to help make it easier. Another alternative – if none of the documented restrictions come into play – is to create a new empty database, which will be enabled for automatic storage, and then use DB2's transportable schema feature. With this feature, table spaces can be moved from the existing database (via a backup image) into the new database. See the Information Center for details on this feature and things that might restrict using this as an option here.

Page 24: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Configuration


<no speaker notes>

Page 25: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Database Member Process Model

• DB2 engine address space
  • i.e. a db2sysc process and its threads
  • Watched by the watchdog (db2wdog) process
• Similar to a single partition DB2 server
  • Contains local buffer pools, lock list, database heap, log buffer, sort heap, etc.
  • Various EDUs such as agents, page cleaners, prefetchers, log writer, etc.
• pureScale-specific EDUs exist, including some related to lock management, CF interactions, etc.
• Use db2pd -edus to see the list of EDUs running on a member
• No concept of a "catalog node/partition"
  • All data (including system catalog tables) accessible by all members

[Diagram: a member's db2sysc address space containing db2 agents and other threads, buffer pool(s), the log buffer, dbheap, and other heaps, alongside the other members and the primary and secondary CFs.]

A database member in pureScale is very much like a single instance in a single partition DB2 server in that there is a db2sysc process and within it various threads are running (e.g. agents, page cleaners). Within the memory space of the process are the typical memory heaps and structures that you would find in DB2 including buffer pools, the lock list, database heap, etc.

There are also various new EDUs (Engine Dispatchable Units – or threads) that have been introduced to support a pureScale environment. Most notable are those associated with lock management and CF interactions. It isn't necessary to get into all of the details around these new EDUs here, but for more information you can look at the "DB2 Process Model" section in the Information Center.

The “db2pd –edus” command can be used to display all of the EDUs associated with the db2sysc process for a member.

In a DPF environment there is the concept of a catalog node/partition. However, this does not exist in pureScale as each member has equal access to the shared data on disk.

Page 26: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Configuring Members

• Database manager and database configuration parameters are categorized as either having a global or local (per-member) scope
• Global parameters
  • Same value must be used across all members
  • Used for parameters for which per-member values would have little value and/or may create functional issues
  • e.g. CF_DB_MEM_SZ, ALT_COLLATE, PAGE_AGE_TRGT_GCR

  UPDATE DATABASE MANAGER CONFIGURATION USING <parameter name> <parameter value>

  UPDATE DATABASE CONFIGURATION FOR <database name> USING <parameter name> <parameter value>

In pureScale there are some database configuration parameters that have a global scope and others that have a local per-member scope. Even for per-member parameters, if a specific member number isn't provided as part of the UPDATE DATABASE CONFIGURATION or UPDATE DATABASE MANAGER CONFIGURATION command then it is applied globally (i.e. to each member).

Typically, those parameters that are global are those for which per-member values would have little value and/or might create a functional issue.

Global database configuration parameters are stored in the global database configuration file (which is located in the global database path).

Page 27: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Configuring Members (cont.)

• Local (per-member) parameters
  • Allows for per-member settings where different resources per member, or affinitized workload tuning, may be useful
  • e.g. instance memory, database heap size
• Parameter updates are applied to all members by default (second example below)
  • Kept consistent unless otherwise overridden
• Use the MEMBER option to override a value for a specific member:

  UPDATE DATABASE CONFIGURATION FOR <database name> MEMBER <member num> USING <parameter name> <parameter value>

• Can still apply a change to all members at the same time:

  UPDATE DATABASE CONFIGURATION FOR <database name> USING <parameter name> <parameter value>

Local database configuration parameters can be set on a per-member basis (i.e. they can have different values across the different members). This allows for member-specific settings where affinitized workload tuning might be useful. For example, setting the size of the sort heap or the package cache. This is done by specifying the MEMBER option and a member number on the UPDATE DATABASE CONFIGURATION command.

By default, if the MEMBER option is not specified as part of the UPDATE DATABASE CONFIGURATION command then all members will be updated. This provides single-system view database configuration management.

Per-member database configuration parameters are stored in the local database directories and there is one configuration file per member.

Note that the GET DATABASE CONFIGURATION command returns information only for the member on which it is executed.

Page 28: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Cluster Caching Facility (CF)

• Software technology that assists in global buffer coherency management and global locking
  • Shared lineage with System z Parallel Sysplex
  • Software based
• Services provided include
  • Group Bufferpool (GBP)
  • Global Lock Management (GLM)
  • Shared Communication Area (SCA)
• Members duplex GBP, GLM, and SCA state to both a primary and secondary CF
  • Done synchronously
  • Set up automatically, by default
  • Having a secondary CF is optional (but recommended)

[Diagram: members connected to a primary and secondary cluster caching facility (CF); inside the CF are the GBP, GLM, SCA, and the CF worker threads.]

This slide describes what a CF (Cluster Caching Facility) is in a DB2 pureScale cluster. Its primary responsibilities are to manage locking across the cluster (via the GLM) and to manage data page access across the members in the cluster (via the GBP). But it has various other roles and responsibilities as well.

It is recommended (especially for production systems) to have two CFs defined (a primary and a secondary) so that there is no single point of failure in case a planned or unplanned outage occurs.

Page 29: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Configuring the CF

• CF configuration is managed through traditional DB2 configuration interfaces
  • GET/UPDATE DB/DBM CFG commands and APIs
  • Parameter values applied to both primary and secondary CFs
    • CFs should be hosted on equally configured hardware/LPARs, so no need for different values
• Two configuration categories
  • CF server configuration (maintained through DBM configuration)
  • CF structure configuration (maintained through DB configuration)

• CF Server Configuration (DBM parameters)
  • CF_DIAGLEVEL: Specifies the types of diagnostic errors that will be recorded in the cfdiag.*.log file. Default: 2 (all errors)
  • CF_DIAGPATH: Fully qualified path for the cfdiag.*.log files. Default: NULL (<INSTHOME>/sqllib/db2dump/ $m)
  • CF_NUM_WORKERS: Number of worker threads started by the CF. Default: AUTOMATIC (# cores - 1)
  • CF_MEM_SZ: Controls the total amount of memory used by the CF. Default: AUTOMATIC (typically 70-90% of machine memory)
  • CF_NUM_CONNS: Initial size of the CF connection pool for each member. Default: AUTOMATIC (based on # members, workers, etc.)

The CFs are configured using the traditional DB2 configuration interfaces. There are two configuration categories: CF server and CF structure. The CF server configuration can be updated using UPDATE DBM CFG and the CF structure configuration can be updated using UPDATE DB CFG.

The CF server configuration parameters are listed on this slide.

The default value for the CF_DIAGPATH in DB2 9.8 was NULL, which meant to use the value of DIAGPATH. And by default that was "<INSTHOME>/sqllib/db2dump". In DB2 10.1, the default for both DIAGPATH and CF_DIAGPATH have changed to "<INSTHOME>/sqllib/db2dump/ $m" (which evaluates to "<INSTHOME>/sqllib/db2dump/DIAG#####" where "####" is the member/CF number). Starting in DB2 10.1, the CF diagnostic data directory path writes to a private db2diag.log for each CF by default. To revert to the behavior of previous releases, in which the diagnostic data for the CF is written to the same directory, specify CF_DIAGPATH with a pathname and no token.

The default CF_NUM_WORKERS value (if AUTOMATIC) is the number of logical CPUs (cores) – 1. Note that on Power, each hardware thread is seen to the OS as a CPU core. For very small CF configurations, recovery time performance can be helped by having 2 free hardware threads on the CF (i.e. CF_NUM_WORKERS = (logical CPUs – 2)).

The default for CF_MEM_SZ value (if AUTOMATIC) is 70%-90% of the total available memory on the CF (depends on whether CF and members co-exist).

When you set CF_NUM_CONNS to AUTOMATIC (the default), DB2 creates an initial number of CF connections for each member with each CF at start time. This initial number is based on the number of worker threads, number of connections per worker thread, and the number of members in the cluster.
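For example, updating two of the CF server parameters listed above (values are only illustrative):

  db2 update dbm cfg using CF_DIAGLEVEL 3
  db2 update dbm cfg using CF_NUM_WORKERS AUTOMATIC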

Page 30: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Configuring the CF (cont.)

• CF Structure Configuration
  • CF_DB_MEM_SZ: Total amount of memory used by the CF for the database (includes GBP + GLM + SCA). Default: AUTOMATIC (based on CF_MEM_SZ and # active databases settings)
  • CF_GBP_SZ: Amount of memory used by the Group Buffer Pool (GBP) in the CF. Default: AUTOMATIC (remainder of memory from CF_DB_MEM_SZ)
  • CF_LOCK_SZ: Amount of memory used by the Global Lock Manager (GLM) in the CF. Default: AUTOMATIC (15% of CF_DB_MEM_SZ)
  • CF_SCA_SZ: Amount of memory used by the Shared Communication Area (SCA) in the CF. Default: AUTOMATIC (5-20% of CF_DB_MEM_SZ)
  • CF_CATCHUP_TRGT: Target time in minutes for catch up to bring a newly restarted CF into peer state. Default: 15

The CF structure configuration parameters are listed on this slide. These parameters specify the size of the various structures that get created in the CF when a database is activated.

The cluster caching facility structure memory used for Group Buffer Pool (GBP), lock usage (GLM), and Shared Communication Area (SCA) is allocated for the cluster caching facility during the first database activation on any member and remains allocated until deactivation on the last member. These parameters have a default value set to AUTOMATIC. When set to AUTOMATIC, DB2 computes appropriate sizes for these parameters during database activation. Because these values are closely related and dependent on one another, manually setting at least one of the parameters causes none of the parameters to be calculated during activation even if some parameters remain set to AUTOMATIC. Their values are what the most recent automatically calculated values were.

The ONLINE option is also supported for structure parameters. Any updates to CF memory parameters are applied immediately. Update requests are synchronous and are not returned until the new value is set by the CF server.
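For example, to see what the AUTOMATIC calculations came out to and to change one of the structure sizes online (database name and the size value are only illustrative):

  db2 get db cfg for proddb show detail | grep -i "cf_"   # requires a database connection
  db2 update db cfg for proddb using CF_GBP_SZ 4194304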

Page 31: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Castout

• The process of writing dirty pages from the GBP out to disk is called castout
  • Similar in concept to page cleaning
• Two purposes
  • Maintain a specific recovery window by ensuring that no pages in the GBP are older than a certain age
  • Keep the GBP from getting full, allowing free space for new pages being stored there
• Page writes are not performed by the CFs directly
  • Members are sent pages to castout from the CF via RDMA (into private buffers on the members)
  • Specialized page cleaner threads running on the members write the pages out to disk
• Configuring castout:
  • NUM_IOCLEANERS: Number of castout page cleaner threads per member (default is AUTOMATIC)
  • PAGE_AGE_TRGT_GCR: Age of pages in the GBP before castout to disk (default 240 seconds)

[Diagram: pages flow from the GBP in the CF into local castout buffers on each member, where castout page cleaner threads write them out to disk.]

Castout is the process of writing dirty pages from the GBP out to disk. If you're familiar with page cleaning in non-pureScale DB2 where the page cleaners write out dirty pages from the buffer pool(s) to disk then it's easy to understand what castout is. However, rather than cleaning the pages out to disk from the local buffer pools (which is still done via page cleaning in pureScale), castout is the writing of the pages from the GBP to disk.

Castout is important for two reasons: 1) To write out dirty pages to disk and to ensure there are enough clean directory entries and data elements in the GBP to use for new page registrations and writes. 2) To maintain a specific recovery window (for Group Restart) by ensuring that no pages in the GBP are older than a certain age. This reduces the number of log records that must be replayed in the case of a Group Crash Recovery during Group Restart processing.

The page writes are not actually performed by the CFs directly. Instead, pages are passed from the CF to the members via RDMA and it is the members that do the write I/O. The local buffer pools are not used during this process. Instead, dedicated memory buffers within the members are used. Special page cleaner threads called "castout engines" are used to do the I/O.

The number of threads used in castout per member is determined by the NUM_IOCLEANERS database configuration parameter. When this is set to a value (or DB2 derives a value for AUTOMATIC), that many regular page cleaners and that many castout page cleaners are created at database activation time.

PAGE_AGE_TRGT_GCR will be covered in more detail later.
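In the meantime, a sketch of setting both castout-related parameters (database name and values are only illustrative):

  db2 update db cfg for proddb using NUM_IOCLEANERS AUTOMATIC
  db2 update db cfg for proddb using PAGE_AGE_TRGT_GCR 120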

Page 32: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Backup, Logging and Recovery


<no speaker notes>

Page 33: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


pureScale Recovery Basics

• Data is shared, but each member maintains its own set of logs
  • Commonly referred to as a log stream
• Logs must be on the clustered file system
• Members only write to their own log stream, but can read from others
  • For example: during merged log recovery
• Failures may require member recovery or group recovery for the database
• Single system view backup, restore, and rollforward

[Diagram: three DB2 members, each with its own logs, sharing a single copy of the data on a clustered file system.]

In a pureScale database there is a single partition of data that each member has access to and can modify. However, each member has its own dedicated set of log files, commonly referred to as a log stream. Like the data in the database, the logs must reside on a GPFS clustered file system – and it’s recommended that the logs be on their own file system, separate from the data. Each member will only ever write to its own set of log files but it may need to read the logs from other members. For instance, during a database rollforward or group crash recovery where log merging takes place, the log files from all members are read by a single member to perform the operation (log merging will be described in more detail later on in this presentation).

Different types of failures can occur while a pureScale database is up and running (e.g. power outage to the cluster, hardware failure on one machine). Recovery from failures may involve just recovering and replaying through the logs of one member or of all the members in the cluster. This too will be discussed in more detail later on.

Unlike in DPF where commands like BACKUP and RESTORE are performed on individual nodes (database partitions), in pureScale you just execute these commands from any member and work is performed against the entire database.

Page 34: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

34

#IDUG

Backup and Restore

• No differences in command syntax or usage in pureScale
  • Continue to utilize autonomic settings for buffers and parallelism
  • Can continue to take advantage of compression, including logs, etc.

• Single system view: backup and restore operations are executed on and performed by a single member
  • Unlike DPF where backup needs to be done on each database partition (remember, there’s only one data partition in pureScale)

• BACKUP command can be executed from any member
  • Can be as simple as: db2 backup database proddb

• RESTORE of an image can be done on any member
  • Can be as simple as: db2 restore database proddb on …
  • Can set UTIL_HEAP_SZ by member if dedicated backup member

• 3rd party backup products require that backups/log archives can be retrieved by any host in the cluster
  • Use proxy nodes with TSM

34

There are no differences in the command syntax of BACKUP and RESTORE or in how they are used between non-pureScale DB2 and DB2 with pureScale. If you have experience running these commands outside of pureScale then you’ll know how to use them in pureScale.

If you are familiar with backup and restore in a DPF environment, you will know that you have to backup each database partition (although we make this easy through the “single system view” feature of backup by allowing you to backup all partitions using a single command – but a backup image is in fact generated for each of the partitions). In pureScale, you only have to run the BACKUP command on one member (any of them) and the resulting image will contain everything for that database. And if you ever have to restore the database, you can execute the RESTORE command on any of the members as well… it doesn’t have to be the one where the backup image was generated.
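As a minimal illustration (a sketch only; PRODDB, the target path, and the timestamp are placeholders rather than values from the slides), the syntax is the same as outside of pureScale:

db2 backup database proddb online to /backups compress include logs
db2 list history backup all for proddb
db2 restore database proddb from /backups taken at 20140307183200

The image produced by the first command can be restored on any member, as described above.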

Page 35: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

35

#IDUG

Backup and Restore (cont.)

• Single backup image gets generated for the database and includes
  • Description of members (topology) at time of backup
  • Global/cluster metadata files (e.g. recovery history file, global config files)
  • Member-specific metadata files for every member (e.g. member LFH files, local config files)
  • Log files from each active member for online backups
  • Table space data

• Only one copy/partition of data exists, regardless of the number of members

[Diagram: two DB2 members, each with its own local configuration and logs, plus the global configuration and shared data; BACKUP DATABASE TESTDB produces a single image, e.g. TESTDB.0.db2inst.DBPART000.20120922191905.001]

35

The backup image that gets generated contains everything associated with the database including the data itself, a description of the topology of the database, global/cluster metadata files, and per-member metadata files. It doesn’t matter on which member the database is backed up or restored since we have everything we need within the backup image.

Page 36: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

36

#IDUG

Topology-Changing Backup and Restore

• Backup and restore between topologies with differing numbers of members

[Diagram: an online backup of a 2-member instance (Member 0, Member 1, two CFs) restored to a superset of members (a 4-member instance), and a backup of a 4-member instance restored to a subset of members (a 3-member instance)]

36

You can restore a pureScale database backup to a different number of members. Also, you can restore a non-DB2 pureScale backup image to a DB2 pureScale instance (and vice-versa). The next slide covers the latter.

All of this applies to snapshot backups as well (keeping in mind that snapshot backups are typically online, but can be taken offline).

Page 37: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

37

#IDUG

Backup and Restore To/From pureScale

• Backup and restore from pureScale to non-pureScale (and vice-versa)

[Diagram: a backup of a 4-member pureScale instance restored to a non-pureScale instance, and a backup of a non-pureScale instance restored to a 3-member pureScale instance]

37

You can also restore a non-DB2 pureScale backup image to a DB2 pureScale instance (and vice-versa). In the case of restoring from non-DB2 pureScale to DB2 pureScale, per the DB2 pureScale prerequisites the database must be using automatic storage for all of the table spaces (restore will fail otherwise). The target DB2 pureScale storage must be on GPFS but it does not matter what kind of file system was being used on the original non-pureScale source system.

The top example on this slide shows a situation where we are moving a database from a pureScale instance to a non-pureScale instance. The bottom example shows the reverse of this.

Page 38: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

38

#IDUG

Set Write Suspend and Snapshot Backups

• SET WRITE SUSPEND and SET WRITE RESUME
  • Suspends and resumes DB2 writes to the database
  • Executed on one member, writes are suspended across all members

• Can use snapshot as a backup image, clone, or standby database
  • Works in conjunction with db2inidb command

• Additional GPFS and db2cluster steps are required on top of SET WRITE SUSPEND/RESUME
  • See Information Center for full list of steps

• State available through database configuration parameter: SUSPEND_IO
  • Values are YES, NO, or IN_PROGRESS

• History file record is not generated for backup

38

The SET WRITE SUSPEND and SET WRITE RESUME commands are used as part of the process for taking snapshot (a.k.a. split mirror or flash copy) backups of the database. When the SET WRITE SUSPEND command is issued on a member it distributes the request to all of the members in the cluster, stopping writes to the database across all of those members. SET WRITE RESUME works in a similar way to reverse the process, allowing writes to be done against the database again.

In a pureScale environment there are other steps that must be followed to perform a snapshot backup and subsequently use that copy of the database for the purposes of a clone, standby image, or as a backup. These steps are documented in various sections in the Information Center.

In older versions of DB2, you could tell if writes were suspended by looking at table space states. Now, there is a database configuration parameter called SUSPEND_IO. This is an informational parameter (i.e. it cannot be explicitly set using UPDATE DB CFG) and will show one of YES, NO, or IN_PROGRESS.
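For reference, a rough sketch of the manual flow that SUSPEND_IO reports on (PRODDB is a placeholder; the storage-level snapshot step is entirely vendor-specific, and in pureScale the additional GPFS and db2cluster steps noted on the slide are still required):

db2 connect to proddb
db2 set write suspend for database

   ... take the storage-level snapshot here using your storage vendor's tooling ...

db2 set write resume for database

On the copy of the database created from the snapshot, the db2inidb command (e.g. db2inidb proddb as standby) is then used to initialize it as a snapshot, standby, or mirror image.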

Page 39: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

39

#IDUG

BACKUP DATABASE PRODDATA USE SNAPSHOT SCRIPT '/scripts/snapshot.sh'

RESTORE DATABASE PRODDATA USE SNAPSHOT SCRIPT '/scripts/snapshot.sh' TAKEN AT 20140307183200

Snapshot Backup Scripts

• Allows for integrated snapshot backup capabilities for those storage devices not supported by DB2 Advanced Copy Services (ACS)

• Custom script implements the DB2 ACS API
  • Users or storage vendors can write their own scripts
  • Write operations to the database are automatically suspended and resumed by DB2 during the backup process

• Benefits include
  • Wider storage support
  • Avoids need for manual snapshot backup process in pureScale
    • Manually running SET WRITE SUSPEND, SET WRITE RESUME, db2inidb, and storage vendor commands can be error prone
  • History file record is generated

39

If you were performing a snapshot operation in DB2 10.1, you either had to use storage hardware that provided a vendor library that supported the DB2 ACS API (for non-pureScale only), or you had to write your own script -- which included having to suspend and resume writes to the database and call the underlying storage commands to take the snapshot. There are some drawbacks to writing a script like this. For instance, they can be difficult and error-prone to write, especially in regards to suspending and resuming database operations. Also, they do not generate a history file entry, so you cannot monitor the progress and success of the snapshot operation.

In DB2 10.5, these trade-offs have been eliminated. DB2 now provides a DB2 ACS library that wraps the DB2 ACS API and invokes a custom script to perform the snapshot operation. DB2 takes over the error-prone actions like issuing the SET WRITE SUSPEND, SET WRITE RESUME, and db2inidb commands at the correct time. At the same time, because the DB2 ACS API is being used as part of a true DB2 backup operation, an entry is made in the recovery history file for every snapshot operation, allowing you to monitor successful and unsuccessful backups.

We document the API requirements within the Information Center and it is possible for DB2 users to write their own scripts. However, it is suggested that people reach out to their storage vendor to have them write and provide a script instead.

Page 40: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

40

#IDUG

Logging Configuration Parameters

• In pureScale, each member maintains its own set of log files

• Most logging related parameters are global in scope
  • The following parameters have member (local) scope:
    • BLK_LOG_DSK_FUL, LOGBUFSZ, MAX_LOG, NUM_LOG_SPAN

• Log paths are global but each member has its own subdirectory within it
  • <logPath>/NODE0000/LOGSTREAM####
  • Applies to the log path, mirror log path, and overflow log path

• Default log path is in the global database directory
  • <dbPath>/<instance>/NODE0000/SQL#####/LOGSTREAM####

40

In a pureScale cluster, there is a single partition of data but each member maintains its own set of log files.

Most of the logging related parameters are global in scope. This means that you cannot set them to different values on different members. However, there are a few (shown above) that have member scope and can be updated to different values across members (UPDATE DB CFG FOR <dbName> MEMBER <#> …).
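For instance (a sketch; PRODDB and the values are only illustrative), LOGBUFSZ could be set globally and then overridden on a single member:

db2 update db cfg for proddb using LOGBUFSZ 1024
db2 update db cfg for proddb member 1 using LOGBUFSZ 2048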

As previously mentioned, each member has its own set of log files (i.e. log stream). You specify a single log path for the cluster but a subdirectory is created under that log path for each log stream. The same is true for the mirror log path and the overflow log path configuration parameters.

Prior to pureScale, the default log path was <dbPath>/<instance>/SQL#####/SQLOGDIR. This was changed in 10.1 to what is shown on the slide (even for non-pureScale instances of DB2).

Page 41: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

41

#IDUG

Log File Management

• Log files are archived independently on each member

• Archive targets have member-specific directories/name space
  • <ArchPath>/<instance>/<databaseName>/NODE0000/LOGSTREAM####

• Member performing a log merge operation (e.g. rollforward) retrieves logs from all members as needed

[Diagram: Member 0 and Member 1 each archive their own log stream from the log path to the archive path]

LOG PATH (/logPath):
  /logPath/NODE0000/LOGSTREAM0000/   S0000000.LOG, S0000001.LOG, S0000002.LOG, ...
  /logPath/NODE0000/LOGSTREAM0001/   S0000000.LOG, S0000001.LOG, S0000002.LOG, ...

ARCHIVE (/archivePath):
  /archivePath/<instance>/<databaseName>/NODE0000/LOGSTREAM0000/C0000000/   S0000000.LOG, S0000001.LOG, S0000002.LOG
  /archivePath/<instance>/<databaseName>/NODE0000/LOGSTREAM0001/C0000000/   S0000000.LOG, S0000001.LOG, S0000002.LOG

41

In a pureScale cluster, there is a single partition of data but each member maintains its own set of log files. Assuming that log archiving is enabled, as log files are filled on a member, they are archived independently of what is happening on the other members. If the archive location is a disk location then there will be a directory per log stream (LOGSTREAM####). In the case of an archive location like TSM, there is a database partition number that makes up part of the name space and in pureScale this value represents the log stream (member) number.

For log merge operations (such as rollforward), when reading archived log files that are owned by other members, a member might need to retrieve log files into its own log path or overflow log path. A set of subdirectories is created in the member’s log path for retrieved log files.
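For example (a sketch; the path simply mirrors the /archivePath used in the diagram), disk-based log archiving is enabled with a single, global setting, and each member then archives its own stream into its LOGSTREAM#### subdirectory:

db2 update db cfg for proddb using LOGARCHMETH1 DISK:/archivePath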

Page 42: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

42

#IDUG

Member vs. Group Crash Recovery

• Two types of crash recovery (determined by DB2, based on state of cluster)

• Member crash recovery (MCR)
  • When one or more members fail with at least one CF remaining available
  • Only requires one log stream (per recovering member) to perform recovery
  • All data available during MCR (except in-flight data)

• Group crash recovery (GCR)
  • Simultaneous failure on both CFs
  • Similar to crash recovery without pureScale except that log streams are merged
  • Database will open for connections when recovery completes

[Diagrams: member and CF failure scenarios — one or more db2sysc (member) failures with at least one CF (primary or secondary) still available lead to member crash recovery; the simultaneous loss of both CFs leads to group crash recovery across all log streams]

42

This slide summarizes the differences between the two types of crash recovery in DB2 pureScale: member crash recovery (MCR) and group crash recovery (GCR).

MCR occurs when one or more members fail with at least one CF remaining available. It only requires reading and recovering through one log stream (per member requiring recovery). The database is online and accessible on other members. All data is available on those other members except for in-flight data (data that was in the process of being inserted/updated/deleted when the member went down).

GCR occurs when there are simultaneous failures of both CFs. This is similar to crash recovery in non-pureScale environments in that the database is offline and inaccessible. However, in pureScale all of the log streams are read and merged before replay occurs. Once this is done the database is open for business. Note that for a GCR to occur, it requires a double failure – both CFs (which should be on different hosts) coming down – and this should be a very rare occurrence.

Page 43: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

43

#IDUG

Tuning Crash Recovery in pureScale

• Force at Commit protocol means that pages are typically being "persisted" to GBP much more quickly than to disk in non-pureScale

• Member crash recovery typically very fast as a result

• Group crash recovery impacted by rate of castout (page cleaning) from GBP to disk

• SOFTMAX database configuration parameter deprecated in DB2 10.5
  • Replaced by PAGE_AGE_TRGT_MCR and PAGE_AGE_TRGT_GCR
  • SOFTMAX=0 means use new parameters (set by default for new databases)

• PAGE_AGE_TRGT_MCR
  • Target duration (in seconds) for changed pages to be kept in the local buffer pool before being persisted to disk or to the group buffer pool (GBP)

• PAGE_AGE_TRGT_GCR
  • Target duration (in seconds) for changed pages to be kept in the GBP before being persisted (castout) to disk

43

Member crash recovery requires reading through the log files for the member that failed, redoing log records in the member's recovery window to ensure that committed changes persisted, and undoing log records to rollback transactions that were still running at the time of failure. Member crash recovery is typically very fast. One of the reasons for this is pureScale's "force at commit" protocol. When transactions are executing on a member, pages are being read into the local buffer pool and modifications are made locally. When a transaction commits, all of the pages that were modified by the transaction get sent to the GBP (on both the primary and secondary CF). This means that during redo processing of member crash recovery, for all transactions that have committed we are going to find that the updates have already been made and there is no need to actually redo the work. Plus, we are quite likely going to find the pages we need to look at in the GBP and the member can get them very quickly via RDMA – which is much faster than reading pages from disk. For both of these reasons, member crash recovery is typically very fast and there's not a lot of configuration needed to control it (for longer transactions, PAGE_AGE_TRGT_MCR comes into play more – see below).

Externally, a group crash recovery is like crash recovery in non-pureScale in that it redoes all committed transaction updates that have not yet been written to disk (remember, if we're doing a GCR then we've lost the contents of the GBP and that’s why GCR is needed) and it undoes all of the work associated with in-flight (uncommitted) transactions. Internally, though, there are a few differences. For one, the logs from all of the members’ log streams are merged into a single logical log stream that is replayed during the redo phase (this merging takes place in memory and does not get written to disk). And at run-time, while individual members maintain their own MinBuffLSN values, there is a cluster-wide global MinBuffLSN that is also maintained. Also, there is a concept of a cluster-wide current LSN, which is essentially the highest LSN that has been consumed across the cluster. These values can then be used to determine the range of log records in the merged log stream that corresponds to pages that haven’t been written out to disk yet (i.e. dirty pages that are currently sitting in the GBP or in local buffer pools). In pureScale, it is this range within the merged log stream that gets compared to SOFTMAX (deprecated) / PAGE_AGE_TRGT_GCR to determine when to persist the changes to disk.

When the global MinBuffLSN falls outside of that SOFTMAX / PAGE_AGE_TRGT_GCR range, the old pages in the GBP need to be written out to disk. This is performed by castout threads (also known as castout page cleaners or castout engines) running on the members. They read the old pages from the CF and write them out to disk.

PAGE_AGE_TRGT_MCR configures the target duration (in seconds) for changed pages to be kept in a local buffer pool before being persisted to disk or to the group buffer pool (GBP) (via page cleaning). PAGE_AGE_TRGT_MCR applies to non-pureScale DB2 as well. In pureScale, the default value for PAGE_AGE_TRGT_MCR is 120 seconds.

PAGE_AGE_TRGT_GCR configures the target duration (in seconds) for changed pages to be kept in the GBP before being persisted (castout) to disk. PAGE_AGE_TRGT_GCR is applicable to pureScale only. The default value for PAGE_AGE_TRGT_GCR is 240 seconds.
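A quick way to check which scheme a database is using and adjust the targets (a sketch; PRODDB is a placeholder, and the grep simply filters the GET DB CFG output):

db2 get db cfg for proddb | grep -iE "SOFTMAX|PAGE_AGE_TRGT"
db2 update db cfg for proddb using PAGE_AGE_TRGT_MCR 120 PAGE_AGE_TRGT_GCR 240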

Page 44: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

44

#IDUG


Rollforward

• ROLLFORWARD DATABASE command can be executed on any member
  • All processing performed by that single member
  • An interrupted or stopped rollforward can be started again from any member
  • Applies to both database rollforward and table space rollforward

• Log files retrieved from archive location and merged for replay purposes

[Diagram: ROLLFORWARD DATABASE issued on one member; that member retrieves the logs for Log Stream 0 and Log Stream 1 from the archive location and performs redo against the shared database]

44

Database and table space-level rollforward are both supported.

When issuing a ROLLFORWARD DATABASE command it can be done from any member. All processing will be performed by that single member and if it happens to get interrupted or stopped then it can be started again from any other member (or the same one, it doesn’t matter).

Logs are merged for replay purposes to ensure that all of the work done across the cluster is replayed and in the correct order. Log files will be retrieved from the archive location if they are not local.
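A minimal example (PRODDB is a placeholder); the command is identical to non-pureScale usage and can be issued from any member:

db2 rollforward database proddb to end of logs and complete
db2 rollforward database proddb query status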

Page 45: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

45

#IDUG

Rollforward (cont.)

• Rollforward to point-in-time, end-of-logs, or end-of-backup
  • Point-in-time operation stops when it encounters the first log record from any of the log streams whose timestamp is greater than the specified time stamp
  • Important that member clocks are synchronized as close as possible

• For table space rollforward, point-in-time must be greater than or equal to the minimum recovery time (MRT) for the table space

• Rollforward status shows logs replayed per member (log stream)

Input database alias                    = TESTDB
Number of members have returned status  = 3

Member  Rollforward  Next log      Log files processed        Last committed transaction
ID      status       to be read
------  -----------  ------------  -------------------------  ------------------------------
     0  DB working   S0000014.LOG  S0000005.LOG-S0000013.LOG  2014-03-12-14.39.23.000000 UTC
     1  DB working   S0000004.LOG  S0000002.LOG-S0000003.LOG  2014-03-12-14.39.18.000000 UTC
     2  DB working   S0000011.LOG  S0000001.LOG-S0000010.LOG  2014-03-12-14.39.25.000000 UTC

45

You can rollforward a database to a point-in-time, to end-of-logs, or to end-of-backup. A point-in-time operation will stop when it encounters the first log record from any of the log streams whose timestamp is greater than the specified timestamp. The caveat here is that the member clocks should be synchronized as closely as possible across the cluster. If one of the members is significantly ahead of the rest, it’s possible that you might not get as far as you would like. The use of NTP (Network Time Protocol) with pureScale ensures that the machine clocks are kept in very close sync across the cluster.

When a rollforward completes, or when querying the rollforward status of a database, you will see the status from each member log stream. This includes the log files processed and the last committed transaction encountered. This is similar to what is shown in a DPF environment when rolling forward multiple database partitions.

Page 46: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

46

#IDUG

Storage Management

46

<no speaker notes>

Page 47: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

47

#IDUG

IBM General Parallel File System (GPFS) and pureScale

• GPFS is a scalable, highly-available, high performance filesystem optimized for multi-petabyte storage management

• Shipped with, installed and configured as part of pureScale
  • DB2 pureScale license includes GPFS for use with cluster

• GPFS provides concurrent access from all hosts in the instance to instance meta data and database files on disk
  • Provision shared LUNs from storage administrator
  • System administrator changes ownership of LUNs to DB2 instance owner
  • Instance owner can create GPFS file systems with the storage LUNs

• Best practice:
  • One (or more) GPFS file system for automatic storage and table spaces
  • One GPFS file system for database metadata (database path) and log files

• db2cluster -cfs command used for typical GPFS management activities

47

The IBM General Parallel File System – or GPFS – is a scalable, highly-available, high performance file system optimized for multi-petabyte storage management. It has historically been used extensively in high performance computing (HPC) environments. As previously mentioned, it is shipped with, installed, configured, and updated as part of pureScale.

GPFS provides concurrent access from all of the hosts in the instance to the instance’s meta data files (sqllib_shared) and to the database’s files (storage paths, table space containers, and logs).

Physical disks within a disk subsystem are not usually directly accessible by host systems, such as DB2 database servers, and they are not directly visible to DBAs. Storage administrators provision units of storage as logical unit numbers (LUNs), which appear to host systems as SCSI disks. A LUN, however, is a completely virtual entity that can map to any combination of physical disks. An individual LUN can be a single RAID array, a portion of a RAID array, a single physical disk, a portion of a disk, or a combination of multiple RAID arrays.

In pureScale, these LUNs must be made visible to all of the hosts in the cluster so that the GPFS file systems that are created on them can be visible to all hosts.

To start with, these shared LUNs must be created. You will need to ask the storage administrator to do something called LUN masking or LUN zoning and make these LUNs visible to all of the hosts in the instance (unlike in ESE where only one host uses a particular LUN).

The file systems are then created on these LUNs and they are mounted so that they are equally visible on all of the hosts.

GPFS has its own volume manager and so it does not use the logical volume manager on AIX. Therefore, the system administrator does not need to create logical volumes. Instead, GPFS consumes these LUNs directly.

However, the system administrator does have to change the ownership of the LUNs/devices (e.g. /dev/hdisk1) to the instance owner ID. Once that is done, the DBA can use the db2cluster command to create the file systems, mount them, and so on.
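As a sketch (the file system name, mount point, and device names here are hypothetical), creating, mounting, and listing a shared file system with db2cluster might look like:

db2cluster -cfs -create -filesystem db2data -disk /dev/hdisk5 -mount /db2fs/db2data
db2cluster -cfs -mount -filesystem db2data
db2cluster -cfs -list -filesystem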

Some enterprises might not actually be comfortable doing this. In their opinion, anything that smells like a file system is something that needs to be handled by a system administrator. That is fine. The system administrator doesn’t have to change the permissions on the LUNs to create the file system. He will just have to create it as root or a super user ID using either GPFS commands or the db2cluster command. Finally, he can then change the permissions on the file system to that of the instance owner so that it can be used by DB2.

Page 48: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

48

#IDUG

Instance Directory Structure

Local install directory on each host (e.g. /opt/IBM/db2/V10.5):
    .../bin/
    .../bnd/
    .../lib64/
    ...

Member/local sqllib directories (one per host):
    ../sqllib/bin/
    ../sqllib/bnd/
    ../sqllib/lib64/
    ...
    ../sqllib/ctrl/
    ../sqllib/db2dump/
    ../sqllib/db2nodes.cfg
    ../sqllib/db2systm
    ../sqllib/function/
    ...
    ../sqllib/adm/
    ...

Shared directory on the cluster file system (<shared file system>/<instance>/):
    ../sqllib_shared/ctrl/
    ../sqllib_shared/db2dump/
    ../sqllib_shared/db2nodes.cfg
    ../sqllib_shared/db2systm
    ../sqllib_shared/function/
    ...

48

This is the instance directory structure (a.k.a. the “sqllib” directory structure). This is just showing some of the main subdirectories and files that are found in there. If you do an “ls -l” command in this directory you will see a lot more.

First off, you will notice that it looks an awful lot like the directory structure in non-pureScale instances of DB2. In those cases, you have some directories and files that are local within the sqllib directory and some that point back to the installation directory. In pureScale, each member has its own sqllib directory and there is a combination of local directories and files, links that point back to the local installation directory (remember that the pureScale binaries get installed locally on each host as part of the installation process), and links that point to the sqllib_shared directory which gets created on the cluster shared file system (these are common across all hosts in the cluster).

Having a structure like this provides many benefits. Common information in the sqllib_shared directory provides for single-system view instance management but having local binaries and libraries allows for things like rolling fix pack updates.

Page 49: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

49

#IDUG

Database Directory Layout

• Database directory resides in <dbpath>/<instance_name>/NODE0000

Partition-global directory (<dbpath>/<instance_name>/NODE0000):

    sqldbdir/
    SQL00001/                 <-- database token
        db2event/             <-- global event monitors
        db2rhist.asc/bak      <-- history files
        LOGSTREAM####/        <-- log directory (one per member)
        SQLOGCTL.GLFH.1,2     <-- global LFH files
        SQLDBCONF             <-- global configuration file
        SQLSGF.1,2            <-- storage group control files
        SQLSPCS.1,2           <-- table space control files
        MEMBER0000
        MEMBER0001
        MEMBER0002
        ...
    SQL00002/
        ...
    SQL0000X/
        ...
    DBNAME1T#######/          <-- AS table space containers for DBNAME1
    DBNAME2T#######/          <-- AS table space containers for DBNAME2
    ...

Member-local directory (MEMBER####/, one per member):

    db2event/                 <-- local event monitors
    SQLDBCONF                 <-- local DB config
    SQLBP.1,2                 <-- local BP config
    SQLOGCTL.LFH.1,2          <-- local LFH files
    SQLOGMIR.LFH              <-- mirror LFH file
    SQLINSLK, SQLTMPLK        <-- lock files

49

The layout of the database directory structure was changed to accommodate multi-member pureScale databases and this applies to non-pureScale DB2 as well.

There is a partition-global directory that contains metadata files that are global to the database and this is found at <dbpath>/<instance_name>/NODE0000/SQL##### (remember that there is just a single database partition in a pureScale database and it is called partition 0 – hence the files are found in NODE0000). There is an SQL##### directory for each database in the instance (where the number is assigned when the database is created).

There are also member-specific metadata files. These are found within the MEMBER#### directories under <dbpath>/<instance_name>/NODE0000/SQL#####. Each member has a corresponding MEMBER#### directory (e.g. member 0 is in MEMBER0000). The instance database lock files, SQLINSLK and SQLTMPLK, help to ensure that a database is used by only one instance of the database manager on each member.

Page 50: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

50

#IDUG

Storage Management

• Storage management in pureScale is mainly performed through GPFS
  • E.g. add storage to the database by adding disks to existing file system(s)

• db2cluster -cfs is used to perform cluster file system management operations

• Supported options
  • create: Create a shared file system
  • add: Add disks to existing shared file system
  • remove: Remove disks from existing shared file system
  • delete: Delete shared file system
  • set: Set various configuration options
  • list: Returns details about configuration and tie-breaker disk
  • verify: Verifies configuration of file system cluster
  • mount/unmount: Mounts/unmounts a file system
  • rebalance: Restripes the data on disk across all disks in the file system
  • enter/exit -maintenance: Puts host into/takes host out of maintenance mode

50

In pureScale, storage management is mainly performed through GPFS. For instance, you would typically create a GPFS file system (or more than one) to place a database on. If you require additional space to be added to the database then you wouldn't create new file systems and add them. Instead, you would add new storage into the existing file system(s).

But rather than making you learn GPFS commands, you can use DB2's db2cluster command with the –cfs option (there is also a –cm option used to manage the cluster manager).

The options of the db2cluster command that you can use depend on your authorization level. Some options can only be specified by the DB2 cluster services administrator, others can only be specified if you are part of the SYSADM, SYSCTL or SYSMAINT group, and a smaller subset of commands can be run by any user ID on the system.
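For example (a sketch; db2sddata matches the file system shown on the next slide, and /dev/hdisk6 is a hypothetical new LUN), adding storage to an existing file system and then restriping the data would look something like:

db2cluster -cfs -add -filesystem db2sddata -disk /dev/hdisk6
db2cluster -cfs -rebalance -filesystem db2sddata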

Page 51: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

51

#IDUG

Example Commands

• List all file systems in cluster

$> db2cluster -cfs -list -filesystem

FILE SYSTEM NAME                  MOUNT_POINT
--------------------------------- -------------------------
db2fs1                            /db2sd_20140112025840
db2sddata                         /db2sdfs/db2sd_data
db2sdlogs                         /db2sdfs/db2sd_log

• List disks associated with a specific file system

$> db2cluster -cfs -list -filesystem db2sddata -disk

PATH ON LOCAL HOST                OTHER KNOWN PATHS
--------------------------------- -------------------------
/dev/hdisk5

• List configuration of a specific file system

$> db2cluster -cfs -list -filesystem db2sddata -configuration

OPTION                VALUE
blockSize             1048576
defaultMountPoint     /db2sdfs/db2sd_data
…

51

These are examples of the “db2cluster –cfs –list” command.

The first example shows the command to list all of the file systems in the cluster. The second example shows all of the disks associated with the db2sddata file system. The third example shows the configuration of the db2sddata file system.

Page 52: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

52

#IDUG

Monitoring

52

<no speaker notes>

Page 53: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

53

#IDUG

Instance and Host Status

db2nodes.cfg:

0 host0 0 - MEMBER
1 host1 0 - MEMBER
2 host2 0 - MEMBER
3 host3 0 - MEMBER
4 host4 0 - CF
5 host5 0 - CF

> db2start
08/24/2008 00:52:59     0   0   SQL1063N  DB2START processing was successful.
08/24/2008 00:53:00     1   0   SQL1063N  DB2START processing was successful.
08/24/2008 00:53:01     2   0   SQL1063N  DB2START processing was successful.
08/24/2008 00:53:01     3   0   SQL1063N  DB2START processing was successful.
SQL1063N  DB2START processing was successful.

> db2instance -list

Instance status:

ID  TYPE    STATE    HOME_HOST  CURRENT_HOST  ALERT
0   MEMBER  STARTED  host0      host0         NO
1   MEMBER  STARTED  host1      host1         NO
2   MEMBER  STARTED  host2      host2         NO
3   MEMBER  STARTED  host3      host3         NO
4   CF      PRIMARY  host4      host4         NO
5   CF      PEER     host5      host5         NO

Host status:

HOST_NAME  STATE   INSTANCE_STOPPED  ALERT
host0      ACTIVE  NO                NO
host1      ACTIVE  NO                NO
host2      ACTIVE  NO                NO
host3      ACTIVE  NO                NO
host4      ACTIVE  NO                NO
host5      ACTIVE  NO                NO

[Diagram: clients connect to a single database view spanning four DB2 members (host0-host3) and two CFs (host4, host5) sharing data]

53

The db2instance –list command is the most important status reporting tool in DB2 pureScale.

The output splits status information into two sections, instance status and host status.

The instance status section tells you the state of each member and CF as well as the current host that the member or CF resides on.

The host status section tells you the state of the hosts that the members and CFs are running on.

Alert conditions (which impact the availability of a host or functionality of a member or CF) are reported through this status interface.

Page 54: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

54

#IDUG

Managing and Monitoring pureScale Using Optim Tooling

Task: System Monitoring
  Products: Optim Performance Manager, Data Studio Web Console
  pureScale support: Integrated alerting and notification; seamless view of status and statistics across all members and CFs

Task: Database Administration
  Product: Data Studio
  pureScale support: Ability to perform common administration tasks across members and CFs; integrated navigation through shared data instances

Task: Application Development
  Product: Data Studio
  pureScale support: Full support for developing Java, C, and .NET applications against a DB2 pureScale environment

Task: Query Tuning
  Product: Optim Query Workload Tuner
  pureScale support: Full support for query, statistics, and tuning advice for applications on pureScale systems

Task: Configuration Tracking and Client Management
  Product: Optim Configuration Manager
  pureScale support: Full support for tracking and reporting of configuration changes across clients and servers

Product: DB2 High Performance Unload
  pureScale support: Support for high speed unload utility

54

This slide shows the type of system management and monitoring that you can do against a pureScale system using the IBM Optim tools suite and related products.

Page 55: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

55

#IDUG

Monitoring the CF Environment

• ENV_CF_SYS_RESOURCES admin view
  • Shows system resources used by the CFs on the system
  • Includes physical memory, virtual memory, CPU load

• MON_GET_CF table function
  • Shows CF status information, including current, configured, and target sizes for CF memory and structures

• DB2_CF admin view
  • Returns information about host name, state, and whether alerts exist for the CFs
  • Similar to what you would see in the output of db2instance -list

• db2pd options available as well (e.g. -cfinfo, -cfpool)

SELECT ID, NAME, VALUE, UNIT
  FROM SYSIBMADM.ENV_CF_SYS_RESOURCES ORDER BY ID

SELECT ID, CURRENT_HOST, STATE, ALERT
  FROM SYSIBMADM.DB2_CF ORDER BY ID

SELECT ID, HOST_NAME, DB_NAME, CURRENT_CF_MEM_SIZE, CURRENT_CF_GBP_SIZE,
       CURRENT_CF_LOCK_SIZE, CURRENT_CF_SCA_SIZE
  FROM TABLE(MON_GET_CF(CAST(NULL AS INTEGER))) AS CFINFO ORDER BY ID

States include: STOPPED, RESTARTING, BECOMING_PRIMARY, PRIMARY, CATCHUP, PEER, ERROR

Here are some of the routines that can be used to monitor the CFs in the cluster, including some example queries (not all available columns are shown in these queries).

The ENV_CF_SYS_RESOURCES admin view returns a list of system resources used by the CFs on the system. For example, physical memory, virtual memory, and CPU load.

The MON_GET_CF table function shows information such as the current, configured, and target sizes for the CF and its structures (e.g. GBP, GLM, SCA). The current memory size represents the amount of a type of memory (e.g. GBP) currently in use by the system. The configured memory size represents the total amount of that type of memory that is currently configured by the database as the maximum. The value for current memory can never exceed that of configured memory. The target memory size represents a new configured maximum value for that type of memory. Usually, the target size is the same as the configured size. However, if the target and configured sizes differ, that means that that particular type of memory is undergoing an online change in its configured size. The process of allocating memory takes place over time. At any point during this resizing process, the configured memory represents the maximum amount of that type of memory that can be used at that specific point in time. Eventually, the configured memory becomes the same as target memory.

The DB2_CF admin view returns information about the host name, the state of the CF (e.g. STOPPED, RESTARTING, BECOMING_PRIMARY, PRIMARY, CATCHUP, PEER, ERROR) and whether there are any alerts for the CF. This is similar to the output that you would see in the output of the "db2instance –list" command. The DB2_GET_INSTANCE_INFO table function (not shown here) returns the same information as the DB2_MEMBER and DB2_CF administrative views, but enables you to filter the information returned by passing input parameters, such as the current host.

The db2pd –cfinfo command displays information about the CF that can be useful when diagnosing performance problems or just when generally looking at the state of the CF.

Page 56: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

56

#IDUG

Monitoring the Member Environment

• ENV_GET_SYSTEM_RESOURCES table function
  • Returns OS, CPU, memory, and other information related to members on the system
  • Only returned from members that are active

• ENV_GET_DB2_SYSTEM_RESOURCES table function
  • Returns CPU usage (user and system) for DB2 processes running under the specified members

• DB2_MEMBER admin view
  • Returns information about host name, state, and whether alerts exist for the members
  • Similar to what you would see in the output of db2instance -list

SELECT MEMBER, HOST_NAME, OS_NAME, OS_VERSION, OS_RELEASE
  FROM TABLE(SYSPROC.ENV_GET_SYSTEM_RESOURCES(-2)) ORDER BY MEMBER

SELECT MEMBER, DB2_PROCESS_NAME, DB2_PROCESS_ID, CPU_USER, CPU_SYSTEM
  FROM TABLE(SYSPROC.ENV_GET_DB2_SYSTEM_RESOURCES(-2)) ORDER BY MEMBER

SELECT ID, HOME_HOST, CURRENT_HOST, STATE, ALERT
  FROM SYSIBMADM.DB2_MEMBER ORDER BY ID

States include: STARTED, STOPPED, RESTARTING, WAITING_FOR_FAILBACK, ERROR

The ENV_GET_SYSTEM_RESOURCES table function returns operating system, CPU, memory, and other information that is related to members on the system. Data is returned only from members where the database that issued the command is active.

The ENV_GET_DB2_SYSTEM_RESOURCES table function returns CPU usage and DB2 process information for specified members in the current instance. The main process of interest will be the DB2 system controller process (db2sysc).

The DB2_MEMBER admin view returns information about the host name (the "home host" on which the member will reside when everything is healthy, and the "current host" that the member is actually running on, which will be different if things aren't healthy with this member), the state of the member (e.g. STARTED, STOPPED, RESTARTING, WAITING_FOR_FAILBACK, and ERROR) and whether there are any alerts for the member. This is similar to what you would see in the output of the "db2instance –list" command. The DB2_GET_INSTANCE_INFO table function (not shown here) returns the same information as the DB2_MEMBER and DB2_CF administrative views, but enables you to filter the information returned by passing input parameters, such as the current host.

Page 57: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

57

#IDUG

Viewing Alerts

• Existence of alerts shown in output of db2instance –list, DB2_MEMBER, and DB2_CF

• Very detailed alert messages available, including the impact and action to take

• Two options to list the alerts:

  db2cluster -cm -list -alert

  SELECT MESSAGE, ACTION, IMPACT FROM SYSIBMADM.DB2_INSTANCE_ALERTS

• Example output:

  MESSAGE
  ----------------------------------------------------------------------------------------------------
  Could not restart light DB2 member '0' on hosts 'hostA'. Check the db2diag.log for messages
  concerning a restart light or database crash recovery failure on the indicated hosts for DB2
  member '0'.

  ALERT_ACTION
  ----------------------------------------------------------------------------------------------------
  This alert must be cleared manually with the command: 'db2cluster -clear -alert -member 0'

  IMPACT
  ----------------------------------------------------------------------------------------------------
  DB2 member '0' will not be able to restart light on host 'hostC' until this alert has been cleared.

There is an ALERT column in the output of the db2instance –list command in the DB2_MEMBER and DB2_CF admin views. If a member, CF, or host has a value of YES for an alert then you can query more information about it. This can be done using the db2cluster –cm –list –alert command, or you can query from the DB2_INSTANCE_ALERTS admin view. For each alert there is a message, an alert action, and an impact. Some alerts will be cleared automatically whenever the root cause of the problem has been resolved and others may require manual intervention. The text will state what the manual action is, if any.
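For instance, using the commands named on this slide, an alert could be listed and then (if the alert action text calls for it) cleared manually for member 0:

db2cluster -cm -list -alert
db2cluster -clear -alert -member 0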

Page 58: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

58

#IDUG

Monitoring Logging

• Each member has its own log stream and therefore monitoring is typically viewed per-member and not in aggregate

• The various admin views, table functions, and procedures that return logging information allow you to query a specific member or return information for all members
  • SYSIBMADM.LOG_UTILIZATION
  • SYSIBMADM.SNAPDB
  • SYSIBMADM.SNAPDETAILLOG
  • MON_GET_UNIT_OF_WORK
  • MON_GET_WORKLOAD

58

In a pureScale database, there is a single shared copy of the database but each member has its own log stream. Monitoring is typically viewed per-member and not in aggregate (similar to DPF).

There are various administrative views, table functions, and stored procedures that are related to monitoring. All of the ones listed here report some level of logging activity. Some of these things accept a MEMBER or DBPARTITIONNUM parameter that allows you to get the information for a specific member, the currently connected to member, or all members. Others return a row per member or database partition number as part of the result set.
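As one hedged example (the column names are standard MON_GET_UNIT_OF_WORK output rather than values taken from the slides), the units of work consuming the most log space can be listed per member:

SELECT MEMBER, APPLICATION_HANDLE, UOW_ID, UOW_LOG_SPACE_USED
  FROM TABLE(MON_GET_UNIT_OF_WORK(NULL, -2)) AS T
 ORDER BY UOW_LOG_SPACE_USED DESC
 FETCH FIRST 10 ROWS ONLY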

Page 59: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

59


#IDUG

Monitoring Locking

• Monitoring locking in pureScale involves reviewing both locks held within a member and locks between members

• Higher level monitor elements report data on locking in the instance, both within and between members

  • locks_held
  • lock_wait_time
  • lock_wait_time_top
  • lock_waits
  • lock_timeouts

• Monitor elements are returned in various MON_* table functions and snapshots
  • Includes locks held, waits, time outs, escalations, deadlocks, etc.
  • MON_GET_UNIT_OF_WORK, MON_GET_CONNECTION, MON_LOCKWAITS, MON_GET_LOCKS
  • SNAPDB admin view

As with traditional DB2 environments, lock management in a pureScale environment is essential for maintaining both data integrity and high levels of concurrency. Locking across members in a DB2 pureScale environment is managed by the global lock manager (GLM) component of the cluster caching facility. Monitoring locking in a DB2 pureScale environment involves reviewing not only locks that might be held within a member, but also lock waits between members.

In a DB2 pureScale environment, the fact that different members work with the same data introduces the possibility of another type of contention for data: when two members want to update the same object. When a member needs a lock for an object, the local lock manager (LLM) component within the member works with the global lock manager (GLM): if the LLM does not already hold a lock for the object in question, the LLM requests a lock from the GLM. In this way, the GLM mediates requests for locks from different members.

When viewed at the global level for the DB2 pureScale instance, monitor elements such as locks_held, or lock_wait_time report data on all locks in the instance, both within and between members. Monitor elements added specifically for pureScale can be used to examine just the lock waits between members.

You can use the MON_GET_UNIT_OF_WORK and MON_GET_CONNECTION table functions to return information about locks held. MON_LOCKWAITS is good for understanding what lock waits are happening in the system and MON_GET_LOCKS can be used to get holder and waiter information for all of the locks.
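As a simpler starting point than the joined query on the next slide (a sketch using the same MON_GET_APPL_LOCKWAIT columns shown there), current lock waits across all members can be listed with:

SELECT LOCK_WAIT_START_TIME, LOCK_NAME,
       REQ_APPLICATION_HANDLE, REQ_MEMBER, HLD_MEMBER
  FROM TABLE(MON_GET_APPL_LOCKWAIT(NULL, -2)) AS LW
 ORDER BY LOCK_WAIT_START_TIME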

Page 60: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

60

#IDUG

Monitoring Lock Waits Across Members

• Example: Lock held by transaction on member 0 while transactions on members 1 and 2 are waiting on it

select lw.lock_wait_start_time, lw.lock_name, lw.req_application_handle, lw.req_member,
       lk.application_handle as hld_application_handle, lw.hld_member
  from table(mon_get_appl_lockwait(null,-2)) as lw,
       lateral(select lock_name, member, lock_status, application_handle
                 from table(mon_get_locks(CLOB('<lock_name>'||lw.lock_name||'</lock_name>'),
                                          lw.hld_member)) as locks
                where locks.lock_status = 'G' fetch first row only) as lk

LOCK_WAIT_START_TIME        LOCK_NAME                    REQ_APPLICATION_HANDLE  REQ_MEMBER  HLD_APPLICATION_HANDLE  HLD_MEMBER
--------------------------  ---------------------------  ----------------------  ----------  ----------------------  ----------
2011-12-15-13.13.31.625771  00030006000000000000000452                    65589           1                      87           0
2011-12-15-13.13.38.763436  00030006000000000000000452                   131151           2                      87           0

(The REQ_* columns identify the waiters on members 1 and 2; the HLD_* columns identify the holder on member 0.)

60

The example here involves an application executing a transaction on member 0 that gets a lock (e.g. updating a record). An application then executes a transaction on member 1 that tries to get the same lock (e.g. trying to update the same record). Finally, a third application then does the same thing but on member 2. The lock has been granted to the transaction on member 0 and the transactions on member 1 and member 2 are both waiting on that lock.

Page 61: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

61

#IDUG

Monitoring Page Reclaims

• As described earlier, if a member is modifying a page and another member needs that copy of the page, the second member reclaims the page from the first one

• MON_GET_PAGE_ACCESS_INFO table function returns information about page reclaims that have taken place

  • Page reclaims for exclusive (X) or shared (S) access
  • Type of object (e.g. TABLE, XML, INDEX)
  • Wait time for reclaims
  • Differentiates between initiator or target of reclaim

• Reclaim wait time also available in other monitoring table functions such as MON_GET_PKG_CACHE_STMT, MON_GET_UNIT_OF_WORK, MON_GET_WORKLOAD

SELECT TABSCHEMA, TABNAME, OBJTYPE, DATA_PARTITION_ID, IID, MEMBER,
       PAGE_RECLAIMS_X, PAGE_RECLAIMS_S,
       PAGE_RECLAIMS_INITIATED_X, PAGE_RECLAIMS_INITIATED_S,
       RECLAIM_WAIT_TIME
  FROM TABLE(MON_GET_PAGE_ACCESS_INFO('', '', -2))
 WHERE PAGE_RECLAIMS_X != 0 OR PAGE_RECLAIMS_S != 0
 ORDER BY MEMBER ASC, PAGE_RECLAIMS_X ASC

When examining where a particular application or statement is spending its time, in addition to the time spent waiting for locks, the applications, or statements running in a pureScale environment might need to wait for a page to become available when it is in use by another member. You can use page reclaiming monitor elements to view the extent to which this type of wait might be affecting throughput on your system.

To view page reclaim statistics, use the MON_GET_PAGE_ACCESS_INFO table function. This table function returns object-level information about the extent to which members both request pages currently in use by other members, and the extent to which members release those pages at the request of other members. You can also retrieve the wait times involved.

The example query on this slide shows all of the tables, indexes, or XML objects that have been involved in a page reclaim for either shared or exclusive lock requests.

Additionally, if monitoring statements, transactions, or workloads, you can see how much reclaim time is being spent as part of them. MON_GET_PKG_CACHE_STMT, MON_GET_UNIT_OF_WORK, and MON_GET_WORKLOAD all have a RECLAIM_WAIT_TIME monitor element. Using MON_GET_PKG_CACHE_STMT, for example, you can do things like show the 10 statements that are causing the highest number of pages reclaimed, along with the average wait time for each execution of each statement:

SELECT SUBSTR(STMT_TEXT,1,75) AS STMT_TEXT, NUM_EXECUTIONS, RECLAIM_WAIT_TIME,
       DEC(FLOAT(RECLAIM_WAIT_TIME)/FLOAT(NUM_EXECUTIONS),10,8) AS AVG_WAIT_PEREXEC
  FROM TABLE(MON_GET_PKG_CACHE_STMT('D',NULL,NULL,-2))
 WHERE RECLAIM_WAIT_TIME > 0
 ORDER BY AVG_WAIT_PEREXEC DESC
 FETCH FIRST 10 ROWS ONLY

Page 62: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

62

#IDUG

Monitoring CF Command Execution Time

• MON_GET_CF_CMD table function
  • Returns information about the processing times for CF commands (per CF, not per member)
  • Commands include: ReadAndRegister, WriteAndRegister, SetLockState, CrossInvalidate

• MON_GET_CF_WAIT_TIME table function
  • Returns the total time measurement for CF commands (one row per member)
  • Includes the network transport time to and from the CF and the execution time on the CF

SELECT ID, HOSTNAME, CF_CMD_NAME, TOTAL_CF_REQUESTS, TOTAL_CF_CMD_TIME_MICRO
  FROM TABLE(MON_GET_CF_CMD(NULL)) AS CFINFO ORDER BY ID, TOTAL_CF_REQUESTS

SELECT ID, MEMBER, HOSTNAME, CF_CMD_NAME, TOTAL_CF_REQUESTS, TOTAL_CF_CMD_TIME_MICRO
  FROM TABLE(MON_GET_CF_WAIT_TIME(-2)) AS CFINFO ORDER BY MEMBER, ID

[Diagram: (1) the member sends a command request to the CF, (2) a CF worker thread executes the command, (3) the CF responds to the member; MON_GET_CF_CMD time covers step 2 only, while MON_GET_CF_WAIT_TIME time covers steps 1 through 3]

The MON_GET_CF_CMD table function returns information about the processing times for CF commands. Note that this is reporting information for each CF without regard for which member requested it. In other words, this is purely keeping track of the work that each CF is doing. There are dozens of different commands that the member may reach out to the CF to do. A few of them are listed on the slide.

The MON_GET_CF_WAIT_TIME table function returns the total time measurement for cluster caching facility (CF) commands. This time includes the network transport time to and from the CF and the execution time of the command within the CF. In this case, the metrics are reporting things based on a member-by-member basis so you can tell how long each member is waiting on the different commands (and these are the same commands that are reported on in MON_GET_CF_CMD).

The diagram on this slide is intended to show what times are being included for each of these two functions. The time reported in MON_GET_CF_CMD is for #2 only. The time reported in MON_GET_CF_WAIT_TIME is for all of the steps here, #1 - #3.

Page 63: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

63

#IDUG

Buffer Pool Monitoring

• MON_GET_GROUP_BUFFERPOOL table function
  • Returns statistics about the group buffer pool (GBP) in the CF
  • Today, this is just the number of times that the GBP ran into a full condition

• MON_GET_BUFFERPOOL and MON_GET_TABLESPACE table functions
  • Includes monitor metrics for local buffer pools and the group buffer pool
  • Castout activity is included as part of the asynchronous disk writes metrics
  • See next page for list of pureScale-related monitor elements

SELECT SUM(T.NUM_GBP_FULL) AS NUM_GBP_FULL
  FROM TABLE(MON_GET_GROUP_BUFFERPOOL(-2)) AS T

The example on this slide returns the number of times that a group buffer pool full error condition was encountered by all of the members. A group buffer pool full condition is not good because it means that there are no free slots to read pages into. Before we can put more pages into the GBP we first need to clean out pages to disk (castout) and agents needing to write into the GBP will need to wait while this is happening.

There are monitor metrics that you want to look at if monitoring buffer pools in pureScale. The list can be found on the next slide.

Page 64: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

64

#IDUG

Buffer Pool Monitoring (cont.)

• POOL_DATA_GBP_L_READS
  • Number of times a data page was attempted to be read from the Group Buffer Pool (GBP) because the page was either invalid or not present in the Local Buffer Pool (LBP)

• POOL_DATA_GBP_P_READS
  • Number of times a data page was read into the LBP from disk because it was not found in the GBP

• POOL_DATA_GBP_INVALID_PAGES
  • Number of times a data page was invalid in the local buffer pool and was read from the group buffer pool instead

• POOL_DATA_GBP_INDEP_PAGES_FOUND_IN_LBP
  • Number of GBP-independent data pages found in a LBP by an agent

• POOL_DATA_LBP_PAGES_FOUND
  • Number of times a data page was present in the LBP

• Additional elements for INDEX and XML also exist
  • Asynchronous versions of these elements (e.g. POOL_ASYNC_XXXX) exist as well

These are some of the buffer pool monitor elements related to pureScale. GBP refers to the Group Buffer Pool on the CF. LBP is the Local Buffer Pool on a member.

A GBP-independent page is a page that is only ever accessed through a LBP of a member, and never exists in the GBP. Pages might be GBP-independent because the operations using the page, or the objects where the pages come from, are only accessed by the local member. POOL_DATA_GBP_INDEP_PAGES_FOUND_IN_LBP is a monitor element related to GBP-independent pages. All of the rest of the GBP related elements shown here are for GBP-dependent pages (those that can be written to the GBP).

The elements shown on this slide all have DATA in their name. There is an identical set of elements for INDEX and XML pages as well. Also, for each of these elements shown (including the INDEX and XML ones that aren't shown) there is an asynchronous version as well (e.g. there is a POOL_ASYNC_DATA_GBP_L_READS).
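For example (a sketch; the table function and elements are the ones named on these two slides), the local and group buffer pool read counters can be pulled per member and per buffer pool:

SELECT MEMBER, VARCHAR(BP_NAME, 20) AS BUFFERPOOL,
       POOL_DATA_LBP_PAGES_FOUND,
       POOL_DATA_GBP_L_READS,
       POOL_DATA_GBP_P_READS
  FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS T
 ORDER BY MEMBER, BUFFERPOOL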

Page 65: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale

65

#IDUG

Monitoring MCR and GCR Progress

• LIST UTILITIES SHOW DETAIL

ID                               = 1
Type                             = GROUP CRASH RECOVERY (or MEMBER)
Database Name                    = SAMPLE
Member Number                    = 0
Description                      = Group Crash Recovery (or Member)
Start Time                       = 03/12/2014 15:20:05.646020
State                            = Executing
Invocation Type                  = User
Progress Monitoring:
   Estimated Percentage Complete = 75
   Phase Number [Current]        = 1
      Description                = Forward
      Total Work                 = 163834 bytes
      Completed Work             = 124145 bytes
      Start Time                 = 03/12/2014 15:20:05.646121
   Phase Number                  = 2
      Description                = Backward
      Total Work                 = 163834 bytes
      Completed Work             = 0 bytes
      Start Time                 = Not Started

• Similar information also available through db2pd -recovery


The LIST UTILITIES SHOW DETAIL command can be used during Member Crash Recovery or Group Crash Recovery processing to show how much work is remaining in each of the two phases (forward/redo and backward/undo).

This information is also available using the db2pd -recovery command.
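If you prefer pulling this progress information through SQL instead of a command, the SYSIBMADM.SNAPUTIL and SYSIBMADM.SNAPUTIL_PROGRESS administrative views report running utilities and their phases. The sketch below assumes that member and group crash recovery appear in these views the same way they do in LIST UTILITIES at your level of DB2; verify that, and the column list, before building any tooling on it.

-- One row per utility phase, showing completed vs. total work units
SELECT U.UTILITY_ID,
       VARCHAR(U.UTILITY_TYPE, 26) AS UTILITY_TYPE,
       VARCHAR(U.UTILITY_DESCRIPTION, 30) AS UTILITY_DESC,
       P.PROGRESS_SEQ_NUM AS PHASE,
       VARCHAR(P.PROGRESS_DESCRIPTION, 20) AS PHASE_DESC,
       P.PROGRESS_COMPLETED_UNITS,
       P.PROGRESS_TOTAL_UNITS
FROM SYSIBMADM.SNAPUTIL U
JOIN SYSIBMADM.SNAPUTIL_PROGRESS P
  ON P.UTILITY_ID = U.UTILITY_ID
ORDER BY U.UTILITY_ID, P.PROGRESS_SEQ_NUM;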

Page 66: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Additional DB2 pureScale Material to Get You Started

DB2 pureScale Redbook
http://www.redbooks.ibm.com/abstracts/sg248018.html

DB2 pureScale Book
http://public.dhe.ibm.com/common/ssi/ecm/en/imm14079usen/IMM14079USEN.PDF

IBM DB2 pureScale Product Information
http://www-01.ibm.com/software/data/db2/linux-unix-windows/purescale/

DB2 Information Center
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.licensing.doc/doc/c0057442.html

developerWorks Articles
http://www.ibm.com/search/csass/search/?q=purescale&sn=dw&dws=dw


If you are interested in reading more about DB2 pureScale, give these resources a look.

Page 67: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


[Diagram: Optimized solution stack, spanning servers, storage, networking, system management, data management, deployment, and databases]

• Factory integrated and optimized: server, storage, networking and software
• Smart workload management
• Pre-defined cluster and database patterns
• Automatic failure detection and recovery
• Automated provisioning and load balancing
• Automated tuning and query optimization
• Intelligent use of solid state and disk storage
• Integrated system management
• Single console to manage all resources and workloads running on the system
• Integrated maintenance
• Single point of contact and support

Another Option: IBM PureData System for Transactions
Integrated design, built-in expertise and a simplified experience, with DB2 pureScale built in


If you are interested in using pureScale but don't want to build the clustered database environment yourself, you have another option: IBM PureData System for Transactions. This is a workload-optimized, expert-integrated system that lets you deploy DB2 pureScale instances quickly and easily.

IBM PureData System is a member of the IBM PureSystems family. Common to all PureSystems are:

• Built-in database management expertise that enables the system to do many of the ongoing administration tasks automatically, freeing up database staff from routine work
• Integration by design that results in factory-optimized systems designed for high reliability and scalability out of the box, streamlining system integration efforts
• A simplified experience from design to purchase to maintenance, which helps reduce total cost of operations

Specifically, PureData System for Transactions is a purpose-built system designed for transaction processing (OLTP) workloads. It represents the future of data services delivery and management. It combines the simplicity of a workload-optimized appliance with the flexibility of tailor-made systems, providing both fast time-to-value and customization to meet specific business needs.

All hardware components (servers, storage, networking) are already installed and integrated by IBM before the system is delivered to a client. At delivery time the system simply needs to be plugged into power and cabled into the corporate network, and a small amount of network configuration is required (performed by IBM). This can be completed within just a few hours. After that, the system's GUI console can be used to deploy pureScale clusters and databases very quickly and easily.

For more information see the IBM PureData Systems external site at http://www-01.ibm.com/software/data/puredata/

Page 68: IDUGNA2014-G05-A DB2 DBAs Guide to pureScale


Kelly Schlamb
Executive IT Specialist, IBM Canada Ltd.
[email protected]

Session: G05
Title: A DB2 DBA's Guide to pureScale


Thanks for attending/reading. If you have any further questions that haven't been addressed here today, please feel free to reach out to me at the email address provided.