Informatica PowerExchange (Version 9.0) CDC Guide for Linux, UNIX, and Windows

Upload: naveen9009

Post on 28-Nov-2014

Informatica PowerExchange (Version 9.0)

CDC Guide for Linux, UNIX, and Windows

Informatica PowerExchange CDC Guide for Linux, UNIX, and Windows

Version 9.0
December 2009

Copyright (c) 1998-2009 Informatica. All rights reserved.

This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international Patents and other Patents Pending.

Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.

The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing.

Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange and Informatica On Demand are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright © Sun Microsystems. All rights reserved. Copyright © RSA Security Inc. All Rights Reserved. Copyright © Ordinal Technology Corp. All rights reserved. Copyright © Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright 2007 Isomorphic Software. All rights reserved. Copyright © Meta Integration Technology, Inc. All rights reserved. Copyright © Intalio. All rights reserved. Copyright © Oracle. All rights reserved. Copyright © Adobe Systems Incorporated. All rights reserved. Copyright © DataArt, Inc. All rights reserved. Copyright © ComponentSource. All rights reserved. Copyright © Microsoft Corporation. All rights reserved. Copyright © Rogue Wave Software, Inc. All rights reserved. Copyright © Teradata Corporation. All rights reserved. Copyright © Yahoo! Inc. All rights reserved. Copyright © Glyph & Cog, LLC. All rights reserved.

This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and other software which is licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright © 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under the GNU Lesser General Public License Agreement, which may be found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright (©) 1993-2006, all rights reserved.

This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org.

This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, <[email protected]>. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

The product includes software copyright 2001-2005 (©) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/license.html.

The product includes software copyright © 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://svn.dojotoolkit.org/dojo/trunk/LICENSE.

This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.

This product includes software copyright © 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http://www.gnu.org/software/kawa/Software-License.html.

This product includes OSSP UUID software which is Copyright © 2002 Ralf S. Engelschall, Copyright © 2002 The OSSP Project, Copyright © 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.

This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject to terms available at http://www.boost.org/LICENSE_1_0.txt.

This product includes software copyright © 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt.

This product includes software copyright © 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.eclipse.org/org/documents/epl-v10.php.

This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/license.html, http://www.asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, and http://www.sente.ch/software/OpenSourceLicense.htm.

This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License (http://www.opensource.org/licenses/cddl1.php), the Common Public License (http://www.opensource.org/licenses/cpl1.0.php) and the BSD License (http://www.opensource.org/licenses/bsd-license.php).

This product includes software copyright © 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further information please visit http://www.extreme.indiana.edu/.

This Software is protected by U.S. Patent Numbers 5,794,246; 6,014,670; 6,016,501; 6,029,178; 6,032,158; 6,035,307; 6,044,374; 6,092,086; 6,208,990; 6,339,775; 6,640,226; 6,789,096; 6,820,077; 6,823,373; 6,850,947; 6,895,471; 7,117,215; 7,162,643; 7,254,590; 7,281,001; 7,421,458; and 7,584,422, international Patents and other Patents Pending.

DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice.

NOTICES

This Informatica product (the “Software”) includes certain drivers (the “DataDirect Drivers”) from DataDirect Technologies, an operating company of Progress Software Corporation (“DataDirect”) which are subject to the following terms and conditions:

1. THE DATADIRECT DRIVERS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.

Part Number: PWX-CCl-900-0001


Table of Contents

Preface
  Informatica Resources
    Informatica Customer Portal
    Informatica Documentation
    Informatica Web Site
    Informatica How-To Library
    Informatica Knowledge Base
    Informatica Multimedia Knowledge Base
    Informatica Global Customer Support

Part I: PowerExchange CDC Introduction
  Chapter 1: Change Data Capture Introduction
    PowerExchange CDC Overview
      Change Data Capture
      Change Data Extraction and Apply
    PowerExchange CDC Data Sources
      DB2 for Linux, UNIX, and Windows Data Sources
      Microsoft SQL Server Data Sources
      Oracle Data Sources
      i5/OS and z/OS Data Sources with Offload Processing
    PowerExchange CDC Components
      PowerExchange Listener
      PowerExchange Logger for Linux, UNIX, and Windows
      PowerExchange Navigator
    PowerExchange Integration with PowerCenter
    PowerExchange CDC Architecture
    Summary of CDC Implementation Tasks

Part II: PowerExchange CDC Components
  Chapter 2: PowerExchange Listener
    PowerExchange Listener Overview
    Customizing the dbmover.cfg File for CDC
      CAPI_CONNECTION Statements
    Starting the PowerExchange Listener
    Stopping the PowerExchange Listener
    Displaying Active PowerExchange Listener Tasks

  Chapter 3: PowerExchange Logger for Linux, UNIX, and Windows
    PowerExchange Logger Overview
    PowerExchange Logger Tasks
    PowerExchange Logger Files
      CDCT File
      PowerExchange Logger Log Files
      Checkpoint Files
      Cache Files
      Lock Files
      Message Log Files
    File Switches
    PowerExchange Logger Operational Modes
      Continuous Mode
      Batch Mode
    PowerExchange Logger Considerations on Linux and UNIX
      PowerExchange Logger Memory Requirement on Linux or UNIX
      Running the PowerExchange Logger in Background Mode
    Configuring the PowerExchange Logger
      Enabling a Capture Registration for PowerExchange Logger Use
      Customizing the PowerExchange Logger Configuration File
      Customizing dbmover.cfg for the PowerExchange Logger
      Using PowerExchange Logger Group Definitions
    Starting the PowerExchange Logger
      PWXCCL Syntax and Parameters
      How the PowerExchange Logger Determines the Start Point for a Cold Start
      Cold Starting the PowerExchange Logger
    Managing the PowerExchange Logger
      Commands for Controlling and Stopping PowerExchange Logger Processing
      Assessing PowerExchange Logger Performance
      Maintaining the PowerExchange Logger CDCT File and Log Files
      Backing Up PowerExchange Logger Files
      Re-creating the CDCT File After a Failure

Part III: PowerExchange CDC Data Sources
  Chapter 4: DB2 for Linux, UNIX, and Windows Change Data Capture
    DB2 for Linux, UNIX, and Windows CDC Overview
    Planning for DB2 CDC
      Prerequisites
      Required User Authority
      CDC Restrictions
    Configuring DB2 for CDC

    Configuring PowerExchange for DB2 CDC
      Configuring PowerExchange CDC without the PowerExchange Logger
      Configuring PowerExchange CDC with the PowerExchange Logger
      Creating the Capture Catalog Table
      Initializing the Capture Catalog Table
      Customizing dbmover.cfg for DB2 CDC
    Using a DB2 Data Map
      Task Flow for DB2 Data Map Use
    Managing DB2 CDC
      Stopping DB2 CDC
      Changing a DB2 Source Table Definition
      Reconfiguring a Partitioned Database or Database Partition Group
    DB2 for Linux, UNIX, and Windows CDC Troubleshooting
      Workaround for SQL1224 Error on AIX
      IBM APARs for Specific Issues
  Chapter 5: Microsoft SQL Server Change Data Capture
    Microsoft SQL Server CDC Overview
    Planning for SQL Server CDC
      SQL Server CDC Prerequisites
      Required User Authority for SQL Server CDC
      Datatypes Supported for SQL Server CDC
      SQL Server CDC Restrictions
    Configuring SQL Server for CDC
    Configuring PowerExchange for SQL Server CDC
      Configuring PowerExchange CDC without the PowerExchange Logger
      Configuring PowerExchange CDC with the PowerExchange Logger
      Customizing dbmover.cfg for SQL Server CDC
    Managing SQL Server CDC
      Disabling Publication of Change Data for a SQL Server Source
      Changing a SQL Server Source Table Definition
  Chapter 6: Oracle Change Data Capture with Oracle LogMiner
    Overview of Oracle LogMiner CDC
    Planning for Oracle LogMiner CDC
      Requirements and Restrictions for Oracle LogMiner CDC
      Datatypes Supported for Oracle LogMiner CDC
      SQL*Loader Restrictions
      Performance Considerations for Oracle LogMiner CDC
    Oracle Configuration for LogMiner CDC
      Configuration Script Files
      Configuring Oracle for LogMiner CDC
      Configuration in an Oracle RAC Environment

    PowerExchange Configuration for Oracle LogMiner CDC
      Configuring Oracle LogMiner CDC without the PowerExchange Logger
      Configuring Oracle LogMiner CDC with the PowerExchange Logger
      Customizing dbmover.cfg for Oracle LogMiner CDC
    Management of Oracle LogMiner CDC
      Stopping Oracle LogMiner CDC
      Changing a Source Table Definition Used in Oracle LogMiner CDC

Part IV: Change Data Extraction
  Chapter 7: Introduction to Change Data Extraction
    Change Data Extraction Overview
    Extraction Modes
    PowerExchange-Generated Columns in Extraction Maps
    Restart Tokens and the Restart Token File
      Generating Restart Tokens
      Restart Token File
    Recovery and Restart Processing for CDC Sessions
      PowerCenter Recovery Tables for Relational Targets
      PowerCenter Recovery Files for Nonrelational Targets
      Application Names
      Restart Processing for CDC Sessions
    Group Source Processing in PowerExchange
      Using Group Source with Nonrelational Sources
      Using Group Source with CDC Sources
    Commit Processing with PWXPC
      Controlling Commit Processing
      Maximum and Minimum Rows per Commit
      Target Latency
      Examples of Commit Processing
    Offload Processing
      CDC Offload Processing
      Multithreaded Processing
  Chapter 8: Extracting Change Data
    Overview of Extracting Change Data
    Task Flow for Extracting Change Data
    Testing a Change Data Extraction
    Configuring PowerCenter CDC Sessions
      Changing Default Values for Session and Connection Attributes
      Configuring Application Connection Attributes
    Creating Restart Tokens for Extractions
    Displaying Restart Tokens

    Configuring the Restart Token File
      Restart Token File Statements
      Restart Token File - Example
  Chapter 9: Managing Change Data Extractions
    Starting PowerCenter CDC Sessions
      Cold Start Processing
      Warm Start Processing
      Recovery Processing
    Stopping PowerCenter CDC Sessions
      Stop Command Processing
      Terminating Conditions
    Changing PowerCenter CDC Sessions
      Examples of Creating a Restart Point
    Recovering PowerCenter CDC Sessions
      Example of Session Recovery
  Chapter 10: Monitoring and Tuning Options
    Monitoring Change Data Extractions
      Monitoring CDC Sessions in PowerExchange
      Monitoring CDC Sessions in PowerCenter
    Tuning Change Data Extractions
      Using PowerExchange Parameters to Tune CDC Sessions
      Using Connection Options to Tune CDC Sessions
    CDC Offload and Multithreaded Processing
      Planning for CDC Offload and Multithreaded Processing
      Enabling Offload and Multithreaded Processing for CDC Sessions
      Configuring PowerExchange to Capture Change Data on a Remote System
      Extracting Change Data Captured on a Remote System
      Configuration File Examples for CDC Offload Processing

Index

Preface

This guide describes how to configure, implement, and manage PowerExchange Change Data Capture (CDC) on Linux, UNIX, and Windows systems.

This guide applies to the CDC option of the following PowerExchange products:

• PowerExchange for DB2® for Linux®, UNIX®, and Windows®

• PowerExchange for Oracle®

• PowerExchange for SQL Server®

Note: If you use the offloading feature, some PowerExchange CDC processing for DB2 for i5/OS data sources and z/OS data sources can also run on Linux, UNIX, or Windows.

Before implementing change data capture, verify that you have installed the required PowerExchange components.

Informatica Resources

Informatica Customer Portal

As an Informatica customer, you can access the Informatica Customer Portal site at http://my.informatica.com. The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base, the Informatica Multimedia Knowledge Base, Informatica Documentation Center, and access to the Informatica user community.

Informatica Documentation

The Informatica Documentation team takes every effort to create accurate, usable documentation. If you have questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email at [email protected]. We will use your feedback to improve our documentation. Let us know if we can contact you regarding your comments.

The Documentation team updates documentation as needed. To get the latest documentation for your product, navigate to the Informatica Documentation Center from http://my.informatica.com.

Informatica Web Site

You can access the Informatica corporate web site at http://www.informatica.com. The site contains information about Informatica, its background, upcoming events, and sales offices. You will also find product and partner information. The services area of the site includes important information about technical support, training and education, and implementation services.

Informatica How-To Library

As an Informatica customer, you can access the Informatica How-To Library at http://my.informatica.com. The How-To Library is a collection of resources to help you learn more about Informatica products and features. It includes articles and interactive demonstrations that provide solutions to common problems, compare features and behaviors, and guide you through performing specific real-world tasks.

Informatica Knowledge Base

As an Informatica customer, you can access the Informatica Knowledge Base at http://my.informatica.com. Use the Knowledge Base to search for documented solutions to known technical issues about Informatica products. You can also find answers to frequently asked questions, technical white papers, and technical tips. If you have questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

Informatica Multimedia Knowledge Base

As an Informatica customer, you can access the Informatica Multimedia Knowledge Base at http://my.informatica.com. The Multimedia Knowledge Base is a collection of instructional multimedia files that help you learn about common concepts and guide you through performing specific tasks. If you have questions, comments, or ideas about the Multimedia Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

Informatica Global Customer Support

You can contact a Customer Support Center by telephone or through the WebSupport Service. WebSupport requires a user name and password. You can request a user name and password at http://my.informatica.com.

Use the following telephone numbers to contact Informatica Global Customer Support:

North America / South America
- Toll Free: +1 877 463 2435
- Standard Rate, Brazil: +55 11 3523 7761
- Standard Rate, Mexico: +52 55 1168 9763
- Standard Rate, United States: +1 650 385 5800

Europe / Middle East / Africa
- Toll Free: 00 800 4632 4357
- Standard Rate, Belgium: +32 15 281 702
- Standard Rate, France: +33 1 41 38 92 26
- Standard Rate, Germany: +49 1805 702 702
- Standard Rate, Netherlands: +31 306 022 797
- Standard Rate, Spain and Portugal: +34 93 480 3760
- Standard Rate, United Kingdom: +44 1628 511 445

Asia / Australia
- Toll Free, Australia: 1 800 151 830
- Toll Free, Singapore: 001 800 4632 4357
- Standard Rate, India: +91 80 4112 5738

Part I: PowerExchange CDC Introduction

This part contains the following chapters:

• Change Data Capture Introduction, 2

CHAPTER 1

Change Data Capture Introduction

This chapter includes the following topics:

• PowerExchange CDC Overview, 2

• PowerExchange CDC Data Sources, 4

• PowerExchange CDC Components, 6

• PowerExchange Integration with PowerCenter, 7

• PowerExchange CDC Architecture, 8

• Summary of CDC Implementation Tasks, 10

PowerExchange CDC Overview

PowerExchange Change Data Capture (CDC) works in conjunction with PowerCenter to capture changes to data in source tables and replicate those changes to target tables and files. This guide describes PowerExchange CDC for relational database sources on Linux, UNIX, or Windows operating systems.

These sources are:

• DB2 for Linux, UNIX, and Windows

• Microsoft SQL Server on Windows

• Oracle on Linux, UNIX, or Windows

After materializing target tables or files with PowerExchange bulk data movement, you can use PowerExchange CDC to synchronize the targets with their corresponding source tables. Synchronization is faster when you replicate only the change data rather than all of the data.

The change data replication process consists of the following high-level steps:

1. Change data capture. PowerExchange captures change data for the source tables. PowerExchange can read change data directly from the RDBMS log files or database. Optionally, you can use the PowerExchange Logger for Linux, UNIX, and Windows to capture change data to its log files.

2. Change data extraction. PowerExchange, in conjunction with PowerCenter, extracts captured change data for movement to the target.

3. Change data apply. PowerExchange, in conjunction with PowerCenter, transforms and applies the extracted change data to target tables or files.

Change Data Capture

PowerExchange can capture change data directly from DB2 recovery logs, Microsoft SQL Server distribution databases, or Oracle redo logs. If you use the offloading feature in combination with the PowerExchange Logger for Linux, UNIX, and Windows, a PowerExchange Logger process can log change data from data sources on an i5/OS or z/OS system.

If you do not retain database log files long enough for CDC processing to complete, use the PowerExchange Logger for Linux, UNIX, and Windows. The PowerExchange Logger writes change data to its log files. PowerExchange can then extract change data from the PowerExchange Logger log files rather than from the database log files.

For each source table, you must define a capture registration in the PowerExchange Navigator. The capture registration provides metadata for the columns that are selected for change capture.

PowerExchange captures changes that result from successful SQL INSERT, DELETE, and UPDATE operations. Depending on the statement type, PowerExchange captures the following data images:

• For INSERTs, PowerExchange captures after images only. An after image reflects a row just after an INSERT operation. PowerExchange passes these changes as INSERTs to PowerCenter.

• For DELETEs, PowerExchange captures before images only. A before image reflects a row just prior to the last DELETE operation. PowerExchange passes these changes as DELETEs to PowerCenter.

• For UPDATEs, PowerExchange captures the following image types:

- Both before and after images if you select an image type of “BA” in the CDC application connection attributes for PowerCenter. PowerExchange passes an UPDATE to PowerCenter as a DELETE of the before-image data followed by an INSERT of the after-image data.

- After images if you select an image type of “AI” in the CDC application connection attributes. PowerExchange passes only the after-image data for an updated row, unless you also request before-image data. PowerExchange passes an UPDATE to PowerCenter as an UPDATE or INSERT.
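The image rules above can be sketched in a few lines of Python. This is an illustrative model only, not PowerExchange code: the `CapturedChange` class and the `records_for_powercenter` function are hypothetical names, and the “AI” case is simplified to a single UPDATE even though the guide says that it can be passed as an UPDATE or INSERT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapturedChange:
    """Hypothetical model of one captured SQL operation and its row images."""
    operation: str                 # "INSERT", "DELETE", or "UPDATE"
    before: Optional[dict] = None  # row image just before the change
    after: Optional[dict] = None   # row image just after the change


def records_for_powercenter(change: CapturedChange, image_type: str = "BA") -> list:
    """Return (operation, row_image) pairs in the order they would be passed on."""
    if change.operation == "INSERT":
        return [("INSERT", change.after)]    # after image only
    if change.operation == "DELETE":
        return [("DELETE", change.before)]   # before image only
    if change.operation == "UPDATE":
        if image_type == "BA":
            # Before and after images: a DELETE followed by an INSERT.
            return [("DELETE", change.before), ("INSERT", change.after)]
        if image_type == "AI":
            # After image only; simplified here to a single UPDATE.
            return [("UPDATE", change.after)]
        raise ValueError(f"unknown image type: {image_type!r}")
    raise ValueError(f"unsupported operation: {change.operation!r}")


update = CapturedChange("UPDATE", before={"id": 1, "qty": 5}, after={"id": 1, "qty": 7})
ba_records = records_for_powercenter(update, "BA")
ai_records = records_for_powercenter(update, "AI")
```

With the “BA” image type, one source UPDATE yields two records for the target, which is worth keeping in mind when you count rows on the target side.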

Change Data Extraction and Apply

PowerExchange works with PowerCenter to extract change data and write it to one or more target tables or files. The targets can be on the same system as the source or on a different system.

When you create a capture registration for a source table, the PowerExchange Navigator generates a corresponding extraction map and application name for the extraction. The extraction map describes the columns for which to extract change data. You can edit the extraction map to remove columns from extraction processing. Also, you can create alternative extraction maps, each for a subset of the columns that are registered for capture. For DB2 for Linux, UNIX, and Windows data sources only, you can create a data map if you have user-defined or multi-field columns for which you want to manipulate data before loading it to the target.

From PowerCenter, you run a CDC workflow and session that extracts and applies change data. To define a data source in PowerCenter, you can import the extraction map or import the table definition from the source database through PowerExchange. For DB2 only, you can import a DB2 data map instead of the extraction map. In most situations, Informatica recommends that you import the extraction map.

Also, you must define a mapping, session, and workflow in PowerCenter. Optionally, you can include transformations in the mapping to manipulate the change data. When you define a CDC session, you must specify a connection type. The connection type determines the extraction mode and access method that PowerExchange uses to extract data.

To extract change data directly from source DB2 or Oracle log files or SQL Server distribution database, you must use the real-time extraction mode. To extract change data from PowerExchange Logger log files, you can use either the batch extraction mode or continuous extraction mode. The following table describes these extraction modes:

Extraction Mode: Description

Real-time extraction mode: Reads change data directly from the database log files in near real time, on an ongoing basis. When the PowerExchange Listener receives an extraction request, it pulls the change data from the log files and transmits the data to PowerCenter for extraction and apply processing. This mode provides the lowest latency for change data extraction but potentially the highest impact on system resources.

Batch extraction mode: Reads change data from PowerExchange Logger log files that are in a closed state when an extraction request is made. After processing the log files, the extraction request ends. This mode provides the highest latency for change data extraction but minimizes the impact on system resources.

Continuous extraction mode: Reads change data continuously from open and closed PowerExchange Logger log files in near real time. This mode also minimizes database log accesses and the log retention period that is required for CDC.

To initiate change data extraction and apply processing, run a CDC workflow and session from PowerCenter.

During extraction processing, PowerExchange extracts changes from the change stream in chronological order based on the unit of work (UOW) end time. PowerExchange passes only the successfully committed changes to PowerCenter for processing. PowerExchange does not pass ABORTed or UNDO changes. If you are capturing changes from DB2 recovery logs or Oracle redo logs, changes that were contiguous in the change stream might not be contiguous in the reconstructed UOW that PowerExchange passes to PowerCenter.
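As a rough illustration of this ordering rule, the following Python sketch drops uncommitted units of work and sorts the rest by end time. The `UnitOfWork` class, its fields, and the `extraction_order` function are hypothetical names chosen for this example. They do not reflect PowerExchange internals.

```python
from dataclasses import dataclass


@dataclass
class UnitOfWork:
    """Hypothetical stand-in for one unit of work in the change stream."""
    uow_id: str
    end_time: float   # time at which the UOW ended
    committed: bool   # False models an aborted or undone UOW


def extraction_order(uows):
    """Keep only committed UOWs and order them by end time."""
    committed = [u for u in uows if u.committed]
    return sorted(committed, key=lambda u: u.end_time)


stream = [
    UnitOfWork("A", end_time=30.0, committed=True),
    UnitOfWork("B", end_time=10.0, committed=False),  # never passed on
    UnitOfWork("C", end_time=20.0, committed=True),
]
ordered_ids = [u.uow_id for u in extraction_order(stream)]
```

In this sketch, UOW C is passed before UOW A because it ended first, even though A appears earlier in the stream.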

To properly resume extraction processing, PowerExchange maintains restart tokens for each source table. Restart tokens are used for all extraction modes. To generate current restart tokens, you can use the PowerExchange Navigator, the special override statement in the restart token file, or the DTLUAPPL utility.

RELATED TOPICS:

• “Introduction to Change Data Extraction” on page 105

PowerExchange CDC Data Sources

PowerExchange can capture change data from DB2 and Oracle data sources on Linux, UNIX, or Windows systems. PowerExchange can also capture change data from Microsoft SQL Server data sources on Windows.

In the PowerExchange Navigator, you must create a capture registration for each source table. The PowerExchange Navigator generates a corresponding extraction map and application name. You can import the extraction map into PowerCenter to define the source for extraction and apply processing.

If you use the PowerExchange Logger for Linux, UNIX, and Windows in combination with the offloading feature, you can also process change data from data sources on i5/OS or z/OS.

DB2 for Linux, UNIX, and Windows Data Sources

PowerExchange captures change data from DB2 recovery log files for the database that contains your source tables. For CDC to work, archive logging must be active for the database. Also, you must create a PowerExchange capture catalog table in the source database. The capture catalog table stores information about the source tables and columns, including DB2 log positioning information.

If you have a source table with user-defined fields or multi-field columns, you can create a data map to manipulate these fields with expressions. For example, you might want to create a data map to manipulate packed data in a CHAR column. If you create a data map, you must still create a capture registration and merge the data map with the extraction map that is generated for the capture registration.

RELATED TOPICS:

• “DB2 for Linux, UNIX, and Windows Change Data Capture” on page 56

Microsoft SQL Server Data Sources

PowerExchange CDC uses Microsoft SQL Server transactional replication technology to access data in SQL Server distribution databases. For CDC to work, you must enable SQL Server Replication on the system from which change data is captured. Also, verify that each source table in the distribution database has a primary key. If your database has a high volume of change activity, use a distributed server as the host of the distribution database. When the extraction process runs, the Microsoft SQL Server Agent must also be running.

RELATED TOPICS:

• “Microsoft SQL Server Change Data Capture” on page 70

Oracle Data Sources

PowerExchange uses Oracle LogMiner to read change data from Oracle archive logs. Because PowerExchange reads data from Oracle archive logs, you must run Oracle in ARCHIVELOG mode. Also, PowerExchange requires a copy of the Oracle online catalog in the archive logs to determine restart points for change data extraction processing.

If you have Oracle Version 10g Release 2 or later, PowerExchange supports CDC in Oracle Real Application Cluster (RAC) environments. In a RAC, the Oracle archive logs for all Oracle instances in the RAC must reside on shared disk storage for PowerExchange to access them.

RELATED TOPICS:

• “Oracle Change Data Capture with Oracle LogMiner” on page 80

i5/OS and z/OS Data Sources with Offload Processing

You can use CDC offload processing in combination with the PowerExchange Logger for Linux, UNIX, and Windows to log change data from data sources on systems other than the system where the PowerExchange Logger runs.

With offload processing, a PowerExchange Logger process on Linux, UNIX, and Windows can log change data from i5/OS and z/OS systems as well as from other Linux, UNIX, or Windows systems. For example, a PowerExchange Logger process can log change data from a DB2 instance on z/OS.

RELATED TOPICS:

• “CDC Offload and Multithreaded Processing” on page 159

PowerExchange CDC Components

The following PowerExchange components are used for change data capture (CDC):

• PowerExchange Listener. Required, unless PowerExchange and the PowerCenter Integration Service are installed on the same physical machine.

• PowerExchange Logger for Linux, UNIX, and Windows. Optional.

• PowerExchange Navigator. Required.

Note: The PowerExchange Condense component has been deprecated in PowerExchange Version 8.6.1. Although PowerExchange 8.6.1 tolerates continued use of PowerExchange Condense for partial condense processing, Informatica recommends that you migrate to the PowerExchange Logger. The PowerExchange Logger replaces PowerExchange Condense. Future PowerExchange versions will require migration to the PowerExchange Logger.

PowerExchange Listener

The PowerExchange Listener manages capture registrations and extraction maps for all CDC data sources. It also manages data maps if you create any for DB2 for Linux, UNIX, and Windows tables. The PowerExchange Listener maintains this information in the following files:

• CCT file for capture registrations

• CAMAPS directory for extraction maps

• DATAMAPS directory for DB2 data maps

The PowerExchange Listener also handles PowerCenter extraction requests for both change data replication and bulk data movement.

When you create, edit, or delete capture registrations or extraction maps in the PowerExchange Navigator, the PowerExchange Navigator uses the location value in the registration group and extraction group to contact the PowerExchange Listener. This location corresponds to a NODE statement in the dbmover.cfg file. For example, when you open a registration group for an RDBMS instance, the PowerExchange Navigator communicates with the PowerExchange Listener to get all capture registrations defined for that instance.

A PowerExchange Listener is not required if PowerExchange and the PowerCenter Integration Service run on the same physical machine.

RELATED TOPICS:

• “PowerExchange Listener” on page 12

PowerExchange Logger for Linux, UNIX, and Windows

The PowerExchange Logger for Linux, UNIX, or Windows captures change data from DB2 recovery logs, Oracle redo logs, or a SQL Server distribution database and writes that data to PowerExchange Logger log files. Use of the PowerExchange Logger is optional. To use the PowerExchange Logger, run one PowerExchange Logger process for each database type and instance. The PowerExchange Logger writes all successful UOWs in chronological order based on end time to its log files. This practice maintains transactional integrity. You can extract the change data from the PowerExchange Logger log files in either batch or continuous mode.

Benefits of the PowerExchange Logger include:

• Source database overhead is reduced because PowerExchange makes fewer accesses to the source log files or database to read change data. For Oracle, this overhead reduction can be significant. The PowerExchange Logger can use only one Oracle LogMiner session to read change data for all extractions that process an Oracle instance.

• You do not need to retain the source RDBMS log files longer than normal for CDC.

• PowerExchange does not need to reposition its point in the DB2 or Oracle logs from which to resume reading data. This feature can significantly reduce restart times.

Tip: For Oracle data sources, Informatica recommends that you run the PowerExchange Logger rather than use real-time extraction mode. Use continuous extraction mode for near-real-time access to change data. This configuration enables PowerExchange to use one Oracle LogMiner session for all extractions that process an Oracle instance. Multiple concurrent LogMiner sessions can significantly degrade performance on the machine where CDC sessions run, including the performance of real-time extractions.

RELATED TOPICS:

• “PowerExchange Logger for Linux, UNIX, and Windows” on page 19

PowerExchange Navigator

The PowerExchange Navigator is the graphical user interface from which you define and manage capture registrations, extraction maps, and data maps.

You must define a capture registration for each source table. The corresponding extraction map is automatically generated. For DB2 sources, you can also define data maps if you need to perform column-level processing, such as adding user-defined columns and building expressions to populate them. You can import the extraction maps into PowerCenter so that they can be used for moving change data to the target.

Note: If the PowerExchange Navigator is not installed on the same machine as a Microsoft SQL Server data source, you must install the SQL Server client software on the PowerExchange Navigator machine. The client software is required because PowerExchange uses SQL Server services when creating capture registrations. For the same situation with DB2 and Oracle data sources, you do not need the RDBMS client software. Instead, from the PowerExchange Navigator, you can point to the PowerExchange Listener on the machine that contains the source DB2 database or Oracle instance.

PowerExchange Integration with PowerCenter

PowerCenter provides transformation and data cleansing functions that you can use in CDC sessions. After capturing change data, use PowerCenter in conjunction with PowerExchange to extract and transform the change data and then apply it to one or more targets.

To integrate PowerExchange with PowerCenter, use either the PowerExchange Client for PowerCenter (PWXPC) or the PowerExchange ODBC drivers in PowerCenter. Informatica recommends that you use PWXPC. PWXPC provides more functionality, better performance, and better recovery and restart capabilities.

Note: This guide assumes that you use PWXPC.

For more information about PWXPC and the PowerExchange ODBC drivers, see PowerExchange Interfaces for PowerCenter.

PowerExchange CDC Architecture

The PowerExchange CDC architecture is sufficiently flexible to handle many change data replication scenarios. You can use PowerExchange in conjunction with PowerCenter to replicate change data from multiple sources of the same RDBMS type to multiple targets of different types in a single session.

The targets can be tables or files on the same system as the source or on other systems. The PowerCenter Integration Service can write data to tables in some RDBMSs as well as to flat files and XML files. If you installed PowerExchange or PowerExchange (PowerCenter Connect) products that provide connectivity to additional nonrelational or relational targets, you can also load data to those targets, for example, DB2 for z/OS tables, VSAM data sets, IMS segments, or WebSphere MQ.

You can run multiple instances of PowerExchange CDC components on a single system. For example, you might want to run a separate PowerExchange Logger for each source RDBMS to create separate sets of log files for each RDBMS type.

The following figure shows a simple CDC configuration that uses real-time extraction mode to access change data directly from the change stream without the PowerExchange Logger:

In this real-time configuration, PowerExchange CDC uses the CAPXRT access method to capture change data from a SQL Server distribution database, DB2 recovery logs, and Oracle redo logs. When an extraction request runs, PowerCenter connects to the PowerExchange Call Level Interface (SCLI) to contact the PowerExchange Listener. The change data is passed to the SCLI and then to the PWXPC CDC Real Time reader. In this manner, the PowerCenter extraction session pulls the change data that PowerExchange captured. After the PWXPC reader reads the change data, PowerCenter uses the mapping and workflow that you created to transform the data and load it to the target. With this configuration, you can replicate change data from multiple sources in the same database or instance to multiple target tables in a single extraction process.

Note: The Oracle UOW Cleanser reconstructs UOWs from redo logs into complete and consecutive UOWs that are in chronological order by end time. For DB2 and SQL Server, PowerExchange incorporates the UOW Cleanser function into the consumer API (CAPI) for extracting changes from the data source.

The following figure shows a CDC configuration that uses the PowerExchange Logger in both batch extraction mode and continuous extraction mode:

In this configuration, the PowerExchange Logger captures change data from the change stream for SQL Server, Oracle, and DB2 tables and writes that data to its log files. After the data is in the PowerExchange log files, the source RDBMS log files can be deleted, if necessary. When an extraction session runs, PWXPC contacts the PowerExchange Listener. The PowerExchange Listener reads the PowerExchange Logger log files and calls the SCLI on the PowerCenter Integration Service machine to transmit the change data to PowerCenter.

For some source tables, PWXPC extracts change data from the PowerExchange Logger log files in batch extraction mode with the CAPX access method. In this mode, the extraction session stops after it completes processing the log files. For other source tables, PWXPC extracts change data in continuous mode with the CAPXRT access method. In this mode, the extraction session extracts change data on an ongoing basis. In PowerCenter, you can create one source definition and one mapping that covers both extraction modes. However, batch and continuous extractions must run as separate sessions. For a batch extraction session, use a PWX CDC Change application connection. For a continuous extraction session, use a PWX CDC Real Time application connection. For example, you can run batch extractions to replicate change data to targets that need to be synchronized periodically, and run continuous extractions to replicate change data to targets that need to be synchronized in near real time. Batch and continuous extraction sessions can run concurrently.
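The pairing of extraction mode, access method, and application connection described above can be summarized as a small lookup. The following Python sketch is only a planning aid under stated assumptions: the dictionary layout, the `plan_sessions` function, and the target names are hypothetical, and only the access method and connection names come from this guide.

```python
# Settings for extractions from PowerExchange Logger log files, as described above.
LOGGER_EXTRACTION_MODES = {
    "batch": {
        "access_method": "CAPX",
        "application_connection": "PWX CDC Change",
        "session_ends_after_log_files": True,
    },
    "continuous": {
        "access_method": "CAPXRT",
        "application_connection": "PWX CDC Real Time",
        "session_ends_after_log_files": False,
    },
}


def plan_sessions(target_modes: dict) -> dict:
    """Group targets by extraction mode, because each mode needs its own session."""
    sessions = {}
    for target, mode in target_modes.items():
        if mode not in LOGGER_EXTRACTION_MODES:
            raise ValueError(f"unknown extraction mode: {mode!r}")
        sessions.setdefault(mode, []).append(target)
    return sessions


# Hypothetical targets: one synchronized periodically, two in near real time.
plan = plan_sessions({
    "REPORTING_COPY": "batch",
    "ORDERS_MIRROR": "continuous",
    "STOCK_MIRROR": "continuous",
})
```

Here the plan holds two sessions, one per mode, which matches the rule that batch and continuous extractions run as separate sessions even when they share a source definition and mapping.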

Summary of CDC Implementation Tasks

After you install PowerExchange, you can configure change data capture and extraction, materialize targets, and start extraction processing. The following table identifies the tasks for implementing change data capture and extraction processing for a data source:

Configure and start PowerExchange CDC components

Step 1. Configure parameters in the dbmover.cfg file for the PowerExchange Listener.
References: “Customizing the dbmover.cfg File for CDC” on page 12

Step 2. Start the PowerExchange Listener on the machine with the source database.
References: “Starting the PowerExchange Listener” on page 16

Step 3. Perform RDBMS-specific configuration tasks for CDC.
References:
- Chapter 4, “DB2 for Linux, UNIX, and Windows Change Data Capture” on page 56
- Chapter 5, “Microsoft SQL Server Change Data Capture” on page 70
- Chapter 6, “Oracle Change Data Capture with Oracle LogMiner” on page 80

Step 4. (Optional) Configure the PowerExchange Logger.
References: “Configuring the PowerExchange Logger” on page 27

Step 5. (Optional) Start the PowerExchange Logger.
References: “Starting the PowerExchange Logger” on page 47

Define data sources for CDC

Step 6. From the PowerExchange Navigator, define and activate capture registrations and extraction maps for the data sources.
References: PowerExchange Navigator Guide

Step 7. For DB2 sources that have user-defined or multi-field columns that you want to manipulate, create DB2 data maps.
References: PowerExchange Navigator Guide

Materialize targets and start capturing changes

Step 8. Materialize the target from the source.
References: PowerExchange Bulk Data Movement Guide

Step 9. Establish a start point for the extraction.
References: “Restart Tokens and the Restart Token File” on page 109

Extract and apply change data

Step 10. From PowerCenter, configure mappings, workflows, connections, and sessions. Then run the workflow.
References:
- PowerExchange Interfaces for PowerCenter
- PowerCenter Designer Guide
- PowerCenter Workflow Basics Guide

Part II: PowerExchange CDC Components

This part contains the following chapters:

• PowerExchange Listener, 12

• PowerExchange Logger for Linux, UNIX, and Windows, 19

CHAPTER 2

PowerExchange Listener

This chapter includes the following topics:

• PowerExchange Listener Overview, 12

• Customizing the dbmover.cfg File for CDC, 12

• Starting the PowerExchange Listener, 16

• Stopping the PowerExchange Listener, 16

• Displaying Active PowerExchange Listener Tasks, 17

PowerExchange Listener Overview

In a change data capture (CDC) environment, a PowerExchange Listener can provide some or all of the following services:

• Store and manage capture registrations, extraction maps, and data maps for CDC data sources.

• Provide captured change data to PowerCenter when you run a PowerCenter CDC session.

• Provide captured change data or source table data to the PowerExchange Navigator when you perform a database row test of an extraction map or a data map.

• Interact with other PowerExchange Listeners on other nodes to facilitate communication among the PowerExchange Navigator, PowerCenter Integration Service, data sources, and any system to which PowerExchange processing is offloaded.

Customizing the dbmover.cfg File for CDC

You must configure the parameters in the dbmover.cfg file that pertain to CDC processing. This topic describes the key CDC parameters that are common to the PowerExchange source RDBMSs on Linux, UNIX, or Windows.

The PowerExchange Listener uses these dbmover.cfg parameters to perform the following functions:

• Connect to source RDBMS databases and objects to capture change data.

• Determine the directory in which to store capture registrations, extraction maps, and PowerExchange Logger log files.

• Connect to the system with the PowerExchange Logger log files to extract change data.

The following table describes the key dbmover.cfg statements that are required for CDC:

Statement Description

CAPI_CONNECTION A named set of parameters that the PowerExchange Consumer API (CAPI) uses to connect to thechange stream and control extraction processing. A CAPI connection is specific to a data sourcetype. You can define up to eight CAPI_CONNECTION statements in a DBMOVER configurationfile for the same data source type or different data source types. Use the CAPI_SRC_DFLTparameter to indicate a default CAPI_CONNECTION for a data source type.PowerExchange requires a connection statement for real-time extraction mode and continuousextraction mode.For real-time extraction, PowerExchange uses a source-specific type of CAPI_CONNECTIONstatement, such as MSQL, ORCL, and UDB. For more information, see the section for yoursource type.For continuous extraction from PowerExchange Logger log files, PowerExchange CDC uses theCAPX CAPI_CONNECTION statement.

CAPI_SRC_DFLT The CAPI_CONNECTION statement that PowerExchange uses by default for a specific datasource type when no CAPI connection override is supplied. If you define multipleCAPI_CONNECTION statements for a data source, you can identify one of them as the default.Syntax is:CAPI_SRC_DFLT=(source_type,capi_connection_name)

Where:- source_type is one of the following source database types: MSS for Microsoft SQL Server,

ORA for Oracle, or UDB for DB2 for Linux, UNIX, and Windows.- capi_connection_name is the unique name of the CAPI_CONNECTION statement that you

want to use as the default statement.You can specify a CAPI_SRC_DFLT statement for each source database type.You can override the default CAPI_CONNECTION with another defined CAPI_CONNECTION inmultiple ways.

CAPT_PATH Path to the local directory that stores the following files for CDC:- CCT file, which contains capture registrations- CDEP file, which contains application names for PowerCenter extractions that use ODBC

connections, if any- CDCT file, which contains information about PowerExchange Logger log files if you use the

PowerExchange LoggerThis directory can be a directory that you created specifically for these files or another existingdirectory. Informatica recommends that you use a unique directory name to separate these CDCobjects from the PowerExchange code. This practice makes migrating to a new PowerExchangeversion easier.Default is the PowerExchange installation directory.

CAPT_XTRA Path to the local directory that stores extraction maps.

This directory can be a directory that you created specifically for these files or another existing directory. Informatica recommends that you use a unique directory name to separate these CDC objects from the PowerExchange code. This practice makes migrating to a new PowerExchange version easier.

Default is the PowerExchange installation directory.
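As an illustration of the CAPI_SRC_DFLT syntax described above, the following Python sketch renders a statement and rejects source types other than the three listed. Python and the function name capi_src_dflt are assumptions made for this example; they are not part of PowerExchange, and the connection name used in the usage line is a placeholder.

```python
# Hypothetical helper (not a PowerExchange API): renders a CAPI_SRC_DFLT
# statement using the syntax CAPI_SRC_DFLT=(source_type,capi_connection_name).

# Source types listed in this guide for Linux, UNIX, and Windows.
VALID_SOURCE_TYPES = {
    "MSS": "Microsoft SQL Server",
    "ORA": "Oracle",
    "UDB": "DB2 for Linux, UNIX, and Windows",
}


def capi_src_dflt(source_type: str, capi_connection_name: str) -> str:
    """Return a CAPI_SRC_DFLT statement for a dbmover.cfg file."""
    if source_type not in VALID_SOURCE_TYPES:
        raise ValueError(f"Unsupported source type: {source_type!r}")
    if not capi_connection_name:
        raise ValueError("A CAPI_CONNECTION name is required")
    return f"CAPI_SRC_DFLT=({source_type},{capi_connection_name})"


# Example with a placeholder connection name.
print(capi_src_dflt("ORA", "ORACAPX"))
```

The name passed as capi_connection_name must match the NAME value of a CAPI_CONNECTION statement that is defined in the same dbmover.cfg file.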

RELATED TOPICS:

¨ “DB2 for Linux, UNIX, and Windows Change Data Capture” on page 56

¨ “Microsoft SQL Server Change Data Capture” on page 70

¨ “Oracle Change Data Capture with Oracle LogMiner” on page 80

¨ “CAPX CAPI_CONNECTION Parameters” on page 14


CAPI_CONNECTION Statements

PowerExchange requires that you define CAPI_CONNECTION statements in the dbmover.cfg file on any Linux, UNIX, or Windows system where PowerExchange captures or extracts change data. PowerExchange uses the parameters that you specify in the CAPI_CONNECTION statements to connect to the change stream and to customize capture and extraction processing.

For each data source, you must define one of the following source-specific types of CAPI_CONNECTION statements:

¨ For Microsoft SQL Server, an MSQL CAPI_CONNECTION

¨ For Oracle, an ORCL CAPI_CONNECTION and a UOWC CAPI_CONNECTION for the UOW Cleanser

¨ For DB2 for Linux, UNIX, and Windows, a UDB CAPI_CONNECTION

If you use continuous extraction mode to extract change data from PowerExchange Logger log files, you must also define a CAPX CAPI_CONNECTION statement.

You can specify up to eight CAPI_CONNECTION statements in a dbmover.cfg file. You can identify one of the statements as the overall default. If you define multiple CAPI_CONNECTION statements for the same source type, you can identify one of these statements as the source-specific default. In addition to or in lieu of defaults, you can define specific CAPI_CONNECTION overrides in multiple ways. The order of precedence that PowerExchange uses to determine which CAPI_CONNECTION statement to use is described in the PowerExchange Reference Manual.

Note: When you extract change data, PowerExchange uses CAPI_CONNECTION statements to connect to the change stream for the data source. To perform database row tests for data sources that are defined by capture registrations local to the PowerExchange Navigator, you must specify the appropriate CAPI_CONNECTION statements on the PowerExchange Navigator machine. Otherwise, you do not need to specify CAPI_CONNECTION statements to perform database row tests.

RELATED TOPICS:

¨ “CAPX CAPI_CONNECTION Parameters” on page 14

¨ “DB2 for Linux, UNIX, and Windows CAPI_CONNECTION Parameters” on page 62

¨ “Microsoft SQL Server CAPI_CONNECTION Parameters” on page 76

¨ “ORCL CAPI_CONNECTION Statement” on page 92

¨ “UOWC CAPI_CONNECTION Statement” on page 99

CAPX CAPI_CONNECTION Parameters

The CAPX CAPI_CONNECTION statement specifies the Consumer API (CAPI) parameters needed for continuous extraction of change data from PowerExchange Logger for Linux, UNIX, and Windows log files.

Operating Systems: Linux, UNIX, and Windows

Required: Yes for continuous extraction mode

Syntax:

CAPI_CONNECTION=(
    [DLLTRACE=trace_id,]
    NAME=name,
    [TRACE=trace,]
    TYPE=(CAPX,
        DFLTINST=collection_id,
        [FILEWAIT=seconds,]
        [RSTRADV=seconds]
    ))

Parameters:

Enter the following parameters:

DLLTRACE=trace_id

Optional. User-defined name of the TRACE statement that activates internal DLL tracing for this CAPI. Specify this parameter only at the direction of Informatica Global Customer Support.

NAME=name

Required. Unique user-defined name for this CAPI_CONNECTION statement.

Maximum length is eight alphanumeric characters.

TRACE=trace

Optional. User-defined name of the TRACE statement that activates the common CAPI tracing. Specify this parameter only at the direction of Informatica Global Customer Support.

TYPE=(CAPX, ... )

Required. Type of CAPI_CONNECTION statement. For continuous extraction mode, this value must be CAPX.

DFLTINST=collection_id

Required. A source identifier, sometimes called the instance name or collection identifier, that is defined in capture registrations. This value must match the instance or database name that is displayed in the Resource Inspector of the PowerExchange Navigator for the registration group that contains the capture registrations.

Maximum length is eight alphanumeric characters.

FILEWAIT=seconds

Optional. Time interval, in seconds, that PowerExchange waits before checking for new PowerExchange Logger log files.

Valid values are from 1 through 86400.

Default is 1.

RSTRADV=seconds

Optional. Time interval, in seconds, that PowerExchange waits before advancing restart and sequence tokens for a registered data source during periods when UOWs do not include any changes of interest for the data source. When the wait interval expires, PowerExchange returns the next committed "empty UOW," which includes only updated restart information.

The wait interval is reset to 0 when PowerExchange completes processing a UOW that includes changes of interest or returns an empty UOW because the wait interval expired without any changes of interest having been received.

For example, if you specify 5, PowerExchange waits 5 seconds after it completes processing the last UOW or after the previous wait interval expires. Then PowerExchange returns the next committed empty UOW that includes the updated restart information and resets the wait interval to 0.

If RSTRADV is not specified, PowerExchange does not advance restart and sequence tokens for a registered source during periods when no changes of interest are received. In this case, when PowerExchange warm starts, it reads all changes, including those not of interest for CDC, from the restart point.

Valid values are 0 through 86400. No default is provided.


Warning: A value of 0 can degrade performance because PowerExchange returns an empty UOW after each UOW processed.
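The CAPX syntax and value ranges above can be checked before a statement is added to dbmover.cfg. The following Python sketch is only an illustration under stated assumptions: the function name capx_connection is hypothetical, the statement is emitted on a single line, the optional DLLTRACE and TRACE parameters are omitted, and the names in the usage line are placeholders.

```python
import re

# Hypothetical helper (not a PowerExchange API). It applies the limits stated
# in this section: NAME and DFLTINST are at most eight alphanumeric characters,
# FILEWAIT is 1 through 86400, and RSTRADV is 0 through 86400.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9]{1,8}")


def capx_connection(name, dfltinst, filewait=None, rstradv=None):
    """Return a single-line CAPX CAPI_CONNECTION statement."""
    for label, value in (("NAME", name), ("DFLTINST", dfltinst)):
        if not _NAME_PATTERN.fullmatch(value):
            raise ValueError(f"{label} must be 1 to 8 alphanumeric characters")
    if filewait is not None and not 1 <= filewait <= 86400:
        raise ValueError("FILEWAIT must be from 1 through 86400")
    if rstradv is not None and not 0 <= rstradv <= 86400:
        raise ValueError("RSTRADV must be from 0 through 86400")

    type_parts = ["CAPX", f"DFLTINST={dfltinst}"]
    if filewait is not None:
        type_parts.append(f"FILEWAIT={filewait}")
    if rstradv is not None:
        type_parts.append(f"RSTRADV={rstradv}")
    return f"CAPI_CONNECTION=(NAME={name},TYPE=({','.join(type_parts)}))"


# Placeholder names: check for new log files every 60 seconds and advance
# restart tokens after 600 seconds without changes of interest.
print(capx_connection("CAPXORA", "ORACOLL1", filewait=60, rstradv=600))
```

Leaving rstradv unset mirrors the documented behavior when RSTRADV is not specified: no parameter is written, and restart tokens are not advanced during quiet periods.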

Starting the PowerExchange Listener

To start the PowerExchange Listener, you can run the dtllst program or use other system-specific methods.

On a Linux or UNIX system, use one of the following methods:

¨ Enter dtllst at the command line to run the PowerExchange Listener in foreground mode:

dtllst node1 [config=directory/myconfig_file] [license=directory/mylicense_key_file]

Include the optional config and license parameters if you want to specify configuration and license key files that override the original dbmover.cfg and license.key files.

You can add an ampersand (&) at the end to run the PowerExchange Listener in background mode and add the prefix "nohup" at the beginning to run the PowerExchange Listener persistently:

nohup dtllst node1 [config=directory/myconfig_file] [license=directory/mylicense_key_file] &

¨ Run the startlst script, which was installed with PowerExchange. This script deletes the detail.log file and then starts the PowerExchange Listener.

On a Windows system, use one of the following methods:

¨ Run the PowerExchange Listener as a Windows service, which is the usual practice. To start a PowerExchange Listener service from the Windows Start menu, click Start > Programs > Informatica PowerExchange > Start PowerExchange Listener. Alternatively, use the dtllstsi program to enter the start command from a Windows command prompt:

dtllstsi start "service_name"

¨ Enter dtllst. The syntax is the same as for Linux and UNIX except that the & and nohup operands are not supported. Your product license must allow this manual mode of PowerExchange Listener operation.

Note: You cannot start the PowerExchange Listener by using the pwxcmd program.

Stopping the PowerExchange Listener

To stop the PowerExchange Listener, use the CLOSE or CLOSE FORCE command. To stop active PowerExchange Listener tasks, use the STOPTASK command.


The following table describes these commands and the syntax for issuing each command from the command line against a PowerExchange Listener task that is running in foreground mode:

Command: CLOSE

Description: Stops the PowerExchange Listener after all of the following subtasks complete:
- CDC subtasks, which stop at the next commit of a unit of work (UOW)
- Bulk data movement subtasks
- PowerExchange Listener subtasks

Command Line Syntax:
On Linux, UNIX, or Windows: C

Command: CLOSE FORCE

Description: Forces the cancellation of all user subtasks and stops the PowerExchange Listener. PowerExchange waits 30 seconds for current user subtasks on the PowerExchange Listener to complete. Then PowerExchange cancels any remaining user subtasks and stops the PowerExchange Listener. This command is useful if you have long-running subtasks on the PowerExchange Listener.

Command Line Syntax:
On Linux or UNIX: C F
On Windows: CF

Command: STOPTASK

Description: Stops a PowerExchange Listener task for a specific extraction application process. PowerExchange waits to stop the PowerExchange Listener until either the end UOW or commit threshold is reached.

Command Line Syntax:
On Linux or UNIX: STOPTASK app_name
On Windows: STOPTASK APPLID=app_name

The app_name is the name of an active change data extraction process. You can get this name from the PWX-00712 messages in the PowerExchange Listener D (DISPLAY ACTIVE) command output.

Alternatively, you can use any of the following methods:

¨ On a Linux, UNIX, or Windows system, use the pwxcmd program to issue the close, closeforce, or stoptask command to a PowerExchange Listener running in foreground or background mode, on the local system or a remote system. You can issue these pwxcmd commands from the command line or include them in scripts or batch files.

¨ On a Linux or UNIX system, if the PowerExchange Listener is running in background mode, use the standard operating system commands to find the PowerExchange Listener process ID and then “kill” that process. A “kill” operation is similar to a CLOSE operation.

¨ On a Windows system, if the PowerExchange Listener does not respond to a CLOSE FORCE command, press Ctrl + C once to issue CLOSE or press Ctrl + C twice to issue CLOSE FORCE.

Displaying Active PowerExchange Listener Tasks

You can use the DISPLAY ACTIVE command to display information about each active PowerExchange Listener task that is running in foreground mode on a Linux, UNIX, or Windows system. This information includes the TCP/IP address, port number, application name, access type, and status.

On a Linux, UNIX, or Windows system, enter the following command at the command line on the screen where the PowerExchange Listener task is running in foreground mode:

D


Alternatively, on a Linux, UNIX, or Windows system, you can issue the pwxcmd listtask command from a command line, script, or batch file to a PowerExchange Listener running on the local system or a remote system. The pwxcmd listtask command produces the same output as the DISPLAY ACTIVE command.


CHAPTER 3

PowerExchange Logger for Linux, UNIX, and Windows

This chapter includes the following topics:

¨ PowerExchange Logger Overview, 19

¨ PowerExchange Logger Tasks, 20

¨ PowerExchange Logger Files, 21

¨ File Switches, 25

¨ PowerExchange Logger Operational Modes, 25

¨ PowerExchange Logger Considerations on Linux and UNIX, 27

¨ Configuring the PowerExchange Logger, 27

¨ Starting the PowerExchange Logger, 47

¨ Managing the PowerExchange Logger, 49

PowerExchange Logger Overview

The PowerExchange Logger for Linux, UNIX, and Windows captures change data from PowerExchange data sources and writes that data to PowerExchange Logger log files. The PowerExchange Logger writes only the successful units of work (UOWs) to its log files, in chronological order based on end time.

When a PowerCenter CDC session runs, it extracts change data from the log files instead of from the change stream.

Note: The PowerExchange Logger for Linux, UNIX, and Windows is similar in function to PowerExchange Condense on i5/OS or z/OS systems.

The PowerExchange Logger can capture change data from DB2 recovery logs or Oracle redo logs on Linux, UNIX, or Windows, or from a Microsoft SQL Server distribution database on Windows. If you use the offloading feature, a PowerExchange Logger process on Linux, UNIX, or Windows can also process data from data sources on i5/OS or z/OS systems.

Use the PowerExchange Logger to reduce database overhead due to CDC processing. With the PowerExchange Logger, PowerExchange accesses the source database fewer times to read change data, which reduces database I/O. Also, because change data is extracted from the PowerExchange Logger log files, you often do not need to extend the retention period for source database log files to accommodate CDC processing.

You must run one PowerExchange Logger process for each source type and instance, as defined in a registration group. The PowerExchange Logger runs in continuous mode or batch mode.


When you create capture registrations for data sources, including i5/OS and z/OS data sources for which processing is offloaded, set the Condense option to Part. The PowerExchange Logger supports only partial condense processing. For i5/OS or z/OS data sources, if you set the Condense option to Full in capture registrations, the PowerExchange Logger ignores the registrations and does not process change data from those sources.

For each PowerExchange Logger process, you must define a configuration file. PowerExchange provides a sample configuration file named pwxccl.cfg. The configuration file contains parameters for controlling the PowerExchange Logger and for identifying the source instance. Use the COLL_END_LOG parameter to control whether the PowerExchange Logger runs in continuous mode or batch mode.

When PowerCenter workflow sessions run, you can extract change data from PowerExchange Logger log files in batch extraction mode or continuous extraction mode. Do not use real-time extraction mode with the PowerExchange Logger.

Tip: For Oracle near-real-time CDC, Informatica recommends that you use the PowerExchange Logger and continuous extraction mode. PowerExchange then uses one Oracle LogMiner session for all extractions that process an Oracle instance. If you use real-time extraction mode, without the PowerExchange Logger, PowerExchange starts a separate LogMiner session for each extraction. The use of multiple, concurrent LogMiner sessions can significantly degrade the performance on the system where LogMiner runs.

RELATED TOPICS:

¨ “PowerExchange Logger Operational Modes” on page 25

¨ “Customizing the PowerExchange Logger Configuration File” on page 28

PowerExchange Logger Tasks

The PowerExchange Logger uses a Controller task with Command Handler and Writer subtasks.

These tasks perform the following functions:

Controller task

Loads parameter settings from the PowerExchange Logger pwxccl.cfg configuration file. Reads the cache file from the last run to determine if capture registrations have been added or removed, and loads the capture registrations from the CCT file. After loading this information, the Controller starts the Command Handler subtask and then the Writer subtask.

Command Handler subtask

Processes PowerExchange Logger commands from various sources, including user stdin and the pwxcmd program. If the PROMPT parameter is set to Y in the pwxccl.cfg file, the Command Handler waits for the Writer subtask to initialize before accepting a user command.

Writer subtask

Performs most of the PowerExchange Logger work that uses CPU time. The Writer initializes the CAPI for the source database, determines the start or restart point in the change stream, reads change data from the change stream, and writes change data to PowerExchange Logger log files. The Writer also performs checkpoint processing, writes records to the CDCT file during a file switch, deletes expired CDCT records, and rolls back CDCT records when you warm start the PowerExchange Logger from an earlier point in time. If the PROMPT parameter is set to Y in the pwxccl.cfg file, the Writer waits for you to respond to confirmation prompts before proceeding with a cold start or a rollback of CDCT records.


PowerExchange Logger Files

A PowerExchange Logger process writes information to the CDCT file, checkpoint files, PowerExchange Logger log files, and PowerExchange message logs.

It also uses cache files and lock files during processing.

CDCT File

The PowerExchange Logger stores information about its log files in the CDCT file. When a PowerCenter CDC session runs in continuous extraction mode or batch extraction mode, the PowerExchange Listener reads the CDCT file to determine the PowerExchange Logger log files from which to extract change data.

The PowerExchange Logger creates the CDCT file in the directory that is specified by the CAPT_PATH statement in the dbmover.cfg file on the system where the PowerExchange Logger runs. If the CAPT_PATH statement is not specified, the CDCT file is in the directory from which the PowerExchange Logger is invoked.

After a file switch, or the first time the PowerExchange Logger receives change data based on an active capture registration, the PowerExchange Logger Writer subtask writes keyed records to the CDCT file. These records contain information about each closed PowerExchange Logger log file, including the log file name, number of records read, UOW start and end times, whether before images are included, and other control information.

For example, if a log file contains change records for two registration tags, or tables, and you are not using a group definition file, the following processing occurs:

1. When source data for each table is first received and written to the log file, the Writer subtask writes a temporary record for the log file, which does not include a registration tag name, to the CDCT file. This temporary record enables the PowerExchange Logger to retrieve source data for extractions that run in continuous extraction mode.

2. When a file switch occurs, the Writer subtask writes two keyed records to the CDCT file, one for each of the registration tags. Each record includes the registration tag name, log file name, and change record count.

3. The PowerExchange Logger then deletes the temporary CDCT records that do not include the registration tags.
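The three steps above can be modeled with a short Python sketch. This is a conceptual illustration only: the in-memory list, the record fields (tag, log_file, count), the function names, and the sample values are all assumptions made for the example and do not reflect the actual CDCT file layout.

```python
# Conceptual model of the CDCT record life cycle (not the real file format).
# Each record is a dict; a temporary record has tag=None.


def record_first_data(cdct, log_file):
    """Step 1: add one temporary record, without a registration tag, per log file."""
    already_tracked = any(
        rec["log_file"] == log_file and rec["tag"] is None for rec in cdct
    )
    if not already_tracked:
        cdct.append({"tag": None, "log_file": log_file, "count": None})


def record_file_switch(cdct, log_file, counts_by_tag):
    """Steps 2 and 3: add one keyed record per tag, then drop the temporary record."""
    for tag, count in counts_by_tag.items():
        cdct.append({"tag": tag, "log_file": log_file, "count": count})
    cdct[:] = [
        rec for rec in cdct
        if not (rec["log_file"] == log_file and rec["tag"] is None)
    ]


cdct = []
record_first_data(cdct, "logfile_1")   # first change for table A arrives
record_first_data(cdct, "logfile_1")   # first change for table B, same log file
record_file_switch(cdct, "logfile_1", {"tagA": 10, "tagB": 5})
for rec in cdct:
    print(rec)
```

After the file switch, only the two keyed records remain, one for each registration tag, which matches the outcome that the numbered steps describe.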

If you use a group definition file, processing is similar to that in the previous example except that the Writer subtask writes one temporary record without a registration tag for each log file that received source data. You can have as many temporary records as groups in the group definition file.

Tip: You can use the PWXUCDCT utility to print information about CDCT records, back up and restore the CDCT file, re-create the CDCT file based on PowerExchange Logger log files if necessary, and delete expired CDCT records.

RELATED TOPICS:

¨ “Maintaining the PowerExchange Logger CDCT File and Log Files” on page 53

PowerExchange Logger Log Files

The PowerExchange Logger creates log files for storing change data records when it first encounters changes for source tables and columns of interest. These source tables and columns must be defined in active capture registrations.


The PowerExchange Logger creates log files based on the EXT_CAPT_MASK parameter in the pwxccl.cfg file. This parameter specifies a path to the directory where log files are stored and a prefix for the log file names. Log file names have the following format:

path/prefix.CND.CPyymmdd.Thhmmssnnn

Where:

¨ path/prefix is the EXT_CAPT_MASK value.

¨ yymmdd is the date when the file is created.

¨ hhmmss is a 24-hour time when the file is created.

¨ nnn is a generated sequence number, starting at 001, that makes each file name unique.
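To make the naming format concrete, the following Python sketch assembles a log file name from its parts. The function name, the sample EXT_CAPT_MASK value, and the upper bound of 999 on the sequence number are assumptions for illustration; the guide states only that the sequence starts at 001.

```python
from datetime import datetime


def logger_log_file_name(ext_capt_mask: str, created: datetime, sequence: int) -> str:
    """Build a name of the form path/prefix.CND.CPyymmdd.Thhmmssnnn.

    ext_capt_mask is the path/prefix value, created is the file creation time,
    and sequence is the generated three-digit number (assumed range 1 to 999).
    """
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must fit in three digits, starting at 001")
    date_part = created.strftime("%y%m%d")   # yymmdd
    time_part = created.strftime("%H%M%S")   # hhmmss, 24-hour clock
    return f"{ext_capt_mask}.CND.CP{date_part}.T{time_part}{sequence:03d}"


# Sample mask and creation time chosen for the example.
print(logger_log_file_name("/capture/logs/ora", datetime(2009, 12, 1, 14, 5, 9), 1))
```

A helper like this can also be turned around to recognize which files in a directory belong to a given EXT_CAPT_MASK prefix.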

The log files remain open until a file switch occurs or the PowerExchange Logger shuts down.

When you run a PowerCenter CDC session in continuous extraction mode or batch extraction mode, PowerExchange extracts change data from the PowerExchange Logger log files.

RELATED TOPICS:

¨ “Introduction to Change Data Extraction” on page 105

Checkpoint Files

The PowerExchange Logger creates checkpoint files to store restart tokens and sequence tokens for correctly resuming CDC processing after a PowerExchange Logger warm start.

The PowerExchange Logger writes information to the checkpoint files each time a file switch occurs or a SHUTDOWN or SHUTCOND command is issued.

Note: Checkpoint files are not used for PowerExchange Logger cold starts.

The PowerExchange Logger creates checkpoint files based on the CHKPT_BASENAME and CHKPT_NUM parameters in the pwxccl.cfg file, as follows:

¨ The CHKPT_BASENAME parameter specifies the path to the directory where checkpoint files are stored and a base file name. Checkpoint file names have the following format:

path/base_name.Vn.ckp

Where:

- path/base_name is the CHKPT_BASENAME value.

- n is a number that the PowerExchange Logger appends to the file name. This number can be a value from 0 to (CHKPT_NUM value - 1).

¨ The CHKPT_NUM parameter specifies the number of checkpoint files. At least two checkpoint files are required.
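As a quick check of the naming rule, the following Python sketch lists the checkpoint file names that a given CHKPT_BASENAME and CHKPT_NUM pair would produce. The function name and the sample base name are assumptions for illustration.

```python
from typing import List


def checkpoint_file_names(chkpt_basename: str, chkpt_num: int) -> List[str]:
    """Return names of the form path/base_name.Vn.ckp for n = 0 .. CHKPT_NUM - 1."""
    if chkpt_num < 2:
        raise ValueError("At least two checkpoint files are required")
    return [f"{chkpt_basename}.V{n}.ckp" for n in range(chkpt_num)]


# Sample base name chosen for the example.
for name in checkpoint_file_names("/capture/chkpt/ora", 3):
    print(name)
```

With a CHKPT_NUM value of 3, the suffixes run from V0 through V2, which is the "0 to (CHKPT_NUM value - 1)" range described above.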


Checkpoint files are sequential files that have a binary variable-length format. Checkpoint files contain the following types of records:

Checkpoint Record: 1
Description: Main record that contains the checkpoint timestamp and restart and sequence tokens. This information is used to determine the restart point in the change stream for a PowerExchange Logger warm start.

Checkpoint Record: 2
Description: Optional. Uncommitted registrations at the end of a PowerExchange Logger log file that did not end on a Commit record.

Checkpoint Record: 3
Description: Optional. Names of the PowerExchange Logger log files that were closed.

If you need to relocate a PowerExchange Logger configuration, you can copy the checkpoint files to another machine that has the same integer endian format. However, you cannot copy checkpoint files to a machine that uses a different integer endian format because the integer fields in checkpoint files that define record length are platform-dependent.

To display information about checkpoint files, you can use the following methods:

¨ Issue the DISPLAY CHECKPOINTS command to display a message that reports the sequence number and timestamp of the last checkpoint file written.

¨ Use the PWXUCDCT utility REPORT_CHECKPOINTS command to print a report that provides information about each checkpoint file, including its timestamp, restart and sequence tokens, reason for the checkpoint, number of expired CDCT records that were deleted, and number of log files to which data was written.

RELATED TOPICS:

¨ “Customizing the PowerExchange Logger Configuration File” on page 28

Cache Files

The PowerExchange Logger creates two identical cache files, one of which is a backup, in the CHKPT_BASENAME directory. The cache files store registration tag names for warm start processing.

When the PowerExchange Logger warm starts, it reads the cache file from its last run to determine if any capture registrations have been added or removed. If so, the PowerExchange Logger issues message PWX-06119.

Lock Files

During initialization, a PowerExchange Logger process creates lock files to prevent other PowerExchange Logger processes from accessing the same CDCT file, checkpoint files, and log files concurrently.

As long as the PowerExchange Logger process holds a lock on the lock files, locking is in effect for the resources for which the lock files were created.

PowerExchange Logger locking works on local disks on Linux, UNIX, or Windows systems. It also works on the following shared file systems on Linux or UNIX systems:

¨ Veritas Storage Foundation™ Cluster File System by Symantec

¨ IBM General Parallel File System

¨ EMC Celerra network-attached storage (NAS) with Network File System (NFS) protocol version 3

¨ NetApp NAS with NFS version 3


The PowerExchange Logger creates lock files in the following order:

1. A lock file for the CDCT file for a source instance. The PowerExchange Logger generates the lock file name and location based on the directory that is specified in the CAPT_PATH parameter of the dbmover.cfg file.

2. A lock file for checkpoint files. The PowerExchange Logger generates the lock file name and location based on the directory and base file name that are specified in the CHKPT_BASENAME parameter of the pwxccl.cfg file.

3. One of the following lock files:

¨ If you do not use a group definition file, a lock file for PowerExchange Logger log files. The PowerExchange Logger generates the lock file name and location based on the directory and file-name prefix that are specified in the EXT_CAPT_MASK parameter of the pwxccl.cfg file.

¨ If you use a group definition file, a lock file for each set of the PowerExchange Logger log files that is defined by the GROUP statements in the group definition file. The PowerExchange Logger generates the lock file names and locations based on the external_capture_mask parameter in each GROUP statement. In this case, the PowerExchange Logger ignores the EXT_CAPT_MASK parameter in the pwxccl.cfg file when creating lock files and processing log files.

Lock file names end with _lockfile.lck. For example, a lock file for the CDCT file could have the name CDCT_oracoll1_lockfile.lck.

When the PowerExchange Logger process ends, it unlocks the lock files to enable other PowerExchange Logger processes to access the previously locked resources.

To identify a PowerExchange Logger process that holds a lock, look up the process ID (PID) in the Task Manager on a Windows system or issue the ps command on a UNIX or Linux system.

Also, the PowerExchange Logger writes messages to the PowerExchange message log that indicate the locking status. Look for the following key messages:

¨ To verify that lock files are created, look for PWX-25802 messages, such as:

PWX-25802 Process pwxccl.exe pid 5428 locked file C:\capture\captpath\CDCT_instance_lockfile.lck

¨ To verify that lock files are unlocked, look for PWX-25803 messages, such as:

PWX-25803 Process pwxccl.exe pid 5428 unlocked file C:\capture\extcapt\loggerfiles_lockfile.lck

¨ If the PowerExchange Logger process cannot find the lock file that it needs to access some resources, it writes message PWX-25800:

PWX-25800 Could not find lock file file_name

¨ If a lock file is locked by another process, the PowerExchange Logger process writes some or all of the following messages, depending on whether it can acquire a lock before the maximum retry interval that is specified in PWX-25814 elapses:

PWX-25804 Error trying to lock PowerExchange Logger files
PWX-25811 File file_name is locked by process process_name pid process_id on host host_name date date time time
PWX-25812 File file_name is locked by pid process_id start offset length bytes
PWX-25813 No information is available on process which locked file file_name
PWX-25814 Trying to lock file file_name until number seconds elapses
PWX-25815 File file_name is locked by another process and no more waiting is allowed.

If a PowerExchange Logger process ends abnormally with message PWX-25815 and return code 25815, try to determine the status of the other PowerExchange Logger process that is holding the lock. This other process is identified in message PWX-25811. For example, the other process might not have completely shut down, or both processes might be trying to use the same files because of an error in their pwxccl.cfg configuration files.
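When troubleshooting lock problems, it can help to scan the message log for the PWX-25802 and PWX-25803 messages shown earlier and work out which lock files are still held. The following Python sketch assumes that the messages keep the wording of the two examples in this section; real log lines may carry additional prefixes such as timestamps, which the search tolerates. The function name held_locks is hypothetical.

```python
import re

# Matches the example wording of PWX-25802 (locked) and PWX-25803 (unlocked).
_LOCK_MESSAGE = re.compile(
    r"PWX-2580(?P<kind>[23]) Process (?P<process>\S+) pid (?P<pid>\d+) "
    r"(?:un)?locked file (?P<file>.+)$"
)


def held_locks(log_lines):
    """Return {lock_file: pid} for locks that were taken and not yet released."""
    held = {}
    for line in log_lines:
        match = _LOCK_MESSAGE.search(line.rstrip())
        if match is None:
            continue
        if match.group("kind") == "2":       # PWX-25802: lock acquired
            held[match.group("file")] = int(match.group("pid"))
        else:                                 # PWX-25803: lock released
            held.pop(match.group("file"), None)
    return held


sample = [
    r"PWX-25802 Process pwxccl.exe pid 5428 locked file C:\capture\captpath\CDCT_instance_lockfile.lck",
    r"PWX-25802 Process pwxccl.exe pid 5428 locked file C:\capture\extcapt\loggerfiles_lockfile.lck",
    r"PWX-25803 Process pwxccl.exe pid 5428 unlocked file C:\capture\extcapt\loggerfiles_lockfile.lck",
]
print(held_locks(sample))
```

In practice the lines would come from the PowerExchange message log, for example by iterating over the open detail.log file. The PID in the result can then be checked with the Task Manager or the ps command.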

Message Log Files

The PowerExchange Logger writes messages to the PowerExchange message log file.


By default, on Linux, UNIX, and Windows, this file is named detail.log and is located in the working directory where the PowerExchange Logger process runs. However, you can optionally specify another directory for PowerExchange message log files. You can also enable the use of alternative log files.

To specify a unique directory for PowerExchange message log files, include the LOGPATH parameter in the dbmover.cfg file. Use of this parameter can help you find the PowerExchange message log files more easily.

Also, you can implement alternative logging by specifying the TRACING statement in the dbmover.cfg file. When alternative logging is enabled, PowerExchange creates a set of alternative log files for each PowerExchange process, including each PowerExchange Logger process, in a separate directory. When an alternative log file becomes full, PowerExchange switches to another alternative log file. This automatic rotation of message log files prevents out-of-space conditions. Also, PowerExchange buffers messages before writing them to the alternative log files on disk at a specific flush interval. This mode of writing messages can reduce I/O activity on the alternative log files.

File Switches

When running in continuous mode, the PowerExchange Logger periodically closes its open log files if they contain data and then opens a new set of log files. This process is called a file switch.

The PowerExchange Logger automatically performs a file switch when the criteria in the following parameters of the pwxccl.cfg file are met:

¨ FILE_SWITCH_CRIT

¨ FILE_SWITCH_MIN

¨ FILE_SWITCH_VAL

If the open log files do not contain data when the file-switch criteria in these parameters are met, the file switch does not occur. The PowerExchange Logger waits until the file-switch criteria are met again. If the files still do not contain data, the PowerExchange Logger continues to check the log files at set intervals. Only when the log files contain data does the file switch occur.

Also, you can force a file switch by entering the fileswitch command from the command line. Alternatively, on Linux, UNIX, or Windows, you can send a pwxcmd fileswitch command to a PowerExchange Logger process running on the local system or a remote system.

RELATED TOPICS:

¨ “Configuring the PowerExchange Logger” on page 27

PowerExchange Logger Operational Modes

A PowerExchange Logger process can operate in continuous mode or batch mode.

To set the operational mode, use the COLL_END_LOG parameter in the pwxccl.cfg file.

RELATED TOPICS:

¨ “File Switches” on page 25

¨ “Customizing the PowerExchange Logger Configuration File” on page 28

¨ “Extraction Modes” on page 106


Continuous Mode

In continuous mode, the PowerExchange Logger process runs continuously until you manually stop it.

Consider using continuous mode in the following situations:

¨ You have a database with a high level of change activity that occurs continuously.

¨ You have a database with intermittent activity that occurs at unpredictable intervals.

¨ You want to avoid the overhead of scheduling PowerExchange Logger runs.

¨ You cannot restart the PowerExchange Logger process often enough to keep up with the change volume.

To enable continuous mode, set the COLL_END_LOG parameter to 0.

In continuous mode, each time the Writer subtask completes a logging cycle, the PowerExchange Logger process is temporarily suspended. The next cycle is triggered by any of the following events:

¨ The wait interval that is defined in the NO_DATA_WAIT parameter of the pwxccl.cfg file elapses.

¨ The CONDENSE command is manually entered at the command line or with the pwxcmd program.

¨ The FILESWITCH command is manually entered at the command line or with the pwxcmd program.

The PowerExchange Logger process continues to run until you enter the SHUTDOWN or SHUTCOND command.To prevent log files from becoming too large, the PowerExchange Logger process periodically performs a fileswitch. Files that are too large can extend restart times for CDC sessions that run in continuous extraction modeor batch extraction mode.

You can use the NO_DATA_WAIT2 parameter in the pwxccl.cfg file to prevent the PowerExchange Logger from consuming too much CPU time when PowerExchange is not receiving changes. For example, if you set the NO_DATA_WAIT2 parameter to 30 seconds, the PowerExchange Logger sleeps for 30 seconds, provided that no updates are received, and then performs another processing cycle. However, a large NO_DATA_WAIT2 value can delay processing of a SHUTDOWN command. If you need to reduce the amount of time that the PowerExchange Logger sleeps on a quiet system, you can adjust the FILE_FLUSH_VAL, FILE_SWITCH_VAL, and FILE_SWITCH_MIN parameters.
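As a hedged illustration of the continuous-mode settings described above, the following pwxccl.cfg fragment is a minimal sketch. The NO_DATA_WAIT2 value mirrors the 30-second example in the preceding paragraph. The NO_DATA_WAIT value is an illustrative assumption, not a recommendation from this guide.

```
/* Continuous mode: run until SHUTDOWN or SHUTCOND is entered
COLL_END_LOG=0
/* Minutes to wait between logging cycles (illustrative value)
NO_DATA_WAIT=1
/* Seconds to wait at the end-of-log for more change data
NO_DATA_WAIT2=30
```

With these values, a quiet system wakes the PowerExchange Logger at most every 30 seconds, at the cost of a possible delay of up to that length before a SHUTDOWN command is processed.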

Run the PowerExchange Logger in continuous mode unless you have a specific reason to use batch mode.

Tip: On a Linux or UNIX system, you can run a continuous PowerExchange Logger process in background mode. Then use the pwxcmd program to send commands to the PowerExchange Logger process that is running in background mode.

When you run the PowerExchange Logger in continuous mode, you can use either continuous or batch extraction mode for workflows that extract change data from the PowerExchange Logger log files.

Batch Mode

In batch mode, the PowerExchange Logger process shuts down after the number of seconds specified in the NO_DATA_WAIT2 parameter of the pwxccl.cfg file elapses without any data being received.

Use batch mode in the following situations:

¨ You want to run the PowerExchange Logger on a scheduled basis after batch applications that update the database complete.

¨ You want to run the PowerExchange Logger manually or for testing.

To enable batch mode, set the COLL_END_LOG parameter to 1 in the pwxccl.cfg file. Also, set the NO_DATA_WAIT2 parameter to the number of seconds that PowerExchange waits at the end-of-log for more change data before shutting down the PowerExchange Logger.

When you run the PowerExchange Logger in batch mode, use batch extraction mode for any workflows that extract change data from the PowerExchange Logger log files.
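The batch-mode settings described above can be sketched as the following pwxccl.cfg fragment. The 60-second wait is an illustrative assumption. Choose a value that suits the length of the quiet period after your batch applications complete.

```
/* Batch mode: shut down when no data arrives within NO_DATA_WAIT2 seconds
COLL_END_LOG=1
/* Seconds to wait at the end-of-log before shutting down (illustrative value)
NO_DATA_WAIT2=60
```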

PowerExchange Logger Considerations on Linux and UNIX

If you run the PowerExchange Logger on a Linux or UNIX system, review the requirements for the amount of memory needed and for running the PowerExchange Logger in background mode.

PowerExchange Logger Memory Requirement on Linux or UNIX

The PowerExchange Logger requires sufficient amounts of main memory and virtual memory to process change data.

If the memory is not sufficient, PowerExchange writes the error messages PWX-00271 and PWX-00904 to the PowerExchange message log file when you attempt to start the PowerExchange Logger on Linux or UNIX.

To prevent this problem, use the Linux or UNIX ulimit command to set the size limits for maximum memory and virtual memory to unlimited. The specific ulimit syntax varies by platform and shell. For more information about this command, see the documentation for your Linux or UNIX operating system.

Running the PowerExchange Logger in Background Mode

You can run a PowerExchange Logger process in background mode on Linux or UNIX systems.

For background PowerExchange Logger processes, Informatica recommends that you set the COLL_END_LOG parameter to 0 in the pwxccl.cfg file to run the PowerExchange Logger continuously. Also, accept the default value of N for the PROMPT parameter. If you specify PROMPT=Y, the PowerExchange Logger ignores this setting and issues an error message.

To send commands to a PowerExchange Logger process that is running in the background, use the pwxcmd program. To enable pwxcmd use, define the CONDENSENAME parameter in the pwxccl.cfg file and define the SVCNODE statement in the dbmover.cfg file.
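The following fragments sketch how the two files might be paired for a background PowerExchange Logger process. The service name PWXCCL1 is borrowed from the example pwxccl.cfg file. The parenthesized SVCNODE form and the port number 6003 are assumptions made for illustration, so verify the exact statement syntax in the dbmover.cfg reference documentation before you use it.

```
/* pwxccl.cfg: continuous mode with a command-handling service name
COLL_END_LOG=0
PROMPT=N
CONDENSENAME=PWXCCL1

/* dbmover.cfg: service name must match the CONDENSENAME value
/* (statement form and port 6003 are assumed for illustration)
SVCNODE=(PWXCCL1,6003)
```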

Configuring the PowerExchange Logger

To configure the PowerExchange Logger, you must define a PowerExchange Logger configuration file for each source type and instance, as defined in a registration group. Also, verify that the Condense option is set to Part in the capture registrations for all sources that the PowerExchange Logger processes.

If you want the PowerExchange Logger to create separate log files for one or more groups of tables, create a PowerExchange group definition file that defines groups of capture registrations for the tables.

Enabling a Capture Registration for PowerExchange Logger Use

For the PowerExchange Logger to use a capture registration, the registration must have a status of active and a Condense setting of Part.

If the PowerExchange Logger does not find any active capture registration, the PowerExchange Logger issues error message PWX-06427 and ends.

To enable a capture registration for PowerExchange Logger use:

1. In the PowerExchange Navigator, open the capture registration.

2. In the Resource Inspector, select Active in the Status list.

3. In the Condense list, select Part.

Customizing the PowerExchange Logger Configuration File

Before you start the PowerExchange Logger, configure its parameters in the PowerExchange Logger configuration file.

PowerExchange provides an example configuration file, named pwxccl.cfg, in the PowerExchange installation directory that is specified in the PWX_HOME environment variable on Linux or UNIX or the PATH environment variable on Windows. Use this example file as a starting point for your customized file. You can rename the example file and copy it to another directory. If you do so, you must specify the CS parameter when you start the PowerExchange Logger to identify the alternative path or file name or both.

If you used the similar PowerExchange Condense feature in an earlier PowerExchange release, you can copy its dtlca.cfg configuration file and then customize the copy. You might want to add PowerExchange Logger parameters that PowerExchange Condense did not support. Rename the file to pwxccl.cfg or use the CS execution parameter. The PowerExchange Logger replaces PowerExchange Condense on Linux, UNIX, and Windows.

If you specify a parameter value that contains one or more spaces, such as a Windows path, you must enclose the value in double quotation marks. Make sure that you use straight quotation marks (").
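For instance, the quoting rule might be applied as in the following sketch. The Windows directory shown is hypothetical and is used only because it contains spaces.

```
/* Hypothetical Windows paths that contain spaces, enclosed in straight double quotes
EXT_CAPT_MASK="C:\PowerExchange Logs\capture\pwxlog"
CHKPT_BASENAME="C:\PowerExchange Logs\capture\logger.chkpt"
```

Curly quotation marks, which word processors often substitute automatically, do not satisfy this rule.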

Parameter Descriptions

This topic describes the PowerExchange Logger parameters that you can specify in pwxccl.cfg.

The parameters are:

Parameter Description Valid Values

CAPT_IMAGE

Data image type that the PowerExchange Logger captures to its log files. The PowerExchange Logger can capture after images only or both before and after images of the data.

This image type must be consistent with the image type delivered to the target during extraction processing.

If you enter AI for this parameter, the following limitations apply:

- You cannot extract before images to the target.
- You cannot use DTL_BI columns in extraction maps.
- If you add DTL_CI columns to extraction maps, any Insert or Delete operations result in Null values in these columns.

Informatica recommends that you specify BA so that you have the flexibility to use either AI or BA for the PowerCenter Image Type connection attribute for extraction processing.

Valid values:

- AI for after images.
- BA for before and after images.

Default is AI.

CAPTURE_NODE

The node name that the PowerExchange Logger uses to retrieve capture registrations and change data.

Specify this parameter only if you are using CDC offload processing with the PowerExchange Logger. Enter the node name of the remote node, as specified in a NODE statement in the dbmover.cfg file on the local machine where the PowerExchange Logger runs. The PowerExchange Logger uses the specified node name to connect to the PowerExchange Listener on the remote node to read capture registrations and change data. The PowerExchange Logger writes the change data to its local log files.

This parameter is optional. Default is local. Do not specify this parameter if the capture registrations and change data are on the local machine where the PowerExchange Logger runs.

You can also specify an optional user ID and password to control connection to the specified node. For more information, see the CAPTURE_NODE_UID parameter and the CAPTURE_NODE_EPWD or CAPTURE_NODE_PWD parameter.

Valid values: A node name that is specified in a NODE statement in the dbmover.cfg file on the local machine where the PowerExchange Logger runs.

CAPTURE_NODE_EPWD

An encrypted password that is associated with the user ID specified in the CAPTURE_NODE_UID parameter. This password, in conjunction with the CAPTURE_NODE_UID value, is used to control PowerExchange access to capture registrations and change data.

Tip: You can create an encrypted password in the PowerExchange Navigator by selecting File > Encrypt Password.

This parameter is optional. However, if you specify CAPTURE_NODE_UID, you must enter a password or encrypted password with either the CAPTURE_NODE_PWD or CAPTURE_NODE_EPWD parameter.

If you specify this parameter, do not also specify CAPTURE_NODE_PWD.

CAPTURE_NODE_PWD

A clear text password that is associated with the user ID specified in the CAPTURE_NODE_UID parameter. This password, in conjunction with the CAPTURE_NODE_UID value, is used to control PowerExchange access to capture registrations and change data.

This parameter is optional. However, if you specify CAPTURE_NODE_UID, you must enter a password or encrypted password with either the CAPTURE_NODE_PWD or CAPTURE_NODE_EPWD parameter.

If you specify this parameter, do not also specify CAPTURE_NODE_EPWD.

CAPTURE_NODE_UID

User ID that is used to control access to capture registrations and change data on the local machine or on the remote node that is specified in the CAPTURE_NODE parameter.

Whether this parameter is required depends on the operating system of the local or remote node and the SECURITY setting in its DBMOVER configuration file.

If CAPTURE_NODE specifies a z/OS or i5/OS node that has a SECURITY setting of 0, do not specify this parameter. PowerExchange uses the user ID under which the PowerExchange Listener job runs to control access to capture registrations and change data.

If CAPTURE_NODE specifies a z/OS or i5/OS node that has a SECURITY setting of 1, you must enter a valid operating system user ID for this parameter. Otherwise, error message PWX-00231 is issued, indicating a signon failure. However, PowerExchange uses the user ID under which the PowerExchange Listener job runs to control access to capture registrations and change data.

If CAPTURE_NODE specifies a z/OS or i5/OS node that has a SECURITY setting of 2, you must enter a valid operating system user ID for this parameter. Otherwise, error message PWX-00231 is issued, indicating a signon failure. PowerExchange uses this user ID to control access to capture registrations and change data. If the specified user ID does not have the authority that is required to read capture registrations or change data, access fails.

For a Linux, UNIX, or Windows local or remote node, enter a user ID that is valid for your data source type:

- For DB2 for Linux, UNIX, or Windows sources, enter a valid operating system user ID that has DB2 DBADM or SYSADM authority.
- For Oracle sources, enter a database user ID that permits access to Oracle redo logs and Oracle LogMiner.
- For Microsoft SQL Server instances that use SQL Server Authentication, enter a database user ID that permits access to the SQL Server distribution database. For SQL Server instances that use Windows Authentication, PowerExchange uses the user ID under which the PowerExchange Listener was started. In this case, do not specify this parameter unless you want to specify another user.

CHKPT_BASENAME

Required. An existing directory path and base file name that PowerExchange uses to create checkpoint files. Checkpoint files store information for properly resuming PowerExchange Logger processing after a warm start.

For example:

/capture/logger.chkpt

When creating the full checkpoint file name, PowerExchange appends Vn, where n is a number from 0 to (CHKPT_NUM value - 1).

For example:

/capture/logger.chkptV1.ckp

Valid values: Maximum length is 256.

CHKPT_NUM

Recommended. Number of checkpoint files to use. The PowerExchange Logger requires at least two checkpoint files.

If you decrease the number of checkpoint files after running the PowerExchange Logger, you must cold start the PowerExchange Logger. If you perform a warm start, the PowerExchange Logger might restart from an incorrect location in its log files.

Valid values: A number from 2 through 999999. Default is 3.

COLL_END_LOG

Required. PowerExchange Logger operational mode.

Options are:

- 0. Runs the PowerExchange Logger continuously until you manually stop it. After the Writer subtask completes a processing cycle, it waits for the number of minutes specified in the NO_DATA_WAIT parameter before starting another processing cycle.
- 1. Runs the PowerExchange Logger in batch mode. The PowerExchange Logger shuts down after the seconds specified in the NO_DATA_WAIT2 parameter elapse and no data has been received.

Valid values: 0 for continuous mode. 1 for batch mode. Default is 0.

COND_CDCT_RET_P

Recommended. Retention period, in days, for CDCT records and PowerExchange Logger log files. Log files that are older than this period and their corresponding CDCT records are deleted automatically during PowerExchange Logger cleanup processing. Cleanup processing occurs during startup, file switch, or shutdown processing.

Tip: Set this parameter to minimize the size of the CDCT file while preserving the log files that contain the earliest change data you might need to access. If you use continuous extraction mode, PowerExchange reads the CDCT file each time the interval specified in the FILEWAIT parameter of the CAPX CAPI_CONNECTION statement elapses. If a CDCT file becomes large, this read activity can increase I/O, system resource use, and latency of change data extraction. If you use batch extraction mode, this high read activity is not a consideration.

Valid values: Any number greater than 0. Default is 60.

CONDENSENAME

Optional. A name for the command-handling service for a PowerExchange Logger for Linux, UNIX, and Windows process to which pwxcmd commands will be issued.

Syntax is:

CONDENSENAME=service_name

This service name must match the service name that is specified in the associated SVCNODE statement in the dbmover.cfg file. The SVCNODE statement specifies the TCP/IP port on which this service listens for pwxcmd commands.

Tip: If you run the PowerExchange Logger as a background process in continuous mode, specify this parameter so that you can use the pwxcmd program to issue commands to the PowerExchange Logger. Without the use of pwxcmd, you cannot shut down a PowerExchange Logger process that is running in the background or send status information to a computer that is remote from where the PowerExchange Logger runs.

Valid values: Maximum length is 64 characters. No default.

CONDENSE_SHUTDOWN_TIMEOUT

Maximum amount of time, in seconds, that the PowerExchange Logger waits after receiving the SHUTDOWN or pwxcmd shutdown command before stopping. During a shutdown, the PowerExchange Logger updates the CDCT file for each capture registration that is used to capture change data. If you have a large number of capture registrations, you might need to increase this timeout period.

Valid values: A number from 0 through 2147483647. Default is 600.

CONN_OVR

Recommended. Name of the override CAPI_CONNECTION statement to use for the PowerExchange Logger. If you do not specify CONN_OVR, the PowerExchange Logger uses the default CAPI_CONNECTION if one is specified in dbmover.cfg.

Informatica recommends that you specify CONN_OVR. It is the only type of override that the PowerExchange Logger can use.

Valid values: Valid CAPI_CONNECTION name for the source type.

DBID

Required. A source identifier, sometimes called the instance name, that is defined in capture registrations. When used with DB_TYPE, it defines selection criteria for capture registrations in the CCT file.

This value must match the instance or database name that is displayed in the Resource Inspector of the PowerExchange Navigator for the registration group that contains the capture registrations.

For Microsoft SQL Server, an instance name is generated when you create a registration group. Open the registration group in the PowerExchange Navigator to view this Instance value.

Valid values:

- For DB2 for Linux, UNIX, and Windows, this value is the Database name that is displayed for the registration group in the Resource Inspector.
- For Microsoft SQL Server, this value is the Instance name that is displayed for the registration group in the Resource Inspector.
- For Oracle, this value is the Instance name that is displayed for the registration group and is also the first positional parameter in the ORACLEID statement in dbmover.cfg.

If you use CDC offload processing with the PowerExchange Logger to capture change data from z/OS or i5/OS data sources, see “Configuring PowerExchange to Capture Change Data on a Remote System” on page 162 for information about what to enter for this parameter.

DB_TYPE

Required. Source RDBMS type.

Valid values:

- UDB for DB2 for Linux, UNIX, and Windows
- MSS for Microsoft SQL Server
- ORA for Oracle

If you use CDC offload processing with the PowerExchange Logger to capture change data from z/OS or i5/OS data sources, see “Configuring PowerExchange to Capture Change Data on a Remote System” on page 162 for information about what to enter for this parameter.

EPWD

A deprecated parameter. Use CAPTURE_NODE_EPWD instead. If both CAPTURE_NODE_EPWD and EPWD are specified, CAPTURE_NODE_EPWD takes precedence.

EXT_CAPT_MASK

Required. An existing directory path and a unique prefix to be used for generating the PowerExchange Logger log files.

For example:

/capture/pwxlog

Verify that no existing files match this path and prefix. PowerExchange considers any file that matches this path and prefix to be a PowerExchange Logger log file, even if it is unrelated to PowerExchange Logger processing.

To create the log files, the PowerExchange Logger appends the following information:

.CND.CPyymmdd.Thhmmssnnn

Where:

- yymmdd is a date composed of a two-digit year, a month, and a day.
- hhmmss is a 24-hour time value, including hours, minutes, and seconds.
- nnn is a generated sequence number, which starts from 001.

For example:

/capture/pwxlog.CND.CP080718.T1545001

Warning: Do not use the same EXT_CAPT_MASK value for multiple PowerExchange Logger processes. Otherwise, a PowerExchange Logger process might corrupt log files that are used by another PowerExchange Logger process. Also, do not re-use an EXT_CAPT_MASK value until the PowerExchange Logger process has completed processing all of the log files that match the mask.

Valid values: Maximum length is 256 characters. No default.

FILE_FLUSH_VAL

Recommended. File flush interval in seconds. This parameter affects the latency of change data extractions that run in continuous extraction mode. The PowerExchange Logger waits for this interval to elapse before flushing, or writing, data to the current log file on disk. Flushing data to disk enables the data to be read by extractions running in continuous extraction mode.

Valid values are:

- A -1 causes the PowerExchange Logger process to not flush data to the current log file. Specify this value only if you use batch extraction mode. Do not specify this value if you use continuous extraction mode. Otherwise, the latency of your continuous-mode extractions increases.
- A 0 results in a flush after every record.
- Any value from 1 through 86400 sets the flush interval to that specific value.

Warning: A value of 0 can degrade PowerExchange Logger and file system performance.

Set this value as appropriate for your CDC environment. Values that are too high can increase change extraction latency, and values that are too low can degrade PowerExchange Logger and system performance. Informatica recommends that you set this parameter to a value that is equal to or greater than the NO_DATA_WAIT2 value because file flushes cannot occur until the NO_DATA_WAIT2 period expires.

Valid values: -1 or any number from 0 through 86400. Default is -1.

FILE_SWITCH_CRIT

Type of units to use for the FILE_SWITCH_MIN and FILE_SWITCH_VAL parameters, which determine when to do an automatic file switch.

Valid values:

- M for minutes.
- R for records.

Default is M.

FILE_SWITCH_MIN

File-switch criteria that the PowerExchange Logger uses when it encounters change data for a new source. You can use this parameter to reduce change data latency when running extractions in continuous extraction mode.

Syntax is:

FILE_SWITCH_MIN=(min_val,min_val_ign)

Where:

min_val is the minimum number of FILE_SWITCH_CRIT units that must elapse after the PowerExchange Logger encounters a change record for a source that has no entry in the CDCT file, before a file switch can be performed. Valid values are:

- A -1 causes this parameter to be ignored. File switch processing is controlled by FILE_SWITCH_VAL only.
- A 0 causes the PowerExchange Logger to perform a file switch each time a new source is encountered.
- Any value from 1 through 2147483647 causes the PowerExchange Logger to perform a file switch when this specified number of FILE_SWITCH_CRIT units is reached.

min_val_ign is the minimum number of FILE_SWITCH_CRIT units that must pass during a PowerExchange Logger cold start before the PowerExchange Logger uses the min_val value. Before the min_val_ign threshold is met, only FILE_SWITCH_VAL controls file switch activity. Valid values are:

- A 0 causes the PowerExchange Logger to use the minimum file switch value specified in min_val immediately after it is cold started.
- Any value from 1 through 2147483647 causes the PowerExchange Logger to ignore the min_val keyword for the specified number of units.

The min_val_ign value is ignored if the PowerExchange Logger is warm started.

Warning: The value (0,0) can result in a large number of file switches when the PowerExchange Logger is cold started. This situation occurs because the PowerExchange Logger does a file switch each time it encounters a data source without an entry in the CDCT file. During a cold start, the CDCT file is emptied. Thereafter, a file switch occurs each time the PowerExchange Logger encounters a change record for a registered data source for the first time.

Valid values:

- min_val. A value from -1 through 2147483647.
- min_val_ign. A value from 0 through 2147483647.

Default is (-1,0).

FILE_SWITCH_VAL

Number of minutes or change records, as determined by FILE_SWITCH_CRIT, that must elapse before PowerExchange performs a file switch.

For example, if this value is 30 and FILE_SWITCH_CRIT=R, the PowerExchange Logger performs a file switch every 30 records. Or if FILE_SWITCH_CRIT=M, the PowerExchange Logger performs a file switch every 30 minutes.

If the PowerExchange Logger log files contain no data when the FILE_SWITCH_VAL threshold is reached, the file switch does not occur.

This value affects the size of the PowerExchange Logger log files. Specify a value that results in log files of the appropriate size for your environment.

Tip: When using continuous extraction mode, set this parameter such that you have larger log files and a smaller CDCT file. When using batch extraction mode, set this parameter to a value that causes file switches to occur within the timeframe that meets your change extraction latency requirements.

Valid values: Any number greater than 0. Default is 30.

GROUPDEFS

Path and file name of the optional PowerExchange Logger group definition file. This file defines groups of capture registrations that the PowerExchange Logger uses to capture change data to separate sets of log files. It also defines the path that the PowerExchange Logger uses to create the log files that contain the change data for each group. This parameter is optional.

Valid values: Maximum length is 255 characters. No default.

LOGGER_DELETES_EXPIRED_CDCT_RECORDS

Controls how expired CDCT records, for which the retention period has elapsed, are deleted.

Options are:

- Y. The PowerExchange Logger maintains the CDCT retention array and deletes expired CDCT records during file switches. If you enter Y, you cannot issue the DELETE_EXPIRED_CDCT command from the PWXUCDCT utility to delete expired CDCT records.
- N. The PowerExchange Logger does not maintain the CDCT retention array and does not delete expired CDCT records. However, you can issue the DELETE_EXPIRED_CDCT command from the PWXUCDCT utility to delete expired CDCT records. To use the DELETE_EXPIRED_CDCT command, you must specify N.

Note: This parameter does not affect PowerExchange Logger deletions of CDCT records rolled back because of a cold start or a warm start to a prior point in time.

Valid values: Y or N. Default is Y.

MAX_RETENTION_EXPIRY_DAYS

Maximum number of days to hold retention array items in memory. Retention array items define when CDCT records expire and indicate the log file names and registration tags referenced by the CDCT records.

Tip: If you have a large volume of CDCT records, you can usually avoid memory shortages by setting the LOGGER_DELETES_EXPIRED_CDCT_RECORDS parameter to N and running the PWXUCDCT utility DELETE_EXPIRED_CDCT command on a regular, scheduled basis. Use the MAX_RETENTION_EXPIRY_DAYS parameter only in situations with extreme memory limitations.

Valid values: A number from 1 through 999. Default is 999.

NO_DATA_WAIT

If you run the PowerExchange Logger in continuous mode, specify the number of minutes that the PowerExchange Logger must wait before starting the next logging cycle. A value of 0 causes no waiting to occur between PowerExchange Logger processing cycles. If source data is not available, the CAPI sleeps.

For continuous extraction mode, this value should be low so that the next logging cycle starts shortly after the current one completes.

If the value of FILE_SWITCH_CRIT is M and the value of FILE_SWITCH_VAL is less than the value of NO_DATA_WAIT, the PowerExchange Logger uses the FILE_SWITCH_VAL value instead.

Valid values: 0 or greater. Default is 60.

NO_DATA_WAIT2

Number of seconds that PowerExchange waits at the end-of-log for more change data before returning control to the PowerExchange Logger. If this wait period elapses and new change data has not been received, PowerExchange returns control to the PowerExchange Logger, and the PowerExchange Logger then stops the current logging cycle.

The recommended value is 2. If you enter a higher value, execution of commands for the PowerExchange Logger might be delayed.

Valid values: Any number greater than 0. Recommended value is 2. Default is 600.

PROMPT

When you run the PowerExchange Logger in foreground mode, controls whether PowerExchange displays a user confirmation prompt and waits for a response when you perform one of the following actions:

- Cold start the PowerExchange Logger.
- Warm start the PowerExchange Logger from a previous position in the change stream. This situation occurs only if checkpoint files that were more recent than the current ones were deleted, and the CDCT file still contains records related to the deleted files.

Options are:

- Y. Displays the confirmation message PWX-33236 for a cold start or PWX-33242 for a warm start. You must respond to the message for startup processing to continue.
- N. Does not display the confirmation messages. PowerExchange attempts to start without first prompting for user confirmation.

If you run the PowerExchange Logger in foreground mode, the default is Y.

If you run the PowerExchange Logger in background mode or as a PowerExchange Logger Service in the Informatica domain, the default is N. In this case, if you enter PROMPT=Y in the pwxccl.cfg file, the PowerExchange Logger ignores this setting, issues error message PWX-33253, and continues processing.

Valid values: Y or N. Default is Y for a PowerExchange Logger that runs in foreground mode. Default is N for a PowerExchange Logger process that runs in background mode or as a PowerExchange Logger Service in the Informatica domain.

PWD

A deprecated parameter. Use CAPTURE_NODE_PWD instead. If both CAPTURE_NODE_PWD and PWD are specified, CAPTURE_NODE_PWD takes precedence.

RESTART_TOKEN and SEQUENCE_TOKEN

Parameters that define a restart point for starting change data processing when a PowerExchange Logger is cold started. A restart point is defined by both a restart token and a sequence token.

Depending on how you set these parameters, PowerExchange Logger processing starts from one of the following restart points during a cold start:

- If you do not specify these parameters, processing starts from the current end-of-log position.
- If you enter 0 for both parameters, processing starts from the default start location:
  - For DB2, the default location is the current log position at the time the PowerExchange capture catalog was created.
  - For Oracle, the default location is the most current Oracle catalog dump.
  - For Microsoft SQL Server, the default location is the oldest data available in the publication database.
- If you enter restart token and sequence token values other than 0, processing resumes from the specific restart point defined by these token values.

Valid values:

- Specific restart and sequence token values.
- 0
- Not specified.

If you use CDC offload processing with the PowerExchange Logger to capture change data from z/OS or i5/OS data sources, see the PowerExchange Condense chapter in the PowerExchange CDC Guide for i5/OS and PowerExchange CDC Guide for z/OS for information about what to enter for these parameters.

SIGNALLING

Indicates whether the PowerExchange Logger attempts to take automatic action in the event of certain errors.

Options are:

- N. The PowerExchange Logger does not automatically trap and handle system errors. Instead, the operating system uses default error handling. Usually, the default handling is to report the program line in error and dump memory.
- Y. The PowerExchange Logger automatically handles certain errors such as memory corruption. After the PowerExchange Logger handles the error, it attempts to shut down in a controlled manner.

Valid values: Y or N. Default is N.

UID

A deprecated parameter. Use CAPTURE_NODE_UID instead. If both CAPTURE_NODE_UID and UID are specified, CAPTURE_NODE_UID takes precedence.

VERBOSE

Indicates whether the PowerExchange Logger writes verbose or terse messages to the PowerExchange message log file for activities that it performs frequently, such as cleanup, checkpoint, condense, and file-switch processing.

Options are:

- N. The PowerExchange Logger logs a single terse message for each file switch and checkpoint.
- Y. The PowerExchange Logger logs multiple messages at various processing points, such as when starting or ending a cycle of reading source data or doing a file switch. Verbose messaging often includes processing statistics such as records processed and elapsed time.

Valid values:

- Y for verbose messaging
- N for terse messaging

Default is Y.
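As a worked reading of the CHKPT_BASENAME and CHKPT_NUM descriptions, the following sketch lists the checkpoint file names that the stated naming rule would produce. It uses the example base name from the CHKPT_BASENAME description and the default CHKPT_NUM value. The file list is derived from that rule and its example, not taken from a running system.

```
/* pwxccl.cfg settings
CHKPT_BASENAME=/capture/logger.chkpt
CHKPT_NUM=3

/* Resulting checkpoint files (Vn, where n runs from 0 to CHKPT_NUM - 1)
/*   /capture/logger.chkptV0.ckp
/*   /capture/logger.chkptV1.ckp
/*   /capture/logger.chkptV2.ckp
```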

RELATED TOPICS:

¨ “PowerExchange Logger Operational Modes” on page 25

¨ “Configuring PowerExchange to Capture Change Data on a Remote System” on page 162
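For CDC offload processing, the CAPTURE_NODE parameters described in this topic might be combined as in the following sketch. The node name, host name, and port in the NODE statement are hypothetical, and the NODE statement form shown is an assumption that you should check against the dbmover.cfg reference documentation. The user ID and password values are the placeholders from the example pwxccl.cfg file.

```
/* dbmover.cfg on the local machine where the PowerExchange Logger runs
/* (node name, host, and port are hypothetical)
NODE=(zos1,TCPIP,zos1.example.com,2480)

/* pwxccl.cfg: read capture registrations and change data from the remote node
CAPTURE_NODE=zos1
CAPTURE_NODE_UID=user_id
CAPTURE_NODE_EPWD=encrypted_password
```

Only one of CAPTURE_NODE_EPWD and CAPTURE_NODE_PWD is specified, as the parameter descriptions require.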

Example pwxccl.cfg

PowerExchange provides an example pwxccl.cfg file in the PowerExchange installation directory, which you can customize.

The example file contains the following statements:

/* Name for PWXCMD control
/*CONDENSENAME=PWXCCL1

DBID=ORACOLL1
DB_TYPE=ORA
CAPTURE_NODE_UID=user_id
CAPTURE_NODE_EPWD=encrypted_password
/* CAPTURE_NODE_PWD=plain_text_password

PROMPT=Y

EXT_CAPT_MASK=/capture/condenseO
CHKPT_NUM=3
CHKPT_BASENAME=/capture/condenseO.chkpt
COND_CDCT_RET_P=50
LOGGER_DELETES_EXPIRED_CDCT_RECORDS=Y

/* 0 = continuous, 1 = Stop at end-of-log (batch)
COLL_END_LOG=0

/* Number of minutes to wait between CAPI read cycles
NO_DATA_WAIT=0
/* Number of seconds to wait at the end-of-log for more change data
NO_DATA_WAIT2=60

/* Number of seconds before flushing, or writing, data to the current log file on disk
/* -1 = No flush, 0 = flush every record, 1 to N flush every N seconds
/*FILE_FLUSH_VAL=60
/* Minimum number of FILE_SWITCH_CRIT units after new CDCT source entry (normal,coldstart)
/*FILE_SWITCH_MIN=(0,0)

FILE_SWITCH_CRIT=M
FILE_SWITCH_VAL=20

CAPT_IMAGE=BA
SEQUENCE_TOKEN=00
RESTART_TOKEN=00

Note: If you enter values in the EXT_CAPT_MASK and CHKPT_BASENAME parameters that include spaces, you must enclose the values in double quotation marks.
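As a rough illustration of how the statements in the example file are laid out, the following Python sketch reads KEY=VALUE statements and skips comment lines. It is not part of PowerExchange. It assumes, based only on the example above, that a line beginning with /* is a comment and that each statement occupies one line. The function name parse_pwxccl_cfg and the path "/capture files/condenseO" are hypothetical and exist only for this illustration.

```python
def parse_pwxccl_cfg(text):
    """Collect KEY=VALUE statements from pwxccl.cfg-style text.

    Assumptions (taken from the example file, not from a formal grammar):
    lines that start with /* are comments, and each statement is on one line.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("/*"):
            continue  # blank line or comment, including commented-out statements
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE statement: {line!r}")
        value = value.strip()
        # Values that include spaces are enclosed in double quotation marks.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        params[key.strip().upper()] = value
    return params


# Shortened sample; the quoted path is a made-up value that contains a space.
SAMPLE = """/* Name for PWXCMD control
/*CONDENSENAME=PWXCCL1
DBID=ORACOLL1
DB_TYPE=ORA
EXT_CAPT_MASK="/capture files/condenseO"
CHKPT_NUM=3
FILE_SWITCH_CRIT=M
FILE_SWITCH_VAL=20
"""

cfg = parse_pwxccl_cfg(SAMPLE)
print(cfg["DBID"], cfg["FILE_SWITCH_CRIT"], cfg["FILE_SWITCH_VAL"])
```

A check like this can be handy for confirming which statements are active and which are commented out before you start the PowerExchange Logger.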

Customizing dbmover.cfg for the PowerExchange Logger

To use the PowerExchange Logger, you must define the CAPT_PATH statement and certain source-specific statements in the dbmover.cfg file.

Also, you can include some optional parameters to make it easier to find messages for the PowerExchange Logger or to send commands to a PowerExchange Logger process that is running in background mode.

Use the following key parameters:

CAPT_PATH

Required. Specifies the path to the directory where the CCT and CDCT files reside. The CCT file contains information about capture registrations. The CDCT file contains information about the PowerExchange Logger log files, such as file names and number of records.

LOGPATH

Optional. Specifies a unique path to the PowerExchange message log files. Use this parameter to create message log files in a directory that is separate from your current working directory so that you can find the message log files more easily.

SVCNODE

Optional. Specifies the TCP/IP port on which a command-handling service for a PowerExchange Logger process listens for commands that you issue with the pwxcmd program. You must define this parameter if you run the PowerExchange Logger process in background mode on a Linux or UNIX system. For more information about pwxcmd commands, see the PowerExchange Command Reference.

TRACING

Optional. Enables alternative logging. PowerExchange creates a set of alternative log files for each PowerExchange process in a separate directory. You can specify the directory location, the number of log files, and the log file size in MB. When a log file reaches the specified size, PowerExchange switches to the next log file and begins overwriting any data in that file. Alternative logging is faster and enables you to customize the amount of data logged for long-running jobs, such as a PowerExchange Logger process that runs in continuous mode. If you specify this statement, also specify the LOGPATH statement.

In addition to these parameters, the PowerExchange Logger requires source-specific statements, such as the ORCL CAPI_CONNECTION, UOWC CAPI_CONNECTION, and ORACLEID statements for Oracle.

For more information about all DBMOVER configuration parameters, see the PowerExchange Reference Manual.

RELATED TOPICS:
- “DB2 for Linux, UNIX, and Windows Change Data Capture” on page 56
- “Microsoft SQL Server Change Data Capture” on page 70
- “Oracle Change Data Capture with Oracle LogMiner” on page 80


Using PowerExchange Logger Group Definitions

To create separate sets of PowerExchange Logger log files for groups of tables, create a PowerExchange Logger group definition file. Then, specify its path and file name in the GROUPDEFS parameter of the pwxccl.cfg file.

When the PowerExchange Logger process starts, it reads the group definition file and creates a separate set of log files for each defined group.

Group definitions can help improve the efficiency of extraction sessions because the extractions target a more specific set of PowerExchange Logger log files.

By default, the PowerExchange Logger processes change data for all tables that reside on the instance specified by the DBID parameter and that have active capture registrations with the Condense option set to Part. Changes for all of these tables are written to a single set of log files (not taking into account file switching). For a table with a low level of change activity, an extraction process might need to read many change records in the log files before finding the changes of interest.

With group definitions, you can define a group that includes a subset of capture registrations. The PowerExchange Logger then writes change data to a separate set of log files for the tables that are associated with these registrations. When an extraction process runs, it is likely to find the change data for a table in the group faster because it reads only the log files for that group.

For example, if you have five source tables with a low level of change activity and one table with a high level of change activity, you can define a group that includes the low-activity tables and another group that includes only the high-activity table. Then, in PowerCenter, define a CDC session that extracts change data from the PowerExchange Logger log files for the low-activity group, and define another CDC session that extracts change data from the log files for the high-activity group. This configuration enables the CDC session for the low-activity tables to find and extract the few change records for these tables much more quickly.

If you have multiple tables with the same table name but different schemas, you can define a single capture registration for the table and specify it once, under a single group, in the group definition file. For any other group that includes the same table with a different schema, you can override the schema name in the group definition by using a SCHEMA statement. By using the SCHEMA statement, you can avoid creating multiple capture registrations and specifying each one in the group definition file. For example, if you have an EMPLOYEE table with different schemas for the north, south, east, and west regions, you can register the north EMPLOYEE table only and specify the capture registration name in the NORTH group. Then specify only the override schemas in the EAST, WEST, and SOUTH groups.

Note: SCHEMA statements are optional for DB2 for i5/OS sources and for DB2 and Oracle sources on Linux, UNIX, and Windows. SCHEMA statements are not supported for SQL Server sources on Windows or any data source on z/OS.

On Linux, UNIX, and Windows, PowerExchange requirements for unregistered versions of tables, for which a REG statement is not specified, vary by source type:

- For DB2 for Linux, UNIX, and Windows, you must define any unregistered version of a table with the DATA CAPTURE CHANGES clause.

- For Oracle, you must create an Oracle supplemental log group for the unregistered table, which is similar to the supplemental log group that was created for the registered copy of the table at registration completion.

- For Microsoft SQL Server, you must register all versions of a table in PowerExchange and specify a REG statement in the group definition file.

Tip: When using group definitions, you can optimize extraction efficiency by defining a CDC session in PowerCenter for each group of tables defined in the group definition file.

RELATED TOPICS:
- “Customizing the PowerExchange Logger Configuration File” on page 28
- “PowerExchange Logger Group Definition File” on page 45


PowerExchange Logger Group Definition File

A PowerExchange Logger group definition file contains one or more GROUP statements. Each GROUP statement contains REG or SCHEMA parameters that directly or indirectly identify a group of capture registrations and tables for which you want to create separate sets of PowerExchange Logger log files.

For the PowerExchange Logger to use the group definition file, you must specify the path and file name of the file in the GROUPDEFS parameter of the pwxccl.cfg file.

Note: If you specify the GROUPDEFS parameter, the PowerExchange Logger ignores the EXT_CAPT_MASK parameter in the pwxccl.cfg file when creating log files.

The following table describes the statements and parameters in the group definition file:

Statement | Positional Parameter | Description | Data Type and Length

GROUP | group_name | A unique user-defined name for the group. This parameter is required. | VARCHAR(255)

GROUP | external_capture_mask | A unique path and file-name prefix for the PowerExchange Logger log files that are created for tables in the group. This parameter is required. Note: This path and prefix is used for the group instead of any path and prefix that are specified in the EXT_CAPT_MASK parameter of the pwxccl.cfg file. | VARCHAR(255)

REG | registration_name | Optional. Registration name that is specified in the Name field of a capture registration. This lowercase name can be the full registration name or the first part of the name followed by an asterisk (*) wildcard. If omitted, the PowerExchange Logger assumes REG=*. | VARCHAR(8)

SCHEMA | schema_name | Optional. Name of the override schema. You can optionally use this parameter for DB2 for i5/OS sources and for DB2 and Oracle sources on Linux, UNIX, and Windows. Note: This parameter is not supported for SQL Server sources on Windows. If you use the offloading feature to have the PowerExchange Logger process data from z/OS sources, this parameter is also not supported for the z/OS sources. | VARCHAR(255)

Use the following rules and guidelines when you create a PowerExchange Logger group definition file:

- Each group_name must be unique within the group definition file.

- Each external_capture_mask must be unique on the system.

- SCHEMA statements are optional for DB2 for i5/OS sources and for DB2 and Oracle sources on Linux, UNIX, and Windows. SCHEMA statements are not supported for SQL Server sources on Windows or any data source on z/OS.

- If you use a SCHEMA statement, you must define a capture registration in the group. You can specify multiple SCHEMA statements under a GROUP if you want the tables with those schemas to be included in the group.

- REG statements apply to the preceding SCHEMA statement. If a SCHEMA statement is not present, the REG statements apply to the preceding GROUP statement.

- If the file contains a SCHEMA or REG statement without a preceding GROUP statement, the PowerExchange Logger issues a syntax error.

- Do not include the same schema.table value in more than one group. If a table is included in multiple groups, only the first group that includes the table logs changes for it.

- If you do not define at least one REG statement for a GROUP, the PowerExchange Logger includes all of the active capture registrations that are defined for the specified DBID instance and for which the Condense option is set to Part.

- If a registration belongs to multiple groups, the PowerExchange Logger logs changes for that registration only under the first group in the group definition file that includes the registration.
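The structural rules above can be sketched as a small validator. The following Python code is an illustration only, not the PowerExchange Logger's parser. It assumes one statement per line and the GROUP=(group_name,"external_capture_mask") form. The function name parse_group_defs and the returned dictionary layout are hypothetical. It checks that group names and capture masks are unique within the file, rejects a SCHEMA or REG statement that has no preceding GROUP, and attaches each REG to the preceding SCHEMA or, if none is present, to the GROUP.

```python
import re

# Assumed statement form: GROUP=(group_name,"external_capture_mask")
GROUP_RE = re.compile(r'^GROUP=\(\s*([^,\s]+)\s*,\s*"?([^")]+)"?\s*\)$')


def parse_group_defs(text):
    """Return groups in file order; each maps schema (or None) to REG patterns."""
    groups = []
    seen_names, seen_masks = set(), set()
    current_schema = None  # None means "no SCHEMA statement yet in this GROUP"
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line:
            continue
        match = GROUP_RE.match(line)
        if match:
            name, mask = match.groups()
            if name in seen_names:
                raise ValueError(f"line {lineno}: duplicate group_name {name}")
            if mask in seen_masks:
                raise ValueError(f"line {lineno}: duplicate external_capture_mask {mask}")
            seen_names.add(name)
            seen_masks.add(mask)
            groups.append({"name": name, "mask": mask, "schemas": {None: []}})
            current_schema = None
            continue
        key, sep, value = line.partition("=")
        if key not in ("SCHEMA", "REG") or not sep or not value:
            raise ValueError(f"line {lineno}: unrecognized statement {line!r}")
        if not groups:
            raise ValueError(f"line {lineno}: {key} statement without a preceding GROUP")
        if key == "SCHEMA":
            current_schema = value
            groups[-1]["schemas"].setdefault(value, [])
        else:
            # REG applies to the preceding SCHEMA, else to the GROUP itself.
            groups[-1]["schemas"][current_schema].append(value)
    return groups


SAMPLE = """GROUP=(Company1People,"/user/logger_files/people/company1/condense")
REG=Emp*
REG=Manager
GROUP=(UK_People,"/user/logger_files/people/UK/condense")
SCHEMA=Company2
REG=Manager
REG=Emp*
"""

for group in parse_group_defs(SAMPLE):
    print(group["name"], group["schemas"])
```

An empty REG list in the result corresponds to the case in which the PowerExchange Logger assumes REG=*.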

Example Group Definition File

PowerExchange provides an example group definition file, pwxcclgrp.cfg, in the PowerExchange installation directory. Use this example as a starting point when creating your group definition file.

The example file contains the following statements:

GROUP=(Company1People,"/user/logger_files/people/company1/condense")
REG=Emp*
REG=Manager
GROUP=(UK_People,"/user/logger_files/people/UK/condense")
SCHEMA=Company2
REG=Manager
REG=Emp*
REG=Em*
SCHEMA=Company3
REG=Manager
REG=Emp*
GROUP=(All_Managers,"/user/logger_files/people/managers/condense")
SCHEMA=Company1
REG=Manager
SCHEMA=Company2
REG=Manager
SCHEMA=Company3
REG=Manager
GROUP=(AllCompany3_Locations,"/user/logger_files/locations/company3/condense")
REG=loc*
GROUP=(Company2Jobs,"/user/logger_files/jobs/company2/condense")
REG=Job*

Note: Because this example is for a group definition file on a Linux or UNIX system, the paths include forward slashes. A group definition file on a Windows system would be similar but have backslashes.

This example file defines the following groups:

- Company1People group. Groups all tables associated with capture registrations that have names beginning with “Emp” or the name “Manager.” Changes for these tables are logged to log files that have file names beginning with “condense” and that are located at “/user/logger_files/people/company1/.”

- UK_People group. Groups tables that have the schema Company2 and that are associated with capture registrations that have names beginning with “Emp” or “Em” or the name “Manager.” Changes for these tables are logged to log files that have names beginning with “condense” and that are located at “/user/logger_files/people/UK/.”

- All_Managers group. Groups tables that have the schema Company1, Company2, or Company3 and that are associated with the capture registration with the name “Manager.” Changes for these tables are logged to log files that have names beginning with “condense” and that are located at “/user/logger_files/people/managers/.”

- AllCompany3_Locations group. Groups all tables that are associated with capture registrations that have names beginning with “loc.” Changes for these tables are logged to log files that have names beginning with “condense” and that are located at “/user/logger_files/locations/company3/.”

- Company2Jobs group. Groups all tables that are associated with capture registrations that have names beginning with “Job.” Changes for these tables are logged to log files that have names beginning with “condense” and that are located at “/user/logger_files/jobs/company2/.”

Some tables might be included in more than one group. For example, the table COMPANY2.MANAGERS is in the Company1People, UK_People, and All_Managers groups. However, changes for this table are logged only under the Company1People group because it is the first group in the file that includes this table.
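The first-group-wins behavior can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it looks only at registration names and ignores SCHEMA overrides, it treats a trailing asterisk as a prefix wildcard, and it compares names without regard to case. The names reg_matches, first_matching_group, and GROUPS are hypothetical, as are the sample registration names.

```python
def reg_matches(pattern, registration):
    """Match a REG value: a full name, or a leading part followed by *."""
    pattern, registration = pattern.lower(), registration.lower()
    if pattern.endswith("*"):
        return registration.startswith(pattern[:-1])
    return registration == pattern


def first_matching_group(groups, registration):
    """Return the name of the first group, in file order, that includes the registration."""
    for group_name, patterns in groups:
        # A group without REG statements is treated as REG=*.
        if any(reg_matches(p, registration) for p in (patterns or ["*"])):
            return group_name
    return None


# REG values from the example file, in file order, with SCHEMA statements left out.
GROUPS = [
    ("Company1People", ["Emp*", "Manager"]),
    ("UK_People", ["Manager", "Emp*", "Em*"]),
    ("All_Managers", ["Manager"]),
    ("AllCompany3_Locations", ["loc*"]),
    ("Company2Jobs", ["Job*"]),
]

print(first_matching_group(GROUPS, "manager"))   # Company1People
print(first_matching_group(GROUPS, "location"))  # AllCompany3_Locations
```

Because the search stops at the first match, the order of GROUP statements in the file determines which set of log files receives the changes.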

Starting the PowerExchange Logger

You can cold start or warm start the PowerExchange Logger process.

- A cold start uses the restart and sequence tokens, if present, in the pwxccl.cfg configuration file to determine the point in the change stream from which the PowerExchange Logger starts reading changes. If you are starting the PowerExchange Logger for the first time, you must perform a cold start.

- A warm start uses the restart and sequence tokens in the last checkpoint file to resume CDC processing. You can perform a warm start only if you have run the PowerExchange Logger previously and have recent checkpoint files.

You cannot use the pwxcmd program to start the PowerExchange Logger.

PWXCCL Syntax and Parameters

To start the PowerExchange Logger process, run the pwxccl program, which is located in the PowerExchange installation directory by default.

PWXCCL Syntax

The pwxccl statement has the following syntax:

pwxccl [coldstart={Y|N}] [config=path/pwx_config_file] [cs=path/pwxlogger_config_file] [license=path/license_file]

Use the following rules and guidelines when you enter the pwxccl statement:

- To cold start the PowerExchange Logger, you must set the coldstart parameter to Y. The default is N.

- All parameters are optional. However, if you specify the config or license parameter, the cs parameter is required.

- In the config, cs, and license parameters, the full path is required only if the file is not in the default location.

- On Linux and UNIX, append an ampersand (&) at the end of the statement to run the PowerExchange Logger in background mode.

For more information about pwxccl syntax, see the PowerExchange Command Reference.
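As a sketch of the syntax rules above, the following Python helper assembles the arguments for a pwxccl statement and enforces the rule that cs is required whenever config or license is specified. The helper name build_pwxccl_args and the file paths in the usage lines are hypothetical. The helper only builds the argument list and does not start the PowerExchange Logger.

```python
def build_pwxccl_args(coldstart=False, config=None, cs=None, license_file=None):
    """Build the argument list for a pwxccl statement from optional parameters."""
    if (config or license_file) and not cs:
        raise ValueError("cs is required when config or license is specified")
    args = ["pwxccl"]
    if coldstart:
        args.append("coldstart=Y")  # the default, N, performs a warm start
    if config:
        args.append(f"config={config}")
    if cs:
        args.append(f"cs={cs}")
    if license_file:
        args.append(f"license={license_file}")
    return args


# Hypothetical override paths, shown only to illustrate the statement layout.
print(" ".join(build_pwxccl_args(coldstart=True)))
print(" ".join(build_pwxccl_args(config="/pwx/cfg/dbmover_ora.cfg",
                                 cs="/pwx/cfg/pwxccl_ora.cfg")))
```

On Linux and UNIX, the ampersand for background mode is interpreted by the shell, so it is left out of the argument list here.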

PWXCCL Parameters

You can specify several optional parameters in the pwxccl statement.


The following table describes each parameter:

Parameter | Description

coldstart | Indicates whether to cold start or warm start the PowerExchange Logger. Enter one of the following values:
- Y. Cold starts the PowerExchange Logger. You must specify COLDSTART=Y to perform a cold start. The absence of checkpoint files does not trigger a cold start. If you specify Y and checkpoint files exist, the PowerExchange Logger ignores the files. If the CDCT file contains records, the PowerExchange Logger deletes these records.
- N. Warm starts the PowerExchange Logger from the restart point that is indicated in the last checkpoint file. If no checkpoint file exists in the CHKPT_BASENAME directory, the PowerExchange Logger ends with error message PWX-33227.
Default is N.

config | Full path and file name for a DBMOVER configuration file that overrides the default dbmover.cfg file in the installation directory. The override file must have a path or file name that is different from that of the default file. This override file takes precedence over any other override configuration file that you optionally specify with the PWX_CONFIG environment variable.

cs | Full path and file name of the PowerExchange Logger configuration file. Use this parameter to specify a PowerExchange Logger configuration file that overrides the default pwxccl.cfg in the installation directory. The override file must have a path or file name that is different from that of the default file.

license | Full path and file name for a license key file that overrides the default license.key file in the installation directory. The override file must have a file name or path that is different from that of the default file. This override file takes precedence over any other override license key file that you optionally specify with the PWX_LICENSE environment variable.

Note: In these parameters, the full path is required only if the file is not in the default location.

How the PowerExchange Logger Determines the Start Point for a Cold Start

When you cold start a PowerExchange Logger process, it uses the RESTART_TOKEN and SEQUENCE_TOKEN parameters, if present, in the pwxccl.cfg configuration file to determine the point in the change stream at which to start reading changes.

Based on how you set these parameters, the PowerExchange Logger starts from one of the following points in the change stream:

- If you do not define the RESTART_TOKEN and SEQUENCE_TOKEN parameters, the PowerExchange Logger starts from the current end-of-log (EOL), or current point in time in the change stream.

Tip: You can generate restart and sequence tokens for the current EOL by running the DTLUAPPL utility with the RSTTKN GENERATE parameter or by performing a database row test with the SELECT CURRENT_RESTART SQL statement in PowerExchange Navigator.

- If you enter only zeroes (a single 0, or an even number of 0s) in the RESTART_TOKEN and SEQUENCE_TOKEN parameters, the PowerExchange Logger starts from the oldest available change record in the change stream.

- If you enter valid restart information in the RESTART_TOKEN and SEQUENCE_TOKEN parameters, the PowerExchange Logger starts from the point in the change stream that the token values identify. Use this method to start the PowerExchange Logger from a specific point.
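The three cases above can be written as a small decision function. This Python sketch only classifies the settings; it does not validate real token values, and anything that is not all zeroes is simply treated as restart information. The names is_all_zero_token and cold_start_point are hypothetical, and raising an error when only one of the two parameters is defined is an assumption, because the text above does not describe that case.

```python
def is_all_zero_token(token):
    """True for a single 0 or an even number of 0s."""
    return set(token) == {"0"} and (len(token) == 1 or len(token) % 2 == 0)


def cold_start_point(restart_token=None, sequence_token=None):
    """Classify where a cold start begins, based on the two token parameters."""
    if restart_token is None and sequence_token is None:
        return "current end-of-log"
    if restart_token is None or sequence_token is None:
        # Assumption: this case is not covered by the description above.
        raise ValueError("define both RESTART_TOKEN and SEQUENCE_TOKEN, or neither")
    if is_all_zero_token(restart_token) and is_all_zero_token(sequence_token):
        return "oldest available change record"
    return "point identified by the token values"


print(cold_start_point())            # no tokens defined
print(cold_start_point("00", "00"))  # settings used in the example pwxccl.cfg
```

With the example pwxccl.cfg shown earlier, which sets both tokens to 00, a cold start would therefore begin at the oldest available change record.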


Cold Starting the PowerExchange Logger

Use this procedure to cold start the PowerExchange Logger. In the start statement, you must include COLDSTART=Y.

During a cold start, the PowerExchange Logger ignores any checkpoint files that exist in the directory that is specified by the CHKPT_BASENAME parameter in the pwxccl.cfg file. If the CDCT file contains records, the PowerExchange Logger deletes these records.

To cold start the PowerExchange Logger:

1. If you previously ran the PowerExchange Logger and have existing checkpoint, CDCT, and log files, retain these files for historical purposes.

You can move or rename the files, as long as another PowerExchange Logger process is not using them. Do not delete them if you want to retain change processing history.

Warning: If you delete, move, or rename the CCT file, the capture registrations will not be available.

2. In the pwxccl.cfg configuration file, set the RESTART_TOKEN and SEQUENCE_TOKEN parameters in a manner that causes the PowerExchange Logger to start from the appropriate point in the change stream.

3. To cold start the PowerExchange Logger, enter the following statement at the command line:

pwxccl coldstart=y

Include the optional config, cs, and license parameters if you want to override the default dbmover.cfg, pwxccl.cfg, and license.key files. On Linux and UNIX systems, you can add an ampersand (&) at the end of the statement to run the PowerExchange Logger in background mode. For more information about PowerExchange Logger syntax, see the PowerExchange Command Reference.

RELATED TOPICS:
- “How the PowerExchange Logger Determines the Start Point for a Cold Start” on page 48
- “PWXCCL Parameters” on page 47

Managing the PowerExchange Logger

To assess the status of the PowerExchange Logger for Linux, UNIX, and Windows, you can display messages about PowerExchange Logger processing, memory use, and CPU use.

Occasionally, you might need to stop the PowerExchange Logger.

Commands for Controlling and Stopping PowerExchange Logger Processing

Use PowerExchange Logger for Linux, UNIX, and Windows commands to manually initiate a file switch or another logging cycle, stop the PowerExchange Logger, or display messages about PowerExchange Logger processing and system resource use.

You can enter these commands from the command line or by using the pwxcmd program. The output is displayed on screen and written to the PowerExchange message log.

Note: To use pwxcmd, you must specify the CONDENSENAME parameter in the pwxccl.cfg file and the SVCNODE statement in the dbmover.cfg file.


The following table describes each command:

Command-line Command | pwxcmd Command | Description

CONDENSE | condense | When the PowerExchange Logger is running in continuous mode, manually starts a new PowerExchange Logger logging cycle before the wait period for starting another cycle has elapsed. The wait period is defined by the NO_DATA_WAIT parameter in pwxccl.cfg.

DISPLAY ALL | displayall | Displays all messages that can be produced by the other PowerExchange Logger DISPLAY commands, arranged by command.

DISPLAY CHECKPOINTS | displaycheckpoints | Displays message PWX-26041, which reports information about the latest checkpoint file. The information includes the file sequence number, timestamp, number of data records and commit records, and commit time.

DISPLAY CPU | displaycpu | Displays the CPU time spent, in microseconds, for PowerExchange Logger processing during the current logging cycle, by processing phase. Also includes the total CPU time for all PowerExchange Logger processing. Processing phases include reading source data, writing data to log files, performing file switches, and performing “other” processing such as initialization.

DISPLAY EVENTS | displayevents | Displays events that the PowerExchange Logger Controller, Command Handler, and Writer tasks are waiting on. Also indicates if the Writer is processing data or is in a sleep state waiting for an event or timeout to occur.

DISPLAY MEMORY | displaymemory | Displays PowerExchange Logger memory use, in bytes, for each PowerExchange Logger task and subtask, with totals for the entire PowerExchange Logger process. Memory use is reported for the following categories: Application, Total, and Maximum.

DISPLAY RECORDS | displayrecords | Displays counts of change records that the PowerExchange Logger processed during the current processing cycle. If the PowerExchange Logger did not receive changes during the current cycle, displays counts of change records for the current set of PowerExchange Logger log files. Record counts are shown by record type. Record types are Delete, Insert, Update, Commit, and Total.

DISPLAY STATUS | displaystatus | Displays the status of the PowerExchange Logger Writer subtask, for example, initializing, writing source data to a PowerExchange Logger log file, or starting a checkpoint.

FILESWITCH | fileswitch | Closes open PowerExchange Logger log files if they contain data and then switches to a new set of log files. If the log files do not contain data, the file switch does not occur. If you use batch extraction mode, you can use this command to make change data in the current log files available for extraction processing before the next file switch is due to occur. To issue the fileswitch command from a script or batch file, you must use the pwxcmd program. Usually, you do not need to perform manual file switches if you use continuous extraction mode.

SHUTCOND | shutcond | Stops the PowerExchange Logger in a controlled manner after initiating and completing a final logging cycle. The final logging cycle enables the PowerExchange Logger to capture all of the changes up to the point when the command is issued. After the logging cycle completes, the PowerExchange Logger closes open log files, updates the CDCT file, takes a final checkpoint to record the latest restart and sequence tokens, closes the CAPI, stops the Writer and Command Handler subtasks, and then ends the pwxccl program. Use this command if a logging cycle has not run recently.

SHUTDOWN | shutdown | Stops the PowerExchange Logger in a controlled manner after closing any open PowerExchange Logger log files and writing the latest restart position to the checkpoint files. During shutdown processing, the PowerExchange Logger closes open log files, updates the CDCT file, takes a final checkpoint, closes the CAPI, stops the Writer and Command Handler subtasks, and then ends the pwxccl program. Use this command to stop a PowerExchange Logger process that is running in continuous mode.

For more information about command syntax, example output, and pwxcmd use, see the PowerExchange Command Reference.
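In the table above, each pwxcmd command name is the command-line command written in lowercase with the spaces removed. The following Python sketch captures that naming pattern. The function name to_pwxcmd_name is hypothetical, and the sketch covers only the command name, not the other arguments that a pwxcmd invocation takes.

```python
def to_pwxcmd_name(command_line_command):
    """Derive the pwxcmd command name from a command-line command name."""
    return "".join(command_line_command.split()).lower()


for name in ("CONDENSE", "DISPLAY ALL", "DISPLAY CHECKPOINTS", "FILESWITCH", "SHUTCOND"):
    print(name, "->", to_pwxcmd_name(name))
```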

Assessing PowerExchange Logger Performance

To assess PowerExchange Logger performance, you can view key PowerExchange Logger messages that report CPU use and elapsed times for processing.

Enter VERBOSE=Y in the pwxccl.cfg configuration file to have the PowerExchange Logger produce more detailed messages during initialization, condense, file switch, record expiration, and shutdown processing. For example, the following verbose messages indicate CPU use by the Writer subtask:

- Message PWX-33274 is issued before the Writer subtask starts reading source data after initialization and before the PowerExchange Logger shuts down:

PWX-33274 CPU Total number. CAPI Read number. Writing number. File switching number. Other number

- Message PWX-33279 is issued after each file switch and checkpoint:

PWX-33279 CPU total number. This file total number. CAPI Reads number. Writing file number. Other number


If you do not use verbose messaging, you can use the DISPLAY CPU and DISPLAY RECORDS commands to gather statistics that are useful for assessing PowerExchange Logger performance and status.

- The DISPLAY CPU command displays the CPU time spent, in microseconds, for PowerExchange Logger processing during the current logging cycle, by processing phase and with the total for all processing. Processing phases include:
  - Reading source data
  - Writing data to PowerExchange Logger log files
  - Performing file switches
  - Performing "other processing," such as initialization and Command Handler processing of commands

- The DISPLAY RECORDS command displays counts of change records that the PowerExchange Logger processed during the current processing cycle. If the PowerExchange Logger did not receive changes during the current cycle, the command displays counts of change records for the current PowerExchange Logger log files. Record counts are shown for each type of change record processed and for total records processed. Change record types include Delete, Insert, Update, and Commit.

For more information about these commands, including example output, see the PowerExchange Command Reference.

Maintaining the PowerExchange Logger CDCT File and Log Files

You can use the PWXUCDCT utility to maintain the PowerExchange Logger CDCT file and log files.

Use the following utility commands to perform maintenance tasks:

Command | Description

CREATE_CDCT_BACKUP | Back up all CDCT records for the source instance that is specified in the DBID parameter of the pwxccl.cfg configuration file.

DELETE_EXPIRED_CDCT | Delete CDCT records for which the retention period has expired and any PowerExchange Logger log files that are referenced by those records. Use this command only if you set the LOGGER_DELETES_EXPIRED_CDCT_RECORDS parameter to N in the pwxccl.cfg file.

DELETE_ORPHAN_FILES | Delete PowerExchange Logger log files that are not referenced by any record in the CDCT file.

DERIVE_CDCT_BACKUP | Create a backup of the CDCT file based on PowerExchange Logger log files, if the original backup file is damaged or deleted.

REPORT_CDCT | List information about the CDCT file and its records. For each CDCT record, the command reports the record number, registration tag name, log file name, number of change records received for the registered table, start and end times, and start and end restart tokens.

REPORT_CDCT_BY_TIME | List CDCT records in the order in which they expire.

REPORT_CONFIG | List the parameter settings in the PowerExchange Logger pwxccl.cfg configuration file.

REPORT_CHECKPOINTS | List checkpoint files in chronological order, from earliest to latest, based on when they were written. For each file, the list provides information such as number of capture registrations processed, reason for the checkpoint, sequence and restart tokens, number of expired CDCT records that were deleted, and number of log files to which change data was written.

REPORT_EXPIRED_CDCT | List PowerExchange Logger log files based on their file names.

REPORT_FILES_BY_TIME | List PowerExchange Logger log files in the order in which they were created, from earliest to latest.

REPORT_ORPHAN_FILES | List PowerExchange Logger log files that are not referenced by any record in the CDCT file.

RESTORE_CDCT | Restore the CDCT file from a backup if the CDCT file is damaged or deleted.

For more information about the PWXUCDCT utility, see the PowerExchange Utilities Guide.

Backing Up PowerExchange Logger Files

Periodically, back up the PowerExchange Logger for Linux, UNIX, and Windows checkpoint files, CDCT file, and log files. If existing files become damaged or deleted, you can then use the backups to restore the files.

If possible, back up the PowerExchange Logger files during a period when source data is not being written to the PowerExchange Logger log files. Back up the files in the following sequence to ensure that you have a checkpoint file that matches the backup:

1. Checkpoint files

2. CDCT file

3. PowerExchange Logger log files

Note: During a file switch, the Writer subtask processes the files in the reverse sequence.

To back up the CDCT file, you can use the PWXUCDCT utility CREATE_CDCT_BACKUP command.
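The backup sequence can be sketched as a short Python routine that copies the files in the stated order: checkpoint files first, then the CDCT file, then the log files. This is an illustration under assumptions, not an Informatica-supplied procedure. The function name backup_logger_files is hypothetical, the caller supplies the file locations, and a plain file copy stands in for whatever backup mechanism you use, such as the CREATE_CDCT_BACKUP command for the CDCT file.

```python
import shutil
from pathlib import Path


def backup_logger_files(checkpoint_files, cdct_file, log_files, dest_dir):
    """Copy files in the order: checkpoint files, CDCT file, log files.

    Returns the backup paths in the order in which they were written.
    Files are copied by base name, so the names must not collide.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in [*checkpoint_files, cdct_file, *log_files]:
        target = dest / Path(src).name
        shutil.copy2(src, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied
```

Run a routine like this during a quiet period, when source data is not being written to the PowerExchange Logger log files, so that the copied checkpoint file matches the rest of the backup.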

Re-creating the CDCT File After a Failure

If the CDCT file and its recent backups are damaged or deleted, you can re-create the CDCT file based on the PowerExchange Logger log files. You must derive a CDCT backup based on the current PowerExchange Logger log files and then restore that backup.

1. Issue the PWXUCDCT utility DERIVE_CDCT_BACKUP command.

For more information about using PWXUCDCT utility commands, see the PowerExchange Utilities Guide.

2. Restore the derived backup by issuing the PWXUCDCT utility RESTORE_CDCT command.

3. Verify that the restore operation was successful as follows:

¨ Verify that the return code from the PWXUCDCT utility is zero.

¨ Verify that messages PWX-25140 through PWX-25145 provide reasonable record counts for the records read from the backup file and for the records that were changed in the CDCT file.
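The verification in step 3 can be partly automated. The following sketch assumes that you have captured the utility return code and its message output as text. The function name is hypothetical, and the sketch only collects the relevant message lines. Judging whether the record counts are reasonable is left to the operator.

```python
import re

# Matches PowerExchange message IDs such as PWX-25140.
MESSAGE_ID = re.compile(r"PWX-(\d{5})")


def restore_summary(return_code, utility_output):
    """Collect what an operator would review after a RESTORE_CDCT run.

    Hypothetical helper: it reports whether the return code is zero and
    returns the output lines that carry messages PWX-25140 through PWX-25145.
    """
    relevant = []
    for line in utility_output.splitlines():
        match = MESSAGE_ID.search(line)
        if match and 25140 <= int(match.group(1)) <= 25145:
            relevant.append(line.strip())
    return {"return_code_ok": return_code == 0, "messages": relevant}
```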


Part III: PowerExchange CDC Data Sources

This part contains the following chapters:

¨ DB2 for Linux, UNIX, and Windows Change Data Capture, 56

¨ Microsoft SQL Server Change Data Capture, 70

¨ Oracle Change Data Capture with Oracle LogMiner, 80


CHAPTER 4

DB2 for Linux, UNIX, and Windows Change Data Capture

This chapter includes the following topics:

¨ DB2 for Linux, UNIX, and Windows CDC Overview, 56

¨ Planning for DB2 CDC, 57

¨ Configuring DB2 for CDC, 58

¨ Configuring PowerExchange for DB2 CDC, 59

¨ Using a DB2 Data Map, 65

¨ Managing DB2 CDC, 66

¨ DB2 for Linux, UNIX, and Windows CDC Troubleshooting, 69

DB2 for Linux, UNIX, and Windows CDC Overview

PowerExchange captures change data from the DB2 for Linux, UNIX, and Windows recovery logs for the database that contains your source tables. PowerExchange uses the PowerExchange Client for PowerCenter (PWXPC) to coordinate with PowerCenter to move the captured change data to one or more targets.

For PowerExchange to capture DB2 change data, you must perform the following configuration tasks in DB2:

¨ Ensure that archive logging is active for the database.

¨ Create a PowerExchange capture catalog table in the database. The capture catalog table stores information about all tables in the source database, including column definitions and DB2 log positions.

Also, perform the following configuration tasks in PowerExchange:

¨ Define a capture registration for each source table. In the capture registration, you can select a subset of columns for which to capture data. PowerExchange generates a corresponding extraction map. Optionally, you can define an additional extraction map.

¨ If a source table contains columns in which you store data in a format that is inconsistent with the column datatype, you can optionally create a data map to manipulate that data with expressions. For example, if you store packed data in a CHAR column, you can create a data map to manipulate and prepare that data for loading to a target. You must merge the data map with the extraction map for the source table during capture registration creation.

¨ If you want to use the PowerExchange Logger for Linux, UNIX, and Windows to capture change data and write it to PowerExchange Logger log files, configure the PowerExchange Logger. The change data is then extracted from the PowerExchange Logger log files. Benefits of the PowerExchange Logger include fewer database accesses, faster CDC restart, and no need to prolong retention of DB2 log files for change capture.

PowerExchange works in conjunction with PowerCenter to extract change data from DB2 recovery logs or PowerExchange Logger log files and load that data to one or more targets.

RELATED TOPICS:

¨ “PowerExchange Logger for Linux, UNIX, and Windows” on page 19

¨ “Introduction to Change Data Extraction” on page 105

¨ “Extracting Change Data” on page 125

Planning for DB2 CDC

Before you configure DB2 for Linux, UNIX, and Windows CDC, verify that the following prerequisites and user authority requirements are met. Also, review the restrictions so that you can properly configure CDC.

Prerequisites

PowerExchange CDC has the following prerequisites:

¨ Archive logging must be active for the database that contains the source tables from which change data is to be captured.

¨ DB2 source tables must have been defined with the DATA CAPTURE CHANGES clause for capture processing to occur.

Required User Authority

For PowerExchange to read change data from DB2 logs, the user ID that you specify for database access must have SYSADM or DBADM authority. Usually, you specify this user ID in the UDB CAPI_CONNECTION statement in the dbmover.cfg file.


CDC Restrictions

The following restrictions apply to DB2 CDC processing:

¨ To extract change data on a DB2 client machine that is remote from the DB2 server where the change data is captured, both machines must have the same architecture. Otherwise, change data capture processing might fail with the error message PWX-20628.

¨ PowerExchange cannot capture change data for the following DB2 datatypes:

- DECFLOAT, LOB, and XML datatypes. You can create a capture registration for a table that includes columns with DECFLOAT, LOB, and XML datatypes. However, the registration does not include these columns, and PowerExchange does not capture change data for them. PowerExchange does capture change data for the other columns in the registered table that have supported datatypes.

- User-defined datatypes. Tables that include columns with user-defined datatypes cannot be registered for change data capture. PowerExchange cannot capture change data for these tables.

¨ To add or drop partitions in a partitioned database and then redistribute table data across the updated partition group, or to reconfigure a database partition group, you must use a special procedure. Otherwise, PowerExchange might not be able to resume change data capture properly.

¨ If you alter a column datatype to or from FOR BIT DATA, PowerExchange does not detect the datatype change. PowerExchange continues to use the datatype that is specified in the existing capture registration.

¨ In a partitioned database, if an UPDATE to a table row changes the partition key and that change causes the row to move to another partition, PowerExchange processes the UPDATE as two operations: a DELETE and an INSERT. However, based on the DB2 log information, PowerExchange cannot predictably determine the order in which to perform the DELETE and INSERT operations. If the INSERT is processed first, both the original row and the updated row appear on the target until the DELETE is processed.

¨ The maximum length of a row from which PowerExchange can capture change data is 32 KB.

RELATED TOPICS:

¨ “Reconfiguring a Partitioned Database or Database Partition Group” on page 67

Configuring DB2 for CDC

To configure DB2 for Linux, UNIX, or Windows for PowerExchange CDC, perform the following tasks:

1. In the DB2 Control Center Configure Database Logging Wizard, enable archive logging for the DB2 database. For more information, see the IBM DB2 documentation.

If archive logging is not enabled, PowerExchange issues the error messages PWX-20204 and PWX-20229 during CDC.

2. Set the following user environment variables in any process that runs PowerExchange CDC or the DTLUCUDB program:

¨ Set DB2NOEXITLIST to ON.

¨ Set DB2CODEPAGE to 1208.

3. Verify that the DB2 source tables are defined with the DATA CAPTURE CHANGES clause.

4. If a table that is selected for change data capture includes columns with a LONG datatype, use the INCLUDE LONGVAR COLUMNS clause to alter the table so that PowerExchange can capture data for the LONG columns. Otherwise, PowerExchange might issue the error message PWX-20094 during CDC processing.
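Steps 2 through 4 above can be prepared with a short script. The following is a sketch under stated assumptions: the helper names are hypothetical, the schema and table names are placeholders, and the identifiers are assumed to be trusted because the sketch does no quoting or escaping. The generated statements still have to be run through your own DB2 client.

```python
import os


def cdc_environment(base_env=None):
    """Return a copy of the environment with the two documented variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["DB2NOEXITLIST"] = "ON"
    env["DB2CODEPAGE"] = "1208"
    return env


def data_capture_ddl(schema, table, has_long_columns=False):
    """Build the ALTER TABLE statement that turns on DATA CAPTURE CHANGES.

    Identifiers are inserted as-is (no quoting), so pass only trusted names.
    """
    statement = f"ALTER TABLE {schema}.{table} DATA CAPTURE CHANGES"
    if has_long_columns:
        # Needed so that changes to LONG columns are written to the log.
        statement += " INCLUDE LONGVAR COLUMNS"
    return statement
```

The dictionary returned by cdc_environment can be passed as the env argument of subprocess.run when a script starts a process that runs PowerExchange CDC or DTLUCUDB.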


Configuring PowerExchange for DB2 CDC

The tasks that you perform to configure PowerExchange for DB2 for Linux, UNIX, and Windows CDC depend on whether you want to use the PowerExchange Logger for Linux, UNIX, and Windows and the extraction mode you plan to use.

RELATED TOPICS:

¨ “PowerExchange Logger for Linux, UNIX, and Windows” on page 19

Configuring PowerExchange CDC without the PowerExchange Logger

If you plan to run extractions in real-time extraction mode and not use the PowerExchange Logger for Linux, UNIX, and Windows, complete the following tasks to configure PowerExchange CDC:

1. Create the PowerExchange capture catalog table.

2. Run the DTLUCUDB SNAPSHOT command to initialize the capture catalog table.

3. When you configure the dbmover.cfg file, include the following statements:

¨ CAPT_PATH

¨ CAPT_XTRA

¨ UDB CAPI_CONNECTION

4. In the PowerExchange Navigator, create a capture registration for each source table. The PowerExchange Navigator generates a corresponding extraction map. Optionally, create a data map if you want to perform field-level processing.

Tip: Set the Condense option to Part even though you do not plan to use the PowerExchange Logger, unless you have a specific reason not to do so. This practice prevents having to edit the capture registrations later if you decide to use the PowerExchange Logger. You might want to set the Condense option to None if you plan to run both real-time and continuous extractions against tables defined by the same capture registrations and you do not want the PowerExchange Logger to capture change data for some registered tables.

If capture registrations already exist for the source tables, delete the existing registrations and extraction maps and create new ones.

5. Activate the capture registrations. Usually, you do this task after materializing the targets.

Next Step: Configure and start extractions. You must use real-time extraction mode.

RELATED TOPICS:

¨ “Initializing the Capture Catalog Table” on page 61

¨ “Customizing dbmover.cfg for DB2 CDC” on page 61

¨ “Creating the Capture Catalog Table” on page 60

¨ “Introduction to Change Data Extraction” on page 105

¨ “Extracting Change Data” on page 125


Configuring PowerExchange CDC with the PowerExchange Logger

If you plan to use the PowerExchange Logger for Linux, UNIX, and Windows and run extractions in batch or continuous extraction mode, complete the following tasks to configure PowerExchange CDC:

1. Create the PowerExchange capture catalog table.

2. Run the DTLUCUDB SNAPSHOT command to initialize the capture catalog table.

3. When you configure the dbmover.cfg file, include the following statements:

¨ CAPT_PATH

¨ CAPT_XTRA

¨ UDB CAPI_CONNECTION

¨ CAPX CAPI_CONNECTION (for continuous extraction mode only)

4. Configure the pwxccl.cfg file for the PowerExchange Logger.

5. In the PowerExchange Navigator, create a capture registration for each DB2 source table. You must select Part in the Condense drop-down list. The PowerExchange Navigator generates a corresponding extraction map.

If capture registrations already exist for these tables, delete the existing registrations and extraction maps and create new ones.

6. Activate the capture registrations. Usually, you do this task after materializing the targets.

7. Start the PowerExchange Logger.

Next Step: Configure and start extractions. You can use either batch extraction mode or continuous extraction mode.

RELATED TOPICS:

¨ “Configuring the PowerExchange Logger” on page 27

¨ “Starting the PowerExchange Logger” on page 47

¨ “Creating the Capture Catalog Table” on page 60

¨ “Initializing the Capture Catalog Table” on page 61

¨ “Customizing dbmover.cfg for DB2 CDC” on page 61

¨ “Introduction to Change Data Extraction” on page 105

¨ “Extracting Change Data” on page 125

¨ “CAPX CAPI_CONNECTION Parameters” on page 14

Creating the Capture Catalog Table

The PowerExchange capture catalog table stores information about the CDC source tables, column definitions, and valid DB2 log positions. You must create this table in the same database that contains the source tables from which change data is captured.

If the database has multiple partitions, the capture catalog table stores positioning information for each partition. If the database has only a single partition, the capture catalog table still contains positioning information for the partition.

Use the following DDL to create the capture catalog table:

CREATE TABLE DTLCCATALOG (
    VTSTIME   TIMESTAMP     NOT NULL,
    VTSACC    INTEGER       NOT NULL,
    NODENUM   SMALLINT      NOT NULL,
    SEQ       INTEGER       NOT NULL,
    TBSCHEMA  VARCHAR(128),
    TBNAME    VARCHAR(128),
    OP        VARCHAR(1024) NOT NULL,
    PRIMARY KEY(VTSTIME, VTSACC, NODENUM, SEQ)
);

In this DDL, the table name is DTLCCATALOG. If necessary, you can specify another table name.

Tip: Informatica recommends that you place the PowerExchange capture catalog table in the DB2 catalog partition.

Initializing the Capture Catalog Table

To initialize the PowerExchange capture catalog table, run the DTLUCUDB utility with the SNAPSHOT command. You should need to do this task only once.

To specify the command, use the following syntax:

DTLUCUDB SNAPSHOT [DB=database_name] [CCATALOG=capture_catalog_name] [UID=user_id] [EPWD=encrypted_password] [REPLACE=Y|N]

If the capture catalog table contains existing rows of data, you must set the REPLACE parameter to Y to enable PowerExchange to overwrite the data. For a new capture catalog table, accept the default of N.

After the snapshot successfully completes, back up the capture catalog table to create a point of consistency for recovery.

Note: If you run the DTLUCUDB SNAPSHOT command while the DB2 catalog is being updated, the snapshot fails. If this failure occurs, run the SNAPSHOT command again after the DB2 catalog updates are complete.
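When the SNAPSHOT command is issued from a script, the argument list can be assembled from the syntax shown above. The following is a minimal sketch: the function name is hypothetical, and the executable name and the way it is invoked on your platform are assumptions. Only the keywords DB, CCATALOG, UID, EPWD, and REPLACE come from the documented syntax.

```python
def snapshot_command(db=None, ccatalog=None, uid=None, epwd=None, replace=None):
    """Assemble DTLUCUDB SNAPSHOT arguments from the documented optional parameters.

    Parameters left as None are omitted, so the utility applies its defaults
    (for example, REPLACE defaults to N).
    """
    if replace not in (None, "Y", "N"):
        raise ValueError("REPLACE must be Y or N")
    args = ["DTLUCUDB", "SNAPSHOT"]
    keywords = (
        ("DB", db),
        ("CCATALOG", ccatalog),
        ("UID", uid),
        ("EPWD", epwd),
        ("REPLACE", replace),
    )
    for keyword, value in keywords:
        if value is not None:
            args.append(f"{keyword}={value}")
    return args
```

Returning a list instead of a single string lets the result be passed to subprocess.run without shell quoting concerns.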

Customizing dbmover.cfg for DB2 CDC

In the dbmover.cfg configuration file, include the CAPI connection statement that is specific to DB2 for Linux, UNIX, and Windows. Also add the other statements that are required for CDC and any optional statements that you want to use.

The following statements are required for DB2 CDC:

¨ CAPT_PATH. Path to the local directory where the following files reside: CCT file for capture registrations, CDEP file for application names used in ODBC extractions, and CDCT file for information about PowerExchange Logger for Linux, UNIX, and Windows log files.

¨ CAPT_XTRA. Path to the local directory for extraction maps.

¨ UDB CAPI_CONNECTION. A named set of parameters that the CAPI uses to connect to the change stream and control extraction processing for DB2 for Linux, UNIX, and Windows sources.

Add this statement to the dbmover.cfg file on the system where DB2 capture registrations are stored. This location corresponds to the Location node that you specify when defining a registration group. Usually, this location is where the source database resides.

If you plan to use continuous extraction mode, you must also define the CAPX CAPI_CONNECTION statement.

To find PowerExchange messages more easily, include the LOGPATH statement. This statement defines a specific directory for the PowerExchange message log files.

RELATED TOPICS:

¨ “CAPX CAPI_CONNECTION Parameters” on page 14

¨ “DB2 for Linux, UNIX, and Windows CAPI_CONNECTION Parameters” on page 62


Example Statements

The following statements are typical of those included in a dbmover.cfg for DB2 for Linux, UNIX, and Windows CDC:

CAPT_PATH=c:/pwxcapt/Vnnn
CAPT_XTRA=c:/pwxcapt/Vnnn/extrmaps
CAPI_CONN_NAME=UDBCC
CAPI_CONNECTION=(NAME=UDBCC
    ,DLLTRACE=bbbb
    ,TYPE=(UDB
        ,CCATALOG=mylib.captcat_tbl
        ,USERID=db2admin
        ,PASSWORD=db2admin))

DB2 for Linux, UNIX, and Windows CAPI_CONNECTION Parameters

The UDB CAPI_CONNECTION statement specifies the Consumer API (CAPI) parameters needed for DB2 for Linux, UNIX, and Windows CDC sources.

Data Sources: DB2 for Linux, UNIX, and Windows

Required: Yes for DB2 for Linux, UNIX, and Windows CDC

Syntax:

CAPI_CONNECTION=(
    [DLLTRACE=trace_id,]
    NAME=name,
    [TRACE=trace,]
    TYPE=(UDB,
        [CCATALOG=capture_catalog,]
        [DBCONN=database_name,]
        [EPWD=encrypted_password,]
        [MEMCACHE=cache_size,]
        [PASSWORD=password,]
        [RSTRADV=seconds,]
        [SPACEPRI=primary_space,]
        [UDBSCHEMA=schema,]
        [UPDINT=seconds,]
        [UPDREC=num_records,]
        [USERID=user_id]
    )
)

Parameters:

Enter the following parameters:

DLLTRACE=trace_id

Optional. User-defined name of the TRACE statement that activates internal DLL tracing for this CAPI. Specify this parameter only at the direction of Informatica Global Customer Support.

NAME=name

Required. Unique user-defined name for this CAPI_CONNECTION statement.

Maximum length is eight alphanumeric characters.

TRACE=trace

Optional. User-defined name of the TRACE statement that activates the common CAPI tracing. Specify this parameter only at the direction of Informatica Global Customer Support.


TYPE=(UDB, ... )

Required. Type of CAPI_CONNECTION statement. For DB2 for Linux, UNIX, and Windows sources, this value must be UDB.

CCATALOG=capture_catalog

Optional. Name of the PowerExchange capture catalog table in the format creator.table_name.

Default is creator.DTLCCATALOG, where creator is the user ID that is used to connect to the database.

DBCONN=database_name

Optional. A database name that specifies an override database to which to connect for data extraction. The override database must contain tables and columns that are identical to those in the original database. The original database name is included in the registration tag names and extraction map names.

Use this parameter if you want to extract change data from another database that is identical to the one specified in the registration group.

EPWD=encrypted_password

Optional. Encrypted password that is used with the database user ID specified in the USERID parameter.

You can create encrypted passwords by using the PowerExchange Navigator.

If you specify the USERID parameter, you must specify either the PASSWORD or EPWD parameter. Do not specify both PASSWORD and EPWD.

MEMCACHE=cache_size

Optional. Memory cache size, in kilobytes, that PowerExchange allocates to reconstruct complete UOWs.

For each extraction session, PowerExchange keeps all changes for each UOW in the memory cache until it processes the end-UOW record. If the memory cache is too small to hold all of the changes in a UOW, PowerExchange spills the changes to sequential files on disk, called UOW spill files.

Each UOW spill file contains one UOW. A UOW might require multiple UOW spill files to hold all of the changes for that UOW. If the change stream contains multiple large UOWs and the memory cache is insufficient, PowerExchange might create numerous UOW spill files.

PowerExchange processes the change stream more efficiently if it does not need to use UOW spill files. In addition to degrading extraction performance, large numbers of UOW spill files can cause a disk space shortage.

Important: If the change stream contains only small UOWs, the default value might be sufficient. However, the default value is often too small to eliminate UOW spill files. Informatica recommends that you specify a larger value.

The location in which PowerExchange allocates the UOW spill files varies by operating system, as follows:

¨ For Linux and UNIX, PowerExchange uses the current directory by default for UOW spill files. To use a different directory, specify the TMPDIR environment variable.

PowerExchange creates the UOW spill file names by using the operating system tempnam function with a prefix of dtlq.

Note: The UOW spill files are temporary files that are deleted when PowerExchange closes them. They are not visible in the directory while open.

¨ For Windows, PowerExchange uses the current directory by default for UOW spill files. To use a different directory, specify the TMP environment variable.

PowerExchange creates the UOW spill file names by using the Windows _tempnam function with a prefix of dtlq.

Valid values are from 1 through 519720.

Warning: Because PowerExchange allocates the cache size for each extraction operation, use caution when coding large values for MEMCACHE. Otherwise, many concurrent extraction sessions might cause memory constraints.

Default is 1024, or 1 MB.
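Because the cache is allocated for each extraction operation, a rough upper bound on cache memory is the MEMCACHE value multiplied by the number of concurrent extraction sessions. The following sketch illustrates that arithmetic. The function name is hypothetical, and the assumption that every session allocates its full cache is a simplification for capacity planning, not a statement about PowerExchange internals.

```python
MEMCACHE_MIN_KB = 1
MEMCACHE_MAX_KB = 519720
MEMCACHE_DEFAULT_KB = 1024  # documented default, 1 MB


def total_cache_kb(concurrent_sessions, memcache_kb=MEMCACHE_DEFAULT_KB):
    """Estimate total cache memory, in KB, for a number of concurrent sessions."""
    if not MEMCACHE_MIN_KB <= memcache_kb <= MEMCACHE_MAX_KB:
        raise ValueError("MEMCACHE must be from 1 through 519720")
    if concurrent_sessions < 0:
        raise ValueError("concurrent_sessions cannot be negative")
    # Each extraction session is assumed to allocate its own full cache.
    return concurrent_sessions * memcache_kb
```

For example, 20 concurrent sessions with MEMCACHE=51200 (50 MB each) give an estimate of 1,024,000 KB, or 1000 MB, of cache memory.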

PASSWORD=password

Optional. Clear text password that is used with the database user ID specified in the USERID parameter.

If you specify the USERID parameter, you must specify either the PASSWORD or EPWD parameter. Do not specify both PASSWORD and EPWD.

RSTRADV=seconds

Optional. Time interval, in seconds, that PowerExchange waits before advancing restart and sequence tokens for a registered data source during periods when UOWs do not include any changes of interest for the data source. When the wait interval expires, PowerExchange returns the next committed "empty UOW," which includes only updated restart information.

The wait interval is reset to 0 when PowerExchange completes processing a UOW that includes changes of interest or returns an empty UOW because the wait interval expired without any changes of interest having been received.

For example, if you specify 5, PowerExchange waits 5 seconds after it completes processing the last UOW or after the previous wait interval expires. Then PowerExchange returns the next committed empty UOW that includes the updated restart information and resets the wait interval to 0.

If RSTRADV is not specified, PowerExchange does not advance restart and sequence tokens for a registered source during periods when no changes of interest are received. In this case, when PowerExchange warm starts, it reads all changes, including those not of interest for CDC, from the restart point.

Valid values are 0 through 86400. No default is provided.

Warning: A value of 0 can degrade performance because PowerExchange returns an empty UOW after each UOW processed.

SPACEPRI=primary_space

Optional. PowerExchange allocates UOW spill files as temporary files.

Valid values are from 1 through 2147483647.


Default is 2147483647, or 2 GB.

UDBSCHEMA=schema

Optional. Schema name that overrides the schema name in capture registrations.

UPDINT=seconds

Optional. Minimum number of seconds that PowerExchange must wait after encountering a virtual timestamp (VTS) in the DB2 log records for a partition before writing a positioning entry to the PowerExchange capture catalog table. The positioning entry, which is composed of a log sequence number (LSN) and VTS, indicates a location in the DB2 logs.

Note: The UPDREC minimum number of records must also be met before positioning entries can be written to the capture catalog table.

Valid values are from 1 through 2147483647.

Default is 600.

UPDREC=num_records

Optional. Minimum number of DB2 log records that PowerExchange must read for a partition before it can write a positioning entry to the PowerExchange capture catalog table. The positioning entry, which is composed of an LSN and VTS, indicates a location in the DB2 logs.

Note: The UPDINT minimum wait period must also be met before positioning entries can be written to the capture catalog table.

Valid values are from 1 through 2147483647.

Default is 10000.
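The UPDINT and UPDREC parameters act together: a positioning entry can be written only when both minimums are met. The following sketch models that rule as described in this section. The function name and its inputs are hypothetical, and how PowerExchange actually measures the elapsed time and the record count is not specified here.

```python
def may_write_positioning_entry(seconds_since_vts, records_read,
                                updint=600, updrec=10000):
    """Return True only when both documented minimums have been met.

    The defaults mirror the documented defaults for UPDINT and UPDREC.
    """
    return seconds_since_vts >= updint and records_read >= updrec
```

With the defaults, reading 20,000 log records in 599 seconds does not yet allow a positioning entry, and neither does waiting 1,000 seconds while reading only 9,999 records.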

USERID=user_id

Optional. Database user ID. The user ID must have SYSADM or DBADM authority.

If you specify this parameter, you must also specify either the PASSWORD or EPWD parameter.

Using a DB2 Data Map

If you want PowerExchange to perform field-level processing on some records in a DB2 for Linux, UNIX, and Windows source table, you must use a data map.

For example, in some DB2 environments, a table can contain a single column that stores an array of fields in a format that is not consistent with the column datatype, such as a CHAR or VARCHAR column that stores multiple packed data fields. You can use an expression to modify this data before PowerCenter replicates it to a target. Also, if you add a user-defined field to a table in record view, you can build an expression to populate it. In the PowerExchange Navigator, you can define expressions only for data maps.

You might have data maps available for your source tables if you used PowerExchange bulk data movement to materialize your data targets. Bulk data movement requires data maps. You can use the bulk data maps for CDC if you merge them with the extraction maps for your data sources. The PowerExchange Navigator automatically generates an extraction map when you create a capture registration. Alternatively, you can manually add an extraction map.

Note: The field names in the data map must match the actual column names, as indicated in the DB2 capture registration.


Task Flow for DB2 Data Map Use

Perform the following tasks to use a DB2 data map for change data capture:

1. In the PowerExchange Navigator, create a capture registration for the DB2 source table.

2. Create a DB2 data map for the same DB2 source table if one is not available from a previous bulk data movement operation.

3. Merge the DB2 data map with the extraction map for the table.

4. Perform a row test on the merged extraction map.

RELATED TOPICS:

¨ “Testing a Change Data Extraction” on page 126

Managing DB2 CDC

You might need to stop DB2 for Linux, UNIX, and Windows CDC for source tables occasionally, for example, to change the table definitions.

Stopping DB2 CDC

You might need to stop change data capture for a DB2 source table to perform troubleshooting or routine maintenance tasks, such as maintenance on the capture catalog table or redistribution of table data across reconfigured database partitions.

To stop change data capture, use one of the following methods:

¨ Open the capture registration for a source table, and change the Status value from Active to History.

Warning: After you set the status of a capture registration to History, you cannot activate the registration again. This status change permanently stops change data capture based on the capture registration.

¨ To temporarily stop change data capture, alter the DB2 table to specify the DATA CAPTURE NONE clause:

ALTER TABLE owner.table_name DATA CAPTURE NONE

When DATA CAPTURE NONE is specified, DB2 no longer writes changes to the DB2 log files in expanded format. Because CDC requires expanded format, PowerExchange can no longer capture change data for the table from the log files. If you set it back to DATA CAPTURE CHANGES, you might need to rematerialize the targets.

RELATED TOPICS:

¨ “Stopping PowerCenter CDC Sessions” on page 142

Changing a DB2 Source Table Definition

Occasionally, you might need to change the definition of a DB2 for Linux, UNIX, and Windows source table that is registered for change data capture. If your metadata changes affect the columns from which change data is captured, use this procedure to enable PowerExchange to switch to the updated table definition, while preserving access to previously captured data.

Perform this procedure whenever you add, alter, or drop columns for which change data is captured. You do not need to perform this procedure if you are selectively capturing change data for a subset of columns and none of the selected columns are affected by the metadata changes.

Tip: If you no longer need to capture change data from a column in a table, you can remove the column from the extraction map without changing the capture registration. Change data for that column is still captured but is not extracted.

To change a DB2 source table definition:

1. Stop DELETE, INSERT, and UPDATE activity against the table.

2. Verify that any change data that was captured under the previous table definition has completed extraction processing. Then stop all workflows that extract change data for the table.

3. In the PowerExchange Navigator, open the original capture registration and set its status to History.

Alternatively, if you need to add or delete existing columns, right-click the capture registration and click Amend Columns. You can then add or delete columns, as needed. This action creates a new version of the capture registration that has a status of Inactive.

Note: PowerExchange does not capture change data based on capture registrations that have a status of History or Inactive.

4. Use DDL to make the table changes.

5. In the PowerExchange Navigator, create a new capture registration that reflects the metadata changes and set its status to Active.

Alternatively, if you created a new version of the original capture registration by amending columns, you can add any new columns that you defined. Then set the capture registration status to Active. Also, edit the associated extraction map to point to the new capture registration version. Right-click the associated extraction map and click Amend Capture Registrations.

PowerExchange uses the newly activated capture registration for change data capture.

6. If necessary, change the target table definition to reflect the source table metadata changes.

7. In PowerCenter Designer, import the altered source and target tables. Edit the mapping if necessary.

8. If necessary, rematerialize the target tables. After materialization completes, create new restart tokens.

9. Re-enable DELETE, INSERT, and UPDATE activity against the table.

10. Restart extraction processing.

RELATED TOPICS:

¨ “Creating Restart Tokens for Extractions” on page 135

Reconfiguring a Partitioned Database or Database Partition Group

In a DB2 for Linux, UNIX, and Windows partitioned database environment, you might need to perform the following reconfiguration tasks:

¨ Add a new partition to a partitioned database, or drop an existing partition. Then reconfigure the database partition group or groups to reflect the change.

¨ Reconfigure a database partition group by adding or removing existing partitions.

Typically, after making these types of changes, you run the DB2 REDISTRIBUTE DATABASE PARTITION GROUP command to redistribute table data among the partitions in the updated database partition group.

If PowerExchange change data capture is active in the partitioned database environment, you must use the following procedure to properly resume change data capture after making the reconfiguration changes.


Adding or Dropping Database Partitions

Use the following procedure to create a new partition in a partitioned database or to drop an existing partition, and then update the appropriate database partition group for the change:

1. In PowerCenter, stop all CDC sessions that extract change data for the tables in the partitioned database instance.

2. For each table for which the DATA CAPTURE CHANGES clause is specified, specify DATA CAPTURE NONE.

Note: This step temporarily disables DB2 capture of changes to its log files. If you do not perform this step, DB2 records the data redistribution changes that result from the REDISTRIBUTE command as regular change data activity.

3. Execute the SQL for adding the new database partition or for dropping an existing partition.

4. Execute the ALTER DATABASE PARTITION GROUP SQL to add the new partition to or remove the dropped partition from the appropriate database partition group.

5. Run the DB2 REDISTRIBUTE DATABASE PARTITION GROUP command to redistribute table data among the partitions in the altered database partition group.

6. Back up the PowerExchange capture catalog table.

7. Run the PowerExchange DTLUCUDB SNAPUPDT command. Set the REPLACE option to Y. This step updates the PowerExchange capture catalog table to reflect the reconfigured partitioned database.

Tip: Informatica recommends that you first perform a test run with the REPLACE option set to N.

8. For each table for which you specified DATA CAPTURE NONE in step 2, reinstate the DATA CAPTURE CHANGES clause.

9. Restart the PowerCenter CDC sessions to resume extraction processing.

RELATED TOPICS:

¨ “Initializing the Capture Catalog Table” on page 61

Reconfiguring a Database Partition Group

Use the following procedure to add a partition to or remove a partition from a database partition group without changing the partitioning of the partitioned database instance:

1. In PowerCenter, stop all CDC sessions that extract change data for the tables in the partitioned database instance.

2. For each table for which the DATA CAPTURE CHANGES clause is specified, specify DATA CAPTURE NONE.

Note: This step temporarily disables DB2 capture of changes to its log files. If you do not perform this step, DB2 records the data redistribution changes that result from the REDISTRIBUTE command as regular change data activity.

3. Execute the ALTER DATABASE PARTITION GROUP SQL to add the new partition to or remove the dropped partition from the appropriate database partition group.

4. Run the DB2 REDISTRIBUTE DATABASE PARTITION GROUP command to redistribute table data among the partitions in the altered database partition group.

5. For each table for which you specified DATA CAPTURE NONE in step 2, reinstate the DATA CAPTURE CHANGES clause.

6. Restart the PowerCenter CDC sessions to resume extraction processing.


DB2 for Linux, UNIX, and Windows CDC Troubleshooting

If you encounter the following issue when running DB2 for Linux, UNIX, and Windows CDC, attempt the solution that is described. If you cannot resolve the problem, contact Informatica Global Customer Support.

Workaround for SQL1224 Error on AIX

On AIX systems only, you might receive the following PowerExchange message for a DB2 SQL1224 error when you connect locally to a DB2 database that has multiple other local connections:

PWX-20604 State=08001, Code=-1224, Msg=[IBM][CLI Driver] SQL1224N A database agent could not be started to service a request, or was terminated as a result of a database system shutdown or a force command. SQLSTATE=55032.

To circumvent this problem, implement a loopback TCP/IP connection for the local DB2 database. The database can then function as a remote client that uses TCP/IP instead of interprocess communications (IPC) over shared memory.

To implement a loopback connection without changing the database alias that users enter for database connection, issue the following DB2 commands:

db2 catalog tcpip node node_name1 remote server_name1 server port_number1
db2 uncatalog database database_name1
db2 catalog database database_name1 at node node_name1
db2 catalog database database_name1 as database_alias1
db2 catalog database database_alias1 as database_name1 at node node_name1

For more information about these commands, see your IBM DB2 documentation.

IBM APARs for Specific Issues

If you encounter the issues documented in the following IBM APARs, go to the IBM Web site for more information or apply the appropriate FixPak for your DB2 version.

DB2 for Linux, UNIX, and Windows 9.5:

¨ The following issue can cause invalid PowerExchange capture registrations, which include character columns with an incorrect code page:

JR30422: "ALTER TABLE ALTER COLUMN" STATEMENT DOES NOT ALTER THE CODEPAGE COLUMN IN THE SYSCAT.COLUMNS VIEW.

To resolve this issue, search the IBM Web site for the latest information about this APAR.

DB2 for Linux, UNIX, and Windows 9.1:

¨ The following issue can result in a SQL error message:

IY87631: PESSIMISTIC LOCKING FOR CLI SQL_CONCUR_LOCK NO LONGER WORKING IN V8

The SQL error message is:

SQL0644N Invalid value specified for keyword "CONCURRENCY" in statement "ATTRIBUTE-STRING". SQLSTATE=42615

To resolve this issue, apply DB2 9.1 FixPak 1 or later.

¨ The following issue can cause invalid PowerExchange capture registrations that include character columns with an incorrect code page:

JR30420: "ALTER TABLE ALTER COLUMN" STATEMENT DOES NOT ALTER THE CODEPAGE COLUMN IN THE SYSCAT.COLUMNS VIEW.

To resolve this issue, search the IBM Web site for the latest information about this APAR.


C H A P T E R 5

Microsoft SQL Server Change Data Capture

This chapter includes the following topics:

¨ Microsoft SQL Server CDC Overview, 70

¨ Planning for SQL Server CDC, 71

¨ Configuring SQL Server for CDC, 73

¨ Configuring PowerExchange for SQL Server CDC, 74

¨ Managing SQL Server CDC, 78

Microsoft SQL Server CDC Overview

PowerExchange uses SQL Server transactional replication to capture change data from SQL Server distribution databases. PowerExchange uses the PowerExchange Client for PowerCenter (PWXPC) to coordinate with PowerCenter to move the captured change data to one or more targets.

For CDC to work, you must enable SQL Server Replication on the system from which change data is to be captured. If your database has a high volume of change activity, you should use a distributed server as the host of the distribution database.

To configure CDC in PowerExchange, you must define a capture registration for each source table. In the capture registration, you can select a subset of columns for which to capture data. PowerExchange generates a corresponding extraction map.

If you want to use the PowerExchange Logger for Linux, UNIX, and Windows to capture change data and write it to PowerExchange Logger log files, configure the PowerExchange Logger. The change data is then extracted from the PowerExchange Logger log files. Benefits of the PowerExchange Logger include fewer database accesses and faster CDC restart.

PowerExchange works with PowerCenter to extract change data from the SQL Server distribution database or PowerExchange Logger log files and load that data to one or more targets.

RELATED TOPICS:
¨ “PowerExchange Logger for Linux, UNIX, and Windows” on page 19

¨ “Introduction to Change Data Extraction” on page 105

¨ “Extracting Change Data” on page 125


Planning for SQL Server CDC

Before you configure SQL Server change data capture (CDC), verify that the following prerequisites and user authority requirements are met. Also, review the restrictions so that you can properly configure CDC.

SQL Server CDC Prerequisites

PowerExchange CDC has the following SQL Server prerequisites:

¨ PowerExchange CDC requires an edition of Microsoft SQL Server 2000 or later that supports transactional replication. You must configure and enable transactional replication on the source system to participate in CDC.

¨ If you use Microsoft SQL Server 2008, install the Microsoft SQL Server 2005 Backward Compatibility components if you have not done so. You can download these components from the Microsoft Web site.

¨ The Microsoft SQL Server Agent and Log Reader Agent must be running on the Windows machine from which change data is extracted. Usually, the SQL Server Agent remains running after it is initially started. For more information, see your SQL Server documentation.

¨ Each source table in the distribution database must have a primary key.

¨ If the PowerExchange Navigator does not reside on the same machine as the Microsoft SQL Server software, you must install the SQL Server client components on the PowerExchange Navigator machine.

Required User Authority for SQL Server CDC

PowerExchange CDC requires the following user authority levels:

¨ To create capture registrations in the PowerExchange Navigator, you must be a member of the SQL Server sysadmin server role.

¨ To run change data extractions against a SQL Server distribution database, you must have read access to that database.

If you do not specify a user ID and password, the PowerExchange Navigator and your extraction processes attempt to use your Windows user ID and password to connect to the SQL Server distribution database.

Datatypes Supported for SQL Server CDC

This topic identifies the SQL Server datatypes that PowerExchange supports for CDC.

The following table lists the datatypes and indicates whether they are supported for CDC:

Datatype                       Supported for CDC?  Comments
bigint                         Yes
binary                         Yes
bit                            Yes
char                           Yes
date                           No                  This datatype was introduced in SQL Server 2008.
datetime                       Yes
datetime2                      No                  This datatype was introduced in SQL Server 2008.
datetimeoffset                 No                  This datatype was introduced in SQL Server 2008.
decimal                        Yes
float                          Yes
geography                      No                  This datatype was introduced in SQL Server 2008.
geometry                       No                  This datatype was introduced in SQL Server 2008.
hierarchyid                    No                  This datatype was introduced in SQL Server 2008.
image (1)                      No                  Use varbinary(MAX) instead.
int                            Yes
money                          Yes
nchar                          Yes
ntext (1)                      No                  Use nvarchar(MAX) instead.
numeric                        Yes
nvarchar                       Yes
real                           Yes
smalldatetime                  Yes
smallint                       Yes
smallmoney                     Yes
sql_variant                    No                  PowerExchange does not capture change data for sql_variant columns but does capture change data for other columns in the same table.
text (1)                       No                  Use varchar(MAX) instead.
time                           No                  This datatype was introduced in SQL Server 2008.
timestamp                      Yes
tinyint                        Yes
uniqueidentifier               Yes                 PowerCenter imports the uniqueidentifier datatype as a varchar datatype of 38 characters.
user-defined datatypes (UDTs)  Yes                 PowerExchange treats a UDT in the same way as the datatype on which the UDT is based.
varbinary                      Yes
varchar                        Yes
xml                            Yes                 PowerExchange treats this datatype as varchar(MAX).

(1) PowerExchange might not be able to capture change data for columns that have the datatypes of image, ntext, or text because of SQL Server transactional replication restrictions on these types of columns. Instead, use the alternative datatypes that Microsoft recommends, as shown in the Comments column.
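When planning registrations, the table above can be applied as a simple lookup against a source table's column list. The Python sketch below is an illustration under stated assumptions: the function name and the sample column names are hypothetical, the datatype sets are transcribed from the table, and user-defined datatypes are not resolved to their base types.

```python
# Datatypes that the table above lists as not supported for CDC.
UNSUPPORTED = {
    "date", "datetime2", "datetimeoffset", "geography", "geometry",
    "hierarchyid", "image", "ntext", "sql_variant", "text", "time",
}

# Alternatives from the Comments column of the table.
ALTERNATIVES = {
    "image": "varbinary(MAX)",
    "ntext": "nvarchar(MAX)",
    "text": "varchar(MAX)",
}


def uncaptured_columns(columns):
    """Map each column that CDC would not capture to a short piece of advice.

    columns: dict of column name -> SQL Server datatype, for example "varchar(50)".
    """
    problems = {}
    for name, datatype in columns.items():
        # Strip any length or precision suffix before the lookup.
        base = datatype.split("(")[0].strip().lower()
        if base in UNSUPPORTED:
            alternative = ALTERNATIVES.get(base)
            problems[name] = (
                f"use {alternative} instead" if alternative else "not supported for CDC"
            )
    return problems


if __name__ == "__main__":
    print(uncaptured_columns({"id": "int", "note": "text", "born": "date"}))
```

Columns that the function reports are skipped by capture, but, as the sql_variant comment indicates, other columns in the same table can still be captured.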

SQL Server CDC Restrictions

The following restrictions apply to SQL Server CDC:

¨ PowerExchange does not capture change data for SQL Server system tables.

¨ The maximum length of a row for which PowerExchange can capture and process change data is 32 KB.

¨ PowerExchange does not capture the user ID that is associated with the original transaction that updated the database.

¨ The timestamp that PowerExchange records for each captured change indicates when the change was captured, not when the original transaction occurred.

¨ PowerExchange does not capture change data for derived columns that are not persisted. SQL Server computes values for these columns at run-time based on an expression but does not store the values in a table.

¨ SQL Server publishes deferred updates to SQL Server tables as DELETEs followed by INSERTs rather than as UPDATEs. Consequently, PowerExchange propagates deferred updates as DELETEs followed by INSERTs, even if you select AI for the Image Type attribute in the CDC connection. PowerExchange does not include before image (BI) and change indicator (CI) information in DELETE and INSERT operations. For more information about deferred updates, see your Microsoft SQL Server documentation.

Configuring SQL Server for CDC

You must perform a few configuration tasks to prepare SQL Server for PowerExchange change data capture (CDC).

If your SQL Server tables have a high level of update activity, use a distributed server as the host of the distribution database from which change data is captured. This practice prevents competition between PowerExchange CDC and your production database for CPU use and disk storage.

To configure SQL Server for PowerExchange CDC, perform the following tasks:

1. Start the SQL Server Agent and Log Reader Agent if they are not running. For more information, see your Microsoft SQL Server documentation.

2. Configure and enable SQL Server transactional replication. For more information, see your Microsoft SQL Server documentation.

Tip: The default transactional retention period at the Distributor is 72 hours. If you use the PowerExchange Logger, accept this default retention period. If you do not use the PowerExchange Logger, Informatica recommends that you increase the retention period to 14 days. However, you might need to use a lower value if you have a high volume of transactions or space constraints.

3. Verify that each source table in the distribution database has a primary key.
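The retention recommendation in the Tip can be turned into a concrete setting. The Python sketch below only builds the T-SQL text; it does not connect to SQL Server. It assumes that the retention period is changed with the SQL Server system procedure sp_changedistributiondb and its max_distretention property (expressed in hours), which this guide does not itself name, so verify that against your Microsoft SQL Server documentation. The function name and default database name are hypothetical.

```python
def distribution_retention_sql(uses_logger, distribution_db="distribution"):
    """Build a T-SQL statement that sets the maximum transaction retention.

    72 hours is the Distributor default described in the Tip; 14 days is the
    recommended value when the PowerExchange Logger is not used.
    """
    hours = 72 if uses_logger else 14 * 24
    return (
        f"EXEC sp_changedistributiondb @database = N'{distribution_db}', "
        f"@property = N'max_distretention', @value = N'{hours}';"
    )


if __name__ == "__main__":
    print(distribution_retention_sql(uses_logger=False))
```

As the Tip notes, a site with a high transaction volume or limited disk space might pick a value below 336 hours; the function would then need an explicit override rather than the two fixed choices shown here.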

Configuring PowerExchange for SQL Server CDC

The tasks that you perform to configure PowerExchange for change data capture (CDC) depend on whether you want to use the PowerExchange Logger for Linux, UNIX, and Windows and the extraction mode you plan to use.

RELATED TOPICS:
¨ “PowerExchange Logger for Linux, UNIX, and Windows” on page 19

Configuring PowerExchange CDC without the PowerExchange Logger

If you plan to run extractions in real-time extraction mode and not use the PowerExchange Logger for Linux, UNIX, and Windows, complete the following tasks to configure PowerExchange CDC:

1. When you configure the dbmover.cfg file, define the following statements:

¨ CAPT_PATH

¨ CAPT_XTRA

¨ MSQL CAPI_CONNECTION

2. In the PowerExchange Navigator, create a capture registration for each SQL Server source table. The PowerExchange Navigator generates a corresponding extraction map.

Tip: Set the Condense option to Part even though you do not plan to use the PowerExchange Logger, unless you have a particular reason not to do so. This practice prevents having to change the capture registrations later if you decide to use the PowerExchange Logger. You might want to set the Condense option to None if you run both real-time and continuous extractions against tables defined by the same capture registrations and do not want the PowerExchange Logger to capture change data for certain registered tables.

If capture registrations already exist for these tables, delete the existing registrations and extraction maps and create new ones.

3. Activate the capture registrations. Usually, you do this task after materializing the targets.

Next Step: Configure and start extractions. You must use real-time extraction mode.

RELATED TOPICS:
¨ “Customizing dbmover.cfg for SQL Server CDC” on page 75

¨ “Introduction to Change Data Extraction” on page 105


¨ “Extracting Change Data” on page 125

Configuring PowerExchange CDC with the PowerExchange Logger

If you plan to run extractions in batch or continuous extraction mode and use the PowerExchange Logger for Linux, UNIX, and Windows, complete the following tasks to configure PowerExchange CDC:

1. When you configure the dbmover.cfg file, define the following statements:

¨ CAPT_PATH

¨ CAPT_XTRA

¨ MSQL CAPI_CONNECTION

¨ CAPX CAPI_CONNECTION (for continuous extraction mode only)

2. Configure the pwxccl.cfg file for the PowerExchange Logger.

3. In the PowerExchange Navigator, create a capture registration for each SQL Server source table. You must set the Condense option to Part. The PowerExchange Navigator generates a corresponding extraction map.

If capture registrations already exist for these tables, delete the existing registrations and extraction maps and create new ones.

4. Start the PowerExchange Logger.

5. Activate the capture registrations. Usually, you do this task after materializing the targets.

Next Step: Configure and start extractions. You can use either batch extraction mode or continuous extraction mode.

RELATED TOPICS:
¨ “Customizing the PowerExchange Logger Configuration File” on page 28

¨ “Starting the PowerExchange Logger” on page 47

¨ “Customizing dbmover.cfg for SQL Server CDC” on page 75

¨ “Introduction to Change Data Extraction” on page 105

¨ “Extracting Change Data” on page 125

¨ “CAPX CAPI_CONNECTION Parameters” on page 14

Customizing dbmover.cfg for SQL Server CDC

In the dbmover.cfg configuration file, include the CAPI connection statement that is specific to SQL Server. Also add the other statements that are required for CDC and any optional statements that you want to use.

The following statements are required for SQL Server CDC:

¨ CAPT_PATH. Path to the local directory that stores the following files for CDC: CCT file for capture registrations, CDEP file for application names used in ODBC extractions, and CDCT file for information about PowerExchange Logger for Linux, UNIX, and Windows log files.

¨ CAPT_XTRA. Path to the local directory that stores extraction maps.

¨ MSQL CAPI_CONNECTION. A named set of parameters that the CAPI uses to connect to the change stream and control extraction processing for SQL Server CDC. Add this statement to the dbmover.cfg file on the system where SQL Server capture registrations are stored. This location corresponds to the Location node that you specify when defining a registration group. Usually, this location is where the source database resides.

If you plan to use the PowerExchange Logger and continuous extraction mode, you must also define the CAPX CAPI_CONNECTION statement.

To find PowerExchange messages more easily, include the LOGPATH statement. This statement defines a specific directory for the PowerExchange message log files.

RELATED TOPICS:
¨ “CAPX CAPI_CONNECTION Parameters” on page 14

¨ “Microsoft SQL Server CAPI_CONNECTION Parameters” on page 76

Example Statements

The following statements are typical of those included in a dbmover.cfg file for SQL Server CDC:

LOGPATH="C:\Informatica\PowerExchangeVnnn\Logs"
CAPT_XTRA="C:\Informatica\PowerExchangeVnnn\Capture\camaps"
CAPT_PATH="C:\Informatica\PowerExchangeVnnn\Capture"
CAPI_CONN_NAME=CAPIMSSC
CAPI_CONNECTION=(NAME=CAPIMSSC
 ,TYPE=(MSQL,DISTSRV=AUX159908\PWXPC
 ,DISTDB=distribution
 ,RSTRADV=30))

Note: You must use non-curly double quotation marks around values that include a space.
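Before starting extractions, it can be useful to confirm that a dbmover.cfg file contains the statements that this topic lists as required. The Python sketch below is a rough text check, not a PowerExchange utility: the function name is hypothetical, it assumes statement names begin a line, and it treats any CAPI_CONNECTION whose TYPE list starts with MSQL as the SQL Server connection.

```python
import re

# Statements that this topic lists as required, besides the MSQL CAPI_CONNECTION.
REQUIRED = ("CAPT_PATH", "CAPT_XTRA")


def check_dbmover(text):
    """Return a list of problems found in the text of a dbmover.cfg file."""
    problems = []
    # Statement names are the tokens at the start of a line, before '='.
    # Continuation lines such as " ,DISTDB=..." start with a comma and are skipped.
    names = set(re.findall(r"^\s*([A-Z_0-9]+)\s*=", text, flags=re.MULTILINE))
    for statement in REQUIRED:
        if statement not in names:
            problems.append(f"missing {statement}")
    # Look for a CAPI_CONNECTION statement that declares TYPE=(MSQL, ...).
    pattern = r"CAPI_CONNECTION\s*=\s*\(.*?TYPE\s*=\s*\(\s*MSQL\b"
    if not re.search(pattern, text, flags=re.DOTALL):
        problems.append("missing MSQL CAPI_CONNECTION")
    return problems


if __name__ == "__main__":
    sample = r'''CAPT_XTRA="C:\Informatica\PowerExchangeVnnn\Capture\camaps"
CAPT_PATH="C:\Informatica\PowerExchangeVnnn\Capture"
CAPI_CONNECTION=(NAME=CAPIMSSC
 ,TYPE=(MSQL,DISTSRV=AUX159908\PWXPC
 ,DISTDB=distribution
 ,RSTRADV=30))
'''
    print(check_dbmover(sample) or "no problems found")
```

The check covers only the presence of statements. It does not verify that the directories exist or that the distribution server can be reached.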

Microsoft SQL Server CAPI_CONNECTION Parameters

The MSQL CAPI_CONNECTION statement specifies the Consumer API (CAPI) parameters needed for Microsoft SQL Server CDC sources.

Data Sources: Microsoft SQL Server
Required: Yes for Microsoft SQL Server CDC

Syntax:

CAPI_CONNECTION=(
  [DLLTRACE=trace_id,]
  NAME=name,
  [TRACE=trace,]
  TYPE=(MSQL,
    DISTDB=distribution_database,
    DISTSRV=distribution_server,
    [DWFLAGS=flag1flag2flag3,]
    [EOF={N|Y},]
    [MEMCACHE=cache_size,]
    [POLWAIT=seconds,]
    [RSTRADV=seconds]
  ))

Parameters:

Enter the following parameters:

DLLTRACE=trace_id

Optional. User-defined name of the TRACE statement that activates internal DLL tracing for this CAPI. Specify this parameter only at the direction of Informatica Global Customer Support.

NAME=name

Required. Unique user-defined name for this CAPI_CONNECTION statement.

Maximum length is eight alphanumeric characters.

TRACE=trace

Optional. User-defined name of the TRACE statement that activates the common CAPI tracing. Specify this parameter only at the direction of Informatica Global Customer Support.

TYPE=(MSQL, ... )

Required. Type of CAPI_CONNECTION statement. For Microsoft SQL Server sources, this value must be MSQL.

DISTDB=distribution_database

Required. Name of the distribution database.

DISTSRV=distribution_server

Required. Network name of the server that hosts the distribution database.

Important: This name is different from the network name of the instance if the distribution database resides on a different server.

DWFLAGS=flag1flag2flag3

Optional. Series of three positional parameters that control whether processing stops or continues when data loss, truncation, or schema changes occur.

Enter the following positional parameters:

¨ flag1. Controls whether PowerExchange stops a change data extraction when data of an unexpected length is retrieved from the distribution database. Enter Y to continue processing or N to stop processing.

¨ flag2. Controls whether PowerExchange stops a change data extraction when a schema change is detected. Enter Y to continue processing or N to stop processing.

¨ flag3. Controls whether PowerExchange stops a change data extraction when the requested start sequence is not found in the transaction log. Enter Y to continue processing or N to stop processing.

Specify this parameter only at the direction of Informatica Global Customer Support.

Default is NNN.

EOF={N|Y}

Optional. Controls whether PowerExchange stops change data extractions when the end-of-log (EOL) is reached.

Enter one of the following options:

¨ N. PowerExchange does not stop change data extractions when EOL is reached.

¨ Y. PowerExchange stops change data extractions when EOL is reached.

Because this parameter affects all users of the MSQL CAPI_CONNECTION statement, Informatica recommends that you use one of the following alternative methods to stop change data extractions at EOL:

¨ For CDC sessions that use real-time extraction mode, enter 0 for the Idle Time attribute of the PWX MSSQL CDC Real Time application connection.

¨ For PowerExchange Logger for Linux, UNIX, and Windows, enter 1 for the COLL_END_LOG statement in the pwxccl.cfg configuration file.

¨ For CDC sessions that use ODBC connections, enter 0 for the WAITTIME parameter in the ODBC data source.

Default is N.


MEMCACHE=cache_size

Optional. Memory cache size, in kilobytes, that PowerExchange allocates to cache a single change.

Valid values are from 1 through 519720.

Default is 248.

POLWAIT=seconds

Optional. Time interval, in seconds, that PowerExchange waits after reaching the end of current data before polling for new data.

Valid values are from 1 through 2147483647.

Default is 1.

RSTRADV=nnnnn

Time interval, in seconds, that PowerExchange waits before advancing restart and sequence tokens for a registered data source during periods when UOWs do not include any changes of interest for the data source. When the wait interval expires, PowerExchange returns the next committed "empty UOW," which includes only updated restart information.

The wait interval is reset to 0 when PowerExchange completes processing a UOW that includes changes of interest or returns an empty UOW because the wait interval expired without any changes of interest having been received.

For example, if you specify 5, PowerExchange waits 5 seconds after it completes processing the last UOW or after the previous wait interval expires. Then PowerExchange returns the next committed empty UOW that includes the updated restart information and resets the wait interval to 0.

If RSTRADV is not specified, PowerExchange does not advance restart and sequence tokens for a registered source during periods when no changes of interest are received. In this case, when PowerExchange warm starts, it reads all changes, including those not of interest for CDC, from the restart point.

Valid values are 0 through 86400. No default is provided.

Warning: A value of 0 can degrade performance because PowerExchange returns an empty UOW after each UOW processed.
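The limits stated for the MSQL CAPI_CONNECTION parameters can be checked before a statement is added to dbmover.cfg. The Python sketch below is illustrative only: the function name and the dictionary input format are assumptions, and the rules are transcribed from the parameter descriptions above (NAME length, required DISTDB and DISTSRV, three Y or N flags for DWFLAGS, and the numeric ranges).

```python
import re

# Inclusive numeric ranges taken from the parameter descriptions.
RANGES = {
    "MEMCACHE": (1, 519720),
    "POLWAIT": (1, 2147483647),
    "RSTRADV": (0, 86400),
}


def validate_msql_params(params):
    """Return a list of error strings for a dict of parameter name -> string value."""
    errors = []
    if not re.fullmatch(r"[A-Za-z0-9]{1,8}", params.get("NAME", "")):
        errors.append("NAME must be 1 to 8 alphanumeric characters")
    for required in ("DISTDB", "DISTSRV"):
        if not params.get(required):
            errors.append(f"{required} is required")
    if "DWFLAGS" in params and not re.fullmatch(r"[YN]{3}", params["DWFLAGS"]):
        errors.append("DWFLAGS must be three Y or N flags, for example NNN")
    if "EOF" in params and params["EOF"] not in ("Y", "N"):
        errors.append("EOF must be Y or N")
    for key, (low, high) in RANGES.items():
        if key in params:
            value = params[key]
            if not value.isdigit() or not low <= int(value) <= high:
                errors.append(f"{key} must be an integer from {low} through {high}")
    return errors


if __name__ == "__main__":
    print(validate_msql_params(
        {"NAME": "CAPIMSSC", "DISTDB": "distribution", "DISTSRV": "AUX159908\\PWXPC", "RSTRADV": "30"}
    ))
```

A check like this catches range mistakes early, but it cannot judge whether a valid value is a sensible one. For instance, RSTRADV=0 passes validation even though the Warning above advises against it.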

Managing SQL Server CDC

You might need to stop CDC for source tables occasionally, for example, to change the table definitions.

Disabling Publication of Change Data for a SQL Server Source

You can disable publication of change data for a SQL Server source. For example, you might disable publication to perform some database maintenance, change the table definition, or avoid capturing unwanted changes.

¨ Open the capture registration for the table, and change the Status setting from Active to History.

This action disables publication of the SQL Server article for the table to the distribution database, which causes change capture to stop.

Warning: After the registration status is set to History, you cannot activate the registration for CDC use again.


Changing a SQL Server Source Table Definition

If you change the definition of a SQL Server source table that is registered for change data capture, use this procedure to enable PowerExchange to use the updated table definition and preserve access to previously captured data. Table definition changes include adding, altering, or dropping columns.

Tip: If you no longer need to capture change data from a column in a table, you can remove the column from the extraction map without changing the capture registration. Change data for that column is still captured but is not extracted.

To change a SQL Server source table definition:

1. Stop DELETE, INSERT, and UPDATE activity against the table.

2. Verify that any change data that was captured under the previous table definition has completed extraction processing. Then stop all workflows that extract change data for the table.

3. Delete the capture registration and extraction map.

4. Use DDL to change the table definition in SQL Server.

5. In the PowerExchange Navigator, create a new capture registration that reflects the metadata changes and set its status to Active. PowerExchange creates a corresponding extraction map.

The newly activated capture registration becomes eligible for change data capture.

6. If necessary, change the target table definition to reflect the source table metadata changes.

7. In the PowerCenter Designer, import the altered source and target definitions. Edit the mapping if necessary.

8. If necessary, rematerialize the target tables. After materialization completes, create new restart tokens.

9. Create new restart tokens for the altered table.

10. Re-enable DELETE, INSERT, and UPDATE activity against the table.

11. Cold start the extraction workflows.


C H A P T E R 6

Oracle Change Data Capture with Oracle LogMiner

This chapter includes the following topics:

¨ Overview of Oracle LogMiner CDC, 80

¨ Planning for Oracle LogMiner CDC, 81

¨ Oracle Configuration for LogMiner CDC, 83

¨ PowerExchange Configuration for Oracle LogMiner CDC, 88

¨ Management of Oracle LogMiner CDC, 102

Overview of Oracle LogMiner CDC

PowerExchange can use Oracle LogMiner to read change data from Oracle redo logs. To move the change data to one or more targets, PowerExchange uses the PowerExchange Client for PowerCenter (PWXPC) in conjunction with PowerCenter.

To implement Oracle LogMiner CDC, you need to perform configuration tasks in Oracle, PowerExchange, and PowerCenter.

In Oracle, ensure that ARCHIVELOG mode with global minimal supplemental logging is enabled so that change data can be retrieved from archived redo logs. Also, ensure that a copy of the Oracle online catalog exists in the archived redo logs. PowerExchange requires a copy of the catalog to determine restart points for change data extraction processing.

In PowerExchange, define a capture registration for each source table. In the capture registration, you can select a subset of columns for which to capture data. PowerExchange generates a corresponding extraction map.

If you want to use the PowerExchange Logger for Linux, UNIX, and Windows, also configure the PowerExchange Logger. The PowerExchange Logger can capture change data from Oracle redo logs and write only the successful units of work (UOWs), in chronological order based on commit time, to PowerExchange Logger log files. The change data is then extracted from the PowerExchange Logger log files in either continuous extraction mode or batch extraction mode. Benefits of using the PowerExchange Logger include fewer database accesses, faster CDC restart, and no need to prolong retention of the Oracle redo files for change capture.

Note: Informatica strongly recommends that you use the PowerExchange Logger for Oracle LogMiner CDC. If you use real-time extraction mode without the PowerExchange Logger, PowerExchange starts a separate Oracle LogMiner session for each extraction session. Running multiple, concurrent sessions can significantly degrade performance of the system where LogMiner runs.

PowerExchange works with PowerCenter to extract change data from Oracle redo logs or PowerExchange Logger log files and load that data to one or more targets.

RELATED TOPICS:
¨ “PowerExchange Logger for Linux, UNIX, and Windows” on page 19

¨ “Introduction to Change Data Extraction” on page 105

Planning for Oracle LogMiner CDC

Before you configure Oracle change data capture, review the following restrictions, requirements, and performance considerations.

Requirements and Restrictions for Oracle LogMiner CDC

The following restrictions and requirements apply to Oracle LogMiner CDC:

¨ The Oracle instance must be running in ARCHIVELOG mode.

¨ Oracle global minimal supplemental logging must be enabled.

¨ A copy of the Oracle catalog must exist in the Oracle archived redo logs.

¨ Oracle LogMiner continuous mining reads archived redo logs only from the directory to which they were originally written.

¨ If you truncate Oracle source tables from which change data is captured, or if you drop and re-create source tables, PowerExchange cannot continue to extract change data for these tables. In these situations, you must rematerialize the corresponding targets.

¨ If PowerExchange CDC is not installed on the same machine as the Oracle instance, configure a TNS entry on the client machine with SERVER=DEDICATED in the CONNECT_DATA section of the connect descriptor. This specification is also required if the network is configured for Multi-Threaded Server (MTS) mode.

¨ PowerExchange requires the Oracle Client binaries. When you install Oracle, the Client binaries are installed by default. To use SQL*Net connectivity on a machine that does not have an installed Oracle instance, you must install the Oracle Client.

¨ The maximum length of a row for which PowerExchange can capture and process change data is 32 KB.
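The TNS requirement in the list above is easiest to see in a concrete entry. The Python sketch below prints a tnsnames.ora entry in the usual Oracle Net layout with SERVER = DEDICATED inside CONNECT_DATA. The function name, alias, host, port, and service name are all hypothetical placeholders, and the layout should be checked against your Oracle Net documentation.

```python
def tns_entry(alias, host, port, service_name):
    """Return the text of a tnsnames.ora entry that requests a dedicated server."""
    return (
        f"{alias} =\n"
        f"  (DESCRIPTION =\n"
        f"    (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))\n"
        f"    (CONNECT_DATA =\n"
        # SERVER = DEDICATED must sit inside CONNECT_DATA, as the requirement states.
        f"      (SERVER = DEDICATED)\n"
        f"      (SERVICE_NAME = {service_name})\n"
        f"    )\n"
        f"  )\n"
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(tns_entry("ORCLCDC", "dbhost.example.com", 1521, "orcl"))
```

The entry belongs in the tnsnames.ora file on the machine where PowerExchange CDC runs, because that machine acts as the Oracle client.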

Datatypes Supported for Oracle LogMiner CDC

PowerExchange uses Oracle LogMiner to retrieve changes from the Oracle redo logs. Oracle does not log, or does not completely log, data with some datatypes in the Oracle redo logs. Consequently, PowerExchange cannot retrieve change data for columns that have these datatypes.

The following table identifies the Oracle datatypes that PowerExchange supports for Oracle LogMiner CDC:

Datatype                        Supported for CDC?  Comments
BFILE                           No                  Data for columns that have this datatype are not completely logged in the Oracle redo logs and cannot be captured.
BINARY_DOUBLE                   Yes
BINARY_FLOAT                    Yes
CHAR                            Yes
DATE                            Yes
FLOAT                           Yes
LOBs                            No
LONG                            No
LONG RAW                        No
NCHAR                           Yes                 For CDC support of this datatype, you must have PowerExchange 8.5 or later.
NUMBER                          Yes                 PowerExchange handles NUMBER columns as follows:
                                                    - Numbers with a scale of 0 and a precision value less than 10 are treated as INTEGER.
                                                    - Numbers with a defined precision and scale are treated as NUMCHAR.
                                                    - Numbers with an undefined precision and scale are treated as DOUBLE.
NVARCHAR2                       Yes                 For CDC support of this datatype, you must have PowerExchange 8.5 or later.
RAW                             Yes
TIMESTAMP                       Yes
TIMESTAMP WITH TIME ZONE        No
TIMESTAMP WITH LOCAL TIME ZONE  No
VARCHAR2                        Yes
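The three NUMBER rules in the table can be read as a small decision function. The Python sketch below is one interpretation, not PowerExchange code. The function name is hypothetical, and it makes two labeled assumptions: the INTEGER rule is tested before the NUMCHAR rule, and a NUMBER declared with a precision but no scale is treated as scale 0, which is the Oracle default.

```python
def number_mapping(precision=None, scale=None):
    """Return the datatype that the table above gives for an Oracle NUMBER column.

    precision and scale are None when the column declaration omits them.
    """
    # Rule 3: undefined precision and scale.
    if precision is None and scale is None:
        return "DOUBLE"
    # Assumption: NUMBER(p) without a scale behaves as NUMBER(p, 0).
    effective_scale = 0 if scale is None else scale
    # Rule 1: scale of 0 and precision less than 10.
    if precision is not None and effective_scale == 0 and precision < 10:
        return "INTEGER"
    # Rule 2: any other defined precision and scale.
    return "NUMCHAR"


if __name__ == "__main__":
    for declaration in [(9, 0), (10, 0), (12, 2), (None, None)]:
        print(declaration, number_mapping(*declaration))
```

The boundary at precision 10 is worth noting when designing targets: NUMBER(9,0) and NUMBER(10,0) columns map to different datatypes under these rules.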

SQL*Loader Restrictions

PowerExchange CDC can capture data that was loaded into Oracle tables by the SQL*Loader utility. However, the following restrictions apply:

¨ The load type must be conventional path. PowerExchange cannot capture data that was loaded by a direct path load because Oracle LogMiner does not support direct path loads.

¨ The load method should be Insert, Append, or Replace. Do not use Truncate. Truncate causes SQL*Loader to issue TRUNCATE TABLE DDL. Because PowerExchange does not capture DDL, it cannot capture any row deletions that result from TRUNCATE TABLE DDL.


Performance Considerations for Oracle LogMiner CDC

The following considerations pertain to PowerExchange CDC performance:

¨ Use real-time extraction mode only if you run very few concurrent change data extractions. PowerExchange CDC creates an Oracle LogMiner session for each real-time extraction. Because LogMiner sessions are resource intensive, they can impact overall system performance. Instead, use continuous extraction mode. For continuous extraction mode, PowerExchange extracts change data from PowerExchange Logger log files.

¨ If you use continuous extraction mode, minimize the size of the CDCT file. The CDCT file contains information about the PowerExchange Logger log files. PowerExchange reads the CDCT file each time the interval that is specified in the FILEWAIT parameter of the CAPX CAPI_CONNECTION statement elapses. If a CDCT file is large, PowerExchange read operations can result in a high level of I/O activity, increased use of system resources, and increased extraction latency. To manage the CDCT file size, use the COND_CDCT_RET_P statement in the pwxccl.cfg configuration file for the PowerExchange Logger for Linux, UNIX, and Windows.

Oracle Configuration for LogMiner CDC

PowerExchange provides sample script files to help you configure Oracle for PowerExchange CDC.

Configuration Script Files

To configure Oracle for CDC, use the sample Oracle configuration script files in the PowerExchange installation directory.

PowerExchange provides the following script files for RAC and non-RAC environments:

oracapt.sql

Configures Oracle for CDC in a non-RAC environment.

oracapt_rac.sql

Configures Oracle for CDC in an RAC environment. PowerExchange supports CDC in RAC environments only for Oracle 10g Release 2 and later.

Each script file contains sample SQL statements for performing the necessary configuration tasks. Before running any of the SQL statements, read the comments in the script file. The comments provide important information.

Use the script file that is appropriate for your environment to perform the following configuration tasks:

¨ Grant required Oracle privileges.

¨ Enable ARCHIVELOG mode.

¨ Enable global minimal supplemental logging.

¨ Configure Oracle LogMiner.

¨ Copy the Oracle catalog to the archived redo logs.

¨ Set the transaction_auditing parameter to “True,” if you run an Oracle version earlier than 10.2.01.

Configuring Oracle for LogMiner CDC

This section describes steps for configuring Oracle for LogMiner CDC. For sample SQL and DDL, refer to the oracapt.sql file.

Step 1. Specify an Archive Log Destination

Edit your init.ora file to specify the archive log destination and file-name format. For more information, see your Oracle database administrator's guide.

Alternatively, if you use a server parameter file (spfile), issue the following SQL statements to indicate the archive log destination:

CONNECT SYS/sys_pwd AS SYSDBA;
ALTER SYSTEM SET log_archive_dest_1 = 'location=/oracle_path/arch' SCOPE=SPFILE;

Step 2. Set the Oracle Compatible Parameter (Oracle 9.2.0)

If you use Oracle 9.2.0 and the "compatible" parameter is not specified in the init.ora or spfile file, or if the "compatible" parameter is set to an Oracle version earlier than 9.2.0, you must set this parameter to 9.2.0.

To set this parameter, issue the following SQL statement:

ALTER SYSTEM SET compatible='9.2.0' SCOPE=SPFILE;
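The decision in this step depends on comparing version strings, which is easy to get wrong with plain string comparison ("10.1.0" sorts before "9.2.0" as text). The following Python sketch compares versions numerically. The function names are hypothetical, and the sketch assumes that the current "compatible" value has already been read from the init.ora or spfile file, with None meaning that it is not specified.

```python
from typing import Optional, Tuple


def parse_version(text: str, width: int = 5) -> Tuple[int, ...]:
    """Turn '9.2.0' into (9, 2, 0, 0, 0) so that versions compare numerically."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def must_set_compatible(current: Optional[str], minimum: str = "9.2.0") -> bool:
    """True if 'compatible' is unspecified or earlier than the minimum version."""
    if current is None or not current.strip():
        return True
    return parse_version(current) < parse_version(minimum)
```

With this sketch, a value of 9.0.1 or a missing value requires the ALTER SYSTEM statement, and a value of 9.2.0 or 10.1.0 does not.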

Step 3. Set the Oracle transaction_auditing Parameter

If you use an Oracle version earlier than 10.1.0.1, verify that the transaction_auditing parameter is set to "True" in the init.ora or spfile file. This setting is required for Oracle CDC to work properly.

To set this parameter in the spfile file, execute the following SQL statement in an SQL*Plus session:

ALTER SYSTEM SET transaction_auditing=TRUE SCOPE=SPFILE;

For more information, see the oracapt.sql or oracapt_rac.sql file.

If you run Oracle 9.2.0.6 or 10.1.0.4, install the appropriate patch for your release instead. You can find the patch by searching My Oracle Support (formerly MetaLink) Knowledge Base for bug report 3456259.

Step 4. Enable ARCHIVELOG Mode

For CDC, Oracle must be running in ARCHIVELOG mode.

By default, ARCHIVELOG mode is not enabled.

To enable ARCHIVELOG mode, issue the following statements:

SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
SHUTDOWN IMMEDIATE;
STARTUP;

Tip: Back up your database after both SHUTDOWN commands.

If you use the Oracle init.ora initialization parameter file, you must edit the appropriate parameters in this file to identify the archive log destination and file name format. For more information, see the Oracle database administrator’s guide for your Oracle version.

If you use a server parameter file (spfile), you must execute ALTER SYSTEM SET statements. The specific SQL and configuration steps vary for RAC and non-RAC environments and are described in the oracapt.sql and oracapt_rac.sql files.

Step 5. Stop and Restart the Oracle Database

If you set the ARCHIVELOG mode or the "compatible" or "transaction_auditing" parameter, you must stop and restart the Oracle instance for your changes to take effect.

For more information, see the oracapt.sql file.

Step 6. Grant User Privileges Required for Oracle LogMiner CDC

To extract change data from Oracle redo logs, a CDC user must have specific Oracle system and object privileges. You can either use an existing user who has the required authority as the CDC user, or create a user and grant the required privileges to that user.

The oracapt.sql and oracapt_rac.sql configuration script files contain the required SQL GRANT statements. Edit this SQL, as needed, for your environment.

The following table identifies the minimum system privileges that Oracle CDC users must have:

System Privilege Oracle Release Description

ALTER ANY TABLE All Required for users that create capture registrations and allow PowerExchange to automatically run the DDL that is generated for creating a supplemental log group at registration completion.

CONNECT All Required for users that extract Oracle CDC data in real time and for PowerExchange Logger tasks.

LOCK ANY TABLE All If you specify GENRLOCK=Y in the ORCL CAPI_CONNECTION statement of the dbmover.cfg file, you must either grant the LOCK ANY TABLE system privilege or grant the SELECT object privilege on each table that is registered for change data capture.

SELECT ANY TRANSACTION 10g and later Required for users who extract Oracle CDC data in real time and for PowerExchange Logger tasks.

The following table identifies the minimum object privileges that Oracle CDC users must have:

Object Name Object Privilege

Source tables If you specify GENRLOCK=Y in the ORCL CAPI_CONNECTION statement of the dbmover.cfg file, you must either grant the LOCK ANY TABLE system privilege or grant the SELECT object privilege on each table that is registered for change data capture.

PUBLIC.V$ARCHIVED_LOG SELECT

PUBLIC.V$DATABASE SELECT

PUBLIC.V$INSTANCE SELECT

PUBLIC.V$LOGMNR_CONTENTS SELECT

PUBLIC.V$NLS_PARAMETERS SELECT

PUBLIC.V$PARAMETER SELECT

PUBLIC.V$TRANSACTION SELECT

SYS.DBA_LOG_GROUPS SELECT

SYS.DBA_LOG_GROUP_COLUMNS SELECT

SYS.DBMS_FLASHBACK EXECUTE

SYS.DBMS_LOGMNR EXECUTE

SYS.DBMS_LOGMNR_D EXECUTE
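The two privilege tables can be turned into a simple checklist. The following Python sketch is a hypothetical helper, not part of PowerExchange. It assumes that you have already collected the privileges granted to the CDC user by some other means, and it only reports which documented privileges are absent from that collection. The object names are copied from the table as written. The authoritative GRANT statements are in the oracapt.sql and oracapt_rac.sql files.

```python
from typing import Iterable, List, Tuple

# (object name, privilege) pairs copied from the object privilege table.
OBJECT_PRIVILEGES: List[Tuple[str, str]] = [
    ("PUBLIC.V$ARCHIVED_LOG", "SELECT"),
    ("PUBLIC.V$DATABASE", "SELECT"),
    ("PUBLIC.V$INSTANCE", "SELECT"),
    ("PUBLIC.V$LOGMNR_CONTENTS", "SELECT"),
    ("PUBLIC.V$NLS_PARAMETERS", "SELECT"),
    ("PUBLIC.V$PARAMETER", "SELECT"),
    ("PUBLIC.V$TRANSACTION", "SELECT"),
    ("SYS.DBA_LOG_GROUPS", "SELECT"),
    ("SYS.DBA_LOG_GROUP_COLUMNS", "SELECT"),
    ("SYS.DBMS_FLASHBACK", "EXECUTE"),
    ("SYS.DBMS_LOGMNR", "EXECUTE"),
    ("SYS.DBMS_LOGMNR_D", "EXECUTE"),
]


def expected_system_privileges(oracle_major: int, genrlock: bool) -> List[str]:
    """System privileges from the table for a given Oracle release and GENRLOCK setting."""
    expected = ["ALTER ANY TABLE", "CONNECT"]
    if genrlock:
        # Alternative per the table: SELECT on each registered source table.
        expected.append("LOCK ANY TABLE")
    if oracle_major >= 10:
        expected.append("SELECT ANY TRANSACTION")
    return expected


def missing(expected: Iterable, granted: Iterable) -> list:
    """Return the expected items that are not in the granted collection."""
    granted_set = set(granted)
    return [item for item in expected if item not in granted_set]
```

For example, missing(expected_system_privileges(10, False), ["CONNECT"]) returns the two system privileges that still need to be granted.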

Step 7. Configuring Oracle Minimal Global Supplemental Logging

PowerExchange requires Oracle to use minimal global supplemental logging for Oracle LogMiner to properly handle chained rows.

To enable minimal global supplemental logging, log in to the Oracle database and execute the following SQL statement, which is included in the oracapt.sql and oracapt_rac.sql configuration files:

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
COMMIT;

If you do not know whether minimal global supplemental logging has been enabled for your database, you can still execute this ALTER statement. The statement has no effect if minimal supplemental logging is active.

Note: You must also define a supplemental log group for each Oracle source table. When you register an Oracle source table in the PowerExchange Navigator, PowerExchange generates DDL for adding a supplemental log group for the table. Oracle supplemental log groups cause Oracle to log full before- and after-images of the data that changed. PowerExchange requires these images to properly process changes.

Step 8. Create a Table Space for Oracle LogMiner Use (Optional)

Create a table space exclusively for Oracle LogMiner use. This step is necessary only if you have not previously configured LogMiner for use with other Oracle features such as logical standby databases, Oracle Streams, or native Oracle change capture processes.

This step prevents the SYSTEM table space (in Oracle 9i) or SYSAUX table space (in Oracle 10g or later) from becoming full and causing service problems during PowerExchange CDC.

To create the LogMiner table space, use the DDL in the PowerExchange oracapt.sql or oracapt_rac.sql file that is supplied for this purpose.

1. To create the table space, issue the following DDL:

CREATE TABLESPACE "LOGMNRTS" NOLOGGING
DATAFILE '/oracle_path/datafilename.ora' SIZE 50M REUSE
AUTOEXTEND ON NEXT 10M MAXSIZE 100M
EXTENT MANAGEMENT LOCAL;

Specify NOLOGGING if you use Oracle LogMiner only for PowerExchange CDC and an occasional query. Change NOLOGGING to LOGGING if you use any of the following Oracle features: logical standby databases, Oracle Streams, or native Oracle change capture processes.

For the DATAFILE value, specify a file name based on your local Oracle database file naming standards for the data files that comprise this table space.

2. Enter the following command:

EXECUTE SYS.DBMS_LOGMNR_D.SET_TABLESPACE('LOGMNRTS');

If this statement fails with the ORA-01353 message, see the comments in oracapt.sql for more information.

3. To recompile the SYS.DBMS_LOGMNR_D package, enter the following command:

ALTER PACKAGE SYS.DBMS_LOGMNR_D COMPILE BODY;

Tip: LogMiner opens a number of cursors internally to handle its processing. When you configure LogMiner for the first time, you might receive messages that state “number of open cursors exceeded.” You can increase the maximum number of open cursors to handle the extra LogMiner processing.

Step 9. Copy the Oracle Catalog to the Archived Logs

PowerExchange CDC requires a copy of the Oracle online catalog in the Oracle archived redo logs to determine the point from which to restart change data extractions.

PowerExchange reads the last catalog copy in the archived logs, even if you specified ONLINECAT=Y in the ORCL CAPI_CONNECTION statement. You should copy the catalog on a routine basis to minimize CDC restart times.

To copy the catalog, issue the following command in an SQL*Plus session:

begin
SYS.DBMS_LOGMNR_D.BUILD(options => sys.dbms_logmnr_d.store_in_redo_logs);
end;
/

Tip: Periodically, PowerExchange requests Oracle to recopy the catalog to the Oracle archived redo logs. To control how often Oracle copies the catalog and the time period within which the copy operation can occur, set the CATBEGIN, CATEND, and CATINT parameters in the ORCL CAPI_CONNECTION statement of the dbmover.cfg file.

Configuration in an Oracle RAC Environment

If you use Oracle 10g Release 2 or later, PowerExchange can process change data for database instances in a real application cluster (RAC) environment. Certain Oracle patches might be required. For more information, see Knowledge Base (KB) item 102503.

The Oracle instance from which you run PowerExchange CDC must be able to access the Oracle archived redo logs for all Oracle instances in the RAC for which you want to capture change data.

In the init.ora file for each of these Oracle instances, define the LOG_ARCHIVE_DEST_1 parameter to point to the directory in which you want Oracle to create the archived logs.

Note: PowerExchange uses Oracle LogMiner to read change data from the archived logs. If you use an archived log destination other than the LOG_ARCHIVE_DEST_1 path and LogMiner processing lags behind, problems might occur. In this situation, LogMiner starts reading change data from the archived logs in the LOG_ARCHIVE_DEST_1 directory. If these archived logs are inaccessible from the machine with the Oracle instance to which you are connected, the LogMiner session might fail.

Additional tasks for ensuring access to archived redo logs vary by operating system.

On Windows, you must set up an Oracle flash recovery area on the shared file system that contains all of the table data for the RAC. For each Oracle instance in the RAC, set the LOG_ARCHIVE_DEST_1 parameter to point to that recovery area.

On Linux and UNIX, you can use any of the following methods:

¨ Set up an Oracle flash recovery area in the same manner as for Windows.

¨ Store all archived redo logs on shared storage.

¨ Set up Network File System (NFS) access to the archive logs.

If you use shared storage or NFS access, the Oracle instance from which you run CDC must access the archived logs of the other RAC member instances. This access uses the mount points that match the archive log directories defined for those member instances. For example, assume that ORA2 is an Oracle instance in a RAC, which has a LOG_ARCHIVE_DEST_1 parameter that points to the following archive log directory:

/ora/arch2/

ORA1 is the Oracle instance that runs CDC. The mount point that the ORA1 machine must use to access the ORA2 archive logs is also /ora/arch2/.

Also, all of the Oracle instances in the RAC that participate in CDC must have access to the Oracle online redo logs. Usually, these redo logs reside on shared storage.
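The mount-point rule for shared storage or NFS access can be checked with a short Python sketch. The function names and inputs are hypothetical: the sketch assumes that you supply the LOG_ARCHIVE_DEST_1 directory of each RAC member instance and the list of paths that are mounted on the machine that runs CDC. It compares path strings only and does not test actual file access.

```python
from typing import Dict, Iterable, List


def normalize(path: str) -> str:
    """Ignore a trailing slash so that /ora/arch2 and /ora/arch2/ compare equal."""
    return path.rstrip("/") or "/"


def unreachable_archive_dirs(member_dests: Dict[str, str],
                             mounted_on_cdc_host: Iterable[str]) -> List[str]:
    """Return the member instances whose archive log directory has no matching mount point."""
    mounted = {normalize(path) for path in mounted_on_cdc_host}
    return sorted(instance for instance, path in member_dests.items()
                  if normalize(path) not in mounted)
```

With the example above, unreachable_archive_dirs({"ORA2": "/ora/arch2/"}, ["/ora/arch2"]) returns an empty list because the ORA1 machine mounts the ORA2 archive log directory under the same path.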

PowerExchange Configuration for Oracle LogMiner CDC

The tasks that you perform to configure PowerExchange for CDC depend on whether you want to use the PowerExchange Logger for Linux, UNIX, and Windows and on the extraction mode that you plan to use.

Configuring Oracle LogMiner CDC without the PowerExchange Logger

If you plan to run extractions in real-time extraction mode and not use the PowerExchange Logger for Linux, UNIX, and Windows, complete the following tasks to configure PowerExchange for Oracle LogMiner CDC:

1. When you configure the dbmover.cfg file on the Oracle source machine, include the following statements:

¨ CAPT_PATH

¨ CAPT_XTRA

¨ ORACLEID

¨ ORCL CAPI_CONNECTION

¨ UOWC CAPI_CONNECTION

For more information, see the PowerExchange Reference Manual.

2. In the PowerExchange Navigator, create a capture registration for each Oracle source table.

If capture registrations already exist for these tables, delete the existing registrations and extraction maps and create new ones.

You must enter a name in the Supplemental Log Group Name field.

Tip: Set the Condense option to Part even though you do not plan to use the PowerExchange Logger, unless you have a specific reason not to do so. This practice prevents having to edit the capture registrations later if you decide to use the PowerExchange Logger. You might want to set the Condense option to None if you plan to run both real-time and continuous extractions against tables defined by the same capture registrations and do not want the PowerExchange Logger to capture change data for some registered tables.

The PowerExchange Navigator generates a corresponding extraction map and the DDL for creating a supplemental log group. If you selected the Execute DDL now option, PowerExchange executes the DDL for creating a supplemental log group when you click Finish. If you did not select this option, you must execute the DDL prior to starting extraction processing.

3. Activate the capture registrations. Usually, you do this task after materializing the targets.

Next Step: Configure and start extractions. You must use real-time extraction mode.

RELATED TOPICS:

¨ “Customizing dbmover.cfg for Oracle LogMiner CDC” on page 90

¨ “Introduction to Change Data Extraction” on page 105

Configuring Oracle LogMiner CDC with the PowerExchange Logger

If you plan to use the PowerExchange Logger for Linux, UNIX, and Windows and run extractions in batch or continuous extraction mode, complete the following tasks to configure PowerExchange for Oracle LogMiner CDC:

1. When you configure the dbmover.cfg file used to access the source tables, include the following statements:

¨ CAPT_PATH

¨ CAPT_XTRA

¨ ORACLEID

¨ ORCL CAPI_CONNECTION

¨ UOWC CAPI_CONNECTION

¨ CAPX CAPI_CONNECTION (for continuous extraction mode only)

For more information, see the PowerExchange Reference Manual.

2. Configure the pwxccl.cfg file for the PowerExchange Logger.

3. Start the PowerExchange Listener on the source machine.

4. Customize the dbmover.cfg files on the Windows machine where the PowerExchange Navigator runs and on the PowerCenter Integration Service machine, if these machines are separate from the Oracle source machine.

In each of these dbmover.cfg files, you must specify a NODE statement that points to the machine that contains the Oracle source tables. On the Windows machine, you must also specify an ORACLEID statement.

5. In the PowerExchange Navigator, create a capture registration for each Oracle source table.

If capture registrations already exist for these tables, delete the existing registrations and extraction maps and create new ones.

You must select Part in the Condense list, and enter a name in the Supplemental Log Group Name field. You can also set the Status option to Active, or wait until after you materialize the target tables.

The PowerExchange Navigator generates a corresponding extraction map and the DDL for creating a supplemental log group. If you selected the Execute DDL now option, PowerExchange executes the DDL for creating a supplemental log group when you click Finish. If you did not select this option, you must execute the DDL prior to starting extraction processing.

6. In the PowerExchange Navigator, perform a database row test on the extraction maps to verify that PowerExchange can access the source data.

7. After stopping updates to the source tables, materialize the target tables.

8. Start the PowerExchange Logger.

9. Allow changes to be written to the source tables.

Next Step: Configure and start extractions. You can use either batch extraction mode or continuous extraction mode.

RELATED TOPICS:

¨ “Customizing dbmover.cfg for Oracle LogMiner CDC” on page 90

¨ “Customizing the PowerExchange Logger Configuration File” on page 28

¨ “Starting the PowerExchange Logger” on page 47

¨ “Introduction to Change Data Extraction” on page 105

Customizing dbmover.cfg for Oracle LogMiner CDC

In the dbmover.cfg configuration file, include the statements that are required for Oracle LogMiner CDC and any optional statements that you want to use.

The following statements are required for Oracle CDC with Oracle LogMiner:

CAPT_PATH

Path to the local directory where the CCT file and CDCT file reside. The CCT file contains capture registrations. The CDCT file contains information about PowerExchange Logger log files.

CAPT_XTRA

Path to the local directory where extraction maps reside.

ORACLEID

Oracle source instance, database, and connection information.

ORCL CAPI_CONNECTION

A named set of parameters that the CAPI uses to connect to the change stream and control extraction processing for Oracle sources.

UOWC CAPI_CONNECTION

A named set of parameters for the UOW Cleanser. The CAPINAME parameter in the UOWC CAPI_CONNECTION points to an ORCL CAPI_CONNECTION.

CAPX CAPI_CONNECTION (required for continuous extraction only)

If you plan to use the PowerExchange Logger and continuous extraction mode, you must also define a CAPX CAPI_CONNECTION statement.

Define the CAPI_CONNECTION statements in the dbmover.cfg file that is on the system where the Oracle capture registrations are stored. This location corresponds to the Location node that you specify when defining a registration group. Usually, this location is where the source database resides.

Additionally, Informatica recommends including the LOGPATH and TRACING statements to make finding messages easier. The LOGPATH statement defines a directory specifically for PowerExchange message log files, and the TRACING statement enables PowerExchange to create an alternative set of message log files for each PowerExchange process.

For more information about all dbmover.cfg statements, see the PowerExchange Reference Manual.

Example Oracle LogMiner CDC Statements

The following statements are typical of those included in a dbmover.cfg file for Oracle LogMiner CDC:

LOGPATH=/pwx/logs
TRACING=(PFX=PWXLOG,RECLEN=255,FILENUM=3,APPEND=Y,FLUSH=99)
CAPT_XTRA=/pwx/capture/vnnn/camaps
CAPT_PATH=/aus/pwx/capture/vnnn
ORACLEID=(FOX123,FO920DTL)
CAPI_SRC_DFLT=(ORA,CAPIUOWC)
CAPI_CONN_NAME=CAPIUOWC
/*
/* CAPI connection statements
/*
/* Both UOWC and ORCL CAPI_CONNECTION statements are required for Oracle CDC.
CAPI_CONNECTION=(NAME=CAPIORA
 ,DLLTRACE=ORA2
 ,TYPE=(ORCL
 ,ARRAYSIZE=1000
 ,BYPASSUF=Y
 ,CATBEGIN=00:01
 ,CATEND=23:59
 ,CATINT=1440
 ,ORACOLL=FOX123
 ,SELRETRY=0))
CAPI_CONNECTION=(NAME=CAPIUOWC
 ,TYPE=(UOWC
 ,CAPINAME=CAPIORA
 ,MEMCACHE=50000
 ,RSTRADV=1800))
/* Additional CAPX CAPI_CONNECTION statement is required for continuous extraction mode.
CAPI_CONNECTION=(NAME=CAPXORA
 ,TYPE=(CAPX
 ,DFLTINST=FOX920))
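A quick way to confirm that a dbmover.cfg file contains the required statements is to scan its text. The following Python sketch is a hypothetical helper and is not a PowerExchange utility. It assumes that comment lines begin with /* as in the example, uses simple pattern matching instead of a full parser, and checks only the requirements stated in this section: the CAPT_PATH, CAPT_XTRA, and ORACLEID statements, an ORCL and a UOWC CAPI_CONNECTION, a UOWC CAPINAME that names an ORCL connection, and a CAPX CAPI_CONNECTION for continuous extraction mode.

```python
import re
from typing import Dict, List, Tuple

REQUIRED = ("CAPT_PATH", "CAPT_XTRA", "ORACLEID")


def check_dbmover(text: str, continuous: bool = False) -> List[str]:
    """Return a list of problems found in the dbmover.cfg text (empty if none)."""
    # Drop comment lines, which start with "/*" in the example.
    body = "\n".join(line for line in text.splitlines()
                     if not line.lstrip().startswith("/*"))
    problems = ["missing " + name for name in REQUIRED
                if not re.search(rf"^\s*{name}\s*=", body, re.MULTILINE)]
    # Remove white space so multi-line CAPI_CONNECTION statements are easy to match.
    flat = re.sub(r"\s+", "", body)
    by_type: Dict[str, List[Tuple[str, str]]] = {}
    for chunk in flat.split("CAPI_CONNECTION=")[1:]:
        name = re.search(r"[(,]NAME=(\w+)", chunk)  # does not match CAPINAME=
        ctype = re.search(r"TYPE=\((\w+)", chunk)
        if name and ctype:
            by_type.setdefault(ctype.group(1), []).append((name.group(1), chunk))
    for needed in ("ORCL", "UOWC") + (("CAPX",) if continuous else ()):
        if needed not in by_type:
            problems.append("missing " + needed + " CAPI_CONNECTION")
    orcl_names = {name for name, _ in by_type.get("ORCL", [])}
    for name, chunk in by_type.get("UOWC", []):
        target = re.search(r"CAPINAME=(\w+)", chunk)
        if not target or target.group(1) not in orcl_names:
            problems.append("UOWC connection " + name + ": CAPINAME does not name an ORCL connection")
    return problems


# Shortened version of the example statements, used only to exercise the check.
SAMPLE = """CAPT_XTRA=/pwx/capture/vnnn/camaps
CAPT_PATH=/aus/pwx/capture/vnnn
ORACLEID=(FOX123,FO920DTL)
/* CAPI connection statements
CAPI_CONNECTION=(NAME=CAPIORA
 ,TYPE=(ORCL
 ,ORACOLL=FOX123))
CAPI_CONNECTION=(NAME=CAPIUOWC
 ,TYPE=(UOWC
 ,CAPINAME=CAPIORA))
"""
```

Calling check_dbmover(SAMPLE) returns an empty list. Calling check_dbmover(SAMPLE, continuous=True) reports the missing CAPX CAPI_CONNECTION statement.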

ORACLEID Statement

The ORACLEID statement specifies the Oracle source database and connection information for PowerExchange CDC with Oracle LogMiner.

Data Sources: Oracle CDC sources
Required: Yes for Oracle LogMiner CDC

Syntax:

ORACLEID=( collection_id, oracle_db, [source_connect_string,] [capture_connect_string,])

Parameters:

Enter the following positional parameters:

capture_connect_string

Optional. Oracle connection string, defined in TNS, that the PowerExchange Logger uses to connect to the Oracle database with the source tables for Oracle LogMiner CDC. This connection string must be specified in the Oracle Client tnsnames.ora file that is used for connection to the Oracle source database.

If this value is null, the value of the ORACLE_SID environment variable is used by default and the PowerExchange Logger does not use Oracle SQL*Net for connection. If the ORACLE_SID environment variable is not defined, the default Oracle database is used, if defined.

For Oracle LogMiner CDC only, if you have multiple Oracle databases and capture changes from a database other than the default database, you must specify both the source_connect_string and capture_connect_string parameters.

Tip: If possible, bypass the use of SQL*Net to improve PowerExchange Logger performance, even if the PowerExchange Logger is running on the same machine as the Oracle source database. Set the following environment variables, whenever possible, to enable connection to the appropriate Oracle database without using the capture_connect_string parameter and SQL*Net:

¨ ORACLE_HOME

¨ ORACLE_SID

¨ PATH

¨ On a Linux or UNIX operating system, one of the following variables: LD_LIBRARY_PATH, LIBPATH, or SHLIB_PATH

collection_id

Required. User-defined identifier for this ORACLEID statement. This value must match the ORACOLL parameter value in the ORCL CAPI_CONNECTION statement, the collection ID in the registration group defined for the source tables, and the DBID value in the PowerExchange Logger pwxccl.cfg file. Maximum length is eight characters.

oracle_db

Required. Name of the Oracle database that contains the source tables you registered for change data capture.

source_connect_string

Optional. Oracle connection string, defined in TNS, that is used to connect to the Oracle database that contains the source tables. This connection string must be defined in the Oracle Client tnsnames.ora file on the machine with the source database.

For Oracle LogMiner CDC, the source connection string is used only for PowerExchange Navigator access to the Oracle source database. Enter this parameter in the dbmover.cfg file on the machine from which the PowerExchange Listener retrieves data for PowerExchange Navigator requests. If you plan to run a database row test on extraction maps for the source tables, also specify the capture_connect_string parameter.

Note: The source connection string is not used to transfer change data.

If this value is null, the value of the ORACLE_SID environment variable is used by default. If the ORACLE_SID environment variable is not defined, the default Oracle database is used, if defined.

Usage Notes:

PowerExchange requires an ORACLEID statement for each Oracle database for which you want to capture and extract change data. You can specify a maximum of 20 ORACLEID statements in a single dbmover.cfg file.

Specify the ORACLEID statement in the dbmover.cfg file on the machine where the PowerExchange Logger runs, or if you plan to perform Oracle LogMiner CDC without the PowerExchange Logger, on the machine where your PowerExchange extractions run.
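The positional parameters and limits of the ORACLEID statement can be illustrated with a short Python sketch. It is hypothetical and is not how PowerExchange parses the file. It assumes that the statement is written on one line, treats empty positions as null, and enforces only the rules stated above: collection_id and oracle_db are required, collection_id is at most eight characters, and a file holds at most 20 ORACLEID statements.

```python
import re
from typing import List, NamedTuple, Optional

MAX_ORACLEID_STATEMENTS = 20


class OracleId(NamedTuple):
    collection_id: str
    oracle_db: str
    source_connect_string: Optional[str]
    capture_connect_string: Optional[str]


def parse_oracleid(statement: str) -> OracleId:
    """Parse one ORACLEID statement into its four positional parameters."""
    match = re.fullmatch(r"\s*ORACLEID\s*=\s*\((.*)\)\s*", statement)
    if not match:
        raise ValueError("not an ORACLEID statement")
    fields: List[Optional[str]] = [f.strip() or None for f in match.group(1).split(",")]
    while fields and fields[-1] is None:
        fields.pop()  # ignore trailing empty positions
    if len(fields) > 4:
        raise ValueError("too many positional parameters")
    fields += [None] * (4 - len(fields))
    collection_id, oracle_db, source, capture = fields
    if not collection_id or not oracle_db:
        raise ValueError("collection_id and oracle_db are required")
    if len(collection_id) > 8:
        raise ValueError("collection_id is longer than eight characters")
    return OracleId(collection_id, oracle_db, source, capture)


def too_many_oracleids(statements: List[str]) -> bool:
    """True if the file would exceed the documented limit of 20 statements."""
    return len(statements) > MAX_ORACLEID_STATEMENTS
```

For example, parse_oracleid("ORACLEID=(FOX123,FO920DTL)") yields a collection_id of FOX123 and an oracle_db of FO920DTL, with both connection strings null.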

ORCL CAPI_CONNECTION Statement

The ORCL CAPI_CONNECTION statement specifies the Consumer API (CAPI) parameters needed for Oracle CDC sources that use Oracle LogMiner.

Data Sources: Oracle sources
Related Statements: UOWC CAPI_CONNECTION
Required: Yes for Oracle LogMiner CDC

Syntax:

CAPI_CONNECTION=(
  [DLLTRACE=trace_id,]
  NAME=name,
  [TRACE=trace,]
  TYPE=(ORCL,
    [ARRAYSIZE=array_size,]
    [BYPASSUF={N|Y},]
    [CATBEGIN=hh:mm,]
    [CATEND=hh:mm,]
    [CATINT=minutes,]
    [COMMITINT=minutes,]
    [GENRLOCK={N|Y},]
    [IGNUFMSG={N|Y},]
    [LOGDEST=logdest_id,]
    [LGTHREAD=instance_number,]
    [ONLINECAT={N|Y},]
    ORACOLL=collection_id,
    [SELRETRY=retry_number,]
    [SNGLINST={N|Y}]
  ))

Parameters:

Enter the following parameters:

DLLTRACE=trace_id

Optional. User-defined name of the TRACE statement that activates internal DLL tracing for this CAPI. Specify this parameter only at the direction of Informatica Global Customer Support.

NAME=name

Required. Unique user-defined name for this CAPI_CONNECTION statement.

Maximum length is eight alphanumeric characters.

TRACE=trace

Optional. User-defined name of the TRACE statement that activates the common CAPI tracing. Specify this parameter only at the direction of Informatica Global Customer Support.

TYPE=(ORCL, ... )

Required. Type of CAPI_CONNECTION statement. For Oracle CDC sources that use LogMiner, this value must be ORCL.

ARRAYSIZE=array_size

Optional. Size, in number of rows, of the prefetch array that PowerExchange uses to read the Oracle redo logs. A value of less than 100 can degrade Oracle CDC performance.

Note: A value of 0 disables prefetch. Specify 0 only at the direction of Informatica Global Customer Support.

Valid values are from 0 through 2147483647.

Default is 100.

BYPASSUF={N|Y}

Optional. Controls whether PowerExchange ends abnormally or issues a warning message when an unformatted log record is returned from Oracle LogMiner.

LogMiner returns unformatted log records when Global Temporary Tables are updated, or when ONLINECAT=Y is specified and the log data that is being read is inconsistent with the catalog.

Enter one of the following options:

¨ N. PowerExchange ends with an error whenever it receives an unformatted log record from Oracle LogMiner.

¨ Y. PowerExchange writes a warning message to the PowerExchange message log to report that unformatted log data has been found, and then continues processing. Depending on the amount of unformatted log data, many warning messages might be written. You can specify Y for the IGNUFMSG parameter to suppress these warning messages.

Default is N.

Tip: Specify Y if the Oracle instance contains Global Temporary tables. Otherwise, do not include the BYPASSUF parameter.
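The interaction between BYPASSUF and IGNUFMSG described above can be summarized in a small decision function. This Python sketch is illustrative only. The function name and the returned strings are hypothetical, and the behavior is taken from the BYPASSUF description, which states that specifying Y for IGNUFMSG suppresses the warning messages.

```python
def on_unformatted_record(bypassuf: str = "N", ignufmsg: str = "N") -> str:
    """Describe what happens when LogMiner returns an unformatted log record."""
    if bypassuf.upper() != "Y":
        return "end with error"  # BYPASSUF=N is the default
    if ignufmsg.upper() == "Y":
        return "continue without warning"  # warnings are suppressed
    return "warn and continue"
```

With the defaults, the function returns "end with error". With BYPASSUF=Y alone, it returns "warn and continue".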

CATBEGIN=hh:mm

Optional. Earliest time of day, in 24-hour clock format, at which PowerExchange requests Oracle to write a copy of the Oracle catalog to the redo logs.

If you specify a value for the CATBEGIN parameter, you must also specify a value for the CATEND parameter.

Default is 00:00.

CATEND=hh:mm

Optional. Latest time of day, in 24-hour clock format, at which PowerExchange requests Oracle to write a copy of the Oracle catalog to the redo logs.

If you specify a value for the CATEND parameter, you must also specify a value for the CATBEGIN parameter.

Default is 24:00.

CATINT=minutes

Optional. Time interval, in minutes, between requests to copy the Oracle catalog to the redo logs.

If this interval elapses but the time is outside of the time period specified in the CATBEGIN and CATEND parameters, PowerExchange does not request Oracle to take a copy of the Oracle catalog. Instead, PowerExchange waits until the time specified for the CATBEGIN parameter to request a catalog copy.

Valid values are from 1 through 1440.

Default is 1440.
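The way CATBEGIN, CATEND, and CATINT work together can be sketched in Python. The function names are hypothetical and the sketch is a simplified model, not PowerExchange logic. It assumes that the CATBEGIN to CATEND window does not wrap past midnight and that the caller tracks how many minutes have passed since the last catalog copy.

```python
def to_minutes(hhmm: str) -> int:
    """Convert hh:mm to minutes since midnight, so that 24:00 becomes 1440."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def catalog_copy_due(minutes_since_last_copy: int, now: str,
                     catint: int = 1440,
                     catbegin: str = "00:00",
                     catend: str = "24:00") -> bool:
    """True if the CATINT interval has elapsed and the time is inside the window."""
    interval_elapsed = minutes_since_last_copy >= catint
    in_window = to_minutes(catbegin) <= to_minutes(now) <= to_minutes(catend)
    return interval_elapsed and in_window
```

With CATBEGIN=22:00 and CATEND=23:59, an interval that elapses at 12:00 does not trigger a request. In this model the request becomes due once the time reaches 22:00.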

COMMITINT=minutes

Optional. Time interval, in minutes, between the SQL COMMIT operations issued by PowerExchange to commit the transactions automatically generated by the Oracle LogMiner session.

Although PowerExchange does not update data in user tables while reading change data from the redo logs, the Oracle LogMiner interface automatically generates transactions for the LogMiner sessions that PowerExchange initiates. Oracle leaves these transactions open, or in-flight, until the LogMiner session ends.

To be able to restart change data extraction operations efficiently, PowerExchange must occasionally issue SQL COMMIT operations to end these in-flight transactions. Otherwise, the restart of all future real-time extraction operations might be impacted because PowerExchange always begins reading change data at the beginning of the oldest in-flight UOW.

Valid values are from 1 through 60.

Default is 5.

GENRLOCK={N|Y}

Optional. Controls whether PowerExchange generates a safe restart point for requests for restart points that match the current end-of-log (EOL).

Enter one of the following options:

¨ N. PowerExchange generates restart points that match the current EOL, ignoring any in-flight transactions for the source tables.

¨ Y. PowerExchange generates safe restart points for source tables.

A safe restart point for a source table is a point in the change stream that does not skip any in-flight UOWs for that table. To generate a safe restart point for a source table, PowerExchange obtains an exclusive lock on the table to stop further changes. PowerExchange then searches the Oracle catalog for the point in the change stream that matches the earliest active transaction for the table and uses this point as the restart point. If no in-flight UOWs exist for a table, PowerExchange uses the current EOL. PowerExchange releases the lock on the source table after the restart point generation process completes, which allows new changes to the table to occur.

PowerExchange generates restart tokens that match the current EOL in the following situations:

¨ The PowerExchange Logger for Linux, UNIX, and Windows is cold started and the pwxccl.cfg configuration file does not specify the SEQUENCE_TOKEN and RESTART_TOKEN parameters.

PowerExchange obtains locks for all tables represented by capture registrations selected for processing by the PowerExchange Logger.

¨ The restart token file for a CDC session specifies the CURRENT_RESTART option on the RESTART1 and RESTART2 special override statements.

PowerExchange obtains locks only for the tables in the CDC session to which the special override statements apply.

¨ A database row test in the PowerExchange Navigator that uses the SELECT CURRENT_RESTART SQL statement.

PowerExchange obtains a lock for the table represented by the capture registration associated with the extraction map used in the database row test.

¨ A DTLUAPPL utility operation that uses the RSTTKN GENERATE option.

PowerExchange obtains a lock for the table represented by the capture registration specified in the utility control statements.

Default is N.
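The choice of restart point under GENRLOCK can be modeled in a few lines of Python. This is a conceptual sketch only. The function name is hypothetical, positions in the change stream are represented as plain integers that increase over time, and the table locking that PowerExchange performs is not modeled.

```python
from typing import Iterable


def restart_point(end_of_log: int, inflight_uow_starts: Iterable[int],
                  genrlock: bool) -> int:
    """Pick a restart position for one source table."""
    if not genrlock:
        return end_of_log  # GENRLOCK=N: in-flight transactions are ignored
    starts = list(inflight_uow_starts)
    # GENRLOCK=Y: go back to the earliest in-flight UOW so that none is skipped.
    return min(starts) if starts else end_of_log
```

For example, with the end-of-log at position 500 and in-flight UOWs that began at positions 420 and 480, GENRLOCK=Y yields 420 and GENRLOCK=N yields 500.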

IGNUFMSG={N|Y}

Optional. Controls whether PowerExchange writes warning messages to the PowerExchange message log file for unformatted data records.

Enter one of the following options:

¨ N. PowerExchange does not write any warning messages.

¨ Y. PowerExchange writes warning messages.

Default is N.


LOGDEST=logdest_id

Optional. For RAC environments, the numeric identifier for the archive log destination that you want to force PowerExchange to use. This archive log destination must be local to the Oracle instance that PowerExchange is using.

For example, to use archived logs from the destination set by the LOG_ARCHIVE_DEST_3 parameter in the init.ora file, specify LOGDEST=3.

The SNGLINST parameter affects how PowerExchange uses the archive log destination and the Oracle instance specified by LOGDEST and LGTHREAD.

If you specify Y for the ONLINECAT parameter, PowerExchange validates and then ignores the LOGDEST and LGTHREAD parameters.

Valid values are from 1 through 10.

LGTHREAD=instance_number

Optional. For RAC environments, the numeric instance number for the Oracle instance that PowerExchange uses to identify the archived redo logs to process.

The SNGLINST parameter affects how PowerExchange uses the archive log destination and the Oracle instance specified by LOGDEST and LGTHREAD.

If you specify Y for the ONLINECAT parameter, PowerExchange validates and then ignores the LOGDEST and LGTHREAD parameters.

Valid values are from 1 through 2147483647.

ONLINECAT={N|Y}

Optional. Controls whether PowerExchange directs Oracle LogMiner to use the Oracle online catalog or the copy of the catalog in the redo logs to format log data for CDC.

Enter one of the following options:

¨ N. Oracle LogMiner uses the copy of the catalog from the archived redo logs and PowerExchange tracks schema changes to ensure that data loss does not occur.

¨ Y. Oracle LogMiner uses the online catalog and PowerExchange cannot track schema changes.

When PowerExchange is configured to use the online catalog for formatting log data, it still uses catalog copies to determine the restart point for change data extraction operations. Therefore, you must copy the online catalog to the Oracle redo logs on a regular basis.

Change data extraction operations generally initialize faster when PowerExchange is configured to create LogMiner sessions with the online catalog instead of a catalog copy. However, when LogMiner uses the online catalog, it does not track DDL changes, and cannot format log records for tables that have schema changes.

If LogMiner uses the online catalog and you make schema changes while LogMiner is reading log data, LogMiner passes unformatted log records for subsequent changes to PowerExchange. If you specify N for the BYPASSUF parameter, or allow it to default, PowerExchange fails the extraction request after Oracle passes the first unformatted record. Otherwise, PowerExchange skips the unformatted record and continues processing, which results in change data loss. Therefore, specify N for the ONLINECAT parameter, or allow it to default, if you have the following requirements:

¨ You specify Y for the BYPASSUF parameter and need to change the schema of tables registered for capture while change data extraction operations are running.

¨ You need to start an extraction from a point in the Oracle redo logs that contains table data that was captured under a previous schema.


Default is N.

ORACOLL=collection_id

Required. Oracle collection identifier, which must match the value specified in the ORACLEID statement.

SELRETRY=retry_number

Optional. Number of times that PowerExchange immediately loops back to the Oracle LogMiner call before implementing a graduated-scale wait loop.

After the call to LogMiner has been retried the specified number of times, PowerExchange implements a wait interval between each subsequent retry. The wait interval begins at one millisecond and gradually increases to one second. When LogMiner returns data, the wait interval is reset to 0, and the process begins again for the next call to LogMiner.

If you specify a non-zero value, PowerExchange uses non-blocking SQL to ensure that a user request to shut down an extraction session is processed in a timely manner.

If you specify 0, PowerExchange does not use non-blocking SQL. This setting reduces CPU consumption but can prolong extraction session shutdown. On quiescent Oracle instances, PowerExchange does not honor a shutdown request until log data is returned from Oracle. On Oracle instances where update activity is occurring, shutdown behavior does not noticeably change.

Valid values are from 0 through 2147483647.

Default is 1000.
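The retry behavior that SELRETRY controls can be pictured with a small generator. This is a sketch only: the guide says that the wait interval starts at one millisecond and gradually increases to one second, but it does not state the growth rule, so the doubling used here is an assumption, and the function name is hypothetical.

```python
from itertools import islice
from typing import Iterator


def retry_waits(selretry: int, max_wait: float = 1.0) -> Iterator[float]:
    """Yield the wait, in seconds, before each successive retry of an empty call.

    The first `selretry` retries are immediate. Later waits start at 1 ms and
    grow up to `max_wait`. Doubling is an assumed growth rule.
    """
    for _ in range(selretry):
        yield 0.0
    wait = 0.001
    while True:
        yield wait
        wait = min(wait * 2, max_wait)


# With SELRETRY=2: two immediate retries, then a growing wait.
print(list(islice(retry_waits(2), 6)))
```

When LogMiner returns data, a caller would discard the generator and create a new one, which corresponds to resetting the wait interval to 0.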

SNGLINST={N|Y}

Optional. In RAC environments, controls whether PowerExchange uses only the archived redo logs from a specific Oracle instance and archive log destination.

Enter one of the following options:

¨ N. PowerExchange uses the specified Oracle instance to search for archived redo logs that contain copies of the Oracle catalog. After PowerExchange passes these logs to an Oracle LogMiner session, LogMiner determines the other archived redo logs to read.

¨ Y. PowerExchange uses only the archive log destination and Oracle instance that you specify in the LOGDEST and LGTHREAD parameters to read archived redo logs. LogMiner does not read any other archived redo logs. After PowerExchange processes the logs from the specified location, the change data extraction operation ends.

If you specify Y, you must also specify the LOGDEST and LGTHREAD parameters to identify the archive log destination and Oracle instance to use. For all remaining Oracle instances in the RAC, you must run separate change data extraction processes and then determine how to properly merge the change data so that you can apply it to targets.

Default is N.

Oracle Catalog Parameters in the ORCL CAPI_CONNECTION Statement

The CATINT, CATBEGIN, and CATEND parameters in the ORCL CAPI_CONNECTION statement can significantly affect PowerExchange performance. These parameters control the frequency with which the Oracle catalog is copied to the Oracle redo logs and the time period within which the copy operation can occur. When you restart PowerExchange extraction processing, PowerExchange directs Oracle LogMiner to begin reading change data from the redo logs starting from the SCN of the last Oracle catalog copy that was written to the logs prior to the end of the previous extraction session.


To configure the CATINT, CATBEGIN, and CATEND parameters, try various settings until you find a combination that provides for efficient restart processing. The default frequency of once a day might not be sufficient if you have a high volume of transaction activity.

The following examples demonstrate how copying the Oracle catalog multiple times can affect the amount of change data that is reread from the archived redo logs when PowerExchange extraction processing is restarted.

Example 1

Assume that the Oracle catalog was initially copied to the Oracle redo logs at SCN 10 and another copy has not yet been written to the logs. Change data was logged starting at SCN 40 and ending at SCN 60. A PowerExchange extraction session extracted these changes before ending at SCN 100. Since the extraction session ended, additional changes have been logged starting at SCN 160.

When you restart PowerExchange extraction processing, LogMiner must begin reading change data from the initial catalog copy at SCN 10 because it is the latest catalog copy prior to the session end at SCN 100. As a result, PowerExchange reprocesses the data between SCN 10 and SCN 100, before continuing to the new change data that begins at SCN 160. This reprocessing of data impacts PowerExchange performance.

Example 2

Assume that the Oracle catalog was copied to the Oracle redo logs twice: at SCN 10 and at SCN 80. Change data was logged starting at SCN 40 and ending at SCN 60. A PowerExchange extraction session extracted these changes before ending at SCN 100. Since the extraction session ended, additional changes have been logged starting at SCN 160.

When you restart PowerExchange extraction processing, LogMiner begins reading change data from the second catalog copy at SCN 80 because it is the latest catalog copy prior to the session end at SCN 100. As a result, PowerExchange reprocesses only the data between SCN 80 and SCN 100, before continuing to the new change data that begins at SCN 160. With multiple catalog copies, PowerExchange needs to reprocess less change data.
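The rule behind both examples can be expressed as a short calculation. The sketch below is illustrative only: the function name is hypothetical, SCNs are modeled as plain integers, and "prior to" is interpreted as strictly less than the SCN at which the session ended.

```python
def logminer_start_scn(catalog_copy_scns, session_end_scn):
    """Return the SCN of the latest catalog copy written before the session ended."""
    candidates = [scn for scn in catalog_copy_scns if scn < session_end_scn]
    if not candidates:
        raise ValueError("no catalog copy precedes the end of the session")
    return max(candidates)


# Example 1: one catalog copy at SCN 10, session ended at SCN 100.
print(logminer_start_scn([10], 100))
# Example 2: catalog copies at SCN 10 and SCN 80, session ended at SCN 100.
print(logminer_start_scn([10, 80], 100))
```

The range that is reread on restart runs from the returned SCN to the session end, which is why more frequent catalog copies shorten restart processing.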


UOWC CAPI_CONNECTION Statement

The UOWC CAPI_CONNECTION statement specifies the Consumer API (CAPI) parameters needed for the UOW Cleanser.

In the change stream for some data sources, changes from multiple UOWs are intermingled. The UOW Cleanser reconstructs the intermingled changes read from the change stream into complete UOWs in chronological order based on end time.
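The reconstruction idea can be illustrated with a toy model. This is not the UOW Cleanser implementation: the record layout, the "change" and "commit" record kinds, and the function name are assumptions made for the sketch, and rollbacks and spill files are ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (uow_id, kind, payload), kind is "change" or "commit".
Record = Tuple[str, str, str]


def cleanse(stream: Iterable[Record]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (uow_id, changes) for each complete UOW, in the order the UOWs end."""
    open_uows: Dict[str, List[str]] = defaultdict(list)
    for uow_id, kind, payload in stream:
        if kind == "change":
            # Buffer the change until the end of its UOW is seen.
            open_uows[uow_id].append(payload)
        elif kind == "commit":
            # The stream is chronological, so emitting at commit time
            # orders complete UOWs by end time.
            yield uow_id, open_uows.pop(uow_id, [])


stream = [
    ("A", "change", "a1"),
    ("B", "change", "b1"),
    ("A", "change", "a2"),
    ("B", "commit", ""),
    ("A", "commit", ""),
]
print(list(cleanse(stream)))
```

UOW B is emitted before UOW A because it ends first, even though A started first.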

Data Sources: DB2 for i5/OS, Oracle LogMiner CDC, z/OS CDC

Related Statements: AS4J CAPI_CONNECTION for i5/OS, ORCL CAPI_CONNECTION for Oracle, LRAP CAPI_CONNECTION for z/OS

Required: Yes for the noted data sources

Syntax:

CAPI_CONNECTION=([DLLTRACE=trace_id,]
    NAME=name,
    [TRACE=trace,]
    TYPE=(UOWC,
        CAPINAME=name,
        [BLKSIZE=block_size,]
        [DATACLAS=data_class,]
        [MEMCACHE=cache_size,]
        [RSTRADV=seconds,]
        [SPACEPRI=primary_space,]
        [SPACETYPE={BLK|TRK|CYL},]
        [STORCLAS=storage_class,]
        [UNIT=unit]
    ))

Parameters:

Enter the following parameters:

DLLTRACE=trace_id

Optional. User-defined name of the TRACE statement that activates internal DLL tracing for this CAPI. Specify this parameter only at the direction of Informatica Global Customer Support.

NAME=name

Required. Unique user-defined name for this CAPI_CONNECTION statement.

Maximum length is eight alphanumeric characters.

TRACE=trace

Optional. User-defined name of the TRACE statement that activates the common CAPI tracing. Specify this parameter only at the direction of Informatica Global Customer Support.

TYPE=(UOWC, ... )

Required. Type of CAPI_CONNECTION statement. For the UOW Cleanser, this value must be UOWC.


BLKSIZE=block_size

Optional. Block size, in bytes, for the sequential UOW spill files that the UOW Cleanser creates when the memory cache cannot hold all changes for a UOW.

Valid values and defaults vary by platform:

¨ For Oracle LogMiner CDC sources, enter a value from 8 through 65535. Default is 32768.

¨ For i5/OS CDC sources, enter a value from 8 through 32760. Default is 32760.

¨ For z/OS CDC sources, enter a value from 8 through 32760. Default is 18452.

CAPINAME=name

Required. Value from the NAME parameter in the related source-specific CAPI_CONNECTION statement.

The source-specific CAPI_CONNECTION is one of the following statement types:

¨ AS4J CAPI_CONNECTION statement for i5/OS CDC sources

¨ LRAP CAPI_CONNECTION statement for z/OS CDC sources

¨ ORCL CAPI_CONNECTION statement for Oracle LogMiner CDC sources

DATACLAS=data_class

Optional. On z/OS, the SMS data class that the UOW Cleanser uses when allocating the sequential UOW spill files. If you do not specify this parameter, the SMS ACS routines can assign the data class.

MEMCACHE=cache_size

Optional. Memory cache size, in kilobytes, that PowerExchange allocates to reconstruct complete UOWs.

For each extraction session, PowerExchange keeps all changes for each UOW in the memory cache until it processes the end-UOW record. If the memory cache is too small to hold all of the changes in a UOW, PowerExchange spills the changes to sequential files on disk, called UOW spill files.

Each UOW spill file contains one UOW. A UOW might require multiple UOW spill files to hold all of the changes for that UOW. If the change stream contains multiple large UOWs and the memory cache is insufficient, PowerExchange might create numerous UOW spill files.

PowerExchange processes the change stream more efficiently if it does not need to use UOW spill files. In addition to degrading extraction performance, large numbers of UOW spill files can cause a disk space shortage.

Important: If the change stream contains only small UOWs, the default value might be sufficient. However, the default value is often too small to eliminate UOW spill files. Informatica recommends that you specify a larger value.

The location in which PowerExchange allocates the UOW spill files varies by operating system, as follows:

¨ For i5/OS, PowerExchange uses the CRTPF command to create a physical file for UOW spill files.

PowerExchange creates the UOW spill file names by using the C/C++ tmpnam() function.

¨ For Linux and UNIX, PowerExchange uses the current directory by default for UOW spill files. To use a different directory, specify the TMPDIR environment variable.

PowerExchange creates the UOW spill file names by using the operating system tempnam function with a prefix of dtlq.

Note: The UOW spill files are temporary files that are deleted when PowerExchange closes them. They are not visible in the directory while open.


¨ For Windows, PowerExchange uses the current directory by default for UOW spill files. To use a different directory, specify the TMP environment variable.

PowerExchange creates the UOW spill file names by using the Windows _tempnam function with a prefix of dtlq.

¨ For z/OS, PowerExchange uses dynamic allocation to allocate temporary data sets for the UOW spill files. Generally, SMS controls the location of temporary data sets. If you do not use SMS to control temporary data sets, the UNIT parameter controls the location for the UOW spill files.

Because PowerExchange allocates temporary data sets for the UOW spill files, z/OS assigns these files system-generated data set names, which begin with SYSyyddd.Thhmmss.RA000.jobname.

Valid values are from 1 through 519720.

Warning: Because PowerExchange allocates the cache size for each extraction operation, use caution when coding large values for MEMCACHE. Otherwise, many concurrent extraction sessions might cause memory constraints.

Default is 1024, or 1 MB.

RSTRADV=nnnnn

Time interval, in seconds, that PowerExchange waits before advancing restart and sequence tokens for a registered data source during periods when UOWs do not include any changes of interest for the data source. When the wait interval expires, PowerExchange returns the next committed "empty UOW," which includes only updated restart information.

The wait interval is reset to 0 when PowerExchange completes processing a UOW that includes changes of interest or returns an empty UOW because the wait interval expired without any changes of interest having been received.

For example, if you specify 5, PowerExchange waits 5 seconds after it completes processing the last UOW or after the previous wait interval expires. Then PowerExchange returns the next committed empty UOW that includes the updated restart information and resets the wait interval to 0.

If RSTRADV is not specified, PowerExchange does not advance restart and sequence tokens for a registered source during periods when no changes of interest are received. In this case, when PowerExchange warm starts, it reads all changes, including those not of interest for CDC, from the restart point.

Valid values are 0 through 86400. No default is provided.

Warning: A value of 0 can degrade performance because PowerExchange returns an empty UOW after each UOW processed.
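The RSTRADV timing rule can be traced with a small simulation. This sketch makes several assumptions: committed UOWs are modeled as (commit time in seconds, has changes of interest) pairs, the wait interval is measured between commit times rather than processing completion times, and the function name and result labels are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def advance_decisions(
    uows: Iterable[Tuple[float, bool]], rstradv: Optional[int]
) -> List[str]:
    """Classify each committed UOW as "uow", "empty-uow", or "skip".

    rstradv=None models the case where RSTRADV is not specified.
    """
    decisions = []
    interval_start = 0.0
    for commit_time, of_interest in uows:
        if of_interest:
            decisions.append("uow")        # real changes; interval resets
            interval_start = commit_time
        elif rstradv is not None and commit_time - interval_start >= rstradv:
            decisions.append("empty-uow")  # only updated restart information
            interval_start = commit_time
        else:
            decisions.append("skip")       # restart tokens do not advance
    return decisions


uows = [(1.0, True), (3.0, False), (6.5, False), (8.0, False), (12.0, False)]
print(advance_decisions(uows, rstradv=5))
print(advance_decisions(uows, rstradv=None))
print(advance_decisions(uows, rstradv=0))
```

In this model, a value of 0 produces an empty UOW for every UOW without changes of interest, which matches the performance warning above.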

SPACEPRI=primary_space

Optional. On z/OS, the primary space value that the UOW Cleanser uses to allocate UOW spill files. The UOW Cleanser does not use secondary space. Instead, when a spill file becomes full, the UOW Cleanser allocates another spill file of the same size. The SPACETYP parameter specifies the space units for this value.

SMS ACS routines can override the UOW spill file size.

Valid values are from 1 through 2147483647.

Default is 50 cylinders.

Note: On i5/OS, the UOW Cleanser allocates UOW spill files as physical files with SIZE(*NOMAX), which means that the maximum spill file size is controlled by the system maximum file size. On Linux, UNIX, and Windows, PowerExchange allocates UOW spill files as temporary files that are 2 GB in size.


SPACETYPE={BLK|TRK|CYL}

Optional. On z/OS, the type of space units that the UOW Cleanser uses to allocate UOW spill files.

Enter one of the following options:

¨ BLK. Use blocks.

¨ CYL. Use cylinders.

¨ TRK. Use tracks.

Default is BLK.

STORCLAS=storage_class

Optional. On z/OS, the SMS storage class name that the UOW Cleanser uses to allocate UOW spill files.

UNIT=unit

Optional. On z/OS, the generic or esoteric unit name that the UOW Cleanser uses to allocate UOW spill files.
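As a way to check values before editing the configuration file, the following sketch assembles a UOWC CAPI_CONNECTION statement for an Oracle LogMiner CDC source and enforces the ranges documented above. It is a hypothetical helper, not a PowerExchange utility. The connection names UOWC1 and ORCL1 are placeholders, and only the parameters that apply to Linux, UNIX, and Windows are covered.

```python
def uowc_statement(name, capiname, blksize=None, memcache=None, rstradv=None):
    """Build a UOWC CAPI_CONNECTION statement after validating documented ranges."""
    if not (name.isalnum() and len(name) <= 8):
        raise ValueError("NAME must be at most eight alphanumeric characters")
    limits = {
        "BLKSIZE": (blksize, 8, 65535),     # bytes; range for Oracle LogMiner CDC
        "MEMCACHE": (memcache, 1, 519720),  # kilobytes
        "RSTRADV": (rstradv, 0, 86400),     # seconds
    }
    options = [f"CAPINAME={capiname}"]
    for key, (value, low, high) in limits.items():
        if value is None:
            continue  # parameter omitted, so the documented default applies
        if not low <= value <= high:
            raise ValueError(f"{key} must be from {low} through {high}")
        options.append(f"{key}={value}")
    return f"CAPI_CONNECTION=(NAME={name},TYPE=(UOWC,{','.join(options)}))"


# UOWC1 and ORCL1 are placeholder names; ORCL1 stands for the NAME value
# of the related ORCL CAPI_CONNECTION statement.
print(uowc_statement("UOWC1", "ORCL1", memcache=20480, rstradv=5))
```

The generated line follows the syntax shown earlier in this section. Whether a given MEMCACHE value is appropriate still depends on the number of concurrent extraction sessions.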

Management of Oracle LogMiner CDC

You might need to stop CDC for source tables occasionally, for example, to change the table definitions.

Stopping Oracle LogMiner CDC

You might need to stop Oracle change data capture for a source table to perform troubleshooting or routine maintenance tasks.

To stop change data capture, use one of the following methods:

¨ Open the capture registration for the source table, and change the Status value from Active to History.

Warning: A capture registration that has a status of History cannot be activated again. This method permanently stops change data capture for a table based on a particular capture registration.

¨ Drop the supplemental log group by executing the following SQL:

ALTER TABLE schema.table_name DROP SUPPLEMENTAL LOG GROUP log_group_name

After you drop the supplemental log group, Oracle stops recording full before- and after-images of data that changed. If you reinstate the supplemental log group later, you should rematerialize the target database.

RELATED TOPICS:

¨ “Stopping PowerCenter CDC Sessions” on page 142

Changing a Source Table Definition Used in Oracle LogMiner CDC

Occasionally, you might need to change the definition of an Oracle source table that is registered for change data capture. If your metadata changes affect the columns from which change data is captured, use this procedure to enable PowerExchange to switch to the updated table definition, while preserving access to previously captured data.


Perform this procedure whenever you add, alter, or drop columns for which change data is captured. You do not need to perform this procedure if you are selectively capturing change data for a subset of columns and none of the selected columns are affected by the table definition changes.

Tip: If you no longer need to capture change data from a column in a table, you can remove that column from the extraction map without changing the capture registration. Change data for the column is still captured but is not extracted.

To change a source table definition used in Oracle LogMiner CDC:

1. Stop DELETE, INSERT, and UPDATE activity against the table.

2. Verify that any change data that was captured under the previous table definition has completed extraction processing. Then stop all workflows that extract change data for the table.

3. In the PowerExchange Navigator, open the original capture registration and set its status to History.

Note: PowerExchange does not capture change data based on capture registrations that have a status of History or Inactive.

4. Use DDL to make the table changes.

5. Drop the supplemental log group for the table.

6. In the PowerExchange Navigator, create a new capture registration that reflects the metadata changes and set its status to Active.

Also select the Execute DDL now option so that when you finish the capture registration, the PowerExchange Navigator runs the DDL for creating a new supplemental log group.

PowerExchange uses the newly activated capture registration for change data capture.

7. If necessary, change the target table definition to reflect the source table metadata changes.

8. If you use the PowerExchange Logger for Linux, UNIX, and Windows, restart the PowerExchange Logger process so that it will begin using the new capture registration.

9. In PowerCenter Designer, import the altered source and target tables. Edit the mapping if necessary.

10. If necessary, rematerialize the target tables. After materialization completes, create new restart tokens.

11. Re-enable DELETE, INSERT, and UPDATE activity against the table.

12. Restart extraction processing.

RELATED TOPICS:

¨ “Creating Restart Tokens for Extractions” on page 135


Part IV: Change Data Extraction

This part contains the following chapters:

¨ Introduction to Change Data Extraction, 105

¨ Extracting Change Data, 125

¨ Managing Change Data Extractions, 140

¨ Monitoring and Tuning Options, 148


CHAPTER 7

Introduction to Change Data Extraction

This chapter includes the following topics:

¨ Change Data Extraction Overview, 105

¨ Extraction Modes, 106

¨ PowerExchange-Generated Columns in Extraction Maps, 106

¨ Restart Tokens and the Restart Token File, 109

¨ Recovery and Restart Processing for CDC Sessions, 111

¨ Group Source Processing in PowerExchange, 116

¨ Commit Processing with PWXPC, 118

¨ Offload Processing, 123

Change Data Extraction Overview

Use PowerExchange in conjunction with PWXPC and PowerCenter to extract captured change data and write it to one or more targets. Review the topics in this chapter to learn key concepts about extraction processing so that you can configure CDC sessions to extract change data efficiently and to enable proper restart and recovery.

To extract changes captured by PowerExchange, import the metadata for the capture source into PowerCenter Designer. Use one of the following methods:

¨ For nonrelational data sources, import the extraction map from PowerExchange.

¨ For relational data sources, you can import either the metadata from the database or the extraction map from PowerExchange. If you import metadata from the database, you might need to modify the source definition in Designer to add PowerExchange-defined CDC columns or to remove any columns that are not included in the extraction map. If you import extraction maps, you do not need to manually add or remove these columns from the PowerCenter source definition.

After you import the metadata, you can use the source definitions in PowerCenter to create mappings, sessions, and workflows for extracting the change data from PowerExchange.

RELATED TOPICS:

¨ “PowerExchange-Generated Columns in Extraction Maps” on page 106


Extraction Modes

You can use different modes to extract change data captured by PowerExchange. The extraction mode is determined by the PowerCenter connection type and certain PowerExchange CDC configuration parameters. Some extraction modes are available only if you use PowerExchange Condense or the PowerExchange Logger for Linux, UNIX, and Windows.

Depending on your extraction requirements, use one of the following extraction modes:

Real-time extraction mode

Continuously extracts change data directly from the PowerExchange Logger for MVS log files in near real time. Extraction processing continues until the CDC session is stopped or interrupted.

To implement this mode, configure a PWX CDC Real Time application connection in PowerCenter for your data source type.

Batch extraction mode

Extracts change data from PowerExchange Condense condense files on MVS that are closed at the time the session runs. After processing the condense files, the CDC session ends.

To implement this mode, configure the following items:

¨ In PowerCenter, configure a PWX CDC Change application connection for your data source type.

¨ In the PowerExchange Navigator, set the Condense option to Part or Full in your capture registrations.

Continuous extraction mode

Continuously extracts change data from open and closed PowerExchange Logger for Linux, UNIX, and Windows log files in near real time.

To implement this mode, configure the following items:

¨ On the remote Linux, UNIX, or Windows system, configure the PowerExchange Logger for Linux, UNIX, and Windows to log change data that was originally captured on MVS.

¨ In PowerCenter, configure a PWX CDC Real Time application connection for your data source type.

¨ In the PowerExchange Navigator, set the Condense option to Part in your capture registrations.

RELATED TOPICS:

¨ “Configuring PowerExchange to Capture Change Data on a Remote System” on page 162

¨ “Extracting Change Data Captured on a Remote System” on page 168

PowerExchange-Generated Columns in Extraction Maps

Besides the table columns defined in capture registrations, extraction maps include columns that PowerExchange generates. These PowerExchange-generated columns contain CDC-related information, such as the change type and timestamp.

When you import an extraction map in Designer, PWXPC includes the PowerExchange-generated columns in the source definition.

When you perform a database row test on an extraction map, the PowerExchange Navigator displays the PowerExchange-generated columns in the results. By default, the PowerExchange Navigator hides these columns from view when you open the extraction map. To display these columns, open the extraction map, right-click anywhere within the Extract Definition window, and select Show Auto Generated Columns.

Note: By default, all columns except the DTL__columnname_CNT and DTL__columnname_IND columns are selected in an extraction map. You must edit an extraction map to select these columns.

The following table describes the columns that PowerExchange generates for each change record:

Column: DTL__CAPXRESTART1
Datatype: VARBIN
Length: 255
Description: A binary value that represents the position of the end of the UOW for that change record followed by the position of the change record itself. The length of a sequence token varies by data source type, except on z/OS where sequence tokens for all data source types have the same length. The value of DTL__CAPXRESTART1 is also known as the sequence token, which when combined with the restart token comprises the restart token pair. A sequence token for a change record is a strictly ascending and repeatable value.

Column: DTL__CAPXRESTART2
Datatype: VARBIN
Length: 255
Description: A binary value that represents a position in the change stream that can be used to reconstruct the UOW state for the change record, with the following exceptions:
- Microsoft SQL Server CDC. A binary value that contains the DBID of the distribution database and the name of the distribution server.
- Change data extracted from full condense files on z/OS or i5/OS. A binary value that contains the instance name from the registration group of the capture registration.
The length of a restart token varies by data source type. On z/OS, restart tokens for all data source types have the same length, except for change data extracted from full condense files. The value of DTL__CAPXRESTART2 is also known as the restart token, which when combined with the sequence token comprises the restart token pair.

Column: DTL__CAPXRRN
Datatype: DECIMAL
Length: 10
Description: For DB2 on i5/OS only, the relative record number.

Column: DTL__CAPXUOW
Datatype: VARBIN
Length: 255
Description: A binary value that represents the position in the change stream of the start of the UOW for the change record.

Column: DTL__CAPXUSER
Datatype: VARCHAR
Length: 255
Description: The user ID of the user that made the change to the data source, with the following exceptions:
- DB2 for i5/OS. If you specify LIBASUSER=Y on the AS4J CAPI_CONNECTION statement, the value is the library and file name to which the change was made.
- DB2 for z/OS. If you do not specify UIDFMT on the LRAP CAPI_CONNECTION, the value is the user ID of the user that made the change. Otherwise, the UIDFMT parameter determines the value.
- Microsoft SQL Server. The value is null because Microsoft SQL Server does not record this information in the distribution database.
- Oracle. The value might be null. If known, Oracle provides the user ID.

Column: DTL__CAPXTIMESTAMP
Datatype: CHAR
Length: 20
Description: The timestamp for when the change was made to the data source, as recorded by the source DBMS in the following format: YYYYMMDDhhmmssnnnnnn
Where:
- YYYYMMDD is the date in year (YYYY), month (MM), and day (DD) format.
- hhmmssnnnnnn is the time in hours (hh), minutes (mm), seconds (ss), and microseconds (nnnnnn) format.
Note: Oracle does not support microseconds in the timestamp.

Column: DTL__CAPXACTION
Datatype: CHAR
Length: 1
Description: A single character that indicates the type of change operation. Valid values are:
- I. INSERT operation.
- D. DELETE operation.
- U. UPDATE operation.

Column: DTL__CAPXCASDELIND
Datatype: CHAR
Length: 1
Description: For DB2 for z/OS sources only, a single character that indicates whether DB2 has deleted the row because the table specifies the ON DELETE CASCADE clause. Valid values are:
- Y. Indicates that DB2 deleted this row because of a cascade delete rule.
- N. Indicates that DB2 did not delete this row because of a cascade delete rule.

Column: DTL__BI_columnname
Datatype: Datatype of the source column
Length: Length of the source column
Description: For UPDATE operations, the value of the before image of the selected column in the change record.

Column: DTL__CI_columnname
Datatype: CHAR
Length: 1
Description: For UPDATE operations, a single character that indicates whether the selected column was changed. Valid values are:
- Y. Indicates that the column changed.
- N. Indicates that the column did not change.
- Null value. Indicates an INSERT or DELETE operation.

Column: DTL__columnname_CNT
Datatype: NUM32U
Length: 0
Description: Binary count column. PowerExchange generates this column for variable length columns of types VARCHAR and VARBIN to determine the length of the column during change data extraction processing.
Note: By default, binary count columns are not selected in an extraction map. You must edit an extraction map to select these columns.

Column: DTL__columnname_IND
Datatype: BIN
Length: 1
Description: Null indicator column. PowerExchange generates this column for nullable columns to indicate the nullable value for the column.
Note: By default, null indicator columns are not selected in an extraction map. You must edit an extraction map to select these columns.
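For downstream code that consumes extracted change records, the DTL__CAPXTIMESTAMP and DTL__CAPXACTION values described above can be decoded as in the following sketch. The function name and the sample values are made up for illustration; only the 20-character YYYYMMDDhhmmssnnnnnn layout and the I, D, and U codes come from the column descriptions.

```python
from datetime import datetime

# Change operation codes documented for DTL__CAPXACTION.
ACTIONS = {"I": "INSERT", "D": "DELETE", "U": "UPDATE"}


def parse_capx(timestamp: str, action: str):
    """Decode a DTL__CAPXTIMESTAMP string and a DTL__CAPXACTION code."""
    if len(timestamp) != 20 or not timestamp.isdigit():
        raise ValueError("expected 20 digits in YYYYMMDDhhmmssnnnnnn format")
    changed_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S%f")
    return changed_at, ACTIONS[action]


# Sample values are illustrative only.
changed_at, operation = parse_capx("20091201143005123456", "U")
print(changed_at.isoformat(), operation)
```

An unknown action code raises a KeyError in this sketch, which makes unexpected values visible instead of silently passing them through.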


Restart Tokens and the Restart Token File

PowerExchange uses a pair of token values, called a restart token pair, to determine where to begin extracting change data in the change stream for a CDC session. For a new CDC session, you should generate restart token values that represent the point-in-time in the change stream where you materialized the targets. Each source in a CDC session can have unique values for its restart token pair in the restart token file.

A restart token pair matches the position in the change stream for a change record and has the following parts:

Sequence token

For each change record that PowerExchange reads from the change stream, a binary value that represents the change stream position of the end of the UOW for that change record followed by the position of the change record itself, with the following exceptions:

¨ For Microsoft SQL Server CDC, a binary value that represents the position of the change record in the distribution database.

¨ For change data extracted from full condense files on z/OS or i5/OS, a binary value that represents the full condense file and the position of the change record in that file.

A sequence token for a change record is a strictly ascending and repeatable value. The length of a sequence token varies by data source type, except on z/OS where sequence tokens for all data source types have the same length.

Restart token

For each change record that PowerExchange reads from the change stream, a binary value that represents a position in the change stream that can be used to reconstruct the UOW state for that record, with the following exceptions:

• For Microsoft SQL Server CDC, a binary value that contains the DBID of the distribution database and the name of the distribution server.

• For change data extracted from full condense files on z/OS and i5/OS, a binary value that contains the instance name from the registration group for the capture registration.

In some cases, the restart token might contain the position of the oldest open UOW. An open UOW is a UOW for which PowerExchange has read the beginning of the UOW from the change stream but has not yet read the commit record, or end-UOW.

The length of a restart token varies by data source type. On z/OS, restart tokens for all data source types have the same length, except for change data extracted from full condense files.

PowerExchange uses these restart token values to determine the point from which to start reading change data from the change stream, with the following exceptions:

• For Microsoft SQL Server CDC, PowerExchange uses the sequence token value to determine the point from which to start reading change data from that distribution database, and the restart token value to verify that the distribution database is the same as the distribution database specified on the CAPI connection.

• For change data extracted from full condense files on z/OS or i5/OS, PowerExchange uses the sequence token value to determine the point from which to start reading change data from the condense files, and the restart token value to verify that the instance is the same as the instance recorded for the change record.

After determining the start point in the change stream for a CDC session, PowerExchange begins to read and pass change data to PWXPC. PWXPC uses the sequence token value for each source in the CDC session to determine the point at which to start providing the change data passed from PowerExchange to a specific source.
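The per-source filtering described above can be pictured with a short Python sketch. It is an illustration only, not PWXPC code: the `ChangeRecord` type, the `records_for_sources` function, and the source names are hypothetical, and the sketch assumes that tokens of equal length compare bytewise in change stream order and that a record is delivered only when its sequence token is greater than the start token of its source.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class ChangeRecord:
    source: str            # hypothetical source name
    sequence_token: bytes  # position of the record in the change stream
    payload: str


def records_for_sources(
    stream: Iterable[ChangeRecord],
    start_tokens: Dict[str, bytes],
) -> Iterator[ChangeRecord]:
    """Yield only the records that lie past the start token of their own source."""
    for record in stream:
        start = start_tokens.get(record.source)
        if start is None:
            continue  # the source is not part of the session
        # Assumption: equal-length tokens compare bytewise in stream order.
        if record.sequence_token > start:
            yield record


stream = [
    ChangeRecord("A", b"\x00\x01", "a1"),
    ChangeRecord("B", b"\x00\x02", "b1"),
    ChangeRecord("A", b"\x00\x03", "a2"),
    ChangeRecord("B", b"\x00\x04", "b2"),
]
# The stream is read once from the oldest point; source B starts later than A.
delivered = list(records_for_sources(stream, {"A": b"\x00\x00", "B": b"\x00\x02"}))
```

In this sketch the stream is read from the oldest start point, and record `b1` is dropped because source B has a later start token than source A.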

You should specify restart token values in the restart token file in the following situations:

• When creating a new CDC session, specify a restart token pair for each data source. Alternatively, you can use the special override statement to specify a restart token pair for some or all data sources.

• When adding a data source to an existing CDC session, specify a restart token pair for the new source.

• If you need to override token values for a data source that is defined in an existing CDC session, specify the override token values.

Generating Restart Tokens

Before you begin extracting change data, you must materialize the targets for the CDC session with data from the data sources. Usually, to perform this task, you run a bulk data movement session. After you materialize the targets and before you allow changes to be made to the data source again, you should generate restart tokens that represent the point in time in the change stream when the materialization occurred.

PWXPC can generate restart tokens when it starts to extract change data for a CDC session. Additionally, PowerExchange provides a number of methods to generate restart tokens. To generate restart tokens that match the current end of the change stream, use one of the following methods:

• In the PWXPC restart token file for the CDC session, specify CURRENT_RESTART on the RESTART1 and RESTART2 special override statements.

• In the PowerExchange Navigator, use the SELECT CURRENT_RESTART SQL statement when you perform a database row test.

• Run the DTLUAPPL utility with the GENERATE RSTTKN option.

If you use the DTLUAPPL utility or the PowerExchange Navigator to generate restart tokens, edit the restart token file that PWXPC uses to specify the token values before you start the CDC session.

Restart Token File

You can use the restart token file to provide restart tokens for a new CDC session, or for a source that you add to an existing CDC session. You can also use the restart token file to override restart tokens for sources in an existing CDC session.

Specify the name and location of the restart token file in the following attributes of the source PWX CDC application connection:

• RestartToken File Folder

• RestartToken File Name

When you run a CDC session, PWXPC reads the restart token file in the folder specified in the RestartToken File Folder attribute of the source CDC connection. If this folder does not exist and the RestartToken File Folder attribute contains the default value of $PMRootDir/Restart, PWXPC creates this folder. PWXPC does not create a restart token folder that has any other name. PWXPC then verifies that the restart token file exists. If the file does not exist, PWXPC uses the name specified in the RestartToken File Name attribute to create an empty restart token file.
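The folder and file checks in the preceding paragraph can be sketched in Python as follows. This is a hedged illustration of the described behavior, not the PWXPC implementation: the function name `prepare_restart_token_file` and its parameters are hypothetical, `$PMRootDir` is expanded by simple string substitution, and raising an error when a nondefault folder is missing is an assumption, because the text states only that PWXPC does not create such a folder.

```python
from pathlib import Path

DEFAULT_FOLDER = "$PMRootDir/Restart"  # default attribute value named in the text


def prepare_restart_token_file(folder_attr: str, file_name: str, pm_root_dir: Path) -> Path:
    """Mimic the folder and file checks described for the restart token file."""
    folder = Path(folder_attr.replace("$PMRootDir", str(pm_root_dir)))
    if not folder.exists():
        if folder_attr != DEFAULT_FOLDER:
            # Assumption: a missing nondefault folder is treated as an error.
            raise FileNotFoundError(f"restart token folder not found: {folder}")
        folder.mkdir(parents=True)  # only the default folder is created
    token_file = folder / file_name
    if not token_file.exists():
        token_file.touch()  # create an empty restart token file
    return token_file
```

Calling the function with the default folder attribute creates the folder and an empty file on first use. With any other folder value, the folder must already exist.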

PWXPC stores restart tokens for CDC sessions at the following locations:

• For relational targets, in a state table in the target database

• For nonrelational targets, in a state file on the PowerCenter Integration Service machine

When you restart a CDC session, PWXPC reads the restart tokens for each source in the CDC session from the state table or file. PWXPC also reads the restart token file for the CDC session and overrides the restart tokens for any sources that have token values included in the file.

Recovery and Restart Processing for CDC Sessions

If you select Resume from the last checkpoint for the Recovery Strategy attribute in a CDC session that extracts change data from PowerExchange, PWXPC and PowerCenter provide recovery and restart processing for that session. In the event of a session failure, the PowerCenter Integration Service recovers the session state of operation, and PWXPC recovers the restart information.

PWXPC saves restart information for all sources in a CDC session. The restart information for CDC sessions, which includes the restart tokens, originates from PowerExchange on the system from which the change data is extracted. You can include both relational and nonrelational targets in a single CDC session. PWXPC uses one of the following locations to store and retrieve restart information, based on the target type:

• Relational targets. Recovery state tables in the target databases. PWXPC, in conjunction with the PowerCenter Integration Service, commits both the change data and the restart tokens for that data in the same commit, which ensures that the applied data and the restart tokens are in sync.

• Nonrelational targets. Recovery state file in the shared location on the PowerCenter Integration Service machine. PWXPC, in conjunction with the PowerCenter Integration Service, writes the change data to the target files and then writes the restart tokens to the recovery state file. As a result, duplicate data might be applied to the targets when you restart failed CDC sessions.
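The difference between the two target types comes down to whether the data and the restart tokens are written atomically. The following Python sketch illustrates the relational case with an in-memory SQLite database. The table names `target_rows` and `restart_state`, the `apply_batch` function, and the simulated failure are assumptions made for the illustration and do not reflect the actual PowerCenter schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for a target table and a recovery state table.
conn.execute("CREATE TABLE target_rows (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("CREATE TABLE restart_state (appl_id TEXT PRIMARY KEY, state_data BLOB)")


def apply_batch(conn, appl_id, rows, restart_tokens, fail=False):
    """Write change rows and their restart tokens in one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO target_rows VALUES (?, ?)", rows)
            if fail:
                raise RuntimeError("simulated failure before commit")
            conn.execute(
                "INSERT OR REPLACE INTO restart_state VALUES (?, ?)",
                (appl_id, restart_tokens),
            )
    except RuntimeError:
        pass  # neither the rows nor the tokens were committed


apply_batch(conn, "app1", [(1, "a"), (2, "b")], b"\x01")
apply_batch(conn, "app1", [(3, "c")], b"\x02", fail=True)
```

After the failed second batch, the target still holds two rows and the saved token still describes the first batch, so a restart resumes without gaps or duplicates. When data and tokens are written in two separate steps, as with nonrelational targets, a failure between the steps leaves data applied without its tokens, which is why duplicates can appear on restart.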

The PowerCenter Integration Service saves the session state of operation and maintains target recovery tables. The PowerCenter Integration Service stores the session state of operation in the shared location that is specified in $PMStorageDir. The PowerCenter Integration Service saves relational target recovery information in the target database.

When you run a CDC session that uses a resume recovery strategy, PWXPC writes the following message to the session log to indicate that recovery is in effect:

PWXPC_12094 [INFO] [CDCRestart] Advanced GMD recovery in effect. Recovery is automatic.

When you recover or restart a CDC session, PWXPC uses the saved restart information to resume reading the change data from the point of interruption. The PowerCenter Integration Service restores the session state of operation, including the state of each source, target, and transformation. PWXPC, in conjunction with the PowerCenter Integration Service, determines how much of the source data it needs to reprocess. PowerExchange and PWXPC use the restart information to determine the correct point in the change stream from which to restart extracting change data and then applying it to the targets.

If you run a session with resume recovery strategy and the session fails, do not change the mapping, the session, or the state information before you restart the session. PowerCenter and PWXPC cannot guarantee recovery if you make any of these changes.

Restriction: If any of the targets in the CDC session use the PowerCenter File Writer to write CDC data to flat files, do not use a resume recovery strategy. Restart tokens for all targets in the CDC session, including relational targets, will be compromised if a flat file target is in the same session. Data loss or duplication might occur.

PowerCenter Recovery Tables for Relational Targets

When the PowerCenter Integration Service runs a session that has a resume recovery strategy, it writes to recovery tables on the target database system. When the PowerCenter Integration Service recovers the session, it uses information in the recovery tables to determine where to begin loading data to target tables. PWXPC uses information in the recovery tables to determine where to begin reading the change stream.

If you want the PowerCenter Integration Service to create the recovery tables, grant table creation privilege to the database user name configured in the target database connection. Otherwise, you must create the recovery tables manually.

For relational targets, the PowerCenter Integration Service creates the following recovery tables in the target database:

• PM_RECOVERY. Contains target load information for the session run. The PowerCenter Integration Service removes the information from this table after each successful session and initializes the information at the beginning of subsequent sessions.

• PM_TGT_RUN_ID. Contains information the PowerCenter Integration Service uses to identify each target on the database. The information remains in the table between session runs. If you manually create this table, you must create a row and enter a value other than zero for LAST_TGT_RUN_ID to ensure that the session recovers successfully.

• PM_REC_STATE. Contains state and restart information for CDC sessions. PWXPC stores the application name and restart information for all sources in the CDC session. The PowerCenter Integration Service stores any state information for the session. Unlike the session state information, restart information persists in this table across successful sessions. The PowerCenter Integration Service updates it with each commit to the target tables.

If you edit or drop the recovery tables before you recover a session, the PowerCenter Integration Service cannot recover the session. Also, PWXPC cannot restart the CDC session from the point of interruption.

If you disable recovery, the PowerCenter Integration Service does not remove the recovery information from the target database. Also, PWXPC no longer updates the restart information in the target database.

Recovery State Table

The recovery state table, PM_REC_STATE, contains state and CDC restart information for a CDC session. This table resides in the same target database as the target tables.

The PowerCenter Integration Service creates an entry in the state table for each CDC session. These entries can comprise more than one row. CDC sessions with heterogeneous target tables have state table entries in each unique relational target database and an entry in a state file on the PowerCenter Integration Service machine for each nonrelational target. For example, a CDC session that targets Oracle and SQL Server tables and an MQ Series queue has an entry in the state table in the target Oracle database, in the state table in the target SQL Server database, and in the state file on the PowerCenter Integration Service machine.

Each session entry in a state table contains a number of repository identifiers and execution state data such as the checkpoint number and CDC restart information. The following columns can contain PWXPC-specific restart information:

• APPL_ID. Contains the value that PWXPC creates by appending the task instance ID of the CDC session to the value that you specify in the Application Name attribute in the source PWX CDC application connection. When this value matches an APPL_ID value for a row in the state table, the PowerCenter Integration Service, in conjunction with PWXPC, selects the row from the state table for the CDC session.

• STATE_DATA. Contains the restart information for the session in a variable-length, 1,024-byte binary column. When the PowerCenter Integration Service commits change data to the target tables, it also commits the restart information for that data in this column. PWXPC uses the restart information from this column to perform restart processing for the CDC session.

If the amount of restart information for a session exceeds 1,024 bytes, the PowerCenter Integration Service adds additional rows to accommodate the remainder of the restart information. For each row added, the PowerCenter Integration Service increases the value of the SEQ_NUM column by one, starting from zero.
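The row-splitting rule for STATE_DATA can be sketched in a few lines of Python. The functions `split_state_data` and `join_state_data` are hypothetical names used for illustration, and storing an empty chunk in row zero when there is no restart information is an assumption. Only the 1,024-byte row capacity and the SEQ_NUM numbering from zero come from the text.

```python
from typing import List, Tuple

ROW_CAPACITY = 1024  # bytes of STATE_DATA per row, as stated in the text


def split_state_data(restart_info: bytes) -> List[Tuple[int, bytes]]:
    """Split restart information into (SEQ_NUM, STATE_DATA) rows."""
    if not restart_info:
        return [(0, b"")]  # assumption: an empty entry still occupies row zero
    return [
        (seq_num, restart_info[offset:offset + ROW_CAPACITY])
        for seq_num, offset in enumerate(range(0, len(restart_info), ROW_CAPACITY))
    ]


def join_state_data(rows: List[Tuple[int, bytes]]) -> bytes:
    """Reassemble the restart information in SEQ_NUM order."""
    return b"".join(chunk for _, chunk in sorted(rows))
```

For example, 2,500 bytes of restart information would occupy three rows with SEQ_NUM values 0, 1, and 2, holding 1,024, 1,024, and 452 bytes.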

PowerCenter Recovery Files for Nonrelational Targets

If you configure a resume recovery strategy for a CDC session, the PowerCenter Integration Service stores the session state of operation in the shared location, $PMStorageDir, on the PowerCenter Integration Service machine. For nonrelational targets, the PowerCenter Integration Service also stores the target recovery status in a recovery state file in the shared location on the PowerCenter Integration Service machine. PWXPC stores the restart information for nonrelational target files in this state file.

Recovery State File

For all nonrelational targets in a session, the PowerCenter Integration Service uses a recovery state file on the PowerCenter Integration Service machine. Nonrelational target files include MQ Series message queues, PowerExchange nonrelational targets, and other PowerCenter nonrelational targets.

CDC sessions with heterogeneous target tables have state table entries in each unique relational target database and an entry in a state file on the PowerCenter Integration Service machine for each nonrelational target.

The PowerCenter Integration Service creates the recovery state file in the shared location, $PMStorageDir. The file name has the following prefix:

pm_rec_state_appl_id

PWXPC creates the value for the appl_id variable in the file name by appending the task instance ID of the CDC session to the value that you specify in the Application Name attribute in the source PWX CDC application connection. The PowerCenter Integration Service uses various task and workflow repository attributes to complete the file name. The message CMN_65003, which the PowerCenter Integration Service writes to the session log, contains the complete file name.

Application Names

PWXPC, in conjunction with the PowerCenter Integration Service, uses the application name you specify as part of the key when it stores and retrieves the restart information for the CDC session. When you configure the PWX CDC application connection for each CDC session, specify a unique value in the Application Name attribute.

PWXPC appends the repository task instance ID for the CDC session to the Application Name value to create the APPL_ID value in the recovery state table and the appl_id portion in the recovery state file name.

Because the APPL_ID column value and the recovery state file name contain the task instance ID for the session, changes to the CDC session such as adding and removing sources or targets affect restart processing. When you change the CDC session to add or remove sources and targets, you must use the restart token file to provide restart tokens and then cold start the CDC session.

Restart Processing for CDC Sessions

Each source in a CDC session has its own restart point. The method you use to start a CDC session controls how PWXPC determines the restart information for the sources in that session.

Use one of the following methods to start CDC sessions:

• Cold start. When you cold start a CDC session, PWXPC uses the restart token file to acquire restart tokens for all sources, does not read the state table or file, and makes no attempt to recover the session. The CDC session continues to run until stopped or interrupted.

• Warm start. When you warm start a CDC session, PWXPC reconciles the restart tokens for sources provided in the restart token file, if any, with any restart tokens that exist in the state tables or file. If necessary, PWXPC performs recovery processing. The session continues to run until stopped or interrupted.

• Recovery start. When you recover a CDC session, PWXPC reads the restart tokens from any applicable state tables and file. If necessary, PWXPC performs recovery processing. PWXPC then updates the restart token file with the restart tokens for each source in the CDC session, and the session ends.

Before you run a CDC session for the first time, you should create and populate the restart token file with restart tokens for each source in the session. Each restart token pair should match a point in the change stream where the source and target are in a consistent state. For example, you materialize a target table from a source and do not change the source data after materialization. To establish a starting extraction, or restart, point in the change stream, code a special override statement with the CURRENT_RESTART option in the restart token file that has the file name that you specified in the PWX CDC application connection in the CDC session. When you cold start the CDC session, PWXPC requests that PowerExchange use the current end-point in the change stream as the extraction start point. After the CDC session starts, you can resume change activity to the sources.

If you cold start a CDC session and a restart token file does not exist, the PowerCenter Integration Service still runs the session. Because you did not provide any restart information, PWXPC passes null restart tokens for all sources to PowerExchange and indicates that the restart tokens for each source are NULL in message PWXPC_12060. PowerExchange then assigns the default restart point to each source.

Warning: If you use null restart tokens, the CDC session might not produce the correct results. When you cold start CDC sessions, provide valid restart tokens.

Default Restart Points for Null Restart Tokens

The default restart points that PowerExchange uses when it receives null restart tokens vary by data source type.

The following table describes the default restart points for null restart tokens, by data source type and extraction method:

Data Source Type | Batch and Continuous Extraction Mode | Real-time Extraction Mode

All MVS sources | Oldest condense file, as recorded in the CDCT. | Best available restart point as determined by the PowerExchange Logger for MVS, which is one of the following: the oldest restart point for which an archive log is available, or the current active log if there are no available archive logs.

DB2 for i5/OS | Oldest condense file, as recorded in the CDCT. | Oldest journal receiver still attached on the journal receiver chain.

DB2 for Linux, UNIX, and Windows | Oldest PowerExchange Logger for Linux, UNIX, and Windows log file, as recorded in the CDCT. | Current log position at the time the PowerExchange capture catalog was created.

Microsoft SQL Server | Oldest PowerExchange Logger for Linux, UNIX, and Windows log file, as recorded in the CDCT. | Oldest data available in the Publication database.

Oracle | Oldest PowerExchange Logger for Linux, UNIX, and Windows log file, as recorded in the CDCT. | Current Oracle catalog dump.

PowerExchange uses the default restart point only if all sources in a CDC session have null restart tokens. If some sources have non-null restart tokens, PWXPC assigns the oldest restart point from those tokens to any sources for which no restart tokens are specified.

For example, a new CDC session contains the sources A, B, and C. The restart token file contains restart tokens for sources A and B. The restart point for source A is older than that for source B. Source C does not have existing or supplied restart tokens. Because some sources in the CDC session have explicit restart points, PWXPC does not assign null restart tokens to source C. Instead, PWXPC assigns the restart point for source A to source C because this restart point is the oldest one supplied.

Determining the Restart Tokens for Cold Start Processing

When you cold start a CDC session, PWXPC uses the restart token file to determine the restart tokens for all sources. PWXPC ignores any entries in the state tables or state file for the sources in the CDC session.

More specifically, PWXPC uses one of the following methods to determine the restart tokens:

• If the restart token file is empty or does not exist, PWXPC assigns null restart tokens to all sources in the CDC session.

• If the restart token file contains only explicit override statements, PWXPC performs the following processing:

- Assigns the restart tokens in the explicit override statements to the specified sources.

- Assigns the oldest supplied restart point to any sources for which an explicit override statement was not specified.

• If the restart token file contains only the special override statement, PWXPC assigns the restart tokens in the special override statement to all sources.

• If the restart token file contains a special override statement and explicit override statements, PWXPC performs the following processing:

- Assigns the restart tokens in the explicit override statements to the specified sources.

- Assigns the restart tokens in the special override statement to all remaining sources.
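The cold start rules above can be condensed into one small function. The following Python sketch is a hedged model of the decision logic, not PWXPC code: `cold_start_tokens` and `TokenPair` are hypothetical names, restart token pairs are modeled as tuples of integers in which a smaller value means an older restart point, and `None` stands for null restart tokens.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical stand-in for a (sequence token, restart token) pair.
# Assumption: a smaller tuple represents an older restart point.
TokenPair = Tuple[int, int]


def cold_start_tokens(
    sources: Iterable[str],
    explicit: Dict[str, TokenPair],
    special: Optional[TokenPair] = None,
) -> Dict[str, Optional[TokenPair]]:
    """Resolve a restart token pair per source for a cold start."""
    names = list(sources)
    supplied = [explicit[name] for name in names if name in explicit]
    if special is not None:
        fallback = special            # special override covers remaining sources
    elif supplied:
        fallback = min(supplied)      # oldest supplied restart point
    else:
        fallback = None               # empty or missing file: null tokens
    return {name: explicit.get(name, fallback) for name in names}


# Sources A, B, and C, with explicit tokens only for A and B.
resolved = cold_start_tokens(["A", "B", "C"], {"A": (10, 10), "B": (20, 20)})
```

In this usage, source C receives the tokens of source A because the restart point for A is the oldest one supplied, which mirrors the A, B, and C example given earlier for null restart tokens.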

Determining the Restart Tokens for Warm Start Processing

When you warm start a CDC session, PWXPC uses the state tables and state file, in conjunction with the restart token file, to determine the restart tokens for all sources.

More specifically, PWXPC uses one of the following methods to determine the restart tokens:

• If the restart token file is empty or does not exist and there is no matching entry in a state table or state file, PWXPC assigns null restart tokens to all sources in the session.

• If the restart token file is empty or does not exist and if some but not all sources have a matching entry in a state table or a state file, PWXPC performs the following processing:

- Assigns any restart tokens found in a state table and state file to the appropriate sources.

- Assigns the oldest available restart point to all sources that do not have restart tokens.

• If the restart token file is empty or does not exist and if all sources have an entry in a state table or state file, PWXPC uses the restart tokens from the state tables or state file.

• If the restart token file contains explicit override statements and no sources have a matching entry in a state table or state file, PWXPC performs the following processing:

- Assigns the restart tokens in the explicit override statements to the specified sources.

- Assigns the oldest supplied restart point to all sources that do not have restart tokens.

• If the restart token file contains explicit override statements and if some but not all sources have a matching entry in a state table or a state file, PWXPC performs the following processing:

- Assigns the restart tokens in the explicit override statements to the specified sources.

- Assigns restart tokens from a state table or state file to the appropriate sources, provided that the tokens have not been supplied in the restart token file.

- Assigns the oldest available restart point to all sources that do not have restart tokens supplied in the restart token file or from a state table or state file.

• If the restart token file contains explicit override statements and if all sources have an entry in a state table or a state file, PWXPC performs the following processing:

- Assigns the restart tokens in the explicit override statements to the specified sources.

- Assigns the restart tokens from state tables or the state file to all remaining sources that do not have restart tokens supplied in the restart token file.

• If the restart token file contains only the special override statement, PWXPC assigns the restart tokens in the special override statement to all sources.

• If the restart token file contains a special override statement and explicit override statements, PWXPC performs the following processing:

- Assigns the restart tokens in the explicit override statements to the specified sources.

- Assigns the restart tokens in the special override statement to all remaining sources.
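The warm start cases can be modeled as a merge with a fixed precedence: explicit override statements first, then the special override statement if one is present, then saved state, and finally the oldest available restart point. The following Python sketch illustrates that reading of the rules. The name `warm_start_tokens`, the tuple representation of token pairs (smaller means older), and the use of `None` for null restart tokens are assumptions made for the example.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical stand-in for a (sequence token, restart token) pair.
# Assumption: a smaller tuple represents an older restart point.
TokenPair = Tuple[int, int]


def warm_start_tokens(
    sources: Iterable[str],
    state: Dict[str, TokenPair],
    explicit: Dict[str, TokenPair],
    special: Optional[TokenPair] = None,
) -> Dict[str, Optional[TokenPair]]:
    """Resolve a restart token pair per source for a warm start."""
    names = list(sources)
    if special is not None:
        # A special override covers every source that has no explicit
        # override, so saved state is not consulted.
        return {name: explicit.get(name, special) for name in names}
    # Explicit overrides win over tokens saved in a state table or state file.
    known = {name: explicit.get(name, state.get(name)) for name in names}
    available = [pair for pair in known.values() if pair is not None]
    fallback = min(available) if available else None  # oldest available point
    return {
        name: pair if pair is not None else fallback
        for name, pair in known.items()
    }


# A has saved state, B has an explicit override, C has neither.
resolved = warm_start_tokens(["A", "B", "C"], {"A": (5, 5)}, {"B": (3, 3)})
```

In this usage, source C has no tokens of its own and receives the oldest available restart point, which here is the explicit override for source B.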

Group Source Processing in PowerExchange

When you extract change data using PWX CDC application connections, PowerExchange uses group source processing for all source definitions that you include in a single mapping. With group source processing, PowerExchange reads data from the same physical source in a single pass. This processing enhances throughput and reduces resource consumption by eliminating multiple reads of the source data.

When you run a CDC session, PWXPC passes a source interest list that contains all of the sources. PowerExchange uses the source interest list to determine the sources for which to read data from the change stream. When PowerExchange encounters changes for a source in the interest list, it passes the change data to PWXPC. PWXPC then provides the change data to the appropriate source in the mapping.

If you use PWXPC connections for bulk data movement operations, PowerExchange uses group source processing for the following multiple-record, nonrelational data sources:

• IMS unload data sets

• Sequential data sets and flat files

• VSAM data sets

PowerExchange uses group source processing to read all records for a single multi-group source qualifier in a mapping. When you run a bulk data movement session, PWXPC passes PowerExchange the source data map information from the source definition metadata, which includes the data set or file name if available. If PWXPC does not pass the data set or file name, PowerExchange determines it from the PowerExchange data map. PowerExchange reads the data set or file and passes all of the data records to PWXPC. PWXPC then provides the data records to the appropriate source record type in the multi-group source qualifier.

Using Group Source with Nonrelational Sources

PowerExchange can use group source processing for some nonrelational data sources that support multiple record types in a single file.

A single mapping can contain one or more multi-record source definitions and single-record source definitions. If you use PWX NRDB Batch application connections, PWXPC creates a connection to PowerExchange for each source definition in the mapping and reads the source data.

For data sources with multiple record types, the PowerExchange data map defines a record and a table for each unique record type. The table represents the relational view of the related record.

For IMS, VSAM, and sequential or flat file data sources, you can use Designer to import data maps with multiple record types to create PowerCenter source definitions. If you want the source definition to represent only a single record type, import a single table from the data map. If you want the source definition to include all record types, import the data map as a multi-record data map.

To import the data map as a multi-record data map, select Multi-Record Datamaps in the Import from PowerExchange dialog box. If you import a multi-record data map, the source definition has a group for each table in the data map. A group contains metadata for the fields in the table. If you import a single table from a multi-record data map, the source definition has only a single group.

When you run a session that contains a mapping with source definitions for each table in a multi-record data map, PowerExchange reads the data set or file once for each source definition. When you run a session that contains a mapping with a single source definition for all records in a multi-record data map, PowerExchange uses group source processing to read all of the records in the data set or file in a single pass.

For example, if you have a sequential file that contains three different record types, you can create a source definition for each record type. Then create a mapping that contains the three source definitions. When you run a session that contains the mapping, PowerExchange reads the sequential file three times.

Alternatively, if you import the data map as a multi-record data map and create a single multi-record source definition, you can use this multi-record source definition in a mapping. When you run a session that contains this mapping, PowerExchange reads the sequential file one time to extract the data.
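The difference between the two approaches in the sequential file example can be sketched in Python. The sketch is conceptual and does not use any PowerExchange interface: `CountingFile`, `read_per_definition`, and `read_group_source` are hypothetical names, and the record layout, in which the first comma-separated field identifies the record type, is an assumption made for the example.

```python
from typing import Dict, Iterator, List


class CountingFile:
    """Hypothetical data set that counts how many times it is read."""

    def __init__(self, lines: List[str]) -> None:
        self.lines = list(lines)
        self.passes = 0

    def read(self) -> Iterator[str]:
        self.passes += 1
        return iter(self.lines)


def record_type(line: str) -> str:
    # Assumption: the first comma-separated field is the record type.
    return line.split(",", 1)[0]


def read_per_definition(data: CountingFile, types: List[str]) -> Dict[str, List[str]]:
    """One source definition per record type: one pass over the file for each."""
    return {t: [line for line in data.read() if record_type(line) == t] for t in types}


def read_group_source(data: CountingFile, types: List[str]) -> Dict[str, List[str]]:
    """One multi-record source definition: a single pass feeds every group."""
    groups: Dict[str, List[str]] = {t: [] for t in types}
    for line in data.read():
        kind = record_type(line)
        if kind in groups:
            groups[kind].append(line)
    return groups


LINES = ["H,order-1", "D,item-1", "D,item-2", "T,2"]
separate, grouped = CountingFile(LINES), CountingFile(LINES)
per_definition = read_per_definition(separate, ["H", "D", "T"])
single_pass = read_group_source(grouped, ["H", "D", "T"])
```

Both functions return the same records per record type, but `separate.passes` ends at three while `grouped.passes` ends at one, which corresponds to the three reads and the single read described in the example.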

When you import IMS data maps as multi-record data maps, you can use the source definitions only to process IMS unload data sets. You cannot use multi-record IMS source definitions to read all segments from an IMS database in a single pass. To perform bulk data movement operations on IMS databases, create mappings that have a source definition for each segment in the IMS database.

Using Group Source with CDC Sources

When you use PWX CDC application connections to extract change data, PowerExchange automatically uses group source processing and reads the change stream in a single pass for all source definitions in the mapping. All sources in the mapping must be the same data source type and must read the same change stream.

To create source definitions in Designer that can be used to extract change data, import source metadata by using one of the following methods:

• Import a PowerExchange extraction map by using the Import from PowerExchange dialog box.

• Import the table definitions from relational databases, by using either the Import from PowerExchange dialog box or the Import from Database dialog box.

Restriction: To read change data for nonrelational sources, you must import extraction maps from PowerExchange.

Informatica recommends that you use extraction maps to create source definitions for all CDC sources. When you create source definitions from extraction maps, the mapping and session creation process is simpler for the following reasons:

• The source definition contains the extraction map name, which eliminates the need to provide it when you configure the session.

• The source definition contains the PowerExchange-defined CDC columns, which eliminates the need to add these columns to the source definition. The PowerExchange-defined columns include the change indicator and before image columns as well as the DTL__CAPX columns.

When you extract change data, PowerExchange uses group source processing for all source definitions in the mapping. All source definitions must be for the same data source type, such as DB2, IMS, VSAM, or Oracle. Do not include multiple data source types in the mapping. Otherwise, the session fails with message PWXPC_10080.

For example, you cannot run a CDC session that contains a mapping with both VSAM and IMS source definitions, even though the change stream is the same. To extract change data for both IMS and VSAM data sources, create a unique mapping and session for the VSAM sources and a separate, unique mapping and session for the IMS sources. PowerExchange reads the change stream twice, once for the session with VSAM sources and once for the session with IMS sources.


If you create a workflow that contains multiple CDC sessions, PowerExchange uses a connection for each session, even if the sessions extract change data from the same change stream, such as the PowerExchange Logger for MVS.

The following example mapping shows three DB2 sources, for which the source definitions were created from extraction maps:

If you include this mapping in a session that uses a PWX DB2zOS CDC application connection, PowerExchange uses group source processing to read the change stream and extract the changes for all three source tables. PowerExchange extracts change data in chronological order, based on when the UOWs were completed. PowerExchange passes the change data to PWXPC, and PWXPC provides the changes to the appropriate source qualifier.

Note: Because the example mapping uses source definitions created from extraction maps, it cannot be used for bulk data movement operations. However, mappings that use source definitions created from database relational metadata can be used for either change data extraction or bulk data movement.

Commit Processing with PWXPC

The PowerCenter Integration Service, in conjunction with PWXPC, commits data to the target based on commit properties and the commit type. Commit properties specify the commit interval and the number of UOWs or change records that you want to use as a basis for the commit. The commit type determines when the PowerCenter Integration Service commits data to the target.

By default, the Commit Type attribute on the session Properties tab specifies Target, which indicates target-based commit processing. For CDC sessions, the PowerCenter Integration Service always uses source-based commit processing, and PWXPC controls the timing of commit processing. When you run a CDC session that specifies target-based commit processing, the PowerCenter Integration Service automatically changes the commit type to source-based and writes message WRT_8226 in the session log.

PWXPC ignores the Commit Interval attribute. To control commit processing, configure attributes on the PWX CDC Change and Real Time application connections.


RELATED TOPICS:

¨ “Commitment Control Options”

Controlling Commit Processing

To control commit processing, you can specify certain PWX CDC Real Time or Change application connection attributes.

The following table describes the connection attributes that control commit processing:

Connection Attribute: Maximum Rows Per commit
Real Time or Change Connections: Both
Description: Maximum number of change records that PWXPC processes before it flushes the data buffer to commit the change data to the targets. If necessary, PWXPC continues to process change records across UOW boundaries until this maximum rows threshold is met. PWXPC does not wait for a UOW boundary to commit the change data. Default is 0, which means that PWXPC does not use maximum rows.

Connection Attribute: Minimum Rows Per commit
Real Time or Change Connections: Real Time
Description: Minimum number of change records that PowerExchange reads from the change stream before it passes any commit records in the change stream to PWXPC. Before reaching this minimum value, PowerExchange skips commit records and passes only the change records to PWXPC. Default is 0, which means that PowerExchange does not use minimum rows.

Connection Attribute: Real-time Flush Latency in milli-seconds
Real Time or Change Connections: Real Time
Description: Number of milliseconds that must pass before PWXPC flushes the data buffer to commit the change data to the targets. When this latency period expires, PWXPC continues to read the changes in the current UOW until the end of that UOW is reached. Then, PWXPC flushes the data buffer to commit the change data to the targets. Default is 0, which means that PWXPC uses 2,000 milliseconds.

Connection Attribute: UOW Count
Real Time or Change Connections: Both
Description: Number of UOWs that PWXPC processes before it flushes the data buffer to commit the change data to the targets. Default is 1.

You can specify values for all of these commitment control attributes. However, PWXPC commits change data only when one of the following values is met:

¨ Maximum Rows Per commit

¨ Real-time Flush Latency in milli-seconds

¨ UOW Count

If you specify a value for the Minimum Rows Per commit attribute, this threshold must be met before a commit can occur. However, PWXPC flushes the data buffer to commit the change data to the targets only when Maximum Rows Per commit, Real-time Flush Latency in milli-seconds, or UOW Count is met, whichever is met first.

After PWXPC commits the change data, it resets the UOW count, the maximum and minimum rows, and the real-time flush latency timer. PWXPC continues to read change data. Whenever one of the commitment control values is met, PWXPC commits that data to the targets. Commit processing continues until the CDC session is stopped, ends, or terminates abnormally. When the PWXPC CDC reader ends normally, PWXPC issues a final commit to flush all complete, buffered UOWs and their final restart tokens to the targets. Prior to ending, the PWXPC CDC reader writes the following message to the session log:

PWXPC_12075 [INFO] [CDCRestart] Session complete. Next session will restart at: Restart 1 [restart1_token] : Restart 2 [restart2_token]
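If you want to record the final restart point from a session log, the message layout shown above can be matched with a regular expression. The following Python sketch assumes the log text contains the message in exactly that layout. The token values in the sample are invented placeholders, and the function name is hypothetical.

```python
import re

# Matches the PWXPC_12075 layout quoted in the guide. DOTALL tolerates a
# message that wraps across lines in the log file.
RESTART_PATTERN = re.compile(
    r"PWXPC_12075 .*?Restart 1 \[(?P<restart1>[^\]]*)\]\s*:\s*"
    r"Restart 2 \[(?P<restart2>[^\]]*)\]",
    re.DOTALL,
)


def final_restart_tokens(session_log_text):
    """Return (restart1, restart2) from the last PWXPC_12075 message, or None."""
    match = None
    for match in RESTART_PATTERN.finditer(session_log_text):
        pass  # keep only the last match
    if match is None:
        return None
    return match.group("restart1"), match.group("restart2")


sample_log = (
    "Some earlier session log line\n"
    "PWXPC_12075 [INFO] [CDCRestart] Session complete. Next session will "
    "restart at: Restart 1 [AAAA0001] : Restart 2 [BBBB0002]\n"
)
print(final_restart_tokens(sample_log))
```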

Restriction: If you enable the Commit On End Of File attribute on the session Properties tab, duplicate data can occur in the targets because the PowerCenter Integration Service commits any remaining change data in the buffer to the targets. This final commit by the PowerCenter Integration Service occurs after the PWXPC CDC reader has committed all complete UOWs in the buffer, along with their restart tokens, to the targets. As a result, the final restart tokens might represent a point in the change stream that is earlier than the final change data that the PowerCenter Integration Service commits to the targets. To prevent possible duplicate data when you restart CDC sessions, set the Commit Type attribute to Source and disable the Commit On End Of File attribute.

Maximum and Minimum Rows per Commit

The Maximum Rows Per commit attribute controls the size of the UOWs written to the targets. The Minimum Rows Per commit attribute controls the size of the UOWs read from the change stream. You can use these attributes to mitigate the effects of processing very small or very large UOWs.

Maximum Rows per Commit

If you have very large UOWs, you can use the Maximum Rows Per commit attribute to specify the maximum number of change records that PWXPC reads before it commits the change data to the targets. This attribute causes PWXPC to commit change data without waiting for a UOW boundary, which is called a subpacket commit. By using a subpacket commit for large UOWs, you can minimize storage use on the PowerCenter Integration Service machine and lock contention on the target databases.

Warning: Because PWXPC can commit change data to the targets between UOW boundaries, relational integrity (RI) might be compromised. Do not use this connection attribute if you have targets in the CDC session with RI constraints.

Generally, you should use the maximum rows attribute only if you have large UOWs that cannot be processed without impacting either the PowerCenter Integration Service machine or the target databases. For example, if you have an application that makes 100,000 changes before it issues a commit, you can use the maximum rows attribute to commit the change data before PWXPC reads all 100,000 change records. When the maximum rows limit is met, PWXPC flushes the change data from the buffer on the PowerCenter Integration Service machine and commits the data to the targets. After the commit processing, the RDBMS can release the locks in the target databases for these change records and the PowerCenter Integration Service can reuse the buffer space for new change records.

Minimum Rows per Commit

If your change data has many small UOWs, you can use the Minimum Rows Per commit attribute to create larger UOWs of a more uniform size. Use this attribute to specify the minimum number of change records that PowerExchange must pass to PWXPC before passing a commit record. Until the minimum rows value is met, PowerExchange discards any commit records that it reads from the change stream and passes only change records to PWXPC. After the minimum rows limit is met, PowerExchange passes the next commit record to PWXPC and then resets the minimum rows counter.

Online transactions that run in transaction control systems such as CICS and IMS often commit after making only a few changes, which results in many small UOWs in the change stream. PowerExchange and PWXPC can process fewer, larger UOWs more efficiently than many small UOWs. Therefore, if you use the minimum rows limit to increase the size of UOWs, you can improve CDC processing efficiency.

A minimum rows limit does not impact the relational integrity of the change data because PowerExchange does not create new commit points in the change stream data. PowerExchange simply skips some of the original commit records in the change stream.


Target Latency

Target latency is the total time that PWXPC uses to extract change data from the change stream and that the PowerCenter Integration Service uses to apply that data to the targets. If this processing occurs quickly, target latency is low.

The values you select for the commitment control attributes affect target latency. You must balance target latency requirements with resource consumption on the PowerCenter Integration Service machine and the target databases.

Lower target latency results in higher resource consumption because the PowerCenter Integration Service must flush the change data more frequently and the target databases must process more commit requests.

You can affect target latency by setting the commit control attributes.

The following default values can result in the lowest latency:

¨ 0 for Maximum Rows Per commit, which disables this option

¨ 0 for Minimum Rows Per commit, which disables this option

¨ 0 for Real-time Flush Latency in milli-seconds, which is equivalent to 2000 milliseconds or 2 seconds

¨ 1 for UOW Count

These values can decrease target latency because PWXPC commits changes after each UOW, or on UOW boundaries. However, these values also cause the highest resource consumption on the source system, the PowerCenter Integration Service machine, and the target databases. Alternatively, these values might decrease throughput because change data flushes too frequently for the PowerCenter Integration Service or the target databases to handle.

To lower resource consumption and potentially increase throughput for CDC sessions, specify a value greater than the default value for only one of the following attributes:

¨ Maximum Rows Per commit

¨ UOW Count

¨ Real-time Flush Latency in milli-seconds

Disable the unused attributes.

Examples of Commit Processing

The following examples show how the commitment control attributes affect commit processing with PWXPC.

Subpacket Commit and UOW Count - Example

This example uses the Maximum Rows Per commit and UOW Count attributes to control commit processing. The change data is composed of UOWs of the same size. Each UOW contains 1,000 change records. The commitment control attributes have the following values:

¨ 300 for Maximum Rows Per commit

¨ 0 for Minimum Rows Per commit, which disables this attribute

¨ 0 for Real-time Flush Latency in milli-seconds, which is equivalent to 2 seconds

¨ 1 for UOW Count

Based on the maximum rows value, PWXPC flushes the data buffer after reading the first 300 records in a UOW. This action commits the change data to the targets. PWXPC continues to commit change data to the targets every 300 records.


PWXPC commits on UOW boundaries only for the UOW count and real-time flush latency interval. If the real-time flush latency interval expires before PWXPC reads 300 change records, PWXPC still commits based on the maximum rows value because that threshold is met before a UOW boundary occurs.

When the end of the UOW is read, PWXPC commits the change data because the UOW Count value is 1. PWXPC resets the UOW and maximum row counters and the real-time flush latency timer each time it commits. Because all of the UOWs have the same number of change records, PWXPC continues to read change data and to commit the data to the targets at the same points in each UOW.

In this example, PWXPC commits change data at the following points:

¨ 300 change records based on the maximum rows value

¨ 600 change records based on the maximum rows value

¨ 900 change records based on the maximum rows value

¨ 1,000 change records based on the UOW count value
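The commit points in this example can be reproduced with a small simulation. The following Python sketch models only the two counters described above (maximum rows and UOW count) and ignores the flush latency timer. It is an illustration of the documented rules, not PWXPC code, and the function name is hypothetical.

```python
def commit_points(uow_sizes, max_rows, uow_count):
    """Return (records_read, reason) for every simulated commit.

    uow_sizes: number of change records in each UOW, in stream order.
    max_rows:  Maximum Rows Per commit (0 disables the threshold).
    uow_count: UOW Count.
    """
    commits = []
    records_read = 0
    rows_since_commit = 0
    uows_since_commit = 0
    for size in uow_sizes:
        for _ in range(size):
            records_read += 1
            rows_since_commit += 1
            # Subpacket commit: does not wait for a UOW boundary.
            if max_rows > 0 and rows_since_commit >= max_rows:
                commits.append((records_read, "maximum rows"))
                rows_since_commit = 0
                uows_since_commit = 0
        # End of UOW reached.
        uows_since_commit += 1
        if uows_since_commit >= uow_count:
            if rows_since_commit > 0:
                commits.append((records_read, "UOW count"))
            rows_since_commit = 0
            uows_since_commit = 0
    return commits


for point in commit_points([1000], max_rows=300, uow_count=1):
    print(point)
```

Both counters are reset after every commit, which is why the pattern repeats at the same offsets in each later UOW.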

UOW Count and Time-Based Commits - Example

This example uses the UOW Count and Real-time Flush Latency in milli-seconds attributes to control commit processing. The change data consists of UOWs of varying sizes. The commitment control attributes have the following values:

¨ 0 for Maximum Rows Per commit, which disables this attribute

¨ 0 for Minimum Rows Per commit, which disables this attribute

¨ 5000 for Real-time Flush Latency in milli-seconds, which is equivalent to 5 seconds

¨ 1000 for UOW Count

Initially, PWXPC reads 900 complete UOWs in 5 seconds. Because the real-time flush latency interval has expired, PWXPC flushes the data buffer to commit the change data to the targets. PWXPC then resets both the UOW counter and real-time flush latency timer. When PWXPC reaches UOW 1,000, PWXPC does not commit change data to the targets because the UOW counter was reset to 0 after the last commit.

PWXPC reads the next 1,000 UOWs in 4 seconds, which is less than the real-time flush latency interval. PWXPC commits this change data to the targets because the UOW count has been met. After this commit, PWXPC resets the real-time flush latency timer and the UOW counter.

PWXPC continues to read change data and commit the data to the targets, based on the UOW count or the real-time flush latency interval, whichever limit is met first.

In this example, PWXPC commits change data at the following points:

¨ After UOW 900 because the real-time flush latency timer expired first

¨ After UOW 1,900 because the UOW count matched first during the second commit cycle
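The interplay of the UOW counter and the flush latency timer can be sketched in the same way. The following Python sketch takes the completion time of each UOW in milliseconds and applies the two rules at UOW boundaries. It assumes evenly spaced UOWs to recreate the example. Treating any negative latency as "disabled" is an assumption (the guide mentions only -1), and the function names are hypothetical.

```python
def effective_latency_ms(configured):
    """Map the configured flush latency to the value that is applied."""
    if configured == 0:
        return 2000  # documented default: 0 means 2,000 milliseconds
    if configured < 0:
        return None  # assumption: negative values disable the timer
    return configured


def commit_points(uow_end_times_ms, uow_count, flush_latency_ms):
    """Return (uow_number, reason) for every simulated commit."""
    latency = effective_latency_ms(flush_latency_ms)
    commits = []
    uows_since_commit = 0
    last_commit_ms = 0
    for uow_number, end_ms in enumerate(uow_end_times_ms, start=1):
        uows_since_commit += 1
        if uows_since_commit >= uow_count:
            reason = "UOW count"
        elif latency is not None and end_ms - last_commit_ms >= latency:
            reason = "flush latency"
        else:
            continue
        commits.append((uow_number, reason))
        uows_since_commit = 0      # both the counter and the timer
        last_commit_ms = end_ms    # restart after each commit
    return commits


# 900 UOWs in the first 5 seconds, then 1,000 UOWs in the next 4 seconds.
times = [i * 5000 // 900 for i in range(1, 901)]
times += [5000 + 4 * j for j in range(1, 1001)]
print(commit_points(times, uow_count=1000, flush_latency_ms=5000))
```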

Minimum Rows and UOW Count - Example

This example uses the Minimum Rows Per commit and UOW Count attributes to control commit processing. The change data consists of UOWs of the same size. Each UOW contains ten change records. The commitment control attributes have the following values:

¨ 0 for Maximum Rows Per commit, which disables this attribute

¨ 100 for Minimum Rows Per commit

¨ -1 for Real-time Flush Latency in milli-seconds, which disables this attribute

¨ 10 for UOW Count


PWXPC passes the minimum rows value to PowerExchange and requests change data from the change stream. Because the minimum rows value is 100, PowerExchange skips the commit records of the first nine UOWs. When PowerExchange reads the last change record in the tenth UOW, the minimum rows limit is met. So, PowerExchange passes the commit record for the tenth UOW to PWXPC and resets the minimum rows counter. PWXPC increases the UOW counter to one.

PowerExchange and PWXPC continue to read the change data until the UOW counter is 10. At this point, PWXPC flushes the data buffer to commit the change data to the targets and resets the UOW counter.

In this example, PWXPC commits change data after every 1,000 change records. Each commit covers 100 source UOWs of 10 change records each. Because of the minimum rows value, PowerExchange presents these to PWXPC as 10 UOWs of 100 change records each, and the UOW Count is 10.
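The two stages of this example, commit-record skipping in PowerExchange followed by UOW counting in PWXPC, can be sketched as two small Python generators and functions. The stream representation (the strings "CHANGE" and "COMMIT") and all function names are hypothetical and chosen only for illustration.

```python
COMMIT = "COMMIT"
CHANGE = "CHANGE"


def source_stream(uows, records_per_uow):
    """Yield a change stream of equally sized UOWs."""
    for _ in range(uows):
        for _ in range(records_per_uow):
            yield CHANGE
        yield COMMIT


def skip_small_commits(stream, min_rows):
    """Drop commit records until at least min_rows changes have passed."""
    rows = 0
    for item in stream:
        if item == COMMIT:
            if rows >= min_rows:
                rows = 0
                yield item
        else:
            rows += 1
            yield item


def commit_points(stream, uow_count):
    """Return the record counts at which the reader would commit."""
    points = []
    records = 0
    uows = 0
    for item in stream:
        if item == COMMIT:
            uows += 1
            if uows >= uow_count:
                points.append(records)
                uows = 0
        else:
            records += 1
    return points


stream = skip_small_commits(source_stream(200, 10), min_rows=100)
print(commit_points(stream, uow_count=10))
```

No new commit points are created: every commit record that survives the filter is one of the original commit records in the stream.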

Offload Processing

You can use CDC offload processing and multithreaded processing to improve performance and efficiency of real-time CDC sessions.

You can use CDC offload processing to distribute processing to the PowerCenter Integration Service machine running the extraction, which reduces processing on the source system. You can also use CDC offload processing to copy change data to a remote system by using the PowerExchange Logger for Linux, UNIX, and Windows.

You can use multithreaded processing to increase parallelism on the PowerCenter Integration Service machines.

CDC Offload Processing

When you extract change data, PowerExchange maps the captured data to the columns in the extraction map. PowerExchange also performs any data manipulation operations that you defined in the extraction map, such as populating change-indicator and before-image columns or running expressions. This column-level processing of change data occurs in the PowerExchange Listener and can be CPU-intensive.

By default, PowerExchange performs column-level processing on the system on which the changes are captured. For MVS, DB2 for i5/OS, and Oracle sources, PowerExchange also runs the UOW Cleanser to reconstruct complete UOWs from the change data in the change stream on the system.

To reduce the overhead of column-level and UOW Cleanser processing, you can use CDC offload processing. CDC offload processing moves the column-level and UOW Cleanser processing to the PowerCenter Integration Service machine running the extraction. CDC offload processing can also be used by the PowerExchange Logger for Linux, UNIX, and Windows to copy change data to PowerExchange Logger log files on a remote system. You can then extract the change data from the remote system rather than the original source system.

Use CDC offload processing to help increase concurrency and throughput and decrease costs in the following situations:

¨ You have insufficient resources on the machine where the change data resides to run the number of concurrent extraction sessions you require.

¨ You have insufficient resources on the machine where the change data resides to provide the necessary throughput you require.

¨ You have spare cycles on the PowerCenter Integration Service machine and those cycles are cheaper than the cycles on the machine on which the changes are captured.


Multithreaded Processing

If you use CDC offload processing for change data extractions, you can also use multithreaded processing, which might help improve throughput even more. By default, PowerExchange performs column-level processing on the change stream as a single thread. If you use multithreaded processing, PowerExchange might be able to extract changes faster and more efficiently by processing more than one UOW simultaneously.

PowerExchange multithreaded processing splits a UOW into multiple threads on the PowerCenter Integration Service machine. After the column-level processing completes, PowerExchange merges the threads and passes the UOW to the PWXPC CDC reader for processing. Multithreaded processing works most efficiently when PowerExchange on the source machine is supplying data fast enough to take full advantage of the multiple threads on the PowerCenter Integration Service machine. If PowerExchange completely utilizes a single processor on the PowerCenter Integration Service machine, then multithreaded processing may provide increased throughput.
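The split-and-merge pattern described above can be illustrated with a thread pool. The following Python sketch is a generic illustration of the idea and says nothing about how PowerExchange implements it. The `map_columns` function is a hypothetical stand-in for column-level processing, and the record layout is invented.

```python
from concurrent.futures import ThreadPoolExecutor


def map_columns(record):
    """Hypothetical column-level processing: trim string values."""
    return {
        name: value.strip() if isinstance(value, str) else value
        for name, value in record.items()
    }


def process_uow(records, workers=4):
    """Process the records of one UOW on several threads.

    Executor.map returns results in input order, so the merged UOW keeps
    the original order of its change records.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(map_columns, records))


uow = [{"ID": i, "NAME": "  name%d  " % i} for i in range(8)]
print(process_uow(uow)[:2])
```

The merge step matters because the reader still expects each UOW as one ordered unit, regardless of how many threads worked on it.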


CHAPTER 8

Extracting Change Data

This chapter includes the following topics:

¨ Overview of Extracting Change Data

¨ Task Flow for Extracting Change Data

¨ Testing a Change Data Extraction

¨ Configuring PowerCenter CDC Sessions

¨ Creating Restart Tokens for Extractions

¨ Displaying Restart Tokens

¨ Configuring the Restart Token File

Overview of Extracting Change Data

Use PowerExchange in conjunction with PWXPC and PowerCenter to extract captured change data and write the data to one or more targets. To extract change data that PowerExchange captures, you must import metadata for the CDC sources and the targets of the change data in Designer. After creating the source and target definitions in Designer, you must create a mapping and then an application connection, session, and workflow in Workflow Manager. You can create multiple mappings, sessions, and workflows based on the same source and target definitions, if appropriate.

For relational data sources, you can import the metadata from either database definitions or PowerExchange extraction maps. For nonrelational sources, you must import PowerExchange extraction maps.

Tip: Informatica recommends that you import the metadata from PowerExchange extraction maps instead of from database definitions. When you import extraction maps, the source definition contains all of the PowerExchange-generated CDC columns, such as the before image (BI) and change indicator (CI) columns. Additionally, PWXPC derives the extraction map name from the source definition so you do not need to code the extraction map name for each source in the session properties.

Before starting a CDC session, you should create restart tokens to define an extraction start point in the change stream. Restart tokens might also be required for resuming extraction processing in a recovery scenario.

To stop a CDC session using real-time extraction mode based on certain user-defined events, you can configure event table processing. Also, you can offload column-level extraction processing and any UOW Cleanser processing from the source system to the following remote locations:

¨ PowerCenter Integration Service machine

¨ A remote machine where the PowerExchange Logger for Linux, UNIX, and Windows runs

If you use offload processing with real-time extractions, you can also use multithreaded processing.


Task Flow for Extracting Change Data

Perform the following tasks in the PowerExchange Navigator, PowerCenter Designer, and PowerCenter Workflow Manager to configure and start extraction processing.

Before you begin, complete configuration of the data source and PowerExchange for CDC, and create capture registrations in the PowerExchange Navigator.

1. Edit the extraction map if necessary.

You can make the following changes:

¨ Deselect any column for which you do not want to extract the change data. PowerExchange still captures change data for these columns.

¨ Add change indicator (CI) and before image (BI) columns.

2. To test the extraction map, perform a database row test on the extraction map in PowerExchange Navigator.

3. In Designer, import metadata for the sources and targets.

4. In Designer, configure a mapping to extract and process change data.

5. In Workflow Manager, configure a connection and session.

6. Create restart tokens for the CDC session.

7. Configure the restart token file.

8. If you want to stop extraction processing based on certain events, implement event table processing.

9. If you want to offload column-level extraction processing and UOW Cleanser processing from the source system to the PowerCenter Integration Service machine or PowerExchange Logger for Linux, UNIX, and Windows machine, configure offload processing. For real-time extractions, you can also configure multithreaded processing.

10. Start the CDC session.

RELATED TOPICS:

¨ “Creating Restart Tokens for Extractions”

¨ “Starting PowerCenter CDC Sessions”

¨ “Monitoring and Tuning Options”

¨ “Testing a Change Data Extraction”

Testing a Change Data Extraction

Perform a database row test in the PowerExchange Navigator to ensure that PowerExchange can retrieve data when the extraction map is used in a CDC session.

A database row test verifies that:

¨ PowerExchange has captured change data for a data source defined in a capture registration.

¨ PowerExchange Condense or the PowerExchange Logger for Linux, UNIX, and Windows has captured change data for a capture registration, if applicable.

¨ The extraction map properly maps the captured change data.


To test change data extraction:

1. In the Resource Explorer of the PowerExchange Navigator, open the extraction group that includes theextraction map that you want to test.

2. Open the extraction map.

3. Select the extraction map and click File > Database Row Test.

4. In the Database Row Test dialog box, enter or edit the following information:

Field: DB Type
Description: An extraction mode indicator:
- CAPXRT. Real-time extraction mode or continuous extraction mode.
- CAPX. Batch extraction mode.

Field: Location
Description: Node name for the location of the system on which the captured change data resides. This name must be defined in a NODE statement in the dbmover.cfg file on the Windows machine from which you run the database row test.

Field: UserID and Password
Description: Optionally, a user ID and password that provides access to the source change data.

Field: Application Name
Description: At least one character to represent the application name. For a row test, a unique application name is not required. PowerExchange does not retain the value that you specify.

Field: SQL Statement
Description: A SQL SELECT statement that PowerExchange generates for the fields in the extraction map. You can edit this statement, if necessary.
In the statement, a table is identified in the following format:
Schema.RegName_TableName
Where:
- Schema is the schema for the extraction map.
- RegName is the name of the capture registration that corresponds to the extraction map.
- TableName is the table name of the data source.

Note: If you enter CAPX in the DB Type field, you can only extract change data after PowerExchange Condense or the PowerExchange Logger for Linux, UNIX, and Windows has closed at least one condense or log file. Otherwise, PowerExchange displays no data in PowerExchange Navigator and writes the PWX-04520 message in the PowerExchange message log on the extraction system. PowerExchange also writes this message if no change data for the data source has been captured, condensed, or logged.

5. Click Advanced.

6. In the CAPX Advanced Parameters or CAPXRT Advanced Parameters dialog box, enter information,including the following:

¨ If you use continuous extraction mode, enter the CAPX CAPI_CONNECTION name in the CAPI Connection Name field.

¨ If you use the PowerExchange Logger for Linux, UNIX, and Windows to offload change data to a system remote from the system on which it was captured, enter the location of the extraction maps in the Location field.

7. Click OK.

8. Click Go.

The database row test returns each change from the extraction start point by column. The results include the PowerExchange-defined CDC columns (the DTL__ columns), which provide information such as the change type, change timestamp, and user ID of the user who made the change.
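The Schema.RegName_TableName format described for the SQL Statement field in step 4 can be assembled with a short helper when you edit the generated statement by hand. The following Python sketch is hypothetical: the schema, registration, table, and column names are invented, and the exact text of the statement that the PowerExchange Navigator generates is not reproduced here.

```python
def extraction_table_name(schema, reg_name, table_name):
    """Build the table identifier in the Schema.RegName_TableName format."""
    return "%s.%s_%s" % (schema, reg_name, table_name)


def row_test_select(schema, reg_name, table_name, columns=None):
    """Build a simple SELECT over the extraction map table.

    The statement shape is an assumption for illustration; the Navigator
    generates its own statement for the fields in the extraction map.
    """
    column_list = ", ".join(columns) if columns else "*"
    table = extraction_table_name(schema, reg_name, table_name)
    return "SELECT %s FROM %s" % (column_list, table)


print(row_test_select("myschema", "custreg", "CUSTOMER", ["CUST_ID", "NAME"]))
```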


Configuring PowerCenter CDC Sessions

After you import metadata for CDC data sources and targets into PowerCenter, you can create a mapping and a CDC session to extract change data. Before running CDC sessions, you must configure numerous session and connection attributes.

Changing Default Values for Session and Connection Attributes

Certain PowerCenter session and application connection attributes have default values that are only appropriate for bulk data movement. You must change the values of these attributes for CDC sessions.

The following table summarizes these attributes and their recommended values:

Attribute Name: Commit Type
Attribute Location: Properties Tab
Recommended Value: Source
Description: Default is Target. The PowerCenter Integration Service automatically overrides it to Source. However, you cannot disable Commit On End Of File unless you change Commit Type to Source.

Attribute Name: Commit On End Of File
Attribute Location: Properties Tab
Recommended Value: Disabled
Description: Default is enabled. The PowerCenter Integration Service performs a commit when the session ends. This commit occurs after PWXPC commits the restart tokens, which can cause an out-of-sync condition between the restart tokens and the target data. As a result, duplicate data can occur when CDC sessions restart.

Attribute Name: Recovery Strategy
Attribute Location: Properties Tab
Recommended Value: Resume from last checkpoint
Description: Default value is Fail task and continue workflow. To properly restart CDC sessions, PowerExchange CDC and PWXPC require that this option is set to Resume from last checkpoint.

Attribute Name: Stop on errors
Attribute Location: Config Object Tab
Recommended Value: 1
Description: Default value is 0. By default, the PowerCenter Integration Service does not consider errors when writing to targets as fatal. The following types of error are non-fatal:
- Key constraint violations
- Loading nulls into a not null field
- Database trigger responses
If write errors occur, you might experience change data loss because PWXPC has advanced the restart token values. To maintain target data and restart token integrity, you must set this option to 1.

Attribute Name: Application Name
Attribute Location: Application Connection
Recommended Value: Code a unique name for each CDC session.
Description: Default is the first 20 characters of the WorkFlow Name. Warning: The default might not result in a unique name.

Attribute Name: RestartToken File Folder
Attribute Location: Application Connection
Recommended Value: Default value
Description: Use the default value of $PMRootDir/Restart, which PWXPC creates if it does not exist.

Attribute Name: RestartToken File Name
Attribute Location: Application Connection
Recommended Value: Code a unique name for each CDC session.
Description: If no value is entered for Application Name, the default is the workflow name. Otherwise, the value for Application Name is used. Warning: The default may not result in a unique name.

Attribute Name: Number of Runs to Keep RestartToken File
Attribute Location: Application Connection
Recommended Value: 1 or higher
Description: Default is 0. PWXPC keeps only one backup copy of the restart token initialization and termination files. Specify a value greater than 0 so a history is available for recovery purposes.
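The recommended values in this table lend themselves to a simple checklist. The following Python sketch compares a dictionary of session settings against the recommendations. The dictionary layout and the function name are hypothetical. PowerCenter does not expose session settings in this form, so you would have to populate the dictionary yourself.

```python
# Recommended values taken from the table above.
RECOMMENDED = {
    "Commit Type": "Source",
    "Commit On End Of File": "Disabled",
    "Recovery Strategy": "Resume from last checkpoint",
    "Stop on errors": 1,
}

RUNS_TO_KEEP = "Number of Runs to Keep RestartToken File"


def audit_session(settings):
    """Return a list of deviations from the recommended CDC settings."""
    problems = []
    for name, expected in RECOMMENDED.items():
        actual = settings.get(name)
        if actual != expected:
            problems.append("%s: expected %r, found %r" % (name, expected, actual))
    if settings.get(RUNS_TO_KEEP, 0) < 1:
        problems.append("%s: expected 1 or higher" % RUNS_TO_KEEP)
    return problems


# The bulk data movement defaults listed in the table.
defaults = {
    "Commit Type": "Target",
    "Commit On End Of File": "Enabled",
    "Recovery Strategy": "Fail task and continue workflow",
    "Stop on errors": 0,
    RUNS_TO_KEEP: 0,
}
for problem in audit_session(defaults):
    print(problem)
```

The checklist does not cover the uniqueness of the Application Name and RestartToken File Name values, which has to be checked across all CDC sessions.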

Configuring Application Connection Attributes

To extract change data, you must configure certain application connection attributes. For a complete list of all PWX CDC application connection attributes, see PowerExchange Interfaces for PowerCenter.

Image Type

For update operations, use the Image Type attribute to configure the format of the change data that a CDC session extracts.

Select one of the following options for the Image Type attribute:

¨ AI. After images only.

¨ BA. Before and after images.

Default is BA.

If you select BA for the Image Type attribute, PowerExchange provides the before-image (BI) and after-image (AI) data for the updated row as separate SQL operations:

¨ A DELETE with the before-image data

¨ An INSERT with the after-image data

Note: To select BA with batch or continuous extraction mode, you must configure PowerExchange Condense or the PowerExchange Logger for Linux, UNIX, and Windows to log before and after images. Otherwise, you can only select after images.

If you select AI for the Image Type attribute, PowerExchange provides the after-image data for the updated row as a SQL UPDATE operation.

You can also configure one or more data columns in an extraction map with before-image (BI) columns. Use the PowerExchange Navigator to update the extraction map with before-image columns, which adds columns to the extraction map with the name of DTL__BI_columnname. If you use BI columns, select AI for the Image Type attribute. PowerExchange then includes before-image data in any BI columns, along with the after-image data, in a single SQL UPDATE operation.

When you configure BI columns, you can make decisions about UPDATE operations in a mapping because the before and after-image data is contained in a single record. For example, you can use BI columns to handle update operations that change the value of a key column of a row. Some relational databases, such as DB2 for z/OS, allow update operations to key columns. The RDBMS understands that this operation is equivalent to deleting the row and then re-adding it with a new primary key and logs the change as an update.

If you select AI for the Image Type attribute, PowerExchange provides these changes as an UPDATE operation. Because some relational databases do not allow updates to primary key columns, you cannot apply these changes as updates. If you configure BI columns for key columns, you can then use the Flexible Key Custom transformation to change any UPDATE operations for key columns into a DELETE operation followed by an INSERT operation.
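The key-change handling described above can be pictured with a short sketch. It is not the Flexible Key Custom transformation itself. It only illustrates the decision that BI columns make possible, assuming a change record arrives as a dictionary in which before-image values use the DTL__BI_ prefix. The function name, the record layout, and the sample column names are hypothetical.

```python
from typing import Dict, List, Tuple

BI_PREFIX = "DTL__BI_"  # prefix of before-image columns in the extraction map


def split_key_update(row: Dict[str, object],
                     key_columns: List[str]) -> List[Tuple[str, Dict[str, object]]]:
    """Turn one UPDATE record into DELETE + INSERT when a key column changed.

    Hypothetical helper: 'row' holds after-image values under the column
    names and before-image values under DTL__BI_<column name>.
    """
    # After-image data is everything that is not a BI column.
    after = {c: v for c, v in row.items() if not c.startswith(BI_PREFIX)}
    key_changed = any(
        BI_PREFIX + k in row and row[BI_PREFIX + k] != row[k]
        for k in key_columns
    )
    if not key_changed:
        return [("UPDATE", after)]
    # The DELETE must address the row by its old key values.
    old_key = {k: row.get(BI_PREFIX + k, row[k]) for k in key_columns}
    return [("DELETE", old_key), ("INSERT", after)]
```

For a record whose CUST_ID changed from 10 to 20, the sketch yields a DELETE for key 10 followed by an INSERT of the after-image row. A record whose key is unchanged passes through as a single UPDATE.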

Event Table Processing

You can use event table processing to stop the extraction of changes based on user-defined events, such as an end-of-day event. For example, to stop an extraction process every night, after all of the changes for the day have been processed, write a change to the event table at midnight. This change triggers PowerExchange to stop reading change data and shut down the extraction process after the current UOW completes.

Event table processing has the following rules and guidelines:

¨ You can use event table processing only with real-time or continuous extraction modes.

¨ You must create the event table, and define the applications that can update the table.

¨ You must register the event table for change data capture from the PowerExchange Navigator.

¨ A CDC session monitors a single event table. Each user-defined event requires its own event table and a separate extraction process.

¨ The event table and all of the source tables in the CDC session must be of the same source type.

To implement event table processing:

1. Create an event table.

The event table must be of the same source type and on the same machine as the change data that is extracted. For example, if you extract DB2 change data on MVS, the event table must be a DB2 table in the same DB2 subsystem as the DB2 source tables for the extraction.

2. In the PowerExchange Navigator, create a capture registration and extraction map for the event table.

When you create a capture registration, the PowerExchange Navigator generates an extraction map.

3. In PowerCenter, create a CDC session, and specify the extraction map name in the Event Table attribute on the PWX CDC Real Time application connection.

4. When the defined event occurs, update the event table.

When PowerExchange reads the update to the event table, PowerExchange places an end-of-file (EOF) into the change stream. PWXPC processes the EOF, passes it to the PowerCenter Integration Service, and then shuts down the PowerExchange reader. The PowerCenter Integration Service completes writing all of the data currently in the pipeline to the targets and then ends the CDC session.

CAPI Connection Name Override

PowerExchange allows a maximum of eight CAPI_CONNECTION statements in the DBMOVER configuration file. You can use multiple CAPI_CONNECTION statements to extract changes from more than one data source type with a single PowerExchange Listener on a single machine. For example, you can extract changes for Oracle and DB2 for Linux, UNIX, and Windows through a single PowerExchange Listener by specifying multiple CAPI_CONNECTION statements in the dbmover.cfg file.

To specify the CAPI_CONNECTION statement that PowerExchange uses to extract change data in a CDC session, code the name in the CAPI Connection Name Override attribute.

You must code CAPI_CONNECTION statements on the system where the change data resides so that PowerExchange can extract change data for a data source type. If you use CDC offload processing, you must also code the CAPI_CONNECTION statements in the dbmover.cfg file on the PowerCenter Integration Service machine.

Idle Time

To indicate whether a real-time or continuous extraction mode CDC session should run continuously or shut down after reaching the end-of-log (EOL), use the Idle Time attribute.

Enter one of the following values for the Idle Time attribute:

¨ -1. The CDC session runs continuously. PowerExchange returns end-of-file (EOF) only when the CDC session is manually stopped.

¨ 0. After reaching EOL, PowerExchange returns EOF and the CDC session ends.

¨ n. After reaching EOL, PowerExchange waits for n seconds and, if no new change data of interest arrives, the CDC session ends. Otherwise, the CDC session continues until PowerExchange waits for n seconds without reading new change data of interest.

Default is -1.

PowerExchange determines the EOL by using the current end of the change stream at the point that PowerExchange started to read the change stream. PowerExchange uses the concept of EOL because the change stream is generally not static, and so the actual end-of-log is continually moving forward. After PowerExchange reaches EOL, it writes the PWX-09967 message in the PowerExchange message log.

Typically, real-time and continuous extraction mode CDC sessions use the default value of -1 for the Idle Time attribute. If necessary, you can manually stop a never-ending CDC session by using the PowerCenter Workflow Monitor, pmcmd commands, or the PowerExchange STOPTASK command.

Alternatively, you can set the Idle Time attribute to 0. After PowerExchange reaches EOL, it returns an EOF to PWXPC. PWXPC and the PowerCenter Integration Service then perform the following processing:

1. PWXPC flushes all buffered UOWs and the ending restart tokens to the targets.

2. The CDC reader ends.

3. After the PowerCenter Integration Service finishes writing the flushed data to the targets, the writer ends.

4. After any post-session commands and tasks execute, the CDC session ends.

If you set the Idle Time attribute to a positive number, the following processing occurs:

1. PowerExchange reads the change stream until it reaches EOL, and then timing for the idle time begins.

2. If more data is in the change stream after EOL, PowerExchange continues to read the change stream, looking for change data of interest to the CDC session, as follows:

¨ If the idle time expires before PowerExchange reads a change record of interest for the CDC session, PowerExchange stops reading the change stream.

¨ If PowerExchange reads a change record of interest to the CDC session, PowerExchange restarts the timer, passes the change data to PWXPC, and continues to read the change stream. This processing continues until the idle time expires.

3. After the idle time expires, PowerExchange passes an EOF to PWXPC.

4. PWXPC and the PowerCenter Integration Service perform the same processing as when the Idle Time attribute is set to 0, and the CDC session ends.

If you set the Idle Time attribute to a low value, the CDC session might end before all available change data in the change stream has been read. If you want a CDC session to end periodically, Informatica recommends that you set the Idle Time attribute to 0 because active systems are rarely idle.

When a CDC session ends because either the idle time value has been reached or a PowerExchange STOPTASK command has been issued, PWXPC writes the following message in the session log:
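The three Idle Time settings can be compared with a small model of the timer. This is a sketch of the behavior as described above, not PowerExchange code. The function name and the use of plain numeric timestamps in seconds are assumptions made for illustration.

```python
from typing import Iterable, Optional


def eof_time(idle_time: int, eol_time: float,
             interest_times: Iterable[float]) -> Optional[float]:
    """Return the time at which EOF would be returned, or None if never.

    idle_time      -- -1, 0, or a positive number of seconds
    eol_time       -- time at which the reader reaches EOL
    interest_times -- arrival times of change records of interest
    """
    if idle_time < -1:
        raise ValueError("idle_time must be -1, 0, or a positive number")
    if idle_time == -1:
        return None              # runs until it is stopped manually
    if idle_time == 0:
        return eol_time          # EOF as soon as EOL is reached
    deadline = eol_time + idle_time   # timing starts at EOL
    for t in sorted(interest_times):
        if t <= eol_time:
            continue             # read before EOL, timer not yet running
        if t >= deadline:
            break                # idle time expired before this record
        deadline = t + idle_time # a record of interest restarts the timer
    return deadline
```

With an idle time of 30, EOL at second 100, and records of interest at seconds 110 and 135, the model ends the session at second 165. On a busy source the deadline keeps moving, which is consistent with the recommendation to use 0 when a session must end periodically.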

[PWXPC_10072] [INFO] [CDCDispatcher] session ended after waiting for [idle_time] seconds. Idle Time limit is reached

If you stop a never-ending CDC session with the PowerExchange STOPTASK command, PWXPC substitutes 86400 for the idle_time variable in the PWXPC_10072 message.

Note: If you specify values for the Reader Time Limit and Idle Time attributes, the PowerCenter Integration Service stops reading data from the source when the first one of these terminating conditions is reached. Because the reader time limit does not result in normal termination of a CDC session, Informatica recommends that you use only the idle time limit.

Restart Control Options

PWXPC uses the restart information to tell PowerExchange from which point to start reading the captured change data. To specify restart information, PWXPC provides options that you must configure for each CDC session.

The following table describes the restart attributes you must configure for CDC sessions:

Connection Attribute: Application Name
Description: Application name for the CDC session. Specify a unique name for each CDC session. The application name is case sensitive and cannot exceed 20 characters. Default is the first 20 characters of the workflow name.

Connection Attribute: RestartToken File Folder
Description: Directory name on the PowerCenter Integration Service machine that contains the restart token override file. Default is $PMRootDir/Restart.

Connection Attribute: RestartToken File Name
Description: File name in the RestartToken File Folder that contains the restart token override file. PWXPC uses the contents of this file, if any, in conjunction with the state information to determine the restart point for the CDC session. Default is the Application Name, if specified, or the workflow name, if Application Name is not specified.

Informatica recommends that you specify a value for the Application Name attribute, because the default value might not result in a unique name. The values for the Application Name and RestartToken File Name attributes must be unique for every CDC session. Non-unique values for either of these attributes can cause unpredictable results that include session failures and potential data loss.
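The following sketch shows why the default can collide: two workflow names that differ only after the twentieth character produce the same default application name. The helper names and the sample workflow names are hypothetical, and the check is only an illustration of the naming rules stated above.

```python
from typing import List, Optional

MAX_APP_NAME_LEN = 20  # limit stated for the Application Name attribute


def effective_application_name(app_name: Optional[str], workflow_name: str) -> str:
    """Return the name a session would use under the documented default rule."""
    if app_name:
        if len(app_name) > MAX_APP_NAME_LEN:
            raise ValueError("application name cannot exceed 20 characters")
        return app_name
    # Default: the first 20 characters of the workflow name.
    return workflow_name[:MAX_APP_NAME_LEN]


def find_duplicates(names: List[str]) -> List[str]:
    """Return names that occur more than once (comparison is case sensitive)."""
    seen, duplicates = set(), []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


# Hypothetical workflow names that share their first 20 characters.
east = effective_application_name(None, "wf_orders_cdc_realtime_east")
west = effective_application_name(None, "wf_orders_cdc_realtime_west")
```

Here both sessions would default to the same 20-character name, so find_duplicates([east, west]) reports a clash. Assigning explicit names such as "orders_east" and "orders_west" avoids it.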

PowerExchange Flush Latency

PowerExchange reads change data into a buffer on the source machine, or on the PowerCenter Integration Service machine if you use CDC offload processing. The PowerExchange Consumer API (CAPI) interface flushes the buffer that contains the data to PWXPC on the PowerCenter Integration Service machine for processing when one of the following conditions occurs:

¨ The buffer becomes full.

¨ The CAPI interface timeout, also called the PowerExchange flush latency, expires.

¨ A commit point occurs.

PowerExchange uses the flush latency value as the CAPI interface timeout value on the source machine, or on the PowerCenter Integration Service machine if you use CDC offload processing.

For CDC sessions that use real-time or continuous extraction mode, set the flush latency in the PWX Latency in seconds attribute of the PWX CDC Real Time application connection. For CDC sessions that use batch extraction mode, PowerExchange always uses two seconds for the flush latency.

Restriction: The value of PWX Latency in seconds impacts the speed with which a CDC session responds to a stop command from Workflow Monitor or pmcmd, because PWXPC must wait for PowerExchange to return control before it can handle the stop request. Informatica recommends that you use the default value of 2 seconds for the PWX Latency in seconds attribute.

PowerExchange writes the message PWX-09957 in the PowerExchange message log to reflect the CAPI interface timeout value set from the flush latency value. If you select Retrieve PWX Log Entries on the application connection, PWXPC also writes this message in the session log.

After PowerExchange flushes the change data to PWXPC, PWXPC provides the data to the appropriate sources in the CDC session for further processing and the PowerCenter Integration Service commits the data to the targets.

Commitment Control Options

PWXPC, in conjunction with PowerExchange and the PowerCenter Integration Service, controls the timing of commit processing for CDC sessions based on the values you code for the commitment control options.

To control commit processing, set one or more of the following connection attributes:

Maximum Rows Per commit

Maximum number of change records in a source UOW that PWXPC processes before it flushes the data buffer to commit the change data to the targets. If necessary, PWXPC continues to process change records across UOW boundaries until the maximum rows limit is met. PWXPC does not wait for a UOW boundary to commit the change data. After the maximum rows limit is met, PWXPC issues a real-time flush to commit the change data and the restart tokens to the targets and writes the PWXPC_12128 message to the session log. PWXPC resets the maximum rows limit when a real-time flush occurs because either the maximum rows limit or UOW count is met or the real-time flush latency timer expires.

Note: The Maximum Rows Per commit attribute is a count of records within a UOW, unlike the UOW Count attribute that is a count of complete UOWs.

Default is 0, which means that PWXPC does not use maximum rows.

PWXPC uses the maximum rows limit to commit data before an end-UOW is received, a process also called sub-packet commit. If you specify either 0 or no value, commits occur only on UOW boundaries. Otherwise, PWXPC uses the value that you specify to commit change records between UOW boundaries.

Warning: Because PWXPC can commit the change data to the targets between UOW boundaries, relational integrity (RI) might be compromised. Do not use this connection attribute if you have targets in the CDC session with RI constraints.

The maximum rows limit is cumulative across all sources in the CDC session. PWXPC issues a real-time flush when the limit value is reached, regardless of the number of sources to which the changes were originally made.

Use a maximum rows limit when extremely large UOWs in the change stream might cause locking issues on the target database or resource issues on the node running the PowerCenter Integration Service. When you specify a low maximum rows limit, the session consumes more system resources on the PowerCenter Integration Service and target systems because PWXPC flushes data to the targets more frequently.

For example, a UOW contains 900 changes for one source followed by 100 changes for a second source and then 500 changes for the first source. If you set the maximum rows value to 1000, PWXPC issues the commit after reading 1,000 change records. In this example, the commit occurs after PWXPC processes the 100 changes for the second source.
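The arithmetic of this example can be checked with a short simulation. The sketch counts change records cumulatively across sources and records where each flush would fall. It deliberately ignores the other flush triggers (UOW count, flush latency, end-UOW), and the function and source names are hypothetical.

```python
from typing import List, Tuple


def max_rows_commit_points(changes: List[Tuple[str, int]],
                           max_rows: int) -> List[Tuple[int, str]]:
    """Return (records read so far, source being processed) for each flush.

    changes  -- ordered runs of (source name, number of change records)
    max_rows -- maximum rows limit; 0 means the limit is not used
    """
    if max_rows <= 0:
        return []
    flushes = []
    total = 0          # records read since the start of the stream
    since_flush = 0    # records read since the last flush
    for source, count in changes:
        remaining = count
        while remaining > 0:
            take = min(remaining, max_rows - since_flush)
            remaining -= take
            since_flush += take
            total += take
            if since_flush == max_rows:
                flushes.append((total, source))
                since_flush = 0   # the limit resets after a flush
    return flushes


uow = [("source1", 900), ("source2", 100), ("source1", 500)]
print(max_rows_commit_points(uow, 1000))
```

For the UOW in the example, the only flush falls at record 1,000, while the changes for the second source are being processed. The remaining 500 changes stay below the limit and are committed by some other trigger.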

Minimum Rows Per commit

For real-time or continuous extraction mode, minimum number of change records that PowerExchange reads from the change stream before it passes a commit record to PWXPC. Until the minimum rows limit is met, PowerExchange discards any commit records that it reads from the change stream and passes only change records to PWXPC. After the minimum rows limit is met, PowerExchange passes the next commit record to PWXPC and then resets the minimum rows counter.

Default is 0, which means that PowerExchange does not use minimum rows.

If you specify a minimum rows limit, PowerExchange changes the number of change records in a UOW to match or exceed the limit. PWXPC does not commit change data to the targets when the minimum rows limit is met. PWXPC commits change data to the targets based only on the values of the Maximum Rows Per commit, Real-Time Flush Latency in milli-seconds, and UOW Count attributes.

A minimum rows limit does not impact the relational integrity of the change data because PowerExchange does not create new commit points in the change stream data. It merely skips some of the original commit records in the change stream.

If your change data has many small UOWs, you can set the Minimum Rows Per commit attribute to create larger UOWs of a more uniform size. Online transactions that run in transaction control systems such as CICS and IMS often commit after making only a few changes, which results in many small UOWs in the change stream. PowerExchange and PWXPC process fewer, larger UOWs more efficiently than many small UOWs. By using the minimum rows limit to increase the size of UOWs, you can improve CDC processing efficiency.
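The effect of skipping commit records can be sketched as a merge of adjacent UOWs. The model below takes the sizes of the original UOWs and returns the sizes of the UOWs that would be passed on, plus the records still waiting for a commit record. It is an illustration of the rule described above under simplified assumptions, and the function name is hypothetical.

```python
from typing import List, Tuple


def apply_minimum_rows(uow_sizes: List[int], min_rows: int) -> Tuple[List[int], int]:
    """Merge original UOWs until each passed-on UOW has at least min_rows records.

    Returns (sizes of the resulting UOWs, records not yet followed by a
    passed-on commit record).
    """
    merged = []
    pending = 0   # change records passed on since the last commit record
    for size in uow_sizes:
        pending += size
        # The commit record that ends this UOW is passed on only if the
        # counter has reached the limit; otherwise it is discarded.
        if pending >= min_rows:
            merged.append(pending)
            pending = 0   # the counter resets after a commit record passes
    return merged, pending


print(apply_minimum_rows([3, 2, 4, 1, 5, 2], 6))
```

With a limit of 6, the six small UOWs become two larger ones of 9 and 6 records, and 2 records wait for a later commit record. Every resulting boundary is one of the original commit points, which is why relational integrity is preserved.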

Real-Time Flush Latency in milli-seconds

For real-time or continuous extraction mode, number of milliseconds that must pass before PWXPC flushes the data buffer to commit the change data to the targets. After the flush latency interval expires and PWXPC reaches a UOW boundary, PWXPC issues a real-time flush to commit the change data and the restart tokens to the targets and writes the PWXPC_10082 message in the session log. PWXPC resets the flush latency interval when a real-time flush occurs because either the interval expires, or one of the UOW count or maximum row limit is met.

Enter one of the following values for the flush latency interval:

¨ -1. Disables data flushes based on time.

¨ 0 to 2000. Interval set to 2000 milliseconds, or 2 seconds.

¨ 2000 to 86400. Interval set to the specified value.

Default is 0, which means that PWXPC uses 2,000 milliseconds.

If you set the flush latency interval value to 0 or higher, PWXPC flushes the change data for all complete UOWs after the interval expires and the next UOW boundary occurs. The lower you set the flush latency interval value, the faster you commit change data to the targets. Therefore, if you require the lowest possible latency for applying changes to the targets, specify a low value for the flush latency interval.

When you specify low flush latency intervals, the CDC session might consume more system resources on the PowerCenter Integration Service and target systems because PWXPC commits to the targets more frequently. When you choose the flush latency interval value, you must balance performance and resource consumption with latency requirements.
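The mapping from the attribute value to the interval that PWXPC uses can be written out as a small function. This sketch only restates the value ranges listed above. The function name is hypothetical, and rejecting values outside the listed ranges is an assumption of the sketch.

```python
from typing import Optional


def effective_flush_latency_ms(value: int) -> Optional[int]:
    """Map the attribute value to the interval used, per the listed ranges.

    Returns None when time-based flushes are disabled.
    """
    if value == -1:
        return None      # data flushes based on time are disabled
    if 0 <= value <= 2000:
        return 2000      # values up to 2000 are treated as 2000 ms
    if value <= 86400:
        return value     # the specified value is used
    # Assumption: anything outside the documented ranges is rejected.
    raise ValueError("value is outside the documented ranges")
```

For example, both the default of 0 and a value of 500 result in a 2,000 millisecond interval, so values below 2000 do not lower latency further.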

UOW Count

Number of complete UOWs that PWXPC reads from the change stream before flushing the change data to the targets. As PWXPC reads change data from PowerExchange and provides that data to the appropriate source in the CDC session, it counts the number of UOWs. After the UOW count value is reached, PWXPC issues a real-time flush to commit the change data and the restart tokens to the targets, and writes the PWXPC_10081 message in the session log. PWXPC resets the UOW count when a real-time flush occurs because the UOW count or maximum rows limit is met, or the flush latency interval expires.

Enter one of the following for the UOW count value:

¨ -1 or 0. PWXPC does not use the UOW Count attribute to control commit processing.

¨ 1 to 999999999. PWXPC flushes change data after reading the number of UOWs specified by the UOW Count attribute.

Default is 1.

The lower you set the value for the UOW Count attribute, the faster that PWXPC flushes change data to the targets. To achieve the lowest possible latency for applying change data to targets, set the UOW Count attribute to 1. However, the lowest possible latency for applying change data also results in the highest possible resource consumption on the PowerCenter Integration Service and the target systems.

Commit processing for CDC sessions is not controlled by a single commitment control attribute. The Maximum Rows Per commit, Real-Time Flush Latency in milli-seconds, and UOW Count values all result in a real-time flush of change data, which causes the data and restart tokens to be committed to the targets. When you choose values for the UOW Count, Real-Time Flush Latency in milli-seconds, and Maximum Rows Per commit attributes, balance performance and resource consumption with latency requirements.

Warning: You must ensure that the session properties Commit Type attribute specifies Source and that the Commit at End of File attribute is disabled. By default, the Commit at End of File attribute is enabled, which causes the PowerCenter Integration Service to write additional data to the targets after the CDC reader has committed the restart tokens and shut down. As a result, when you restart the CDC session, duplicate data might be written to the targets.

For more information, see “Commit Processing with PWXPC” on page 118.

Creating Restart Tokens for Extractions

Before you extract change data, you must establish an extraction start point. An optimal extraction start point matches a time in the change stream that occurs after the target has been synchronized with the source but before any new changes occur for the source. Usually, this point is the end of the change stream because changes to the source are inhibited until the target is materialized and restart tokens are generated.

You can generate current restart tokens for the end of the change stream by using one of the following methods:

¨ PWXPC restart token file. Generate current restart tokens for CDC sessions that use real-time or continuous extraction mode by coding the CURRENT_RESTART option on the RESTART1 and RESTART2 special override statements in the PWXPC restart token file. When the session executes, PWXPC requests that PowerExchange provide restart tokens for the current end of the change stream, which PWXPC then uses as the extraction start point.

¨ Database Row Test. Generate current restart tokens for sources by performing a database row test in PowerExchange Navigator and coding a SELECT CURRENT_RESTART SQL statement.

¨ DTLUAPPL utility. Generate current restart tokens for sources by using the GENERATE RSTKKN option in the DTLUAPPL utility.

If you use a PowerExchange utility or the PowerExchange Navigator to generate restart tokens, edit the restart token file that PWXPC uses to specify the token values before you start the CDC session.

Displaying Restart Tokens

In the PowerExchange Navigator, you can perform a database row test on an extraction map to display the restart token pair for each row of change data. The database row test output includes the following columns for the token values:

¨ DTL__CAPXRESTART1 column for the sequence token

¨ DTL__CAPXRESTART2 column for the restart token

If you include the DTL__CAPXRESTART1 and DTL__CAPXRESTART2 columns in your PowerCenter source definition, PowerExchange provides the restart tokens for each row when you extract change data in a CDC session.

When a CDC session runs, PowerExchange and PWXPC display restart token values in the following messages:

¨ In the messages PWX-04565 and PWX-09959, the sequence token is in the Sequence field and the restart token is in the PowerExchange Logger field.

¨ In the messages PWXPC_12060 and PWXPC_12068, the sequence token is in the Restart Token 1 field and the restart token is in the Restart Token 2 field.

¨ In the messages PWXPC_10081, PWXPC_10082, and PWXPC_12128, the sequence token is the first token value and is followed by the restart token.

When you use the DTLUAPPL utility to generate restart tokens, use the PRINT statement to display the generated values. In the PRINT output, DTLUAPPL displays the sequence token, without the usual trailing eight zeros, in the Sequence field and displays the restart token in the Restart field.

Configuring the Restart Token File

When you configure the CDC session in PowerCenter, specify the name and location of the restart token file in the following attributes of the source PWX CDC application connection:

¨ RestartToken File Folder. Specify the directory that contains the restart token file. If the folder does not exist and the attribute contains the default value of $PMRootDir/Restart, PWXPC creates it. PWXPC does not create any other restart token folder name.

¨ RestartToken File Name. Specify the unique name of the restart token file. If you do not specify a value in this attribute, PWXPC uses the value of the Application Name, if available. Otherwise, PWXPC uses the name of the workflow. Because this name must be unique, Informatica recommends that you always code a value for the RestartToken File Name attribute.

When you run a CDC session, PWXPC verifies that the restart token file exists. If one does not exist, PWXPC uses the name specified in the RestartToken File Name attribute to create an empty restart token file.

Restriction: The value of the RestartToken File Name attribute must be unique for every CDC session. Non-unique file names can cause unpredictable results, such as change data loss and session failures.

To locate the restart token file name for a CDC session, check the following places:

¨ For existing CDC sessions, message PWXPC_12057 in the session log contains the restart token file folder and the restart token file name.

¨ In Workflow Manager, the PWX CDC application connection associated with the source in the CDC session contains the restart token file name and folder location. If the restart token file name is not specified in the application connection, PWXPC uses the application name, if specified. Otherwise, PWXPC uses the workflow name.

Before you run a CDC session for the first time, configure the restart token file to specify the point in the change stream from which PowerExchange begins to extract change data. You can also configure the restart token file to add new sources to a CDC session or to restart change data extraction from a specific point in the change stream.

Restart Token File Statements

You can use the following types of statements in the restart token file:

¨ Comment

¨ Explicit override. Specify a restart token pair for a specific source. You must provide the PowerExchange extraction map name.

¨ Special override. Specify a restart token pair for one or more sources. You can provide a specific restart token pair or request that PowerExchange use the current restart point.

Restart Token File Statement Syntax

For the comment statements, use the following syntax:

<!-- comment_text

For explicit override statements, use the following syntax:

extraction_map_name=sequence_token
extraction_map_name=restart_token

For special override statements, use the following syntax:

RESTART1={sequence_token|CURRENT_RESTART}
RESTART2={restart_token|CURRENT_RESTART}

The following rules and guidelines apply:

¨ Statements can begin in any column.

¨ All statements are optional.

¨ Do not include blank lines between statements.

¨ Comment lines must begin with: <!--

¨ Per file, you can specify one or more explicit override statements and one special override statement.

¨ An explicit override statement for a source takes precedence over any special override statement.
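The syntax rules above are simple enough to express as a small parser. The following is a minimal sketch, not the PWXPC implementation. The function name is hypothetical, the sketch rejects blank lines and a repeated special override as a strict reading of the rules, and it matches RESTART1 and RESTART2 without regard to case, which is an assumption.

```python
from typing import Dict, List, Optional, Tuple

TokenPair = Tuple[str, str]  # (sequence token, restart token)


def parse_restart_token_file(
        text: str) -> Tuple[Dict[str, TokenPair], Optional[TokenPair]]:
    """Return (explicit overrides by extraction map name, special override)."""
    explicit: Dict[str, List[str]] = {}
    special: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()  # statements can begin in any column
        if not line:
            raise ValueError("blank lines are not allowed between statements")
        if line.startswith("<!--"):
            continue  # comment statement
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a valid statement: {raw!r}")
        name, value = name.strip(), value.strip()
        if name.upper() in ("RESTART1", "RESTART2"):
            if name.upper() in special:
                raise ValueError("only one special override statement is allowed")
            special[name.upper()] = value
        else:
            # First line for a map is the sequence token, second the restart token.
            explicit.setdefault(name, []).append(value)
    for name, tokens in explicit.items():
        if len(tokens) != 2:
            raise ValueError(f"{name}: expected a sequence token and a restart token")
    if special and set(special) != {"RESTART1", "RESTART2"}:
        raise ValueError("RESTART1 and RESTART2 must both be specified")
    pairs = {name: (t[0], t[1]) for name, t in explicit.items()}
    pair = (special["RESTART1"], special["RESTART2"]) if special else None
    return pairs, pair
```

A file that contains a comment, a special override with CURRENT_RESTART, and two lines for one extraction map parses into one explicit pair and one special pair. Precedence between the two is applied later, when tokens are assigned to sources.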

Comment Statements

You can use the comment statement anywhere in the restart token file.

Comment statements must begin with:

<!--

Explicit Override Statements

Use the explicit override statement to specify the restart token pair for a specific source. Each source specification consists of a pair of restart tokens containing the source extraction map name with the restart token values. Define the source by specifying the extraction map name. A source can have multiple extraction maps and, therefore, multiple extraction map names.

You can code explicit override statements for one or more sources in a CDC session. Alternatively, you can use explicit override statements in conjunction with the special override statement to provide restart tokens for all sources in a CDC session.

When you warm start a CDC session, an explicit override statement for a source overrides the restart tokens stored in the state table or file for that source.

The explicit override statement has the following parameters:

extraction_map_name=restart1_token and extraction_map_name=restart2_token

The PowerExchange extraction map name and the sequence and restart tokens for the source.

extraction_map_name

The extraction map name for the data source. To determine the extraction map name, check one of the following:

¨ For CDC data map sources, the Schema Name Override and Map Name Override attributes in the session properties. These attributes override the schema and map names of the source extraction map.

¨ For CDC data map sources, the Schema Name and Map Name values in the source Metadata Extensions in Designer.

¨ For relational sources, the Extraction Map Name attribute in the session properties.

restart1_token

The sequence token part of the restart token pair, which varies based on data source type.

restart2_token

The restart token part of the restart token pair, which varies based on data source type.

Special Override Statement

Use the special override statement to specify or generate restart tokens for one or more sources. You must specify both the RESTART1 and RESTART2 parameters.

You can use the special override statement to provide restart tokens for all sources in a CDC session. Alternatively, you can use explicit override statements in conjunction with the special override statement to provide or override restart tokens for all sources in a CDC session.

When you warm start a CDC session, the special override statement overrides the restart tokens stored in the state table or file for all sources, except those sources specified in explicit override statements.

The special override statement has the following parameters:

RESTART1={restart1_token|CURRENT_RESTART} and RESTART2={restart2_token|CURRENT_RESTART}

The sequence token and restart token in the restart token pair or the current end of the change stream.

restart1_token

The sequence token part of the restart token pair, which varies based on data source type.

restart2_token

The restart token part of the restart token pair, which varies based on data source type.

CURRENT_RESTART

PowerExchange generates current restart tokens. The PWXPC CDC reader opens a separate connection to PowerExchange to request generation of current restart tokens, and then provides the generated restart tokens to all applicable sources.

Restriction: You can use CURRENT_RESTART only for CDC sessions that use real-time or continuous extraction mode. You cannot use this option for CDC sessions that use batch extraction mode.

You can also generate current restart tokens in the Database Row Test dialog box in the PowerExchange Navigator.

Restart Token File - Example

In the example, a CDC session contains seven source tables. This restart token file specifies explicit override statements to provide the restart tokens for three sources and the special override statement to provide the restart tokens for the remainder of the sources.

The restart token file contains the following statements:

<!-- Restart Tokens for existing tables -->restart1=000000AD775600000000000000AD77560000000000000000Restart2=C1E4E2D34040000000AD5F2C00000000<!-- Restart Tokens for the Table: rrtb0001_RRTB_SRC_001 -->d1dsn9.rrtb0001_RRTB_SRC_001=0000060D1DB2000000000000060D1DB20000000000000000d1dsn9.rrtb0001_RRTB_SRC_001=C1E4E2D340400000013FF36200000000<!-- Restart Tokens for the Table: rrtb0001_RRTB_SRC_002 -->d1dsn9.rrtb0002_RRTB_SRC_002=000000A3719500000000000000A371950000000000000000d1dsn9.rrtb0002_RRTB_SRC_002=C1E4E2D34040000000968FC600000000<!-- Restart Tokens for the Table: rrtb0001_RRTB_SRC_004 -->d1dsn9.rrtb0004_RRTB_SRC_004=000006D84E7800000000000006D84E780000000000000000d1dsn9.rrtb0004_RRTB_SRC_004=C1E4E2D340400000060D1E6100000000

When you warm start the CDC session, PWXPC reads the restart token file to process any override statements forrestart tokens. In this case, the restart token file overrides all restart tokens for all sources in the CDC session.After resolving the restart tokens for all sources, PWXPC writes message PWXPC_12060 to the session log withthe following information:

===============================Session restart information:===============================Extraction Map Name Restart Token 1 Restart Token 2 Sourced1dsn9.rrtb0001_RRTB_SRC_001 0000060D1DB2000000000000060D1DB20000000000000000 C1E4E2D340400000013FF36200000000 Restart file d1dsn9.rrtb0002_RRTB_SRC_002 000000A3719500000000000000A371950000000000000000 C1E4E2D34040000000968FC600000000 Restart file d1dsn9.rrtb0003_RRTB_SRC_003 000000AD775600000000000000AD77560000000000000000 C1E4E2D34040000000AD5F2C00000000 Restart file (special override)d1dsn9.rrtb0004_RRTB_SRC_004 000006D84E7800000000000006D84E780000000000000000 C1E4E2D340400000060D1E6100000000 Restart file d1dsn9.rrtb0005_RRTB_SRC_005 000000AD775600000000000000AD77560000000000000000 C1E4E2D34040000000AD5F2C00000000 Restart file (special override)d1dsn9.rrtb0006_RRTB_SRC_006 000000AD775600000000000000AD77560000000000000000 C1E4E2D34040000000AD5F2C00000000 Restart file (special override)d1dsn9.rrtb0007_RRTB_SRC_007 000000AD775600000000000000AD77560000000000000000 C1E4E2D34040000000AD5F2C00000000 Restart file (special override)

PWXPC indicates the source of the restart token values for each source. For the sources that had explicit override statements in the restart token file, PWXPC writes “Restart file” in the Source column.

For the sources to which PWXPC assigns the special override restart tokens, PWXPC writes “Restart file (special override)” in the Source column.
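The distinction between explicit and special override statements can be illustrated with a small parser. The following Python sketch is not PWXPC's own logic. It assumes one statement per line, treats lines that start with `<!--` as comments, reads the first and second statement for an extraction map as Restart Token 1 and Restart Token 2, and treats the keys `RESTART1` and `RESTART2` (in any letter case) as the special override. The function name `parse_restart_token_file` is hypothetical.

```python
def parse_restart_token_file(text):
    """Split restart token file text into explicit and special overrides."""
    explicit = {}  # extraction map name -> [token 1, token 2]
    special = {}   # "RESTART1" / "RESTART2" -> value
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("<!--"):
            continue  # skip blank lines and comment lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.upper() in ("RESTART1", "RESTART2"):
            special[key.upper()] = value
        else:
            explicit.setdefault(key, []).append(value)
    return explicit, special


sample = """<!-- Restart Tokens for existing tables -->
restart1=000000AD775600000000000000AD77560000000000000000
Restart2=C1E4E2D34040000000AD5F2C00000000
d1dsn9.rrtb0001_RRTB_SRC_001=0000060D1DB2000000000000060D1DB20000000000000000
d1dsn9.rrtb0001_RRTB_SRC_001=C1E4E2D340400000013FF36200000000
"""
explicit, special = parse_restart_token_file(sample)
```

Sources that appear in `explicit` would be reported as “Restart file” in the session log, and any other source in the session would receive the values in `special`.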

CHAPTER 9

Managing Change Data Extractions

This chapter includes the following topics:

• Starting PowerCenter CDC Sessions

• Stopping PowerCenter CDC Sessions

• Changing PowerCenter CDC Sessions

• Recovering PowerCenter CDC Sessions

Starting PowerCenter CDC Sessions

Use Workflow Manager, Workflow Monitor, or pmcmd to start a workflow or task for a CDC session. You can start the entire workflow, part of a workflow, or a task in the workflow. You can do a cold start, warm start, or recovery start. The method you use determines how PWXPC acquires the restart information.

Use one of the following methods to start a CDC session:

Cold start

To cold start a CDC session, use the Cold Start command in Workflow Manager or Workflow Monitor. You can also use the pmcmd starttask or startworkflow commands with the norecovery option. A CDC session that uses real-time or continuous extraction mode runs continuously until it is stopped or interrupted. A CDC session that uses batch extraction mode runs until it reaches the end of log (EOL) or it is stopped or interrupted.

When you cold start a CDC session, PWXPC uses the restart token file to acquire restart tokens for all sources. PWXPC does not read the state tables or file or make any attempt to recover the session.

Warm start

To warm start a CDC session, use the Start or Restart commands in Workflow Manager or Workflow Monitor. You can also use the pmcmd starttask or startworkflow commands. A CDC session that uses real-time or continuous extraction mode runs continuously until it is stopped or interrupted. A CDC session that uses batch extraction mode runs until it reaches EOL or it is stopped or interrupted.

When you warm start a CDC session, PWXPC reconciles any restart tokens provided in the restart token file with any restart tokens that exist in the state tables or file. If necessary, PWXPC performs recovery processing.

Recovery start

To start recovery for a CDC session, use the Recover command from Workflow Manager or Workflow Monitor. You can also use the pmcmd recoverworkflow command or the starttask or startworkflow commands with the recovery option. When recovery completes, the CDC session ends.

When you recover a CDC session, PWXPC reads the restart tokens from any applicable state tables or file. If necessary, PWXPC performs recovery processing. PWXPC updates the restart token file with the restart tokens for each source in the CDC session, and then the session ends. To begin extracting change data again, either cold start or warm start the session.
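The three start methods map onto pmcmd invocations that differ only in the command and the recovery option. The following Python sketch builds the argument list for each case without running anything. The helper name `pmcmd_start_args` and all argument values are hypothetical, authentication options are deliberately omitted, and the option spellings (`-sv`, `-d`, `-f`, `-norecovery`) should be checked against the pmcmd reference for your PowerCenter version.

```python
def pmcmd_start_args(start_type, service, domain, folder, workflow):
    """Build a pmcmd argument list for a cold, warm, or recovery start."""
    base = ["-sv", service, "-d", domain, "-f", folder]
    if start_type == "cold":
        # Cold start: startworkflow with the norecovery option.
        return ["pmcmd", "startworkflow", *base, "-norecovery", workflow]
    if start_type == "warm":
        # Warm start: startworkflow without a recovery option.
        return ["pmcmd", "startworkflow", *base, workflow]
    if start_type == "recovery":
        # Recovery start: recoverworkflow.
        return ["pmcmd", "recoverworkflow", *base, workflow]
    raise ValueError(f"unknown start type: {start_type}")


cold = pmcmd_start_args("cold", "intserv", "domain1", "cdc_folder", "wf_cdc")
warm = pmcmd_start_args("warm", "intserv", "domain1", "cdc_folder", "wf_cdc")
recover = pmcmd_start_args("recovery", "intserv", "domain1", "cdc_folder", "wf_cdc")
```

Keeping the list unexecuted makes it easy to review the command before passing it, together with credentials, to a process launcher.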

Cold Start Processing

Cold start workflows and tasks by using the Cold Start command in Workflow Manager or Workflow Monitor. You can also use the pmcmd starttask or startworkflow commands with the norecovery option.

After you request a cold start for a CDC session, the following processing occurs:

1. PWXPC writes the following message in the session log:

PWXPC_12091 [INFO] [CDCRestart] Cold start requested

2. PWXPC reads the restart tokens from only the restart token file and associates a restart token with each source in the session.

3. PWXPC creates the initialization restart token file with the initial restart tokens.

4. PWXPC commits the restart tokens for each source to the appropriate state tables or file and then writes the message PWXPC_12104 to the session log.

5. PWXPC passes the restart tokens to PowerExchange. PowerExchange begins extracting change data and passing the data to PWXPC for processing.

6. PWXPC continues processing change data from PowerExchange and commits the data and restart tokens to the targets. This processing continues until the session ends or is stopped.

Warm Start Processing

Warm start workflows and tasks by using the Start or Restart command in Workflow Manager or Workflow Monitor. You can also use the pmcmd starttask or startworkflow commands.

When you warm start a workflow or task, PWXPC automatically performs recovery. You do not need to recover failed workflows and tasks before you restart them.

After you request a warm start for a CDC session, the following processing occurs:

1. PWXPC writes the following message in the session log:

PWXPC_12092 [INFO] [CDCRestart] Warm start requested. Targets will be resynchronized automatically if required

2. PWXPC queries the PowerCenter Integration Service about the commit levels of all targets. If all targets in the session have the same commit level, PWXPC skips recovery processing.

3. PWXPC reconciles the restart tokens from the restart token file and from the state tables or file.

Restriction: If a CDC session requires recovery processing, PWXPC does not use the restart token file. Consequently, you cannot override restart tokens for sources.

4. PWXPC creates the initialization restart token file with the reconciled restart tokens.

5. If recovery is required, PWXPC re-reads the change data for the last unit-of-work (UOW) that was committed to the targets with the highest commit level and flushes the data to those targets with lower commit levels. The PowerCenter Integration Service commits flushed change data and restart tokens to any relational targets and updates any nonrelational files.

6. If recovery is not required and the reconciled restart tokens differ from those in the state tables or file, PWXPC commits the reconciled restart tokens and then writes message PWXPC_12104 to the session log.

7. PWXPC passes the restart tokens to PowerExchange. PowerExchange begins extracting change data and passing the data to PWXPC for processing.

8. PWXPC continues processing change data from PowerExchange and commits the data and restart tokens to the targets. This processing continues until the session ends or is stopped.
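The decision made in steps 2 and 3 of the warm start can be modeled in a few lines. The following Python sketch is a simplified illustration, not the PWXPC implementation. It treats recovery as required whenever the targets report different commit levels, ignores the restart token file in that case (per the restriction above), and otherwise assumes that an override statement from the restart token file replaces the stored value for that source. The function name and the dictionary shapes are hypothetical.

```python
def resolve_warm_start_tokens(commit_levels, state_tokens, file_overrides):
    """Return (recovery_required, tokens) for a warm start.

    commit_levels:  target name -> commit level reported for that target
    state_tokens:   source name -> tokens held in the state tables or file
    file_overrides: source name -> tokens from the restart token file
    """
    recovery_required = len(set(commit_levels.values())) > 1
    if recovery_required:
        # Restriction from the text: the restart token file is not used.
        return True, dict(state_tokens)
    # Assumption: an override statement wins over the stored token.
    reconciled = dict(state_tokens)
    reconciled.update(file_overrides)
    return False, reconciled


in_sync = resolve_warm_start_tokens(
    {"tgt1": 5, "tgt2": 5}, {"src1": "A"}, {"src1": "B"})
out_of_sync = resolve_warm_start_tokens(
    {"tgt1": 5, "tgt2": 4}, {"src1": "A"}, {"src1": "B"})
```

In the first call the override is honored. In the second call the targets are at different commit levels, so the stored tokens are kept and the override is ignored.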

Recovery Processing

Recover workflows and tasks by selecting the Recover command in Workflow Manager or Workflow Monitor. You can also use the pmcmd recoverworkflow command, or the starttask or startworkflow command with the recovery option.

You can use recovery to populate the restart token file with the restart tokens for all sources in a CDC session so that you can then cold start the CDC session, or to ensure that the targets and restart tokens are in a consistent state. However, you do not need to recover failed workflows and tasks before you restart them because PWXPC automatically performs recovery processing when you warm start a workflow or task.

After you request recovery for a CDC session, the following processing occurs:

1. PWXPC writes the following message in the session log:

PWXPC_12093 [INFO] [CDCRestart] Recovery run requested. Targets will be resynchronized if required and processing will terminate

2. PWXPC queries the PowerCenter Integration Service about the commit levels of all targets. If all targets in the session have the same commit level, PWXPC skips recovery processing.

3. PWXPC reads the restart tokens from the recovery state tables or file.

Restriction: If a CDC session requires recovery processing, PWXPC does not use the restart token file. Consequently, you cannot override restart tokens for sources.

4. PWXPC creates the initialization restart token file with the reconciled restart tokens.

5. If recovery is required, PWXPC re-reads the change data for the last UOW that was committed to the targets with the highest commit level and flushes the data to those targets with lower commit levels. The PowerCenter Integration Service commits any flushed change data and restart tokens to any relational targets, and updates any nonrelational files.

6. PWXPC updates the restart token file with the final restart tokens, creates the termination restart token file, and ends.

To process change data from the point of recovery, warm start or cold start the workflow or task.
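Because each start method writes a distinct message as its first step, a session log can be scanned to confirm which kind of start actually ran. The following Python sketch uses only the message IDs quoted above. The function name `detect_start_type` and the shape of the input (an iterable of log lines) are assumptions.

```python
START_MESSAGES = {
    "PWXPC_12091": "cold",
    "PWXPC_12092": "warm",
    "PWXPC_12093": "recovery",
}


def detect_start_type(session_log_lines):
    """Return the start type named by the first matching message, else None."""
    for line in session_log_lines:
        for message_id, start_type in START_MESSAGES.items():
            if message_id in line:
                return start_type
    return None


log = [
    "PWXPC_12092 [INFO] [CDCRestart] Warm start requested. "
    "Targets will be resynchronized automatically if required",
]
start_type = detect_start_type(log)
```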

Stopping PowerCenter CDC Sessions

You can stop CDC sessions from PowerCenter or PowerExchange. In PowerCenter, issue the Stop or Abort command in Workflow Monitor. You can also use pmcmd stoptask, stopworkflow, aborttask, or abortworkflow commands. In PowerExchange, issue the STOPTASK command or run the DTLUTSK utility.

Use one of the following methods to stop a running CDC session:

Stop

Use the Stop command in Workflow Monitor or the pmcmd stoptask or stopworkflow commands. After the PWXPC CDC reader and PowerCenter Integration Service process all of the data in the pipeline and shut down, the session ends.

STOPTASK

Use the PowerExchange STOPTASK command. You can run the STOPTASK command on the source system that is extracting the change data, from the PowerExchange Navigator, or by using pwxcmd or the DTLUTSK utility. When you issue the STOPTASK command, PowerExchange stops the extraction task in the PowerExchange Listener and passes an EOF to the PowerCenter Integration Service, which ends the session.

Abort

Use the Abort command in Workflow Monitor or the pmcmd aborttask or abortworkflow commands. When you abort a CDC session, the PowerCenter Integration Service waits 60 seconds to allow the readers and the writers time to process all of the data in the pipeline and shut down. If the PowerCenter Integration Service cannot finish processing and committing data within this timeout period, it kills the DTM process and ends the session.

Stop Command Processing

Stop CDC sessions and workflows by using the Stop command in Workflow Monitor or the pmcmd stoptask or stopworkflow command. You can also use the PowerExchange STOPTASK command.

After you issue a stop command in PowerCenter or PowerExchange, the following processing occurs:

1. If you use a PowerCenter stop command, the PowerCenter Integration Service requests PWXPC to stop.

If you use a PowerExchange stop command, PowerExchange sends an EOF to PWXPC.

2. When PWXPC receives an EOF, it flushes any complete and uncommitted UOWs with the associated restart tokens to the targets. PWXPC then writes the messages PWXPC_12101 and PWXPC_12068 to the session log.

3. The PowerCenter Integration Service processes all of the data in the pipeline and writes it to the targets.

4. The PowerCenter Integration Service sends an acknowledgment to PWXPC indicating that the targets have been updated.

5. PWXPC writes the termination restart token file, and then writes the message PWXPC_12075 to the session log.

6. The PWXPC CDC reader shuts down.

7. The PowerCenter Integration Service performs any post-session tasks and ends the session.

Terminating Conditions

To stop a CDC session based on a user-defined event or at EOL, configure a termination condition in the session. A terminating condition determines when PWXPC stops reading change data from the sources and ends the CDC session. After PWXPC reaches a terminating condition, it flushes the change data to the targets and passes an EOF to the PowerCenter Integration Service. The PowerCenter Integration Service commits the data to the targets and ends the session.

You can configure the following termination conditions for CDC sessions:

• Event table processing. If you specify an extraction map table in the Event Table attribute of the PWX CDC Real Time application connection, PowerExchange, after it reads a change record for the event table, passes EOF to PWXPC to end the CDC session.

• Idle Time. If you specify 0 for the Idle Time attribute on a PWX CDC Real Time application connection, PowerExchange, after it reaches EOL, passes EOF to PWXPC to end the CDC session.

• Batch extraction mode. If you use batch extraction mode by configuring a PWX CDC Change application connection, PowerExchange, after it reads all closed PowerExchange Condense condense files or PowerExchange Logger for Linux, UNIX, and Windows log files, passes EOF to PWXPC to end the CDC session.

Changing PowerCenter CDC Sessions

You can add new sources and targets to an existing CDC session. Afterward, you must cold start the session.

Because a cold start is required, you must also get the latest restart tokens for the original sources prior to restarting the session. To do so, you can perform a recovery.

To change a PowerCenter CDC session:

1. Stop the workflow.

2. After the workflow ends, recover the CDC session.

When you recover tasks, PWXPC writes the ending restart tokens for all sources in a CDC session to the restart token file that you specified on the PWX CDC application connection.

3. Make changes to the session or workflow, if necessary.

4. Verify that the restart token file in the source CDC connection points to the same restart token file updated in the recovery.

5. If you add sources to the CDC session, add statements to the restart token file that provide restart tokens for the new sources.

6. If you remove sources from the CDC session, update the restart token file to remove their restart tokens.

7. Cold start the CDC session.

Examples of Creating a Restart Point

The following examples show different methods of creating a restart point for a source table that is added to an existing CDC session. The first example uses the CURRENT_RESTART option of the special override statement in the restart token file to generate current restart tokens. The second example uses DTLUAPPL to generate current restart tokens.

Adding a New Source and Use CURRENT_RESTART to Create Restart Tokens - Example

In this example, a new source table, RRTB_SRC_004, is added to an existing CDC session that contains three sources. The restart points for the existing sources are maintained. For the new source, the example uses the CURRENT_RESTART option in the restart token file to generate a restart token that represents the current end of the change stream.

To add a new source and use CURRENT_RESTART to create restart tokens:

1. To stop the workflow, select the Stop command in Workflow Monitor.

2. After the workflow stops, select the Recover Task command in Workflow Monitor to run a recovery session.

PWXPC writes the following messages in the session log:

PWXPC_12060 [INFO] [CDCRestart]
===============================
Session restart information:
===============================
Extraction Map Name           Restart Token 1                                   Restart Token 2                   Source
d1dsn9.rrtb0002_RRTB_SRC_002  000000AD220F00000000000000AD220F0000000000000000  C1E4E2D34040000000AD0D9C00000000  GMD storage
d1dsn9.rrtb0001_RRTB_SRC_001  000000AD220F00000000000000AD220F0000000000000000  C1E4E2D34040000000AD0D9C00000000  GMD storage
d1dsn9.rrtb0003_RRTB_SRC_003  000000AD220F00000000000000AD220F0000000000000000  C1E4E2D34040000000AD0D9C00000000  GMD storage

PWXPC also writes the restart tokens in the restart token file specified in the CDC application connection.

3. Edit the mapping, session, and workflow to add the new source, RRTB_SRC_004.

4. Edit the restart token file to specify the CURRENT_RESTART option for the new source.

The updated file appears as follows:

<!-- existing sources
d1dsn9.rrtb0001_RRTB_SRC_001=000000AD220F00000000000000AD220F0000000000000000
d1dsn9.rrtb0001_RRTB_SRC_001=C1E4E2D34040000000AD0D9C00000000
d1dsn9.rrtb0002_RRTB_SRC_002=000000AD220F00000000000000AD220F0000000000000000
d1dsn9.rrtb0002_RRTB_SRC_002=C1E4E2D34040000000AD0D9C00000000
d1dsn9.rrtb0003_RRTB_SRC_003=000000AD220F00000000000000AD220F0000000000000000
d1dsn9.rrtb0003_RRTB_SRC_003=C1E4E2D34040000000AD0D9C00000000
<!-- new source
RESTART1=CURRENT_RESTART
RESTART2=CURRENT_RESTART

5. Cold start the session.

PWXPC connects to PowerExchange and generates restart tokens that match the current end of the change stream for the new source, RRTB_SRC_004. PWXPC then passes the restart tokens to PowerExchange to begin change data extraction. Because the restart points for the other sources are earlier than the one just generated for RRTB_SRC_004, PWXPC does not pass any change data to this new source until the first change following its generated restart point is read.

Adding a New Source and Use DTLUAPPL to Create Restart Tokens - Example

In this example, a new source table, RRTB_SRC_004, is added to an existing CDC session containing three sources. The restart points for the existing sources are maintained. The DTLUAPPL utility is used to generate a restart token that represents the current end of the change stream.

1. To stop the workflow, select the Stop command in Workflow Monitor.

2. After the workflow stops, select the Recover Task command from Workflow Monitor to run a recovery session.

PWXPC writes the following messages in the session log:

PWXPC_12060 [INFO] [CDCRestart]
===============================
Session restart information:
===============================
Extraction Map Name           Restart Token 1                                   Restart Token 2                   Source
d1dsn9.rrtb0002_RRTB_SRC_002  000000AD220F00000000000000AD220F0000000000000000  C1E4E2D34040000000AD0D9C00000000  GMD storage
d1dsn9.rrtb0001_RRTB_SRC_001  000000AD220F00000000000000AD220F0000000000000000  C1E4E2D34040000000AD0D9C00000000  GMD storage
d1dsn9.rrtb0003_RRTB_SRC_003  000000AD220F00000000000000AD220F0000000000000000  C1E4E2D34040000000AD0D9C00000000  GMD storage

PWXPC also writes the restart tokens in the restart token file specified in the CDC application connection.

3. Edit the mapping, session, and workflow to add the new source, RRTB_SRC_004.

4. Run DTLUAPPL with RSTTKN GENERATE to generate restart tokens for the current end of the change stream. Use the following DTLUAPPL control cards:

mod APPL dummy DSN7 rsttkn generate
mod rsttkn rrtb004
end appl dummy
print appl dummy

The PRINT command produces the following output:

Registration name=<rrtb004.1> tag=<DB2DSN7rrtb0041>
Sequence=<00000DBF240A0000000000000DBF240A00000000>
Restart =<C1E4E2D3404000000DBF238200000000>

Add eight zeros to the end of the Sequence value to create the sequence value for the restart token file.

5. Edit the restart token file to add the new source and its tokens.

The updated file contains the following lines:

<!-- existing sources
d1dsn9.rrtb0001_RRTB_SRC_001=000000AD220F00000000000000AD220F0000000000000000
d1dsn9.rrtb0001_RRTB_SRC_001=C1E4E2D34040000000AD0D9C00000000
d1dsn9.rrtb0002_RRTB_SRC_002=000000AD220F00000000000000AD220F0000000000000000
d1dsn9.rrtb0002_RRTB_SRC_002=C1E4E2D34040000000AD0D9C00000000
d1dsn9.rrtb0003_RRTB_SRC_003=000000AD220F00000000000000AD220F0000000000000000
d1dsn9.rrtb0003_RRTB_SRC_003=C1E4E2D34040000000AD0D9C00000000
<!-- new source
d1dsn9.rrtb0004_RRTB_SRC_004=00000DBF240A0000000000000DBF240A0000000000000000
d1dsn9.rrtb0004_RRTB_SRC_004=C1E4E2D3404000000DBF238200000000

6. Cold start the session.

PWXPC passes these restart tokens to PowerExchange to begin change data extraction. Because the restart points for the other sources are earlier than the one just generated for RRTB_SRC_004, PWXPC does not pass any change data to this new source until the first change following the generated restart point is read.
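The conversion in step 4 of this example, from the DTLUAPPL PRINT output to restart token file statements, is mechanical. The following Python sketch extracts the Sequence and Restart values and appends eight zeros to the Sequence value, as the step instructs. The function name `restart_statements` is hypothetical, and the regular expressions assume the output layout shown in the example.

```python
import re


def restart_statements(extraction_map, print_output):
    """Turn DTLUAPPL PRINT output into two restart token file statements."""
    sequence = re.search(r"Sequence\s*=<([0-9A-Fa-f]+)>", print_output).group(1)
    restart = re.search(r"Restart\s*=<([0-9A-Fa-f]+)>", print_output).group(1)
    # Step 4 of the example: append eight zeros to the Sequence value.
    return [
        f"{extraction_map}={sequence}00000000",
        f"{extraction_map}={restart}",
    ]


output = (
    "Registration name=<rrtb004.1> tag=<DB2DSN7rrtb0041> "
    "Sequence=<00000DBF240A0000000000000DBF240A00000000> "
    "Restart =<C1E4E2D3404000000DBF238200000000>"
)
lines = restart_statements("d1dsn9.rrtb0004_RRTB_SRC_004", output)
```

The two strings in `lines` match the statements added for the new source in the updated restart token file.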

Recovering PowerCenter CDC Sessions

Use Workflow Manager, Workflow Monitor, or pmcmd to recover a workflow or task for a CDC session that fails. You can recover the entire workflow or a task in the workflow.

A CDC session can fail for the following reasons:

• Permanent errors, such as source or target data errors

• Transitory or environmental errors, such as infrastructure problems, server failures, and network availability issues

If you run a session with a resume recovery strategy and the session fails, do not edit the state information or the mapping for the session before you restart the session.

If a session fails because of transitory or environmental errors, restart the session after you have corrected the errors. When you warm start a CDC session, PWXPC automatically performs recovery, if required. Alternatively, you can recover a CDC session, and then restart the session.

If a CDC session fails because of permanent errors, such as SQL or other database errors, you must correct the errors before restarting the CDC session. With some failures, you can correct the error and then restart the CDC session. In other cases, you might need to rematerialize the target table from the source table before you start extracting and applying change data again. If you rematerialize the target table, you should provide restart tokens that match the materialization point in the change stream, and then cold start the CDC session.

Restriction: If a CDC session requires recovery processing, you cannot override the restart tokens because PWXPC does not read the restart token file.

Example of Session Recovery

In this example, a CDC session with relational targets is aborted in the Workflow Monitor. Then, the Restart Task command is issued from the Workflow Monitor to restart the CDC session.

When you warm start the session, PWXPC automatically performs a recovery, and writes the following message in the session log:

PWXPC_12092 [INFO] [CDCRestart] Warm start requested. Targets will be resynchronized automatically if required

PWXPC then reads the restart tokens from the state tables or file and writes the message PWXPC_12060 in the session log. The PWXPC_12060 message records the restart tokens for the session and its sources, as shown in the following example:

PWXPC_12060 [INFO] [CDCRestart]

===============================
Session restart information:
===============================
Extraction Map Name           Restart Token 1                                   Restart Token 2                   Source
d1dsn8.rrtb0004_RRTB_SRC_004  00000FCA65840000000000000D2E004A00000000FFFFFFFF  C1E4E2D3404000000D21B1A500000000  GMD storage
d1dsn8.rrtb0009_RRTB_SRC_009  00000FCA65840000000000000D2E004A00000000FFFFFFFF  C1E4E2D3404000000D21B1A500000000  GMD storage
d1dsn8.rrtb0005_RRTB_SRC_005  00000FCA65840000000000000D2E004A00000000FFFFFFFF  C1E4E2D3404000000D21B1A500000000  GMD storage
d1dsn8.rrtb0006_RRTB_SRC_006  00000FCA65840000000000000D2E004A00000000FFFFFFFF  C1E4E2D3404000000D21B1A500000000  GMD storage
d1dsn8.rrtb0008_RRTB_SRC_008  00000FCA65840000000000000D2E004A00000000FFFFFFFF  C1E4E2D3404000000D21B1A500000000  GMD storage
d1dsn8.rrtb0003_RRTB_SRC_003  00000FCA65840000000000000D2E004A00000000FFFFFFFF  C1E4E2D3404000000D21B1A500000000  GMD storage
d1dsn8.rrtb0002_RRTB_SRC_002  00000FCA65840000000000000D2E004A00000000FFFFFFFF  C1E4E2D3404000000D21B1A500000000  GMD storage
d1dsn8.rrtb0001_RRTB_SRC_001  00000FCA65840000000000000D2E004A00000000FFFFFFFF  C1E4E2D3404000000D21B1A500000000  GMD storage
d1dsn8.rrtb0007_RRTB_SRC_007  00000FCA65840000000000000D2E004A00000000FFFFFFFF  C1E4E2D3404000000D21B1A500000000  GMD storage

If PWXPC detects that recovery is required, PWXPC writes the message PWXPC_12069 in the session log. This message usually includes the restart tokens for both the begin-UOW and the end-UOW for the oldest uncommitted UOW that PWXPC re-reads during recovery. PWXPC usually stores end-UOW restart tokens in the state table or file. However, if you specify a maximum rows threshold, PWXPC can commit change data and restart tokens between UOW boundaries. As a result, the restart tokens might not represent an end-UOW.

The following example PWXPC_12069 message includes “from” restart tokens that are the same as those displayed in the example PWXPC_12060 message:

PWXPC_12069 [INFO] [CDCRestart] Running in recovery mode. Reader will resend the the oldest uncommitted UOW to resync targets:
from: Restart 1 [00000FCA65840000000000000D2E004A00000000FFFFFFFF] : Restart 2 [C1E4E2D3404000000D21B1A500000000]
to: Restart 1 [00000FCA65840000000000000D300D8000000000FFFFFFFF] : Restart 2 [C1E4E2D3404000000D21B1A500000000].

Because this session specifies a maximum rows threshold, the restart token values in the Restart 2 fields in both the “from” and “to” restart tokens are the begin-UOW value. The sequence token values in the Restart 1 fields represent the start and end change records in the UOW that is displayed in the Restart 2 field.

During recovery processing, PWXPC reads the change data records between the points defined by the two restart token values in the PWXPC_12069 message and then issues a commit for the data and the restart tokens. The PowerCenter Integration Service writes the flushed change data to the target tables and writes the restart tokens to the state table. Then the session ends.
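To inspect the range that recovery re-reads, the “from” and “to” token pairs can be pulled out of a PWXPC_12069 message. The following Python sketch assumes the message layout shown in the example above. The function name `parse_recovery_range` is hypothetical.

```python
import re

TOKEN_PAIR = re.compile(
    r"(from|to):\s*Restart 1 \[([0-9A-F]+)\]\s*:\s*Restart 2 \[([0-9A-F]+)\]"
)


def parse_recovery_range(message):
    """Return {'from': (restart1, restart2), 'to': (restart1, restart2)}."""
    return {m.group(1): (m.group(2), m.group(3))
            for m in TOKEN_PAIR.finditer(message)}


message = (
    "PWXPC_12069 [INFO] [CDCRestart] Running in recovery mode. "
    "Reader will resend the the oldest uncommitted UOW to resync targets: "
    "from: Restart 1 [00000FCA65840000000000000D2E004A00000000FFFFFFFF] : "
    "Restart 2 [C1E4E2D3404000000D21B1A500000000] "
    "to: Restart 1 [00000FCA65840000000000000D300D8000000000FFFFFFFF] : "
    "Restart 2 [C1E4E2D3404000000D21B1A500000000]."
)
recovery_range = parse_recovery_range(message)
same_uow = recovery_range["from"][1] == recovery_range["to"][1]
```

In this example `same_uow` is true because both Restart 2 values carry the same begin-UOW value, while the Restart 1 values differ.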

CHAPTER 10

Monitoring and Tuning Options

This chapter includes the following topics:

• Monitoring Change Data Extractions

• Tuning Change Data Extractions

• CDC Offload and Multithreaded Processing

Monitoring Change Data Extractions

PowerExchange, PWXPC, and PowerCenter issue messages that you can use to monitor the progress of CDC sessions. PWXPC can also display progress and statistical information about CDC sessions in the PowerCenter Workflow Monitor.

Monitoring CDC Sessions in PowerExchange

In PowerExchange, you can use the following information to monitor the extraction of change data by CDC sessions:

• Read progress messages. You can request that PowerExchange write messages that indicate the number of change records read by a CDC session.

• Extraction statistics messages. When extraction sessions end, PowerExchange writes messages that include statistical information about the change records processed.

• Multithreaded processing statistics messages. You can request that PowerExchange write statistical information about CDC sessions that use multithreaded processing.

• LISTTASK command output. You can use the LISTTASK command to display active CDC sessions.

Read Progress Messages

You can request that PowerExchange write messages that indicate read progress to the PowerExchange log file. If you select the Retrieve PWX log entries option on a PWX CDC application connection, PWXPC writes the progress messages in the session log.

To direct PowerExchange to write read progress messages, include the following parameters in the DBMOVER configuration file:

• PRGIND. Specify Y to have PowerExchange write PWX-04587 messages that indicate the number of records read for a CDC session. Default is N.

• PRGINT. Specify the number of records that PowerExchange reads before writing the PWX-04587 messages to the PowerExchange log file. Default is 250 records.

The PWX-04587 messages have the following format:

PWX-04587 int_server/workflow_name/session_name: Records read=num_records

Where:

• int_server is the name of the PowerCenter Integration Service.

• workflow_name is the name of the workflow that contains the CDC session.

• session_name is the name of the CDC session.

• num_records is the cumulative number of records read since the CDC session started.

For example, to direct PowerExchange to write read progress messages after 100 records, the DBMOVER configuration file contains the following parameters:

PRGIND=Y
PRGINT=100

When a CDC session that has a session name of s_cdc_DB2_SQL_stats runs, PowerExchange writes the following messages to the PowerExchange log file:

PWX-04587 intserv/wf_cdc_mon_stats/s_cdc_DB2_SQL_stats: Records read=100
PWX-04587 intserv/wf_cdc_mon_stats/s_cdc_DB2_SQL_stats: Records read=200
PWX-04587 intserv/wf_cdc_mon_stats/s_cdc_DB2_SQL_stats: Records read=300

PowerExchange continues to write PWX-04587 messages for this CDC session until the session ends. In the PowerExchange log file, each of these messages has a date and timestamp. You can use this information to determine the speed with which PowerExchange processes change data from the change stream.
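The read rate can be derived from consecutive PWX-04587 messages by dividing the change in the cumulative record count by the elapsed time. The following Python sketch takes already-parsed (timestamp, message text) pairs for a single CDC session, because the timestamp layout of the PowerExchange log file is not shown here. The function name `read_rates` is hypothetical and the timestamps in the sample are invented for illustration.

```python
import re
from datetime import datetime

RECORDS_READ = re.compile(r"PWX-04587 .*Records read=(\d+)")


def read_rates(entries):
    """Yield records per second between consecutive PWX-04587 messages.

    entries: iterable of (datetime, message_text) pairs in log order.
    """
    previous = None
    for stamp, text in entries:
        match = RECORDS_READ.search(text)
        if not match:
            continue
        current = (stamp, int(match.group(1)))
        if previous is not None:
            seconds = (current[0] - previous[0]).total_seconds()
            if seconds > 0:
                yield (current[1] - previous[1]) / seconds
        previous = current


prefix = "PWX-04587 intserv/wf_cdc_mon_stats/s_cdc_DB2_SQL_stats: Records read="
entries = [
    (datetime(2009, 12, 1, 10, 0, 0), prefix + "100"),  # illustrative timestamps
    (datetime(2009, 12, 1, 10, 0, 2), prefix + "200"),
    (datetime(2009, 12, 1, 10, 0, 6), prefix + "300"),
]
rates = list(read_rates(entries))
```

With these sample timestamps, the first interval covers 100 records in 2 seconds and the second covers 100 records in 4 seconds.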

Extraction Statistics Messages

When a CDC session ends, PowerExchange writes the following messages that contain statistical information about the session:

• PWX-04578. PowerExchange writes this message for each source in the CDC session. This message includes the number of insert, update, delete, commit, and total records read for the source.

• PWX-04588. PowerExchange writes this message for the entire CDC session. This message includes the total number of records read for that CDC session.

Important: The statistical information in the PowerExchange messages represents the change data that PowerExchange read for a CDC session. This information might not reflect the data that was applied to the targets. For statistical information about the change data applied to the target, review the session log.

Multithreaded Processing Statistics

If you use CDC offload processing, you can also use multithreaded processing to attempt to increase throughput on the PowerCenter Integration Service machine where the offloaded processing runs.

To monitor the effectiveness of multithreaded processing, specify the following parameter in the DBMOVER configuration file on the PowerCenter Integration Service machine:

SHOW_THREAD_PERF=number_records

Number of change records that PowerExchange reads during a statistics reporting interval before writing the statistics messages PWX-31254 through PWX-31259 to the PowerExchange log file. If you select the Retrieve PWX log entries option on the connection in the CDC session, PWXPC writes these messages in the session log.

You can use the information in the messages to tune multithreaded processing. For PowerExchange to write statistics messages for threads, you must specify 1 or above for Worker Threads on the connection. Otherwise, PowerExchange does not use multithreaded processing or produce statistics messages.

Valid values are from 10000 through 50000000.

The messages that PowerExchange writes during each statistics interval contain the following information:

¨ PWX-31255. Cycle time, which is the total time that PowerExchange on the PowerCenter Integration Servicemachine spent processing the change data before passing it to PWXPC. This message includes the totalpercentage of time and average, minimum, and maximum times in microseconds.

¨ PWX-31256. I/O time, which is the time that PowerExchange on the PowerCenter Integration Service machinespent reading change data from the PowerExchange Listener on the source system. This message includes theI/O percentage of the total time and average, minimum, and maximum times in microseconds.

¨ PWX-31257. Parsing time, which is the time that PowerExchange on the PowerCenter Integration Servicemachine spent in column-level processing for the change records on all threads. This message includes theparsing percentage of the total time and average, minimum, and maximum times in microseconds.

¨ PWX-31258. External time, which is the time that PowerExchange on the PowerCenter Integration Servicemachine spent combining the change records from all threads back into a single UOW to pass to PWXPC andfor PWXPC to flush the data to PowerCenter. This message includes the external percentage of the total timeand average, minimum, and maximum times in microseconds.

¨ PWX-31259. Delay time, which is the time that the PowerExchange on the PowerCenter Integration Servicemachine waited to receive new change records to process from the PowerExchange Listener on the sourcesystem. This message includes the delay percentage of the total time and average, minimum, and maximumtimes in microseconds.

If the parsing and external processing times are higher than the I/O time, you might improve throughput byincreasing the number of threads for the CDC session.

For the following example, SHOW_THREAD_PERF=10000 is specified in the DBMOVER configuration file.PowerExchange writes the following sample messages after 10,000 change records have been read and the nextUOW boundary is reached:

PWX-31254 PowerExchange threading stats for last 10000 rows. Cycle (array) size is 25 rows. 0 out of array occured.
PWX-31255 Cycle time: 100% (avg: 5709 min: 4741 max: 7996 usecs)
PWX-31256 IO time: 4% (avg: 235 min: 51 max: 1021 usecs)
PWX-31257 Parse time: 79% (avg: 4551 min: 4102 max: 5495 usecs)
PWX-31258 Extern time: 20% (avg: 1145 min: 618 max: 3287 usecs)
PWX-31259 Delay time: 0% (avg: 7 min: 4 max: 165 usecs)
PWX-31254 PowerExchange threading stats for last 100000 rows. Cycle (array) size is 25 rows. 0 out of array occured.
PWX-31255 Cycle time: 99% (avg: 5706 min: 4735 max: 7790 usecs)
PWX-31256 IO time: 4% (avg: 234 min: 51 max: 950 usecs)
PWX-31257 Parse time: 79% (avg: 4549 min: 4108 max: 5425 usecs)
PWX-31258 Extern time: 20% (avg: 1144 min: 616 max: 3242 usecs)
PWX-31259 Delay time: 0% (avg: 7 min: 4 max: 115 usecs)
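The guideline about parsing and external times can be checked mechanically. The following Python sketch is illustrative only and is not part of PowerExchange. It assumes that the statistics messages keep the layout of the sample output, and it reads the guideline as "both the parsing and the external averages exceed the I/O average", which is an interpretation. The names `parse_interval` and `more_threads_might_help` are hypothetical.

```python
import re

# Matches one statistics message, for example:
# "PWX-31257 Parse time: 79% (avg: 4551 min: 4102 max: 5495 usecs)"
STAT_RE = re.compile(
    r"PWX-(3125[5-9]) (\w+) time: (\d+)% "
    r"\(avg: (\d+) min: (\d+) max: (\d+) usecs\)"
)


def parse_interval(text):
    """Return the statistics for one interval, keyed by label (Cycle, IO, ...)."""
    stats = {}
    for msg_id, label, pct, avg, low, high in STAT_RE.findall(text):
        stats[label] = {
            "id": "PWX-" + msg_id,
            "pct": int(pct),
            "avg": int(avg),
            "min": int(low),
            "max": int(high),
        }
    return stats


def more_threads_might_help(stats):
    """Assumed reading of the guideline: parse and external averages both exceed I/O."""
    io_avg = stats["IO"]["avg"]
    return stats["Parse"]["avg"] > io_avg and stats["Extern"]["avg"] > io_avg


# First interval from the sample messages.
SAMPLE = """\
PWX-31255 Cycle time: 100% (avg: 5709 min: 4741 max: 7996 usecs)
PWX-31256 IO time: 4% (avg: 235 min: 51 max: 1021 usecs)
PWX-31257 Parse time: 79% (avg: 4551 min: 4102 max: 5495 usecs)
PWX-31258 Extern time: 20% (avg: 1145 min: 618 max: 3287 usecs)
PWX-31259 Delay time: 0% (avg: 7 min: 4 max: 165 usecs)
"""

stats = parse_interval(SAMPLE)
print(more_threads_might_help(stats))
```

For the sample interval, the parsing average (4551 usecs) and the external average (1145 usecs) both exceed the I/O average (235 usecs), so the check returns True.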

DISPLAY ACTIVE or LISTTASK Command Output

Issue the PowerExchange Listener DISPLAY ACTIVE command to display CDC sessions that are active in the PowerExchange Listener.

You can issue the command from the command line. On Windows, if you want to issue the command from the PowerExchange Navigator, enter the equivalent LISTTASK command in the Database Row Test dialog box. Alternatively, issue the pwxcmd listtask command from a Linux, UNIX, or Windows system to a PowerExchange Listener running on the local system or a remote system.

The command output includes the PwrCntrSess field. This field provides the PowerCenter session name in the following format:

integration_server_name/workflow_name/session_name

For example, if two CDC sessions are active, the command produces the following output:

PWX-00711 Active tasks:
PWX-00712 TaskId=1, Partner=10.10.10.01, Port=2480, PwrCntrSess=intserv1/workflow1/cdc_sess1, Application=appl_name1, Status=Active, AM=CAPXRT, Mode=Read, Process=, SessId=
PWX-00712 TaskId=2, Partner=10.10.10.02, Port=2480, PwrCntrSess=intserv2/workflow2/cdc_sess2, Application=appl_name2, Status=Active, AM=CAPXRT, Mode=Read, Process=, SessId=
PWX-00713 2 active tasks
PWX-00709 0 Dormant TCBs
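If you collect this output in a script, the PwrCntrSess field can be split into its three parts. The following Python sketch is a hypothetical helper, not a PowerExchange utility. It assumes that each PWX-00712 message is on a single line, that fields are comma-separated key=value pairs as in the example, and that the names inside PwrCntrSess contain no additional slashes.

```python
import re

# Matches key=value pairs; a value runs up to the next comma and may be empty.
FIELD_RE = re.compile(r"(\w+)=([^,]*)")


def parse_task(line):
    """Parse one PWX-00712 line into a dict of its fields."""
    if not line.startswith("PWX-00712"):
        raise ValueError("not a PWX-00712 task line")
    fields = {key: value.strip() for key, value in FIELD_RE.findall(line)}
    # PwrCntrSess has the form integration_server_name/workflow_name/session_name.
    service, workflow, session = fields["PwrCntrSess"].split("/", 2)
    fields["integration_service"] = service
    fields["workflow"] = workflow
    fields["session"] = session
    return fields


LINE = (
    "PWX-00712 TaskId=1, Partner=10.10.10.01, Port=2480, "
    "PwrCntrSess=intserv1/workflow1/cdc_sess1, Application=appl_name1, "
    "Status=Active, AM=CAPXRT, Mode=Read, Process=, SessId="
)

task = parse_task(LINE)
print(task["integration_service"], task["workflow"], task["session"])
```

Empty fields such as Process and SessId are returned as empty strings rather than omitted, so callers can rely on every key being present.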

Monitoring CDC Sessions in PowerCenter

In PowerCenter, you can use the following information to monitor the progress of CDC sessions:

¨ Session log messages. PWXPC and PowerCenter write messages to the session log. You can use these messages to monitor the progress of a CDC session.

¨ Performance details in Workflow Monitor. If you configure a CDC session to report performance details, you can monitor the progress of the session in the Workflow Monitor.

Session Log Messages

You can use messages that PWXPC and PowerCenter write to the session log to monitor the progress of CDC sessions.

When PWXPC flushes change data to commit the data to the targets, it writes one of the following messages to the session log, displaying the reason for the flush:

PWXPC_10081 [INFO] [CDCDispatcher] raising real-time flush with restart tokens [restart1], [restart2] because the UOW Count [count] is reached

PWXPC_10082 [INFO] [CDCDispatcher] raising real-time flush with restart tokens [restart1], [restart2] because Real-time Flush Latency [latency] is reached

PWXPC_12128 [INFO] [CDCDispatcher] raising real-time flush with restart tokens [restart1], [restart2] because the Maximum Rows Per commit [count] is reached

You can use the restart tokens in the PWXPC flush messages to monitor the processing of the change data. For each PWXPC flush message, PowerCenter writes a WRT_8160 message after committing change data to the targets. This message displays the source-based commit statistics.

RELATED TOPICS:

¨ “Using Connection Options to Tune CDC Sessions”

¨ “Tuning Commit Processing”

¨ “Viewing Performance Details in the Workflow Monitor”

Viewing Performance Details in the Workflow Monitor

Performance details include counters that you can use to assess the efficiency of a CDC session and change data extraction processing. The details include a single source qualifier that reflects group source processing for the change data.

From Workflow Monitor, you can view the details for the current CDC session while it is executing. If you notice degradation of CDC session performance, you can use the performance details to determine the bottleneck. PWXPC does not store performance details in the repository, so you cannot view previous performance details for CDC sessions.

Note: To view performance details for a CDC session that has ended, you must select performance details while the session is running. Otherwise, PWXPC does not display performance details.

To enable the collection of performance details, select Collect performance data on the Properties tab of the CDC session. During the execution of the CDC session, PWXPC refreshes the statistical information every 10 seconds. If you have selected a resume recovery strategy in the CDC session, PWXPC displays data for all performance counter fields.

To view performance details in the Workflow Monitor:

1. In Workflow Monitor, right-click a session and select Get Run Properties.

2. In the Properties window, click the Performance area.

The Performance Counter column displays a data source qualifier from the CDC session. The Counter Value column displays the PowerCenter node name.

3. To view performance details, select the data source qualifier. The following table describes the fields that PowerCenter displays in the Performance Counter column in the Performance area:

Performance Counter Field
    Description

1 PowerExchange CDC Reader Status:
    Current status of the PWXPC reader, as indicated by one of the following values:
    - No Data To Process. In the last read, PowerExchange did not pass data to PWXPC.
    - Restart Advance. PowerExchange passed restart tokens to PWXPC but did not pass change data.
    - Processing Data. PowerExchange passed change data and restart tokens to PWXPC for processing.

1.1 Time Last Data Row Read
    Time, in milliseconds, when PWXPC last received data from PowerExchange.

1.2 Data Rows In Current Interval
    Number of change records received from PowerExchange during the current statistics interval.

1.3 End Packets In Current Interval
    Number of UOWs received from PowerExchange during the current statistics interval.

1.4 Data Read Rate In Current Interval (rows/sec)
    Number of change records read per second by PowerExchange during the current statistics interval.
    The value varies, depending on the quantity of change data being processed:
    - If PowerExchange is reading large amounts of change data from the change stream, this value is usually large and reflects the maximum PowerExchange throughput.
    - If PowerExchange is waiting for change data at the end of the change stream, this value is small.
    The following factors can increase this value:
    - Large network bandwidth
    - CDC offload processing
    - Multithreaded processing

1.5 Mean Data Read Rate (rows/sec)
    Mean number of change records that PowerExchange read per second, from the start of the CDC session.

1.6 Max Data Read Rate (rows/sec)
    Maximum number of change records that PowerExchange read per second during a statistics interval, from the start of the CDC session.

2 PowerCenter Processing Status:
    Overall status of the CDC session, as indicated by one of the following values:
    - Idle. Waiting for change data.
    - Processing Data. Data is being processed.
    - Recovery Disabled. If a resume recovery strategy is not selected, the PWXPC CDC reader cannot obtain PowerCenter status information.

2.1 Time Of Last Commit
    Timestamp of the last commit to a target.

2.2 Rows Processed To Commit In Current Interval
    Number of change records flushed by the PWXPC reader during the current statistics interval. This count includes the change records in all committed UOWs. Some of these UOWs might have started before the current statistics interval began.

2.3 Commit Rate In Current Interval (rows/sec)
    Processing rate, in number of change records per second, for the change records for the UOW that was last committed during the current statistics interval. This rate includes reading the UOW from PowerExchange and committing the change data to the targets.
    The following factors can influence this rate:
    - Number of available DTM buffers
    - Responsiveness of the target
    - Number of transformations in the pipeline

2.4 Mean Commit Rate (rows/sec)
    Mean number of change records per second for the rate displayed in 2.3 Commit Rate In Current Interval.
    This value differs from 2.6 Mean Throughput in that it takes into account only the time when the session is actively processing data and does not reflect processing overlap in PowerCenter.

2.5 Max Commit Rate (rows/sec)
    Maximum number of change records per second for the commit rate displayed in 2.3 Commit Rate In Current Interval, recorded from the start of the CDC session.

2.6 Mean Throughput (rows/sec)
    Mean rate of processing for the CDC session.

2.7 Max Throughput (rows/sec)
    Maximum throughput for the CDC session.

2.8 Commits In Current Interval
    Number of commits processed to completion by the target during the current statistics interval.

2.9 Commits Pending
    Number of commits that were issued by the PWXPC reader but that have not yet reached the targets. A large value might indicate problems with target responsiveness.

3 Capture Timestamps

3.1 Timestamp On Last End Packet Read
    The capture timestamp, DTL__CAPXTIMESTAMP, from the last UOW read for a source in the CDC session.

3.2 Timestamp On Last Target Commit
    The capture timestamp, DTL__CAPXTIMESTAMP, from the last UOW committed to the target.

4 Totals

4.1 Elapsed Time
    Total elapsed time for the CDC session.

4.2 Rows Read
    Total number of change records read from PowerExchange.

4.3 End Packets Read
    Total number of UOWs read.

4.4 Time in PowerExchange Processing
    Total time of PowerExchange processing for the CDC session.

4.5 Rows Processed
    Total number of change records processed through PowerCenter and committed to the targets.

4.6 Commits to Target
    Total number of flushes that the PWXPC reader issued and that were committed to the targets.

4.7 TS on Last Commit minus TS at Commit (2.1-3.2)
    Value that results from subtracting the 3.2 Timestamp On Last Target Commit value from the 2.1 Time Of Last Commit value. If this result is negative, the value is enclosed in parentheses.
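The arithmetic behind counter 4.7 can be sketched as follows. This Python example is illustrative only. It assumes that both counters are available as datetime values and that the result is shown in seconds with three decimal places, which is an assumed display format. Only the subtraction and the parentheses rule come from the counter description. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def commit_lag(time_of_last_commit, capture_ts_on_last_commit):
    """Return counter 2.1 minus counter 3.2 as a timedelta."""
    return time_of_last_commit - capture_ts_on_last_commit


def format_lag(lag):
    """Format a lag in seconds; a negative result is enclosed in parentheses."""
    seconds = lag.total_seconds()
    if seconds < 0:
        return f"({abs(seconds):.3f})"
    return f"{seconds:.3f}"


# Hypothetical values: the change was captured at 10:00:02 and committed at 10:00:05.
lag = commit_lag(datetime(2009, 12, 1, 10, 0, 5), datetime(2009, 12, 1, 10, 0, 2))
print(format_lag(lag))
print(format_lag(timedelta(seconds=-1.5)))
```

A steadily growing value suggests that commits to the target are falling behind the point at which the changes were captured.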

Tuning Change Data Extractions

You can use PowerExchange configuration parameters and connection options in PowerCenter to tune CDC sessions. In addition, you can use CDC offload and multithreaded processing to improve throughput by moving processing for change data to a different machine.

Use the following methods to tune CDC sessions:

¨ Parameters and options. To tune sessions, you can specify parameters and options in the DBMOVER configuration file and on PWX CDC connections.

¨ CDC offload processing. You can use CDC offload processing to distribute PowerExchange column-level processing for change data to the PowerCenter Integration Service machine that runs the CDC session. By distributing processing, you can reduce PowerExchange processing overhead on the system on which the change data resides. You can also use CDC offload processing with the PowerExchange Logger for Linux, UNIX, and Windows to capture change data on a different machine. CDC sessions can then extract change data from the PowerExchange Logger log files on that machine, rather than from the change stream on the original source machine.

¨ Multithreaded processing. If you use CDC offload processing, you can optionally use multithreaded processing to attempt to increase throughput. Multithreaded processing uses multiple threads on the PowerCenter Integration Service machine to perform the offloaded PowerExchange processing.

¨ Asynchronous network communication. PowerExchange uses asynchronous communication for most send and receive operations, overlapping network processing with data processing. This feature is enabled automatically and usually requires no tuning, but you can tune the feature if you need to.

Using PowerExchange Parameters to Tune CDC Sessions

To tune your PowerExchange installation, you can customize the following parameters in the DBMOVER configuration file:

APPBUFSIZE=size

Defines the maximum size, in bytes, of the buffer that PowerExchange uses to read or write data. This data buffer can exist on a source or target system.

If you are applying change data from the change stream on the source system to a remote target system, PowerExchange usually writes change data to its application data buffer on the source system until the buffer is full. PowerExchange then sends the data to a sending TCP/IP buffer on the source system. TCP/IP transports the change data to a receiving TCP/IP buffer on the target system. PowerExchange on the target system reads the change data from the TCP/IP buffer into its application data buffer. PWXPC then reads the change data and passes it to PowerCenter. PowerCenter processes the data and applies it to the targets.

Enter an APPBUFSIZE value that is greater than the maximum size of any single data row to be sent.

Valid values are from 34816 through 1048576. Default is 128000.

If the target system is remote, enter the same APPBUFSIZE value in the DBMOVER configuration files on the source and target systems. Also, verify that the APPBUFSIZE value matches the TCPIPBUFSIZE value in the same DBMOVER configuration file. The TCPIPBUFSIZE parameter specifies the maximum size of the TCP/IP buffer.

If the APPBUFSIZE value is not optimal, PowerExchange writes the PWX-01295 message in the PowerExchange log file on the source system. This message includes a recommended minimum value.
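The APPBUFSIZE rules above can be collected into a small pre-deployment check. The following Python sketch is a hypothetical helper, not a PowerExchange tool. It encodes only the three rules stated in this section: the valid range, the requirement to exceed the largest data row, and the match with TCPIPBUFSIZE. The maximum row size is an input that you would have to estimate yourself.

```python
APPBUFSIZE_MIN = 34816
APPBUFSIZE_MAX = 1048576


def check_appbufsize(appbufsize, tcpipbufsize, max_row_bytes):
    """Return a list of problems found with an APPBUFSIZE setting."""
    problems = []
    if not APPBUFSIZE_MIN <= appbufsize <= APPBUFSIZE_MAX:
        problems.append("APPBUFSIZE is outside the valid range")
    if appbufsize <= max_row_bytes:
        problems.append("APPBUFSIZE must be greater than the largest data row")
    if appbufsize != tcpipbufsize:
        problems.append("APPBUFSIZE does not match TCPIPBUFSIZE")
    return problems


# The default value with a matching TCPIPBUFSIZE and 4 KB rows passes all checks.
print(check_appbufsize(128000, 128000, 4096))
```

For a remote target, you would run the same check against the DBMOVER values on both the source and target systems and also confirm that the two APPBUFSIZE values are equal.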

COMPRESS={Y|N}

Defines whether PowerExchange uses its proprietary compression algorithm to compress data before it is sent to TCP/IP for transmission to the remote platform.

Default is Y.

PowerExchange uses the COMPRESS setting in the DBMOVER configuration file on the remote system that contacts the PowerExchange Listener. On the PWX CDC application connection, you can override the compression setting in the DBMOVER configuration file. If you enable compression, the CPU consumption of the PowerExchange Listener on the source system might increase.

To avoid unnecessary CPU consumption, set COMPRESS to N in the PowerExchange DBMOVER configuration file on the PowerCenter Integration Service machine.

CAPI_CONNECTION=( ...,MEMCACHE=cache_value, ...))

Amount of memory cache, in kilobytes, that is allocated to reconstruct complete UOWs. You can specify the MEMCACHE parameter on the following CAPI_CONNECTION statement types:

¨ MSQL

¨ UDB

¨ UOWC

PowerExchange keeps all changes in each UOW in cache until it processes the end-UOW record, which is the commit record. If the MEMCACHE value is too small to hold all of the changes in a UOW in cache, the changes spill to a disk file.

Valid values are from 1 through 519720. Default is 1024.

You might need to increase this value if you have large UOWs. PowerExchange processes a UOW more efficiently if all of the changes are cached in memory. If a UOW might be larger than 1024 KB in size, increase this parameter. For most environments, a value of 10240 (10 MB) is a good starting value.

Tip: PowerExchange uses the MEMCACHE value to allocate cache memory to each connection for change data extractions. To prevent excessive memory use by a PowerExchange Listener, use a reasonable value for MEMCACHE based on your extraction processing needs and the number of CDC sessions that run concurrently.
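A rough budget for the tip above can be worked out before you change MEMCACHE. The following Python sketch is an estimate only. It assumes one extraction connection per concurrent CDC session and treats the MEMCACHE value as the upper bound of cache per connection. Both are simplifying assumptions, and the function name is hypothetical.

```python
MEMCACHE_MIN_KB = 1
MEMCACHE_MAX_KB = 519720


def memcache_budget_kb(memcache_kb, concurrent_sessions):
    """Estimate the total UOW cache, in KB, across concurrent CDC sessions."""
    if not MEMCACHE_MIN_KB <= memcache_kb <= MEMCACHE_MAX_KB:
        raise ValueError("MEMCACHE must be from 1 through 519720")
    if concurrent_sessions < 0:
        raise ValueError("concurrent_sessions cannot be negative")
    return memcache_kb * concurrent_sessions


# The suggested starting value of 10240 KB with 8 concurrent sessions.
total_kb = memcache_budget_kb(10240, 8)
print(total_kb, "KB =", total_kb // 1024, "MB")
```

With the suggested starting value of 10240 and eight concurrent sessions, the estimate is 81920 KB, or 80 MB.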

CAPI_CONNECTION=( ...,RSTRADV=rstr_secs, ...))

Number of seconds that PowerExchange waits before advancing the restart tokens for a data source by returning an empty unit of work (UOW). You can specify the RSTRADV parameter on the following CAPI_CONNECTION statement types:

¨ MSQL

¨ UDB

¨ UOWC

Empty UOWs contain restart tokens only, without any data. PowerExchange uses the restart tokens to determine the start point in the change stream for change data extractions. The wait period for the RSTRADV value starts after a UOW for a data source is processed. PowerExchange resets the wait period after it reads the next UOW for that source or when it returns an empty UOW because the wait period expires.

For sources with low change activity, you can use the RSTRADV parameter to periodically advance the restart tokens for those sources. Advancing the restart tokens speeds up restart processing for CDC sessions by minimizing the amount of change data that must be reprocessed.

For example, if you specify RSTRADV=5 and changes are not made to the data source for five seconds, PowerExchange returns an empty UOW to advance the restart point for the data source.

Valid values are from 0 through 86400. If you do not specify RSTRADV, PowerExchange does not return empty UOWs to advance the restart point.

Consider the following issues when you set RSTRADV on CAPI_CONNECTION statements in the PowerExchange DBMOVER configuration file:

¨ A value of 0 adversely affects performance. PowerExchange returns an empty UOW with restart tokens to PWXPC after each UOW is processed.

¨ A low value can cause the UOW Count option on the PWX CDC connection to match more quickly than expected. When the UOW counter matches, PWXPC flushes its data buffer and commits restart tokens to the targets. Excessive flush activity can adversely affect performance on the PowerCenter Integration Service machine and target databases.
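The RSTRADV wait period described above behaves like a timer that restarts after every UOW and after every empty UOW. The following Python sketch is a simplified model for reasoning about how many empty UOWs a given setting produces. It is not PowerExchange code. It assumes that times are given in seconds, that UOW processing is instantaneous, and that a UOW arriving exactly when the timer expires takes precedence. It models only values greater than 0.

```python
def empty_uow_times(uow_times, rstradv, end_time):
    """Return the times at which empty UOWs would be returned under this model."""
    if rstradv <= 0:
        raise ValueError("this sketch models RSTRADV values greater than 0 only")
    empties = []
    # Each real UOW restarts the wait period; end_time closes the last gap.
    events = sorted(uow_times) + [end_time]
    for start, next_event in zip(events, events[1:]):
        expiry = start + rstradv
        while expiry < next_event:
            empties.append(expiry)  # the wait period expired: empty UOW returned
            expiry += rstradv       # returning an empty UOW resets the wait period
    return empties


# Real UOWs at 0, 3, and 20 seconds, RSTRADV=5, observed for 30 seconds.
print(empty_uow_times([0, 3, 20], 5, 30))
```

In this model, the quiet period between 3 and 20 seconds yields three empty UOWs. Each of them also counts toward the UOW Count option, which is why a low RSTRADV value can trigger flushes sooner than expected.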

LISTENER=(node_name,TCPIP,port,send_bufsize,receive_bufsize,send_msgsize,receive_msgsize, ...)

Defines a port on which a PowerExchange Listener listens for local or remote connections. The positional parameters send_bufsize, receive_bufsize, send_msgsize, and receive_msgsize define the send and receive buffer and message sizes. If you do not specify values for these parameters, PowerExchange uses the operating system defaults, which vary based on operating system.

To maximize throughput, consider increasing the send and receive buffer and message sizes on the LISTENER statement on the source system. Contact your network administrator to determine the best values to use on your system.

Note: Do not specify values for the send and receive buffer and message sizes that exceed the TCP maximum receive buffer size.

NODE=(node_name,TCPIP,hostname,port,send_bufsize,receive_bufsize,send_msgsize,receive_msgsize, ...)

Defines the IP information that PowerExchange uses to communicate with a remote PowerExchange Listener. The positional parameters send_bufsize, receive_bufsize, send_msgsize, and receive_msgsize define the send and receive buffer and message sizes. If you do not specify values for these parameters, PowerExchange uses the operating system defaults, which vary based on operating system.

To maximize throughput, consider increasing the send and receive buffer and message sizes on the NODE statement on the target system. Contact your network administrator to determine the best values to use on your system.

Note: Do not specify values for the send and receive buffer and message sizes that exceed the TCP maximum receive buffer size.

TRACE=(trace_id,trace_level,99)

Defines PowerExchange diagnostic traces that Informatica Global Customer Support uses to solve problems with PowerExchange code.

TRACE statements can severely impact PowerExchange performance. You should use them only at the direction of Informatica Global Customer Support. To enhance performance, remove or comment out all TRACE statements in the DBMOVER configuration files on all systems.

RELATED TOPICS:

¨ “Using Connection Options to Tune CDC Sessions”

Using Connection Options to Tune CDC Sessions

In PowerCenter, you can customize options on the PWX CDC connections to tune CDC sessions. The following table describes the connection options that you can use to tune CDC sessions:

Connection Option
    Description
    Tuning Suggestion

Compression
    Description: Select this option to compress source data during the PowerCenter session. Default is disabled.
    Tuning suggestion: Do not use compression.

Encryption Type
    Description: The type of data encryption that PowerExchange uses. Default is None.
    Tuning suggestion: Do not use encryption.

Image Type
    Description: Indicates whether PWXPC extracts after images (AI) only or both before and after images (BA) for change data. If you use the PowerExchange Logger for Linux, UNIX, and Windows and specified CAPT_IMAGE=BA in the pwxccl.cfg configuration file, you can set this option to AI or BA. If you specify AI, before images of the data can still be embedded in Update rows if you add DTL_BI columns to the extraction map. With DTL_BI columns, you can manipulate before-image data in the mappings. Default is BA.
    Tuning suggestion: Set to AI.

UOW Count
    Description: The number of UOWs that PWXPC reads from the source before it flushes the data buffer to commit the change data to the targets. Default is 1.
    Tuning suggestion: To improve efficiency on the PowerCenter Integration Service machine and the target databases, reduce commit processing.

Real-time Flush Latency in milli-seconds
    Description: The frequency, in milliseconds, with which PWXPC flushes the data buffer to commit the change data to the targets. Default is 0, which is equivalent to two seconds.
    Tuning suggestion: To improve efficiency on the PowerCenter Integration Service machine and the target databases, reduce commit processing.

PWX Latency in seconds
    Description: Select the maximum time, in seconds, that PowerExchange on the source platform waits for more change data before flushing data to PWXPC on the PowerCenter Integration Service platform. Default is 2.
    Tuning suggestion: Use the default value.

Maximum Rows Per commit
    Description: Maximum number of change records that PWXPC reads from the source before it flushes the data buffer to commit the change data to the targets. Default is 0, which means that PWXPC does not use maximum rows.
    Tuning suggestion: To improve efficiency on the PowerCenter Integration Service machine and the target databases, reduce commit processing.

Minimum Rows Per commit
    Description: Minimum number of change records that PowerExchange reads from the change stream before it passes any commit records to PWXPC. Default is 0, which means that PWXPC does not use minimum rows.
    Tuning suggestion: If your UOWs contain only a few changes, select a larger value for this option to increase the size of the UOWs.

Offload Processing
    Description: Select this option to request CDC offload processing. Default is No.
    Tuning suggestion: For more information about offload processing, see “CDC Offload and Multithreaded Processing”.

Worker Threads
    Description: If you select Offload Processing, you can also set this option to have PowerExchange use multiple threads for the offloaded processing on the PowerCenter Integration Service machine. Enter the number of threads that you want PowerExchange to use. Valid values are from 1 through 64. Default is 0, which means that PowerExchange does not use multithreaded processing.
    Tuning suggestion: For more information about offload processing, see “CDC Offload and Multithreaded Processing”.

Array Size
    Description: If the Worker Threads value is greater than zero, the size of the storage array, in number of records, for the threads. Valid values are from 25 through 100000. Default is 25.
    Tuning suggestion: Use 25. Warning: If you specify a large value, have large records, or run many sessions that use multithreaded processing, you might experience memory shortages on the PowerCenter Integration Service machine.

TCPIP Activity Timeout
    Description: Activity timeout. If no data, other than heartbeat data, is sent or received during this time interval (in seconds), PowerExchange aborts the connection and indicates a timeout error. A value of -1 means that no activity timeout is set.
    Tuning suggestion: For most applications, use the default of -1, specifying no activity timeout. Instead, PowerExchange will use heartbeat processing to detect failed connections.

For more information about connection options, see PowerExchange Interfaces for PowerCenter.

RELATED TOPICS:

¨ “Tuning Commit Processing”

¨ “CDC Offload and Multithreaded Processing”

Tuning Commit Processing

If the PowerCenter session log for a CDC session contains groups of PWXPC flush messages followed by groups of source-based commit messages from PowerCenter, the CDC session might be reading change data faster than the data can be processed and written to the targets. To resolve this issue, you can adjust the values that you set for the following commitment control options on the PWX CDC connection:

¨ UOW Count. If the session log contains mostly PWXPC_10081 flush messages, you might need to increase the value for this option.

¨ Real-time Flush Latency in milli-seconds. If the session log contains mostly PWXPC_10082 flush messages, you might need to increase the value for this option.

¨ Maximum Rows Per commit. If the session log contains mostly PWXPC_12128 flush messages, you might need to increase the value for this option.
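The mapping from flush message to commitment control option in the list above lends itself to a quick tally of a session log. The following Python sketch is a hypothetical log helper, not a PowerCenter feature. It assumes that each flush message appears on one line and begins with its message ID, as in the sample messages in "Session Log Messages", and it interprets "mostly" as "the most frequent message".

```python
import re
from collections import Counter

# Flush message IDs and the commitment control option that triggers each one.
FLUSH_OPTIONS = {
    "PWXPC_10081": "UOW Count",
    "PWXPC_10082": "Real-time Flush Latency in milli-seconds",
    "PWXPC_12128": "Maximum Rows Per commit",
}
FLUSH_RE = re.compile(r"\b(PWXPC_10081|PWXPC_10082|PWXPC_12128)\b")


def dominant_flush_option(log_lines):
    """Return the option behind the most frequent flush message, or None."""
    counts = Counter()
    for line in log_lines:
        match = FLUSH_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    if not counts:
        return None
    msg_id, _count = counts.most_common(1)[0]
    return FLUSH_OPTIONS[msg_id]


# Placeholder tokens in brackets are copied from the message templates.
SAMPLE_LOG = [
    "PWXPC_10082 [INFO] [CDCDispatcher] raising real-time flush with restart tokens "
    "[restart1], [restart2] because Real-time Flush Latency [latency] is reached",
    "PWXPC_10081 [INFO] [CDCDispatcher] raising real-time flush with restart tokens "
    "[restart1], [restart2] because the UOW Count [count] is reached",
    "PWXPC_10082 [INFO] [CDCDispatcher] raising real-time flush with restart tokens "
    "[restart1], [restart2] because Real-time Flush Latency [latency] is reached",
]

print(dominant_flush_option(SAMPLE_LOG))
```

The result names the option to consider increasing first. It is a starting point for investigation rather than a tuning decision.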

PWXPC might also flush change data too frequently because the PWX CDC connection in the CDC session uses too many of the commitment control options. In this case, use a single option to control commit processing and disable the unused options.

If your change data has many small UOWs, you can use the Minimum Rows Per commit option to create larger UOWs of more uniform size. PowerExchange and PWXPC can process a few UOWs of larger size more efficiently than many small UOWs. By using the Minimum Rows Per commit option to increase the size of UOWs, you can improve CDC processing efficiency.

The following additional factors can also affect the efficiency with which change data is applied to the targets:

¨ Buffer Memory. The DTM Buffer Size and Default Buffer Block Size values can impact the performance of the CDC session. If you have enabled the collection of performance details in the CDC session, review the difference between performance counters 4.4 Time in PowerExchange Processing and 4.1 Elapsed Time. If the elapsed time is much larger than the PowerExchange processing time, buffer memory constraints might exist.

¨ Target database. The performance of the target database can impact the performance of the CDC session. Contact your database administrator to ensure that access to the database is optimized.

CDC Offload and Multithreaded Processing

You can use CDC offload processing with the following types of change data extractions:

¨ CDC sessions that use real-time extraction mode

¨ PowerExchange Logger for Linux, UNIX, and Windows

When you use CDC offload processing with real-time extractions, the change data remains on the source system and PowerExchange moves the column-level processing to the PowerCenter Integration Service machine that runs the CDC session. For MVS, DB2 for i5/OS, and Oracle sources, PowerExchange also moves the UOW Cleanser processing to the PowerCenter Integration Service machine.

When you use CDC offload processing with the PowerExchange Logger for Linux, UNIX, and Windows, PowerExchange does the following processing:

¨ Reads the change data from the source system and stores it in PowerExchange Logger log files

¨ For MVS, DB2 for i5/OS, and Oracle sources, moves the UOW Cleanser processing to the machine on which the PowerExchange Logger is running

The PowerExchange Logger stores the change data in log files on the Linux, UNIX, or Windows machine. CDC sessions can then use continuous extraction mode to extract the change data from the PowerExchange Logger log files instead of from the source system.

You can use multithreaded processing for CDC sessions that select offload processing. By default, PowerExchange uses a single thread to process change data on the PowerCenter Integration Service machine. When you select multithreaded processing, PowerExchange uses multiple threads to process the change records in each UOW.

Planning for CDC Offload and Multithreaded Processing

Before you configure CDC offload and multithreaded processing, review the following considerations, requirements, and restrictions.

Restrictions and Requirements for CDC Offload Processing

If you use CDC offload processing, certain restrictions and requirements apply.

Consider the following restrictions and requirements before implementing offload processing:

¨ You must configure CAPI_CONNECTION statements for the data source in the DBMOVER configuration file on the remote system. For real-time extraction mode, configure the CAPI_CONNECTION statements in the dbmover.cfg configuration file on the PowerCenter Integration Service machine. For the PowerExchange Logger for Linux, UNIX, and Windows, configure the CAPI_CONNECTION statements in the dbmover.cfg configuration file that the PowerExchange Logger uses.

¨ If you set the optional Idle Time attribute on the PWXPC connection, you must specify -1 or 0 as the attribute value. If you enter a value greater than 0, PWXPC uses 0.

¨ If you use batch extraction mode, Informatica recommends that you set the Idle Time connection attribute to 0 so that the workflow session ends when the end-of-log (EOL) is reached. With this configuration, you can leave the PowerExchange Logger running continuously.

¨ PowerExchange does not invoke MVS RACF security authorization for change data extraction. Specifically, PowerExchange does not validate any CAPX.CND profiles for extracting change data when a workflow runs. However, PowerExchange does validate CAPX.REG profiles during PowerExchange Logger processing.

¨ PowerExchange does not support CDC offload processing for capture registrations that have been created from data maps that use any of the following options:

- User access methods

- User-defined fields that invoke programs by using the CALLPROG function

- Record-level exits

¨ To capture change data to PowerExchange Logger for Linux, UNIX, and Windows log files, you must configure capture registrations for partial condense processing. When you define the capture registration in the PowerExchange Navigator, select Part in the Condense list. If any of the capture registrations for z/OS or i5/OS data sources specify Full for the Condense option, the PowerExchange Logger ignores them.

¨ For z/OS data sources, if you use offload processing with the PowerExchange Logger for Linux, UNIX, and Windows and a group definition file, do not include the SCHEMA statement in the group definition file. PowerExchange does not support SCHEMA statements for z/OS data sources.

¨ Each PowerExchange Logger for Linux, UNIX, and Windows process must read all of the capture registrations that it uses from a single CCT file on the remote system. Also, each PowerExchange Logger process must store the names of its log files in a unique CDCT file on the local system.

Considerations for Multithreaded Processing

In specific situations, multithreaded processing might improve performance for a CDC session. Before you configure multithreaded processing options, review the following considerations:

¨ Use multithreaded processing when the PWX reader thread of a CDC session uses 100% of a single CPU on a multi-CPU server on the PowerCenter Integration Service platform while processing change data. When a single CPU is consumed, spreading the PowerExchange processing across multiple threads improves throughput. Otherwise, additional threads do not improve throughput.

¨ If the network processing between the source and PowerCenter Integration Service machines is slow, try specifying 1 for the Worker Threads option to help improve throughput. When you specify one or more worker threads, PowerExchange overlaps network processing with the processing of the change data on the PowerCenter Integration Service machine.

¨ For optimal performance, the value for the Worker Threads option should not exceed the number of installed or available processors on the PowerCenter Integration Service machine.
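The last consideration can be checked before you enter a Worker Threads value. The following Python sketch is illustrative only and is not part of PowerExchange. The function name cap_worker_threads is hypothetical, and the sketch assumes that it runs on the PowerCenter Integration Service machine so that os.cpu_count() reports the processors of that machine.

```python
import os


def cap_worker_threads(requested, available_cpus=None):
    """Return a Worker Threads value that does not exceed the processor count."""
    if requested < 0:
        raise ValueError("Worker Threads cannot be negative")
    if available_cpus is None:
        # os.cpu_count() can return None when the count cannot be determined.
        available_cpus = os.cpu_count() or 1
    return min(requested, available_cpus)


# A request for 8 threads on a 4-processor machine is reduced to 4.
print(cap_worker_threads(8, available_cpus=4))
```

A value of 0 passes through unchanged, which matches the default for the Worker Threads option.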

Enabling Offload and Multithreaded Processing for CDC Sessions

To use CDC offload processing and multithreaded processing, you must configure connection options in the CDC session and CAPI_CONNECTION statements in the PowerExchange DBMOVER configuration file.

To enable CDC offload and multithreaded processing for CDC sessions:

1. Configure the following options on the PWX CDC Real Time application connection for the CDC session:

Connection Option       Description

Location                Specifies the node name of the system on which the change data resides. This node name must be the name of a NODE statement in the dbmover.cfg configuration file on the PowerCenter Integration Service machine.

Offload Processing      Specifies whether to use CDC offload processing to move PowerExchange processing for the change data from the source system to the PowerCenter Integration Service machine. Select one of the following values:
                        - No
                        - Yes
                        - Auto. PowerExchange determines whether to use offload processing.
                        Default is No.

Worker Threads          When you select CDC offload processing, specifies the number of threads that PowerExchange uses on the PowerCenter Integration Service machine to process change data. You must also enter a value for the Array Size. Default is 0.

Array Size              If the Worker Threads value is greater than zero, specifies the size of the storage array for each thread, in number of records. Default is 25.

CAPI Connection Name    Specifies the name of the source CAPI_CONNECTION statement in the dbmover.cfg on the PowerCenter Integration Service machine.

2. Copy the CAPI_CONNECTION statements from the DBMOVER configuration file on the source system to the dbmover.cfg configuration file on the PowerCenter Integration Service machine. For MVS sources, remove all MVS-specific parameters from the UOWC CAPI_CONNECTION statement.

Use the following table to select the correct CAPI_CONNECTION statement types to configure, based on source type:

CDC Source Type                     CAPI_CONNECTION Statements
DB2 for i5/OS                       AS4J and UOWC
DB2 for Linux, UNIX, and Windows    UDB
Microsoft SQL Server                MSQL
MVS sources                         LRAP and UOWC
Oracle                              ORCL and UOWC
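The copy step can be sketched as a small filter over the source system's configuration text. The following Python example is illustrative only and is not part of PowerExchange. The function name statements_to_copy is hypothetical, and the sketch assumes that each CAPI_CONNECTION statement is written on a single line. It does not remove MVS-specific parameters from UOWC statements.

```python
import re

# CAPI_CONNECTION statement types to copy, by source type (from the table above).
CAPI_TYPES_BY_SOURCE = {
    "DB2 for i5/OS": ("AS4J", "UOWC"),
    "DB2 for Linux, UNIX, and Windows": ("UDB",),
    "Microsoft SQL Server": ("MSQL",),
    "MVS sources": ("LRAP", "UOWC"),
    "Oracle": ("ORCL", "UOWC"),
}

# Captures the statement type, for example ORCL in TYPE=(ORCL,...).
_TYPE_PATTERN = re.compile(r"TYPE=\(\s*(\w+)", re.IGNORECASE)


def statements_to_copy(dbmover_text, source_type):
    """Return the CAPI_CONNECTION lines whose type applies to source_type."""
    wanted = CAPI_TYPES_BY_SOURCE[source_type]
    selected = []
    for line in dbmover_text.splitlines():
        stripped = line.strip()
        if not stripped.upper().startswith("CAPI_CONNECTION="):
            continue  # skips /* comment lines and all other statements
        match = _TYPE_PATTERN.search(stripped)
        if match and match.group(1).upper() in wanted:
            selected.append(stripped)
    return selected
```

For an Oracle source, the filter keeps the ORCL and UOWC statements and leaves out any other CAPI_CONNECTION statement type.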

Configuring PowerExchange to Capture Change Data on a Remote System

You can use CDC offload processing with the PowerExchange Logger for Linux, UNIX, and Windows to capture change data from source systems other than the system where the PowerExchange Logger runs. With CDC offload processing, a PowerExchange Logger for Linux, UNIX, and Windows can capture change data from i5/OS and MVS systems as well as from other Linux, UNIX, or Windows systems.

CDC sessions use continuous extraction mode to extract the change data from the PowerExchange Logger log files instead of from the source system.

You must first install PowerExchange on the remote Linux, UNIX, or Windows system.

Before you start a PowerExchange Logger for Linux, UNIX, and Windows process on a remote system, configure the pwxccl.cfg and the dbmover.cfg configuration files on that system. When you use CDC offload processing, each PowerExchange Logger must have unique pwxccl.cfg and dbmover.cfg configuration files.

To extract the change data from the PowerExchange Logger on the remote system, you must also configure and start a PowerExchange Listener on that system. The dbmover.cfg file that the PowerExchange Listener uses must specify the same CAPT_PATH value as the dbmover.cfg file that the PowerExchange Logger uses. Alternatively, you can use the same dbmover.cfg file for the PowerExchange Logger and the PowerExchange Listener.

The following steps describe how to configure a PowerExchange Logger and PowerExchange Listener to offload change data from source systems and capture that data to PowerExchange Logger log files on Linux, UNIX, or Windows.

RELATED TOPICS:
¨ “Extracting Change Data Captured on a Remote System” on page 168

Configuring pwxccl.cfg

Configure the pwxccl.cfg configuration file for the PowerExchange Logger on the remote system where the PowerExchange Logger will run.

PowerExchange provides a sample pwxccl.cfg file in the PowerExchange installation directory, which you can copy and then edit. For CDC offload processing, customize the following parameters:

CAPTURE_NODE

Specifies the node name of the system on which the change data was originally captured.

This node name must match the node name in a NODE statement in the dbmover.cfg configuration file that the PowerExchange Logger uses.

CAPTURE_NODE_EPWD

Specifies an encrypted password for the CAPTURE_NODE_UID user ID.

If you specify CAPTURE_NODE_UID, you must specify a password for that user ID by using either CAPTURE_NODE_EPWD or CAPTURE_NODE_PWD. If you specify CAPTURE_NODE_EPWD, do not also specify CAPTURE_NODE_PWD.

Tip: You can create an encrypted password in the PowerExchange Navigator by selecting File > Encrypt Password.

CAPTURE_NODE_PWD

Specifies a clear text password for the CAPTURE_NODE_UID user ID.

If you specify CAPTURE_NODE_UID, you must specify a password for that user ID by using either CAPTURE_NODE_EPWD or CAPTURE_NODE_PWD. If you specify CAPTURE_NODE_PWD, do not also specify CAPTURE_NODE_EPWD.

CAPTURE_NODE_UID

Specifies a user ID that permits PowerExchange to read capture registrations and change data on the remote node that is specified in the CAPTURE_NODE parameter. Whether this parameter is required depends on the operating system of the remote node and the SECURITY setting in the DBMOVER configuration file for the PowerExchange Listener on that node.

If the CAPTURE_NODE is an MVS or i5/OS system with a SECURITY setting of 1 or 2, you must specify a valid operating system user ID. If the SECURITY setting is 2, PowerExchange uses the specified user ID to control access to capture registrations and change data. However, if the SECURITY setting is 1, PowerExchange uses the user ID under which the PowerExchange Listener job runs.

If the CAPTURE_NODE is an MVS or i5/OS system with a SECURITY setting of 0, do not specify this parameter. PowerExchange uses the user ID under which the PowerExchange Listener job runs to control access to capture registrations and change data.

If the CAPTURE_NODE is a Linux, UNIX, or Windows system, specify a user ID that is valid for the data source type:

¨ For a DB2 for Linux, UNIX, or Windows source, enter a valid operating system user ID that has DB2 DBADM or SYSADM authority.

¨ For an Oracle source, enter a database user ID that permits access to Oracle redo logs and Oracle LogMiner.

¨ For a SQL Server instance that uses SQL Server Authentication, enter a database user ID that permits access to the SQL Server distribution database. For a SQL Server instance that uses Windows Authentication, PowerExchange uses the user ID under which the PowerExchange Listener was started. In this case, do not specify this parameter unless you want to specify another user.
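The password rules for CAPTURE_NODE_UID can be expressed as a short check. The following Python sketch is illustrative only and is not part of PowerExchange. The function name check_capture_node_credentials is hypothetical, and the sketch assumes that the pwxccl.cfg parameters have already been read into a dictionary that is keyed by parameter name.

```python
def check_capture_node_credentials(params):
    """Return a list of problems with the CAPTURE_NODE credential parameters."""
    problems = []
    has_uid = "CAPTURE_NODE_UID" in params
    has_epwd = "CAPTURE_NODE_EPWD" in params
    has_pwd = "CAPTURE_NODE_PWD" in params
    # The encrypted and clear text passwords are mutually exclusive.
    if has_epwd and has_pwd:
        problems.append("Specify CAPTURE_NODE_EPWD or CAPTURE_NODE_PWD, not both.")
    # A user ID must be accompanied by one of the two password parameters.
    if has_uid and not (has_epwd or has_pwd):
        problems.append("CAPTURE_NODE_UID requires CAPTURE_NODE_EPWD or CAPTURE_NODE_PWD.")
    return problems
```

An empty list means that the combination of parameters is consistent with the rules above. The sketch does not decide whether CAPTURE_NODE_UID is required, because that depends on the operating system and SECURITY setting of the remote node.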

CHKPT_BASENAME

Specifies an existing path and base file name to use for generating the PowerExchange Logger checkpoint files.

CONDENSENAME

Optional. Specifies a name for the command-handling service for a PowerExchange Logger for Linux, UNIX, and Windows process to which you issue pwxcmd commands.

This service name must match the service name in the associated SVCNODE statement in the DBMOVER configuration file.

CONN_OVR

Specifies the name of the CAPI_CONNECTION statement in the dbmover.cfg file that the PowerExchange Logger uses. This CAPI_CONNECTION statement defines the connection to the change stream for the data source type.

For data sources that include UOW Cleanser (UOWC) CAPI_CONNECTION statements, specify the name of this statement. For all other data sources, specify the CAPI_CONNECTION name for the data source type.

DB_TYPE

Specifies the data source type.

Use the following table to select the correct DB_TYPE to configure, based on source type:

CDC Source Type                     DB_TYPE Value
Adabas                              ADA
Datacom                             DCM
DB2 for i5/OS                       AS4
DB2 for Linux, UNIX, and Windows    UDB
DB2 for z/OS                        DB2
IDMS log-based                      IDL
IMS                                 IMS
Microsoft SQL Server                MSS
Oracle                              ORA
VSAM                                VSM

DBID

Specifies the source collection identifier that is defined in the registration group. The PowerExchange Navigator displays this value in the Resource Inspector when you open the registration group. When used with DB_TYPE, it defines selection criteria for capture registrations in the CCT file.

Use the following table to select the correct DBID value, based on source type:

CDC Source Type                     DBID Value

Adabas                              The Instance name that is displayed for the registration group in the PowerExchange Navigator.

Datacom                             One of the following values:
                                    - The MUF Name value that is displayed for the registration group in the PowerExchange Navigator.
                                    - For Datacom synchronous CDC, the MUF parameter value in the DTLINPUT data set specified in the MUF JCL.
                                    - For Datacom table-based CDC, the REG_MUF parameter value in the ECCRDCMP member of the RUNLIB library.

DB2 for i5/OS                       One of the following values:
                                    - The Instance name that is displayed for the registration group in the PowerExchange Navigator.
                                    - The INST parameter value in the AS4J CAPI_CONNECTION statement in the DBMOVER member of the CFG file.

DB2 for Linux, UNIX, and Windows    The Database name that is displayed for the registration group in the PowerExchange Navigator.

DB2 for z/OS                        One of the following values:
                                    - The Instance name that is displayed for the registration group in the PowerExchange Navigator.
                                    - The RN parameter value from the DB2 statement in the REPDB2OP member of the RUNLIB library.

IDMS log-based                      One of the following values:
                                    - The Logsid value that is displayed for the registration group in the PowerExchange Navigator.
                                    - The LOGSID parameter value in the ECCRIDLP member of the RUNLIB library.

IMS                                 One of the following values:
                                    - The IMSID value that is displayed for the registration group in the PowerExchange Navigator.
                                    - For IMS log-based CDC, the first parameter of the IMSID statement in the CAPTIMS member of the RUNLIB library.

Microsoft SQL Server                The Instance name that is displayed for the registration group in the PowerExchange Navigator.

Oracle                              The Instance name that is displayed for the registration group in the PowerExchange Navigator.

VSAM                                The Instance name that is displayed for the registration group in the PowerExchange Navigator.

EPWD

A deprecated parameter. Use CAPTURE_NODE_EPWD instead. If both CAPTURE_NODE_EPWD and EPWD are specified, CAPTURE_NODE_EPWD takes precedence.

EXT_CAPT_MASK

Specifies an existing path and unique prefix to be used for generating the PowerExchange Logger log files.

PWD

A deprecated parameter. Use CAPTURE_NODE_PWD instead. If both CAPTURE_NODE_PWD and PWD are specified, CAPTURE_NODE_PWD takes precedence.

RESTART_TOKEN and SEQUENCE_TOKEN

Optional. Specifies a restart point for starting change data processing when the PowerExchange Logger is cold started.

The format of the restart tokens varies based on data source type and, if specified, must match the format required by the DB_TYPE specified. If you do not specify these parameters, the PowerExchange Logger uses the end of the change stream as the restart point when cold started.

UID

A deprecated parameter. Use CAPTURE_NODE_UID instead. If both CAPTURE_NODE_UID and UID are specified, CAPTURE_NODE_UID takes precedence.
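The precedence rule for the deprecated EPWD, PWD, and UID parameters can be sketched as follows. This Python example is illustrative only and is not part of PowerExchange. The function name effective_credentials is hypothetical. The sketch assumes that the pwxccl.cfg parameters are available as a dictionary, and that a deprecated parameter is still honored when its replacement is absent.

```python
# Deprecated pwxccl.cfg parameters and their replacements (from the text above).
REPLACEMENTS = {
    "EPWD": "CAPTURE_NODE_EPWD",
    "PWD": "CAPTURE_NODE_PWD",
    "UID": "CAPTURE_NODE_UID",
}


def effective_credentials(params):
    """Resolve credential values, preferring CAPTURE_NODE_* over deprecated names."""
    resolved = {}
    warnings = []
    for old, new in REPLACEMENTS.items():
        if old in params:
            warnings.append(f"{old} is deprecated; use {new} instead.")
        if new in params:
            resolved[new] = params[new]  # the CAPTURE_NODE_* parameter takes precedence
        elif old in params:
            resolved[new] = params[old]  # assumed fallback to the deprecated parameter
    return resolved, warnings
```

The warnings list gives a simple way to find configuration files that still use the deprecated names.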

RELATED TOPICS:
¨ “PowerExchange Logger for Linux, UNIX, and Windows” on page 19

Configuring dbmover.cfg on the PowerExchange Logger Machine

On the remote system where the PowerExchange Logger will run, configure the dbmover.cfg file that the PowerExchange Logger and PowerExchange Listener will use.

Note: Unless the change data is captured on the PowerCenter Integration Service machine, you must run a PowerExchange Listener so CDC sessions can extract the offloaded change data.

The dbmover.cfg file that the PowerExchange Listener uses must specify the same CAPT_PATH value as the dbmover.cfg that the PowerExchange Logger uses. Alternatively, you can use the same dbmover.cfg configuration file for the PowerExchange Logger and PowerExchange Listener. This step assumes that you use the same dbmover.cfg file.

PowerExchange provides a sample dbmover.cfg file in the PowerExchange installation directory, which you can copy and then edit. For CDC offload processing, set the following parameters:

CAPT_PATH

Specifies the path to the directory where the CDCT file resides. The CDCT file contains information about the PowerExchange Logger log files, such as file names and number of records.

Each PowerExchange Logger that uses CDC offload processing to capture change data requires its own CDCT file.

CAPX CAPI_CONNECTION

Specifies parameters for continuous extraction of change data from PowerExchange Logger log files. In continuous extraction mode, extractions run in near real time and read the data in the PowerExchange Logger log files as the change stream.

In the DFLTINST parameter of the CAPX CAPI_CONNECTION, specify the DBID value from the PowerExchange Logger pwxccl.cfg configuration file.

LOGPATH

Specifies the path to the PowerExchange log files that contain PowerExchange Logger messages.

NODE

Specifies the TCP/IP connection information for a PowerExchange Listener.

Configure a NODE statement for the system on which the change data was originally captured. Specify the node name for this statement in the CAPTURE_NODE parameter of the PowerExchange Logger pwxccl.cfg configuration file.

Source-specific CAPI_CONNECTION

Specifies CAPI parameters that are specific to the data source type and that PowerExchange uses to connect to the change stream.

Copy the CAPI_CONNECTION statements from the DBMOVER configuration file on the source system where the change data resides. Use the following table to select the correct CAPI_CONNECTION statement types to configure, based on source type:

CDC Source Type                     CAPI_CONNECTION Statements
DB2 for i5/OS                       AS4J and UOWC
DB2 for Linux, UNIX, and Windows    UDB
Microsoft SQL Server                MSQL
MVS sources                         LRAP and UOWC
Oracle                              ORCL and UOWC

For MVS sources, remove MVS-specific parameters from the UOWC CAPI_CONNECTION statement.

SVCNODE

Optional. Specifies the TCP/IP port on which a command-handling service for a PowerExchange Listener or PowerExchange Logger for Linux, UNIX, and Windows process listens for pwxcmd commands.

TRACING

Optional. Enables alternative logging. By using alternative logging, you can separate PowerExchange Logger messages from other PowerExchange messages.
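Several of these parameters must agree with values in pwxccl.cfg: CAPTURE_NODE names a NODE statement, CONN_OVR names a CAPI_CONNECTION statement, and the DFLTINST value of the CAPX CAPI_CONNECTION equals DBID. The following Python sketch cross-checks the two files. It is illustrative only and is not part of PowerExchange. The function names are hypothetical, and the sketch assumes one statement per line, KEY=VALUE syntax in pwxccl.cfg, and comment lines that begin with /*. A parameter that is absent from pwxccl.cfg is not checked.

```python
import re


def parse_pwxccl(text):
    """Collect KEY=VALUE parameters from pwxccl.cfg text, skipping /* comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("/*") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip().upper()] = value.strip()
    return params


def check_logger_files(pwxccl_text, dbmover_text):
    """Return a list of mismatches between pwxccl.cfg and dbmover.cfg."""
    params = parse_pwxccl(pwxccl_text)
    flags = re.MULTILINE | re.IGNORECASE
    # Names of NODE statements, for example ORA2 in NODE=(ORA2,TCPIP,...).
    node_names = set(re.findall(r"^\s*NODE=\(\s*([^,\s]+)", dbmover_text, flags))
    # Names of all CAPI_CONNECTION statements.
    capi_names = set(
        re.findall(r"^\s*CAPI_CONNECTION=\(\s*NAME=([^,\s]+)", dbmover_text, flags)
    )
    # DFLTINST values of CAPX statements.
    capx_instances = set(
        re.findall(r"TYPE=\(\s*CAPX\b[^)]*?DFLTINST=([^,)\s]+)", dbmover_text, flags)
    )
    problems = []
    if "CAPTURE_NODE" in params and params["CAPTURE_NODE"] not in node_names:
        problems.append("CAPTURE_NODE does not match a NODE statement.")
    if "CONN_OVR" in params and params["CONN_OVR"] not in capi_names:
        problems.append("CONN_OVR does not name a CAPI_CONNECTION statement.")
    if "DBID" in params and params["DBID"] not in capx_instances:
        problems.append("No CAPX CAPI_CONNECTION specifies DFLTINST equal to DBID.")
    return problems
```

An empty result means only that the names line up. It does not confirm that the files are otherwise valid.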

Configuring dbmover.cfg on the PowerCenter Integration Service Machine

In the dbmover.cfg configuration file on the PowerCenter Integration Service machine, add a NODE statement for each PowerExchange Listener that runs on the following systems:

¨ The system where the change data was originally captured and where the capture registrations reside

¨ The system where the change data is stored in PowerExchange Logger for Linux, UNIX, and Windows log files

Configuring Capture Registrations for the PowerExchange Logger

For the PowerExchange Logger on Linux, UNIX, and Windows to capture change data from a remote system, capture registrations for the remote source must specify Part for the Condense option.

If capture registrations do not specify Part for the Condense option, delete the capture registrations and corresponding extraction maps. Then create the capture registrations again. PowerExchange generates corresponding extraction maps. You can edit the PowerExchange-generated extraction maps or create additional ones.

Tip: Do not add DTL_BI or DTL_CI columns to the extraction maps if you set the CAPT_IMAGE parameter to AI in the pwxccl.cfg configuration file. With the AI setting, the PowerExchange Logger captures after images only. Consequently, PowerExchange cannot populate BI columns with before images. Also, with this setting, PowerExchange writes Nulls to CI columns for any INSERT or DELETE operations.
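The tip can be applied as a quick review of an extraction map column list. The following Python sketch is illustrative only and is not part of PowerExchange. The function name columns_to_avoid is hypothetical, and the DTL__BI_ and DTL__CI_ name prefixes are an assumption about how the before-image and change-indicator columns are named in the extraction map.

```python
# Assumed name prefixes for before-image (BI) and change-indicator (CI) columns.
BI_PREFIX = "DTL__BI_"
CI_PREFIX = "DTL__CI_"


def columns_to_avoid(capt_image, extraction_map_columns):
    """List the BI and CI columns that the tip advises against when CAPT_IMAGE=AI."""
    if capt_image.upper() != "AI":
        return []  # other CAPT_IMAGE settings are outside the scope of the tip
    return [
        name
        for name in extraction_map_columns
        if name.upper().startswith((BI_PREFIX, CI_PREFIX))
    ]
```

With any CAPT_IMAGE value other than AI, the function returns an empty list.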

Starting the PowerExchange Logger and PowerExchange Listener

Start the PowerExchange Logger and PowerExchange Listener on the remote system that will capture the change data.

Note: If the remote system also runs the PowerCenter Integration Service, you can use local mode to extract the data instead of a PowerExchange Listener.

Extracting Change Data Captured on a Remote System

After you have captured change data on a remote system in the PowerExchange Logger for Linux, UNIX, and Windows log files, you can use continuous extraction mode to extract the change data in a CDC session. In the CDC session, select the appropriate PWX CDC Real Time connection for the source type. For example, if you captured change data for a DB2 for Linux, UNIX, and Windows source to PowerExchange Logger log files on a remote system, use a PWX DB2LUW CDC Real Time connection to extract the data.

Customize the following connection options to extract offloaded change data:

¨ Location. Specify the node name for the PowerExchange Listener that runs on the remote system where the change data was stored in PowerExchange Logger log files.

¨ Map Location. Specify the node name for the PowerExchange Listener that runs on the source system where the change data was originally captured. The PowerExchange Listener on the original source system stores the capture registrations.

¨ Map Location User and Map Location Password. Specify a user ID and password that can access the capture registrations for the change data.

If the PowerExchange Listener on the source system is running on MVS or i5/OS and is configured with security, specify a valid operating system user ID. You do not need to specify this parameter if the PowerExchange Listener is running without security.

If the PowerExchange Listener on the data source system is running on Linux, UNIX, or Windows, specify a valid database user ID.

¨ CAPI Connection Name Override. Specify the name of the CAPX CAPI_CONNECTION in the dbmover.cfg configuration file used by the PowerExchange Listener on the remote system where the change data is stored in PowerExchange Logger log files.

For more information about configuring PWX CDC Real Time application connections, see PowerExchange Interfaces for PowerCenter.

Configuration File Examples for CDC Offload Processing

The following examples show the configuration required for CDC offload processing.

Extracting Change Data from Oracle Using CDC Offload Processing - Example

In this example, a CDC session that uses real-time connections to extract change data from an Oracle source is changed to use CDC offload processing. The source change data remains on the Oracle system, but all column-level and UOW Cleanser processing is moved to the PowerCenter Integration Service machine.

The Oracle system has the following CAPI_CONNECTION statements in the dbmover.cfg configuration file that the PowerExchange Listener uses to read change data:

/* UOW Cleanser
CAPI_CONNECTION=(NAME=UOWCORA,TYPE=(UOWC,CAPINAME=CAPIORA,RSTRADV=600))
/* Oracle CDC
CAPI_CONNECTION=(NAME=CAPIORA,TYPE=(ORCL,catint=120,ORACOLL=PRODORA))

To extract change data from Oracle using CDC offload processing:

1. Configure the dbmover.cfg configuration file on the PowerCenter Integration Service machine for CDC offload processing.

Copy the UOWC and ORCL CAPI_CONNECTION statements from the dbmover.cfg file on the Oracle system to the dbmover.cfg configuration file on the PowerCenter Integration Service machine. In this example, the following CAPI_CONNECTION statements are copied into the dbmover.cfg:

CAPI_CONNECTION=(NAME=UOWCORA,TYPE=(UOWC,CAPINAME=CAPIORA,RSTRADV=600))
CAPI_CONNECTION=(NAME=CAPIORA,TYPE=(ORCL,catint=120,ORACOLL=PRODORA))

2. Stop the CDC session.

3. Update the following options on the PWX CDC Real Time application connection in the CDC session:

¨ Select Yes for the Offload Processing option.

¨ In the CAPI Connection Name option, specify the name of the UOWC CAPI_CONNECTION statement. In this example, the name is UOWCORA.

4. Restart the CDC session.

Capturing and Extracting Change Data from a Remote UNIX System - Example

In this example, change data for Oracle sources is captured by the PowerExchange Logger for Linux, UNIX, and Windows on a different UNIX system from where the Oracle instance runs. The Oracle sources are registered for capture on the UNIX system where the Oracle instance runs. A CDC session then extracts the change data for the Oracle sources from PowerExchange Logger log files on the remote UNIX system, rather than from the system where the change data was originally captured.

The original UNIX system has the following CAPI_CONNECTION statements in the dbmover.cfg file that the PowerExchange Listener uses to read change data:

/* UOW Cleanser
CAPI_CONNECTION=(NAME=UOWCORA,TYPE=(UOWC,CAPINAME=CAPIORA,RSTRADV=600))
/* Oracle CDC
CAPI_CONNECTION=(NAME=CAPIORA,TYPE=(ORCL,catint=120,ORACOLL=PRODORA))

The instance name used to register the Oracle tables for capture on the original UNIX system is PRODORA.

The following procedure assumes that PowerExchange is installed and configured on the remote UNIX system where the PowerExchange Logger for Linux, UNIX, and Windows will run.

To capture and extract change data from a remote UNIX system:

1. Configure the PowerExchange Logger for Linux, UNIX, and Windows on the remote UNIX system by completing the following steps:

¨ Configure pwxccl.cfg.

¨ Configure dbmover.cfg on the PowerExchange Logger machine.

In this example, the dbmover.cfg on the remote UNIX system has the following parameters:

/*
/* dbmover.cfg
/*
LISTENER=(unix1,TCPIP,2480)
NODE=(ORA2,TCPIP,prodora2,2480)
...
logpath=/pwx/logs/oracond
CAPT_XTRA=/pwx/capture/oracond/camaps
CAPT_PATH=/pwx/capture/oracond
ORACLEID=(PRODORA,ORAINST2,ORAINST2,ORAINST2)
/*
/* Source-specific CAPI Connection
CAPI_CONNECTION=(NAME=UOWCORA,TYPE=(UOWC,CAPINAME=CAPIORA,RSTRADV=600))
CAPI_CONNECTION=(NAME=CAPIORA,TYPE=(ORCL,catint=120,ORACOLL=PRODORA))
/*
/* CAPX CAPI Connection for continuous extraction
CAPI_CONNECTION=(NAME=CAPXORA,TYPE=(CAPX,DFLTINST=PRODORA,FILEWAIT=60,RSTRADV=600))

In this example, the pwxccl.cfg file has the following parameters:

/*
/* pwxccl.cfg
/*
DBID=PRODORA
DB_TYPE=ORA
CONN_OVR=UOWCORA
CAPTURE_NODE=ORA2
CAPTURE_NODE_UID=orauser
CAPTURE_NODE_PWD=orapwd
EXT_CAPT_MASK=/pwx/capture/oracond/condense
CHKPT_NUM=3
CHKPT_BASENAME=/pwx/capture/oracond/condense.chkpt
COND_CDCT_RET_P=50
COLL_END_LOG=0
NO_DATA_WAIT=1
NO_DATA_WAIT2=2
FILE_SWITCH_VAL=200000
FILE_SWITCH_CRIT=R
CAPT_IMAGE=BA
SIGNALLING=N
UID=orauser
PWD=orapwd
VERBOSE=Y

2. After you configure the dbmover.cfg and the pwxccl.cfg configuration files, start the PowerExchange Listener and PowerExchange Logger on the remote UNIX system.

3. On the PowerCenter Integration Service machine, customize the following statements:

¨ NODE statement to point to the PowerExchange Listener on the remote UNIX system, which is where the PowerExchange Logger runs.

¨ NODE statement to point to the PowerExchange Listener on the original UNIX system, which is where the Oracle instance runs and the tables are registered for capture.

In this example, the following statements are added to the dbmover.cfg on the PowerCenter Integration Service machine:

NODE=(unix1,TCPIP,unix1,2480)
NODE=(ORA2,TCPIP,prodora2,2480)

4. Create and configure the PowerCenter mapping, session, and workflow to extract the change data.

5. To extract the change data from the remote UNIX system, configure a PWX Oracle CDC Real Time application connection in the CDC session.

In this example, specify the following options to point to the remote UNIX system for the change data, the original UNIX system for the extraction maps, and the CAPX CAPI_CONNECTION name to use continuous extraction mode:

¨ For the Location option, specify unix1.

¨ For the Map Location option, specify ORA2.

¨ For the Map Location User option, specify a valid Oracle user ID.

¨ For the Map Location Password option, specify the password for the Oracle user ID.

¨ For the CAPI Connection Name option, specify CAPXORA.

Cold start the CDC session to extract the change data from the PowerExchange Logger log files on the remote UNIX system.

I N D E X

Aalternative logging 25, 43application name

configuring for CDC sessions 132application names 113architectural diagrams

batch or continuous extraction processing 8real-time extraction processing 8

architecture, PowerExchange CDC 8archive log destination 84ARCHIVELOG mode

enabling for Oracle LogMiner CDC 84

Bbatch extraction mode 106

Ccache files 23CAPI connection statements

CAPI_CONNECTION statement 12CAPI_SRC_DFLT statement 12CAPX parameters 14introduction 14MEMCACHE parameter 155MSQL CAPI_CONNECTION statement 76ORCL CAPI_CONNECTION statement 91, 92RSTRADV parameter 155UDB CAPI_CONNECTION statement 62UOWC parameters 99

capture catalog tablecreating 60DTLUCUDB SNAPSHOT command 61initializing the table 61

capture registrationsgrouping in PowerExchange Logger group definition file 44settings for the PowerExchange Logger 27

CAPX CAPI_CONNECTION parametersparameters and syntax 14

catalog, Oraclecopying for Oracle LogMiner CDC 87parameters in ORCL CAPI_CONNECTION 97

CDC data mapextraction map 137

CDC sessionsbuffer memory 159commit processing 118default restart points 114methods of starting 113, 140monitoring in PowerCenter 151monitoring in PowerExchange 148offload processing 123, 159

recovery example 146restart points for warm starts 115restart token file 137stopping 142tuning 154

CDCT file 21, 53, 54change data capture (CDC)

architecture 8data source types 4DB2 for Linux, UNIX, and Windows CDC 56Oracle LogMiner CDC 80overview 2PowerExchange components 6SQL Server CDC 70task summary 10

change data extractioncreating restart tokens for extractions 135extracting data captured from a remote system 168extraction modes 106monitoring in PowerCenter 151monitoring in PowerExchange 148offload processing 159overview 3overview of extracting change data 125task flow 126testing extraction maps 126tuning CDC sessions 154

checkpoint files 22, 53close (pwxcmd) 16closeforce (pwxcmd) 16commit processing

configuring for CDC sessions 133controlling with connection attributes 119examples 121in CDC sessions 118minimum and maximum rows per commit 120target latency 121tuning 159

compatible parameter 84components, PowerExchange

for CDC 6PowerExchange Listener 6, 12PowerExchange Logger 6PowerExchange Navigator 7

configuration tasksDB2 for Linux, UNIX, and Windows CDC 58, 59Oracle LogMiner CDC 83PowerExchange Listener 12PowerExchange Logger 27SQL Server CDC 73

continuous extraction mode 106Controller task, PowerExchange Logger 20

171

Page 184: Implement CDC

Ddata maps

use in DB2 for Linux, UNIX, and Windows CDC 65data sources, types 4database row tests 126datatypes

SQL Server 71DB2 for Linux, UNIX, and Windows CDC

changing a source table definition 66configuring in DB2 58configuring in PowerExchange with the Logger 60configuring in PowerExchange without the Logger 59creating the capture catalog table 60dbmover.cfg parameters 61example dbmover.cfg statements 62IBM APARs 69initializing the capture catalog table 61overview 56planning 57prerequisites 57restrictions 58stopping 66troubleshooting 69user authority requirement 57using a data map 65

DB2 partitioned databasesreconfiguring 67

DB2 SQL1224 error 69DB2CODEPAGE environment variable 58DB2NOEXITLIST environment variable 58dbmover.cfg

APPBUFSIZE 155CAPI_CONNECTION statements 12CAPI_SRC_DFLT statement 12CAPT_PATH parameter 43CAPT_PATH statement 12CAPT_XTRA statement 12COMPRESS parameter 155DB2 for Linux, UNIX, and Windows CDC parameters 61DB2 for Linux, UNIX, and Windows example statements 62general CDC parameters 12LOGPATH parameter 43Oracle LogMiner CDC example statements 90Oracle LogMiner CDC parameters 90PowerExchange Logger parameters 43SQL Server CDC example statements 76SQL Server CDC parameters 75SVCNODE parameter 43TRACE parameter 155TRACING parameter 43types of CAPI connection statements for CDC 14

detail.log 25diagrams

batch or continuous extraction processing 8real-time extraction processing 8

DISPLAY ACTIVE command 150DTL__CAPXRESTART1

sequence token 135DTL__CAPXRESTART2

restart token 135DTLUAPPL

displaying restart tokens 135DTLUCUDB SNAPSHOT command 61DTLUTSK utility 142

Eextraction map columns, PowerExchange-generated

DTL__BI_columnname 106DTL__CAPXACTION 106DTL__CAPXCASDELIND 106DTL__CAPXRESTART1 106DTL__CAPXRESTART2 106DTL__CAPXRRN 106DTL__CAPXTIMESTAMP 106DTL__CAPXUOW 106DTL__CAPXUSER 106DTL__CI_columnname 106

extraction mapsPowerExchange-generated columns 106

extraction modes 106extraction of change data

creating restart tokens for extractions 135extracting data captured from a remote system 168extraction modes 106monitoring in PowerCenter 151monitoring in PowerExchange 148offload processing 159overview of extracting change data 125task flow 126testing extraction maps 126tuning CDC sessions 154

Ffile switches

description 25FILESWITCH command 49

Ggroup definition file

configuring for PowerExchange Logger 44example file 46GROUP statement 45REG statement 45SCHEMA statement 45statements and parameters 45

group sourcedescription 116processing CDC data for multiple source definitions 117

Iidle time

configuring for a CDC session 131description 131

integration with PowerCenter 7

Llisttask (pwxcmd) 17, 150lock files 23log files of PowerExchange Logger

file switches 25log files, PowerExchange Logger

maintaining 53naming 22

LogMiner, Oracle

172 Index

Page 185: Implement CDC

configuring for Oracle CDC 86

Mmaximum row count

configuring for a CDC session 133message log files 25Microsoft SQL Server CDC

changing a source table definition 79configuration tasks 73configuring in PowerExchange with the Logger 75configuring in PowerExchange without the Logger 74datatypes supported 71dbmover.cfg parameters 75example dbmover.cfg statements 76overview 70planning 71prerequisites 71restrictions 73stopping 78user authority requirements 71

minimal global supplemental logging 86minimum row count

configuring for a CDC session 133monitoring CDC sessions

PowerCenter output to monitor 151PowerCenter session log messages 151PowerExchange extraction statistics messages 149PowerExchange multithreaded processing statistics 149PowerExchange output to monitor 148PowerExchange read progress messages 148viewing performance details in PowerCenter 151

MSQL CAPI_CONNECTION statementparameters and syntax 76

multithreaded processingenabling for CDC sessions 161overview 123, 159planning considerations 160restrictions and requirements 160statistics messages 149

Ooffload processing

configuration examples 168enabling for CDC sessions 161Logger capture of changes from a remote source 162overview 123, 159planning considerations 160restrictions and requirements 160

oracapt_rac.sql 83oracapt.sql 83Oracle CDC

configuring Oracle LogMiner 86Oracle LogMiner

configuring for Oracle CDC 86Oracle LogMiner CDC

archive log destination 84changing a source table definition 102compatible parameter 84configuration in a RAC environment 87configuration script files 83configuring in Oracle 83configuring PowerExchange with the Logger 89configuring PowerExchange without the Logger 88copying the Oracle catalog 87dbmover.cfg parameters 90

enabling ARCHIVELOG mode 84example dbmover.cfg statements 90overview 80performance considerations 83planning 81restrictions and requirements 81SQL*Loader restrictions 82stopping 102supplemental logging requirement 86supported datatypes 81transaction_auditing parameter 84user privileges required 85

ORCL CAPI_CONNECTION statement
    CATBEGIN parameter 97
    CATEND parameter 97
    CATINT parameter 97
    Oracle catalog parameters 97
    parameters and syntax 91, 92

output files, PowerExchange Logger
    cache files 23
    CDCT file 21
    checkpoint files 22

P

partitioned DB2 database
    reconfiguring 67

performance
    CDC session performance details 151
    offload processing and multithreaded processing 159
    Oracle LogMiner CDC considerations 83

PowerExchange Client for PowerCenter (PWXPC) 7

PowerExchange components
    for CDC 6
    PowerExchange Listener 6, 12
    PowerExchange Logger 6
    PowerExchange Navigator 7

PowerExchange Listener
    CLOSE command 16
    DISPLAY ACTIVE command 17, 150
    displaying active listener tasks 17
    overview 12
    starting 16
    stopping 16
    STOPTASK command 16

PowerExchange Logger for Linux, UNIX, and Windows
    assessing performance 52
    backing up CDCT, checkpoint, and log files 54
    batch mode 26
    cache files 23
    CDCT file 21
    change capture from a remote source 162
    checkpoint files 22
    cold starting 49
    CONDENSE command 49
    configuring 27
    continuous mode 26
    controlling 49
    dbmover.cfg parameters 43
    DISPLAY ALL command 49
    DISPLAY CHECKPOINTS command 49
    DISPLAY CPU command 49
    DISPLAY EVENTS command 49
    DISPLAY MEMORY command 49
    DISPLAY RECORDS command 49
    DISPLAY STATUS command 49
    extracting remotely captured changes from Logger log files 168

    FILESWITCH command 49
    group definition file 44
    lock files 23
    log file switches 25
    log files 22
    maintaining CDCT file and log files 53
    memory requirement on Linux and UNIX 27
    message log files 25
    offload processing 162
    operational modes 25
    output files 21
    overview 19
    pwxccl.cfg parameters 28
    regenerating the CDCT file after a failure 54
    required capture registration settings 27
    running in background mode on Linux or UNIX 27
    SHUTCOND command 49
    SHUTDOWN command 49
    start point in change stream 48
    starting 47
    stopping 49
    subtasks 20

PowerExchange-generated extraction map columns
    DTL__BI_columnname 106
    DTL__CAPXACTION 106
    DTL__CAPXCASDELIND 106
    DTL__CAPXRESTART1 106
    DTL__CAPXRESTART2 106
    DTL__CAPXTIMESTAMP 106
    DTL__CAPXUOW 106
    DTL__CAPXUSER 106
    DTL__CI_columnname 106
    DTL__columnname_CNT 106
    DTL__columnname_IND 106

pwxccl statement
    parameters 48
    syntax 47

pwxccl.cfg
    CAPT_IMAGE parameter 29
    CAPTURE_NODE parameter 29
    CAPTURE_NODE_EPWD parameter 29
    CAPTURE_NODE_PWD parameter 29
    CAPTURE_NODE_UID parameter 29
    CHKPT_BASENAME parameter 29
    CHKPT_NUM parameter 29
    COLL_END_LOG parameter 29
    COND_CDCT_RET_P parameter 29
    CONDENSE_SHUTDOWN_TIMEOUT parameter 29
    CONDENSENAME parameter 29
    configuring 28
    CONN_OVR parameter 29
    DB_TYPE parameter 29
    DBID parameter 29
    example file 42
    EXT_CAPT_MASK parameter 29
    FILE_FLUSH_VAL parameter 29
    FILE_SWITCH_CRIT parameter 29
    FILE_SWITCH_MIN parameter 29
    FILE_SWITCH_VAL parameter 29
    GROUPDEFS parameter 29
    LOGGER_DELETES_EXPIRED_CDCT_RECORDS parameter 29
    MAX_RETENTION_EXPIRY_DAYS parameter 29
    NO_DATA_WAIT parameter 29
    NO_DATA_WAIT2 parameter 29
    parameters 28
    PROMPT parameter 29
    RESTART_TOKEN parameter 29
    SEQUENCE_TOKEN parameter 29
    SIGNALLING parameter 29
    UID parameter 29
    VERBOSE parameter 29

pwxcmd
    close 16
    closeforce 16
    listtask 17
    listtask command 150

PWXPC 7

R

real application clusters (RACs)
    configuring for Oracle LogMiner CDC 87

real-time extraction mode 106

real-time flush latency
    configuring for a CDC session 133

reconfiguring DB2 partitioned database 67

recovery
    example 146
    PM_REC_STATE table 111, 112
    PM_RECOVERY table 111
    PM_TGT_RUN_ID table 111
    recovery information for nonrelational targets 112
    recovery state file for nonrelational targets 113
    recovery tables for relational targets 111

restart
    $PMRootDir/Restart 132, 136
    application name 132
    default restart points 114
    earliest restart points 114
    methods of starting CDC sessions 113, 140
    null restart tokens 114
    restart token file 110, 132
    restart token file folder 132
    RESTART1 138
    RESTART2 138

restart points
    defaults 114
    earliest 114

restart token
    DTL__CAPXRESTART2 135

restart token file
    example 139
    explicit override 137
    overview 109
    special override 138
    syntax 137

restart tokens
    creating for extractions 135
    displaying with DTLUAPPL 135
    DTL__CAPXRESTART1 135
    DTL__CAPXRESTART2 135
    null 114
    overview 109
    recovery state file 113
    recovery state table 112

row tests 126

S

sequence token
    DTL__CAPXRESTART1 135

SHOW_THREAD_PERF parameter 149

source RDBMSs 4

source table definitions
    changing a DB2 table definition 66

    changing a SQL Server table definition 79
    changing an Oracle table definition 102

SQL Server CDC
    changing a source table definition 79
    configuration tasks 73
    configuring in PowerExchange with the Logger 75
    configuring in PowerExchange without the Logger 74
    datatypes supported 71
    dbmover.cfg parameters 75
    example dbmover.cfg statements 76
    overview 70
    planning 71
    prerequisites 71
    restrictions 73
    stopping 78
    user authority requirements 71

SQL*Loader
    restrictions for Oracle CDC 82

STOPTASK command
    CDC sessions, stopping 142

supplemental logging, Oracle 86

T

task flow
    CDC implementation 10
    extracting change data 126

terminating conditions
    idle time for CDC sessions 131

testing a change data extraction 126

transaction_auditing parameter 84

troubleshooting
    DB2 for Linux, UNIX, and Windows CDC 69

tuning CDC sessions
    APPBUFSIZE parameter 155
    buffer memory 159
    CAPI_CONNECTION MEMCACHE parameter 155
    CAPI_CONNECTION RSTRADV parameter 155
    commit processing tuning 159
    COMPRESS parameter 155
    DBMOVER tuning parameters 155
    methods 154
    PWX CDC connection options 157
    TRACE parameter 155

U

UDB CAPI_CONNECTION statement
    parameters and syntax 62

UOW count
    configuring for a CDC session 133

UOWC CAPI_CONNECTION parameters
    parameters and syntax 99

user authority
    DB2 for Linux, UNIX, and Windows CDC requirement 57
    Oracle LogMiner CDC requirements 85
    SQL Server CDC requirement 71

W

warm starts
    CDC session restart points 115
