Advanced PowerCenter V 7.1.1
by www.infocrest.com
Table of Contents

Introduction and Objectives
Mapping Techniques
  Local Variables
  Expression Transformation
  Aggregator
  Aggregate Functions
  De-Normalizing
  Lookup Transformation
  Dynamic Lookups
  Transaction Control
  Transaction Based Processing
  Joiner
  Stored Procedure
Debugger Tips
Mapping Error Handling
  Duplicate Keys
  Audits
Workflow Manager
  Design
  Sessions & Tasks
  Worklets
  Workflows
  Triggers and Schedulers
  Testing
Session and Server Variables
  Parameter Files
  Incremental Loads
Partitioning
  Architecture Review
  Guidelines and Definitions
  Partition Methods
  Cache Partitioning
  Limitations
  Demo
Performance & Tuning
  Tuning 101
  Target Bottlenecks
  Source Bottlenecks
  Mapping Bottlenecks
  Session Bottlenecks
  System Bottlenecks
  Using Performance Counter
Command Line Utilities
  pmcmd
  pmrep
  pmrepagent
Repository MX Views
  Viewing Dependencies
  Performance Historical Data
  Documentation
Introduction
About Informatica
  Founded in 1993
  Leader in Enterprise Data Integration Platform
  More than 1500 customers
  Global presence
  High growth
  Public since April 1999 (symbol: INFA)
About InfoCrest
  Founded in 1997 in Los Angeles, CA
  Principal: Jean Abi-Jaoudé
  Informatica Partner since 1997
  Fortune 100 clients (GE, Qualcomm, Disney ...)
  Training & Knowledge Transfer: creator of the Informatica Boot Camp, INFA project reviews, PM/PC implementation methodologies
Class Objectives
  Advanced Mapping Techniques
  Mapping Error Handling
  Workflow Techniques
  Parameter Files
  Partitioning
  Performance and Tuning
  Command Line Utilities
  MX Views
Mapping Techniques
Transformation Objects

Expression Transformation
  Using local variables, forward and self references
  Using an expression with a sequence generator to achieve very large primary key sequences
Aggregator Transformation
  Sorted input
  De-normalizer techniques
  Aggregate functions
Lookup Transformation
  Lookup caches
  SQL overrides
  Dynamic lookups
Transaction Control
  Dynamic commits
  Transaction based processing
Joiner Transformation
  Using sorted input
  Self-joins
Stored Procedures
  Pre and post load procedures
  Calling unconnected stored procedures
Local Variables

Transformations
  Local variables are available in the Expression, Aggregator and Rank transformations
Un-initialized Values
  Strings: empty (not a NULL value!); Numbers: 0; Dates: 01/01/1753
Self Reference
  A local variable can refer to itself, for example to keep the current value of v_EmployeeID if v_IsDupe is true
Forward Reference
  A local variable can refer to a variable that is further down the port list
  Using this method, you can compare data from 2 or more consecutive rows
  For example, v_CustomerIDLagged takes the value of v_CustomerID before v_CustomerID is evaluated
A sketch of both techniques follows.
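A minimal sketch of a duplicate-detection port list, assuming hypothetical ports EmployeeID (input) and o_IsDupe (output); ports evaluate top to bottom, so v_IsDupe sees the previous row's v_EmployeeID (forward reference), while v_EmployeeID keeps its own prior value when the row is a duplicate (self reference):

  -- Expression transformation ports, evaluated top to bottom
  v_IsDupe     = IIF(EmployeeID = v_EmployeeID, TRUE, FALSE)
                 -- forward reference: v_EmployeeID still holds the previous row's value
  v_EmployeeID = IIF(v_IsDupe, v_EmployeeID, EmployeeID)
                 -- self reference: keep the current value of v_EmployeeID if v_IsDupe is true
  o_IsDupe     = v_IsDupe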
Expression Transformation
Lagger Technique
  Use when you need to make a decision or set a value based on the contents of 2 or more consecutive rows
  An Expression transformation holds one or more field values in memory for one or more consecutive rows
  Makes use of the top-to-bottom order of evaluation for variable ports
  One input port, two variable ports and one output port are needed for each lagged field
  Add one variable port for each additional row you wish to hold in memory
Usage Example
  One row holds a start time, the next row holds an end time; hold the first row in memory to compute the elapsed time
Use a variable forward reference to assign v_ShipCountry to v_ShipCountry_Lagged before v_ShipCountry's value is changed, as sketched below.
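A sketch of the lagged port pair, assuming an input port ShipCountry; the names follow the slide's callout:

  -- Expression transformation ports, evaluated top to bottom
  v_ShipCountry_Lagged = v_ShipCountry        -- forward reference: previous row's value
  v_ShipCountry        = ShipCountry          -- now overwritten with the current row's value
  o_PrevShipCountry    = v_ShipCountry_Lagged -- output: the lagged field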
Expression Transformation
Lagger Mapping Needs
  A dummy 'pusher' row to push the last row of data out of the lagger expression
    Flat files: add an extra row using a pre-session command task
    Relational: add a UNION ALL statement in a SQL override (see the sketch below)
  A Filter or Router group to remove the first row coming out of the lagger expression
    This row holds empty (un-initialized) values from the expression
Example flow: a Source Qualifier with a UNION ALL SQL override, a lagger expression that detects duplicates and sets a flag with 3 values (dupe, not dupe, first row), and a Router that directs the flow to the dupe and non-dupe targets, rejecting the first row based on the flag value.
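A sketch of such an override, with illustrative table and column names (the DUAL single-row table assumes Oracle; use your database's equivalent):

  SELECT OrderID, ShipCountry
  FROM   Orders
  UNION ALL
  SELECT NULL, NULL FROM DUAL  -- dummy 'pusher' row; sort so it arrives last if row order matters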
Expression Transformation
Generating Very Large Surrogate Key Values
  A Sequence Generator is limited to a little over 2 billion values
  Use an expression to increment a multiplier when the sequence reaches a given threshold (see the sketch below)
  Use a mapping variable to store the multiplier value in the repository after each run
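The original screenshot is missing, so the following is only an approximation of the recipe; it assumes a mapping variable $$MULTIPLIER, a Sequence Generator port NEXTVAL configured to cycle at 2,000,000,000, and the built-in SETVARIABLE function:

  -- Expression transformation ports (approximate sketch)
  v_Multiplier   = IIF(NEXTVAL = 1 AND v_RowCount > 0,
                       SETVARIABLE($$MULTIPLIER, $$MULTIPLIER + 1),  -- sequence wrapped: bump the multiplier
                       $$MULTIPLIER)
  v_RowCount     = v_RowCount + 1   -- un-initialized numbers start at 0, so the first row is not a wrap
  o_SurrogateKey = (v_Multiplier * 2000000000) + NEXTVAL  -- key range extends far beyond the 2-billion limit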
Aggregator: Sorted Input

Use it whenever you can
  Improves pipeline performance
  Only the rows needed for one group are kept in memory
  All aggregations are done using RAM, not disk cache
Requirements
  You need to pre-sort the data using either a SQL override in your Source Qualifier, a Sorter transformation or pre-sorted flat files
  You need to sort the data in the order of the 'group by' ports in your Aggregator (see the sketch below)
  The Informatica Server will fail the session if the data is not sorted in strict ascending or descending order
The Informatica Server will revert to unsorted input behavior when:
  There is no 'group by' port in your Aggregator
  You select 'Incremental Aggregation' in the session properties
  You have nested aggregate functions in your Aggregator's expressions
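An illustrative Source Qualifier override for an Aggregator that groups by CustomerID then OrderDate (names are hypothetical):

  SELECT CustomerID, OrderDate, OrderAmount
  FROM   Orders
  ORDER BY CustomerID, OrderDate  -- must match the order of the 'group by' ports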
Aggregator: Why Sorted Input Improves Performance

The Informatica Server processes data in buffered stages
Read, Transform and Write stages normally operate simultaneously, feeding each other's buffers
An unsorted Aggregator is a bottleneck, as the Write stage must wait until all aggregation is done before processing any row
[Diagram: Read/Transform/Write stage timelines, comparing elapsed time for unsorted vs. sorted input]
Aggregate Functions

Nesting functions
  You can nest aggregate functions, as in MAX(SUM(order_amount))
  The Informatica Server ignores 'sorted input' when you nest aggregate functions
  You cannot mix nested and non-nested functions in the same Aggregator
Conditional clause
  Use instead of an IIF statement for aggregate expressions (see the examples below)
  Syntax: FUNCT(input [, Boolean expression])
  Example: AVG(price, price > 2000) computes the average price value when price is above 2000
  The input is taken into account for the aggregation if and only if the Boolean expression returns TRUE
Availability
  With output ports only; no variable port can have an aggregate expression
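A few illustrative expressions contrasting the forms above (port names are hypothetical):

  o_AvgBigTicket = AVG(price, price > 2000)           -- conditional clause: only rows with price > 2000 aggregate
  o_WestSales    = SUM(order_amount, region = 'WEST') -- conditional SUM
  o_TopTotal     = MAX(SUM(order_amount))             -- nested functions: 'sorted input' is ignored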
De-Normalizing

Definition
  A pivot, arranging data in multiple fields in a single row rather than one field in multiple rows

  Before the pivot:
  BookID  Language
  1324    English
  1324    French
  1324    Spanish
  1325    English

  After the pivot:
  BookID  EnglishFlag  FrenchFlag  SpanishFlag
  1324    Y            Y           Y
  1325    Y            N           N

Methods
  Using the FIRST, LAST, MIN or MAX functions with a conditional clause
    For simple cases
  Using local variables with the forward and self reference techniques
    For more complex aggregations or concatenation of string values
Limitations
  Use only when the input has a known number of repeating rows, or a known maximum number of repeating rows
De-Normalizing
Using Aggregate Functions
  Use with a conditional expression to extract the value of the first or last row that matches the condition (ports sketched below)

  Input:
  Amount   Year  Quarter
  254,556  2003  first
  546,332  2003  second
  129,034  2003  third
  490,223  2003  fourth
  165,768  2004  first
  265,443  2004  second
  510,412  2004  third
  435,690  2004  fourth

  Output:
  Year  Q1_Amount  Q2_Amount  Q3_Amount  Q4_Amount
  2003  254,556    546,332    129,034    490,223
  2004  165,768    265,443    510,412    435,690

  In this case, it does not matter which aggregate function is used; LAST, MAX or AVG would do just as well as FIRST.
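The corresponding Aggregator output ports might look like this (grouping by Year; a sketch, not the slide's exact screenshot):

  -- Aggregator grouped by Year
  Q1_Amount = FIRST(Amount, Quarter = 'first')
  Q2_Amount = FIRST(Amount, Quarter = 'second')
  Q3_Amount = FIRST(Amount, Quarter = 'third')
  Q4_Amount = FIRST(Amount, Quarter = 'fourth')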
De-Normalizing

Using Local Variables
  Local variables inside an Aggregator are evaluated at each row
  In the debugger, you can only see values for each group, and only if the Aggregator uses sorted input
  Local variables are needed for custom aggregation, like string concatenations
Technique (sketched below)
  Group by BookID
  Use the forward reference technique to identify new groups
  Use expressions to set each language flag: set the flag to 'Y' if the book is published in that language, and use self reference to keep the previous setting unless it is a new group
  Output ports hold the value of the flag for each group
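A sketch of those ports for the BookID/Language example (the flag ports for French and Spanish follow the same pattern):

  -- Aggregator grouped by BookID; ports evaluated top to bottom on every row
  v_NewGroup    = IIF(BookID = v_PrevBookID, 0, 1)  -- forward reference detects a new group
  v_PrevBookID  = BookID
  v_EnglishFlag = IIF(Language = 'English', 'Y',
                      IIF(v_NewGroup = 1, 'N', v_EnglishFlag))  -- self reference keeps the previous setting
  o_EnglishFlag = v_EnglishFlag  -- non-aggregate output port: returns the last row's value for each group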
Lookup Transformation
Lookup Caches
  1 - The Informatica engine issues a SELECT statement against the database when the first row arrives at the lookup. The result set is split into 2 caches
  2 - The data cache holds the columns in the output ports
  3 - The index cache holds the columns used in the lookup condition
  4 - The caches are queried internally for each row passing through the lookup. The query is based on the lookup condition and the values of the lookup input ports
Lookup query
  Always look at the session log and review the SQL query issued to populate the lookup caches
  This query always includes an ORDER BY clause
    The clause is generated automatically by Informatica and cannot be modified directly
    Since every column is included in this ORDER BY, it can become very expensive
    The only way around it is to override the SQL query, add your own ORDER BY and finish the statement with a comment delimiter (such as -- ); see the sketch below
    Your custom ORDER BY must include the columns used in the lookup condition, in the order they appear in the condition tab
    The engine expects rows in the caches to be sorted, so an ORDER BY clause is mandatory
Session log extract:
  TRANSF_1_1_1> DBG_21097 Default sql to create lookup cache: SELECT OrderID,CustomerID FROM Orders ORDER BY CustomerID,OrderID
  TRANSF_1_1_1> DBG_21079 Creating Lookup Cache : (Thu Sep 02 07:58:34 2004)
  TRANSF_1_1_1> DBG_21682 Lookup table row count : 830
  TRANSF_1_1_1> DBG_21297 Lookup cache row count : 830
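An override for the Orders lookup above might look like the following sketch, assuming the lookup condition uses only CustomerID; the trailing comment delimiter neutralizes the ORDER BY the engine appends:

  SELECT OrderID, CustomerID
  FROM   Orders
  ORDER BY CustomerID --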
Lookup query (cont.)
  If you only need a subset of the lookup table, override the SQL and write a WHERE clause
  You can perform joins in the SQL override as long as the joined tables come from the same database
  You must use the syntax SELECT <column name> AS <port name>; your session will fail otherwise, even though the query may validate in the transformation
  You can use mapping variables and parameters in your override, for dynamic queries
  You have to cache the lookup when you have a SQL override, otherwise you will get a run-time error (a combined sketch follows)
  Tip from the slide's callout: to compare against a constant, add a port such as 'Zero', set the lookup condition to OrderCount >= Zero, and leave the 'Zero' port unconnected
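A hypothetical override combining these points: a WHERE subset, a same-database join, AS aliases matching the port names, and a mapping parameter:

  SELECT o.OrderID    AS OrderID,
         o.CustomerID AS CustomerID,
         c.Region     AS Region
  FROM   Orders o, Customers c          -- joined tables must live in the same database
  WHERE  o.CustomerID = c.CustomerID
    AND  o.OrderDate >= '$$START_DATE'  -- mapping parameter makes the query dynamic
  ORDER BY CustomerID --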
Lookup Transformation
Sharing Lookup Caches

Un-named caches
  For multiple lookups on the same table in the same mapping
  Automatically shared if the condition ports are the same (although operators can be different)
  The first lookup transformation must extract the same data set as subsequent lookups, or a superset
  Cannot share a dynamic lookup
Named caches
  To share lookups among sessions
  Lookup structures must be identical (same conditions, same output ports)
  SQL overrides must be identical
  The session's high precision setting must be the same for all sessions sharing the lookup files
  You can mix dynamic and static shared lookups, but you can't run them in simultaneous sessions
Tip
  Make your shared lookup transformation reusable
Lookup Transformation: Shared Lookup Usage

Data Warehousing
  Dimensions may need to be looked up several times during the load process
  Optimize your workflow by having the first lookup create a named persistent lookup file
  Alter your other lookups to read from these cache files
Debugging
  Sessions with large lookups are time-consuming to debug
  If you can't use a subset of the lookup data, make the cache persistent
  The first run will have to create the cache files, but subsequent runs will be much faster
Example: an Orders lookup used 3 times, in sessions 1, 2 and 3; the first session must re-cache from the source
Dynamic Lookups

Use when looking up a target table and you need to keep the lookup cache and target contents in sync
Dynamic lookups can insert new rows or update existing rows in the cache (but cannot delete rows)
Updated, inserted or unchanged rows are flagged by the lookup transformation via the NewLookupRow port
The row type is not changed by the dynamic lookup; you need to set up Update Strategy transformations downstream to change the actual row type

Ports
  Source fields sit alongside the lookup fields
  Lookup fields not used in the condition must be manually associated with their corresponding source field
  Select 'ignore in comparison' for fields you don't need to compare to determine the update status
  Select 'ignore null in comparison' so null source values don't trigger an update
Dynamic Lookups

Ports (cont.)
  You can associate a lookup field holding a surrogate key (integer or small integer) with the special Sequence-ID value, selected from the drop-down menu
  The startup value of Sequence-ID is the highest value found in the initial cache data for the surrogate key field, or 1 if the initial cache is empty
  This value is incremented every time a new row is added to the lookup cache
  You can use this feature in lieu of a Sequence Generator transformation
NewLookupRow port
  Pass this field to a Router to branch the data flow based on the insert/update/no-change status (see the sketch below)
  3 possible values:
    0 = no change
    1 = row inserted in the lookup cache
    2 = row updated in the lookup cache
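Hypothetical Router group filters downstream of the dynamic lookup:

  -- Router group filter conditions
  INSERTS: NewLookupRow = 1   -- feed an Update Strategy flagging DD_INSERT
  UPDATES: NewLookupRow = 2   -- feed an Update Strategy flagging DD_UPDATE
  -- rows with NewLookupRow = 0 fall into the default group and can be dropped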
Dynamic Lookups

Properties
  Lookup caching must be enabled
  The dynamic lookup cache property toggles dynamic behavior and automatically adds the NewLookupRow port when on
  When the lookup updates a row in the cache, you can choose to return either the old (before the update) or new (after the update) values; useful for slowly changing dimensions, when you need both values
Dynamic Lookups

Properties (cont.)
  Updating the cache
    The lookup cache can be updated if the incoming source row exists in the cache but the values of one or more fields differ between the source row and the cached row
    By default, the dynamic lookup does not perform both insert and update operations on the lookup cache
    You have to turn on the 'Insert Else Update' or 'Update Else Insert' properties to enable this, depending on the row type coming into the transformation
  Insert Else Update
    Applies when the row type is 'insert' (the default)
    Off: the lookup cache is not updated, only new source rows are inserted into the cache
    On: both inserts and updates are performed
  Update Else Insert
    Applies when the row type is 'update' (you need an Update Strategy transformation upstream to change the row type to update)
    Off: no lookup cache inserts are performed, only updates
    On: both inserts and updates are performed
Dynamic Lookups

Caveats
  SQL override with a WHERE clause
    Used when building the lookup cache from a subset of data
    The lookup cache and target may get out of sync if your source stream does not use the same filter as your WHERE clause; for example, if the override's WHERE clause excludes some ProductIDs, a Filter must exclude the same ProductIDs from the source
  Sequence-ID
    When using the Sequence-ID feature to generate surrogate keys, you may get duplicate key values if your lookup cache does not hold the entire data set
  Ignore NULLS in comparison
    You may need an Expression transformation after the dynamic lookup to make sure you are not passing NULL values from the source into the target
  Speed penalty
    Dynamic lookups are a bit slower than regular lookups because they have to update their cache contents on the fly
    The biggest hit comes when the lookup cache is paging to disk and there are many updated or inserted rows
Transaction Control

Overview
  Active (row level) transformation
  Enables dynamic commit or rollback points
  Works for relational targets or MQ Series targets
  Naming convention: TC_<what it does>
[Screenshot: iconic view of the transformation]
Transaction Control: Setting Up the Mapping

Define a Boolean condition expression and a transaction type; the transaction type is executed when the expression evaluates to TRUE (example below)
Transaction constants
  TC_COMMIT_BEFORE, TC_ROLLBACK_BEFORE: commit/rollback rows for the current transaction, not including the current row; initiates a new transaction starting at the current row
  TC_COMMIT_AFTER, TC_ROLLBACK_AFTER: commit/rollback rows for the current transaction, including the current row; initiates a new transaction starting after the current row
  TC_CONTINUE_TRANSACTION: do not commit or rollback at this point
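A hypothetical Transaction Control condition that commits before the first row of each new order, assuming an upstream lagger expression supplies a PrevOrderID port:

  IIF(OrderID <> PrevOrderID, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)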
Transaction Control: Setting Up the Session

The commit type is set to 'User Defined' automatically when the mapping contains a valid Transaction Control transformation
You can tell the server to roll back transactions from the last commit point when it encounters non-fatal errors
Commit on end of file is checked by default; otherwise the server rolls back left-over transactions if the last row does not coincide with a commit point
Transaction Based Processing

Transformation Scope property
  For the Aggregator, Sorter, Joiner and Rank transformations
  If 'Transaction' is selected, the transformation logic only applies to rows within the same transaction
Example scenario
  Source: a very large flat file containing order details at the line item level, already sorted by order id
  Target: a 'shipments' table, containing one row per order per ship date
  Challenge: you need to aggregate the data by order id and ship date to populate this table, but the file is huge and you want to avoid sorting the entire data set in the Aggregator or sorting the data in a pre-process
  Transaction based solution: the data is already sorted by order id, so any additional sorting can be done within the relatively small set of order lines that comprise each order; each set of line items triggers a new transaction, and the scope of the Sorter and the Aggregator is a single transaction
Transaction Processing Example

Mapping flow
  1 - An Expression detects a change in order ids: compare the current order id with the previous order id; a change means we have the first row of a new order
  2 - A Transaction Control sets the type to TC_COMMIT_BEFORE every time a new order id is encountered
  3 - A Sorter sorts by order id and ship date, with scope set to 'Transaction'
  4 - An Aggregator aggregates by order id and ship date, with sorted input on
  5 - A second Transaction Control resets the commit point to target based
Sample input: sorted by order id but not by ship date
Sample output: one row per distinct order id and ship date, ordered by order id and ship date
Joiner

Sorted Input not selected
  Caches the master table rows in memory (or on disk)
  Brings in detail rows one at a time and compares them to the master rows depending on your join type and condition
  You get better performance if the master table is the smaller of the two sources, even if logically this table is not the master
Sorted Input selected
  Works only if both sources are pre-sorted
  Saves time by not having to cache the entire master source
  If your sources are not sorted, you can use a SQL override or a Sorter transformation before the Joiner
  Pulls rows from both sources and compares them using the join type and condition
  The fastest way to join if your sources are already sorted; otherwise weigh the cost of using a database sort or a Sorter transformation
Joiner

Caching
  Monitor cache files on the server while the session is running
  Use a RAM-only cache for best performance
  Specify a cache directory on a separate disk to reduce I/O contention
Session log extract:
  TRANSF_1_1_1> DBG_21077 Create joiner cache on master relation : (Wed Sep 08 05:31:04 2004)
  TRANSF_1_1_1> CMN_1690 Created new data file [D:\Program Files\Informatica PowerCenter 7.1\Server\Cache\PMJNR530_19.dat] and index file [D:\Program Files\Informatica PowerCenter 7.1\Server\Cache\PMJNR530_19.idx] for Joiner [JNR_OldNew].
  TRANSF_1_1_1> DBG_21214 Finished joiner cache on master relation : (Wed Sep 08 05:31:04 2004)
  TRANSF_1_2_1> DBG_21603 Open master relation cache for detail joiner : (Wed Sep 08 05:31:04 2004)
  TRANSF_1_2_1> DBG_21215 Finished processing detail relation : (Wed Sep 08 05:31:04 2004)
Joiner

Self-Join
  Join two instances of the same source, or join two branches of the same pipeline
  Example: aggregate dollars per company, then join the source data to the aggregated data
Joiner

Self-Join rules
  To join two branches of the same pipeline, you must use sorted data and turn on the sorted input property
  If you cannot use sorted data, you must read your source twice and join the two pipelines
  You can also join output from multi-group transformations such as the Router, XML Source Qualifier or Custom transformation
  If you join 2 branches with an Aggregator in one branch, the Aggregator must use sorted input as well
Stored Procedure: Pre and Post Load Stored Procedures

Stored Procedure Type (Properties tab)
  Connections
    The $Source alias is automatically selected for Source Pre and Post-Load types
    The $Target alias is selected for Target Pre and Post-Load types
    When changing from a Source to a Target type, the corresponding connection alias is restored, overwriting any custom relational connection you might have selected
  Call Text
    Accepts only hard-coded values; mapping parameters or variables will not be expanded
    Use Pre/Post SQL if you want dynamic stored procedure calls
    Use double quotes if your input parameters include spaces; single quotes are ignored
Stored Procedure: Pre and Post Load Stored Procedures

Execution Plan
  Selects the execution order of procedures having the same type; the interface is similar to the Target Load Plan function
  1 - Choose Stored Procedures Plan… in the Mappings menu
  2 - Move stored procedures within a group using the up and down arrows
  3 - The Execution Order property is updated to reflect the execution plan
Stored Procedure: Pre and Post Load Stored Procedures

Source Pre Load and Target Pre Load types
  Run before the mapping starts reading rows
  Similar to a Pre SQL statement
Source Post Load and Target Post Load types
  Run after the mapping has finished writing to the target(s)
  Similar to Post SQL statements
Debugging
  You need to use a session instance (within a workflow) to have the debugger execute your Pre/Post Load procedures
Session error handling (under Config Objects, Error Handling)
  Either Stop or Continue
  If you choose Stop and your session fails on a Post Load procedure, your target rows are already committed
Stored Procedure: Calling Normal Unconnected Stored Procedures

Three call patterns (sketched below):
  Using PROC_RESULT: one output parameter; the return value is lost
  Using a local variable: one output parameter; the return value is kept
    The local variable and the associated output parameter must have the same datatype
  Using PROC_RESULT and a local variable: two output parameters
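Sketches of the three patterns, with hypothetical procedure and port names (the :SP. prefix is PowerCenter's unconnected call syntax):

  -- 1) PROC_RESULT captures the single output parameter; the procedure's return value is lost
  o_Name   = :SP.GET_EMP_NAME(in_EmpID, PROC_RESULT)

  -- 2) a local variable captures the output parameter; the expression returns the return value
  --    (v_Name must have the same datatype as the output parameter)
  o_Status = :SP.GET_EMP_NAME(in_EmpID, v_Name)

  -- 3) two output parameters: one into a local variable, one into PROC_RESULT
  o_Dept   = :SP.GET_EMP_INFO(in_EmpID, v_Name, PROC_RESULT)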
Calling Normal Unconnected Stored Procedures (cont.)

Nesting stored procedure calls using PROC_RESULT (see the sketches below)
  The output of the innermost procedure is passed as an input parameter to the outermost procedure
  You can use PROC_RESULT once per procedure call
Within a conditional expression
  Your expression must evaluate to TRUE for the procedure to be called
Within an expression attached to an output port
  The port must be connected forward to the data flow or your procedure will not run
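Sketches of the nested and conditional forms (procedure names are again hypothetical):

  -- nested: the inner call's output feeds the outer call
  o_Region   = :SP.GET_REGION(:SP.GET_COUNTRY(in_CustID, PROC_RESULT), PROC_RESULT)

  -- conditional: the procedure runs only when the condition evaluates to TRUE
  o_Adjusted = IIF(in_Amount > 0, :SP.RECALC_AMOUNT(in_Amount, PROC_RESULT), 0)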
Stored Procedure: Normal Stored Procedures

Session error handling
  Fatal errors in connected or unconnected normal procedures will always cause the session to fail
  Non-fatal errors increment the session error counter; your session fails when this counter reaches the threshold set in the session properties
  Non-fatal error rows are skipped by the server and written to the session log and reject file
Debugger Tips
Reusable Sessions
  Create a reusable session to debug your mapping when practical
  Use it for both the debugger and the workflow
  Cuts down on development time, especially if you have lots of sources and targets
  Many parameters are not available in plain debugger sessions
Using Existing Sessions
  Choose a session instance to run the debugger within the context of a workflow
    If you want to test mapping or workflow parameters and variables
    If you want your debug session to execute Pre or Post Load stored procedures
    To run session components such as command tasks or emails
  Choose a session instance or a reusable session to test all session attributes not available with a simple debug session
Drawbacks
  Remember to validate and save the reusable session when you make a big change in your mapping, such as adding a source, target or transformation
  Sometimes you'll have to disconnect and reopen your folder in the Workflow Manager to register mapping changes
  Beware of overridden parameters, like SQL overrides
    Session overrides take precedence over the mapping
    Session instance overrides take precedence over the reusable session object
Configuration objects
  You can't specify a line buffer size or a block buffer size in a debug session
  But you can create a configuration object with the settings you want and use it with your debug session
Workspace
  Organize your workspace to display the debug and session logs at the bottom, using the full window length, with transformation instances and the target instance above, side by side
  Remember to switch to the session log pane after the debug session is initialized
First row
  Monitor the time it takes to bring in the first row of data: is it acceptable?
  If not, review the SQL query
  Or a flat file source may be rejecting all the rows, e.g. the line buffer is too small or date/time conversions are wrong
Source data
  Always examine source data closely during the first run
  Look for bad or unexpected data formats (date/time, amounts) and truncated data
  Make sure the data is indeed what you expect
Data movement
  Follow data through each transformation during the first run
  Pay attention to unnecessary type conversions and back-and-forth type conversions
  Verify the logic in each complex expression
  Look at how the data moves within your mapping: do you have too many data paths? Is there a way to simplify the logic?
Lookups
  Record the time it takes to load cached lookups
  Review the SQL query in the session log
  Do you need to bring in all the fields in the lookup?
  You may want to override the ORDER BY clause inserted automatically by the server
  Do you have a complex join condition? Make sure the conditions with an equal sign '=' are put at the top of the list
Evaluate Expression
  Find out the value of a port or a variable, or enter your own expression to evaluate
  Not available for every transformation, only for: Aggregator, Expression, Filter, Rank, Router, Update Strategy, Lookup
  Results are displayed between square brackets, as in [VALUE]
  Easy to spot unexpected padded char values, such as [VALUE ]; unexpected padded values cause trouble in comparisons
Expressions
  Find out the value of a mapping parameter or variable
  Start value: enter the name of the variable or parameter
  Current value: use the variable or parameter with a variable function such as SETMAXVARIABLE, SETMINVARIABLE or SETCOUNTVARIABLE
Debugger shutdown
  If running, click first on the 'Break Now' icon for a faster response
Discard or not discard?
  When discarding target data, the Informatica Server does NOT:
    execute pre or post SQL commands against the target database
    write sequence generator values to the repository
    truncate target tables
    prepare insert or delete SQL statements against the targets
    verify that the target tables actually exist in the target database
  Choose to write target data for a true test of the target system
  The debugger only commits rows to the database at the end of a completed (successful) debug session
Mapping Error Handling
Error Handling

1. Guidelines
2. Overview of most common errors
3. Handling data errors
4. Handling duplicate key errors
5. Example of an error table
6. Performing audits
Error Handling Guidelines

Data quality
  Develop a set of standards or adopt existing company standards
  Standards should include:
    A definition of an error: what data gets rejected and why
    Error handling procedures in your mappings and sessions: how data gets rejected or corrected
    What to do with rejected data: ignore it, save it for manual review, or save it for automated reload
Define your error strategy
  Errors are inevitable; a good error strategy tries to:
    Catch errors and reject bad rows
    Store these rows somewhere
    Set a threshold for an acceptable number of errors
    Provide a way to report on rejected rows as part of the load process
Typical Errors

Data errors
  Incorrect or corrupted source data, unexpected values, data conflicting with business rules, referential integrity failures
Database errors
  A row violates primary key constraints; inserting a NULL value into a non-null field
Session errors
  The error threshold is exceeded, wrong connection settings, a line buffer too small, or insufficient resources (disk space or memory)
Dependency errors
  A session or job on which your session depends did not run properly; a trigger file is missing
Time constraints
  The load process did not terminate within the allotted time window
Audit/balance errors
  Row counts or sum totals do not match; data mismatch between production systems and the data warehouse
Server errors
  The Informatica server or database server goes down, bad network connections, hardware failure
Data Errors

Handling in mappings, at the field level
  Program defensively; don't leave open doors or loose ends
  Handle NULL values, blanks and zeroes consistently and in accordance with your error strategy
  Use consistent default values for cases such as missing or blank data, invalid or NULL lookup return values, and invalid or corrupted source data
  Make sure these default values are understood and approved by your users
  Override the default ERROR() function for output ports with a constant value or expression
At the row level
  Use a custom ERROR() function (see the sketch below)
    To force a row to be skipped
    To describe the details of an error condition in the session log
    Beware: this method can throw off a Joiner transformation downstream
    Rows rejected this way will not be written to the reject file
  Transfer rejected rows to a dedicated error table or error file
    For automated reloads and audit purposes
    The most flexible solution, but also the most expensive in terms of work and processing overhead
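A minimal sketch of a row-level ERROR() call, with hypothetical port names:

  -- skip the row and describe the error in the session log
  o_CustomerID = IIF(ISNULL(CustomerID),
                     ERROR('Row skipped: CustomerID is NULL'),
                     CustomerID)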
Duplicate Keys

Informatica Server behavior
  During a normal load (row by row)
    Rows that violate database constraints are rejected by the database
    Rejected rows are written to the reject file (.bad); details also appear in the session log (normal tracing)
    Error handling is automatic, but there is a big performance hit if there are more than a few errors
  Using external loaders
    The load may succeed but indexes will be left in an unusable state
    Rejected rows are written to the .ldrreject file; error details may be found in the .ldrlog file
Handling
  If performing full refreshes: whenever possible, eliminate duplicates at the source, before the load
  If performing updates: load in normal mode and let Informatica handle it, then run a post-process audit to correct rejected rows
  Alternative, for full refreshes or updates: be proactive and catch potential duplicates inside your mapping; this is the most expensive solution, so reserve it for critical tables
Duplicate Keys

Mapping solutions for full refresh
  1 - Relational source: duplicate rows are routed to an error table, and normal mapping processing continues for the rest
  2 - Flat file source: duplicate rows are destroyed, and the last row of each duplicate group is sent to the target
Audits

Create a versatile error table
  Should provide storage for all types of audit errors: primary key errors, data errors, row count errors, control totals errors
  Should be reused by all mappings that perform audits; not tied to one particular source or target table format
  One row per error, with all data values stored as strings
  Includes a concatenated value of the primary key fields to reference the source or target system
  Source and target names are fully qualified
  Identifies the process (mapping) that wrote rows to the table
Run an automated report after each load
  Total number of errors per error type
  List of sources and targets having triggered an error
Truncate the error table before each load
  If you need to keep data for several days, archive the daily table at the end of each load process

Example error table layout:

  Field name        Data type  Precision
  SourceName        String     32
  TargetName        String     32
  SourceFieldName   String     32
  SourceFieldValue  String     500
  TargetFieldname   String     32
  TargetFieldValue  String     500
  PrimaryKeyConcat  String     100
  ErrorType         String     16
  AuditProcess      String     64
  InsertDate        Date/time  19
Audits

Row by row audit
  For critical tables, when extracting data from production systems to the data warehouse
  The source must be unchanged or staged for the audit to work
  A Joiner is needed only when the source and target are in different databases, or the source is a flat or VSAM file
Workflow Manager
Assembling Workflows
  Step 1 - Design the flow, analyze dependencies and devise an error strategy
  Step 2 - Gather unit-tested mappings into reusable sessions
  Step 3 - Assemble load stages and/or subject areas into worklets
  Step 4 - Assemble worklets into a workflow and implement workflow-level extra functionality
  Step 5 - Implement trigger mechanisms and the scheduler
  Step 6 - Test the worklets, workflow, triggers and scheduler
Step 1 - Design: Large Workflows

Complex workflows handling complete loads offer more flexibility and functionality than a collection of small workflows
The new workflow manager interface makes it easy to design large and complex load processes
The workflow manager workspace window is self-documenting
The workflow monitor will give you a very good idea of the load's progress, using the Gantt chart view
Exception: if you have an external scheduler, you can either
  Run single-session workflows and let the scheduler handle Informatica job dependencies, or
  Run more complex workflows from your scheduler and let the workflows handle the job dependencies
Step 1 - Design: The Big Picture

Analyze your sources and targets to determine your dependencies at the mapping level
Devise triggering mechanisms to start the process when the sources are ready
Use a diagram to visualize the flow of data
Design your error strategy at the session level and workflow level
  What to do when a session fails
  Audit and balancing procedures to capture data discrepancies
Think about production issues
  Who will get notified when an error occurs, and by what means
  Design restartability into your workflows, worklets and mappings
Step 2 - Sessions: Reusable Sessions

After a session is unit-tested, move it to the project folder as a reusable object
Reusable sessions are easier to manage
  Easy to copy between folders
  Can be exported/imported as XML objects
  Easy to integrate into worklets and nested worklets
  All reusable sessions are conveniently listed in the Navigator, under the 'Sessions' folder
Step 2 - Sessions: Overriding Reusable Session Instances

Reusable sessions provide an additional level of override at the instance level, either in a workflow or a worklet
The instance override takes precedence over the master object
If you change a property in the master object and that property is overridden at the instance level, the override will still be in effect
You must use the Revert button to cancel an override, even if the value of the overridden property is the same as the value in the master object
Transformation properties cannot be overridden at the instance level
Example: when the target connection in an instance has been changed, click Revert to return to the connection specified in the original session object
Step 2 - Sessions: Error Strategies

Use a reusable session configuration to set the error threshold, the number of session logs to save, the log tracing level, and so on
  1 - Define common error handling behavior in a session configuration object
  2 - In the session, pick your reusable session config from the menu of available configurations
Step 2 - Sessions: Error Strategies (cont.)

Use reusable email objects in your session components to send pre-formatted messages on failure or success
  1 - Create an email task you can reuse for each session failure
  2 - Attach this email to your sessions as a reusable component
Step 2 - Sessions and Tasks: Error Strategies

Use the task general properties to implement error handling at the task level
Fail parent…
  Simply marks the enclosing workflow or worklet as failed
  Does not stop or abort the enclosing workflow or worklet
  Use when your scheduler relies on a failed status code to flag an error at the workflow level
Fail parent if this task fails
  Changes the end status of the enclosing workflow to failed when the task fails
Fail parent if this task does not run
  Changes the end status of the enclosing workflow to failed when the task does not run
  For instance, if a link condition between the previous task and this task evaluates to FALSE