Advanced PowerCenter V 7.1.1
by www.infocrest.com
Table of Contents

Introduction and Objectives
Mapping Techniques
  Local Variables
  Expression Transformation
  Aggregator
  Aggregate Functions
  De-Normalizing
  Lookup Transformation
  Dynamic Lookups
  Transaction Control
  Transaction Based Processing
  Joiner
  Stored Procedure
Debugger Tips
Mapping Error Handling
  Duplicate Keys
  Audits
Workflow Manager
  Design
  Sessions & Tasks
  Worklets
  Workflows
  Triggers and Schedulers
  Testing
Session and Server Variables
  Parameter Files
  Incremental Loads
Partitioning
  Architecture Review
  Guidelines and Definitions
  Partition Methods
  Cache Partitioning
  Limitations
  Demo
Performance & Tuning
  Tuning 101
  Target Bottlenecks
  Source Bottlenecks
  Mapping Bottlenecks
  Session Bottlenecks
  System Bottlenecks
  Using Performance Counter
Command Line Utilities
  pmcmd
  pmrep
  pmrepagent
Repository MX Views
  Viewing Dependencies
  Performance Historical Data
  Documentation
Introduction
About Informatica
  Founded in 1993
  Leader in Enterprise Data Integration Platform
  More than 1500 customers
  Global presence
  High growth
  Public since April 1999 (symbol: INFA)
About InfoCrest
  Founded in 1997 in Los Angeles, CA
  Principal: Jean Abi-Jaoudé
  Informatica Partner since 1997
  Fortune 100 clients (GE, Qualcomm, Disney ...)
  Training & Knowledge Transfer: creator of the Informatica Boot Camp, INFA project reviews, PM/PC implementation methodologies
Class Objectives
  Advanced Mapping Techniques
  Mapping Error Handling
  Workflow Techniques
  Parameter Files
  Partitioning
  Performance and Tuning
  Command Line Utilities
  MX Views
Mapping Techniques
Transformation Objects

Expression Transformation
  Using local variables, forward and self references
  Using an expression with a sequence generator to achieve very large primary key sequences
Aggregator Transformation
  Sorted input
  De-normalizer techniques
  Aggregate functions
Lookup Transformation
  Lookup caches
  SQL overrides
  Dynamic lookups
Transaction Control
  Dynamic commits
  Transaction based processing
Joiner Transformation
  Using sorted input
  Self-joins
Stored Procedures
  Pre and post load procedures
  Calling unconnected stored procedures
Local Variables

Transformations
  Local variables are available in the Expression, Aggregator and Rank transformations
Un-initialized Values
  Strings: empty (not a NULL value!); Numbers: 0; Dates: 01/01/1753
Self Reference
  A local variable can refer to itself, for example to keep the current value of v_EmployeeID if v_IsDupe is true
Forward Reference
  A local variable can refer to a variable that is further down the port list
  Using this method, you can compare data from 2 or more consecutive rows
  For example, v_CustomerIDLagged takes the value of v_CustomerID before v_CustomerID is evaluated
A sketch of both techniques follows.
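A minimal sketch of a duplicate-detection port list, assuming hypothetical ports EmployeeID (input) and o_IsDupe (output); ports evaluate top to bottom, so v_IsDupe sees the previous row's v_EmployeeID (forward reference), while v_EmployeeID keeps its own prior value when the row is a duplicate (self reference):

  -- Expression transformation ports, evaluated top to bottom
  v_IsDupe     = IIF(EmployeeID = v_EmployeeID, TRUE, FALSE)
                 -- forward reference: v_EmployeeID still holds the previous row's value
  v_EmployeeID = IIF(v_IsDupe, v_EmployeeID, EmployeeID)
                 -- self reference: keep the current value of v_EmployeeID if v_IsDupe is true
  o_IsDupe     = v_IsDupe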
Expression Transformation
Lagger Technique
  Use when you need to make a decision or set a value based on the contents of 2 or more consecutive rows
  An Expression transformation holds one or more field values in memory for one or more consecutive rows
  Makes use of the top-to-bottom order of evaluation for variable ports
  One input port, two variable ports and one output port are needed for each lagged field
  Add one variable port for each additional row you wish to hold in memory
Usage Example
  One row holds a start time, the next row holds an end time; hold the first row in memory to compute the elapsed time
Use a variable forward reference to assign v_ShipCountry to v_ShipCountry_Lagged before v_ShipCountry's value is changed, as sketched below.
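A sketch of the lagged port pair, assuming an input port ShipCountry; the names follow the slide's callout:

  -- Expression transformation ports, evaluated top to bottom
  v_ShipCountry_Lagged = v_ShipCountry        -- forward reference: previous row's value
  v_ShipCountry        = ShipCountry          -- now overwritten with the current row's value
  o_PrevShipCountry    = v_ShipCountry_Lagged -- output: the lagged field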
Expression Transformation
Lagger Mapping Needs
  A dummy 'pusher' row to push the last row of data out of the lagger expression
    Flat files: add an extra row using a pre-session command task
    Relational: add a UNION ALL statement in a SQL override (see the sketch below)
  A Filter or Router group to remove the first row coming out of the lagger expression
    This row holds empty (un-initialized) values from the expression
Example flow: a Source Qualifier with a UNION ALL SQL override, a lagger expression that detects duplicates and sets a flag with 3 values (dupe, not dupe, first row), and a Router that directs the flow to the dupe and non-dupe targets, rejecting the first row based on the flag value.
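A sketch of such an override, with illustrative table and column names (the DUAL single-row table assumes Oracle; use your database's equivalent):

  SELECT OrderID, ShipCountry
  FROM   Orders
  UNION ALL
  SELECT NULL, NULL FROM DUAL  -- dummy 'pusher' row; sort so it arrives last if row order matters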
Expression Transformation
Generating Very Large Surrogate Key Values
  A Sequence Generator is limited to a little over 2 billion values
  Use an expression to increment a multiplier when the sequence reaches a given threshold (see the sketch below)
  Use a mapping variable to store the multiplier value in the repository after each run
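The original screenshot is missing, so the following is only an approximation of the recipe; it assumes a mapping variable $$MULTIPLIER, a Sequence Generator port NEXTVAL configured to cycle at 2,000,000,000, and the built-in SETVARIABLE function:

  -- Expression transformation ports (approximate sketch)
  v_Multiplier   = IIF(NEXTVAL = 1 AND v_RowCount > 0,
                       SETVARIABLE($$MULTIPLIER, $$MULTIPLIER + 1),  -- sequence wrapped: bump the multiplier
                       $$MULTIPLIER)
  v_RowCount     = v_RowCount + 1   -- un-initialized numbers start at 0, so the first row is not a wrap
  o_SurrogateKey = (v_Multiplier * 2000000000) + NEXTVAL  -- key range extends far beyond the 2-billion limit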
Aggregator: Sorted Input

Use it whenever you can
  Improves pipeline performance
  Only the rows needed for one group are kept in memory
  All aggregations are done using RAM, not disk cache
Requirements
  You need to pre-sort the data using either a SQL override in your Source Qualifier, a Sorter transformation or pre-sorted flat files
  You need to sort the data in the order of the 'group by' ports in your Aggregator (see the sketch below)
  The Informatica Server will fail the session if the data is not sorted in strict ascending or descending order
The Informatica Server will revert to unsorted input behavior when:
  There is no 'group by' port in your Aggregator
  You select 'Incremental Aggregation' in the session properties
  You have nested aggregate functions in your Aggregator's expressions
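An illustrative Source Qualifier override for an Aggregator that groups by CustomerID then OrderDate (names are hypothetical):

  SELECT CustomerID, OrderDate, OrderAmount
  FROM   Orders
  ORDER BY CustomerID, OrderDate  -- must match the order of the 'group by' ports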
Aggregator: Why Sorted Input Improves Performance

The Informatica Server processes data in buffered stages
Read, Transform and Write stages normally operate simultaneously, feeding each other's buffers
An unsorted Aggregator is a bottleneck, as the Write stage must wait until all aggregation is done before processing any row
[Diagram: Read/Transform/Write stage timelines, comparing elapsed time for unsorted vs. sorted input]
Aggregate Functions

Nesting functions
  You can nest aggregate functions, as in MAX(SUM(order_amount))
  The Informatica Server ignores 'sorted input' when you nest aggregate functions
  You cannot mix nested and non-nested functions in the same Aggregator
Conditional clause
  Use instead of an IIF statement for aggregate expressions (see the examples below)
  Syntax: FUNCT(input [, Boolean expression])
  Example: AVG(price, price > 2000) computes the average price value when price is above 2000
  The input is taken into account for the aggregation if and only if the Boolean expression returns TRUE
Availability
  With output ports only; no variable port can have an aggregate expression
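A few illustrative expressions contrasting the forms above (port names are hypothetical):

  o_AvgBigTicket = AVG(price, price > 2000)           -- conditional clause: only rows with price > 2000 aggregate
  o_WestSales    = SUM(order_amount, region = 'WEST') -- conditional SUM
  o_TopTotal     = MAX(SUM(order_amount))             -- nested functions: 'sorted input' is ignored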
De-Normalizing

Definition
  A pivot, arranging data in multiple fields in a single row rather than one field in multiple rows

  Before the pivot:
  BookID  Language
  1324    English
  1324    French
  1324    Spanish
  1325    English

  After the pivot:
  BookID  EnglishFlag  FrenchFlag  SpanishFlag
  1324    Y            Y           Y
  1325    Y            N           N

Methods
  Using the FIRST, LAST, MIN or MAX functions with a conditional clause
    For simple cases
  Using local variables with the forward and self reference techniques
    For more complex aggregations or concatenation of string values
Limitations
  Use only when the input has a known number of repeating rows, or a known maximum number of repeating rows
De-Normalizing
Using Aggregate Functions
  Use with a conditional expression to extract the value of the first or last row that matches the condition (ports sketched below)

  Input:
  Amount   Year  Quarter
  254,556  2003  first
  546,332  2003  second
  129,034  2003  third
  490,223  2003  fourth
  165,768  2004  first
  265,443  2004  second
  510,412  2004  third
  435,690  2004  fourth

  Output:
  Year  Q1_Amount  Q2_Amount  Q3_Amount  Q4_Amount
  2003  254,556    546,332    129,034    490,223
  2004  165,768    265,443    510,412    435,690

  In this case, it does not matter which aggregate function is used; LAST, MAX or AVG would do just as well as FIRST.
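The corresponding Aggregator output ports might look like this (grouping by Year; a sketch, not the slide's exact screenshot):

  -- Aggregator grouped by Year
  Q1_Amount = FIRST(Amount, Quarter = 'first')
  Q2_Amount = FIRST(Amount, Quarter = 'second')
  Q3_Amount = FIRST(Amount, Quarter = 'third')
  Q4_Amount = FIRST(Amount, Quarter = 'fourth')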
De-Normalizing

Using Local Variables
  Local variables inside an Aggregator are evaluated at each row
  In the debugger, you can only see values for each group, and only if the Aggregator uses sorted input
  Local variables are needed for custom aggregation, like string concatenations
Technique (sketched below)
  Group by BookID
  Use the forward reference technique to identify new groups
  Use expressions to set each language flag: set the flag to 'Y' if the book is published in that language, and use self reference to keep the previous setting unless it is a new group
  Output ports hold the value of the flag for each group
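A sketch of those ports for the BookID/Language example (the flag ports for French and Spanish follow the same pattern):

  -- Aggregator grouped by BookID; ports evaluated top to bottom on every row
  v_NewGroup    = IIF(BookID = v_PrevBookID, 0, 1)  -- forward reference detects a new group
  v_PrevBookID  = BookID
  v_EnglishFlag = IIF(Language = 'English', 'Y',
                      IIF(v_NewGroup = 1, 'N', v_EnglishFlag))  -- self reference keeps the previous setting
  o_EnglishFlag = v_EnglishFlag  -- non-aggregate output port: returns the last row's value for each group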
Lookup Transformation
Lookup Caches
  1 - The Informatica engine issues a SELECT statement against the database when the first row arrives at the lookup. The result set is split into 2 caches
  2 - The data cache holds the columns in the output ports
  3 - The index cache holds the columns used in the lookup condition
  4 - The caches are queried internally for each row passing through the lookup. The query is based on the lookup condition and the values of the lookup input ports
Lookup query
  Always look at the session log and review the SQL query issued to populate the lookup caches
  This query always includes an ORDER BY clause
    The clause is generated automatically by Informatica and cannot be modified directly
    Since every column is included in this ORDER BY, it can become very expensive
    The only way around it is to override the SQL query, add your own ORDER BY and finish the statement with a comment delimiter (such as -- ); see the sketch below
    Your custom ORDER BY must include the columns used in the lookup condition, in the order they appear in the condition tab
    The engine expects rows in the caches to be sorted, so an ORDER BY clause is mandatory
Session log extract:
  TRANSF_1_1_1> DBG_21097 Default sql to create lookup cache: SELECT OrderID,CustomerID FROM Orders ORDER BY CustomerID,OrderID
  TRANSF_1_1_1> DBG_21079 Creating Lookup Cache : (Thu Sep 02 07:58:34 2004)
  TRANSF_1_1_1> DBG_21682 Lookup table row count : 830
  TRANSF_1_1_1> DBG_21297 Lookup cache row count : 830
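An override for the Orders lookup above might look like the following sketch, assuming the lookup condition uses only CustomerID; the trailing comment delimiter neutralizes the ORDER BY the engine appends:

  SELECT OrderID, CustomerID
  FROM   Orders
  ORDER BY CustomerID --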
Lookup query (cont.)
  If you only need a subset of the lookup table, override the SQL and write a WHERE clause
  You can perform joins in the SQL override as long as the joined tables come from the same database
  You must use the syntax SELECT <column name> AS <port name>; your session will fail otherwise, even though the query may validate in the transformation
  You can use mapping variables and parameters in your override, for dynamic queries
  You have to cache the lookup when you have a SQL override, otherwise you will get a run-time error (a combined sketch follows)
  Tip from the slide's callout: to compare against a constant, add a port such as 'Zero', set the lookup condition to OrderCount >= Zero, and leave the 'Zero' port unconnected
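A hypothetical override combining these points: a WHERE subset, a same-database join, AS aliases matching the port names, and a mapping parameter:

  SELECT o.OrderID    AS OrderID,
         o.CustomerID AS CustomerID,
         c.Region     AS Region
  FROM   Orders o, Customers c          -- joined tables must live in the same database
  WHERE  o.CustomerID = c.CustomerID
    AND  o.OrderDate >= '$$START_DATE'  -- mapping parameter makes the query dynamic
  ORDER BY CustomerID --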
Lookup Transformation
Sharing Lookup Caches

Un-named caches
  For multiple lookups on the same table in the same mapping
  Automatically shared if the condition ports are the same (although operators can be different)
  The first lookup transformation must extract the same data set as subsequent lookups, or a superset
  Cannot share a dynamic lookup
Named caches
  To share lookups among sessions
  Lookup structures must be identical (same conditions, same output ports)
  SQL overrides must be identical
  The session's high precision setting must be the same for all sessions sharing the lookup files
  You can mix dynamic and static shared lookups, but you can't run them in simultaneous sessions
Tip
  Make your shared lookup transformation reusable
Lookup Transformation: Shared Lookup Usage

Data Warehousing
  Dimensions may need to be looked up several times during the load process
  Optimize your workflow by having the first lookup create a named persistent lookup file
  Alter your other lookups to read from these cache files
Debugging
  Sessions with large lookups are time-consuming to debug
  If you can't use a subset of the lookup data, make the cache persistent
  The first run will have to create the cache files, but subsequent runs will be much faster
Example: an Orders lookup used 3 times, in sessions 1, 2 and 3; the first session must re-cache from the source
Dynamic Lookups

Use when looking up a target table and you need to keep the lookup cache and target contents in sync
Dynamic lookups can insert new rows or update existing rows in the cache (but cannot delete rows)
Updated, inserted or unchanged rows are flagged by the lookup transformation via the NewLookupRow port
The row type is not changed by the dynamic lookup; you need to set up Update Strategy transformations downstream to change the actual row type

Ports
  Source fields sit alongside the lookup fields
  Lookup fields not used in the condition must be manually associated with their corresponding source field
  Select 'ignore in comparison' for fields you don't need to compare to determine the update status
  Select 'ignore null in comparison' so null source values don't trigger an update
Dynamic Lookups

Ports (cont.)
  You can associate a lookup field holding a surrogate key (integer or small integer) with the special Sequence-ID value, selected from the drop-down menu
  The startup value of Sequence-ID is the highest value found in the initial cache data for the surrogate key field, or 1 if the initial cache is empty
  This value is incremented every time a new row is added to the lookup cache
  You can use this feature in lieu of a Sequence Generator transformation
NewLookupRow port
  Pass this field to a Router to branch the data flow based on the insert/update/no-change status (see the sketch below)
  3 possible values:
    0 = no change
    1 = row inserted in the lookup cache
    2 = row updated in the lookup cache
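Hypothetical Router group filters downstream of the dynamic lookup:

  -- Router group filter conditions
  INSERTS: NewLookupRow = 1   -- feed an Update Strategy flagging DD_INSERT
  UPDATES: NewLookupRow = 2   -- feed an Update Strategy flagging DD_UPDATE
  -- rows with NewLookupRow = 0 fall into the default group and can be dropped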
Dynamic Lookups

Properties
  Lookup caching must be enabled
  The dynamic lookup cache property toggles dynamic behavior and automatically adds the NewLookupRow port when on
  When the lookup updates a row in the cache, you can choose to return either the old (before the update) or new (after the update) values; useful for slowly changing dimensions, when you need both values
Dynamic Lookups

Properties (cont.)
  Updating the cache
    The lookup cache can be updated if the incoming source row exists in the cache but the values of one or more fields differ between the source row and the cached row
    By default, the dynamic lookup does not perform both insert and update operations on the lookup cache
    You have to turn on the 'Insert Else Update' or 'Update Else Insert' properties to enable this, depending on the row type coming into the transformation
  Insert Else Update
    Applies when the row type is 'insert' (the default)
    Off: the lookup cache is not updated, only new source rows are inserted into the cache
    On: both inserts and updates are performed
  Update Else Insert
    Applies when the row type is 'update' (you need an Update Strategy transformation upstream to change the row type to update)
    Off: no lookup cache inserts are performed, only updates
    On: both inserts and updates are performed
Dynamic Lookups

Caveats
  SQL override with a WHERE clause
    Used when building the lookup cache from a subset of data
    The lookup cache and target may get out of sync if your source stream does not use the same filter as your WHERE clause; for example, if the override's WHERE clause excludes some ProductIDs, a Filter must exclude the same ProductIDs from the source
  Sequence-ID
    When using the Sequence-ID feature to generate surrogate keys, you may get duplicate key values if your lookup cache does not hold the entire data set
  Ignore NULLS in comparison
    You may need an Expression transformation after the dynamic lookup to make sure you are not passing NULL values from the source into the target
  Speed penalty
    Dynamic lookups are a bit slower than regular lookups because they have to update their cache contents on the fly
    The biggest hit comes when the lookup cache is paging to disk and there are many updated or inserted rows
Transaction Control

Overview
  Active (row level) transformation
  Enables dynamic commit or rollback points
  Works for relational targets or MQ Series targets
  Naming convention: TC_<what it does>
[Screenshot: iconic view of the transformation]
Transaction Control: Setting Up the Mapping

Define a Boolean condition expression and a transaction type; the transaction type is executed when the expression evaluates to TRUE (example below)
Transaction constants
  TC_COMMIT_BEFORE, TC_ROLLBACK_BEFORE: commit/rollback rows for the current transaction, not including the current row; initiates a new transaction starting at the current row
  TC_COMMIT_AFTER, TC_ROLLBACK_AFTER: commit/rollback rows for the current transaction, including the current row; initiates a new transaction starting after the current row
  TC_CONTINUE_TRANSACTION: do not commit or rollback at this point
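A hypothetical Transaction Control condition that commits before the first row of each new order, assuming an upstream lagger expression supplies a PrevOrderID port:

  IIF(OrderID <> PrevOrderID, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)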
Transaction Control: Setting Up the Session

The commit type is set to 'User Defined' automatically when the mapping contains a valid Transaction Control transformation
You can tell the server to roll back transactions from the last commit point when it encounters non-fatal errors
Commit on end of file is checked by default; otherwise the server rolls back left-over transactions if the last row does not coincide with a commit point
Transaction Based Processing

Transformation Scope property
  For the Aggregator, Sorter, Joiner and Rank transformations
  If 'Transaction' is selected, the transformation logic only applies to rows within the same transaction
Example scenario
  Source: a very large flat file containing order details at the line item level, already sorted by order id
  Target: a 'shipments' table, containing one row per order per ship date
  Challenge: you need to aggregate the data by order id and ship date to populate this table, but the file is huge and you want to avoid sorting the entire data set in the Aggregator or sorting the data in a pre-process
  Transaction based solution: the data is already sorted by order id, so any additional sorting can be done within the relatively small set of order lines that comprise each order; each set of line items triggers a new transaction, and the scope of the Sorter and the Aggregator is a single transaction
Transaction Processing Example

Mapping flow
  1 - An Expression detects a change in order ids: compare the current order id with the previous order id; a change means we have the first row of a new order
  2 - A Transaction Control sets the type to TC_COMMIT_BEFORE every time a new order id is encountered
  3 - A Sorter sorts by order id and ship date, with scope set to 'Transaction'
  4 - An Aggregator aggregates by order id and ship date, with sorted input on
  5 - A second Transaction Control resets the commit point to target based
Sample input: sorted by order id but not by ship date
Sample output: one row per distinct order id and ship date, ordered by order id and ship date
Joiner

Sorted Input not selected
  Caches the master table rows in memory (or on disk)
  Brings in detail rows one at a time and compares them to the master rows depending on your join type and condition
  You get better performance if the master table is the smaller of the two sources, even if logically this table is not the master
Sorted Input selected
  Works only if both sources are pre-sorted
  Saves time by not having to cache the entire master source
  If your sources are not sorted, you can use a SQL override or a Sorter transformation before the Joiner
  Pulls rows from both sources and compares them using the join type and condition
  The fastest way to join if your sources are already sorted; otherwise weigh the cost of using a database sort or a Sorter transformation
Joiner

Caching
  Monitor cache files on the server while the session is running
  Use a RAM-only cache for best performance
  Specify a cache directory on a separate disk to reduce I/O contention
Session log extract:
  TRANSF_1_1_1> DBG_21077 Create joiner cache on master relation : (Wed Sep 08 05:31:04 2004)
  TRANSF_1_1_1> CMN_1690 Created new data file [D:\Program Files\Informatica PowerCenter 7.1\Server\Cache\PMJNR530_19.dat] and index file [D:\Program Files\Informatica PowerCenter 7.1\Server\Cache\PMJNR530_19.idx] for Joiner [JNR_OldNew].
  TRANSF_1_1_1> DBG_21214 Finished joiner cache on master relation : (Wed Sep 08 05:31:04 2004)
  TRANSF_1_2_1> DBG_21603 Open master relation cache for detail joiner : (Wed Sep 08 05:31:04 2004)
  TRANSF_1_2_1> DBG_21215 Finished processing detail relation : (Wed Sep 08 05:31:04 2004)
Joiner

Self-Join
  Join two instances of the same source, or join two branches of the same pipeline
  Example: aggregate dollars per company, then join the source data to the aggregated data
Joiner

Self-Join rules
  To join two branches of the same pipeline, you must use sorted data and turn on the sorted input property
  If you cannot use sorted data, you must read your source twice and join the two pipelines
  You can also join output from multi-group transformations such as the Router, XML Source Qualifier or Custom transformation
  If you join 2 branches with an Aggregator in one branch, the Aggregator must use sorted input as well
Stored Procedure: Pre and Post Load Stored Procedures

Stored Procedure Type (Properties tab)
  Connections
    The $Source alias is automatically selected for Source Pre and Post-Load types
    The $Target alias is selected for Target Pre and Post-Load types
    When changing from a Source to a Target type, the corresponding connection alias is restored, overwriting any custom relational connection you might have selected
  Call Text
    Accepts only hard-coded values; mapping parameters or variables will not be expanded
    Use Pre/Post SQL if you want dynamic stored procedure calls
    Use double quotes if your input parameters include spaces; single quotes are ignored
Stored Procedure: Pre and Post Load Stored Procedures

Execution Plan
  Selects the execution order of procedures having the same type; the interface is similar to the Target Load Plan function
  1 - Choose Stored Procedures Plan… in the Mappings menu
  2 - Move stored procedures within a group using the up and down arrows
  3 - The Execution Order property is updated to reflect the execution plan
Stored Procedure: Pre and Post Load Stored Procedures

Source Pre Load and Target Pre Load types
  Run before the mapping starts reading rows
  Similar to a Pre SQL statement
Source Post Load and Target Post Load types
  Run after the mapping has finished writing to the target(s)
  Similar to Post SQL statements
Debugging
  You need to use a session instance (within a workflow) to have the debugger execute your Pre/Post Load procedures
Session error handling (under Config Objects, Error Handling)
  Either Stop or Continue
  If you choose Stop and your session fails on a Post Load procedure, your target rows are already committed
Stored Procedure: Calling Normal Unconnected Stored Procedures

Three call patterns (sketched below):
  Using PROC_RESULT: one output parameter; the return value is lost
  Using a local variable: one output parameter; the return value is kept
    The local variable and the associated output parameter must have the same datatype
  Using PROC_RESULT and a local variable: two output parameters
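Sketches of the three patterns, with hypothetical procedure and port names (the :SP. prefix is PowerCenter's unconnected call syntax):

  -- 1) PROC_RESULT captures the single output parameter; the procedure's return value is lost
  o_Name   = :SP.GET_EMP_NAME(in_EmpID, PROC_RESULT)

  -- 2) a local variable captures the output parameter; the expression returns the return value
  --    (v_Name must have the same datatype as the output parameter)
  o_Status = :SP.GET_EMP_NAME(in_EmpID, v_Name)

  -- 3) two output parameters: one into a local variable, one into PROC_RESULT
  o_Dept   = :SP.GET_EMP_INFO(in_EmpID, v_Name, PROC_RESULT)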
Calling Normal Unconnected Stored Procedures (cont.)

Nesting stored procedure calls using PROC_RESULT (see the sketches below)
  The output of the innermost procedure is passed as an input parameter to the outermost procedure
  You can use PROC_RESULT once per procedure call
Within a conditional expression
  Your expression must evaluate to TRUE for the procedure to be called
Within an expression attached to an output port
  The port must be connected forward to the data flow or your procedure will not run
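Sketches of the nested and conditional forms (procedure names are again hypothetical):

  -- nested: the inner call's output feeds the outer call
  o_Region   = :SP.GET_REGION(:SP.GET_COUNTRY(in_CustID, PROC_RESULT), PROC_RESULT)

  -- conditional: the procedure runs only when the condition evaluates to TRUE
  o_Adjusted = IIF(in_Amount > 0, :SP.RECALC_AMOUNT(in_Amount, PROC_RESULT), 0)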
Stored Procedure: Normal Stored Procedures

Session error handling
  Fatal errors in connected or unconnected normal procedures will always cause the session to fail
  Non-fatal errors increment the session error counter; your session fails when this counter reaches the threshold set in the session properties
  Non-fatal error rows are skipped by the server and written to the session log and reject file
Debugger Tips
Reusable Sessions
  Create a reusable session to debug your mapping when practical
  Use it for both the debugger and the workflow
  Cuts down on development time, especially if you have lots of sources and targets
  Many parameters are not available in plain debugger sessions
Using Existing Sessions
  Choose a session instance to run the debugger within the context of a workflow
    If you want to test mapping or workflow parameters and variables
    If you want your debug session to execute Pre or Post Load stored procedures
    To run session components such as command tasks or emails
  Choose a session instance or a reusable session to test all session attributes not available with a simple debug session
Drawbacks
  Remember to validate and save the reusable session when you make a big change in your mapping, such as adding a source, target or transformation
  Sometimes you'll have to disconnect and reopen your folder in the Workflow Manager to register mapping changes
  Beware of overridden parameters, like SQL overrides
    Session overrides take precedence over the mapping
    Session instance overrides take precedence over the reusable session object
Configuration objects
  You can't specify a line buffer size or a block buffer size in a debug session
  But you can create a configuration object with the settings you want and use it with your debug session
Workspace
  Organize your workspace to display the debug and session logs at the bottom, using the full window length, with transformation instances and the target instance above, side by side
  Remember to switch to the session log pane after the debug session is initialized
First row
  Monitor the time it takes to bring in the first row of data: is it acceptable?
  If not, review the SQL query
  Or a flat file source may be rejecting all the rows, e.g. the line buffer is too small or date/time conversions are wrong
Source data
  Always examine source data closely during the first run
  Look for bad or unexpected data formats (date/time, amounts) and truncated data
  Make sure the data is indeed what you expect
Data movement
  Follow data through each transformation during the first run
  Pay attention to unnecessary type conversions and back-and-forth type conversions
  Verify the logic in each complex expression
  Look at how the data moves within your mapping: do you have too many data paths? Is there a way to simplify the logic?
Lookups
  Record the time it takes to load cached lookups
  Review the SQL query in the session log
  Do you need to bring in all the fields in the lookup?
  You may want to override the ORDER BY clause inserted automatically by the server
  Do you have a complex join condition? Make sure the conditions with an equal sign '=' are put at the top of the list
Evaluate Expression
  Find out the value of a port or a variable, or enter your own expression to evaluate
  Not available for every transformation, only for: Aggregator, Expression, Filter, Rank, Router, Update Strategy, Lookup
  Results are displayed between square brackets, as in [VALUE]
  Easy to spot unexpected padded char values, such as [VALUE ]; unexpected padded values cause trouble in comparisons
Expressions
  Find out the value of a mapping parameter or variable
  Start value: enter the name of the variable or parameter
  Current value: use the variable or parameter with a variable function such as SETMAXVARIABLE, SETMINVARIABLE or SETCOUNTVARIABLE
Debugger shutdown
  If running, click first on the 'Break Now' icon for a faster response
Discard or not discard?
  When discarding target data, the Informatica Server does NOT:
    execute pre or post SQL commands against the target database
    write sequence generator values to the repository
    truncate target tables
    prepare insert or delete SQL statements against the targets
    verify that the target tables actually exist in the target database
  Choose to write target data for a true test of the target system
  The debugger only commits rows to the database at the end of a completed (successful) debug session
Mapping Error Handling
Error Handling

1. Guidelines
2. Overview of most common errors
3. Handling data errors
4. Handling duplicate key errors
5. Example of an error table
6. Performing audits
Error Handling Guidelines

Data quality
  Develop a set of standards or adopt existing company standards
  Standards should include:
    A definition of an error: what data gets rejected and why
    Error handling procedures in your mappings and sessions: how data gets rejected or corrected
    What to do with rejected data: ignore it, save it for manual review, or save it for automated reload
Define your error strategy
  Errors are inevitable; a good error strategy tries to:
    Catch errors and reject bad rows
    Store these rows somewhere
    Set a threshold for an acceptable number of errors
    Provide a way to report on rejected rows as part of the load process
Typical Errors

Data errors
  Incorrect or corrupted source data, unexpected values, data conflicting with business rules, referential integrity failures
Database errors
  A row violates primary key constraints; inserting a NULL value into a non-null field
Session errors
  The error threshold is exceeded, wrong connection settings, a line buffer too small, or insufficient resources (disk space or memory)
Dependency errors
  A session or job on which your session depends did not run properly; a trigger file is missing
Time constraints
  The load process did not terminate within the allotted time window
Audit/balance errors
  Row counts or sum totals do not match; data mismatch between production systems and the data warehouse
Server errors
  The Informatica server or database server goes down, bad network connections, hardware failure
Data Errors

Handling in mappings, at the field level
  Program defensively; don't leave open doors or loose ends
  Handle NULL values, blanks and zeroes consistently and in accordance with your error strategy
  Use consistent default values for cases such as missing or blank data, invalid or NULL lookup return values, and invalid or corrupted source data
  Make sure these default values are understood and approved by your users
  Override the default ERROR() function for output ports with a constant value or expression
At the row level
  Use a custom ERROR() function (see the sketch below)
    To force a row to be skipped
    To describe the details of an error condition in the session log
    Beware: this method can throw off a Joiner transformation downstream
    Rows rejected this way will not be written to the reject file
  Transfer rejected rows to a dedicated error table or error file
    For automated reloads and audit purposes
    The most flexible solution, but also the most expensive in terms of work and processing overhead
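A minimal sketch of a row-level ERROR() call, with hypothetical port names:

  -- skip the row and describe the error in the session log
  o_CustomerID = IIF(ISNULL(CustomerID),
                     ERROR('Row skipped: CustomerID is NULL'),
                     CustomerID)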
Duplicate Keys

Informatica Server behavior
  During a normal load (row by row)
    Rows that violate database constraints are rejected by the database
    Rejected rows are written to the reject file (.bad); details also appear in the session log (normal tracing)
    Error handling is automatic, but there is a big performance hit if there are more than a few errors
  Using external loaders
    The load may succeed but indexes will be left in an unusable state
    Rejected rows are written to the .ldrreject file; error details may be found in the .ldrlog file
Handling
  If performing full refreshes: whenever possible, eliminate duplicates at the source, before the load
  If performing updates: load in normal mode and let Informatica handle it, then run a post-process audit to correct rejected rows
  Alternative, for full refreshes or updates: be proactive and catch potential duplicates inside your mapping; this is the most expensive solution, so reserve it for critical tables
Duplicate Keys

Mapping solutions for full refresh
  1 - Relational source: duplicate rows are routed to an error table, and normal mapping processing continues for the rest
  2 - Flat file source: duplicate rows are destroyed, and the last row of each duplicate group is sent to the target
Audits

Create a versatile error table
  Should provide storage for all types of audit errors: primary key errors, data errors, row count errors, control totals errors
  Should be reused by all mappings that perform audits; not tied to one particular source or target table format
  One row per error, with all data values stored as strings
  Includes a concatenated value of the primary key fields to reference the source or target system
  Source and target names are fully qualified
  Identifies the process (mapping) that wrote rows to the table
Run an automated report after each load
  Total number of errors per error type
  List of sources and targets having triggered an error
Truncate the error table before each load
  If you need to keep data for several days, archive the daily table at the end of each load process

Example error table layout:

  Field name        Data type  Precision
  SourceName        String     32
  TargetName        String     32
  SourceFieldName   String     32
  SourceFieldValue  String     500
  TargetFieldname   String     32
  TargetFieldValue  String     500
  PrimaryKeyConcat  String     100
  ErrorType         String     16
  AuditProcess      String     64
  InsertDate        Date/time  19
Audits

Row by row audit
  For critical tables, when extracting data from production systems to the data warehouse
  The source must be unchanged or staged for the audit to work
  A Joiner is needed only when the source and target are in different databases, or the source is a flat or VSAM file
Workflow Manager
Assembling Workflows
  Step 1 - Design the flow, analyze dependencies and devise an error strategy
  Step 2 - Gather unit-tested mappings into reusable sessions
  Step 3 - Assemble load stages and/or subject areas into worklets
  Step 4 - Assemble worklets into a workflow and implement workflow-level extra functionality
  Step 5 - Implement trigger mechanisms and the scheduler
  Step 6 - Test the worklets, workflow, triggers and scheduler
Step 1 - Design: Large Workflows

Complex workflows handling complete loads offer more flexibility and functionality than a collection of small workflows
The new workflow manager interface makes it easy to design large and complex load processes
The workflow manager workspace window is self-documenting
The workflow monitor will give you a very good idea of the load's progress, using the Gantt chart view
Exception: if you have an external scheduler, you can either
  Run single-session workflows and let the scheduler handle Informatica job dependencies, or
  Run more complex workflows from your scheduler and let the workflows handle the job dependencies
Step 1 - Design: The Big Picture

Analyze your sources and targets to determine your dependencies at the mapping level
Devise triggering mechanisms to start the process when the sources are ready
Use a diagram to visualize the flow of data
Design your error strategy at the session level and workflow level
  What to do when a session fails
  Audit and balancing procedures to capture data discrepancies
Think about production issues
  Who will get notified when an error occurs, and by what means
  Design restartability into your workflows, worklets and mappings
Step 2 - Sessions: Reusable Sessions

After a session is unit-tested, move it to the project folder as a reusable object
Reusable sessions are easier to manage
  Easy to copy between folders
  Can be exported/imported as XML objects
  Easy to integrate into worklets and nested worklets
  All reusable sessions are conveniently listed in the Navigator, under the 'Sessions' folder
Step 2 - Sessions: Overriding Reusable Session Instances

Reusable sessions provide an additional level of override at the instance level, either in a workflow or a worklet
The instance override takes precedence over the master object
If you change a property in the master object and that property is overridden at the instance level, the override will still be in effect
You must use the Revert button to cancel an override, even if the value of the overridden property is the same as the value in the master object
Transformation properties cannot be overridden at the instance level
Example: when the target connection in an instance has been changed, click Revert to return to the connection specified in the original session object
Step 2 - Sessions: Error Strategies

Use a reusable session configuration to set the error threshold, the number of session logs to save, the log tracing level, and so on
  1 - Define common error handling behavior in a session configuration object
  2 - In the session, pick your reusable session config from the menu of available configurations
Step 2 - Sessions: Error Strategies (cont.)

Use reusable email objects in your session components to send pre-formatted messages on failure or success
  1 - Create an email task you can reuse for each session failure
  2 - Attach this email to your sessions as a reusable component
Step 2 - Sessions and Tasks: Error Strategies

Use the task general properties to implement error handling at the task level
Fail parent…
  Simply marks the enclosing workflow or worklet as failed
  Does not stop or abort the enclosing workflow or worklet
  Use when your scheduler relies on a failed status code to flag an error at the workflow level
Fail parent if this task fails
  Changes the end status of the enclosing workflow to failed when the task fails
Fail parent if this task does not run
  Changes the end status of the enclosing workflow to failed when the task does not run
  For instance, if a link condition between the previous task and this task evaluates to FALSE