7/31/2019 Xi3 Ds Performance Optimization Guide En
http://slidepdf.com/reader/full/xi3-ds-performance-optimization-guide-en 1/198
Data Services Performance
Optimization Guide
BusinessObjects Data Services XI 3.0 (12.0.0)
Copyright

© 2008 Business Objects. All rights reserved. Business Objects owns the following U.S. patents, which may cover products that are offered and licensed by Business Objects: 5,555,403; 5,857,205; 6,289,352; 6,247,008; 6,490,593; 6,578,027; 6,831,668; 6,768,986; 6,772,409; 6,882,998; 7,139,766; 7,299,419; 7,194,465; 7,222,130; 7,181,440 and 7,181,435. Business Objects and the Business Objects logo, BusinessObjects, Business Objects Crystal Vision, Business Process On Demand, BusinessQuery, Crystal Analysis, Crystal Applications, Crystal Decisions, Crystal Enterprise, Crystal Insider, Crystal Reports, Desktop Intelligence, Inxight, the Inxight Logo, LinguistX, Star Tree, Table Lens, ThingFinder, Timewall, Let there be light, Metify, NSite, Rapid Marts, RapidMarts, the Spectrum Design, Web Intelligence, Workmail and Xcelsius are trademarks or registered trademarks in the United States and/or other countries of Business Objects and/or affiliated companies. All other names mentioned herein may be trademarks of their respective owners.

Third-party Contributors

Business Objects products in this release may contain redistributions of software licensed from third-party contributors. Some of these individual components may also be available under alternative licenses. A partial listing of third-party contributors that have requested or permitted acknowledgments, as well as required notices, can be found at: http://www.businessobjects.com/thirdparty

2008-03-12
Contents

Chapter 1: Welcome to Data Services.........................................................9
    Welcome....................................................................................................10
    Documentation set for Data Services........................................................10
    Accessing documentation..........................................................................12
        Accessing documentation on Windows................................................13
        Accessing documentation on UNIX......................................................13
        Accessing documentation from the Web..............................................13
    Business Objects information resources...................................................14

Chapter 2: Environment Test Strategy.......................................................17
    The source OS and database server.........................................................18
        Operating system.................................................................................18
        Database..............................................................................................18
    The target OS and database server..........................................................19
        Operating system.................................................................................19
        Database..............................................................................................19
    The network...............................................................................................20
    Data Services Job Server OS and job options..........................................20
        Operating system.................................................................................20
        Data Services jobs...............................................................................21

Chapter 3: Measuring Data Services Performance...................................23
    Data Services processes and threads.......................................................24
        Data Services processes.....................................................................24
        Data Services threads..........................................................................25
    Measuring performance of Data Services jobs.........................................25
        Checking system utilization..................................................................26
        Analyzing log files for task duration......................................................30
        Reading the Monitor Log for execution statistics.................................31
        Reading the Performance Monitor for execution statistics...................32
        To view the Performance Monitor........................................................33
        Reading Operational Dashboards for execution statistics...................34
        To compare execution times for the same job over time.....................34

Chapter 4: Tuning Overview.......................................................................37
    Strategies to execute Data Services jobs.................................................38
        Maximizing push-down operations to the database server..................38
        Improving Data Services throughput....................................................39
        Using advanced Data Services tuning options.....................................40

Chapter 5: Maximizing Push-Down Operations........................................41
    Push-down operations...............................................................................42
        Full push-down operations...................................................................42
        Partial push-down operations...............................................................43
        Operations that cannot be pushed down.............................................44
    Push-down examples................................................................................45
        Scenario 1: Combine transforms to push down...................................45
        Scenario 2: Full push down from the source to the target....................46
        Scenario 3: Full push down for auto correct load to the target.............46
        Scenario 4: Partial push down to the source.......................................47
    To view SQL...............................................................................................47
    Data_Transfer transform for push-down operations..................................50
        Scenario 1: Push down an operation after a blocking operation..........50
        Scenario 2: Using Data_Transfer tables to speed up auto correct loads.....53
    Database link support for push-down operations across datastores.........57
        Software support..................................................................................58
        To take advantage of linked datastores................................................58
        Example of push-down with linked datastores.....................................59
        Generated SQL statements..................................................................60
        Tuning performance at the data flow or Job Server level.....................61

Chapter 6: Using Caches.............................................................................63
    Caching data..............................................................................................64
        Caching sources...................................................................................65
        Caching joins........................................................................................66
        To change the cache type for a data flow.............................................66
        Caching lookups...................................................................................67
        Caching table comparisons..................................................................68
        Specifying a pageable cache directory................................................69
    Using persistent cache..............................................................................69
        Using persistent cache tables as sources...........................................70
    Monitoring and tuning caches....................................................................70
        Using statistics for cache self-tuning...................................................71
        To have Data Services automatically choose the cache type..............71
        To monitor and tune in-memory and pageable caches........................71

Chapter 7: Using parallel Execution...........................................................77
    Parallel data flows and work flows.............................................................78
    Parallel execution in data flows.................................................................79
        Table partitioning..................................................................................79
        Degree of parallelism...........................................................................86
        Combining table partitioning and a degree of parallelism....................94
        File multi-threading...............................................................................96

Chapter 8: Distributing Data Flow Execution..........................................101
    Splitting a data flow into sub data flows...................................................102
        Run as a separate process option.....................................................103
        Examples of multiple processes for a data flow.................................104
        Data_Transfer transform....................................................................111
        Examples of multiple processes with Data_Transfer.........................111
    Using grid computing to distribute data flows execution..........................117
        Server Group......................................................................................117
        Distribution levels for data flow execution..........................................117

Chapter 9: Using Bulk Loading.................................................................123
    Bulk loading in Oracle..............................................................................124
        Bulk-loading methods.........................................................................124
        Bulk-loading modes............................................................................125
        Bulk-loading parallel-execution options..............................................125
        Bulk-loading scenarios.......................................................................126
        Using bulk-loading options.................................................................127
    Bulk loading in DB2 Universal Database.................................................129
        When to use each DB2 bulk-loading method.....................................130
        Using the DB2 CLI load method.........................................................132
        To configure your system to use the CLI load method.......................132
        To use the CLI load method in a Data Services job...........................132
        Using the DB2 bulk load utility............................................................133
        To configure your system to use the load utility.................................133
        To use the load utility in a Data Services job......................................136
        Using the import utility........................................................................137
    Bulk loading in Sybase ASE....................................................................138
    Bulk loading in Sybase IQ........................................................................138
    Bulk loading in Teradata..........................................................................139
        When to use each Teradata bulk load method...................................139
        How Data Services and Teradata use the file options to load............148
        Using the UPSERT bulk operation.....................................................150
        Warehouse Builder method................................................................151
        Teradata standalone utilities...............................................................157
    Bulk loading in Netezza...........................................................................177
        Netezza bulk-loading process............................................................177
        Options overview................................................................................178
        Netezza log files.................................................................................179
        Configuring bulk loading for Netezza.................................................179
        To configure the target table...............................................................180

Chapter 10: Other Tuning Techniques.....................................................183
    Source-based performance options........................................................185
        Join ordering.......................................................................................185
        Minimizing extracted data...................................................................188
        Using array fetch size.........................................................................189
        To set the Array fetch size parameter................................................189
    Target-based performance options..........................................................190
        Loading method.................................................................................190
        Rows per commit................................................................................191
    Job design performance options.............................................................192
        Loading only changed data................................................................192
        Minimizing data type conversion........................................................192
        Minimizing locale conversion..............................................................193
        Precision in operations.......................................................................193

Index............................................................................................................195
Chapter 1: Welcome to Data Services
Documentation set for Data Services

Document — What this document provides

Integrator's Guide — Information for third-party developers to access Data Services functionality

Management Console: Administrator Guide — Information about how to use Data Services Administrator

Management Console: Metadata Reports Guide — Information about how to use Data Services Metadata Reports

Migration Considerations — Release-specific product behavior changes from earlier versions of Data Services to the latest release

Migration Guide — Information about how to migrate from Data Quality to Data Services

Performance Optimization Guide — Information about how to improve the performance of Data Services

Reference Guide — Detailed reference material for Data Services Designer

Technical Manuals — A compiled "master" PDF of core Data Services books containing a searchable master table of contents and index:
• Installation Guide
• Getting Started Guide
• Designer Guide
• Reference Guide
• Management Console: Metadata Reports Guide
• Management Console: Administrator Guide
• Performance Optimization Guide
• Advanced Development Guide
• Supplement for J.D. Edwards
• Supplement for Oracle Applications
• Supplement for PeopleSoft
• Supplement for Siebel
• Supplement for SAP

Tutorial — A step-by-step introduction to using Data Services
In addition, you may need to refer to several Adapter Guides and Supplemental Guides.

Document — What this document provides

JMS Adapter Interface — Information about how to install, configure, and use the Data Services Adapter for JMS

Salesforce.com Adapter Interface — Information about how to install, configure, and use the Data Services Salesforce.com Adapter Interface

Supplement for J.D. Edwards — Information about license-controlled interfaces between Data Services and J.D. Edwards World and J.D. Edwards OneWorld

Supplement for Oracle Applications — Information about the license-controlled interface between Data Services and Oracle Applications

Supplement for PeopleSoft — Information about license-controlled interfaces between Data Services and PeopleSoft

Supplement for SAP — Information about license-controlled interfaces between Data Services, SAP ERP and R/3, and SAP BI/BW

Supplement for Siebel — Information about the license-controlled interface between Data Services and Siebel
Accessing documentation

You can access the complete documentation set for Data Services in several places.

Note: For the latest tips and tricks on Data Services, access our Knowledge Base on the Customer Support site at http://technicalsupport.businessobjects.com. We have posted valuable tips for getting the most out of your Data Services product.
Accessing documentation on Windows

After you install Data Services, you can access the documentation from the Start menu.

1. Choose Start > Programs > BusinessObjects XI 3.0 > BusinessObjects Data Services > Data Services Documentation.

Note: Only a subset of the documentation is available from the Start menu. The documentation set for this release is available in LINK_DIR\Doc\Books\en.

2. Click the appropriate shortcut for the document that you want to view.

Accessing documentation on UNIX

After you install Data Services, you can access the online documentation by going to the directory where the printable PDF files were installed.

1. Go to LINK_DIR/doc/book/en/.

2. Using Adobe Reader, open the PDF file of the document that you want to view.
Accessing documentation from the Web

You can access the complete documentation set for Data Services from the Business Objects Customer Support site.

1. Go to www.businessobjects.com.

2. From the "Support" pull-down menu, choose Documentation.

3. On the "Documentation" screen, choose Product Guides and navigate to the document that you want to view.

You can view the PDFs online or save them to your computer.
Business Objects information resources

Customer support, consulting, and training

A global network of Business Objects technology experts provides customer support, education, and consulting to ensure maximum business intelligence benefit to your business.

Useful addresses at a glance

Product information: http://www.businessobjects.com — Information about the full range of Business Objects products.

Product documentation: http://www.businessobjects.com/support — Business Objects product documentation, including the Business Objects Documentation Roadmap.

Documentation mailbox: documentation@businessobjects.com — Send us feedback or questions about your Business Objects documentation.

Online Customer Support: http://www.businessobjects.com/support — Information on Customer Support programs, as well as links to technical articles, downloads, and online forums.

Online Developer Community: http://diamond.businessobjects.com/ — An online resource for sharing and learning about Data Services with your developer colleagues.

Consulting services: http://www.businessobjects.com/services/consulting/ — Information about how Business Objects can help maximize your business intelligence investment.

Education services: http://www.businessobjects.com/services/training — Information on Business Objects training options and modules.
Online Customer Support

The Business Objects Customer Support web site contains information about Customer Support programs and services. It also has links to a wide range of technical information, including Knowledge Base articles, downloads, and support forums: http://www.businessobjects.com/support

Looking for training options?

From traditional classroom learning to targeted e-learning seminars, Business Objects can offer a training package to suit your learning needs and preferred learning style. Find more information on the Business Objects Education web site: http://www.businessobjects.com/services/training

Send us your feedback

Do you have a suggestion on how we can improve our documentation? Is there something that you particularly like or have found useful? Drop us a line, and we will do our best to ensure that your suggestion is considered for the next release of our documentation: documentation@businessobjects.com.

Note: If your issue concerns a Business Objects product and not the documentation, please contact our Customer Support experts.
Chapter 2: Environment Test Strategy
This section covers suggested methods of tuning source and target database applications, their operating systems, and the network used by your Data Services environment. It also introduces key Data Services job execution options.

This section contains the following topics:

• The source OS and database server on page 18
• The target OS and database server on page 19
• The network on page 20
• Data Services Job Server OS and job options on page 20

To test and tune Data Services jobs, work with all four of these components in the order shown above.

In addition to the information in this section, you can use your UNIX or Windows operating system and database server documentation for specific techniques, commands, and utilities that can help you measure and tune the Data Services environment.
The source OS and database server
Tune the source operating system and database to quickly read data from disks.
Operating system
Make input and output (I/O) operations as fast as possible. The read-ahead protocol, offered by most operating systems, can greatly improve performance. This protocol allows you to set the size of each I/O operation. Its default value is usually 4 to 8 kilobytes, which is too small. Set it to at least 64 KB on most platforms.
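As a rough illustration of why the per-call I/O size matters, the sketch below (plain Python, not an operating-system or Data Services setting) reads the same file with 4 KB and 64 KB requests and counts the calls involved:

```python
import os
import tempfile
import time

# Illustrative sketch only: times unbuffered file reads to show how a larger
# per-call I/O size (analogous to a larger read-ahead size) means fewer
# trips into the operating system. The file and sizes are arbitrary.
path = os.path.join(tempfile.mkdtemp(), "scratch.dat")
with open(path, "wb") as f:
    f.write(b"\0" * (8 * 1024 * 1024))  # 8 MB scratch file

def read_all(io_size):
    """Read the file in chunks of io_size bytes; return (seconds, call_count)."""
    calls = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered: one syscall per read
        while f.read(io_size):
            calls += 1
    return time.perf_counter() - start, calls

t_small, n_small = read_all(4 * 1024)    # 4 KB per call: many system calls
t_large, n_large = read_all(64 * 1024)   # 64 KB per call: far fewer calls
print(f"4 KB reads:  {n_small} calls in {t_small:.4f}s")
print(f"64 KB reads: {n_large} calls in {t_large:.4f}s")
```

Fewer, larger reads mean fewer calls into the operating system, which is the same effect a larger read-ahead size gives the database server.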
Database
Tune your database on the source side to perform SELECTs as quickly as possible.
In the database layer, you can improve the performance of SELECTs in several ways, such as the following:

• Create indexes on appropriate columns, based on your Data Services data flows.
• Increase the size of each I/O from the database server to match the OS read-ahead I/O size.
• Increase the size of the shared buffer to allow more data to be cached in the database server.
• Cache tables that are small enough to fit in the shared buffer. For example, if jobs access the same piece of data on a database server, cache that data. Caching data on database servers reduces the number of I/O operations and speeds up access to database tables.

See your database server documentation for more information about techniques, commands, and utilities that can help you measure and tune the source databases in your Data Services jobs.
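To illustrate the first point, the sketch below uses sqlite3 as a stand-in for the actual source database (the table and index names are hypothetical), showing how an index changes a SELECT's access path:

```python
import sqlite3

# Hypothetical sketch: sqlite3 stands in for the source database just to show
# the effect an index has on the query planner's access path; the same
# principle applies to the Oracle, DB2, or other sources your data flows read.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 100, i * 1.5) for i in range(10_000)],
)

def access_path(sql):
    """Return the query planner's description of how the SELECT will run."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer_id = 42"
before = access_path(query)   # full table scan: every row is read
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = access_path(query)    # index search: only matching rows are read
print("before:", before)
print("after: ", after)
```

The "before" plan scans the whole table; the "after" plan searches via the index, which is the behavior you want for the columns your data flows filter and join on.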
The target OS and database server
Tune the target operating system and database to quickly write data to disks.
Operating system
Make input and output operations as fast as possible. For example, asynchronous I/O, offered by most operating systems, can greatly improve performance. Turn on asynchronous I/O.
Database
Tune your database on the target side to perform INSERTs and UPDATEs as quickly as possible.

In the database layer, there are several ways to improve the performance of these operations.
Here are some examples from Oracle:
• Consider a design that will run one al_engine process per CPU at a time.
• Tune the Job Server OS so that Data Services threads spread to all available CPUs.

For more information, see Checking system utilization on page 26.
Data Services jobs

You can tune Data Services job execution options after:

• Tuning the database and operating system on the source and the target computers
• Adjusting the size of the network buffer
• Verifying that your data flow design is optimal

You can tune the following execution options to improve the performance of Data Services jobs:

• Monitor sample rate
• Collect statistics for optimization and Use collected statistics

Setting Monitor sample rate

During job execution, Data Services writes information to the monitor log file and updates job events after processing the number of rows specified in Monitor sample rate. The default value is 1000. Increase Monitor sample rate to reduce the number of calls to the operating system to write to the log file.

When setting Monitor sample rate, you must weigh the performance improvement gained by making fewer calls to the operating system against your ability to view detailed statistics during job execution. With a higher Monitor sample rate, Data Services collects more data before calling the operating system to open the file, and performance improves. However, with a higher monitor rate, more time passes before you can view statistics during job execution.
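The tradeoff can be sketched as follows; the function below is a hypothetical model of the sampling behavior, not Data Services code:

```python
# Hypothetical model of the Monitor sample rate tradeoff: statistics are
# flushed to the monitor log only every `sample_rate` rows, so a larger rate
# means fewer operating-system write calls but staler on-screen statistics.
def monitor_flushes(row_count, sample_rate):
    """Return how many monitor-log flushes processing `row_count` rows causes."""
    flushes = 0
    for row in range(1, row_count + 1):
        if row % sample_rate == 0:
            flushes += 1  # one call into the OS to update the monitor log
    return flushes

print(monitor_flushes(50_000, 1_000))   # 50 flushes at the default rate of 1000
print(monitor_flushes(50_000, 10_000))  # 5 flushes: cheaper, but stats lag behind
```

Raising the rate from 1000 to 10000 cuts the OS calls tenfold for the same job, at the cost of statistics that update ten times less often.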
Chapter 3: Measuring Data Services Performance
This section contains the following topics:

• Data Services processes and threads on page 24
• Measuring performance of Data Services jobs on page 25

Data Services processes and threads

Data Services uses processes and threads to execute jobs that extract data from sources, transform the data, and load data into a data warehouse. The number of concurrently executing processes and threads affects the performance of Data Services jobs.

Data Services processes

The processes Data Services uses to run jobs are:

• al_jobserver

The al_jobserver initiates one process for each Job Server configured on a computer. This process does not use much CPU power because it is only responsible for launching each job and monitoring the job's execution.

• al_engine

For batch jobs, an al_engine process runs when a job starts and for each of its data flows. Real-time jobs run as a single process.

The number of processes a batch job initiates also depends upon the number of:

• parallel work flows
• parallel data flows
• sub data flows

For an example of the Data Services monitor log that displays the processes, see Analyzing log files for task duration on page 30.
Data Services threads

A data flow typically initiates one al_engine process, which creates one thread per data flow object. A data flow object can be a source, transform, or target. For example, two sources, a query, and a target could initiate four threads.

If you are using parallel objects in data flows, the thread count will increase to approximately one thread for each source or target table partition. If you set the Degree of parallelism (DOP) option for your data flow to a value greater than one, the thread count per transform will increase. For example, a DOP of 5 allows five concurrent threads for a Query transform. To run objects within data flows in parallel, use the following Data Services features:

• Table partitioning
• File multi-threading
• Degree of parallelism for data flows
Related Topics
• Using parallel Execution on page 77
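A back-of-envelope way to reason about these counts (the formula below is an illustrative assumption, not the engine's exact accounting):

```python
# Hypothetical estimate of threads in one al_engine process: one thread per
# source and target object, and roughly `dop` concurrent threads per transform.
# This is an illustration of the counting described above, not engine internals.
def estimated_threads(sources, transforms, targets, dop=1):
    """Rough per-process thread estimate for a data flow."""
    return sources + targets + transforms * dop

# The text's example: two sources, a query, and a target -> four threads.
print(estimated_threads(sources=2, transforms=1, targets=1))
# Raising Degree of parallelism to 5 allows five threads for the Query transform.
print(estimated_threads(sources=2, transforms=1, targets=1, dop=5))
```

Estimates like this help predict whether a high DOP or heavy partitioning will push a process past the operating system's threads-per-process limits discussed later in this chapter.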
Measuring performance of Data Services jobs
You can use several techniques to measure performance of Data Services jobs:
• Checking system utilization on page 26
• Analyzing log files for task duration on page 30
• Reading the Monitor Log for execution statistics on page 31
• Reading the Performance Monitor for execution statistics on page 32
• Reading Operational Dashboards for execution statistics on page 34
For example, if you run a job and see that the CPU utilization is very high,you might decrease the DOP value or run less parallel jobs or data flows.Otherwise, CPU thrashing might occur.
For another example, if you run a job and see that only half a CPU is beingused, or if you run eight jobs on an eight-way computer and CPU usage isonly 50%, you can be interpret this CPU utilization in several ways:
• One interpretation might be that Data Services is able to push most of the processing down to source and/or target databases.
• Another interpretation might be that there are bottlenecks in the databaseserver or the network connection. Bottlenecks on database servers donot allow readers or loaders in jobs to use Job Server CPUs efficiently.
To determine bottlenecks, examine:
• Disk service time on database server computers
Disk service time typically should be below 15 milliseconds. Consultyour server documentation for methods of improving performance.For example, having a fast disk controller, moving database server log files to a raw device, and increasing log size could improve diskservice time.
• Number of threads per process allowed on each database server operating system. For example:
• On HP-UX, the number of kernel threads per process is configurable. The CPU-to-thread ratio defaults to one-to-one. Business Objects recommends setting the number of kernel threads per CPU to between 512 and 1024.
• On Solaris and AIX, the number of threads per process is not configurable. The number of threads per process depends on system resources. If a process terminates with a message like "Cannot create threads," you should consider tuning the job.
For example, use the Run as a separate process option to split a data flow, or use the Data_Transfer transform to create two sub data flows that execute sequentially. Since each sub data flow is executed by a different Data Services al_engine process, the number of threads needed for each will be 50% less than in your previous job design.
For example, if your data flow reads data from a table, joins it to a file, and then groups it to calculate an average, the group by operation might be occurring in memory. If you stage the data after the join and before the group by into a database on a different computer, then when a sub data flow reads the staged data and continues with the group processing, it can utilize memory from the database server on a different computer. This situation optimizes your system as a whole.
For information about how to stage your data, see Data_Transfer transform on page 111. For more information about distributing sub data flows to different computers, see Using grid computing to distribute data flow execution on page 117.
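A minimal sketch of the staging pattern described above, using Python's built-in sqlite3 as a stand-in for the staging database. The table names, column names, and sample data here are invented for illustration, not taken from the guide:

```python
import sqlite3

# In-memory database standing in for the staging database server.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
con.execute("CREATE TABLE stage_joined (order_id INTEGER, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 10.0), (1, 30.0), (2, 50.0)])

# A file source joined to the table (the join the text describes).
file_rows = {1: "retail", 2: "wholesale"}  # order_id -> channel, read from a file

# Step 1: stage the join result into the database...
con.executemany(
    "INSERT INTO stage_joined SELECT order_id, amount FROM orders WHERE order_id = ?",
    [(oid,) for oid in file_rows])

# Step 2: ...so the group-by (average) runs against the staged table,
# using the staging server's memory rather than the engine's.
avg_rows = con.execute(
    "SELECT order_id, AVG(amount) FROM stage_joined GROUP BY order_id").fetchall()
print(sorted(avg_rows))  # [(1, 20.0), (2, 50.0)]
```

The point of the split is that each stage can run where memory is available; the sub data flow that does the group-by only ever sees the staged table.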
• A large amount of memory is available but under-utilized.
In this case, you might cache more data. Caching data can improve the performance of data transformations because it reduces the number of times the system must access the database.
Data Services provides two types of caches: in-memory and pageable. For more information, see Caching data on page 64.
• Paging occurs.
Pageable cache is the default cache type for data flows. On Windows and Linux, the virtual memory available to the al_engine process is 1.5 gigabytes (500 megabytes of virtual memory is reserved for other engine operations, totaling 2 GB). On UNIX, Data Services limits the virtual memory for the al_engine process to 3.5 gigabytes (500 MB is reserved for other engine operations, totaling 4 GB). If more memory is needed than these virtual memory limits, Data Services starts paging to continue executing the data flow.
If your job or data flow requires more memory than these limits, you can take advantage of one of the following Data Services features to avoid paging:

• Split the data flow into sub data flows that can each use the amount of memory set by the virtual memory limits.

Each data flow or each memory-intensive operation within a data flow can run as a separate process that uses separate memory from the others to improve performance and throughput. For more information, see Splitting a data flow into sub data flows on page 102.
• Push memory-intensive operations down to the database server so that less memory is used on the Job Server computer. For more information, see Push-down operations on page 42.
Analyzing log files for task duration
The trace log shows the progress of an execution through each component (object) of a job. The following sample trace log shows a separate Process ID (Pid) for the job, the data flow, and each of the two sub data flows.
Note: Memory statistics (Cache Size column) display in the Performance Monitor only if you select the Collect statistics for monitoring option when you execute the job.
The following sample Performance Monitor shows the following information:

• The Query_Lookup transform used 110 kilobytes of memory.
• The first sub data flow processed 830 rows, and the second sub data flow processed 35 rows.
To view the Performance Monitor
1. Access the Management Console with one of the following methods:
• In the Designer top menu bar, click Tools and select Data Services Management Console.
• Click Start > Programs > BusinessObjects Data Services > Management Console.
2. On the launch page, click Administrator .
3. Select Batch > repository.
4. On the Batch Job Status page, find a job execution instance.
5. Under Job Information for an instance, click Performance Monitor.
For SQL sources and targets, Data Services creates database-specific SQL statements based on the data flow diagrams in a job. Data Services generates SQL SELECT statements to retrieve the data from source databases. To optimize performance, Data Services pushes down as many SELECT operations as possible to the source database and combines as many operations as possible into one request to the database. Data Services can push down SELECT operations such as joins, Group By, and common functions such as decode and string functions.
Data flow design influences the number of operations that Data Services can push to the database. Before running a job, you can view the SQL that Data Services generates and adjust your design to maximize the SQL that is pushed down to improve performance.
You can use database links and the Data_Transfer transform to push down more operations.
This section discusses:
• Push-down operations
• Push-down examples
• Viewing SQL
• Data_Transfer transform for push-down operations
• Database link support for push-down operations across datastores
Push-down operations
By pushing down operations to the source database, Data Services reduces the number of rows and operations that the engine must retrieve and process, which improves performance. When determining which operations to push to the database, Data Services examines the database and its environment.
Full push-down operations
The Data Services optimizer always first tries to do a full push-down operation. A full push-down operation is when all Data Services transform operations can be pushed down to the databases and the data streams directly from the source database to the target database. Data Services sends SQL INSERT INTO... SELECT statements to the target database, where the SELECT retrieves data from the source.
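The shape of a full push-down is a single INSERT INTO ... SELECT statement executed entirely inside the database, as the guide notes elsewhere. A minimal sketch with sqlite3, using invented table names and data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (a INTEGER, b INTEGER)")
con.execute("CREATE TABLE tgt (a INTEGER, b INTEGER)")
con.executemany("INSERT INTO src VALUES (?, ?)", [(1, 2), (150, 3)])

# One statement: the data streams from source to target inside the
# database and never passes through the engine row by row.
con.execute("INSERT INTO tgt SELECT a, b FROM src WHERE a > 100")
rows = con.execute("SELECT a, b FROM tgt").fetchall()
print(rows)  # [(150, 3)]
```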
Push-down examples
The following are typical push-down scenarios.
Scenario 1: Combine transforms to push down
When determining how to push operations to the database, Data Services first collapses all the transforms into the minimum set of transformations expressed in terms of the source table columns. Next, Data Services pushes all possible operations on tables of the same database down to that DBMS.
For example, the following data flow extracts rows from a single source table.
The first query selects only the rows in the source where column A contains a value greater than 100. The second query refines the extraction further, reducing the number of columns returned and further reducing the qualifying rows.
Data Services collapses the two queries into a single command for the DBMS to execute. The following command uses AND to combine the WHERE clauses from the two queries:

SELECT A, MAX(B), C
FROM source
WHERE A > 100 AND B = C
GROUP BY A, C
Data Services can push down all the operations in this SELECT statement to the source DBMS.
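The collapsed statement above can be exercised directly; here it runs against sqlite3 to show the single round trip the combined query makes (the sample rows are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (A INTEGER, B INTEGER, C INTEGER)")
con.executemany("INSERT INTO source VALUES (?, ?, ?)",
                [(150, 5, 5), (150, 7, 7), (50, 1, 1), (200, 3, 4)])

# Both queries' conditions reach the database as one statement.
sql = "SELECT A, MAX(B), C FROM source WHERE A > 100 AND B = C GROUP BY A, C"
rows = sorted(con.execute(sql).fetchall())
print(rows)  # [(150, 5, 5), (150, 7, 7)]
```

The row (50, 1, 1) fails A > 100 and (200, 3, 4) fails B = C, so only the two qualifying groups come back.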
Data Services only shows the SELECT generated for table sources and INSERT INTO... SELECT for targets. Data Services does not show the SQL generated for SQL sources that are not table sources, such as:
• Lookup function
• Key_generation function
• Key_Generation transform
• Table_Comparison transform
4. Select a name from the list of datastores on the left to view the SQL that this data flow applies against the corresponding database or application.
The following example shows the optimized SQL for the second datastore, which illustrates a full push-down operation (INSERT INTO... SELECT). This data flow uses a Data_Transfer transform to create a transfer table that Data Services loads directly into the target. For more information, see "Data_Transfer transform for push-down operations".
3. On the General tab of the Properties window, select Pageable in the drop-down list for the Cache type option.
Caching lookups
You can also improve performance by caching data when looking up individual values from tables and files.
There are two methods of looking up data:
• Using a Lookup function in a query on page 67
• Using a source table and setting it as the outer join on page 68
Using a Lookup function in a query
Data Services has three lookup functions: lookup, lookup_seq, and lookup_ext. The lookup and lookup_ext functions have cache options. Caching lookup sources improves performance because Data Services avoids the expensive task of creating a database query or full file scan on each row.
You can set cache options when you specify a lookup function. There are three caching options:
• NO_CACHE — Does not cache any values.
• PRE_LOAD_CACHE — Preloads the result column and compare column into memory (it loads the values before executing the lookup).
• DEMAND_LOAD_CACHE — Loads the result column and compare column into memory as the function executes.
Use this option when looking up highly repetitive values that are a small subset of the data and when missing values are unlikely.

Demand-load caching of lookup values is helpful when the lookup result is the same value multiple times. Each time Data Services cannot find the value in the cache, it must make a new request to the database for that value. Even if the value is invalid, Data Services has no way of knowing if it is missing or just has not been cached yet.
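The behavior described here can be sketched in a few lines: found values are cached on first use, but a missing key can never be distinguished from a not-yet-cached key, so it triggers a database request on every row. The lookup table and keys below are invented:

```python
# Invented lookup source; stands in for the database table.
lookup_table = {"US": "United States", "DE": "Germany"}
cache = {}
db_calls = 0

def demand_load_lookup(key):
    global db_calls
    if key in cache:
        return cache[key]
    db_calls += 1                  # not cached yet, so ask the "database"
    value = lookup_table.get(key)
    if value is not None:
        cache[key] = value         # only found values are cached, so a missing
                                   # key is re-queried on every row
    return value

results = [demand_load_lookup(k) for k in ["US", "US", "XX", "XX", "DE"]]
print(results, db_calls)
# ['United States', 'United States', None, None, 'Germany'] 4
```

Note that the repeated "US" costs one database call, while the missing "XX" costs one call per occurrence, which is why demand-load caching pays off only when misses are rare.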
At runtime, Data Services translates this data flow to:
The source threads execute in parallel, and the Case, Merge, and targets execute in parallel.
Viewing, creating, and enabling table partitions
Oracle databases support range, list, and hash partitioning. You can import this information as table metadata and use it to extract data in parallel. You can use range and list partitions to load data to Oracle targets. You can also specify logical range and list partitions using Data Services metadata for Oracle tables.

In addition, Data Services provides the ability to specify logical range partitions for DB2, Microsoft SQL Server, Sybase ASE, and Sybase IQ tables by modifying imported table metadata.

Data Services uses partition information by instantiating a thread at runtime for each partition. These threads execute in parallel. To maximize performance benefits, use a multi-processor environment.
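Conceptually, one reader thread per partition looks like the following sketch, using Python threads over an invented three-partition table (a real reader would issue a ranged SELECT per partition):

```python
from concurrent.futures import ThreadPoolExecutor

# Invented stand-in for a table imported with three range partitions.
partitions = {
    "p_low":  [10, 20],
    "p_mid":  [30, 40, 50],
    "p_high": [60],
}

def read_partition(name):
    """One reader per partition; returns the partition name and row count."""
    return name, len(partitions[name])

# One thread is instantiated per partition and the reads run in parallel.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    counts = dict(pool.map(read_partition, partitions))
print(counts)  # {'p_low': 2, 'p_mid': 3, 'p_high': 1}
```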
To view partition information
1. Import a table into Data Services.
2. In the Datastores tab of the object library, right-click the table name and select Properties.
3. Click the Partitions tab.
When you import partitioned tables from your database, you will find these partitions displayed on the Partitions tab of the table's Properties window.
• soundex()
• is_group_changed()
• sql()
• key_generation()
• sum()
• mail_to()
• total_rows()
• max()
To enable stored procedures to run in parallel
Use the Enable parallel execution option to set functions to run in parallel when the transforms in which they are used execute in parallel.
1. In the Datastores tab of the object library, expand a Datastore node.
2. Expand its Function node.
3. Right-click a function and select Properties.
4. In the Properties window, click the Function tab.
5. Click the Enable Parallel Execution check box.
6. Click OK.
To enable custom functions to run in parallel
1. In the Custom Functions tab of the object library, right-click a function name and select Properties.
2. In the Properties window, click the Function tab.
3. Click the Enable Parallel Execution check box.
4. Click OK.
Tips
DOP can degrade performance if you do not use it judiciously. The best value to choose depends on the complexity of the flow and the number of CPUs available. For example, on a computer with four CPUs, setting a DOP greater
general rule. This produces a data flow without the Merge Round Robin Split, and each partition pipes the data directly into the consuming transform.
Two source partitions, DOP of two, and two target partitions
The best case situation is when the following conditions exist:

• The source and target are partitioned the same way.
• The source and target have the same number of partitions.
• DOP is equal to the number of partitions.

When a source has two partitions, it replicates twice. Because the DOP value is two, the query transform replicates twice. When a target has two partitions, it replicates twice. The following figure shows that each source partition feeds directly into a replicated query transform, and the output from each query feeds directly into a replicated target.
File multi-threading
You can set the number of threads Data Services uses to process flat file sources and targets. The Parallel process threads option is available on the:
• File format editor
• Source file editor
• Target file editor
• Properties window of an SAP ERP or R/3 data flow.
Without multi-threading:

• With delimited file reading, Data Services reads a block of data from the file system and then scans each character to determine if it is a column
Run as a separate process option
If your data flow contains multiple resource-intensive operations, you can run each operation as a separate process (sub data flow) that uses separate resources (memory and computer) from the others to improve performance and throughput. When you specify multiple Run as a separate process options on objects in a data flow, Data Services splits the data flow into sub data flows that run in parallel.

The Run as a separate process option is available on resource-intensive operations, including the following:
• Hierarchy_Flattening transform
• Associate transform
• Country ID transform
• Global Address Cleanse transform
• Global Suggestion Lists transform
• Match transform
• United States Regulatory Address Cleanse transform
• User-Defined transform
• Query operations that are CPU-intensive and memory-intensive:
• Join
• GROUP BY
• ORDER BY
• DISTINCT
• Table_Comparison transform
• Lookup_ext function
• Count_distinct function
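The effect of running an operation as a separate process can be pictured with Python's multiprocessing: the memory-hungry step executes in a child process with its own address space. The group-by function and sample rows below are invented for illustration:

```python
from multiprocessing import Pool

def heavy_group_by(rows):
    """The memory-intensive step, isolated in its own process (own memory)."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals

if __name__ == "__main__":
    rows = [("a", 1), ("b", 2), ("a", 3)]
    # One child process plays the role of the sub data flow.
    with Pool(processes=1) as pool:
        result = pool.apply(heavy_group_by, (rows,))
    print(result)  # {'a': 4, 'b': 2}
```

As with the al_engine sub data flows the guide describes, the parent only exchanges the input rows and the result; the working memory of the group-by lives and dies with the child process.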
Scenario 2: Run multiple sub data flows with DOP greater than 1

When Degree Of Parallelism (DOP) is set to a value greater than 1, each transform defined in the data flow replicates for use on a parallel subset of data. For more information, see Degree of parallelism on page 86.
Set DOP to a value greater than 1 on the data flow Properties window.
4. After you save the data flow and click Validation > Display Optimized SQL..., the Optimized SQL window shows that the join between the transfer table and source Orders table is pushed down to the database.

Data Services can push down many operations without using the Data_Transfer transform. For more information, see Push-down operations on page 42.
5. When you execute the job, the Trace Log shows messages such as the following that indicate that Data Services created two sub data flows with
Data Services supports capabilities present in Oracle, Microsoft SQL Server, Informix, DB2, Netezza, Sybase ASE, Sybase IQ, and Teradata that enable you to load data in bulk rather than using SQL statements. Some general considerations when using bulk loading are:

• You specify bulk-loading options on the Data Services target table editor in the Bulk Loader Options tab.
• If you are using bulk loading, parallel loading is available only in the Oracle and Informix environments.
• When you use a template table, bulk loading is disabled.
Related Topics
• Reference Guide: Target
Bulk loading in Oracle
Data Services supports Oracle bulk loading.
Bulk-loading methods
You can bulk load to Oracle using an API or a staging file:

• If you select the API method, Data Services accesses the direct path engine of Oracle's database server associated with the target table and connected to the target database. Using Oracle's Direct-Path Load API,
Note: The Enable partitioning check box does not appear on the Options tab if the target table is not partitioned.
Bulk-loading scenarios

With two bulk-loading methods, two load modes, and two parallel load options, there are several scenarios you can configure:

Scenario   Method   Load Mode      Parallel Load Options
1          API      Direct-path    Enable partitions is not selected (one loader is used by default)
2          API      Direct-path    Enable partitions is selected
3          File     Direct-path    Number of loaders = 1
4          File     Direct-path    Number of loaders > 1
5          File     Direct-path    Enable partitions is selected
6          File     Conventional   Number of loaders = 1
7          File     Conventional   Number of loaders > 1
8          File     Conventional   Enable partitions is selected
Here are some tips for using these scenarios:
• The API method always uses the direct-path load type, and when it is used with a partitioned target table, Data Services processes loads in parallel. Data Services instantiates multiple loaders based on the number of partitions in a table. Each loader receives rows that meet the conditions specified by the partition.

• With the File method, direct-path is faster than conventional load, but the File method is slower than using an API because of the need to generate a staging file and logs, and to invoke Oracle's SQL*Loader.

• With the File method, when you use a value greater than one for either the Number of loaders or the Enable partitioning option, loads cannot truly run in parallel. The creation of a staging file and log for each loader is serialized.
Using bulk-loading options
As seen in the table above, there are many ways to set up bulk loading for an Oracle database. The following sections describe two scenarios in detail.
Direct-path loads using Number of Loaders and File method
In the Options tab of the target table editor, when you enter a value for Number of loaders, Data Services instantiates multiple loaders. Each loader receives rows equal to the amount specified in the Rows per commit box on the Bulk Loader Options tab. The loaders pipe rows to a staging file, then call SQL*Loader to load the staging file contents into the table.
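The loader fan-out described above can be sketched as follows. This is an illustrative simplification under an assumed round-robin dispatch, which the guide does not specify; the real scheduling is internal to Data Services:

```python
# Hypothetical sketch: split rows into commit-sized batches ("Rows per
# commit") and hand whole batches to N loaders round-robin. Each loader
# would then pipe its rows to a staging file and invoke SQL*Loader.
def distribute(rows, num_loaders, rows_per_commit):
    batches = [rows[i:i + rows_per_commit]
               for i in range(0, len(rows), rows_per_commit)]
    loaders = [[] for _ in range(num_loaders)]
    for n, batch in enumerate(batches):
        loaders[n % num_loaders].append(batch)  # whole batches, never split
    return loaders

loaders = distribute(list(range(10)), num_loaders=2, rows_per_commit=3)
# loaders[0] == [[0, 1, 2], [6, 7, 8]], loaders[1] == [[3, 4, 5], [9]]
```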
• Copy target directory — Defines the directory of the copy files when you both enable database forward log recovery and select the Recoverable option. Data Services supports only the copy files option for the DB2 CLI load method.
• Data file on client computer — When you select this option, Data Services writes to the data file on the DB2 client computer. Data Services does not need to FTP the data file because the DB2 client transfers the data directly to the DB2 server during the load process. To use this option:
  • You must use DB2 version 8.x or later.
  • The target DB2 cannot be a DB2 Enterprise - Extended Edition environment.
  • The target table and database must not be partitioned.
  • This option is only applicable if Data Services and DB2 are on different servers.
When you execute the DB2 bulk load utility, DB2 automatically generates the following files:
• Local message file (named .log) in the bulk loader working directory. DB2 writes output messages into this log file.
• "Dump" file (named .bad) in the DB2 server working directory. DB2 writes rejected input rows into this .bad file. If you clear the Data file on client computer option, Data Services uses FTP to send the .bad file to the bulk loader working directory and deletes it after the load completes successfully.
Check the trace log to find either of these files.
Related Topics
• Reference Guide: Data Services Objects, Target tables (Teradata target table options)
Using the import utility
Data Services also supports bulk loading in the DB2 Universal Database 5.2 environment using the import utility. For Data Services to initiate DB2 bulk loading by this method directly, the Data Services Job Server and DB2 must be located on the same system. If they are not, use the following procedure to initiate bulk loading:
1. Generate a control file and data file. Check Generate files only in the target table editor on the Bulk Loader Options tab.
2. Manually move the control file and data file to the system where the target database is located.
3. Start the execution of the bulk loader manually.
Bulk loading in Sybase ASE
Data Services supports bulk loading of Sybase ASE databases through the Sybase ASE bulk copy utility. For detailed information about the Sybase ASE bulk loader options and their behavior in the Sybase ASE DBMS environment, see the relevant Sybase ASE product documentation.
Bulk loading in Sybase IQ
Data Services supports bulk loading Sybase IQ databases via the Sybase IQ LOAD TABLE SQL command. For detailed information about the Sybase IQ LOAD TABLE parameters and their behavior in the Sybase IQ database environment, see the relevant Sybase IQ product documentation.
For Sybase IQ version 12.6 only, if the job server and Sybase IQ database server are on the same computer, you can use named pipes for bulk loading instead of files. Using named pipes eliminates disk access time and usually improves the performance of the bulk loader.
If the job server and Sybase IQ database server are not on the same computer, Data Services generates a data file and transfers it via FTP to the database computer for loading. You set FTP parameters in the Sybase IQ datastore editor.
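The advantage of a named pipe over an intermediate file is that the consumer can read rows as soon as the producer writes them, with no round trip through a data file on disk. A minimal POSIX sketch (Unix-only; the writer thread stands in for Data Services and the reader stands in for the database's load process; the pipe name and rows are invented):

```python
import os
import tempfile
import threading

def produce(pipe_path, rows):
    # Stands in for Data Services streaming rows into the pipe.
    with open(pipe_path, "w") as pipe:
        for row in rows:
            pipe.write(row + "\n")

tmpdir = tempfile.mkdtemp()
pipe_path = os.path.join(tmpdir, "loader.pipe")
os.mkfifo(pipe_path)  # named pipe: rows never land in an intermediate data file

writer = threading.Thread(target=produce, args=(pipe_path, ["r1", "r2", "r3"]))
writer.start()
with open(pipe_path) as reader:  # stands in for the bulk loader's read side
    loaded = [line.rstrip("\n") for line in reader]
writer.join()
```

Opening the pipe for writing blocks until a reader attaches, so the two sides naturally synchronize; the reader sees end-of-file when the writer closes its end.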
After a job executes, Data Services stores the Sybase IQ message and row logs in the Bulk loader directory specified in the datastore editor (regardless of the setting for the JS and DB on same machine option). A data file will also be present if you do not use the named pipe option.
• message log — Records constraint violations specified in the Error handling section of the target table Bulk Loader Options tab.
Bulk loader method: Load utility
File Option: File — Loads data by writing to a data file that it passes to the Teradata server.
Advantages:
• Load utilities are faster than INSERT statements through the ODBC driver.
Restrictions:
• User must provide the loading script.
• Cannot use Data Services parallel processing.

Bulk loader method: Load utility
File Option: Generic named pipe — Loads a large volume of data by writing to a pipe from which Teradata reads.
Advantages:
• Load utilities are faster than INSERT statements through the ODBC driver.
• Named pipes are faster than data files because:
  • As soon as Data Services writes to a pipe, Teradata can read from the pipe.
  • On Windows, no I/O to an intermediate data file occurs because a pipe is in memory.
Restrictions:
• User must provide the loading script.
• Cannot use Data Services parallel processing.
• A job that uses a generic pipe is not restartable.
(Bulk loader method table, continued)
Advantages:
• Load utilities are faster than INSERT or UPSERT statements through the ODBC driver.
• Named pipes should be faster than data files because:
  • As soon as Data Services writes to a pipe, Teradata can read from the pipe.
  • On Windows, no I/O to an intermediate data file occurs because a pipe is in memory.
• The job is restartable. For details, see "Automatically recovering jobs" in the Data Services Designer Guide.
• Data Services creates the
When you use the Warehouse Builder method, you can leverage Data Services' powerful parallel processing capabilities to specify a maximum number of files or pipes for Data Services to use in processing large quantities of data. For details, see Parallel processing with Teradata Warehouse Builder on page 154.
With Teradata Warehouse Builder, you can choose from four types of load operators. Each operator processes your data in a slightly different way.
• Load: Loads a large amount of data at high speed into an empty table on the Teradata RDBMS. Use this operator when initially loading tables in the data warehouse.
• SQL Inserter: Inserts data into a specified existing table on the Teradata RDBMS. A single SQL session can insert data, while other operators require multiple sessions. Because it uses the fewest RDBMS resources, this is the least intrusive and slowest method of loading data.
To use Warehouse Builder bulk loading with Data Services parallel processing
1. In your target table Options tab, specify the Number of loaders to control the number of data files or named pipes. Data Services will write data to these files in batches of 999 rows.
2. In the Bulk Loader Options tab, choose Warehouse Builder as your bulk loader.
3. In File Option, choose the type of file (Data File, Generic named pipe, or Named pipe access module) to contain the data to bulk load.
4. If you chose Data File or Generic named pipe in File Option, specify the number of read and load instances in the loading scripts.
Teradata standalone utilities
In addition to the Warehouse Builder (Parallel Transporter) interface, Data Services supports several Teradata utilities that load to and extract from the Teradata database. Each load utility is a separate executable designed to move data into a Teradata database. Choose from the following bulk loader utilities:
• MultiLoad (page 158): Loads large quantities of data into populated tables. MultiLoad also supports bulk inserts, updates, upserts, and deletions against populated tables.
• FastLoad (page 162): Loads unpopulated tables only. Both the client and server environments support FastLoad. Provides a high-performance load (inserts only) to one empty table each session.
• TPump (page 165): Uses standard SQL/DML to maintain data in tables. It also contains a method that you can use to specify the percentage of system resources necessary for operations on tables. Allows background maintenance for insert, update, upsert, and delete operations to take place at any time you specify. Used with small data volumes.
• Load (page 177): Invokes one of the above utilities (MultiLoad, FastLoad, or TPump) with the interface prior to Data Services version 11.5.1.
MultiLoad
To bulk load a table using the Teradata MultiLoad utility
1. Ensure that your Teradata datastore specifies a value in TdpId (Teradata Director Program Identifier). This option identifies the name of the Teradata database to load and is mandatory for bulk loading.
2. In the Bulk Loader Options tab of your target table editor, choose MultiLoad in the Bulk loader drop-down list.
3. In File Option, choose the type of file (Data File, Generic named pipe, or Named pipe access module) to contain the data to bulk load. The default is Data File.
4. You can specify the following MultiLoad parameters:
• Reduced print output: The default is not to reduce print output.
• Data Encryption: The default is not to encrypt all sessions.
• Character set: Particular mapping between characters and byte strings (such as ASCII or UTF-8).
For more information about these parameters, see the Teradata MultiLoad Reference.
5. In Attributes, you can usually use the default settings for the following attributes in the MultiLoad script that Data Services generates.
• LogTable: Table in which Teradata stores the load job status. Specify the restart log table that will maintain the checkpoint information for your MultiLoad job.
• AccountId: Identifier, of up to 30 characters, associated with the user name that will log on to the Teradata database.
• WorkTable: Teradata uses this table to stage input data.
• ErrorTable1: Teradata uses this table to store errors that it detects during the acquisition phase of the MultiLoad import task.
• ErrorTable2: Teradata uses this table to store errors that it detects during the application phase of the MultiLoad import task.
• ErrorLimit: Maximum number of rejected records that Teradata can write to error table 1 during the acquisition phase of the MultiLoad import task. If used with ErrorPercentage, ErrorLimit specifies the number of records that must be sent to the Teradata database before ErrorPercentage takes effect.
• ErrorPercentage: Approximate percentage (expressed as an integer) of total records sent so far (ErrorLimit) to the Teradata database that the acquisition phase might reject.
• CheckpointRate: Interval between checkpoint operations during the acquisition phase. Express this value as either the number of rows read from your client system or sent to the Teradata database, or an amount of time in minutes.
• MaxSessions: Maximum number of MultiLoad sessions for the load job.
• MinSessions: Minimum number of MultiLoad sessions required for the load job to continue.
• TenacityHours: Number of hours that the MultiLoad utility continues trying to log on when the maximum number of load jobs are already running on the Teradata database.
• TenacitySleep: Number of minutes that the MultiLoad utility waits before it retries a logon operation. The default is six minutes.
• TableWait: Number of hours that MultiLoad continues trying to start when one of the target tables is being loaded by some other job.
• AmpCheck: Specifies how MultiLoad should respond when an Access Module Processor (AMP) is down.
• IgnoreDuplicate: Select IgnoreDuplicate to not place duplicate rows in error table 2. The default is to load the duplicate rows.
Note: By default, Data Services uses the bulk loader directory to store the script, data, error, log, and command (bat) files.
For more information about these parameters, see the Teradata MultiLoad Reference.
6. If you specified Data file in File Option, you can increase the Number of loaders in the Options tab, which increases the number of data files. Data Services can use parallel processing to write data to multiple data files in batches of 999 rows.
If you specified Generic named pipe or Named pipe access module, Data Services supports only one loader and disables the Number of loaders option.
Related Topics
• Reference Guide: Data Services Objects, DBC driver manager for UNIX
• Reference Guide: Data Services Objects, Target tables (Teradata target table options)
FastLoad
To bulk load a table using the Teradata FastLoad utility
1. Ensure that your Teradata datastore specifies a value in TdpId (Teradata Director Program Identifier). This option identifies the name of the Teradata database to load and is mandatory for bulk loading.
2. In the Bulk Loader Options tab of your target table editor, choose FastLoad in the Bulk loader drop-down list.
3. In File Option, choose the type of file (Data File, Generic named pipe, or Named pipe access module) to contain the data to bulk load.
4. You can specify the following FastLoad parameters:
• Data Encryption: Encrypt data and requests in all sessions used by the job. The default is not to encrypt all sessions.
• Print all requests: Prints every request sent to the Teradata database. The default is not to reduce print output.
• Buffer size: Number of kilobytes for the output buffer that FastLoad uses for messages to the Teradata database. The default is 63 kilobytes, which is also the maximum size.
• Character set: Particular mapping between characters and byte strings (such as ASCII or UTF-8).
For more information about these parameters, see the Teradata FastLoad Reference.
5. In Attributes, you can usually use the default settings for the following attributes in the FastLoad script that Data Services generates.
• AccountId: Identifier, of up to 30 characters, associated with the user name that will log on to the Teradata database.
• CheckpointRate: The number of rows sent to the Teradata database between checkpoint operations. The default is not to checkpoint.
• ErrorLimit: Maximum number of rejected records that Teradata can write to error table 1 while inserting into a FastLoad table.
• ErrorTable1: FastLoad uses this table to store records that were rejected for errors other than unique primary index or duplicate row violation.
• ErrorTable2: FastLoad uses this table to store records that violated the unique primary index constraint.
• MaxSessions: Maximum number of FastLoad sessions for the load job.
• MinSessions: Minimum number of FastLoad sessions required for the load job to continue.
• TenacityHours: Number of hours that the FastLoad utility continues trying to log on when the maximum number of load jobs are already running on the Teradata database.
• TenacitySleep: Number of minutes that the FastLoad utility waits before it retries a logon operation. The default is six minutes.
Note: By default, Data Services uses the bulk loader directory to store the script, data, error, log, and command (bat) files.
For more information about these parameters, see the Teradata FastLoad Reference.
6. If you specified Data file in File Option, you can increase the Number of loaders in the Options tab, which increases the number of data files. Data Services can use parallel processing to write data to multiple data files in batches of 999 rows.
If you specified Generic named pipe or Named pipe access module, Data Services supports only one loader and disables the Number of loaders option.
Related Topics
• Reference Guide: Data Services Objects, DBC driver manager for UNIX
TPump
To bulk load a table using the Teradata TPump utility
1. Ensure that your Teradata datastore specifies a value in TdpId (Teradata Director Program Identifier). This option identifies the name of the Teradata database to load and is mandatory for bulk loading.
2. In the Bulk Loader Options tab of your target table editor, choose TPump in the Bulk loader drop-down list.
3. In File Option, choose the type of file (Data File, Generic named pipe, or Named pipe access module) to contain the data to bulk load.
4. You can specify the following TPump parameters:
• Reduced print output: Reduce the print output of TPump to the minimal information required to determine the success of the job. The default is not to reduce print output.
• Retain Macros: Keep macros that were created during the job run. You can use these macros as predefined macros for subsequent runs of the same job.
• Data Encryption: Encrypt data and requests in all sessions used by the job. The default is not to encrypt all sessions.
• Number of buffers: Number of request buffers that TPump uses for SQL statements to maintain the Teradata database.
• Character set: Particular mapping between characters and byte strings (such as ASCII or UTF-8).
• Configuration file: Configuration file for the TPump job.
• Periodicity value: Controls the rate at which TPump transfers SQL statements to the Teradata database. Value can be between 1 and 600, which specifies the number of periods per minute. The default value is 4 (15-second periods per minute).
• Print all requests: Turns on verbose mode, which provides additional statistical data in addition to the regular statistics.
For more information about these parameters, see the Teradata Parallel Data Pump Reference.
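As a worked example of the Periodicity value described above: the value divides each minute into equal periods across which TPump spreads its statement rate. The arithmetic follows directly from the option's description (4 periods per minute means 15-second periods); the helper function below is illustrative, not part of Data Services:

```python
# Illustrative: a periodicity of N splits each minute into N equal periods.
def period_length_seconds(periodicity):
    if not 1 <= periodicity <= 600:
        raise ValueError("periodicity must be between 1 and 600")
    return 60.0 / periodicity

default_length = period_length_seconds(4)  # the default: 4 periods per minute
# default_length == 15.0 seconds per period
```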
5. In Attributes, you specify Data Services parameters that correspond to Teradata parameters in TPump commands. You can usually use the default settings for the following parameters (listed with the TPump command each corresponds to) in the TPump script that Data Services generates.
• AccountId (NAME): Identifier, of up to 30 characters, associated with the user name that will log on to the Teradata database.
• Append (BEGIN LOAD): Use the error table specified in ErrorTable. If the table does not exist, TPump creates it. If the structure of the existing error table is not compatible with the error table TPump creates, the job will run into an error when TPump tries to insert or update the error table.
• CheckpointRate (BEGIN LOAD): Number of minutes between checkpoint operations. Value must be an unsigned integer from 0 through 60, inclusive. The default is to checkpoint every 15 minutes.
• ErrorLimit (BEGIN LOAD): Maximum number of rejected records that TPump can write to the error table while maintaining a table. The default is no limit. If you specify ErrorPercentage, ErrorLimit specifies the number of records that must be sent to the Teradata database before ErrorPercentage takes effect. For example, if ErrorLimit is 100 and ErrorPercentage is 5, 100 records must be sent to the Teradata database before the approximate 5% rejection limit is applied. If only 5 records were rejected when the 100th record is sent, the limit is not exceeded. However, if six records were rejected when the 100th record is sent, TPump stops processing because the limit is exceeded.
• ErrorPercentage (BEGIN LOAD): Integer value that represents the approximate percent of the total number of records sent to the Teradata database that might be rejected during the TPump task. You cannot specify this parameter without ErrorLimit.
• ErrorTable (BEGIN LOAD): Name of the table in which TPump stores information about errors and the rejected records.
• ExecuteMacro (EXECUTE): Name of macro to execute. Using predefined macros saves time because TPump does not need to create and drop new macros each time you run a TPump job script.
• Ignore duplicate inserts (DML LABEL): Select Ignore duplicate inserts to not place duplicate rows in the error table.
• JobName (NAME): Character string that identifies the name of a job. The maximum length is 16 characters.
• Latency (BEGIN LOAD): Number of seconds that the oldest record resides in the buffer before TPump flushes it to the Teradata database. Value cannot be less than one second. If SerializeOn is not specified, only the current buffer can possibly be stale. If you specify SerializeOn, the number of stale buffers can range from zero to the number of sessions.
• LogTable (other TPump commands): Name of the table to use to write checkpoint information that is required for the safe and automatic restart of a TPump job. The default name has the following format: owner.table_LT
• MacroDatabase (BEGIN LOAD): Name of database to contain any macros TPump uses or builds. The default is to place macros in the same database that contains the TPump target table.
• MaxSessions (BEGIN LOAD): Maximum number of sessions for TPump to use to update the Teradata database. Data Services uses a default of 3.
• MinSessions (BEGIN LOAD): Minimum number of sessions for TPump to use to update the Teradata database.
• NoDrop (BEGIN LOAD): Do not drop the error table, even if it is empty, at the end of a job. You can use NoDrop with Append to persist the error table, or you can use it alone.
• NoMonitor (BEGIN LOAD): Prevents TPump from checking for statement rate changes from, or update status information for, the TPump Monitor.
• NoStop (IMPORT INFILE): Prevents TPump from terminating because of an error associated with a variable-length record.
• Pack (BEGIN LOAD): Number of SQL statements to pack into a multiple-statement request. The default is 20 statements per request. The maximum value is 600.
• PackMaximum (BEGIN LOAD): Select PackMaximum to have TPump dynamically determine the number of records to pack within one request. The maximum value is 600.
• Rate (BEGIN LOAD): Initial maximum rate at which TPump sends SQL statements to the Teradata database. Value must be a positive integer. If unspecified, Rate is unlimited.
• Robust (BEGIN LOAD): Specifies whether or not to use robust restart logic. Value can be YES or NO. NO specifies simple restart logic, which causes TPump to begin where the last checkpoint occurred in the job; TPump redoes any processing that occurred after the checkpoint. YES specifies robust restart logic, which you would use for DML statements that change the results when you repeat the operation. Examples of such statements include INSERTs into tables which allow duplicate rows, and UPDATE foo SET A=A+1...
• SerializeOn (BEGIN LOAD): Specify a comma-separated list of columns to use as the key for rows and guarantee that operations on these rows occur serially.
• TenacityHours (BEGIN LOAD): Number of hours that the utility tries to log on sessions required to perform the TPump job. The default is four hours.
• TenacitySleep (BEGIN LOAD): Number of minutes that TPump waits before it retries a logon operation. The default is six minutes.
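The ErrorLimit/ErrorPercentage interaction described for BEGIN LOAD can be expressed as a small predicate. The logic follows the worked example in the text (ErrorLimit 100, ErrorPercentage 5: five rejects at the 100th record is within the limit, six is not); the function itself is an illustration, not a Data Services or Teradata API:

```python
# Illustrative: is the rejection limit exceeded once ErrorLimit records
# have been sent? Before that threshold, ErrorPercentage is not applied.
def limit_exceeded(records_sent, records_rejected, error_limit, error_percentage):
    if records_sent < error_limit:
        return False  # ErrorPercentage takes effect only after ErrorLimit records
    allowed = records_sent * error_percentage / 100.0
    return records_rejected > allowed

# The example from the table: ErrorLimit=100, ErrorPercentage=5
assert limit_exceeded(100, 5, 100, 5) is False  # 5 rejects: within the limit
assert limit_exceeded(100, 6, 100, 5) is True   # 6 rejects: limit exceeded
```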
Note: By default, Data Services uses the bulk loader directory to store the script, data, error, log, and command (bat) files.

For more information about these parameters, see the Teradata Parallel Data Pump Reference.
6. If you specified Data file in File Option, you can increase the Number of loaders in the Options tab, which increases the number of data files. Data Services can use parallel processing to write data to multiple data files in batches of 999 rows.

If you specified Generic named pipe or Named pipe access module, Data Services supports only one loader and disables the Number of loaders option.
Related Topics
• Reference Guide: Data Services Objects, DBC driver manager for UNIX
Load
To bulk load a Teradata table using the Load utility
1. In the Bulk Loader Options tab of your target table editor, choose Load
in the Bulk loader drop-down list.
2. In File Option, choose the type of file (Data File, Generic named pipe, or Named pipe access module) to contain the data to bulk load.
3. Enter a command to be invoked by Data Services in the Command line text box. For example: fastload < C:\tera_script\float.ctl
4. If you chose Data File in File Option, enter (or browse to) the directory path where you want Data Services to place your data file.
5. If you chose Generic named pipe or Named pipe access module in File Option, enter the pipe name.
Bulk loading in Netezza
Data Services supports bulk loading to Netezza Performance Servers.
For detailed information about Netezza loading options and their behavior in the Netezza environment, see the relevant Netezza product documentation.

Netezza recommends using the bulk-loading method to load data for faster performance. Unlike other bulk loaders, the Data Services bulk loader for Netezza supports UPDATE and DELETE as well as INSERT operations, which allows for more flexibility and performance.
Netezza bulk-loading process
To bulk load to a Netezza target table, Data Services:

• Creates an external table that is associated with a local file or named pipe
• Loads data from the source into the file or named pipe
• Loads data from the external table into a staging table by executing an INSERT statement
• Loads data from the staging table to the target table by executing a set of INSERT/UPDATE/DELETE statements
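The four-step sequence above can be sketched as the rough shape of the SQL involved. This is an illustrative sketch only: the table names, file path, and key column are hypothetical, and the statements Data Services actually generates may differ.

```python
# Sketch of the Netezza external-table bulk-load sequence described above.
# EXT_CUST, STAGE_CUST, CUSTOMER, and /tmp/cust.dat are hypothetical names.

def netezza_bulk_load_sql(target="CUSTOMER", stage="STAGE_CUST",
                          ext="EXT_CUST", datafile="/tmp/cust.dat"):
    """Return the SQL statements for an external-table bulk load."""
    return [
        # 1. Create an external table bound to a local file (or named pipe).
        f"CREATE EXTERNAL TABLE {ext} SAME AS {target} "
        f"USING (DATAOBJECT ('{datafile}'))",
        # 2. Data Services writes source rows into the file/pipe (no SQL).
        # 3. Load from the external table into the staging table.
        f"INSERT INTO {stage} SELECT * FROM {ext}",
        # 4. Apply staging rows to the target (INSERT/UPDATE/DELETE as needed;
        #    only the INSERT case is shown here).
        f"INSERT INTO {target} SELECT * FROM {stage} s "
        f"WHERE NOT EXISTS (SELECT 1 FROM {target} t WHERE t.id = s.id)",
    ]

for stmt in netezza_bulk_load_sql():
    print(stmt)
```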
Options overview
From the Bulk Loader Options tab of your Netezza target table editor, select one of these methods depending on your Netezza environment:

• Named pipe—Data Services streams data as it is written to the named pipe through the external table to the staging table. For files that are larger than 4 GB in size, select this option for faster performance.
• File—Data Services writes the data to a file before loading through the external table to the staging table. For files that are smaller than 4 GB in size, select this option for faster performance.
• None—Data Services does not use bulk loading.
Because the Data Services bulk loader for Netezza also supports UPDATE and DELETE operations, the following options (on the target table editor Options tab) are also available for Netezza bulk loading.

Note: These are the only options that will be available on the Options tab.
• Column comparison
• Number of loaders
• Use input keys
• Update key columns
• Auto correct load
Related Topics
• Reference Guide: Target tables
Netezza log files
When writing from the external table to the staging table, Netezza generates logs (nzlog and nzbad files) and writes them to a database server working directory. You can use these logs to troubleshoot your jobs. (If you do not enter a Database server working directory in the datastore editor, Netezza uses the temp directory on its server, /tmp, to store the nzlog and nzbad files.)

For Data Services to access and manage these logs, configure the FTP parameters in the datastore editor. After a load, Data Services copies these files from the specified Netezza Database server working directory to the specified Data Services Bulk loader directory and deletes them from the Netezza server.
For successful loads, Data Services then deletes these log files from the Bulk loader directory (assuming the Clean up bulk loader directory after load option is checked in the target table editor).

For failed loads, Data Services does not delete the log files from the Bulk loader directory even if the Clean up bulk loader directory after load option is checked in the target table editor.
Configuring bulk loading for Netezza
To configure the datastore:
1. Create a new ODBC datastore.
2. Specify the options for the Netezza database by selecting the Database type: ODBC.
3. Click Advanced.
4. Click in the field to the right of Bulk loader directory and type the directory path, or click Browse, to specify where Data Services should write SQL and data files for bulk loading.
5. Expand the FTP category and enter the FTP host name, login user name, login password, and host working directory.

Data Services uses these options to transfer the Netezza nzlog and nzbad files.

Note: If this datastore is not being used specifically for Netezza bulk loading, Data Services ignores any FTP option entries.
6. If you are loading non-ASCII character data, set the Code page to latin-9. If you are loading multibyte data, set the Code page to utf-8.
7. Click OK or Apply.
Related Topics
• Reference Guide: ODBC
• Designer Guide: Defining a database datastore
To configure the target table
For descriptions of all Netezza bulk-loading options, see the Reference Guide.
1. Create a data flow with a Netezza target table.
2. Double-click the target table to open its editor.
3. Click the Bulk Loader Options tab.
4. For Bulk load, from the drop-down list select a bulk-loading method (File or Named pipe).
5. Edit the other options as necessary.
6. Click the Options tab. For Netezza databases, only the following options are available. Edit them as necessary.
• Column comparison
• Number of loaders
• Use input keys
• Update key columns
• Auto correct load
Related Topics
• Reference Guide: Target tables
Other Tuning Techniques
10
The previous chapters describe the following tuning techniques:
• Maximizing push-down operations
• Using Caches
• Using Parallel Execution
• Distributing Data Flow Execution
• Using Bulk Loading
This section describes other tuning techniques that you can use to adjust Data Services performance:
• Source-based performance options
• Join ordering
• Minimizing extracted data
• Using array fetch size
• Target-based performance options
• Loading method
• Rows per commit
• Job design performance options
• Loading only changed data
• Minimizing data type conversion
• Minimizing locale conversion
• Precision in operations
These techniques require that you monitor the change in performance carefully as you make small adjustments.
Related Topics
• Maximizing Push-Down Operations on page 41
• Using Caches on page 63
• Using Parallel Execution on page 77
• Distributing Data Flow Execution on page 101
• Using Bulk Loading on page 123
• Join ordering on page 185
• Minimizing extracted data on page 188
• Using array fetch size on page 189
• Loading method on page 190
• Rows per commit on page 191
• Job design performance options on page 192
• Minimizing data type conversion on page 192
• Minimizing locale conversion on page 193
• Precision in operations on page 193
• Measuring Data Services Performance on page 23
Source-based performance options
Join ordering
Data Services reads sources in preparation for the join in an order determined by a "depth-first traversal" of a tree starting from the root node. Controlling this order—referred to as the join order in this description—can often have a profound effect on the performance of producing the join result. The join order has no effect on the actual result produced.
Related Topics
• How join ranks affect join order on page 185
• How join ranks are used in outer joins on page 186
How join ranks affect join order
Consider these examples without join ranks set:
In this join tree, there is only one branch to traverse. Table A is read, then joined with Table B. The results from that join are joined with Table C.
In this join tree, the longest branch is traversed first. Table J is read, then joined with Table L. The results are joined with Table M. The results are then joined with Table K.
You can use join ranks to control the order in which the sources are joined. Data Services joins the child nodes in the order of decreasing join ranks. The join rank associated with a node is the maximum join rank of any node further down the branch.
For example, suppose you are joining four tables in an outer join—tables J
through M—and you assign each a different join rank.
In this join tree, the join rank of Table L is considered 3 because the join rank of Table M is greater than the original join rank of Table L. However, the join rank of Table K (5) is still larger than the join rank of Table L. The join order is as follows: Table J is read and joined with Table K. The results are joined with Table L. The result of that join is joined with Table M.
Note: If the value of Join rank is 0 and you collected cache statistics for the sources involved in the join, the optimizer determines the join ranks based on the cache statistics.
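The rank-propagation rule above can be sketched as follows. This is a hypothetical illustration, not Data Services internals; the rank of 2 assumed for Table L is our own choice (the text only says it is lower than Table M's rank of 3).

```python
# Sketch of the rule: a node's effective join rank is the maximum join
# rank found anywhere down its branch, and children are joined in order
# of decreasing effective rank.

def effective_rank(node, tree):
    """tree maps node -> (rank, [children])."""
    rank, children = tree[node]
    return max([rank] + [effective_rank(c, tree) for c in children])

def join_order(parent, tree):
    """Order a node's children by decreasing effective rank."""
    _, children = tree[parent]
    return sorted(children, key=lambda c: effective_rank(c, tree),
                  reverse=True)

# The example from the text: K has rank 5; M (rank 3) sits below L
# (assumed rank 2), so L's effective rank becomes 3.
tree = {
    "J": (0, ["K", "L"]),
    "K": (5, []),
    "L": (2, ["M"]),
    "M": (3, []),
}
print(effective_rank("L", tree))   # 3
print(join_order("J", tree))       # ['K', 'L']
```

With these ranks, K (5) is joined before the L branch (effective rank 3), matching the join order described above.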
How join ranks are used in outer joins
Join ordering is determined by the type of join you choose to use in Data Services. There are two types of joins:

• Normal — Created by drawing connection lines from multiple sources in a data flow to the same query and then entering an integer as a Join rank value for each source using source editors. Data Services orders join operations using these values.
For a normal join between three or more tables, Data Services internally implements a series of two-way joins. First, it joins the two highest-ranked tables together using the join ranking rules outlined above. Next, Data Services joins the result of the first join with the third-highest-ranked table. This process continues until all tables are included in the join.
• Outer Join — Created by taking a normal join and then using the Outer Join tab of the query to specify the outer and inner sources. Data Services processes the outer source as if it were assigned the highest join rank.

For an Outer join, Data Services ignores the Join rank values unless you are using three or more tables and a pair of joins share the same outer join table.
For example, if table A is an outer join to tables B and C, the join rank is calculated by taking the maximum join rank value of the two tables in each outer join pair (A and B, and A and C). The pair with the higher join rank becomes the outer loop.
Data Services implements joins as nested loop joins. The source with the higher join rank, or the one specified as an Outer Source in the Outer Join tab, becomes the outer loop. If all the ranks are equal or not set (the default is 0), then Data Services picks an inner and outer source at random. During job execution, Data Services reads the source in the outer loop one time and reads the source in the inner loop once for each row in the outer loop. Performance improves when fewer rows are read.
The number of output rows for a join depends on the type of join you use in a data flow.

• For a Normal join, the output is limited to the number of rows that meet the join conditions.
• For an Outer join, the output is equal to or greater than the number of rows in the source specified as the Outer Source.
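A minimal nested-loop join sketch makes both points above concrete: the outer source is read once, the inner source is scanned once per outer row, and an outer join emits at least one row per outer-source row. The function and data below are illustrative, not Data Services code.

```python
# Minimal nested-loop join: outer read once, inner scanned per outer row.
def nested_loop_join(outer, inner, key, outer_join=False):
    result, inner_scans = [], 0
    for o in outer:                      # outer loop: read one time
        inner_scans += 1                 # one inner scan per outer row
        matched = False
        for i in inner:                  # inner loop
            if o[key] == i[key]:
                result.append({**o, **i})
                matched = True
        if outer_join and not matched:   # outer join keeps unmatched rows
            result.append(dict(o))
    return result, inner_scans

outer = [{"k": 1}, {"k": 2}, {"k": 3}]
inner = [{"k": 1, "v": "a"}, {"k": 1, "v": "b"}]

normal, scans = nested_loop_join(outer, inner, "k")
left, _ = nested_loop_join(outer, inner, "k", outer_join=True)
print(len(normal), len(left), scans)   # 2 4 3
```

The normal join returns only the 2 matching rows, while the outer join returns 4 rows, at least one per outer row, which is why fewer rows in the outer loop means less work overall.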
Using an Outer join is often preferable with real-time jobs because you often want the whole message passed on whether or not the conditions the join looks for exist in the inner source.

In fact, if you do not use an Outer Join to order joins for a message, Data Services still processes the message as if it had the highest join rank. The message source editor lacks a Join rank option because Data Services automatically gives it the highest join rank.
Related Topics
• Reference Guide: Query
Tips
For a join between two tables, assign a higher join rank value to the larger table and, if possible, cache the smaller table.
For a join between a table and a file:

• If the file is small and can be cached, then assign it a lower join rank value and cache it.
• If you cannot cache the file, then assign it a higher join rank value so that it becomes the "outer table" in the join.

For a join between two files, assign a higher rank value to the larger file and, if possible, cache the smaller file.
Minimizing extracted data
The best way to minimize the amount of data extracted from the source systems is to retrieve only the data that has changed since the last time you performed the extraction. This technique is called changed-data capture.
Related Topics
• Designer Guide: Techniques for Capturing Changed Data
Using array fetch size
Data Services provides an easy way to request and obtain multiple data rows from source databases. The array fetch size feature allows you to retrieve data using fewer requests, thereby significantly reducing traffic on your network. Tuning array fetch size can also reduce the amount of CPU use on the Job Server computer.

The Data Services array fetch feature lowers the number of database requests by "fetching" multiple rows (an array) of data with each request. Enter the number of rows to fetch per request in the Array fetch size option on any source table editor or SQL transform editor. The default setting is 1000, meaning that with each database request, Data Services automatically fetches 1000 rows of data from your source database. The maximum array fetch size that you can specify is 5000 rows.

Business Objects recommends that you set the array fetch size based on network speed.

Note: Higher array fetch settings consume more processing memory, proportionally to the length of the data in each row and the number of rows in each fetch.

Regardless of the array fetch setting, sources reading columns with an Oracle LONG data type cannot take advantage of this feature. If a selected data column is of type LONG, the array fetch size internally defaults to 1 row per request.
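The effect of array fetch can be demonstrated with standard DB-API batched fetching. Data Services implements array fetch internally, so this is only an analogy: fetching 2500 rows in batches of 1000 takes three fetch calls instead of 2500 single-row requests.

```python
# Illustration of array-fetch batching using DB-API fetchmany() against
# an in-memory SQLite table (an analogy, not Data Services internals).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER)")
conn.executemany("INSERT INTO src VALUES (?)",
                 [(i,) for i in range(2500)])

ARRAY_FETCH_SIZE = 1000          # the Data Services default
cur = conn.execute("SELECT id FROM src")

round_trips, rows = 0, 0
while True:
    batch = cur.fetchmany(ARRAY_FETCH_SIZE)   # one "request" per batch
    if not batch:
        break
    round_trips += 1
    rows += len(batch)

print(round_trips, rows)   # 3 2500
```

Three batches (1000 + 1000 + 500 rows) replace 2500 single-row round-trips, which is the traffic reduction the option provides.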
To set the Array fetch size parameter
1. Use either a source table editor or an SQL transform editor.
To use a source table editor:
a. Double-click a source table in the Designer's workspace.
b. In the Performance section of the Source tab, enter a number in the Array fetch size text box.
To use an SQL transform editor:
a. Double-click an SQL transform in the Designer's workspace.
b. In the SQL transform editor, enter a number in the Array fetch size
text box.
Array Fetch Size indicates the number of rows returned in a single fetch call to a source table. The default value is 1000. This value reduces the number of round-trips to the database and can improve performance for table reads.

The Array Fetch Size option does not support long column data types. If the SELECT list contains a long column, Data Services sets the Array Fetch Size to 1 and reads one row of data at a time from the database.
2. Click OK.
Target-based performance options
Loading method
You can choose to use regular loading or bulk loading with Data Services. For a regular load, the Parameterized SQL option is automatically selected by Data Services when it must generate, parse, and compile the statement. By using parameterized SQL, Data Services can minimize these efforts by using one handle for a set of values instead of one handle per value.
You cannot combine bulk loading with the following options:

• Auto-correct load
• Enable Partitioning
• Number of Loaders
• Full push down to a database

  Data Services automatically selects this optimizer process when the following conditions are met:
  • The source and target in a data flow are on the same database
  • The database supports the operations in the data flow

  If the optimizer pushes down source or target operations, then it ignores the performance options set for sources (Array fetch size, Caching, and Join rank) because Data Services is not solely processing the data flow.

• Overflow file
• Transactional loading
To improve performance for a regular load (parameterized SQL), you can select the following options from the target table editor. Note that if you use one, you cannot use the others for the same target.
• Enable Partitioning

Parallel loading option. The number of parallel loads is determined by the number of partitions in the target table.

• Number of Loaders

Parallel loading option. The number of parallel loads is determined by the number you enter for this option.
Related Topics
• Push-down operations on page 42
• Table partitioning on page 79
• Using Bulk Loading on page 123
Rows per commit
Rows per commit for regular loading defaults to 1000 rows. Setting the Rows per commit value significantly affects job performance. Adjust the rows per commit value in the target table editor's Options tab, noting the following rules:

• Do not use negative number signs and other non-numeric characters.
• If you enter nothing or 0, the text box automatically displays 1000.
• If you enter a number larger than 5000, the text box automatically displays 5000, which is the maximum value allowed.
Business Objects recommends setting rows per commit between 500 and 2000 for best performance. You might also want to calculate a value. To do this, use the following formula:

max_IO_size / row size (in bytes)
For most platforms, max_IO_size is 64K. For Solaris, max_IO_size is 1024K.For a formula to calculate row size, see Caching sources on page 65.
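The formula and the documented bounds can be combined into a small sizing sketch. The helper name is ours, not a Data Services API; the clamping mirrors the rules listed above (blank or 0 falls back to 1000, values above 5000 are capped).

```python
# Sketch of the rows-per-commit sizing rule: max_IO_size / row size,
# clamped to the documented bounds (default 1000, maximum 5000).
def rows_per_commit(max_io_size, row_size_bytes):
    value = max_io_size // row_size_bytes
    if value <= 0:
        return 1000          # blank/0 falls back to the default
    return min(value, 5000)  # 5000 is the maximum allowed

# A 64 KB I/O size with 64-byte rows suggests committing 1024 rows.
print(rows_per_commit(64 * 1024, 64))      # 1024
# Solaris (1024 KB I/O size) would be clamped to the 5000 maximum.
print(rows_per_commit(1024 * 1024, 64))    # 5000
```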
Note that even with a value greater than one set for Rows per commit, Data Services will submit data one row at a time if the following conditions exist:
• You set Rows per commit too high for your environment
• You are using an overflow file
• The initial transaction fails
In this case, performance becomes worse than setting Rows per commit
to 1.
Job design performance options
Loading only changed data
Identifying and loading only changed data is called changed-data capture (CDC), which includes only incremental data that has changed since the last refresh cycle. Performance improves because with less data to extract, transform, and load, the job typically takes less time.
Related Topics
• Designer Guide: Techniques for Capturing Changed Data
Minimizing data type conversion
Data Services offers very robust and easy-to-use data type conversions via column mappings of different data types. Business Objects recommends that you:

• Avoid unnecessary data conversions.
• Verify that Data Services is performing the implicit conversions (selected when you drag and drop columns from input to output schemas in the query transform) as expected. This can be done by looking at the warnings generated during job validation.
Minimizing locale conversion
If your jobs do not require the use of different or multi-byte locales, you can increase performance by ensuring that locales are single-byte and not mismatched.
Precision in operations
Data Services supports the following precision ranges: 0-28, 29-38, 39-67, 68-96. Note that as you decrease precision, performance increases for arithmetic operations and comparison operations. In addition, when processing an arithmetic or Boolean operation that includes decimals in different precision ranges, Data Services converts all to the highest precision range value because it cannot process more than one decimal data type precision range for a single operation. For example, if Data Services must perform an arithmetic operation for decimals with precision 28 and 38, it converts both to precision 38 before completing the operation.
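The range-promotion rule can be sketched as follows. The four ranges come from the text; the helper names are ours for illustration, not Data Services APIs.

```python
# Sketch of decimal precision-range promotion: an operation mixing
# precisions is carried out in the highest range involved.
RANGES = [(0, 28), (29, 38), (39, 67), (68, 96)]

def precision_range(p):
    """Return the (low, high) range that contains precision p."""
    for lo, hi in RANGES:
        if lo <= p <= hi:
            return (lo, hi)
    raise ValueError(f"unsupported precision {p}")

def operation_range(*precisions):
    """Operands are promoted to the highest range among them."""
    return max(precision_range(p) for p in precisions)

# Mixing precision 28 and 38 promotes the operation to the 29-38 range,
# matching the example in the text.
print(operation_range(28, 38))   # (29, 38)
```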
Index

A
aggregations, pushing to database 43
al_engine process 24
al_jobserver process 24
analyzing trace log files 30
array fetch size 189
auto correct load, enabling push down 53

B
bulk loading
    DB2 129
    DB2, using import utility 137
    Oracle 124
    Oracle, conventional-path load 125
    Oracle, direct-path load 125
    Sybase 138
    Teradata 139
    Teradata, Load Utilities 157
    Teradata, Warehouse Builder 151
    using 177
    vs. regular loading 190

C
caching
    joins 66
    Table_Comparison transform 68
    tables 65
    using a lookup function 67
caching data 39, 64
caching lookups vs. setting source table as outer join 67
changed-data capture, performance, using for 188
conventional-path loading in Oracle 125

D
data, capturing changes 192
Data Services, optimizing data transformations 42
data transformations
    optimize by caching 28, 39, 64
    optimizing 42
    pushing to source database 47
data type conversion 192
database links
    and datastores 57
    defined 57
    example data flow 59
    importing 58
    software support 58
    tuning performance 61
datastores
    adding database link support 58
    and database links 57
    persistent cache 69
DB2, bulk loading 129
degree of parallelism, enabled for functions 92
degree of parallelism, for data flows 86

E
extracting data
    caching 64
    minimizing 188

F
filtering, pushing to database 43
functions, pushing to database 43

I
import utility for DB2 bulk loading 137
IPC communications
    between data flows 120
    between sub data flows 121

J
join ordering 185
join rank, outer joins 185
joins, caching 66
joins, pushing to database 43

L
linked datastores
    creating 58
    example data flow 59
    tuning performance 61
Loading 190
logs, performance monitor 71
lookup function, caching 67

M
minimize data extracted 188
monitor log files, performance, measuring with 31

O
operational dashboards 34
optimizing data transformations 42
Oracle
    bulk loading 124
    bulk loading, conventional-path load 125
    bulk loading, direct-path loading 125
    parallel loading 127
ordering, pushing to database 43
outer joins, rank in 185

P
parallel execution 78, 99
    degree of parallelism 86
    Degree Of Parallelism 86, 107
    degree of parallelism, enabled for functions 92
    file multi-threading 96
    for different data flows and work flows 78
    partitioned tables 79
    partitioned tables, creating 81
    partitioned tables, enabling 84
    using table partitioning and DOP 94
    within a single data flow 79, 99
parallel loading in Oracle 127
peer-to-peer communications
    between data flows 120
    between sub data flows 121
performance
    changed-data capture and 192
    data transformations and 47
    improving, DS environment 18, 22
    improving, with bulk loading 177
    improving, with parallel execution 78, 99
    measuring, with DI options 25
    measuring, with DS options 34
    tuning, with DS job design options 39, 184
    tuning, with DS source options 39, 184
    tuning, with DS target options 39, 184
performance monitor log 71
Performance Monitor, reading execution statistics 32
persistent cache datastore 69
projection, pushing to database 43
pushing operations to database
    example 45
    logic 42
    overview 42

R
rows
    duplicate, avoiding loading 53
rows per commit 191
rows, retrieving multiple 26, 189

S
server groups, grid computing 117
sizing tables 65
sources, retrieving multiple rows 26, 189
SQL, pushing to database 42
Sybase, bulk loading 138

T
Table_Comparison transform, performance, improving 68
tables
    caching 65
    retrieving multiple rows 26, 189
    sizing 65
Teradata
    bulk loading 139
    bulk loading using Load Utilities 157
    bulk loading using Warehouse Builder 151
    load utilities 158, 162, 165, 177
throughput, improving 39
trace log files, analyzing 30
tuning techniques
    array fetch size 189
    caching data 39, 64
    join ordering 185
    minimize data extracted 188
    minimize data type conversion 192
    rows per commit 191