Data Integrator Performance and Scalability Whitepaper

EXECUTIVE SUMMARY

As data volumes continue to grow exponentially, IT departments must evaluate new approaches to ensure data management processes can be accomplished within finite time windows. New strategies for enhancing scalability coupled with powerful and intuitive data integration technologies ensure you can continue to achieve your objectives as your processing demands increase.

The goal of this white paper is to illustrate the SAP BusinessObjects™ Data Integrator performance and scalability for 

different load strategies. We will explain both our push-down capabilities to the database as well as Data Integrator’s

own engine performance. Our objective is to help you learn from our best practices, which are illustrated with test

results, and help you size your Data Integrator architecture and estimate the time needed to load your data

warehouses.

TEST CONCEPT OVERVIEW

The test scenarios we execute are based on real-world scenarios. This means:

  Real-world hardware. We use machines with two dual-core CPUs for both the source and target databases (Oracle) and for the Data Integrator job server, all running on Microsoft Windows 2003. For CPU-intensive jobs we show scalability using a job server with four dual-core CPUs.

  Representative data sets. Source data from the industry-standard TPC-C benchmark with tables from 6,000,000 records (customers table) to 60,000,000 records (order lines table).

  Jobs typically used in a data warehouse environment. Loading dimension tables with slowly changing

dimensions (type two) and fact tables with surrogate key lookups.

 All tests were executed using Data Integrator 11.7 on Microsoft Windows servers.

RESULTS SUMMARY

This white paper concentrates on performance. But before we share the performance results, it is also important to

realize other factors, like ease-of-use and maintainability, are key elements in the evaluation of a data integration

product. All jobs used in our test scenarios are simple to develop, do not require technical database analysis skills, and

are easy to maintain and understand, thanks to the graphical representation. This is in sharp contrast to hand-coded

data integration projects, which significantly increase development and maintenance efforts.

 As a reference for our performance numbers, we first measured database (Oracle) and network performance by

copying a table, without any transformations, from the source to the target database using database links. This resulted in a throughput of 54 GB/Hour, which is the maximum performance possible in our test environment and will be used as a baseline to calculate the overhead added by Data Integrator as more advanced transformations are introduced.

The main conclusions about Data Integrator’s performance are:

  Data Integrator can leverage the power of the database server and do a full push down of the entire dataflow to

the source database, including loading into the target database via database links. Using this approach, no

data will be transferred between the database and the Data Integrator servers. Our tests illustrate that even

more complex data flows will be pushed down entirely to the database, resulting in fairly complex SQL


statements generated by Data Integrator’s optimizer. In this scenario, Data Integrator’s performance will equal

the database performance.

  For more complex transformations that cannot be translated to pure SQL for a full pushdown to the database,

Data Integrator will read from the source, process data in its engine, and write to the target. Data Integrator’s

optimizer will still push down as much as possible to the source database (joins, aggregations, functions), but

data will stream through the Data Integrator engine. For the initial load of a typical dimension table, including

data validation functions and lookups, we generated a throughput of 42 GB/Hour. In most cases, the database

and network—not the Data Integrator server—were shown to be the bottlenecks. To illustrate the potential for pure Data Integrator performance, without external constraints, artificial tests with random data generation and

no loading show pure engine performance of up to 142 GB/Hour.

  For delta loads, Data Integrator can tightly integrate with CDC techniques provided by your database (such as

Oracle CDC) for optimal performance. In case this is not available on your source system, Data Integrator’s

built-in Table_Comparison transform can be used to detect changes by comparing the source with the target

table. In our 6,000,000 record customer table, we achieve a throughput of 12 GB/Hour for the delta load while

comparing each source record with the target and keeping history for changed records.

In the above cases, the Data Integrator engine is not the limiting factor. Most of the time Data Integrator is waiting on the source or target database, which is why CPU power on the Data Integrator server is not fully utilized. Tuning in these cases should be done at the database level (e.g. increasing the number of disks, adding memory or CPUs, etc.).

In the next test cases we strove to fully use Data Integrator’s engine performance. Whereas the common trend is to try

to push down as much as possible to the source database to improve performance, Data Integrator’s engine can

potentially equal or even exceed database performance (depending on database tuning) and will certainly put less

stress on your operational database sources.


Test Concept

Our goal with this white paper is to provide data integration performance numbers for real-life customer scenarios,

rather than focusing on record-breaking performance numbers in an isolated and unrealistic test environment. The hardware and data sets used in our tests were likewise chosen to reflect real-life scenarios. This approach will enable

customers to compare our approaches and results with their environment and use the numbers shown in this report to

estimate the time needed to meet their specific data loading requirements, which can be the foundation for a sizing study.

For the test scenarios we’ve chosen the example of loading a data warehouse from a transactional system. However,

most examples can be applied to other data integration scenarios as well. For each scenario we show a number of 

variations using different settings in Data Integrator and we change the degree of parallelism to show the performance

impact of these options. Where appropriate, we illustrate the execution results with the servers’ CPU and memory

consumption.

Because the database and network are important factors that impact performance, we executed an initial test (scenario

0) to have a reference to compare other performance numbers to. In fact we will show that, as in most real-life data

integration projects, database and network tend to be the bottlenecks rather than the extracting, transforming, and

loading (ETL) tool itself.


Scenario 0: Straight Table Copy

To have reference numbers that will give an indication of database and network performance in our test environment,

we executed a simple data movement scenario where a table is copied from source to target database. We

recommend that customers repeat this simple test to determine the baseline performance of their own environment.

The scenario consists of three tests:

  Load data from source to target directly, without going through the Data Integrator engine. This is

accomplished using a database link between source and target and an INSERT INTO ... SELECT ...

statement.

  Read data via the Data Integrator engine and load into the target table using API bulk loading. This will

illustrate the overhead of having an extra server (hence an extra network hop) between source and target

database.

  The same test as above, without bulk loading but using INSERT statements. This technique is the only

alternative in case INSERTs and UPDATEs are mixed.

Scenario 1: Load Product Dimension

The goal of this scenario is to show our push-down capability, not only for simple data flows like scenario 0 (simple

table copy), but also for more complex scenarios. This way we leverage the power of the database and minimize the amount of data sent over the network. The job we designed joins the item table with the stock table (using an outer

 join) and aggregates on products to get the total stock value (sum) and availability (count) in each warehouse. Both

initial and delta load are implemented; for the delta load we use slowly changing dimensions type 1, so no history is kept.

Scenario 2: Load Customer Dimension

In this scenario we load the customer dimension and need to do some additional operations:

  Validate the customer data. In our example we defined two business rules stating customer name cannot be

NULL and the credit limit must be greater than 0. This is accomplished using Data Integrator’s validation

transform.

  Lookup data warehouse keys for previously loaded dimensions like warehouse and district. You can use Data

Integrator’s lookup_ext function for this. Because lookup tables are relatively small, we can easily cache them

in memory (no paging will occur).

  Generate a data warehouse key for each new customer (using the key_generation transform).

  For delta loading, implement slowly changing dimensions type 2 (SCD2), building up history for changes in the

customer’s attributes. For each record, ‘valid from’ and ‘valid to’ dates are calculated and a ‘current record’

indicator is set. SCD2 is implemented using Table_Comparison and History_Preserving transforms.

Data will stream through the engine. For the initial load we use bulk loading (API) at the target table; for the delta load we have a mixture of inserts and updates, so we need to fall back on parameterized SQL. The validation and lookup logic described above is sketched in SQL below.
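This logic corresponds roughly to the following SQL, shown for illustration only; in the actual job it is implemented with the Validation transform and the lookup_ext function rather than generated SQL, and the dimension table and column names used here are assumptions based on the TPC-C-style schema:

-- Rows that pass both business rules, enriched with previously loaded dimension keys
SELECT c.*,
       w.WAREHOUSE_KEY,                        -- cached lookup against the warehouse dimension
       d.DISTRICT_KEY                          -- cached lookup against the district dimension
FROM   "ORABM_DI"."CUSTOMER" c
       LEFT JOIN "TPC_USER"."WAREHOUSE_DIM" w ON w.W_ID = c.C_W_ID
       LEFT JOIN "TPC_USER"."DISTRICT_DIM"  d ON d.D_W_ID = c.C_W_ID AND d.D_ID = c.C_D_ID
WHERE  c.C_NAME IS NOT NULL                    -- rule 1: customer name cannot be NULL
  AND  c.C_CREDIT_LIM > 0;                     -- rule 2: credit limit must be greater than 0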

Scenario 3: Join Tables to Load Fact Table

In this scenario we want to concentrate on Data Integrator’s own engine capabilities and illustrate the distributed data

flows feature in order to have better performance and maximize memory use. The scenario will load the fact table by joining tables from two different databases with no database link available; in this case the join will be calculated by

Data Integrator. After joining the two tables, a query is used to get data warehouse keys for the customer, district, and

warehouse dimension and provide default values for missing dimension keys. This job will require a lot of memory both

for executing the join (and caching the inner table in memory) and the lookup.

Scenario 4: Aggregate Data from File

In the last scenario we use a flat file as source (comma delimited) and aggregate the 60,000,000 lines (6 GB in file

size) to 2,000 groups calculating a sum for each group. With this example we limit the network and database

bottlenecks and show pure Data Integrator performance.

SERVER HARDWARE AND ARCHITECTURE

For the database, HP Proliant servers running Microsoft Windows 2003 SP2 with two dual core CPUs (which means

four logical CPUs) are used. Source and target database are Oracle 10g (10.2.0.1.0) instances which are physically

located on different servers connected via a Gigabit Ethernet network.

The Data Integrator servers are located on separate servers, but these servers have the same characteristics as the

database servers. There are two different Data Integrator jobservers available: one with two dual core CPUs (hence

four logical CPUs), and one server with four dual-core CPUs (eight logical CPUs). The 2-CPU machine is used for most tests; only for the CPU-intensive scenario 4 do we show scalability with the 4-CPU server.

The table below lists the servers with their details:


DATA SETS

For the source data we used the TPC-C data model (www.tpc.org), which represents a real-life enterprise resource planning (ERP)-like source system; representative data is generated automatically for all tables. Details about the

number of rows and total table size for the tables used in our scenarios are found below (W=200).

For the target schema, a star schema with a fact table (ORDER_FACT) and dimension tables (PRODUCT, CUSTOMER,

…) is built. The customer dimension table implements a slowly changing dimension type two with additional columns to

specify ‘valid from’ and ‘valid to’ dates and a ‘current record’ indicator.


Below are the row counts and size of these tables after loading:

Between source and target schema a database link is set up, indexes are created and table partitioning is used. The

database is optimally tuned for the available hardware it runs on.


Test Results

This chapter shows the test results for the scenarios explained in the previous chapter. In the first section we show some

reference numbers about our environment (database and network). In the next two sections we make the distinction

between full push down, where the database will do all the work, and data streaming through the engine, where data

physically flows through the Data Integrator engine. The last section concentrates on the pure engine performance

illustrating that Data Integrator can also execute memory- and CPU-intensive tasks.

REFERENCE NUMBERS

Scenario 0: Straight Table Copy

In this section we show reference numbers for different types of loading in Data Integrator. There are no

transformations applied to the data; each of the tests is a straight copy. By copying the customers table from source to

target database via a database link, without passing through Data Integrator, we measure the maximum database and

network throughput. An ‘INSERT INTO … SELECT …’ statement is used to move the data (6,000,000 records, or 3.7 GB), giving a throughput of 55 GB/Hour.
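A minimal sketch of this baseline test, assuming the database link SOURCE described later in this paper and a target copy table named CUSTOMER_STAGE (a placeholder name); the exact statement in your environment will differ:

-- Direct copy on the target database, bypassing Data Integrator entirely
INSERT /*+ APPEND */ INTO "TPC_USER"."CUSTOMER_STAGE"   -- direct-path insert into the target
SELECT *
FROM   "ORABM_DI"."CUSTOMER"@SOURCE;                    -- 6,000,000 rows pulled over the database link
COMMIT;

At 3.7 GB, a run of roughly four minutes corresponds to the 55 GB/Hour reported above.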

To measure the maximum performance with Data Integrator involved, we read the same table and bulk load it into the

target database using the Data Integrator engine. This results in a throughput of 54 GB/Hour, which shows there is hardly any overhead from introducing an extra server hop. We get this high performance because we use the bulk loader

technology in API mode which means we feed data directly into the target database files using Oracle’s Direct Path

Load API.

In the last reference test, we measured Data Integrator’s loading without bulk loaders enabled. In this case Data

Integrator generates INSERT statements and passes the values as parameters (parameterized SQL); the throughput decreases to 27 GB/Hour, which shows the advantage of using bulk loading in our environment (Oracle).
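Conceptually, the non-bulk load path issues bind-variable inserts of the following form, executed in batches (a sketch only; the actual statement and column list are generated by Data Integrator, and the column names shown here are TPC-C-style assumptions):

-- Parameterized (bind-variable) INSERT; values are supplied per row or per batch
INSERT INTO "TPC_USER"."CUSTOMER_STAGE"
  (C_ID, C_NAME, C_CREDIT_LIM /* , remaining columns */)
VALUES
  (:1, :2, :3 /* , remaining bind variables */);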

The tables below summarize the numbers with some extra execution details:


PUSH DOWN TO THE DATABASE

Scenario 1: Load Product Dimension

A common technique used to optimize performance is to push the full SQL statement down to the database and let the database handle the load. This leverages the database’s performance and means tuning only needs to be done on the database side. Data Integrator’s optimizer will always try to generate a full push down whenever possible, generally when no transforms or functions are used that cannot be translated to SQL and when source and target tables are in the same database or connected via database links.

The optimizer will always generate the SQL optimal for the database you are running against. Even the database

version can impact the SQL to ensure optimal performance in your environment, so when a new database version

comes out, the generated SQL will be optimized for that version, without any change needed in the design of the data

flow. The data integration developer does not need to know advanced SQL options in order to write a high performing

data flow.

Database Links

Crucial to a full push down is the fact that all tables (sources and targets) are either in the same database or in

separate databases linked via a database link. Without this database link, only the SELECT can be pushed down to the

source and the INSERT will be handled by Data Integrator (see next scenario). In our environment, source and target

database are on separate machines and on the target database a database link is created to the source:

create database link SOURCE connect to ORABM_DI identified by <pwd>

using 'orcl_src'

This database link is imported in Data Integrator to link the target datastore to the source datastore.

For the initial load for the product dimension, the Data Integrator optimizer will generate an INSERT INTO statement

using Oracle’s /*+ APPEND */ hint for optimal performance (a simplified sketch follows below). Operations like outer join, GROUP BY, and the sum and count

aggregation functions are pushed down in the SELECT clause.
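A simplified sketch of what this pushed-down statement looks like, derived from the MERGE shown further below; the full database link name and complete expression list are abbreviated here, so treat this as an illustration rather than the exact generated SQL:

INSERT /*+ APPEND */ INTO "TPC_USER"."PRODUCT"
  (ITEM_ID, INVENTORY_ID, ITEM_NAME, ITEM_PRICE, ITEM_DATA, TTL_INV_AMT, S_AVAIL_WAREHOUSES)
SELECT "ITEM"."I_ID", "ITEM"."I_IM_ID", "ITEM"."I_NAME", "ITEM"."I_PRICE", "ITEM"."I_DATA",
       sum("STOCK"."S_QUANTITY") * "ITEM"."I_PRICE",     -- total stock value per product
       count("STOCK"."S_W_ID")                           -- number of warehouses with availability
FROM   "ORABM_DI"."ITEM"@SOURCE "ITEM",
       "ORABM_DI"."STOCK"@SOURCE "STOCK"
WHERE  "STOCK"."S_I_ID" (+) = "ITEM"."I_ID"              -- outer join pushed down to Oracle
  AND  "STOCK"."S_QUANTITY" (+) > 0
GROUP BY "ITEM"."I_ID", "ITEM"."I_IM_ID", "ITEM"."I_NAME", "ITEM"."I_PRICE", "ITEM"."I_DATA";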

For the delta load the ‘autocorrect load’ option at the target table is turned on in the design of the data flow. This option

corresponds to what is also known as UPSERTs: based on the key information a row will either be updated (if the record


already exists in the target) or inserted (for new records). Using this option will result in a MERGE statement generated

by the optimizer. A developer can visualize the SQL generated by the optimizer in Data Integrator Designer using the

‘View optimized SQL’ functionality.

Below is the SQL generated by Data Integrator for this data flow. Notice the MERGE statement

and the database links used:

MERGE INTO "TPC_USER"."PRODUCT" s

USING

(SELECT "ITEM"."I_ID" ITEM_ID , "ITEM"."I_IM_ID" INVENTORY_ID , "ITEM"."I_NAME" ITEM_NAME ,"ITEM"."I_PRICE" ITEM_PRICE , "ITEM"."I_DATA" ITEM_DATA , (sum( "STOCK"."S_QUANTITY" ) *

"ITEM"."I_PRICE" ) TTL_INV_AMT , count( "STOCK"."S_W_ID" ) S_AVAIL_WAREHOUSES

FROM "ORABM_DI"."STOCK"@SOURCE.REGRESS.RDBMS.DEV.US.ORACLE.COM "STOCK",

"ORABM_DI"."ITEM"@SOURCE.REGRESS.RDBMS.DEV.US.ORACLE.COM "ITEM"

WHERE ( "STOCK"."S_I_ID" (+) = "ITEM"."I_ID" ) and ( "STOCK"."S_QUANTITY" (+) > 0)

GROUP BY "ITEM"."I_ID" , "ITEM"."I_IM_ID" , "ITEM"."I_NAME" , "ITEM"."I_PRICE" , "ITEM"."I_DATA"

) n

ON ((s.ITEM_ID = n.ITEM_ID))

WHEN MATCHED THEN

UPDATE SET s."INVENTORY_ID" = n.INVENTORY_ID,

s."ITEM_NAME" = n.ITEM_NAME,

s."ITEM_PRICE" = n.ITEM_PRICE,

s."ITEM_DATA" = n.ITEM_DATA,s."TTL_INV_AMT" = n.TTL_INV_AMT,

s."S_AVAIL_WAREHOUSES" = n.S_AVAIL_WAREHOUSES

WHEN NOT MATCHED THENINSERT (s."ITEM_ID", s."INVENTORY_ID", s."ITEM_NAME", s."ITEM_PRICE", s."ITEM_DATA", s."TTL_INV_AMT",

s."S_AVAIL_WAREHOUSES" )VALUES (n.ITEM_ID , n.INVENTORY_ID , n.ITEM_NAME , n.ITEM_PRICE , n.ITEM_DATA , n.TTL_INV_AMT ,

n.S_AVAIL_WAREHOUSES)

The performance for this scenario is pure Oracle performance. As illustrated in the reference scenario, the maximum throughput for a straight copy between the two servers in our test environment is 55 GB/Hour.

DATA STREAMING THROUGH THE DATA INTEGRATOR ENGINE

Scenario 2: Load Customer Dimension (Initial Load)

When a full push down is not possible because more advanced transformations are needed—which cannot be translated to SQL—only the SELECT statement will be sent to the source. This SELECT is still optimized to push down as much as possible to the database (e.g. joins, GROUP BY, database functions, …). The data will be passed from the source database via the Data Integrator server to the target database. At the target database, you can use bulk loading for best performance. In this scenario, where we load the customer table, the validation transform needs to be executed by Data Integrator and blocks the full push down to the database; additionally, we use a lookup function to find data warehouse keys for the district and warehouse dimensions.

For the initial load scenario, we get a throughput of 42 GB/Hour; CPU use on the Data Integrator server is low because the bottleneck is the database and network. To get an idea of the Data Integrator performance itself, we executed two additional tests. For the first, we used the same data flow but blocked all inserts before loading into the target (using the Map_operation transform); for the second, we additionally excluded reading from the source by generating the data in the engine with the same row size as the source data (using the Row_generation transform). The engine throughput without database and network overhead (so no reader and no loader) is 142 GB/Hour.


Because the database is the bottleneck, CPU is not used 100% on the Data Integrator server. Tuning Data Integrator will not bring much performance improvement; throughput from the database side needs to be enhanced first (e.g. by adding more disks for faster I/O). The graph below shows CPU use on the Data Integrator server during the initial load, indicating that less than 50% of the available CPU power is used. This is typical behaviour for scenarios where data streams through the engine.

Scenario 2: Load Customer Dimension (Delta Load)

For the delta load, you can use different techniques. Autocorrect load was already illustrated in the previous scenario,

which can result in a full push down. Another alternative is to use the database changed data capture (CDC)

capabilities. Data Integrator has native support for CDC on the database side: for Oracle, Data Integrator can subscribe

to CDC tables to get only the changed data. In this case a throughput similar to that of an initial load can be expected because no extra operations are needed to detect changes.

The third alternative is to use Data Integrator’s Table_comparison transform to compare source and target table in

order to detect changes. This is the most flexible solution (e.g. users can specify which columns to compare instead of comparing all of them), and no special setup at the database side is needed. However, it will involve connections to both the source and target databases, so performance will degrade due to the extra bottlenecks.


To tune performance for the Table_comparison transform, three comparison techniques can be used: row-by-row

select, cached comparison table, or sorted input.

Row-by-row select. For each row from the source, a query is executed on the target to verify whether it is a new, an existing, or a changed row (see the sketch after this list). This is usually the slowest method.

Cached comparison table. The compare columns from the target table are read into memory first and

comparison can be done in memory. This is fast for smaller tables; for larger tables (> 2 GB) paging will occur and

performance will degrade.

Sorted input. Requires incoming data be sorted on the compare keys from both source and target. This will be

fast from a Data Integrator perspective but the additional sorting on the database will put the bottleneck at that

side.
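With row-by-row select, the transform conceptually issues a lookup like the one below for every incoming row (a sketch only; the real statement is generated by the transform and depends on the compare columns and the SCD2 indicator column names chosen in your design):

-- Executed once per source row: fetch the current target version for the comparison key
SELECT C_NAME, C_CREDIT_LIM /* , other compare columns */
FROM   "TPC_USER"."CUSTOMER"
WHERE  C_ID = :1                               -- bind variable: key of the incoming row
  AND  CURRENT_RECORD = 'Y';                   -- only compare against the current SCD2 version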

In order to implement a slowly changing dimension type two, we keep history for each change made. The old record is

updated with a new ‘valid to’ date and the ‘current record’ indicator is reset. For the new version of the record, a new

data warehouse key is generated and ‘valid from,’ ‘valid to,’ and ‘current record’ indicator are set.
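The effect of this history preservation on the target is roughly equivalent to the following pair of statements per changed customer (a sketch; the audit column names, the open-ended end date, and the key handling are assumptions, and in the real job they are produced by the Table_Comparison, History_Preserving, and Key_Generation transforms):

-- 1. Close the old version of the record
UPDATE "TPC_USER"."CUSTOMER"
SET    VALID_TO = :load_date,                  -- end of validity for the old version
       CURRENT_RECORD = 'N'
WHERE  C_ID = :1
  AND  CURRENT_RECORD = 'Y';

-- 2. Insert the new version with a freshly generated data warehouse key
INSERT INTO "TPC_USER"."CUSTOMER"
  (CUSTOMER_KEY, C_ID, C_NAME, C_CREDIT_LIM, /* ... */ VALID_FROM, VALID_TO, CURRENT_RECORD)
VALUES
  (:new_key, :1, :2, :3, /* ... */ :load_date, DATE '9999-12-31', 'Y');   -- placeholder 'far future' end date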

The table below compares the three methods; sorted input is by far the fastest method but still significantly slower than

the initial load. Using the cache does not bring a performance benefit here because the cached table is 5 GB and access to

disk for paging slows down performance. The results (below) are with no changes detected between source and target,

so no inserts or updates at the target. Depending on the number of changes, more time will be needed by the database

to handle inserts and updates (bulk loader cannot be used here).

PURE DATA INTEGRATOR ENGINE PERFORMANCE

Scenario 3: Join Tables to Load Fact Table

To load the fact table, we need to join the ORDER (6,000,000 records) and ORDER_LINE tables (60,000,000 records).

This join can be performed on the database or in the Data Integrator engine. In case the tables reside on different

databases (without dblink) or one of them is a text file, joining in the Data Integrator engine will be the only option.

Whereas the common trend is to try to push down as much as possible to the database, this scenario will prove that the Data Integrator engine is capable of delivering performance similar to the database.

To join the two tables in Data Integrator we used the order table, which is the smaller of the two, as the inner table and cached it in memory. The larger order_line table will not be cached in memory but will be read from the database, and each record will be joined with the cached orders table. After joining the two tables, a lookup is also performed to find the customer’s data warehouse key. To do this, the lookup columns from the customer table are cached in memory.
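The logic performed inside the engine is equivalent to the following SQL, shown only for clarity; in this scenario the join and lookup are executed by Data Integrator itself (not pushed down), and the key and column names are assumptions based on the TPC-C schema and our target dimension:

-- Engine-side equivalent: stream order lines, join against the cached orders table,
-- then look up the customer's data warehouse key (with a default for missing keys)
SELECT ol.*,
       nvl(c.CUSTOMER_KEY, -1) AS CUSTOMER_KEY -- default value for a missing dimension key
FROM   "ORABM_DI"."ORDER_LINE" ol
       JOIN "ORABM_DI"."ORDERS" o              -- the smaller orders table is the cached (inner) table
         ON  o.O_W_ID = ol.OL_W_ID
         AND o.O_D_ID = ol.OL_D_ID
         AND o.O_ID   = ol.OL_O_ID
       LEFT JOIN "TPC_USER"."CUSTOMER" c       -- cached lookup columns of the customer dimension
         ON  c.C_ID = o.O_C_ID
         AND c.CURRENT_RECORD = 'Y';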

The key factor in understanding performance for this job is memory usage: we need about 1.2 GB to cache the orders table and 1 GB to cache the customer lookup. So when doing everything in one process, about 2.5 GB is needed for caching plus some additional memory for data processing. For memory usage above 1.5 GB, Data Integrator will start paging to disk and memory access will be slower due to disk I/O. The larger the amount of memory paged to disk, the slower the performance.


To remedy this, it is possible to split a data flow into sub data flows, where each sub data flow can access 1.5 GB of real memory without starting to page. There are different techniques to create sub data flows:

Run as separate process. Memory-intensive operations like lookup, sort, group by, and others have this option. The memory-intensive operation will run in a separate process and will benefit from the additional memory that can be used.

Parallel join. A join can be split into smaller parts, where each joined table is split up into smaller parts which are individually joined and merged afterwards. By using the degree of parallelism (DOP), not only is the join split up into smaller parts, but other transforms in the dataflow will also be parallelized.

Data_transfer transform. This breaks up a dataflow at any point into sub data flows. The Data_transfer transform has the option to write to a table in order to use the database again for push down in downstream operations.

Each of these options can easily be set in an existing data flow without redesigning it (e.g. ‘run as separate process’ is just a checkbox). For our test we compare three execution plans:

1. Single process using pageable cache

2. Use lookup as a separate process: one sub data flow for the join and a second sub data flow for the lookup. Each sub data flow will use less than 1.5 GB, so no paging will occur.

3. Split the join into two parallel joins, which will result in three sub data flows: two of them each performing half of the join, and a third that merges the result and performs the lookup.

Below is a graph showing CPU use when the lookup is executed as a separate process. This time CPU use is close to the maximum (85%); total memory used is 2.4 GB.


The next graph shows data for the same execution, this time for two individual sub data flows (using 1.1 GB and 1.3

GB of memory each).

The results for our three tests are very similar and result in a maximum throughput of 27 GB/Hour. When more memory

is used, pageable cache performance will be worse, giving more benefit to splitting up the process into sub data flows that

fit into memory.

When the amount of memory needed by all sub data flows exceeds the total memory available on the server, paging will occur at the operating system level (Windows). This too will have a negative impact on performance. A solution here might be to use grid computing and benefit from the memory available on other servers in a Data Integrator server group.


Scenario 4: Aggregate Data from File

We illustrate pure Data Integrator engine performance by aggregating a 6 GB text file (60,000,000 records) to 2,000

groups. The parameters we tune in this scenario to improve performance are:

Parallel file reader. With the option ‘parallel process threads’ (PPT) one can specify the number of parallel

threads that will be used to convert the text string read from the flat file to data organised in columns based on

the schema defined in the file format. This process is much more resource consuming than the reading from

disk itself.

Degree of parallelism (DOP). By setting a DOP greater than one for a dataflow, transforms in the dataflow will

be replicated and executed as parallel threads. Each replicated transform will treat only part of the records,

where data is distributed by a round-robin splitter to the different parallel transforms and merged again after the

last replicated transform.

The table below shows the results for various values for the two parameters. For our case the optimal combination is

found with four parallel threads for reading the file and no parallelism for the group by calculation (DOP = 0). This gives a throughput of 74 GB/Hr, more than doubling the throughput of the case without parallelism (33 GB/Hr).

In this case the main bottleneck is reading and interpreting the file. Increasing the degree of parallelism does not make sense if the reader cannot deliver the data fast enough. That is why, in this example, increasing the number of parallel threads for reading improves performance dramatically, but the degree of parallelism does not add extra value and in fact decreases performance due to the extra overhead. The decrease in performance for higher numbers of threads (> 4) occurs because the CPUs are already 100% used; adding more processes does not improve performance, and overall performance will go down because of the extra overhead (more processes to control, splitting and merging more data, etc.).

In this scenario there are no external bottlenecks like database or network, so we can further improve performance simply by adding more CPU power to the job server. The same job was executed on a machine with four dual-core CPUs (eight logical CPUs available). The maximum throughput in this case increases to 127 GB/Hr. Detailed results are in the table below:


With more CPUs available, the degree of parallelism starts to play a role when we have eight parallel reader processes. Only in that case does the group by step get data fast enough to make it efficient to handle the grouping in parallel.

 As a general rule of thumb we can set the number of parallel threads equal to the number of (logical) CPUs available.

Parallel threads in this case can be file reader threads or parallel transforms (DOP).

With this rule, it’s easy to get significant performance gain (although probably not yet the optimal case) without

spending a lot of time on tuning. For our test cases this would give eight readers and DOP = 8 for the server with four dual-core CPUs. This configuration comes close to the optimal configuration.

The results for both servers are summarized below in a chart comparing the original dataflow without parallelism, the

configuration with the best performance, and the configuration where we used the general rule of thumb.


Finally, we used this scenario to showcase the Data_Transfer transform. Using this transform, we can easily (within one dataflow) copy the text file to a database table in order to push the aggregation down to the database (using an INSERT INTO ... SELECT statement, sketched below). This might give good results when the database server is much more powerful than the Data Integrator server. In our test case, aggregation in the database or inside Data Integrator takes about the same time. The results for the push down to the database are worse in this case because they include the additional time for loading the file from the job server to the database first (using Data_Transfer).
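A sketch of the pushed-down aggregation once Data_Transfer has staged the flat file into a database table; the staging and result table names used here are placeholders:

-- Aggregation handled by the database after the 60,000,000-row file has been staged
INSERT INTO "TPC_USER"."AGG_RESULT" (GROUP_KEY, TOTAL_AMOUNT)
SELECT GROUP_KEY,
       sum(AMOUNT)                             -- one sum per group
FROM   "TPC_USER"."FILE_STAGE"                 -- table populated by the Data_Transfer transform
GROUP BY GROUP_KEY;                            -- 60,000,000 input rows reduced to 2,000 groups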


GRID COMPUTING

The idea of grid computing is to set up a group of Data Integrator job servers to act as a server grid. When a job is

launched on a server group, Data Integrator’s load balancing algorithm will send the job to the server in the group with

the lowest workload at that time. At the same time this is also a failover solution: when one of the servers in the grid fails, new jobs submitted to the server group will be handled by the remaining servers. The load balancer is not a separate process but is built into the job launcher process to avoid a single point of failure.

When a job contains multiple data flows, the load balancer will, for each data flow, find the optimal job server,

spreading one job over any number of job servers. This will be very beneficial when data flows can be launched in

parallel (no dependencies defined). With the introduction of sub data flows in Data Integrator XI Release 2 Accelerated,

grid computing is extended with one extra level: even for sub data flows, you can use load balancing over a server 

group. This can be useful for data flows requiring lots of memory with limited physical memory available on an

individual server. In that case, you can avoid paging by splitting the dataflow into sub data flows using the distributed data flows options (e.g. ‘run as a separate process’) and executing each sub data flow on a separate machine. Each sub data flow can use more memory, but there will be extra overhead because all data needs to be exchanged between the different sub data flows. On fast (gigabit) networks, the overhead introduced by sending data from one sub data flow over the network to another can be compensated by the fact that memory can be accessed directly (without paging occurring).