performance tuning for data services

Post on 04-Jun-2018

219 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 1/43

September 9 –11, 2013

Anaheim, California

My ETL is faster than your database!

Advanced ETL performance tuning for Data Services

Jim Egan

Senior Consultant – Mantis Technology Group, Inc.

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 2/43

2

Learning Points

Tuning ETL and database objects together

can improve overall job execution speed

Views can be great for ETL performance

Complete pushdown to the database isn'talways the best solution

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 3/43

3

Tuning

The Holistic approach to tuning

ETL should never be developed as if the

database was a black box

ETL and DDL are created to complimenteach other

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 4/43

Real world use case

Associated Staging

4

Tuning

DW

Oracle Instance

Stage_BoraBora

Stage_Samoa

Stage_Tahiti

Stage_TongaTonga

Tahiti

Samoa

BoraBora

   S  o  u  r  c  e

   S  t  a  g   i  n  g

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 5/43

5

Tuning

Associated Staging - Benefits

No contention compared to a single

staging table

Jobs run in parallel for all sources

Great place to put views that filter data

down to a specific source

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 6/43

6

Tuning

Associated Staging - Costs

More DDL to deploy

Each schema/database is identical

Requires use of System Configuration andaliases in ETL

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 7/43

7

Tuning

What’s missing? 

Associated partitions in target table

P_BoraBora

P_Samoa

P_Tahiti

P_Tonga

Customer_DimCustomer_Dim

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 8/43

8

Tuning

Associated partitions – Benefits

Partition exchange

Staging Views are linked to a partition

No WHERE clause filterFROM <tbl> PARTITION (<partition>)

Zero contention on target table

Compression by partition

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 9/43

9

Tuning

Associated partitions – Benefits

Very fast archive/truncate of old data if

partitioned by date

Parallel truncate/load is now possible

“Dramatically improves query

performance”, according to Oracle 

Transparent to reports

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 10/43

10

Tuning

Associated partitions – Costs

Data Services doesn’t do partition

exchange

Requires a small amount of scripting

Bulk Loader should be avoided

Be careful when enabling Partitions in the

Dataflow target and source tables

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 11/43

11

Tuning

Using partitions in a Dataflow

Lookups

Table Comparison transform

SQL transform

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 12/43

12

Tuning – Using Partitions

Lookups

Q: How is this expression handled by the

database?

A: It isn’t!

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 13/43

13

Tuning – Using Partitions

Lookups

The expression is evaluated at the job

server

All rows from the table are cached

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 14/43

14

Tuning – Using Partitions

Lookups

The fix: create a view in the staging

schemaCREATE VIEW STAGE_BORABORA .CUSTOMER_DIM

 AS

SELECT *

FROM DW .CUSTOMER_DIM

PARTITION(P_BORABORA);

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 15/43

15

Tuning – Using Partitions

Lookups

Change the lookup 

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 16/43

16

Tuning – Using Partitions

Lookup using views - Benefit

Populates in 25% of the time

Uses 25% of the memory

Lookup process is fasterViews could be created with or without

partitions 

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 17/43

17

Tuning – Using Partitions

Lookup using views - Cost

Extra DDL to create

Additional “table” to import into the

Datastore and migrateNot worth the effort for small look up

tables 

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 18/43

18

Tuning – Using Partitions

Table Comparison

using views

Cursor of the

entire comparetable

Will this use an index?

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 19/43

19

Tuning – Using Partitions

Table Comparison using views

SQL submitted to create cursor

SELECT … 

FROM DW.CUSTOMER_DIM

ORDER BY NLSSORT( SOURCE_SYSTEM,

'NLS_SORT=BINARY') ASC,

CUSTOMER_ID ASC

An index on SOURCE_SYSTEM, CUSTOMER_ID

will not be used because of the function

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 20/43

20

Tuning – Using Partitions

Synchronizing the Index with ETL

To support this: ORDER BY NLSSORT( SOURCE_SYSTEM, 'NLS_SORT=BINARY') ASC,

CUSTOMER_ID ASC

The (function) index has to look exactly like

this:CREATE INDEX CUSTOMER_DIM_IDX1

ON CUSTOMER_DIM(

 NLSSORT( SOURCE_SYSTEM, 'NLS_SORT=BINARY') ASC,

CUSTOMER_ID ASC);

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 21/43

21

Tuning – Using Partitions

Matching Index definition to ETL usage

• Is it worth it?

• In this case it improved performance

• Table compare cursor used the index• Insert was slowed very slightly

• Reporting will never use the function

index

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 22/43

22

Tuning – Using Partitions

Table Comparison

using views

Cursor of only the

relevant partition 

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 23/43

23

Tuning – Using Partitions

Table Comparison using views

SQL submitted to create cursor

SELECT … 

FROM STAGE_BORABORA.CUSTOMER_DIM

 WHERE CUSTOMER_ID >= <CUSTOMER_ID>

ORDER BY CUSTOMER_ID ASC

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 24/43

24

Tuning – Using Partitions

Table Comparison using views - Benefit

Cursor has 25% of the scope of data

Could run in as little as 25% of the time

Requires an index only on CUSTOMER_IDWHERE clause in now included against

the compare table 

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 25/43

25

Tuning – Using Partitions

Table Comparison using views - Cost

Extra DDL

Additional “table” to import into the

Datastore and migrateNot worth the effort for small look up

tables

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 26/43

26

Tuning – New Feature

Table Comparison – New feature

Does it make views obsolete?

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 27/43

27

Tuning – New Feature

Table Comparison – New feature

Very close to the same performance as a

partition sourced view

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 28/43

28

Tuning – New Feature

Table Comparison – New feature

What else could it be used for?

CURRENT_IND = ‘Y’ 

DELETED_IND = ‘N’ 

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 29/43

29

Tuning

Table Comparison – compare methods

Compare

Method

Input

Size

Compare

Table Size Pro/Con

Row-by-row Small Any No memory impact

Cached Small Small Fastest compare, largepotential memory

impact

Sorted Large Large No memory impact,

result set must besorted to match

compare table PK

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 30/43

30

Tuning

Table Comparison – compare method speed

Input Rows  Compare Rows  Compare Method  Execution Time 

3,240,081 6,995,950Sorted Input 3,268 seconds

Row-by-row-select 31,657 seconds

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 31/43

31

Tuning

Display Optimized SQL

Show joins pushed down to the database

Show columns in result set

Show Group By and Order By pushed down

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 32/43

32

Tuning

Display Optimized SQL

Bad joins

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 33/43

33

Tuning

Display Optimized SQL

Good joins Allmappings

here?All

expressionshere?

Should there be an ORDER BY or GROUP BY?

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 34/43

34

Tuning

SQL Transform

• Transform of last resort

• Use the Query transform first

• No metadata captured from SQL transform

•Use SQL transform when

• Query is very complex

• Optimizer hints or partition name has to be supplied

• Joins are not pushed down (by a query transform)

• Variables can be passed in

• Columns must be named

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 35/43

35

Tuning

SQL Transform - Example

Clients

 Join – 

 atthe job

server

Max(contract number)

by client (cached)

35

Compare – 

 only updatechanged Clients

X

i

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 36/43

36

Tuning

SQL Transform – Example (optimized)

Clients with new

max(contract number)

Normal to Update

36

T i

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 37/43

37

Tuning

SQL Transform - Example

Max

contract

number

Only the clients that need to be updated

T i

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 38/43

SQL Transform – Example with partitioning

38

Tuning

Partition name as a parameter

T i

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 39/43

39

Tuning

SQL Transform – Example (with metadata solution)

T i

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 40/43

40

Tuning

What is this? Join at the job server

Full cached

result

sets of

staging and

target

Was target

row found?

Same table!

T i

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 41/43

41

Tuning

Much better implementation Less development time

Better execution time

Very low memory utilization

K L i

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 42/43

52

Key Learnings

Action Benefit

Tune ETL and DDL

together

Better performance

Associated Staging,

Partitions

Reduce contention, parallel

loads

SQL Transform Reduce complexity,

performance

Lookups Eliminate database joins,

performance

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 43/43

Thank you for participating.

Please provide feedback on this session bycompleting a short survey via the event

mobile application.SESSION CODE: 0214

Learn more year-round at www.asug.com

top related