performance tuning for data services

43
8/13/2019 Performance Tuning for Data Services http://slidepdf.com/reader/full/performance-tuning-for-data-services 1/43 September 9  –11, 2013 Anaheim, California My ETL is faster than your database! Advanced ETL performance tuning for Data Services Jim Egan Senior Consultant – Mantis Technology Group, Inc.

Upload: venkat-raman-gajjala

Post on 04-Jun-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 1/43

September 9 –11, 2013

Anaheim, California

My ETL is faster than your database!

Advanced ETL performance tuning for Data Services

Jim Egan

Senior Consultant – Mantis Technology Group, Inc.

Page 2: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 2/43

2

Learning Points

Tuning ETL and database objects together

can improve overall job execution speed

Views can be great for ETL performance

Complete pushdown to the database isn'talways the best solution

Page 3: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 3/43

3

Tuning

The Holistic approach to tuning

ETL should never be developed as if the

database was a black box

ETL and DDL are created to complimenteach other

Page 4: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 4/43

Real world use case

Associated Staging

4

Tuning

DW

Oracle Instance

Stage_BoraBora

Stage_Samoa

Stage_Tahiti

Stage_TongaTonga

Tahiti

Samoa

BoraBora

   S  o  u  r  c  e

   S  t  a  g   i  n  g

Page 5: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 5/43

5

Tuning

Associated Staging - Benefits

No contention compared to a single

staging table

Jobs run in parallel for all sources

Great place to put views that filter data

down to a specific source

Page 6: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 6/43

6

Tuning

Associated Staging - Costs

More DDL to deploy

Each schema/database is identical

Requires use of System Configuration andaliases in ETL

Page 7: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 7/43

7

Tuning

What’s missing? 

Associated partitions in target table

P_BoraBora

P_Samoa

P_Tahiti

P_Tonga

Customer_DimCustomer_Dim

Page 8: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 8/43

8

Tuning

Associated partitions – Benefits

Partition exchange

Staging Views are linked to a partition

No WHERE clause filterFROM <tbl> PARTITION (<partition>)

Zero contention on target table

Compression by partition

Page 9: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 9/43

9

Tuning

Associated partitions – Benefits

Very fast archive/truncate of old data if

partitioned by date

Parallel truncate/load is now possible

“Dramatically improves query

performance”, according to Oracle 

Transparent to reports

Page 10: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 10/43

10

Tuning

Associated partitions – Costs

Data Services doesn’t do partition

exchange

Requires a small amount of scripting

Bulk Loader should be avoided

Be careful when enabling Partitions in the

Dataflow target and source tables

Page 11: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 11/43

11

Tuning

Using partitions in a Dataflow

Lookups

Table Comparison transform

SQL transform

Page 12: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 12/43

12

Tuning – Using Partitions

Lookups

Q: How is this expression handled by the

database?

A: It isn’t!

Page 13: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 13/43

13

Tuning – Using Partitions

Lookups

The expression is evaluated at the job

server

All rows from the table are cached

Page 14: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 14/43

14

Tuning – Using Partitions

Lookups

The fix: create a view in the staging

schemaCREATE VIEW STAGE_BORABORA .CUSTOMER_DIM

 AS

SELECT *

FROM DW .CUSTOMER_DIM

PARTITION(P_BORABORA);

Page 15: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 15/43

15

Tuning – Using Partitions

Lookups

Change the lookup 

Page 16: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 16/43

16

Tuning – Using Partitions

Lookup using views - Benefit

Populates in 25% of the time

Uses 25% of the memory

Lookup process is fasterViews could be created with or without

partitions 

Page 17: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 17/43

17

Tuning – Using Partitions

Lookup using views - Cost

Extra DDL to create

Additional “table” to import into the

Datastore and migrateNot worth the effort for small look up

tables 

Page 18: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 18/43

18

Tuning – Using Partitions

Table Comparison

using views

Cursor of the

entire comparetable

Will this use an index?

Page 19: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 19/43

19

Tuning – Using Partitions

Table Comparison using views

SQL submitted to create cursor

SELECT … 

FROM DW.CUSTOMER_DIM

ORDER BY NLSSORT( SOURCE_SYSTEM,

'NLS_SORT=BINARY') ASC,

CUSTOMER_ID ASC

An index on SOURCE_SYSTEM, CUSTOMER_ID

will not be used because of the function

Page 20: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 20/43

20

Tuning – Using Partitions

Synchronizing the Index with ETL

To support this: ORDER BY NLSSORT( SOURCE_SYSTEM, 'NLS_SORT=BINARY') ASC,

CUSTOMER_ID ASC

The (function) index has to look exactly like

this:CREATE INDEX CUSTOMER_DIM_IDX1

ON CUSTOMER_DIM(

 NLSSORT( SOURCE_SYSTEM, 'NLS_SORT=BINARY') ASC,

CUSTOMER_ID ASC);

Page 21: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 21/43

21

Tuning – Using Partitions

Matching Index definition to ETL usage

• Is it worth it?

• In this case it improved performance

• Table compare cursor used the index• Insert was slowed very slightly

• Reporting will never use the function

index

Page 22: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 22/43

22

Tuning – Using Partitions

Table Comparison

using views

Cursor of only the

relevant partition 

Page 23: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 23/43

23

Tuning – Using Partitions

Table Comparison using views

SQL submitted to create cursor

SELECT … 

FROM STAGE_BORABORA.CUSTOMER_DIM

 WHERE CUSTOMER_ID >= <CUSTOMER_ID>

ORDER BY CUSTOMER_ID ASC

Page 24: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 24/43

24

Tuning – Using Partitions

Table Comparison using views - Benefit

Cursor has 25% of the scope of data

Could run in as little as 25% of the time

Requires an index only on CUSTOMER_IDWHERE clause in now included against

the compare table 

Page 25: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 25/43

25

Tuning – Using Partitions

Table Comparison using views - Cost

Extra DDL

Additional “table” to import into the

Datastore and migrateNot worth the effort for small look up

tables

Page 26: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 26/43

26

Tuning – New Feature

Table Comparison – New feature

Does it make views obsolete?

Page 27: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 27/43

27

Tuning – New Feature

Table Comparison – New feature

Very close to the same performance as a

partition sourced view

Page 28: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 28/43

28

Tuning – New Feature

Table Comparison – New feature

What else could it be used for?

CURRENT_IND = ‘Y’ 

DELETED_IND = ‘N’ 

Page 29: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 29/43

29

Tuning

Table Comparison – compare methods

Compare

Method

Input

Size

Compare

Table Size Pro/Con

Row-by-row Small Any No memory impact

Cached Small Small Fastest compare, largepotential memory

impact

Sorted Large Large No memory impact,

result set must besorted to match

compare table PK

Page 30: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 30/43

30

Tuning

Table Comparison – compare method speed

Input Rows  Compare Rows  Compare Method  Execution Time 

3,240,081 6,995,950Sorted Input 3,268 seconds

Row-by-row-select 31,657 seconds

Page 31: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 31/43

31

Tuning

Display Optimized SQL

Show joins pushed down to the database

Show columns in result set

Show Group By and Order By pushed down

Page 32: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 32/43

32

Tuning

Display Optimized SQL

Bad joins

Page 33: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 33/43

33

Tuning

Display Optimized SQL

Good joins Allmappings

here?All

expressionshere?

Should there be an ORDER BY or GROUP BY?

Page 34: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 34/43

34

Tuning

SQL Transform

• Transform of last resort

• Use the Query transform first

• No metadata captured from SQL transform

•Use SQL transform when

• Query is very complex

• Optimizer hints or partition name has to be supplied

• Joins are not pushed down (by a query transform)

• Variables can be passed in

• Columns must be named

Page 35: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 35/43

35

Tuning

SQL Transform - Example

Clients

 Join – 

 atthe job

server

Max(contract number)

by client (cached)

35

Compare – 

 only updatechanged Clients

X

i

Page 36: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 36/43

36

Tuning

SQL Transform – Example (optimized)

Clients with new

max(contract number)

Normal to Update

36

T i

Page 37: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 37/43

37

Tuning

SQL Transform - Example

Max

contract

number

Only the clients that need to be updated

T i

Page 38: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 38/43

SQL Transform – Example with partitioning

38

Tuning

Partition name as a parameter

T i

Page 39: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 39/43

39

Tuning

SQL Transform – Example (with metadata solution)

T i

Page 40: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 40/43

40

Tuning

What is this? Join at the job server

Full cached

result

sets of

staging and

target

Was target

row found?

Same table!

T i

Page 41: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 41/43

41

Tuning

Much better implementation Less development time

Better execution time

Very low memory utilization

K L i

Page 42: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 42/43

52

Key Learnings

Action Benefit

Tune ETL and DDL

together

Better performance

Associated Staging,

Partitions

Reduce contention, parallel

loads

SQL Transform Reduce complexity,

performance

Lookups Eliminate database joins,

performance

Page 43: Performance Tuning for Data Services

8/13/2019 Performance Tuning for Data Services

http://slidepdf.com/reader/full/performance-tuning-for-data-services 43/43

Thank you for participating.

Please provide feedback on this session bycompleting a short survey via the event

mobile application.SESSION CODE: 0214

Learn more year-round at www.asug.com