performance tuning for data services
TRANSCRIPT
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 1/43
September 9 –11, 2013
Anaheim, California
My ETL is faster than your database!
Advanced ETL performance tuning for Data Services
Jim Egan
Senior Consultant – Mantis Technology Group, Inc.
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 2/43
2
Learning Points
Tuning ETL and database objects together
can improve overall job execution speed
Views can be great for ETL performance
Complete pushdown to the database isn'talways the best solution
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 3/43
3
Tuning
The Holistic approach to tuning
ETL should never be developed as if the
database was a black box
ETL and DDL are created to complimenteach other
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 4/43
Real world use case
Associated Staging
4
Tuning
DW
Oracle Instance
Stage_BoraBora
Stage_Samoa
Stage_Tahiti
Stage_TongaTonga
Tahiti
Samoa
BoraBora
S o u r c e
S t a g i n g
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 5/43
5
Tuning
Associated Staging - Benefits
No contention compared to a single
staging table
Jobs run in parallel for all sources
Great place to put views that filter data
down to a specific source
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 6/43
6
Tuning
Associated Staging - Costs
More DDL to deploy
Each schema/database is identical
Requires use of System Configuration andaliases in ETL
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 7/43
7
Tuning
What’s missing?
Associated partitions in target table
P_BoraBora
P_Samoa
P_Tahiti
P_Tonga
Customer_DimCustomer_Dim
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 8/43
8
Tuning
Associated partitions – Benefits
Partition exchange
Staging Views are linked to a partition
No WHERE clause filterFROM <tbl> PARTITION (<partition>)
Zero contention on target table
Compression by partition
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 9/43
9
Tuning
Associated partitions – Benefits
Very fast archive/truncate of old data if
partitioned by date
Parallel truncate/load is now possible
“Dramatically improves query
performance”, according to Oracle
Transparent to reports
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 10/43
10
Tuning
Associated partitions – Costs
Data Services doesn’t do partition
exchange
Requires a small amount of scripting
Bulk Loader should be avoided
Be careful when enabling Partitions in the
Dataflow target and source tables
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 11/43
11
Tuning
Using partitions in a Dataflow
Lookups
Table Comparison transform
SQL transform
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 12/43
12
Tuning – Using Partitions
Lookups
Q: How is this expression handled by the
database?
A: It isn’t!
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 13/43
13
Tuning – Using Partitions
Lookups
The expression is evaluated at the job
server
All rows from the table are cached
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 14/43
14
Tuning – Using Partitions
Lookups
The fix: create a view in the staging
schemaCREATE VIEW STAGE_BORABORA .CUSTOMER_DIM
AS
SELECT *
FROM DW .CUSTOMER_DIM
PARTITION(P_BORABORA);
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 15/43
15
Tuning – Using Partitions
Lookups
Change the lookup
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 16/43
16
Tuning – Using Partitions
Lookup using views - Benefit
Populates in 25% of the time
Uses 25% of the memory
Lookup process is fasterViews could be created with or without
partitions
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 17/43
17
Tuning – Using Partitions
Lookup using views - Cost
Extra DDL to create
Additional “table” to import into the
Datastore and migrateNot worth the effort for small look up
tables
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 18/43
18
Tuning – Using Partitions
Table Comparison
using views
Cursor of the
entire comparetable
Will this use an index?
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 19/43
19
Tuning – Using Partitions
Table Comparison using views
SQL submitted to create cursor
SELECT …
FROM DW.CUSTOMER_DIM
ORDER BY NLSSORT( SOURCE_SYSTEM,
'NLS_SORT=BINARY') ASC,
CUSTOMER_ID ASC
An index on SOURCE_SYSTEM, CUSTOMER_ID
will not be used because of the function
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 20/43
20
Tuning – Using Partitions
Synchronizing the Index with ETL
To support this: ORDER BY NLSSORT( SOURCE_SYSTEM, 'NLS_SORT=BINARY') ASC,
CUSTOMER_ID ASC
The (function) index has to look exactly like
this:CREATE INDEX CUSTOMER_DIM_IDX1
ON CUSTOMER_DIM(
NLSSORT( SOURCE_SYSTEM, 'NLS_SORT=BINARY') ASC,
CUSTOMER_ID ASC);
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 21/43
21
Tuning – Using Partitions
Matching Index definition to ETL usage
• Is it worth it?
• In this case it improved performance
• Table compare cursor used the index• Insert was slowed very slightly
• Reporting will never use the function
index
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 22/43
22
Tuning – Using Partitions
Table Comparison
using views
Cursor of only the
relevant partition
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 23/43
23
Tuning – Using Partitions
Table Comparison using views
SQL submitted to create cursor
SELECT …
FROM STAGE_BORABORA.CUSTOMER_DIM
WHERE CUSTOMER_ID >= <CUSTOMER_ID>
ORDER BY CUSTOMER_ID ASC
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 24/43
24
Tuning – Using Partitions
Table Comparison using views - Benefit
Cursor has 25% of the scope of data
Could run in as little as 25% of the time
Requires an index only on CUSTOMER_IDWHERE clause in now included against
the compare table
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 25/43
25
Tuning – Using Partitions
Table Comparison using views - Cost
Extra DDL
Additional “table” to import into the
Datastore and migrateNot worth the effort for small look up
tables
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 26/43
26
Tuning – New Feature
Table Comparison – New feature
Does it make views obsolete?
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 27/43
27
Tuning – New Feature
Table Comparison – New feature
Very close to the same performance as a
partition sourced view
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 28/43
28
Tuning – New Feature
Table Comparison – New feature
What else could it be used for?
CURRENT_IND = ‘Y’
DELETED_IND = ‘N’
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 29/43
29
Tuning
Table Comparison – compare methods
Compare
Method
Input
Size
Compare
Table Size Pro/Con
Row-by-row Small Any No memory impact
Cached Small Small Fastest compare, largepotential memory
impact
Sorted Large Large No memory impact,
result set must besorted to match
compare table PK
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 30/43
30
Tuning
Table Comparison – compare method speed
Input Rows Compare Rows Compare Method Execution Time
3,240,081 6,995,950Sorted Input 3,268 seconds
Row-by-row-select 31,657 seconds
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 31/43
31
Tuning
Display Optimized SQL
Show joins pushed down to the database
Show columns in result set
Show Group By and Order By pushed down
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 32/43
32
Tuning
Display Optimized SQL
Bad joins
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 33/43
33
Tuning
Display Optimized SQL
Good joins Allmappings
here?All
expressionshere?
Should there be an ORDER BY or GROUP BY?
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 34/43
34
Tuning
SQL Transform
• Transform of last resort
• Use the Query transform first
• No metadata captured from SQL transform
•Use SQL transform when
• Query is very complex
• Optimizer hints or partition name has to be supplied
• Joins are not pushed down (by a query transform)
• Variables can be passed in
• Columns must be named
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 35/43
35
Tuning
SQL Transform - Example
Clients
Join –
atthe job
server
Max(contract number)
by client (cached)
35
Compare –
only updatechanged Clients
X
i
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 36/43
36
Tuning
SQL Transform – Example (optimized)
Clients with new
max(contract number)
Normal to Update
36
T i
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 37/43
37
Tuning
SQL Transform - Example
Max
contract
number
Only the clients that need to be updated
T i
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 38/43
SQL Transform – Example with partitioning
38
Tuning
Partition name as a parameter
T i
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 39/43
39
Tuning
SQL Transform – Example (with metadata solution)
T i
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 40/43
40
Tuning
What is this? Join at the job server
Full cached
result
sets of
staging and
target
Was target
row found?
Same table!
T i
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 41/43
41
Tuning
Much better implementation Less development time
Better execution time
Very low memory utilization
K L i
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 42/43
52
Key Learnings
Action Benefit
Tune ETL and DDL
together
Better performance
Associated Staging,
Partitions
Reduce contention, parallel
loads
SQL Transform Reduce complexity,
performance
Lookups Eliminate database joins,
performance
8/13/2019 Performance Tuning for Data Services
http://slidepdf.com/reader/full/performance-tuning-for-data-services 43/43
Thank you for participating.
Please provide feedback on this session bycompleting a short survey via the event
mobile application.SESSION CODE: 0214
Learn more year-round at www.asug.com