db2 sql tuning.ppt

8/14/2019 DB2 SQL TUNING.ppt

1/53


2/53

Topics of Discussion

General Recommendation

Predicates Evaluation

Filter Factor

Column Correlation

Writing Subquery

Special Techniques to influence access path Using DB2 EXPLAIN

Different Access Types


3/53

Topics of Discussion

Join Methods

DB2 Data and Index page prefetch

Sorting of Data and RIDs

View Merge/Materialization

Query Parallelism

DB2 V5 Features


4/53


Make Sure

Queries are as simple as possible

Unused rows are not fetched. Filtering to be doneby DB2 not in the application program.

Unused columns are not selected

There is no unnecessary ORDER BY or GROUPBY Clause

Use page level locking and try to minimize lock

duration.

Big tables should be handled with care.


5/53


Try to use indexable predicates wherever possible

Do not code redundant predicates

Make sure that declared length of host variable isnot greater than length attribute of data column.

If there are efficient indexes available on the

tables in the subquery, co-related subquery willperform better. Otherwise no co related subquery

will perform better.

If there are multiple subqueries, make sure that

they are ordered in efficient manner.


6/53

Predicate

Predicates are found on WHERE, ON and

HAVING clause of the SQL. ON predicates are

applied first, then WHERE predicates and afterdata access HAVING predicates are applied.

Predicates on HAVING clause are not used when

accessing data

Predicate types have great impact on choosing

DB2 access path.

General Predicate Types

- Subquery, Equal, Range, IN-List and NOT


7/53

Predicate

Predicates are also categorized into Indexable and

non-Indexable predicates.

Predicates are classified into Stage 1 and Stage 2predicates depending on when they are used

during query evaluation.

For Outer join, predicates on ON clause are

treated as stage 2 predicates. And most of the

other predicates are applied after JOIN as stage 2

predicates. Predicates on table expression can be

evaluated before join as stage 1 predicate


8/53

Order of Predicate Evaluation

Predicates are evaluated in following sequence :

Indexable matching predicates are accessed first

Next is Indexable non-matching index. Processingis called index screening

All Indexable predicates are stage 1 predicates

After data page access, other stage 1 predicates areapplied

Finally stage 2 predicates are applied in returned

data rows


9/53

Order of Predicate Evaluation

Predicate evaluation within each stage :

All equal predicates are applied first

Range predicates are next

All other predicates at the last

After both of the sets of rule are applied,predicates are evaluated in the order, they appear

in the query.


10/53

Filtering

DB2 will apply the most restrictive predicate first

to reduce the processing at next stage for stage 1

predicates. It is better to code predicate with high filtering

factor first

DB2 evaluates the filter factor based on catalog

information in SYSCOLUMNS and

SYSCOLDIST tables.


11/53

Filtering

If distribution of column value is not available in

catalog table SYSIBM.SYSCOLDIST, DB2

assumes normal distribution for filtering. Make sure the catalog tables are updated either

manually or running RUNSTATS.

If there is no information available from the

catalog tables, DB2 assumes default filter factor.

DB2 uses default filter factor for predicates in

static SQLs using host variables.


12/53

Column Correlation

Two columns of a table are said to be co-related if

they are depended on each other.

DB2 might not determine optimum access path,table order or join method when query uses highly

correlated columns.

Column correlation makes query cheaper than

actually they are.

Run RUNSTATS to update the catalog tables with

correct correlation to help DB2 to find actual

filtering factor.


13/53

Using host variable effectively

If static SQL has host variables, DB2 might not

select the optimum access path as it uses default

filter factor for the predicates using host variable. There are two ways to change the access path for

the query that contains host variables

- Using REOPT(VARS) option to change the

access path at run time.

- Rewrite the SQL in a different way.


14/53

Writing Subquery

Subquery could be correlated and non-correlated

For Correlated subquery, for each row, returned

from the outer query, the subquery is evaluated. For non-correlated subquery, Inner query is

evaluated first and then the outer query

DB2 sometime can transform the subquery to join

and sometime application programmer has to do it

for better performance

Any subquery could be transformed to a join


15/53

Writing Subquery

If you use columns from the both the tables, better

to use JOIN

Guidelines for Writing efficient subquery : If there are efficient indexes available on the tables in the

subquery, then a correlated subquery is likely to be the

most efficient kind of subquery.

If there are no efficient indexes available on the tables inthe subquery, then a non-correlated subquery would likely

perform better.

If there are multiple subqueries in any parent query, make

sure that the subqueries are ordered in the most efficientmanner.


16/53

Some Special Techniques

OPTIMIZE OF n ROWS

Reducing the number of matching columns for

index scan Adding extra local predicates

Changing inner join to outer join

Updating Catalog Statistics


17/53

OPTIMIZE FOR n ROWS

DB2 Chooses the access path that minimizes the

response time for retrieving the first few rows

Using OPTIMIZE FOR does not stop the useraccessing whole result set.

This is not useful when DB2 has to gather whole

result set before returning the first n rows.


18/53

Influencing access path

DB2 evaluates the access path based on

information available in catalog tables

Wrong catalog information or unavailable cataloginformation may result in selection of wrong

access path

Wrong access path could be because of wrong

index selection

It also could be of index selection where

tablespace scan is effective


19/53


20/53

Other factors

Adding extra predicate may influence in selection

of join method

If you have extra predicate, Nested loop join maybe selected as DB2 assumes that filter factor will

be high. The proper type of predicate to add is

WHERE T1.C1 = T1.C1

Hybrid join is a costlier method. Outer join does

not use hybrid join. So If hybrid join is used by

DB2, convert inner join to outer join and add extra

predicates to removes unneeded rows.


21/53

Updating catalog tables

Access path based on catalog column values

Catalog tables could be updated manually or

running RUNSTATS with appropriate options Tables which are frequently changed, access

method on them may suffer as statistics are not

reflected

Running RUNSTATS is a costly process. So

catalog statistics manually should be updated.

It is necessary to rebind the static SQLs after

catalog statistics update


22/53

DB2 EXPLAIN AND TUNING

EXPLAIN is a monitoring tool that produces

information about a plan, package, or SQL

statement when it is bound. The output appears ina user-supplied table called PLAN_TABLE

It helps you to do the following

Design databases, indexes, and application

programs

Determine when to rebind an application

Determine the access path chosen for a query


23/53

DB2 EXPLAIN OUTPUT

Explain output is stored in PLAN_TABLE

Each plan is identified by APPLNAME column

Each package is identified by PROGNAME,COLLID and VERSION

In each package, you might have multiple SQLs

and each is identified by QUERYNO

For each query will be evaluated in multiple stages

and each stage is identified by QBLOCKNO and

PLANNO


24/53

Type of Access

Tablespace Scan (ACCESSTYPE = R)

Index scan

Index scan can categorized into

Index Only Access (INDEXONLY = Y)

Multiple index Scan

(ACCESSTYPE=M,MI,MU,MX) Matching index scan (MATCHCOLS > 0)

Non-Matching index scan ( MATCHCOLS = 0)

One fetch access (ACCESSTYPE= I1)


25/53

Tablespace scan (ACCESSTYPE=R)

A matching index scan is not possible because an

index is not available, or there are no predicates to

match the index columns. high percentage of the rows in the table is

returned. In this case an index is not really useful,

because most rows need to be read anyway.

The indexes that have matching predicates have

low cluster ratios and are therefore efficient only

for small amounts of data.

Sequential prefetch is used (PREFETCH=S)


26/53

Using Index

Index to be defined should be solely based on how

does application fetch data

Proper definition of index will avoid sort You need to trade cost of defining index and

performance.


27/53

Matching Index Scan

Match index scan provide filtering

This is possible if predicates are specified on

either the leading or all of the index key columns If degree of filtering is high. Matching index scan

is efficient

MATCHCOLS will provide the number of

matching columns

If there are more than one index, DB2 will use IX

with most restrictive filtering for matching index

scan


28/53

Non-Matching IX scan

This is also called Index Screening

DB2 select index screening when predicates are

specified on index key columns but are not part ofthe matching columns

Index screening predicates improve the index

access by reducing the number of rows that

qualify while searching the index

MATCHCOLS = 0 and ACCESSTYPE = I


29/53

IN LIST Index Scan

An IN-list index scan is a special case of the

matching index scan, in which a single indexable

IN predicate is used as a matching equal predicate. PLAN TABLE shows MATCHCOLS > 0 and

ACCESSTYPE = N


30/53

Multiple Index Scan

Multiple index access uses more than one index to

access a table

It is a good access path when No single indexprovides efficient access OR A combination of

index accesses provides efficient access.

LIST Sequential prefetch is used as RIDs are

collected from each index scan

ACCESSTYPE = M,MI,MU,MX and

PREFETCH = L

Same index also may be scanned more than ones


31/53

One Fetch Access

One-fetch index access requires retrieving only

one row. It is the best possible access path if

available.One-fetch index access is a possible when :

There is only one table in the query.

The column function is either MIN or MAX and There is

an ascending index column for MIN, and a descendingindex column for MAX.

Either no predicate or all predicates are matching

predicates for the index. And There is no GROUP BY.


32/53

Index Only access

With index-only access, the access path does not

require any data pages because the access

information is available in the index Because the index is almost always smaller than

the table itself, an index-only access path usually

processes the data efficiently.

ACCESSTYPE = I AND INDEXONLY = Y


33/53

JOIN

A join operation retrieves rows from more than

one table and combines them. The operation

specifies at least two tables, but they need not bedistinct.

Application joins are called inner join, left outer

join, right outer join and full outer join

DB2 internally uses three types of join method -Nested loop join, Merge Scan Join and Hybrid

Join

Hybrid join is not used for OUTER join.


34/53

Nested Loop Join (METHOD =1)

Initially two tables are picked out and one is used

as inner table and other one as composite table.

For each row in outer table, inner table is scannedfor matching rows in inner(New) table. And the

composite table is prepared. This composite table

is used as outer table at the next stage.

Process continues until and unless all the tableshave been selected.

Composite table is sorted when order of join

columns on both the table are not same.


35/53

Nested Loop Join (Method = 1)

Nested loop join is efficient when

Outer table is small. Predicates with small filter

factor reduces no of qualifying rows in outer table. The number of data pages accessed in inner table

is also small.

Highly clustered index available on join columns

of the inner table.

This join method is efficient when filtering for

both the tables(Outer and inner) is high.


36/53

Merge Scan Join (Method = 2)

DB2 scans both the tables in order of join column

If there is no efficient index to provide the order,

DB2 might sort the either or both the tables. DB2 reads a row from the outer table and keep on

reading the inner table as long as a match is there.

When there is no match, DB2 reads another row

from outer table.

If outer table has a new value, DB2 searches ahead

in the inner table.


37/53

Merge Scan Join (Method = 2)

Merge scan is used when :

Qualifying rows of inner and outer tables are large

and join predicates also does not provide muchfiltering

Tables are large and have no indexes with

matching columns


38/53

Hybrid Join(Method=4)

It is only used for Inner Join and requires an index

on the join column of inner table.

Join the outer table with RIDs from the index onthe inner table. Index of the inner table is scanned

for each row in outer table.

Sort the data on RID orders and retrieve the data

from inner table using list prefetch

Concatenates data from inner table to form the

resultant table.


39/53

Hybrid Join(Method=4)

Hybrid join is used often when a non-clustered

index available on join column of the inner table

and there are duplicate qualifying rows on outertable.

Hybrid join handles are duplicates in the outer

table as inner table is scanned only ones for each

set of duplicate values. Prefetch method is LIST SEQUENTIAL


40/53


41/53


42/53

Sequential Detection

If DB2 does not choose prefetch at bind time, it

can sometimes do that at execution time. The

method is called sequential detection. If a table is accessed repeatedly using the same

statement (SQL in a do-while loop), the data or

index leaf pages of the table can be accessed

sequentially. DB2 can use this technique if it did not choose

sequential prefetch at bind time because of an

inaccurate estimate of the no of pages to be


43/53

Sorting of data

Sort can happen on a new table or on the

composite table

Sort is required by ORDER BY or GROUP BYclause. (SORTC_GROUPBY/SORTC_ORDERBY = Y).

Sort is required to remove duplicates while

DISTINCT or UNION is used. (SORTC_UNIQ=Y)

During Nested loop and Hybrid join, composite

table is sorted and Merge scan join, both of the

tables might be sorted to make join efficient.(SORTN_JOIN/SORTC_JOIN=Y)


44/53

Sorting of data

Sort is need for subquery processing. Result of the

subquery is sorted and put into the work file for

later reference by parent query. DB2 sorts RIDs into ascending page number order

in order to perform list prefetch. This sort is very

fast and is done totally in memory

If sort is required during CURSOR processing, itis done during OPEN CURSOR. Once cursor is

closed and opened, sort is to be performed again.


45/53

View Merge

If query is using view, view name ultimately will

be resolved to table name. This process is called

view merge. the statement that references the view is combined

with the subselect that defined the view. This

combination creates a logically equivalent

statement. This equivalent statement is executedagainst the database


46/53

View Materialization

Views can not be merged if view definition

involves column functions. In that case view

materialization is required. Done in two stages :

1) The view's defining subselect is executed against

the database and the results are placed in a

temporary copy of a result table.

2) The view's referencing statement is then executed

against the temporary copy of the result table to

obtain the intended result


47/53

Query Parallelism

When DB2 plans to access data from a table or

index in a partitioned table space, it can initiate

multiple parallel operations to reduce the response

time for data or processor-intensive queries.

Two types of parallelism -

1) Query I/O parallelism - Manages concurrent I/O

request for a single query.

2) Query CP parallelism - Enables multitasking

within a single query. Query is broken into parts

and processed.


48/53

Query Parallelism

Parallel processing is enabled using

DEGREE(ANY) on BIND and REBIND for static

SET CURRENT DEGREE = ANY for dynamic The virtual buffer pool parallel sequential

threshold (VPPSEQT) value must be large enough

to provide adequate buffer pool space for parallel

processing

Degree = 1 disables parallel processing


49/53


50/53

Tools

Tools for Performance Analysis

- DB2 PM

- CANDLES OMEGAMON- SMF/RMF Data

- DB2 TRACE

Tools for Access Path Analysis- DB2 EXPLAIN

- VISUAL EXPLAIN


51/53

Risks

There is no GOLDEN RULE for DB2 SQL tuning

Wrong Analysis of performance Data and access

method information may led to more performanceoverhead

While tuning SQL in test environment, the person

should keep in mind that amount of data and DB2

sub-system setup are not same.

Person with good knowledge of DB2 should be

involved with tuning activity.


52/53

BP Buffer PoolCP Central Processor

CPC Central Processing CageDASD Direct Access Storage Device

DB2 PM DB2 Performance Monitor

DBD Database Descriptor DS Dataset

IX Index

LDS Linear VSAM DatasetRI Referential Integrity

RID Row Identifier

RMF Resource Monitoring facilitySMF System Monitoring Facility

SQL Structured Query Language

T TableTS Tablespace

VSAM Virtual Storage Access Method

GLOSSARY


53/53

db2 sql tuning.ppt

Documents