informix sql performance tuning mike walker phone: 1-888-uci for u 1-888-824-3678 fax:...

Informix SQLPerformance Tuning

Mike Walker

UCIConsulting, Inc.

Phone: 1-888-UCI FOR U1-888-824-3678

Fax: 1-609-654-0957e-mail: [email protected]

Overview:

Discuss steps for optimizing Discuss the output of the Set Explain command Finding Slow Running SQL Discuss Indexing Schemes Data Access Methods Optimizer Directives Discuss optimization techniques and examples XTREE command Correlated Sub-Queries

What will not be covered:

Engine & Database Tuning: Onconfig settings Disk/Table Layouts Fragmentation, etc

Steps for Optimizing

Optimization Goal:Increase Performance

Reduce I/O reduce I/O performed by the engine reduce I/O between the back-end and the

front-end (reduce number of database operations)

Reduce processing time

Setting up a Test Environment Identify Problem Queries

Easier to spot Easier to trace

Simplify Queries Test on a machine with minimal system activity Use database that reflects production data

Number of rows & similar distributions Want same query plan Want similar timings

Turn Set Explain on Change configuration parameters Turn PDQ on Bounce engine to clear LRUs

Optimizing the Query:Understand the Requirements

What is the object of the query? What is the information required? What is the order criteria?

Optimizing the Query:Examine the Schema

Identify the the data types and indexes on the columns being: selected used as filters used in joins used for sorting

Be aware of constraints on the data ( e.g. primary, check, etc. ) Some constraints are enforced with indexes Primary and Unique constraints may help identify when expect

single row to be returned Check constraints may hint at data distributions

Optimizing the Query:Examine the Data

Consider the number of rows examined vs. the number of rows returned

Determine the distribution of filter columns dbschema -hd <tablename> -d <database>

(if have stats on that column) Select count with group

Look at the relationship of joined tables: one-to-one one-to-many many-to-many

Optimizing the Query:Run, Examine and Modify

Run the Query:query.sql

$ timex dbaccess db query.sql > try1.out 2>&1

UPDATE STATISTICS ON TABLE query_table;SET EXPLAIN ON;SELECT . . .

Examine the Set Explain output Modify the query and/or schema (use

directives to test various paths) Run the query again

Optimizing the Query:Explain Output

The query plan is written to the file: sqexplain.out

File is created in the current directory (UNIX)

If use SQLEditor – file will be in home directory of the user that SQL was executed as

File will be appended to each time more SQL is executed in the same session

For NT, look for a file called “username.out” in %INFORMIXDIR%\sqexpln on the server


Sometimes sqexplain.out will not be written to, even though “SET EXPLAIN ON” statement has been executed

Turn off the EXPLAIN and turn it back on again:

SET EXPLAIN OFF;

SET EXPLAIN ON;

SELECT …


In IDS 9.4…

onmode –Y <sid> [0|1]

Set or unset dynamic explain

Creates file called: sqexplain.out.sid

May have “issues”…

Set Explain Output

Set Explain: Example 1QUERY:

select * from stock order by description

Estimated Cost: 6

Estimated # of Rows Returned: 15

Temporary Files Required For: Order By

1) informix.stock: SEQUENTIAL SCAN

Set Explain: Example 2QUERY:

select * from stock where unit_price>20order by stock_num

Estimated Cost: 3Estimated # of Rows Returned: 5

1) informix.stock: INDEX PATH Filters: informix.stock.unit_price > 20 (1) Index Keys: stock_num manu_code

Set Explain: Example 3QUERY:select manu_code from stock


1) informix.stock: INDEX PATH (1) Index Keys: stock_num manu_code (Key-Only)

Set Explain: Example 4

QUERY:select * from stockwhere stock_num>10 and stock_num<14


1) informix.stock: INDEX PATH (1) Index Keys: stock_num manu_code Lower Index Filter: informix.stock.stock_num > 10 Upper Index Filter: informix.stock.stock_num < 14

Set Explain: Example 5QUERY:select * from stock, itemswhere stock.stock_num = items.stock_numand items.quantity>1


1) informix.stock: SEQUENTIAL SCAN2) informix.items: INDEX PATH Filters: informix.items.quantity > 1 (1) Index Keys: stock_num manu_code Lower Index Filter: informix.items.stock_num = informix.stock.stock_num

Set Explain: Example 6QUERY:------select *from items, stockwhere items.total_price = stock.unit_price


1) informix.items: SEQUENTIAL SCAN


DYNAMIC HASH JOIN Dynamic Hash Filters: informix.items.total_price = informix.stock.unit_price

Set Explain: Example 7Table ps_ledger has the following index:

create index psaledger on ps_ledger (account,fiscal_year,accounting_period,business_unit,ledger,currency_cd,statistics_code,deptid,product,posted_total_amt

) fragment by expression ( fiscal_year = 2003 ) in dbspace1,( fiscal_year = 2004 ) in dbspace2,remainder in dbspace3

Set Explain: Example 7 cont.QUERY:------select fiscal_year, account, posted_total_amtfrom ps_ledgerwhere fiscal_year = 2003 and accounting_period = 10 and account between '1234' and '9999'

1) sysadm.ps_ledger: INDEX PATH

Filters: (ps_ledger.fiscal_year = 2003 AND ps_ledger.accounting_period = 10 )

(1) Index Keys: account fiscal_year accounting_period business_unit ledger currency_cd statistics_code deptid product posted_total_amt (Key-Only) (Serial, fragments: 0)Lower Index Filter: ps_ledger.account >= '1234'Upper Index Filter: ps_ledger.account <= '9999'

Any Questions?

Finding Slow SQL

Finding Slow SQL onstat –u

address flags sessid user tty wait tout locks nreads nwrites1b062064 Y--P--- 44948 cbsread - 1c5f3cc8 0 1 0 01b06662c ---PX-- 44961 cbsdba - 0 0 0 2022 1180081b067520 Y--P--- 39611 cbsuser - 1ecf6f00 0 1 5308 61240

address flags sessid user tty wait tout locks nreads nwrites1b062064 Y--P--- 44948 cbsread - 1c5f3cc8 0 1 0 01b06662c ---P--- 44961 cbsdba - 0 0 1 2372 1352001b067520 Y--P--- 39611 cbsuser - 1ecf6f00 0 1 5308 61240

address flags sessid user tty wait tout locks nreads nwrites1b062064 Y--P--- 44948 cbsread - 1c5f3cc8 0 1 0 01b06662c ---P--- 44961 cbsdba - 0 0 1 31294 68033081b067520 Y--P--- 39611 cbsuser - 1ecf6f00 0 1 5308 61240

Finding Slow SQL

onstat –g ntt

Individual thread network information (times):

netscb thread name sid open read write address

1d380f00 sqlexec 44961 16:46:29 16:46:29 16:46:29

>date

Wed Apr 7 16:49:49 MDT 2004

Query has been executing for 3 mins 20 secs

Finding Slow SQL

onstat –g sql 44961 or onstat –g ses 44961

Sess SQL Current Iso Lock SQL ISAM F.E.

Id Stmt type Database Lvl Mode ERR ERR Vers

44961 SELECT cbstraining CR Not Wait 0 0 7.31

Current statement name : slctcur

Current SQL statement :

select * from tab1, tab2 where tab1.a = tab2.b order by tab2.c

Last parsed SQL statement :

select * from tab1, tab2 where tab1.a = tab2.b order by tab2.c

Finding Slow SQL

These Informix “onstat” commands are easily “scriptable”!!

Create a “suite” of performance monitoring scripts

Indexing Schemes

Indexing Schemes: B+ Trees100

>

500

>

100

>

15

25

99

100

132

190

400

500

501

699

850

999

D A T A

Level 2 (Root Node)

Level 1

Level 0

Indexing Schemes:Types of Indexes

Unique Duplicate Composite Clustered Attached Detached

In 9.x, all indexes are detached - index pages and data pages are not interleaved

Indexing Schemes:Leading Portion of an Index

Consider an index on columns a, b and c on table xyz.

Index is used for:SELECT * FROM XYZWHERE a = 1 AND b = 2 AND c = 3

SELECT * FROM XYZWHERE a = 1 AND b = 2

SELECT * FROM XYZWHERE a = 1ORDER BY a, b, c

Index is not used for:SELECT * FROM XYZWHERE b = 2 AND c = 3

SELECT * FROM XYZWHERE b = 2

SELECT * FROM XYZWHERE c = 3ORDER BY b, c

Indexing Schemes: Guidelines

Evaluate Indexes on the following: Columns used in joining tables Columns used as filters Columns used in ORDER BY’s and GROUP BY’s

Avoid highly duplicate columns Keep key size small Limit indexes on highly volatile tables Use the FILLFACTOR option

Indexing Schemes:Benefits vs. Cost

Benefits Speed up Queries Guarantee

Uniqueness

Cost Maintenance of

indexes on Inserts, Updates & Deletes

Extra Disk Space

Any Questions?

How Data is Accessed

Data Access Methods

Sequential Scan Index Auto Index

Index Scans:Upper and Lower Index Filters

QUERY:select * from stockwhere stock_num>=99 and stock_num<=190


1) informix.stock: INDEX PATH (1) Index Keys: stock_num manu_code Lower Index Filter: informix.stock.stock_num >= 99 Upper Index Filter: informix.stock.stock_num <= 190


100

>

500

>

100

>

15

25

99

100

132

190

400

500

501

699

850

999


Create indexes on columns that are the most selective.

For example:

SELECT * FROM CUSTOMER WHERE ACCOUNT BETWEEN 100 and 1000 AND STATUS = “A” AND STATE = “MD”

Which column is the most selective? Account, status or state?

Index Scans:Key-Only

QUERY:

select manu_code from stock

where stock_num = 190

Estimated Cost: 2

Estimated # of Rows Returned: 15

1) informix.stock: INDEX PATH

(1) Index Keys: stock_num manu_code (Key-Only)

Lower Index Filter: informix.stock.stock_num = 190

Index Scans: Key-Only

Index Read (not Key Only)

Index Pages

stock_num

Data Pagesstock_num, manu_code, qty

Index Read (Key Only)

Index Pages

stock_num, manu_code


select manu_code from stock where stock_num = 190

Index Scans: Key-Only

Key Only has saved 10,000 “jumps” to the Data Pages

Index Pages



Table “stock” : 100,000 rows

stock_num = 190 : 10,000 rows

select manu_code from stock where stock_num = 190

Index Scans: Key-FirstQUERY:select count(e) from mytablewhere a=1and b=1and d="Y"


1) informix.mytable: INDEX PATH Filters: informix.mytable.d = 'Y' (1) Index Keys: a b c d (Key-First) (Serial, fragments: ALL) Lower Index Filter: (informix.mytable.a = 1 AND

informix.mytable.b = 1 )

Index Scans: Key-First

May not see much advantage with Key-First Indexes. They may help some – especially for large wide tables

Can gain some benefit from adding additional columns to the end of the index to reduce the jumps from the index pages to the data pages

Evaluate adding a new index or changing the index to include the key-first column earlier in the index

Any Questions?

Table Joins

Joining Tables: Join Methods

Nested Loop Join Dynamic Hash Join Sort Merge Join

Joining Tables

Consider the following query:

select * from stock, itemswhere stock.stock_num = items.stock_numand items.quantity>10

What we’re looking for is:

All of the items records with a quantity greater than 10

and their associated stock records.

Join Methods: Nested Loop Join

QUERY:select * from stock, itemswhere stock.stock_num = items.stock_numand items.quantity>10



2) informix.items: INDEX PATH Filters: informix.items.quantity > 10 (1) Index Keys: stock_num manu_code Lower Index Filter:items.stock_num =

stock.stock_num NESTED LOOP JOIN

Notice the index on the joined column

Joining Tables: Table Order

Consider the select:

select * from A, Bwhere A.join_col = B.join_col

How can the database satisfy this join?

Read from A then find matching rows in B Read from B then find matching rows in A

Joining Tables: Table OrderWho Cares?

A then B

1,000 reads from A

For each A row do an index

scan into B (4 reads)

Total reads: 5,000

(1,000 for A +

1,000 x 4 for B)

B then A

50,000 reads from B

For each B row do an index

scan into A (3 reads)

Total reads: 200,000

(50,000 for B +

50,000 x 3 for A)

Table A - 1000 rows Table B - 50,000 rows

This is a difference of 195,000 reads!!!

Joining Tables: Table OrderWhat is the best order?

A then B1,000 reads from A

For each A row do an index

scan into B (4 reads)

Total reads: 5,000

(1,000 for A + 1,000 x 4 for B)

Total Rows Returned: 10

B then AIndex scan of B (3 reads), then

the data (10 reads) for a total of 13

For each B row do an indexscan into A (3 reads)

Total reads: 43(13 for B + 10 x 3 for A)Total Rows Returned: 10

General Rule: The table which returns the fewest rows, either through a filter or the row count, should be first.

Table A - 1,000 rows Table B - 50,000 rowsselect * from A, Bwhere A.join_col = B.join_coland B.filter_col = 1

Assume 10 rows meet this condition

Joining Tables: Table OrderWhat affects the join order?

Number of rows in the tables Indexes available for:

Filters Join Columns

Data Distribution UPDATE STATISTICS is very important

Any Questions?

Optimizer Directives

Optimizer Directives

Changes the generated query plan by removing paths from consideration

Similar to Oracle’s HINTs Better than HINTs

More options Cannot be ignored Negative directives Set Explain output

Optimizer Directivesselect --+ORDERED* from A, Bwhere A.join_col = B.join_col

With the directive, ORDERED, the optimizer only considers paths that read from A then B.The lowest cost is then chosen from those paths.

A then B

Seq A, Seq B Cost:100

Seq A, Idx B Cost:50

Idx A, Idx B Cost:20

etc.

B then A

Seq B, Seq A Cost:100

Seq B, Idx A Cost:50

Idx B, Idx A Cost:10

etc.

Normally, this pathwould be chosen

With the directive,this path would be chosen

Optimizer Directives:Syntax

SELECT --+ directive text SELECT {+ directive text } UPDATE --+ directive text UPDATE {+ directive text } DELETE --+ directive text DELETE {+ directive text }

C-style comments are also valid as in:

SELECT /*+directive*/

Types of Directives

Access Methods Join Methods Join Order Optimization Goal Query Plan Only (IDS 9.3) Correlated Subquery Flattening (IDS 9.3)

Types of Directives:Access Methods

index forces use of a subset of specified indexes

avoid_index avoids use of specified indexes

full forces sequential scan of specified table

avoid_full avoids sequential scan of specified table

Types of Directives:Join Order

ordered forces table order to follow the FROM clause

Types of Directives:Optimization Goal

first_rows (N) tells the optimizer to choose a plan optimized to return the first N rows

of the result set

all_rows tells the optimizer to choose a plan optimized to return all of the results

“Query level” equivalent of: OPT_GOAL configuration parameter (instance level)

0=First Rows, -1=All Rows (default)

OPT_GOAL environment variable (environment level) SET OPTIMIZATION statement (session level)

FIRST_ROWS, ALL_ROWS

Types of Directives:Join Methods

use_nl forces nested loop join on specified tables

use_hash forces hash join on specified tables

avoid_nl avoids nested loop join on specified tables

avoid_hash avoids hash join on specified tables

Directives Examples: ORDEREDQUERY:select --+ ORDEREDcustomer.lname, orders.order_num, items.total_pricefrom customer, orders, itemswhere customer.customer_num = orders.customer_num and orders.order_num = items.order_num and items.stock_num = 6 and items.manu_code = "SMT"

DIRECTIVES FOLLOWED: ORDERED DIRECTIVES NOT FOLLOWED:

1) customer: SEQUENTIAL SCAN2) orders: INDEX PATH (1) Index Keys: customer_num Lower Index Filter: orders.customer_num =

customer.customer_num NESTED LOOP JOIN

3) items: INDEX PATH Filters: items.order_num = orders.order_num (1) Index Keys: stock_num manu_code Lower Index Filter: (items.stock_num = 6 AND

items.manu_code = 'SMT' ) NESTED LOOP JOIN

Directives Examples : INDEXQUERY:------select --+ ordered index(customer, zip_ix) avoid_index(orders," 101_4")

customer.lname, orders.order_num, items.total_price

from customer c, orders o, items iwhere c.customer_num = o.customer_num and o.order_num = i.order_num and stock_num = 6 and manu_code = "SMT"

Directives Examples : INDEX (cont.)DIRECTIVES FOLLOWED: ORDERED INDEX ( customer zip_ix )AVOID_INDEX ( orders 101_4 )DIRECTIVES NOT FOLLOWED:

1)customer: INDEX PATH (1) Index Keys: zipcode

2)orders: SEQUENTIAL SCANDYNAMIC HASH JOIN (Build Outer) Dynamic Hash Filters:c.customer_num =o.customer_num

3)items: INDEX PATH Filters:i.order_num =o.order_num (1) Index Keys: stock_num manu_code Lower Index Filter: (i.stock_num = 6

AND i.manu_code = 'SMT' ) NESTED LOOP JOIN

Directives Examples : Errors

QUERY:

select --+ ordered index(customer, zip_ix) avoid_index(orders," 222_4")

customer.lname, orders.order_num, items.total_price

from customer, orders, items

where customer.customer_num = orders.customer_num

and orders.order_num = items.order_num

and stock_num = 6 and manu_code = "SMT"

DIRECTIVES FOLLOWED:

ORDERED

INDEX ( customer zip_ix )

DIRECTIVES NOT FOLLOWED:

AVOID_INDEX( orders 222_4 ) Invalid Index Name Specified.

Types of Directives:Query Plan

EXPLAIN AVOID_EXECUTE

Generate the Query Plan (SQL Explain Output), but don’t run the SQL

Introduced in IDS 9.3

Especially useful for getting the query plans for Insert, Update and Deletes – no longer have to rewrite them as Select statements, or surround them with BEGIN WORK…ROLLBACK WORK commands


Without AVOID_EXECUTE:

SET EXPLAIN ON;

BEGIN WORK;

DELETE FROM x WHERE y=10;

ROLLBACK WORK;

With AVOID_EXECUTE:

SET EXPLAIN ON;

DELETE /*+ EXPLAIN AVOID_EXECUTE */

FROM x WHERE y=10;

Delete will NOT be performed, but the

execution plan will be written



SET EXPLAIN ON;

OUTPUT TO /dev/null

SELECT * FROM x

WHERE y=10;

OR


SET EXPLAIN ON;

DELETE /*+ EXPLAIN AVOID_EXECUTE */

FROM x WHERE y=10;

Feature can also be implemented without using the directive as…

SET EXPLAIN ON AVOID_EXECUTE;

DELETE

FROM x WHERE y=10;





Optimizer Directives: Pros & Cons

Pros:

Force the engine to execute the SQL the way that we want

Sometimes we know better!! Great for testing different plans

Cons:

Force the engine to execute the SQL the way that we want

Sometimes the engine knows better!! If new indexes added, number of rows changes

drastically, or data distributions change…then a better execution plan may be available

Any Questions?

Optimization Techniques

Optimization Techniques Use Composite Indexes Use Index Filters Create indexes for Key-Only scans Perform indexed reads for sorting Use temporary tables Simplify queries by using Union’s Avoid sequential scans of large tables Use Light Scans when possible Use Hash Joins when joining all rows from

multiple tables

Optimization Techniques (cont.)

Use the CASE/DECODE statements to combine multiple selects

Drop and recreate indexes for large modifications

Use Non Logging Tables

Use OUTER JOINS

Prepare and Execute statements

Optimization Techniques:Use Composite Indexes

Composite indexes are ones built on more than one column

The optimizer uses the leading portions of a composite index for filters, join conditions and sorts

A composite index on columns a, b and c will be used for selects involving: column a columns a and b columns a, b and c

It will not be used for selects involving only columns b and/or c since those columns are not at the beginning of the index( i.e. the leading portion )

Indexed Read

Index (stock_num)Index Pages

stock_num


select qty from stockwhere stock_num = 190and manu_code = 10



stock_num = 190 AND manu_code = 10 : 100 rows

Even with an index, that’s still 10,000+ reads

Composite Key

Index (stock_num, manu_code)Index Pages



select qty from stockwhere stock_num = 190and manu_code = 10



stock_num = 190 AND manu_code = 10 : 100 rows

Now just approx. 100 reads

Optimization Techniques:Use Index Filters

Create indexes on columns that are the most selective.

For example:

SELECT * FROM CUSTOMER WHERE ACCOUNT BETWEEN 100 and 1000 AND STATUS = “A” AND STATE = “MD”

Which column is the most selective? account, status or state?


Assume table xyz has an index on begin_idx & end_idx

With the following select:

SELECT * FROM xyzWHERE begin_idx >= 99 AND end_idx <= 150

The leading portion of the index, column begin_idx, will be used.


100

>

500

>

100

>

15

25

99

100

132

190

400

500

501

699

850

999


If we can change the query to include an upper bound

on begin_idx as follows:

SELECT * FROM xyzWHERE begin_idx >= 99 AND begin_idx <= 150 AND end_idx <= 150


100

>

500

>

100

>

15

25

99

100

132

190

400

500

501

699

850

999

Optimization Techniques:Key-Only Scans

Data for the select list is read from the index key -- No read of the data page is needed

Useful for inner tables of nested-loop joins

Useful for creating a “sub-table” for very wide tables

Optimization Techniques:Key-Only Scans

tab1: 1000 rowstab2: 1000 rows

create index tab1_idx on tab1(a);create index tab2_idx on tab2(b);

output to /dev/null Handy!select unique tab2.cfrom tab1, tab2where tab1.a = 1and tab1.b = tab2.b

Every Row in tab1 will join to every row in tab2

Every Row in tab1 will join to every row in tab2

Will NOT give Key Only Scan!

Key-Only Scansselect unique tab2.cfrom tab1, tab2where tab1.a = 1 and tab1.b = tab2.b

1) cbsdba.tab1: INDEX PATH

(1) Index Keys: a Lower Index Filter: cbsdba.tab1.a = 1


(1) Index Keys: b Lower Index Filter: cbsdba.tab2.b = cbsdba.tab1.bNESTED LOOP JOIN

Key-Only Scans

Index Read (NOT Key Only)

tab1

Index Pages

a

tab1

Data Pages

a, b, y

select unique tab2.c from tab1, tab2where tab1.a = 1 and tab1.b = tab2.b

tab2

Index Pages

b

tab2

Data Pages

b, c, z

Key-Only Scans

Index Read (NOT Key Only)


tab1

Index Pages

a

tab1

Data Pages

a, b, y

tab2

Index Pages

b

tab2

Data Pages

b, c, z

1,000 reads from tab1 Index Pages

1,000 “jumps” to tab1 Data Pages

1,000 reads from tab1 Data Pages

For each of these:


1,000 “jumps” to tab2 Data Pages

1,000 reads from tab2 Data Pages

That’s a lot of reads…and a lot of “jumps”!!

Timing: 50 seconds

Key-Only Scans

create index tab1_idx on tab1(a);create index tab2_idx on tab2(b);

Will NOT give Key Only Scan!

create index tab1_idx on tab1(a,b);create index tab2_idx on tab2(b,c);

WILL give Key Only Scan!

Change Indexes

Key-Only Scansselect unique tab2.cfrom tab1, tab2where tab1.a = 1 and tab1.b = tab2.b


(1) Index Keys: a b (Key-Only) Lower Index Filter: cbsdba.tab1.a = 1


(1) Index Keys: b c (Key-Only) Lower Index Filter: cbsdba.tab2.b = cbsdba.tab1.bNESTED LOOP JOIN

Key-Only Scans


tab1

Index Pages

a, b


tab2

Index Pages

b, c

tab1

Data Pages

a, b, y

tab2

Data Pages

b, c, z

Key-Only Scans




For each of these:


That’s a lot less reads…and no “jumps”!!

Timing: 35 seconds

tab1

Index Pages

a, b

tab1

Data Pages

a, b, y

tab2

Index Pages

b, c

tab2

Data Pages

b, c, z

Optimization Techniques:Indexed Reads for Sorting

Indexed reads cause rows to be read in the order of the indexed columns

Higher priority is given to indexes on columns used as filters

Reasons why an index will not be used to perform a sort: Columns in the sort criteria are not in the index Columns in the sort criteria are in a different order than

the index Columns in the sort criteria are from different tables

Optimization Techniques: Indexed Reads for Sorting

select * from some_table

where x = ?

and y = ?

order by z

Assume the table some_table has a composite index on columns x, y and z

Assume the table some_table has a composite index on columns x, y and z

Note: As of Informix Dynamic Server v7.31 this is done automatically by the optimizer

Note: As of Informix Dynamic Server v7.31 this is done automatically by the optimizer

select * from some_table

where x = ?

and y = ?

order by x, y, z

Optimization Techniques:Temporary Tables

Useful for batch reporting Avoid selecting a subset of data

repetitively from a larger table Create summary information that can be

joined to other tables

Disadvantage

The data in the temporary table is a copy of the real data and therefore is not changed if the original data is modified.

Disadvantage

The data in the temporary table is a copy of the real data and therefore is not changed if the original data is modified.

Optimization Techniques:Temporary Tables

select b.sku, sum(b.sz_qty) tot_qtyfrom ctn a, ctn_detail bwhere a.carton_stat = "Q"and a.ctn_id = b.ctn_idgroup by b.skuinto temp tmp1 with no log;

create index i1 on tmp1( sku )

select tot_qty from tmp1 where sku = ?

select sum(b.sz_qty)from ctn a, ctn_detail bwhere a.carton_stat = "Q" and a.ctn_id = b.ctn_id and b.sku = ?

The ctn table contains 300,000 records and very few records have a status of “Q”

The ctn table contains 300,000 records and very few records have a status of “Q”

Optimization Techniques:Using UNION’s

OR's can cause the optimizer to not use indexes

Complex where conditions can cause the optimizer to use the wrong index

Note: Informix Dynamic Server v7.3 allows UNION’s in views

Optimization Techniques: Using UNION’s

select sum(qty)from logwhere trans_id = 1and sku = ?and date_time > ?UNION

. . .select sum(qty)from logwhere trans_id = 4and sku = ?and date_time > ?

The log table has an index on date_time and a composite index on trans_id, sku and date_time

The log table has an index on date_time and a composite index on trans_id, sku and date_time

select sum(qty)from logwhere sku = ?and ( trans_id = 1 or trans_id = 2 or trans_id = 3 or trans_id = 4)and date_time > ?

Uses the index on date_timeUses the index on date_time

Uses the composite indexUses the composite index

Optimization Techniques: Eliminate OR Conditions

select sum(qty)from logwhere sku = ?and ( trans_id = 1 or trans_id = 2 or trans_id = 3 or trans_id = 4)and date_time > ?

select sum(qty)from logwhere sku = ?and trans_id in ( 1, 2, 3, 4)and date_time > ?

Uses the index on date_timeUses the index on date_time Uses the composite indexUses the composite index

Alternative to using UNIONs…

Note: Earlier versions of Informix still maynot use the composite index

Note: Earlier versions of Informix still maynot use the composite index

Optimization Techniques:Avoid Sequential Scans

and Auto Indexes Sequential scans of large tables are resource intensive –

use light scans if possible Sequential scans of small tables are not harmful Consider using permanent indexes to avoid sequential

scans when possible Create temporary indexes for batch reporting Replace Auto Indexes with real indexes On a loosely related topic…Consider changing the order of

columns for Key-First Scans

Optimization Techniques:Use Light Scans

What are they? Very efficient way to sequentially scan a table Go straight to disk, avoid the buffer pool

Database Engine

Buffers (LRUs)

Disk

Without Light Scans

Database Engine

Disk

With Light Scans

Optimization Techniques:Use Light Scans

How do you get them? Only used when sequentially scanning a table The table is bigger than the buffer pool PDQ must be on (SET PDQPRIORITY …) Dirty read isolation (SET ISOLATION TO DIRTY

READ) or no logging Monitor using onstat –g lsc

Optimization Techniques:Use Hash Joins

Good to use when joining a large number of rows from multiple tables

Typical join is NESTED LOOP, costly to do index scan over and over

Builds hash table in memory for one table, scans second and hashes into memory

PDQ must be turned on

DS_TOTAL_MEMORY should be set high

Optimization Techniques:Use Hash Joins & Light Scans

SELECT H.JRNL_ID, L.ACCOUNT, L.DEPTID, SUM(AMT) FROM JRNL_HDR H, JRNL_LN L WHERE H.JRNL_ID = L.JRNL_ID AND H.FISCAL_YEAR = 2001 AND H.JRNL_STATUS = ‘P’ GROUP BY H.JRNL_ID, L.ACCOUNT, L.DEPTID

Two tables, 4 years of data evenly distributed:JRNL_HDR – 1,000,000 rows JRNL_LN – 10,000,000 rows

This will join 250,000 header records with 2,500,000 line records.

With a nested loop join, the database will do an index read into the line table 250,000 times.

Optimization Techniques:Use Hash Joins & Light Scans

SET PDQPRIORITY 50;SET ISOLATION TO DIRTY READ;SELECT --+ FULL( H ) FULL( L ) H.JRNL_ID, L.ACCOUNT, L.DEPTID, SUM(AMT)FROM JRNL_HDR H, JRNL_LN LWHERE H.JRNL_ID = L.JRNL_ID AND H.FISCAL_YEAR = 2001 AND H.JRNL_STATUS = ‘P’GROUP BY H.JRNL_ID, L.ACCOUNT, L.DEPTID

Allows Light Scan

Forces Sequential Scan

This will read the 10 million line records and put them in a hash table, then the header table will be read from and the hash table will be used to do the join.

A better option might be to put an ordered directive and change the order of the from clause so the 250,000 header records are put in the hash table. It depends on the memory available to PDQ.

This is more efficient than a NESTED LOOP join.

Optimization Techniques:Hash Joins

tab1: 1000 rows

tab2: 1000 rows

output to /dev/null select unique tab2.cfrom tab1, tab2where tab1.a = 1and tab1.b = tab2.b

Every Row in tab1 joins to every row in tab2Every Row in tab1 joins to every row in tab2

Remember this example to demonstrate Key Only Scans?

Let’s try the same thing with a Hash Join…

Let’s try the same thing with a Hash Join…

Optimization Techniques:Hash Joins

select unique tab2.c

from tab1, tab2

where tab1.a = 1

and tab1.b = tab2.b

select /*+FULL(tab1) FULL(tab2)*/

unique tab2.c

from tab1, tab2

where tab1.a = 1

and tab1.b = tab2.b

Force Full Table Scans

Hash Joinsselect /*+FULL(tab1) FULL(tab2)*/

unique tab2.c

from tab1, tab2

where tab1.a = 1

and tab1.b = tab2.b

DIRECTIVES FOLLOWED:

FULL ( tab1 )

FULL ( tab2 )

DIRECTIVES NOT FOLLOWED:

1) cbsdba.tab1: SEQUENTIAL SCAN

Filters: cbsdba.tab1.a = 1

2) cbsdba.tab2: SEQUENTIAL SCAN

DYNAMIC HASH JOIN

Dynamic Hash Filters: cbsdba.tab1.b = cbsdba.tab2.b

Timing: 6 seconds…includes generating the explain plan…!

Compare with 35 seconds for Key-Only Scan

Optimization Techniques:CASE/DECODE

CASE Syntax:CASE WHEN condition THEN exprWHEN condition THEN expr…ELSE exprEND

DECODE Syntax:DECODE( expr, when_expr, then_expr, …, else_expr )


update customerset preferred = “Y” where stat = “A”

update customerset preferred = “N” where stat <> “A”

DECODE( stat, “A”, “Y”, ”N” )

OR…OR…

2 SQL Statements2 SQL Statements

update customerset preferred = case when stat=“A” then “Y” else “N” end

1 SQL Statement1 SQL Statement


select stat, count(*)from customergroup by stat

select count(*)from customerwhere stat = “A”

select count(*)from customerwhere stat = “I”

select count(*)from customerwhere stat = “D”

select SUM( DECODE( stat, “A”, 1, 0) ),SUM( DECODE( stat, “I”, 1, 0) ),SUM( DECODE( stat, “D”, 1, 0) )from customer

OR…OR…

select SUM( case when stat=“A” then 1 else 0 end ),SUM( case when stat=“I” then 1 else 0 end ),SUM( case when stat=“D” then 1 else 0 end )from customer

OR…OR…

3 SQL Statements – 3 scans of the table3 SQL Statements – 3 scans of the table

1 SQL Statement – 1 scan of the table

1 SQL Statement – 1 scan of the table

Optimization Techniques:Indexes on Function

Dilemma: LNAME in the customer table is mixed case Users want to enter “smith” and find all

occurrences of “Smith” regardless of case (e.g., “SMITH”, “Smith” or “SmiTH”

You can write a query like:SELECT *

FROM customer

WHERE UPPER( lname ) = “SMITH” Unfortunately this performs a sequential scan

of the table.


Solution: Version 9 allows indexes to be built on functions Functions must be what is called “NONVARIANT” Informix Built-in functions, such as UPPER are

variant Create your own function and use it


First create the new function:

CREATE FUNCTION UPSHIFT( in_str VARCHAR ) RETURNING VARCHAR WITH( NOT VARIANT )DEFINE out_str VARCHAR;OUT_STR=UPPER(in_str);RETURN( out_str );END FUNCTION

Then create the index on the function:

CREATE INDEX I_CUST1 ON CUSTOMER( UPSHIFT( lname ))


Then change the query to use the new function:

SELECT *

FROM customer

WHERE UPSHIFT( lname ) = “SMITH”

Things to note: If you get an error about creating an index on a variant

function, you may be trying to use a built-in function or you did not create the function with the NOT VARIANT clause.

SET EXPLAIN shows the index being used. There is overhead with this type of index. Index creation is not done in parallel if function is not

PARALLELIZABLE. SPL is not PARALLELIZABLE, only external functions

written in C or Java.

Optimization Techniques:Drop and Recreate Indexes

Useful for modifications to > 25% of the rows Eliminates overhead of maintaining indexes

during modification Indexes are recreated more efficiently

Indexes can deteriorate over time Use PDQPRIORITY for faster creation

DisadvantageMust have exclusive access to the table before doing this!

Locking the table may not be sufficient…!3-tier architecture can make this an even bigger pain!

DisadvantageMust have exclusive access to the table before doing this!

Locking the table may not be sufficient…!3-tier architecture can make this an even bigger pain!

Optimization Techniques:Non Logging Tables

In XPS…and introduced in IDS 7.31

Inserts, Updates and Deletes against rows in a tables are logged

For large operations this could produce significant overhead

Create the table as RAW or change it to RAW for the duration of the operation, and the operations will not be logged

Optimization Techniques:Non Logging Tables

CREATE RAW TABLE big_table (val1 integer,val2 char(100));

Do Load

ALTER TABLE big_table TYPE (STANDARD);

Create indexesCannot have indexes on a RAW Table!! Cannot have indexes on a RAW Table!!

Optimization Techniques:Use Outer Joins

SELECT cnum

FROM customer

WHERE status = “A”

FOREACH

SELECT onum

FROM ORDERS o

WHERE o.cnum = cnum

IF ( STATUS = NOTFOUND )

THEN

...

END IF

END FOREACH

Main SELECTMain SELECT

SELECT repeated for each row found

in Main Select

SELECT repeated for each row found

in Main Select

Ouch!Ouch!


SELECT cnum

FROM customer


FOREACH

SELECT onum

FROM ORDERS o

WHERE o.cnum = cnum

IF ( STATUS = NOTFOUND )

THEN

...

END IF

END FOREACH

SELECT cnum, onum

FROM customer c,

OUTER order o


AND c.cnum = o.cnum

FOREACH

IF ( onum IS NULL )

THEN

...

END IF

END FOREACH

ONLY 1 SELECTONLY 1 SELECT

Brilliant!Brilliant!


SELECT cnum, NVL( onum, 0 )

FROM customer c,

OUTER order o


AND c.cnum = o.cnum

FOREACH

IF ( onum = 0 )

THEN

...

END IF

END FOREACH

Can now check for zero instead of NULL

Can now check for zero instead of NULL

Use NVL to replace NULLs with something else

Use NVL to replace NULLs with something else

Optimization Techniques:Prepare and Execute

What happens when a SQL statement is sent to the engine?

Syntax Check Permission Check Optimization Statement is executed

Optimization Techniques:Prepare and Execute

FOR x = 1 to 1000

INSERT INTO some_table

VALUES ( x, 10 )

END FOR

Syntax CheckPermission CheckOptimizeExecute

Syntax CheckPermission CheckOptimizeExecute …at last!!

Do it ALL again!!!Do it ALL again!!!

Syntax CheckPermission CheckOptimize

Syntax CheckPermission CheckOptimize

ExecuteExecute

Once Only!Once Only!

PREPARE p1 FROM

“INSERT INTO some_table

VALUES ( ?, 10 )”

FOR x = 1 to 1000

EXECUTE p1 USING x

END FOR

Optimization Techniques:PDQ

Really should be using PDQ for batch processes and reporting

Enable for index builds (also Light Scans & Hash Joins)

Set DS_TOTAL_MEMORY as high as you can spare – set in config file or with onmode -M

Use MAXPDQPRIORITY to set the maximum priority that any single session is permitted – set in config file or with onmode -D

Use SET PDQPRIORITY n to set the PDQ for a session – or set in the environment (e.g. export PDQPRIORITY=80)

Optimization Techniques:PDQ

Monitor PDQ with onstat –g mgm

onstat –u : Will see multiple threads with the same session ID

onstat –g ses : Will see #RSAM Threads > 1

See Informix Manuals for more info

Any Questions?

Xtree

Xwindows interface Only works with Xwindows terminal Need the Xwindows libraries setup Provides a window into an executing query Useful for checking the speed & progress of

a query without waiting until it completes – great for testing different query plans!

This part of the window is calledthe display window and showsthe information about what ishappening in the query. Each ofthese boxes (nodes) designatesan operation of the query: sort,group, filter, scan.

This number represents thenumber of rows that have beenpassed to node above.

This number represents thenumber of rows examined persecond. The little speedometer is agraphical representation of thisnumber. The number isoccasionally negative which couldbe because it is a 2-byte integerand when it gets too high (i.e., toofast) it displays as a negative.

This part of the windowdisplays the entire query tree.If the tree is too big for thedisplay window (to the right),a black box will appear whichcan be dragged to scroll todifferent parts of the treewhich are displayed in thedisplay window.

Name

Enter Your Session ID

Enter yoursession id here.

ExampleSET EXPLAIN ON;

SELECTA.DSCNT_DUE_DT, A.SCHEDULED_PAY_DT, A.PYMNT_GROSS_AMT,

B.GROSS_AMT_BSE, A.DSCNT_PAY_AMTFROM PS_PYMNT_VCHR_XREF A, PS_VOUCHER B, PS_VENDOR C, PS_VENDOR_ADDR D, PS_VENDOR_PAY EWHERE A.BUSINESS_UNIT = B.BUSINESS_UNIT AND A.VOUCHER_ID = B.VOUCHER_ID AND A.REMIT_SETID = C.SETID AND A.REMIT_VENDOR = C.VENDOR_ID AND A.REMIT_SETID = D.SETID AND A.REMIT_VENDOR = D.VENDOR_ID AND A.REMIT_ADDR_SEQ_NUM = D.ADDRESS_SEQ_NUM AND D.EFF_STATUS = 'A' AND . . .

Need the explain plan to interpret xtree displayNeed the explain plan to interpret xtree display

Correlated Sub-Queries

Correlated Sub-QueriesWhat are they?

select c.*from customers c, orders owhere c.custid = o.custid and o.ord_date = TODAY

select c.*from customers cwhere c.custid in ( select custid from orders where ord_date = TODAY)

These are examples of non-correlated sub-queries.The performance of these two should be the same.

Correlated Sub-QueriesWhat are they?

select c.*from customers c, orders owhere c.custid = o.custid and o.stat = “OPEN”

select c.*from customers cwhere custid in ( select custid from orders o where o.stat = “OPEN” )

select c.*from customers cwhere exists ( select “X” from orders o where o.custid = c.custid and o.stat = “OPEN” )

CorrelatedNot Correlated

Outer query referenced in Inner query…Inner query must be repeated for each

row returned by the Outer query

Outer query referenced in Inner query…Inner query must be repeated for each

row returned by the Outer query

Correlated Sub-QueriesWhat’s wrong with them?

Consider the statement:

update customers set stat = “A”where exists ( select “X” from orders o where o.custid = customer.custid and o.cmpny = customers.cmpny and o.stat = “OPEN” )

The sub-query, on orders, is executed for every row retrieved from customers.

If customers table had 100,000 rows, the sub-query would get executed 100,000 times.

If orders only had 20 rows with stat=“OPEN” the database would be doing a lot of extra work.

Correlated Sub-queriesupdate customers set stat = “A”where exists ( select “X” from orders o where o.custid = customer.custid and o.cmpny = customers.cmpny and o.stat = “OPEN” )

update customers set stat = “A”where exists ( select “X” from orders o where o.custid = customers.custid and o.cmpny = customers.cmpny and o.stat = “OPEN” )

If orders has only 20 rows meeting the filter, the second version of the update runs much faster, assuming that customers has an index on the column custid.

The original CSQ is left since it was joining on more than one column

Add this condition to reduce the number of times the subquery is executed

and custid in ( select custid from orders o where o.stat = “OPEN” )

Correlated Sub-queries: Normal CSQ

QUERY:update orders set ship_charge = 0where exists (

select "X" from customer cwhere c.customer_num = orders.customer_num and c.state = "MD” )

1) informix.orders: SEQUENTIAL SCAN Filters: EXISTS <subquery>

Subquery: --------- Estimated Cost: 1 Estimated # of Rows Returned: 1

1) informix.c: INDEX PATH Filters: informix.c.state = 'MD' (1) Index Keys: customer_num Lower Index Filter: c.customer_num = orders.customer_num

Here’s the join between the inner and outer tables…

Here’s the join between the inner and outer tables…

…Yuk!…Yuk!

Correlated Sub-queries: Rewritten CSQ

QUERY:update orders set ship_charge = 0where customer_num in (

select customer_num from customer cwhere c.state = "MD” )

1) informix.orders: INDEX PATH

(1) Index Keys: customer_num Lower Index Filter:orders.customer_num = ANY <subquery>

Subquery: --------- 1) informix.c: SEQUENTIAL SCAN

Filters: informix.c.state = 'MD'

EXISTS has been changed to an INEXISTS has been changed to an IN

Subquery is no longer CORRELATEDSubquery is no longer CORRELATED

Look…No Join!!Look…No Join!!

…Yippee!!…Yippee!!

Correlated Sub-queries: CSQ Flattening

QUERY:update orders set ship_charge = 0where exists (

select "X" from customer cwhere c.customer_num = orders.customer_num and c.state = "MD” )

1) informix.c: SEQUENTIAL SCAN Filters: informix.c.state = 'MD'

2) informix.orders: INDEX PATH (1) Index Keys: customer_num Lower Index Filter: orders.customer_num = c.customer_num NESTED LOOP JOIN

Note: An index could be created on state to avoid the sequential scan.

Where did the subquery go?!Where did the subquery go?!

It was turned into a regular Nested Loop Join….AUTOMATICALLY!!

It was turned into a regular Nested Loop Join….AUTOMATICALLY!!

Correlated Sub-queries: CSQ Flattening

As of 9.3, optimizer directives can be used to indicate whether Subquery Flattening occurs…

/*+ USE_SUBQF */

/*+ AVOID_SUBQF */

Does this indicate that Subquery Flattening is not necessarily a good thing ????

Correlated Sub-queries: Predicate Promotion in CSQs

Correlated Subquery

select * from ps_jrnl_ln

where business_unit = 'ABC’

and process_instance = 5960

and not exists

( select "X"

from PS_SP_BU_GL_NONVW P

where P.business_unit = ps_jrnl_ln.business_unit

)

But, we know that we are limiting rows in the outer query by the filter:

business_unit=‘ABC’

But, we know that we are limiting rows in the outer query by the filter:

business_unit=‘ABC’

Then why don’t we just apply the same filter in the subquery?

Then why don’t we just apply the same filter in the subquery?

What a great idea….What a great idea….





and not exists

( select "X"


where P.business_unit = ps_jrnl_ln.business_unit

)




and not exists

( select "X"


where P.business_unit =

‘ABC’

)Correlated SubqueryCorrelated Subquery Non-Correlated SubqueryNon-Correlated Subquery

…becomes…

AUTOMATICALLY!!AUTOMATICALLY!!


QUERY:

select * from ps_jrnl_lnwhere business_unit = 'ABC’ and process_instance = 5960 and not exists( select "X"from PS_SP_BU_GL_NONVW P

where P.business_unit = ps_jrnl_ln.business_unit)

Let’s take a look at the query plan…Let’s take a look at the query plan…


1) ps_jrnl_ln: INDEX PATH Filters: NOT EXISTS <subquery>

(1) Index Keys: process_instance business_unit Lower Index Filter: (ps_jrnl_ln.business_unit =

'ABC' AND ps_jrnl_ln.process_instance = 5960 )

Subquery: --------- 1) ps_bus_unit_tbl_gl: INDEX PATH (1) Index Keys: business_unit (Key-Only) Lower Index Filter: ps_bus_unit_tbl_gl.business_unit = 'ABC'

2) ps_bus_unit_tbl_fs: INDEX PATH (1) Index Keys: business_unit descr (Key-Only) Lower Index Filter: ps_bus_unit_tbl_fs.business_unit

= ps_bus_unit_tbl_gl.business_unit NESTED LOOP JOIN

Constant Subquery OptimizationWhen this filter is checked for the first row, the query can stop immediately, if:•it’s a NOT EXISTS and a row is found•it’s an EXISTS and no rows are found

Constant Subquery OptimizationWhen this filter is checked for the first row, the query can stop immediately, if:•it’s a NOT EXISTS and a row is found•it’s an EXISTS and no rows are found

Filter condition of outer query has been

applied here

Filter condition of outer query has been

applied here

Correlated Sub-Queries:First Row/Semi-Join

QUERY:

UPDATE PS_JRNL_LN SET jrnl_line_status = ‘3’WHERE BUSINESS_UNIT='ABC'AND PROCESS_INSTANCE=5960AND EXISTS ( SELECT 'X' FROM PS_COMBO_SEL_06 A WHERE A.SETID='ABC' AND A.COMBINATION='OVERHEAD' AND A.CHARTFIELD='ACCOUNT' AND PS_JRNL_LN.ACCOUNT BETWEEN A.RANGE_FROM_06 AND A.RANGE_TO_06)

Let’s take a look at the query plan…Let’s take a look at the query plan…

Correlated Sub-Queries:First Row/Semi-Join

1) sysadm.ps_jrnl_ln: INDEX PATH

(1) Index Keys: process_instance business_unit Lower Index Filter: (ps_jrnl_ln.business_unit =

'ABC' AND ps_jrnl_ln.process_instance = 5960 )

2) informix.a: INDEX PATH (First Row)

Filters: (informix.a.range_to_06 >= ps_jrnl_ln.account AND a.tree_effdt = <subquery> )

(1) Index Keys: setid chartfield combination range_from_06 range_to_06

Lower Index Filter: (a.setid = 'ABC' AND (a.combination = 'OVERHEAD' AND a.chartfield = 'ACCOUNT' ) )

Upper Index Filter: a.range_from_06 <= ps_jrnl_ln.account NESTED LOOP JOIN (Semi Join)

Indicates that query can stop once this

condition is satisfied

Indicates that query can stop once this

condition is satisfied

Correlated Sub-Queries:Skip Duplicate

QUERY:update orders set backlog = "Y"where exists (select "X” from itemswhere orders.order_num = items.order_num and stock_num = 6 and manu_code = "SMT” )

1) informix.items: INDEX PATH (Skip Duplicate) Filters: (items.stock_num=6 AND items.manu_code='SMT' )

(1) Index Keys: order_num

2) informix.orders: INDEX PATH (1) Index Keys: order_num Lower Index Filter: orders.order_num = items.order_num NESTED LOOP JOIN

Will get unique values from the first table before

joining to the second table, so preventing

multiple updates with the same value

Will get unique values from the first table before

joining to the second table, so preventing

multiple updates with the same value

Any Questions?

informix sql performance tuning mike walker phone: 1-888-uci for u 1-888-824-3678 fax:...

Documents