db2 11.1 - a query optimizer perspective
Post on 13-Feb-2022
14 Views
Preview:
TRANSCRIPT
DB2 11.1 - A Query Optimizer Perspective
John Hornibrook
IBM Canada
Session Code: D08November 15, 2016 (17:00-18:00)| Platform: DB2 for Linux, UNIX and Windows
Objectives
• Overview of BLU technology
• Overview of BLU and MPP architecture
• Overview of BLU performance features in DB2 11.1
• Understand the new types of access plans that are possible in DB2 11.1
• Learn how to recognize these new access plans using the explain facility
2
DB2 11.1 – The Highlights
• BLU acceleration (column-organized table support) for partitioned database systems (MPP)
• BLU acceleration performance improvements
• Sorting using fast parallel radix sort on compressed and encoded data
• Nested-loop join
• OLAP functions
• Native support for more scalar functions
• Advanced automatic query decorrelation
• Faster SQL MERGE
• Additional functional support for BLU acceleration
• Declared global temporary tables
• IDENTITY and EXPRESSION generated columns
• NOT LOGGED INITIALLY
• Codepage 819
• And many more!
• This presentation focusses on query optimization…
3
4
What is DB2 with BLU Acceleration?
• Advanced technology for analytic queries in DB2 LUW• Introduced in DB2 10.5
• DB2 column-organized tables add columnar capabilities to DB2 databases
• Table data is stored column organized rather than row organized
• Using a vector processing engine
• Using this table format with star schema datamarts provides significant improvements to storage, query performance, ease of use, and time-to-value
• New unique runtime technology which leverages the CPU architecture and is built directly into the DB2 kernel
• New unique encoding for speed and compression
• This new capability is both main-memory optimized, CPU optimized, and I/O optimized
Column-organized Runtime Processing
• Column-organized processing uses fewer operations than row-organized:
• The primary join method is hash join• DB2 11.1 adds support for nest loop join
• Does table scans for analytic queries• Uses unique indexes for single-row access and certain types of joins
• Hash aggregation and distincting (duplicate removal)
• Union
• Sort (new to DB2 11.1)• Previously, sorting was done in row-store processing
• Temporary tables
5
Column-organized Processing
• Column-organized operators execute in different subsections than row-organized operators
• Column-organized subsections are processed by different sets of DB2 subagents
• Data is transferred between subsections using a column-organized table queue (CTQ)• CTQ also performs row materialization
Row processing(subsection 1)
Column processing(subsection 2)
•Subsections run concurrently
•There can be multiple column or row processing subsections
•All subsections can be processed by multiple subagents
•(MCP – multi-core parallelism)
HSJOIN
SCAN SCAN
SORT
RETURN
CTQ
6
BLU Query Optimization - Terminology
• Late materialization
• Columns are retrieved as late as possible depending on predicate filtering
• Occurs for TBSCANs and probe side of HSJOINs•e.g. SELECT C1, C2, C3 FROM T1 WHERE C1=5 AND C2=10
1.SCAN C1, apply C1=5, return row-ids
2.Using row IDs from 1), SCAN C2, apply C2=10, return row IDs
3.Using row IDs from 2), SCAN C3 and return values
• Determined dynamically by BLU runtime
• Accounted for in the optimizer’s cost model
7
Late materialization
• For HSJOIN probe side:• Retrieve columns needed for join just before the join
• Retrieve columns not required for predicate application, after all joins have been performed
8
select c.first_name,c.last_name,ds.sales_pricefrom customer c,date d,daily_sales ds
where ds.perkey = d.perkey andds.custkey = c.custkey andd.year = 2015
JOIN
SCAN
SCAN
JOIN
SCAN
Customer
Daily Sales
Date
DS.PERKEY = D.PERKEY
DS.CUSTKEY = C.CUSTKEY
LOAD DS.CUSTKEY
LOAD DS.PERKEY
LOAD DS.SALES_PRICE
BLU Query Optimization - terminology
• Row materialization
• Column-organized data is reconstructed as row-organized data
• Performed by the column-organized table queue (CTQ)
• CTQ placement is determined by the optimizer
• Subsection degree is determined by the optimizer•Dynamically readjusted at runtime with DEGREE=ANY
9
BLU Query Optimization Explain example
RETURN
( 1)
|
CTQ
( 2)
|
MDTQ
( 3)
|
TBSCAN
( 4)
|
SORT
( 5)
|
GRPBY
( 6)
|
DTQ
( 7)
|
GRPBY
( 8)
|
^HSJOIN
( 9)
/-------+--------\
^HSJOIN TBSCAN
( 10) ( 14)
/----+----\ |
TBSCAN BTQ CO-TABLE: DB2USER
( 11) ( 12) ITEM
| |
CO-TABLE: DB2USER TBSCAN
WEB_SALES ( 13)
|
CO-TABLE: DB2USER
DATE_DIM
SCANGRAN: (Intra-Partition Parallelism Scan Granularity)
200
SCANTYPE: (Intra-Partition Parallelism Scan Type)
LOCAL PARALLEL
SCANUNIT: (Intra-Partition Parallelism Scan Unit)
ROW
TQDEGREE: (Degree of Intra-Partition parallelism)
32
TQORIGIN: (Table Queue Origin type)
COLUMN-ORGANIZED DATA
AGGMODE : (Aggregation Mode)
HASHED COMPLETE
Row processing
Column processing
Subsection 1(1 subagent)
Subsection 2(32 subagents)
10
DB2 11.1 – BLU Acceleration in a Partitioned Database System
• Technology• Pervasive SMP & MPP query parallelism
• Inter + intra partition query parallelism + memory-optimized + columnar + SIMD-enabled + BLU processing
• Value Proposition• Improve Response Time
• All servers contribute to the processing of a query
• Massively Scale Data
• Significantly beyond current practical limits
• Streamline BLU Adoption
• Add BLU Acceleration to existing data warehouses
1/3 data
Hash partition(BLU Acceleration)
Query #1processing
Query #1
Query #1processing
Query #1processing
1/3 data
Hash partition(BLU Acceleration)
1/3 data
Hash partition(BLU Acceleration)
DB2 10.5 BLU Capacity
DB2 V11.1 BLUCapacity
10s of TB 1000s of TB
100s of Cores 1000s of Cores
MPP = Massively Parallel Processing
11
BLU and Database Partitioning Feature (DPF)
• Similar external implementation to row-organized tables
• Tables are partitioned across multiple database partitions• Hash partitioned based on a set of columns (distribution key)
• Randomly partitioned with no specific distribution key (new to DB2 11.1)
• Distribution key is chosen automatically if none is specified (more on this later)
• Database operations (scans, joins, aggregation, distincting. etc.) occurs on each DB partition
• Data is automatically redistributed as the query executes• Depends on partitioning key and query semantics
12
DPF Parallel join strategies
Collocated joinPartitioning keys:CUSTOMER: CUSTKEYDAILY_SALES: CUSTKEYJoin predicate:
CUSTOMER.CUSTKEY = DAILY_SALES.CUSTKEY
JOIN
Customer Daily Sales
SCAN SCAN
Directed join
Partitioning keys:
CUSTOMER: CUST_NUMBER
DAILY_SALES: CUSTKEY
Join predicate:
CUSTOMER.CUSTKEY = DAILY_SALES.CUSTKEY
Equi-join predicate on each table’s partitioning key
Tables must be in same DB partition group
Join column(s) data type must be partition compatible
No table queues (TQs) necessary
JOIN
Customer
Daily SalesSCAN
SCANDTQ
Equi-join predicate on one table’s partitioning key
Direct rows of one table to partitioning of the other
DPF Parallel join strategies
Broadcast joinPartitioning keys:STORE: STOREKEYDAILY_SALES: CUSTKEYJoin predicate:STORE.STOREKEY = DAILY_SALES.STOREKEY •No equi-join predicate on both table’s
partitioning key or no equi-join predicate.
•One table is much smaller than the other.
•Broadcast (replicate) smaller table to partitions
of the larger table.JOIN
Store
Daily SalesSCAN
SCANBTQ
BLU DPF Explain example
15
RETURN
( 1)
|
CTQ
( 2)
|
MDTQ
( 3)
|
TBSCAN
( 4)
|
SORT
( 5)
|
GRPBY
( 6)
|
DTQ
( 7)
|
GRPBY
( 8)
|
^HSJOIN
( 9)
/-------+--------\
^HSJOIN TBSCAN
( 10) ( 14)
/----+----\ |
TBSCAN BTQ CO-TABLE: DB2USER
( 11) ( 12) ITEM
| |
CO-TABLE: DB2USER TBSCAN
WEB_SALES ( 13)
|
CO-TABLE: DB2USER
DATE_DIM
Row processing
Column processing
• BLU Table queues (TQ)• M = merging
• D = directed
• B = broadcast
• Flows encoded data
• A TQ delimits a DPF subsection• Each subsection can use multiple
subagents
• Indicated by TQDEGREE argument
Subsection 1
Subsection 2
Subsection 3
Subsection 4
BLU on DPF Architecture – Common Compression Encoding
• Each column of a column-organized table is encoded/compressed independently
• Multiple compression techniques used in combination• Dictionary encoding• Prefix encoding• Offset coding• Etc.
• BLU DPF exploits a commoncompression encoding across data slices
CPUsCPUsCPUsCPUs
BLU Acceleration Dynamic In-Memory Processing
BLU BLU BLU BLU
DB2 BLU with 4 data partitions
A B C D A B C D A B C D A B C D
MyTableSlice 1
MyTableSlice 2
MyTableSlice 3
MyTableSlice 4
In the table “MyTable” columns A and B can have very different encoding, but column A in one slice of the table will have the same encoding as column A in another slice.
16
BLU and Database Partitioning Feature (DPF)
• Column compression dictionary shared across DB partitions• A dictionary is created by one member, based on data from all DB partitions
• Dictionary is then distributed to all DB partitions
• Created by LOAD (analyze phase) or INSERT (via automatic dictionary creation)
• Shared dictionary advantages:
• Avoids decoding overhead prior to data transmission (TQing)
• Flowing encoded data reduces network traffic and CPU cost
• Allows operations (filtering, joins, aggregation, etc.) at receiving partitions to work on encoded data
• Replicated materialized query tables (MQTs) are supported• Typically used to avoid moving (TQing) data by replicating data across all DB partitions
• User maintained or
• System maintained, REFRESH DEFERRED
• Requires a full refresh• Data is not encoded
17
BLU and Database Partitioning Feature (DPF)
• BLU DPF has the same system and configuration requirements as non-DPF• Follow system sizing and configuration recommendations for column-organized tables
• New DPF databases:
• Set DB2_WORKLOAD=ANALYTICS before creating DB• Existing DPF databases:
• Review requirements for using column-organized tables• http://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.dbobj.
doc/doc/c0061528.html?lang=en
• See this link for manual steps:• http://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.dbobj.
doc/doc/t0061527.html?lang=en
• Shadow tables are not supported
18
DB2 11.1 DISTRIBUTE BY RANDOM
• New clause on CREATE TABLE
• Ability to partition a table without choosing a partitioning key
• Can be used for row and column-organized tables
• If a primary or unique key is defined, it is used as the partitioning key
• Otherwise, an IMPLICITLY HIDDEN column is added to the table to serve as the partitioning key• Called RANDOM_DISTRIBUTION_KEY
• Excluded from SELECT *
• Otherwise, behaves like a regular column
• Simplifies schema design for some applications
• Provides syntactic compatibility with Netezza DDL
• Tradeoff: Can’t do collocated joins or complete aggregation/distincting
19
DB2 11.1 BLU Sort Support
• Allows sorting to be done using column-organized processing
• Advantage: keeps more processing that is done above the SORT in BLU
• Implements new sort algorithm called PARADIS • Highly parallelized in-place radix sort from IBM Research
• Operates on data in encoded + columnar format
• Used for ORDER BY and OLAP processing• BLU GROUP BY and DISTINCT use hashing, rather than sorting
• Truncating SORTs are supported for FETCH FIRST N ROWS ONLY• The BLU SORT only contains the top N row
20
DB2 11.1 BLU OLAP Function Support
• On-Line Analytical Processing (OLAP) • Return ranking, row numbering and aggregate functions as a scalar values in a query result
• OLAP functions supported by BLU: • RANK , DENSE_RANK , ROW_NUMBER
• OLAP column functions supported by BLU:• AVG, COUNT/COUNT_BIG, MAX, MIN, SUM
• FIRST_VALUE
• RATIO_TO_REPORT
• For all aggregation functions supported by BLU, the window aggregation group clause is limited to:• ROWS/RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
• ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
• ROWS BETWEEN CURRENT ROW AND CURRENT ROW (not supported for FIRST_VALUE)
21
BLU SORT and OLAP examples
select ascending.rnk,ascending.item_sk worst_performing,descending.item_sk best_performing
from(select item_sk,
rank() over (order by rank_col asc) rnkfrom (select ss_item_sk item_sk,
avg(ss_net_profit) rank_colfrom store_sales ss1 group by ss_item_sk )) ascending,
(select item_sk,rank() over (order by rank_col desc) rnk
from (select ss_item_sk item_sk,
avg(ss_net_profit) rank_colfrom store_sales ss1 group by ss_item_sk )) descending
where ascending.rnk = descending.rnk
order by ascending.rnkfetch first 100 rows only;
22
|
TEMP
( 10)
|
GRPBY
( 11)
|
DTQ
( 12)
|
GRPBY
( 13)
|
TBSCAN
( 14)
|
CO-TABLE: DB2USER
STORE_SALES
RETURN
( 1)
|
CTQ
( 2)
|
TBSCAN
( 3)
|
SORT
( 4)
|
HSJOIN
( 5)
/-+-\
MDTQ MDTQ
( 6) ( 15)
| |
TBSCAN TBSCAN
( 7) ( 16)
| |
SORT SORT
( 8) ( 17)
| |
TBSCAN TBSCAN
( 9) ( 18)
| |
TEMP TEMP
( 10) ( 10)
Common sub-expression TEMP
SORTs for OLAP
OLAPs processed after MDTQs
Truncated SORT for ORDER BY and FF100RO
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Industry Leading Parallel Sort
• Leverages the latest sort innovations from IBM TJ Watson Research and DB2 Development• Enhancements can increase BLU Acceleration performance by as much as 13.9X
• BLU Sort+OLAP on SMP Environment
• Configuration Details• On 4-socket Intel Xeon platform with 72 Cores and 742G RAM• 1 TB TPC-DS database• Query scenarios involving multiple sort and OLAP operations
13.9 X faster 5.4 X faster4.6 X faster
0
500
1000
1500
2000
Sort+OLAP query 1 Sort+OLAP query 2 Sort+OLAP query 3Qu
ery
ela
pse
d t
ime
(s)
Row Sort BLU Sort
23
BLU Nested-loop Join
• Nested-loop join (NLJOIN)• The all-purpose join method
• Can be used for any type of join predicate
•T1.A + T2.A >= T1.B
•SUBSTR(T1.A) = T2.X || T2.Y• Only join method that can do Cartesian joins
•i.e. no join predicates• Doesn’t apply join predicates itself
•Join predicates are applied by the inner operation (TBSCAN in BLU)
• Hash join (HSJOIN) can only apply equality predicates
24
BLU Nested-loop Join
• Why is NLJOIN important for BLU?• Because when it occurs in BLU, other processing can occur in BLU too!
• BLU NLJOIN supports early-out and outer join
• Supports any type of inner plan• Simple base table SCAN
• Complex inner will be TEMPed
• Inner index access is not currently supported
•Join predicates can be applied using the inner’s synopsis table
25
BLU Nested-loop Join Example
select c_customer_id,c_first_name,c_last_name,sum(ws_ext_sales_price)
from customer c,promotion p,web_sales ws
where p.p_promo_id = ‘Back to school' and ws.ws_sold_date_sk > p_start_date_sk and ws.ws_sold_date_sk < p_end_date_sk and c_customer_sk = ws_bill_customer_sk
group by c_customer_id,c_first_name,c_last_name
Find the customer information for web sales that occurred when the back-to-school promotion was on.
26
1.79927e+06
CTQ
( 2)
(...)
299879
GRPBY
( 6)
|
299879
HSJOIN
( 7)
/----+----\
1.99914e+06 300470
TBSCAN DTQ
( 8) ( 9)
| |
1.99914e+06 300470
CO-TABLE: DB2USER NLJOIN
CUSTOMER ( 10)
/----+----\
1 300470
BTQ TBSCAN
( 11) ( 13)
| |
1 1.20188e+08
TBSCAN CO-TABLE: DB2USER
( 12) WEB_SALES
|
1500
CO-TABLE: DB2USER
PROMOTION
299879 GRPBY ( 5)|
299879 TBSCAN( 6)|
299879 SORT ( 7)|
299879 NLJOIN( 8)
/--------+--------\1.20188e+08 0.0025 CTQ TBSCAN( 9) ( 14)| |
1.20188e+08 1 ^HSJOIN TEMP ( 10) ( 15)
/-----+-----\ |1.20188e+08 1.19948e+07 1 TBSCAN BTQ BTQ( 11) ( 12) ( 16)| | |
1.20188e+08 1.99914e+06 1 CO-TABLE: DB2USER TBSCAN CTQ
WEB_SALES ( 13) ( 17)| |
1.99914e+06 1 CO-TABLE: DB2USER TBSCAN
CUSTOMER ( 18)|
1500 CO-TABLE: DB2USER
PROMOTION
120M rows returned by joinand sent to row-processing
DB2 10.5
DB2 11.1
BLU Nested-loop Join Examples
27
select distinct ss_item_sk,ws_item_sk
from web_sales,store_sales
where ss_quantity < 10 and ss_list_price < ws_list_price
and ws_quantity > 99
1.8e+06
CTQ
( 2)
|
1.8e+06
DTQ
( 3)
|
300000
UNIQUE
( 4)
|
1.20798e+08
DTQ
( 5)
|
1.20798e+08
NLJOIN
( 6)
/----+----\
4.02673e+07 2.9999
TBSCAN TBSCAN
( 7) ( 8)
| |
4.58213e+08 6
CO-TABLE: DB2USER TEMP
STORE_SALES ( 9)
|
6
BTQ
( 10)
|
1
TBSCAN
( 11)
|
1.20188e+08
CO-TABLE: DB2USER
WEB_SALES
•The NLJOIN inner can be stored in a BLU TEMP if very filtering local predicates are applied to the base table
SS_LIST_PRICE < WS_LIST_PRICE
WS_QUANTITY > 99
BLU Nested-loop Join (Not!)
select sum(ss_quantity*ss_list_price)
sales,count(*) number_sales
from store_sales,date_dim
where ss_sold_date_sk = d_date_sk
and d_week_seq = (select
d_week_seqfrom date_dim
where d_year = 1998 and d_moy = 12 and d_dom = 16
)
28
CTQ ( 2)|1
GRPBY ( 3)|6
DTQ ( 4)|1
GRPBY ( 5)|
43944.2 ^HSJOIN( 6)
/-------+--------\4.58213e+08 6.99971TBSCAN NLJOIN( 7) ( 8)| /----+----\
4.58213e+08 1 6.99971 CO-TABLE: DB2USER BTQ FILTER
STORE_SALES ( 9) ( 11)| |
1.22836 73049 TBSCAN BTQ ( 10) ( 12)| |
73049 73049 CO-TABLE: DB2USER TBSCAN
DATE_DIM ( 13)|73049
CO-TABLE: DB2USERDATE_DIM
USAGE : (Usage of Join)
SCALAR SUBQUERY
•Some NLJOIN operators aren’t really joins!•The are a mechanism to pass scalar values•i.e. scalar subqueries
BLU Nested-loop Join (Not!)
select c_first_name,c_last_name
from customer
where c_customer_sk = 1495578
29
Rows RETURN( 1)|1
DTQ ( 2)|
0.166667 PIPE ( 3)|1
CMPEXP( 4)|1
CTQ ( 5)|1
NLJOIN( 6)
/----+----\1 0.166667
RCTQ TBSCAN( 7) ( 9)| |1 1.99914e+06
IXSCAN CO-TABLE: DB2USER( 8) CUSTOMER|
1.99914e+06 INDEX: SYSIBM
SQL160309082316090
•Some NLJOIN operators aren’t really joins!•They are a mechanism to pass scalar values•Pass row-ids from a row-processing index scan to column-processing•Extra columns retrieved, or predicates applied, by column processing•Supported for single-row access only•(Supported in 10.5 FP3)•Supported in 11.1 MPP
•IXSCAN executed using row processing•Unique index enforcing a primary or unique key constraint•Reverse CTQ flows row-ids to column processing
SQL Functions Optimized for BLU Processing
• Allow more operations to execute in column-organized processing (below the CTQ)• String Functions
• LPAD, RPAD • TO_CHAR • INITCAP
• Numeric Functions• POWER, EXP, LOG10, LN• TO_NUMBER• MOD• SIN, COS, TAN, COT, ASIN, ACOS, ATAN• TRUNCATE
• Date and Time Functions• TO_DATE • MONTHNAME, DAYNAME
• Miscellaneous• COLLATION_KEY
BLU Declared Global Temporary Tables
• Support for a column-organized DGTT
• Supports all options expect NOT LOGGED ON ROLLBACK PRESERVE ROWS
• Can be DB partitioned
• No BLU support for created global temporary tables
DECLARE GLOBAL TEMPORARY TABLE SESSION.CUST_TEMP ("C_CUSTOMER_SK" INTEGER NOT NULL , "C_FIRST_NAME" CHAR(20 OCTETS) , "C_LAST_NAME" CHAR(30 OCTETS) , etc.) DISTRIBUTE BY HASH("C_CUSTOMER_SK") IN USERTEMP1ORGANIZE BY COLUMN NOT LOGGED;
Parallel Insert into BLU DGTT
• Multiple DB agents can insert into a column-organized DGTT• Source must be a single column-organized tables (regular or DGTT)
• Source and target could be on a different number of DB partitions (MPP system)
• Must be enough rows per subagent to make it worthwhile (about 100 rows/per agent)
32
INSERT INTO SESSION.CUST_TEMP SELECT * FROM DB2USER.CUSTOMER;
SCAN
INSERT
SCAN
INSERT
CUSTOMER
SCAN
INSERT
CUST_TEMP
SCAN
INSERT DEGREE=4
Automatic Query Transformations
• UPDATE/DELETE statements with correlated SET clauses
• De-correlate when no index is available
• Correlation requires expensive looping and scanning
33
UPDATE CUST C SET LOGIN =
(SELECT C_LOGIN FROM CUST_STAGING
CS
WHERE C.CUST_ID = CS.CUST_ID)
WHERE EXISTS(
SELECT 1 FROM CUST_STAGING CS2
WHERE C.CUST_ID = CS2.CUST_ID);
(SELECT DISTINCT C.ROWID, Q1.LOGIN
FROM CUST C, CUST_STAGING CS, Q1
WHERE C.CUST_ID = CS.CUST_ID) AS Q2
CUST
C
CUST_STAGING
CS
CUST_STAGING
CS2
(SELECT CS2.C_LOGIN
FROM CUST_STAGING CS2
WHERE C.CUST_ID = CS2.CUST_ID)
AS Q1
UPDATE C SET C.C_LOGIN = Q2.LOGIN
WHERE Q2.ROWID=C.ROWID
DB2 10.5
UPDATE/DELETE statements with correlated SET clauses
• Replace correlated sub-select with a join
• Detect duplicates using COUNT(*) for each ROWID, raise -811 error if any found• Typically, there are none
34
UPDATE CUST C SET LOGIN =
(SELECT C_LOGIN FROM CUST_STAGING
CS
WHERE C.CUST_ID = CS.CUST_ID)
WHERE EXISTS(
SELECT 1 FROM CUST_STAGING CS2
WHERE C.CUST_ID = CS2.CUST_ID);
(SELECT DISTINCT C.ROWID, C2.LOGIN
FROM CUST C, CUST_STAGING CS,
CUST_STAGING CS2
WHERE C.CUST_ID = CS.CUST_ID AND
CS2.CUST_ID = CS.CUST_ID) AS Q1
CUST
C
CUST_STAGING
CS
CUST_STAGING
CS2
UPDATE CUST SET LOGIN = Q3.LOGIN
WHERE Q3.ROWID=ROWID
(SELECT MAX(Q1.LOGIN) AS LOGIN,
COUNT(*) AS CNT, Q1.ROWID
FROM Q1 GROUP BY Q1.ROWID) AS Q2
(SELECT Q2.ROWID, Q2.LOGIN,
(CASE WHEN CNT > 1 THEN RAISE_ERROR (-811) ),
FROM Q2) AS Q3
DB2 11.1
UPDATE/DELETE statements with correlated SET clauses
35
1.19948e+07 UPDATE( 2)
/-----+------\1.19948e+07 1.19948e+07RCTQ CO-TABLE: DB2USER( 3) CUSTOMER|
1.19948e+07 CTQ( 4)|
1.19948e+07 GRPBY ( 5)|
1.19948e+07 UNIQUE( 6)|
1.50027e+08 HSJOIN( 7)
/----------+-----------\1.19948e+07 1.19948e+07HSJOIN TBSCAN( 8) ( 11)
/-------+-------\ |1.19948e+07 1.19948e+07 1.19948e+07TBSCAN TBSCAN CO-TABLE: DB2USER( 8) ( 10) CUSTOMER_STAGING| |
1.19948e+07 1.19948e+07CO-TABLE: DB2USER CO-TABLE: DB2USER
CUSTOMER CUSTOMER_STAGING
1.19948e+07 UPDATE( 2)
/-----+------\1.19948e+07 1.19948e+07RCTQ CO-TABLE: DB2USER( 3) CUSTOMER
|1.19948e+07 TBSCAN( 4)
|1.19948e+07 SORT ( 5)
|1.19948e+07 NLJOIN( 6)
/----------+-----------\1.19948e+07 1
CTQ CTQ( 7) ( 11)| |
1.19948e+07 1 HSJOIN TBSCAN( 8) ( 12)
/-------+-------\ |1.19948e+07 1.19948e+07 1.19948e+07TBSCAN TBSCAN CO-TABLE: DB2USER( 9) ( 10) CUSTOMER_STAGING| |
1.19948e+07 1.19948e+07CO-TABLE: DB2USER CO-TABLE: DB2USER
CUSTOMER CUSTOMER_STAGING
DB2 10.5 DB2 11.1
•2 CTQs•TBSCAN (12) is correlated to CTQ(7), executes 12M times!•SORT not pushed down
•1 CTQ•No correlation•Distinctingpushed down•But an extra GROUP BY
UPDATE/DELETE statements with correlated SET clausesRow-store tables
36
1.19948e+07 UPDATE( 2)
/----+-----\1.19948e+07 1.19948e+07FETCH TABLE: DB2USER( 3) CUSTOMER_R
/----+-----\1.19948e+07 1.19948e+07
TBSCAN TABLE: DB2USER( 4) CUSTOMER_R|
1.19948e+07 SORT ( 5)|
1.19948e+07 HSJOIN( 6)
/---------+---------\1.19948e+07 1.19948e+07NLJOIN TBSCAN( 7) ( 10)
/------+-------\ |1.19948e+07 1 1.19948e+07 TBSCAN TBSCAN TABLE: DB2USER( 8) ( 9) CUSTOMER_STAGING_R| |
1.19948e+07 1.19948e+07TABLE: DB2USER TABLE: DB2USERCUSTOMER_R CUSTOMER_STAGING_R
•The same problem can occur for row-store tables too•TBSCAN (9) is scanned 12M times!•An index could be added to CUSTOMER_STAGING_R
DB2 10.5
1.19948e+07UPDATE( 2)
/----+-----\1.19948e+07 1.19948e+07
FETCH TABLE: BCULINUX( 3) CUSTOMER_R
/----+-----\1.19948e+07 1.19948e+07GRPBY TABLE: BCULINUX( 4) CUSTOMER_R
|1.19948e+07 TBSCAN( 5)
|1.19948e+07 SORT ( 6)
|2.42356e+07 HSJOIN( 7)
/------------+------------\1.19948e+07 1.93767e+06 HSJOIN pUNIQUE( 8) ( 11)
/------+-------\ |1.19948e+07 1.19948e+07 1.19948e+07
TBSCAN TBSCAN TBSCAN( 9) ( 10) ( 12)| | |
1.19948e+07 1.19948e+07 1.19948e+07TABLE: BCULINUX TABLE: BCULINUX TABLE: BCULINUXCUSTOMER_R CUSTOMER_STAGING_R CUSTOMER_STAGING_R
DB2 11.1
•No correlation•But an extra GROUP BY
Automatic Query Transformations
• Queries with correlated scalar sub-selects
• De-correlate when no index is available
• Correlation requires expensive looping and scanning
37
SELECT CUST_ID, LAST_NAME,
(SELECT CITY FROM CUST_ADDR CA WHERE
C.ADDR_ID = CA.ADDR_ID)
FROM CUST C;
CUST
C
CUST_ADDR
CA
(SELECT CITY FROM CUST_ADDR CA
WHERE C.ADDR_ID = CA.ADDR_ID)
AS Q1
SELECT CUST_ID, LAST_NAME, CITY
FROM CUST C, Q1
DB2 10.5
Queries with correlated scalar sub-selects
• Replace correlated scalar sub-select with an outer join
• Detect duplicates using COUNT(*) for each ROWID, raise -811 error if any found• Typically, there are none
38
SELECT CUST_ID, LAST_NAME,
(SELECT CITY FROM CUST_ADDR CA WHERE
C.ADDR_ID = CA.ADDR_ID)
FROM CUST C;
CUST
C
CUST_ADDR
CA
(SELECT C.CUST_ID, C.LAST_NAME, CA.CITY, C.CUST_PK
FROM CUST C LEFT OUTER JOIN CUST_ADDR CA ON
C.ADDR_ID = CA.ADDR_ID) AS Q1
(SELECT MAX(Q1.CITY) AS CITY, COUNT(*) AS
CNT, Q1.CUST_ID, Q1.LAST_NAME, Q1.CUST_PK
FROM Q1
GROUP BY Q1.CUST_ID,Q1.LAST_NAME,Q1.CUST_PK)
AS Q2
SELECT Q2.CUST_ID, Q2.LAST_NAME, Q2.CITY,
(CASE WHEN CNT > 1 THEN RAISE_ERROR (-811) )
FROM Q2
DB2 11.1
Queries with correlated scalar sub-selects
39
Rows RETURN( 1)
|1.19948e+07 CTQ ( 2)
|1.19948e+07 GRPBY ( 3)
|1.19948e+07 HSJOIN<( 4)
/-------+-------\5.98932e+06 1.19948e+07 TBSCAN TBSCAN( 5) ( 6)
| |5.98932e+06 1.19948e+07
CO-TABLE: DB2USER CO-TABLE: DB2USER CUSTOMER_ADDRESS CUSTOMER
Rows RETURN( 1)
|1.19948e+07 NLJOIN( 2)
/-------+-------\1.19948e+07 1 CTQ CTQ( 3) ( 5)
| |1.19948e+07 1 TBSCAN TBSCAN( 4) ( 6)
| |1.19948e+07 5.98932e+06
CO-TABLE: DB2USER CO-TABLE: DB2USERCUSTOMER CUSTOMER_ADDRESS
DB2 10.5
DB2 11.1
•2 CTQs•TBSCAN (6) is correlated to CTQ(3), executes 12M times!
•1 CTQ•No correlation•Join is pushed down•Requires an extra GROUP BY
Inline Optimization Guidelines (Hints)
• Optimization guidelines are used to specify the access plan• AKA “hints”
• e.g. Force an index scan, join method, join order, query transformation
• Specified using an XML document stored in the SYSTOOLS.OPT_PROFILE table• Various ways to enable the optimization profile for the application (bind option, special
register)
• DB2 11.1 - optimization guidelines can be specified along with the SQL statement in an SQL comment
• Only one set of guidelines can appear in a /* */ comment after the entire SQL statement
• No registry variable setting required
SELECT S.S_NAME FROM TPCD.SUPPLIER S WHERE S_NAME = ‘XYZ CORP’
/* <OPTGUIDELINES><IXSCAN TABLE=’S’ INDEX=’S_IX1’/></OPTGUIDELINES> */ 40
Conclusion
• DB2 11.1 provides many BLU performance improvements• MPP
• SORT
• OLAP
• DGTTs
• Nested-loop join
• More scalar functions
• Query transformations
• The DB2 query optimizer is fully aware of them and chooses the right access plan
• You get better performance!
41
top related