december 01-03, 2009 minneapolis, chicago, milwaukee 0 best practices to improve query performance...

45
December 01-03, 2009 •Minneapolis, Chicago, Milwaukee 1 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM, [email protected]

Upload: mariah-howard

Post on 12-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee1

Best Practices to Improve Query Performance in a Data Warehouse - 2

Calisto Zuzarte, STSM, IBM, [email protected]

Page 2: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee2

Data Warehouse Life Cycle

Database design / Application design– The Warehouse Application architects and Database

Administrators work together to design the queries and schema before they put the application in production

Database performance layer implementation– In order to meet SLAs, DBAs usual go through some iterations

augmenting the database with performance layer objects and set up the initial configuration to get good performance

Database tuning operations– During production, with changing requirements and change in

data, there is on-going tuning required to keep operations smooth.

Page 3: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee3

Motivation

Data warehouse environments characteristics:– Large volumes of data

• Millions/Billions of rows involved in some tables

• Large Joins

• Large Sorts,

• Large Aggregations

• Many tables involved

• Large amount of data rolled-in and rolled-out

– Complex queries • Report Queries

• Ad Hoc Queries

It is important to pay attention to query performance

Page 4: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee4

Objective

Provide recommendations from a DB2 optimizer perspective to improve query performance through the Data Warehouse life cycle

Page 5: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee5

Agenda

SESSION 1

Best Practices – Database Design

Best Practices – Application Design

Best Practices – Configuration and Operations

SESSION 2

Best Practices – Performance Layer

Page 6: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee6

Best Practices – Performance Layer

Indexes

Statistics

Constraints

Materialized Query Tables

Replicated Tables

Page 7: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee7

We want Index Only Access instead of more expensive ISCAN-FETCH or TSCAN (Table Scan)

We want to avoid SORTs particularly those that spill

We want to promote index-ORing and index-ANDing

We want to promote Star Joins

When you have range join predicates– Better performance with Nested Loop Join

When we want to avoid random I/O with better clustering

Indexes are useful when …

Page 8: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee8

ISCAN

INDEXONLY

INDEX_ORing INDEX-ANDingLIST PREFETCH

FETCH|

ISCAN

FETCH|

RIDSCN|

SORT|

ISCAN

ISCANFETCH

|RIDSCN

SORT|

ISCAN

SORT|

ISCAN ISCAN

FETCH|

SORT(RID)|

IXAND

REGULAR

Types of Index Access

Minimize List Prefetch with REORG

Encourage Index-ANDing

Encourage Index-ORing

Encourage Index Only Access

Page 9: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee9

Nested Loop Join Merge Join Hash Join

HSJOIN

outer

NLJOINMGJN

inner outer inner probe build

4030

1919

29

319

319

31219

79

0239

191940

33379

12191919

SORT/ Index SORT / Index

319

319

31219

79

51020151015

555

51020

55

155

1015

51020151015

555

4030

1919

29

3231233

3231233

3323123

Better With Index

Indexes Can Help NLJN and MGJN

Page 10: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee10

HSJOIN /------------------+-----------------\ TBSCAN DTQ | | TABLE:SUPPLIER NLJOIN /------+------\ TBSCAN FETCH | /----+----\ SORT IXSCAN TABLE: PART | | FETCH P_PK /----+---\ RIDSCN TABLE: PARTSUPP | SORT | IXAND /-----------+-----------\ NLJOIN NLJOIN /------+-----\ /----+---\ TBSCAN IXSCAN BTQ IXSCAN | | | | TABLE: PART INDEX: PS_PK TBSCAN INDEX: PS_SK | TABLE: SUPPLIER Star Join requires fact table Indexes

Indexes Promote Star Joins

Page 11: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee11

SORT Spills

The details for a SORT will indicate if the SORT spilled

The I/Os indicate that there was spilling associated with the SORT.

Minimize spills by considering indexes and (Also discussed later by balancing SORTHEAP, SHEAPTHRES and BUFFERPOOL) SORT

( 16) 6.14826e+06 1.30119e+06 | 3.65665e+07 TBSCAN ( 17) 2.00653e+06 1.14286e+06 | 3.74999e+07 TABLE: TPCD.ORDERS

Page 12: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee12

Define Indexes-To get index-only access-To avoid SORT (spills)-To promote Index-OR / Index AND-To promote star joins-To better handle range join predicate-For point Queries (Nested Loop Join)-For better clustering (MDC Indexes)

Page 13: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee13

Best Practices – Performance Layer

Indexes

Statistics

Constraints

Materialized Query Tables

Replicated Tables

Page 14: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee14

Estimating the size of intermediate results is critical to getting good query execution plans

Without sufficient information, the optimizer can only guess based on some assumptions

Data skew and statistical correlation between multiple column values introduce uncertainty

Cardinality Estimation

Page 15: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee15

Cardinality Estimation With and Without Distribution Statistics

Distribution statistics are exploited ONLY when predicates use constantsThey make a significant difference if you have skew

SELECT * FROM T WHERE C1 = 300

Equality Predicate (no distribution Statistics)• CARD = 1000, COLCARD = 100• FF = 1/COLCARD = 1/100 = 0.01 (1% of the rows)• Cardinality Estimate = 0.01 * 1000 = 10 rows

Equality Predicate with distribution • Value 300 has VALCOUNT 200• Cardinality = 200

0

50

100

150

200

250

45 300 50 200 250 10 20 50 30 280 … …

Frequency

Page 16: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee16

Collect Distribution Statistics

when you have skew and predicates with constants

Page 17: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee17

Quantile or Histogram Statistics – Date Columns

Table size ~400,000 ROWS

4 buckets each ~100,000 ROWS (number of quantiles = 4)

Assume 4 months per bucket

20080601 20081001 20090201 20090601 20101001

WHERE DateCol BETWEEN 20081115 AND 20081101

~100,000

Cardinality Estimate = ((20081115 – 20081101) / (20090201 – 20081001)) * 100000 = (14 / 9200 ) * 100000 = ~152 ROWS

Page 18: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee18

Quantile or Histogram Statistics – Date Columns

Table size ~400,000 ROWS

16 buckets each ~25,000 ROWS (number of quantiles = 16)

20080601 20081001 20090201 20090601 20101001

20081101 20081201 20090101

~25,000

WHERE DateCol BETWEEN 20081115 AND 20081101

Cardinality Estimate = ((20081115 – 20081101) / (20081201 - 20081101)) * 25000 = (14 / 100 ) * 25000 = ~3500 ROWS

Could be more accurate if 20081130

Page 19: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee19

Collect a large number of Quantile Statistics

on DATE columns

Page 20: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee20

Column Group Statistics

Example: WHERE COUNTRY = ‘Germany’ And CITY = ‘Frankfurt’

No CGS: Selectivity = ½ * 1/3 = 1/6 …Estimate 1 row

With CGS: Selectivity = 1/3 …Estimate 2 rows

Country City Hotel Name

Germany Bremen Hilton

Germany Bremen Best Western

Germany Frankfurt InterCity

Germany Frankfurt Shangri-La

Canada Toronto Four Seasons

Canada Toronto Intercontinental

1

2

3

Page 21: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee21

Column Group Statistics

The DB2 optimizer can exploit column group statistics for better estimates

SELECT … FROM dw.hotel H

WHERE H.country = ? and H.city = ?

RUNSTATS ON TABLE dw.hotel … ON COLUMNS (… (country, city) … ) …

If you have an index on (country, city) we have FIRST2KEYCARD statistic so no need of CGS

Page 22: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee22

Collect Column Group Statistics

when you have multiple equality predicates on the

same table on columns that might be correlated

Page 23: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee23

Join Cardinality Estimation

Join cardinality estimation is typically based on inclusion and uniformity

PRODUCT.PROD_ID = SALES.PROD_ID– PRODUCT Table

• 20000 rows• 10000 distinct PROD_IDs

– SALES• 100000 rows• 5000 distinct PROD_IDs

– Join Cardinality• 20000*100000 / MAX(10000, 5000) = 200000

Page 24: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee24

CUSTID CNAME

1 ‘ABC’

2 ‘DEF’

3 ‘GHI’

4 ‘IBM’

5 ‘JKL’

6 ‘MNO’

7 ‘PQR’

… …

… …

100 ‘XYZ’

CUSTID # of Rows

4 2000000

10 700000

8 500000

63 300000

72 100000

9 50000

12 20000

… …

… …

5 5

CUST Table Frequency Statistics – SALES Table

SELECT … FROM SALES, CUSTWHERE CUST.CNAME = ‘IBM’ AND CUST.CUSTID = SALES.CUSTID

Cardinality Estimate with Uniformity : Cardinality Estimate of CUST after local predicate = (1/100 * 100) = 1Join Card Estimate= (1 * 10000000) / MAX(1, 100) = 100000Actual Cardinality : 2000000 !!!!!!!!!!!!!!!!!!!!

100 rows100 distinct values

10000000 rows100 distinct values

Problem Scenario - Skew

Page 25: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee25

What Are Statistical Views

Views with statistics that can be used by the DB2 optimizer

If appropriate for a query, these statistics can be exploited to get better cardinality estimates

The view need not be directly referenced in the query - matched like MQTs

Single table or simple join views works best

Page 26: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee26

Creating Statistical Views

Create a view like a regular view :CREATE VIEW cust_fact AS(SELECT C.* FROM CUSTOMER C, FACT F

WHERE C.CUST_ID = F.CUST_ID)

Alter the view to make it a statistical viewALTER VIEW cust_fact ENABLE QUERY OPTIMIZATION

Collecting StatisticsRUNSTATS ON TABLE dba.cust_fact WITH DISTRIBUTION

Page 27: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee27

Define Statistical Views

when you have skew on join columns

Page 28: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee28

Best Practices – Performance Layer

Indexes

Statistics

Constraints

Materialized Query Tables

Replicated Tables

Page 29: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee29

Referential Integrity (RI)

Facilitates aggregation push down– Example follows

Eliminates redundant joins in views– RI helps determine that queries that do not require data from a

primary key table need not do that join even if it is in the view

Helps with Materialized Query Table matching– Allows Queries to match MQTs with more dimension table joins

Page 30: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee30

Exploiting Referential Integrity Constraints

Find the top 20 stores in terms of total sales, and include the store name and city information

SELECT st.store_id, st.name, st.city, sum(F.sales) as sm FROM salesF as F, store as ST WHERE F.store_id = ST.store_id GROUP BY st.store_id, st.name, st.city ORDER BY sm desc FETCH FIRST 20 ROWS ONLY;

Page 31: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee31

Exploiting Referential Integrity Constraints

f.store_id=st.store_id /* FK=PK */

sum(sales) as sm group by store_id, name, city

store_id, name, city, sm

order by sm descfetch first 20 rows only

store_id, name, city

2,000 rows store_id, sales

1,000,000 rows

JOIN

Sales (F) Store (ST)

GROUPBY

SORT

SELECT

2,000 rows

20 rows

1,000,000 rows

20 rows

Page 32: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee32

Exploiting Referential Integrity Constraints

f.store_id=st.store_id /* FK=PK */

sum(sales) as sm group by store_id

store_id, name, city, sm

order by sm descfetch first 20 rows only

store_id, name, city

2,000 rows

GROUPBY

Sales (F) Store (ST)

JOIN

SORT

SELECT

2,000 rows

store_id, sales

1,000,000 rows

20 rows

20 rows

2,000 rows

Page 33: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee33

Exploiting Referential Integrity Constraints

f.store_id=st.store_id /* FK=PK */

sum(sales) as sm group by store_id

store_id, name, city, sm

order by sm descfetch first 20 rows only

store_id, name, city

2,000 rows

GROUPBY

Sales (F) Store (ST)

SORT

JOIN

SELECT

2,000 rows

store_id, sales

1,000,000 rows

20 rows

20 rows

20 rows

Page 34: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee34

Define Referential Integrity

Whenever possible(Informational if you know for sure)

Page 35: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee35

Best Practices – Performance Layer

Indexes

Statistics

Constraints

Materialized Query Tables

Replicated Tables

Page 36: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee36

Consider Materialized Query Tables

Fact

GB

JOIN

Dim2

Dim1

JOIN

Fact

GB

JOIN

Dim2

Dim1

JOIN

Fact

GB

JOIN

Dim2

Dim1

JOIN

Fact

GB

JOIN

Dim2

Dim1

JOIN

Joe’s Query

Sue’s Query

Bob’s QueryFact

GB

JOIN

Dim2

Dim1

JOIN

MQT

Q9 Q9

Bob’s QSue’s

Joe’s Q

Page 37: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee37

Best Practices - Defining Materialized Query Tables

Compute and use the dimension lattice to assist in MQT selection. Determine cardinality of aggregates that appear to be good candidates by executing COUNT queries against base tables.

Try to achieve at least a 10X reduction in size between fact and the MQT

Build MQTs with a reasonable number of GROUP BY columns (3 to 4 dimension keys) at a time based on query patterns

As far as possible build the MQT from the fact table alone. This is easier to manage although the dimension tables may need to be joined for most queries.

Page 38: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee38

Best Practices - MQT Matching

Define Referential Integrity to help with matching MQTs that contain more tables than the queries

Define Functional Dependencies for thinner MQTs

Use COUNT_BIG instead of COUNT for DPF MQTs

Define indexes on MQTs

Keep statistics up-to-date

Include as many Foreign Key Columns as the size allows in the MQT

– Allows joins of dimension tables when needed

Define base table columns NOT NULL as far as possible– For example we can match SUM(A + B) with SUM(A) + SUM(B)

Page 39: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee39

Best Practices – MQT Maintenance

REFRESH IMMEDIATE– Create an index on the GROUP BY columns

– If not aggregating, create the index on the set of columns that form a unique key

– If possible, define the base table columns used in the GROUP BY clause of REFRESH IMMEDIATE MQTs as NOT NULL

– Define RI relationships wherever applicable

– Always keep the base table and MQT statistics up-to-date

REFRESH DEFERRED– If log space is an issue, consider ALTER TABLE … NOT LOGGED

INITIALLY

– An MQT can be temporarily toggled into a regular table by using • ALTER TABLE … DROP MATERIALIZED QUERY

• ALTER TABLE … ADD MATERIALIZED QUERY

– For Full REFRESH, consider LOAD FROM CURSOR.

Page 40: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee40

Best Practices MQTs

Define Referential Integrity Make FK columns

NOT NULL

Define FunctionalDependencies

Use COUNT_BIG(*) in DPFConsider GROUPING SETS,

ROLLUP or CUBE

10X Reduction ?

Indexes and RUNSTATS

Include FK columns in the MQT

Page 41: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee41

Best Practices – Performance Layer

Indexes

Statistics

Constraints

Materialized Query Tables

Replicated Tables

Page 42: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee42

Replicated Tables

Replicate dimension tables (unless collocated with fact table)

Benefit : Avoids data movement, Avoids too many agents

Important : Define suitable indexes

If too large, replicate a subset of frequently used columns and/or rows

BTQ

JOIN

BTQ

JOIN

SALES SALESCUST SALES

BTQ

JOIN

CUSTCOPY

CUSTCOPY

CUSTCOPY

Page 43: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee43

Largest dimension table- Collocate with the fact table

All other dimension tables-Single partition and replicated

- At least the HOT columns

Page 44: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

December 01-03, 2009 •Minneapolis, Chicago, Milwaukee44

Summary

Session 2 : Best practices to improve data warehouse query performance related to the performance layer

– Indexes

– Statistics (Column Group Statistics and Statistical Views)

– Referential Integrity

– Materialized Query Tables

– Replicated Tables

Recap : Session 1 covered Database Design, Application Design, Configuration and Operations

– These include considerations related to Parallelism, Partitioning, Schema, Application queries, Configuration

Page 45: December 01-03, 2009 Minneapolis, Chicago, Milwaukee 0 Best Practices to Improve Query Performance in a Data Warehouse - 2 Calisto Zuzarte, STSM, IBM,

45

© Copyright IBM Corporation [current year]. All rights reserved.U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.  WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.  IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.

Please update paragraph below for the particular product or family brand trademarks you mention such as WebSphere, DB2, Maximo, Clearcase, Lotus, etc

IBM, the IBM logo, ibm.com, [IBM Brand, if trademarked], and [IBM Product, if trademarked] are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

If you have mentioned trademarks that are not from IBM, please update and add the following lines:

[Insert any special 3rd party trademark names/attributions here] Other company, product, or service names may be trademarks or service marks of others.

Disclaimer