db2_an introduction to materialized query tables

9

Click here to load reader

Upload: api-25919427

Post on 11-Apr-2015

421 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DB2_An Introduction to Materialized Query Tables

01/02/2006 07:09 PMDB2 Basics: An introduction to materialized query tables

Page 1 of 9http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

DB2 Basics: An introduction to materialized querytablesLevel: Introductory

Roman Melnyk ([email protected]), DB2 Information Development, IBM Canada Ltd.

08 Sep 2005

The definition of a materialized query table (MQT) is based upon the result of a query. MQTs cansignificantly improve the performance of queries. This article introduces you to MQTs, summarytables, and staging tables, and shows you, by way of working examples, how to get up and runningwith materialized query tables.

A materialized query table (MQT) is a table whose definition is based upon the result of a query. The data that iscontained in an MQT is derived from one or more tables on which the materialized query table definition isbased. Summary tables (or automatic summary tables, ASTs), which are familiar to IBM® DB2® UniversalDatabase™ (UDB) for Linux, UNIX®, and Windows® (DB2 UDB) users, are considered to be a specialized typeof MQT. The fullselect that is part of the definition of a summary table contains a GROUP BY clausesummarizing data from the tables that are referenced in the fullselect.

You can think of an MQT as a kind of materialized view. Both views and MQTs are defined on the basis of aquery. The query on which a view is based is run whenever the view is referenced; however, an MQT actuallystores the query results as data, and you can work with the data that is in the MQT instead of the data that is inthe underlying tables.

Materialized query tables can significantly improve the performance of queries, especially complex queries. If theoptimizer determines that a query or part of a query could be resolved using an MQT, the query might berewritten to take advantage of the MQT.

An MQT can be defined at table creation time as maintained by the system or maintained by the user. Thefollowing sections introduce you to these two types of MQTs, as well as summary tables and staging tables. Theexamples that follow require a connection to the SAMPLE database; if you don’t have the SAMPLE databasecreated on your system, you can create it by entering the db2sampl command from any command prompt.

Maintained by system MQTsThe data in this type of materialized query table is maintained by the system. When you create this type of MQT,you can specify whether the table data will be a REFRESH IMMEDIATE or REFRESH DEFERRED. TheREFRESH keyword lets you specify how the data is to be maintained. DEFERRED means that the data in thetable can be refreshed at any time using the REFRESH TABLE statement. Neither REFRESH DEFERRED norREFRESH IMMEDIATE system-maintained MQTs allow insert, update, or delete operations to be executedagainst them. However, REFRESH IMMEDIATE system-maintained MQTs are updated with changes made tothe underlying tables as a result of insert, update, or delete operations.

Listing 1 shows an example of creating a REFRESH IMMEDIATE system-maintained MQT. The table, which isnamed EMP, is based on the underlying tables EMPLOYEE and DEPARTMENT in the SAMPLE database.Because REFRESH IMMEDIATE MQTs require that at least one unique key from each table referenced in thequery appear in the select list, we first define a unique constraint on the EMPNO column in the EMPLOYEEtable and on the DEPTNO column in the DEPARTMENT table. The DATA INITIALLY DEFERRED clausesimply means that data will not be inserted into the table as part of the CREATE TABLE statement. After being

Page 2: DB2_An Introduction to Materialized Query Tables

01/02/2006 07:09 PMDB2 Basics: An introduction to materialized query tables

Page 2 of 9http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

simply means that data will not be inserted into the table as part of the CREATE TABLE statement. After beingcreated, the MQT is in check pending state (see Demystifying table and table space states), and cannot be querieduntil the SET INTEGRITY statement has been executed against it. The IMMEDIATE CHECKED clause specifiesthat the data is to be checked against the MQT's defining query and refreshed; the NOT INCREMENTAL clausespecifies that integrity checking is to be done on the whole table. A query executed against the EMP materializedquery table shows that it is now fully populated with data.

Listing 1. Creating an MQT that is to be maintained by the system

connect to sample...

alter table employee add unique (empno)alter table department add unique (deptno)

create table emp as (select e.empno, e.firstnme, e.lastname, e.phoneno, d.deptno, substr(d.deptname, 1, 12) as department, d.mgrno from employee e, department d where e.workdept = d.deptno) data initially deferred refresh immediate

set integrity for emp immediate checked not incremental

select * from emp

EMPNO FIRSTNME LASTNAME PHONENO DEPTNO DEPARTMENT MGRNO------ ------------ --------------- ------- ------ ------------ ------000010 CHRISTINE HAAS 3978 A00 SPIFFY COMPU 000010000020 MICHAEL THOMPSON 3476 B01 PLANNING 000020000030 SALLY KWAN 4738 C01 INFORMATION 000030000050 JOHN GEYER 6789 E01 SUPPORT SERV 000050000060 IRVING STERN 6423 D11 MANUFACTURIN 000060000070 EVA PULASKI 7831 D21 ADMINISTRATI 000070000090 EILEEN HENDERSON 5498 E11 OPERATIONS 000090000100 THEODORE SPENSER 0972 E21 SOFTWARE SUP 000100000110 VINCENZO LUCCHESSI 3490 A00 SPIFFY COMPU 000010000120 SEAN O'CONNELL 2167 A00 SPIFFY COMPU 000010000130 DOLORES QUINTANA 4578 C01 INFORMATION 000030...000340 JASON GOUNOT 5698 E21 SOFTWARE SUP 000100

32 record(s) selected.

connect reset

Maintained by user MQTsThe data in this type of materialized query table is maintained by the user. Only a REFRESH DEFERREDmaterialized query table can be defined as MAINTAINED BY USER. The REFRESH TABLE statement (usedfor system-maintained MQTs) cannot be issued against user-maintained MQTs. User-maintained MQTs do allowinsert, update, or delete operations to be executed against them.

Listing 2 shows an example of creating a REFRESH DEFERRED user-maintained MQT. The table, which isnamed ONTARIO_1995_SALES_TEAM, is based on the underlying tables EMPLOYEE and SALES in the

Page 3: DB2_An Introduction to Materialized Query Tables

01/02/2006 07:09 PMDB2 Basics: An introduction to materialized query tables

Page 3 of 9http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

named ONTARIO_1995_SALES_TEAM, is based on the underlying tables EMPLOYEE and SALES in theSAMPLE database. Again, the DATA INITIALLY DEFERRED clause means that data will not be inserted intothe table as part of the CREATE TABLE statement. After being created, the MQT is in check pending state (seeDemystifying table and table space states), and cannot be queried until the SET INTEGRITY statement has beenexecuted against it. The MATERIALIZED QUERY IMMEDIATE UNCHECKED clause specifies that the tableis to have integrity checking turned on, but is to be taken out of check pending state without being checked forintegrity violations.

Next, to populate the MQT with some data, we will import data that had been exported from the EMPLOYEE andSALES tables. The exporting query matches the defining query for the MQT. Then we will insert another recordinto the ONTARIO_1995_SALES_TEAM table.

A query executed against the ONTARIO_1995_SALES_TEAM materialized query table shows that it is nowfully populated with the imported and inserted data, demonstrating that user-maintained MQTs can indeed bemodified directly.

Listing 2. Creating an MQT that is to be maintained by the user

connect to sample...

create table ontario_1995_sales_team as (select distinct e.empno, e.firstnme, e.lastname, e.workdept, e.phoneno, 'Ontario' as region, year(s.sales_date) as year from employee e, sales s where e.lastname = s.sales_person and year(s.sales_date) = 1995 and left(s.region, 3) = 'Ont') data initially deferred refresh deferred maintained by user

set integrity for ontario_1995_sales_team materialized query immediate unchecked

export to ontario_1995_sales_team.del of del select distinct e.empno, e.firstnme, e.lastname, e.workdept, e.phoneno, 'Ontario' as region, year(s.sales_date) as year from employee e, sales s where e.lastname = s.sales_person and year(s.sales_date) = 1995 and left(s.region, 3) = 'Ont'...Number of rows exported: 2

import from ontario_1995_sales_team.del of del insert into ontario_1995_sales_team...Number of rows committed = 2

insert into ontario_1995_sales_team values ('006900', 'RUSS', 'DYERS', 'D44', '1234', 'Ontario', 1995)

select * from ontario_1995_sales_team

EMPNO FIRSTNME LASTNAME WORKDEPT PHONENO REGION YEAR------ ------------ --------------- -------- ------- ------- -----------000110 VINCENZO LUCCHESSI A00 3490 Ontario 1995000330 WING LEE E21 2103 Ontario 1995006900 RUSS DYERS D44 1234 Ontario 1995

3 record(s) selected.

connect reset

Page 4: DB2_An Introduction to Materialized Query Tables

01/02/2006 07:09 PMDB2 Basics: An introduction to materialized query tables

Page 4 of 9http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

connect reset

Summary tablesYou will recall that a summary table is a specialized type of MQT whose fullselect contains a GROUP BY clausesummarizing data from the tables that are referenced in the fullselect. Listing 3 shows a simple example ofcreating a summary table. The table, which is named SALES_SUMMARY, is based on the underlying tableSALES in the SAMPLE database. Once again, the DATA INITIALLY DEFERRED clause means that data willnot be inserted into the table as part of the CREATE TABLE statement. The REFRESH DEFERRED clausemeans that the data in the table can be refreshed at any time using the REFRESH TABLE statement. A queryagainst this MQT right after it was created, but before the REFRESH TABLE statement was issued, returns anerror. After the REFRESH TABLE statement executes, the query runs successfully.

A subsequent insert operation into the SALES table, followed by a summary table refresh and a query against thesummary table, shows that the change to the underlying table is reflected in the summary table: salesperson Lee'stotal sales in the Ontario-South region have increased by 100. Similar behavior can be observed in response toupdate or delete operations against the underlying SALES table.

Listing 3. Creating a summary table

connect to sample...

create table sales_summary as (select sales_person, region, sum(sales) as total_sales from sales group by sales_person, region) data initially deferred refresh deferred

select * from sales_summary

SALES_PERSON REGION TOTAL_SALES--------------- --------------- -----------SQL0668N Operation not allowed for reason code "1" on table"MELNYK.SALES_SUMMARY". SQLSTATE=57016

refresh table sales_summary

select * from sales_summary

SALES_PERSON REGION TOTAL_SALES--------------- --------------- -----------GOUNOT Manitoba 15GOUNOT Ontario-North 1GOUNOT Ontario-South 10GOUNOT Quebec 24LEE Manitoba 23LEE Ontario-North 8LEE Ontario-South 34LEE Quebec 26LUCCHESSI Manitoba 3LUCCHESSI Ontario-South 8LUCCHESSI Quebec 3

11 record(s) selected.

Page 5: DB2_An Introduction to Materialized Query Tables

01/02/2006 07:09 PMDB2 Basics: An introduction to materialized query tables

Page 5 of 9http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

11 record(s) selected.

insert into sales values ('06/28/2005', 'LEE', 'Ontario-South', 100)

refresh table sales_summary

select * from sales_summary

SALES_PERSON REGION TOTAL_SALES--------------- --------------- -----------...LEE Ontario-North 8LEE Ontario-South 134LEE Quebec 26...

11 record(s) selected.

update sales set sales = 50 where sales_date = '06/28/2005' and sales_person = 'LEE' and region = 'Ontario-South'

refresh table sales_summary

select * from sales_summary

SALES_PERSON REGION TOTAL_SALES--------------- --------------- -----------...LEE Ontario-North 8LEE Ontario-South 84LEE Quebec 26...

11 record(s) selected.

delete from sales where sales_date = '06/28/2005' and sales_person = 'LEE' and region = 'Ontario-South'

refresh table sales_summary

select * from sales_summary

SALES_PERSON REGION TOTAL_SALES--------------- --------------- -----------...LEE Ontario-North 8LEE Ontario-South 34LEE Quebec 26...

11 record(s) selected.

connect reset

Page 6: DB2_An Introduction to Materialized Query Tables

01/02/2006 07:09 PMDB2 Basics: An introduction to materialized query tables

Page 6 of 9http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

Staging tablesYou can incrementally refresh a REFRESH DEFERRED MQT if it has a staging table associated with it. Thestaging table collects changes that need to be applied to synchronize the MQT with its underlying tables. You cancreate a staging table using the CREATE TABLE statement; then, when the underlying tables of the MQT aremodified, the changes are propagated and immediately appended to the staging table. The idea is to use thestaging table to incrementally refresh the MQT, rather than regenerate the MQT from scratch. Incrementalmaintenance provides significant performance improvement. The staging table is pruned when the refreshoperation is complete.

After it is created, a staging table is in a pending (inconsistent) state; it must be brought out of this state before itcan start collecting changes to its underlying tables. You can accomplish this by using the SET INTEGRITYstatement.

Listing 4 shows an example of using a staging table with a summary table. The summary table, which is namedEMP_SUMMARY, is based on the underlying table EMPLOYEE in the SAMPLE database. You'll recall that theDATA INITIALLY DEFERRED clause means that data will not be inserted into the table as part of the CREATETABLE statement. The REFRESH DEFERRED clause means that the data in the table can be refreshed at anytime using the REFRESH TABLE statement. The staging table, which is named EMP_SUMMARY_S, isassociated with the summary table EMP_SUMMARY. The PROPAGATE IMMEDIATE clause specifies that anychanges made to the underlying table as part of an insert, update, or delete operation are cascaded to the stagingtable. SET INTEGRITY statements are issued against both tables to take them out of their pending states.

Not unexpectedly, a query against the summary table at this point returns no data. The REFRESH TABLEstatement returns a warning, a reminder that the "integrity of non-incremental data remains unverified." This, too,is not unexpected. Another query against the summary table returns no data as well. However, after we insert anew row of data into the underlying EMPLOYEE table, a query against the staging table EMP_SUMMARY_Sreturns one row, corresponding to the data that was just inserted. The staging table has the same three columnsthat its underlying summary table has, plus two additional columns that are used by the system:GLOBALTRANSID (the global transaction ID for each propagated row) and GLOBALTRANSTIME (thetimestamp of the transaction). Another query against the summary table returns no data, but after the REFRESHTABLE statement executes this time, the query runs successfully.

Listing 4. Using a staging table with a summary table

connect to sample...

create table emp_summary as (select workdept, job, count(*) as count from employee group by workdept, job) data initially deferred refresh deferred

create table emp_summary_s for emp_summary propagate immediate

set integrity for emp_summary materialized query immediate unchecked

set integrity for emp_summary_s staging immediate unchecked

select * from emp_summary

WORKDEPT JOB COUNT-------- -------- -----------

0 record(s) selected.

Page 7: DB2_An Introduction to Materialized Query Tables

01/02/2006 07:09 PMDB2 Basics: An introduction to materialized query tables

Page 7 of 9http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

0 record(s) selected.

refresh table emp_summarySQL1594W Integrity of non-incremental data remains unverified by the database manager. SQLSTATE=01636

select * from emp_summary

WORKDEPT JOB COUNT-------- -------- -----------

0 record(s) selected.

insert into employee values ('006900', 'RUSS', 'L', 'DYERS', 'D44', '1234', '1960-05-05', 'FIELDREP', 5, 'M', '1940-04-02', 10000, 100, 1000)

select * from emp_summary_s

WORKDEPT JOB COUNT GLOBALTRANSID GLOBALTRANSTIME-------- -------- ----------- -------------------... -----------------------------...D44 FIELDREP 1 x'00000000000000CD' x'20050822201344536158000000'

1 record(s) selected.

select * from emp_summary

WORKDEPT JOB COUNT-------- -------- -----------

0 record(s) selected.

refresh table emp_summarySQL1594W Integrity of non-incremental data remains unverified by the databasemanager. SQLSTATE=01636

select * from emp_summary

WORKDEPT JOB COUNT-------- -------- -----------D44 FIELDREP 1

1 record(s) selected.

connect reset

SummaryThe SYSCAT.TABDEP system catalog view contains a row for every dependency that a materialized query tablehas on some other object. You can query this view to obtain a dependency summary for the MQTs that we havecreated (Listing 5). MQTs have a DTYPE value of 'S.' The TABNAME column lists the names of the MQTs, andthe BNAME column lists the names of the database objects on which the corresponding MQTs depend. TheBTYPE column identifies the object type: 'T' for table, 'I' for index, and 'F' for function instance.

Page 8: DB2_An Introduction to Materialized Query Tables

01/02/2006 07:09 PMDB2 Basics: An introduction to materialized query tables

Page 8 of 9http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

Listing 5. Querying the SYSCAT.TABDEP system catalog view to see MQT dependencies on otherdatabase objects

connect to sample...

select substr(tabname,1,24) as tabname, dtype, substr(bname,1,24) as bname, btype from syscat.tabdep where tabschema = 'MELNYK' and dtype = 'S'

TABNAME DTYPE BNAME BTYPE------------------------ ----- ------------------------ -----EMP S DEPARTMENT TEMP S EMPLOYEE TEMP S SQL050829104058970 IEMP S SQL050829104058800 IEMP_SUMMARY S EMPLOYEE TONTARIO_1995_SALES_TEAM S LEFT1 FONTARIO_1995_SALES_TEAM S SALES TONTARIO_1995_SALES_TEAM S EMPLOYEE TSALES_SUMMARY S SALES T

9 record(s) selected.

connect reset

We have seen that a materialized query table, whose definition is based upon the result of a query, can be thoughtof as a kind of materialized view. MQTs are important because they can significantly decrease the response timefor complex queries. This article has introduced you to the basic concepts around maintained by system MQTsand maintained by user MQTs, as well as summary tables and staging tables, and these concepts were illustratedby working examples that you can run yourself. To learn more about materialized query tables, or for moredetailed information about any of the topics covered in this article, see the DB2 Information Center.

ResourcesDB2 Universal Database for Linux, UNIX and Windows Support is the ideal place to locate resources suchas the Version 8.2 Information Center and PDF product manuals.

For the latest DB2 information online, including more detailed information about materialized query tables,visit the DB2 Information Center.

Learn about Demystifying table and table space states in DB2 UDB.

Refer to the IBM DB2 Universal Database SQL Reference, Volume 1 and IBM DB2 Universal DatabaseSQL Reference, Volume 2 for detailed SQL documentation.

About the author

Page 9: DB2_An Introduction to Materialized Query Tables

01/02/2006 07:09 PMDB2 Basics: An introduction to materialized query tables

Page 9 of 9http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509melnyk/

Roman B. Melnyk , Ph.D., is a senior member of the DB2 Information Development team, specializingin database administration, DB2 utilities, and SQL. During more than nine years at IBM, Roman haswritten numerous DB2 books, articles, and other related materials. Roman coauthored DB2 Version 8:The Official Guide (Prentice Hall Professional Technical Reference, 2003), DB2: The CompleteReference (Osborne/McGraw-Hill, 2001), DB2 Fundamentals Certification for Dummies (HungryMinds, 2001), and DB2 for Dummies (IDG Books, 2000).