c4.lab01 - db2 9.7 storage optimization lab
TRANSCRIPT
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
1/22
IBM DB2 9.7
DB2 StorageOptimization
I
Data Management Emerging Partnership and Technologies
IBM Canada Lab
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
2/22
2
Contents
1. OBJECTIVES OF THIS LAB.........................................................................3
2. DB2 ROW COMPRESSION ..........................................................................3
2.1 SUGGESTED READING .................................................................................3
3. DB2 ROW COMPRESSION BASICS AND SETUP......................................4
3.1 ENVIRONMENT SETUP REQUIREMENTS.........................................................43.2 CREATE ROW COMPRESSION ENABLED TABLE..............................................43.3 LOAD DATA INTO TABLE ..............................................................................9
3.4
ESTIMATE COMPRESSION SAVINGS ............................................................133.5 PERFORM AN OFFLINE REORG...................................................................13
3.6 COMPRESSING XMLDATA .........................................................................143.7 CLEAN UP ................................................................................................18
4. APPENDIX ..................................................................................................19
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
3/22
3
1. Objectives of This Lab
In this lab, you will learn the basics on how to enable compression for a table and collectvarious statistics.
Row Compression Compress data by utilizing the automatically created data dictionary. Estimate savings if an offline REORG was to be performed. Perform offline REORG and compare the actual results with the estimate. View compression statistics at various stages.
2. DB2 Row Compression
Every business model looks for the best way to minimize production costs in order tomaximize profits. One of the most prominent features introduced in DB2 is the ability toperform row compression. The goal of row compression is to reduce table storage.When using this feature, DB2 is able to shrink row sizes by building and utilizing acompression dictionary, which is used as an alphabet to compress the rows of data inthe database tables. Since more data can reside on a data page, fewer data pages areallocated per table. The net effect is a reduced rate of storage consumption which maylead to fewer disks needed when allocating storage requirements. Since disk storage isone of the most expensive components on any data server, this feature can significantly
reduce Total Cost of Ownership (TCO).
2.1 Suggested reading
Information Management Technical Library at developerWorks
Contains articles and tutorials.
http://www.ibm.com/developerworks/views/db2/library.jsp
An Expert's Guide to DB2 Technology
Great read and useful information.
http://blogs.ittoolbox.com/database/technology
DB2 9 Fundamentals Certification Study Guide (Author: Roger E. Sanders)
Learn the basics and get ready for certification
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
4/22
4
3. DB2 Row Compression Basics and Setup
In this portion of the lab, we will be taking a closer look at the Row Compression feature
available since DB2 9. We will see how the Automatic Dictionary Creation (ADC) featureavailable since DB2 9.5 works and utilize the estimate statistics in DB2 9.5. Finally youwill work with the newly added features for XML compression in DB2 9.7.
3.1 Environment Setup Requirements
To complete this lab you will need the following:
DB2 9.7 Bootcamp VMware image
VMware Player 2.x or VMware Workstation 5.x or later
For help on how to obtain these components please follow the instructions specified inVMware Basics and Introductionfrom module 1.
3.2 Create Row Compression Enabled Table
First login as the db2inst1 user and start the instance. From there you will be able toexecute a script to create the FEATURE1 database that is needed for this lab, create acompression-enabled table and load data into it.
1. Start the VMware image by clicking the button in VMware.
2. At the login prompt, login with the following credentials:
Username: db2inst1
Password: password
3. Read and accept the license agreement.
4. Start the graphical interface by issuing the following command at the prompt:
db2inst1@db2rules:~> startx
5. Open a terminal window as follows:
Right-click on the Desktop area and choose the Open Terminal item:
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
5/22
5
6. Start the DB2 Administration Server (DAS) as follows (Note: If you are continuingfrom the previous Range Partitioning exercise, the DAS is already started, youcan skip this step.). Enter the command su - dasusr1 in the terminal window to login as the
user dasusr1 who controls the DAS. Enter the command db2admin start to start the DAS. Make sure the DAS
is started successfully. (Note: If you notice a message that says The DB2Administration Server is already active, thats OK. Please continue to thenext step.
Enter the command exit to exit out of the user dasusr1
7. Start the Database Manager as follows: Enter the command su db2inst1 in the terminal window to login as the
user db2inst1, the db2inst1 database instance owning user. Enter the command db2start to start the database instance. Make sure the
instance is started successfully.
8. Now you are going to create a table that will be enabled for compression. You want
to create a table that is large enough to notice a difference when the data iscompressed. We have provided you with such data, and we have also included ascript that will create a database for this exercise.
Create a working database and a compression-enabled table. Take a look at the
setupCompressionLab.sqlscript.
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
6/22
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
7/22
7
b) Create the table PEOPLE
create table PEOPLE (EMP_ID integer not null,EMP_NAME varchar(50) not null,EMP_PHONE varchar(20),EMP_EMAIL varchar(50),EMP_DEPT integer not null,EMP_ADDRESS varchar(100),EMP_SALARY integer,EMP_BONUS integer,constraint salaryok
CHECK (EMP_SALARY >= 0),constraint bonusok
CHECK (EMP_BONUS >= 0),
DATE date,
primary key (EMP_ID)) COMPRESS YES;ALTER TABLE people APPEND ON;
Note: The COMPRESS YES clause in the create table statementenable data compression on table PEOPLE at table creation time.The table data will not be compressed if the COMPRESS attribute wasnot set.
If you do not enable COMPRESS YES at table creation time, you canalso enable (or disable) compression by using the COMPRESS YES (orCOMPRESS NO) clause in the ALTER TABLE statement at a later time
Create the table by executing the script as follows:
db2 tvf setupCompressionLab.sql
9. View the new table PEOPLE in your database FEATURE1 by typing the followingcommand:
db2 list tables for schema db2inst1
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
8/22
8
10. Since you have looked at the DDL used to create the PEOPLE table, you knowhow each column is defined and the constraints created.
However, to see the structure of the PEOPLE table, you can also type the followingcommand:
db2 describe select * from people
11. View the compression status of the PEOPLE table by typing the followingcommand:
db2 select name, compression from sysibm.systableswhere creator = DB2INST1
Note:The DB2INST1 in the command above must be typed in capital letters.
The compression status for table PEOPLE is R which means only RowCompression is enabled. See the Appendix section for the more information on thedifferent compression techniques available in DB2.
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
9/22
9
3.3 Load Data into Table
Now we are going to start to load data into the table PEOPLE gradually. We will beloading 3 data files one by one, and at the same time check compression results so thatwe can notice how compression savings increase gradually as more data is added to thetable. In this exercise we utilize the Automatic Dictionary Creation (ADC) feature in DB29.5. As you will see, we do not need to issue an offline REORG command to create thedata dictionary in DB2 9.5.
1. Load the first set of data, which is about 1.3MB in size, into the table PEOPLE byfollowing the below steps:
Type the following command into the terminal window:
db2 load from PEOPLE_1to10000.TBL of del insert into people
After the load utility has finished with the load. Check the summary to make surethe load operation has successfully loaded 10000 rows into the PEOPLE table.
Set integrity on the data that was just loaded and update statistical information tothe DB2 optimizer on the table PEOPLE by executing the commands below:
db2 set integrity for people immediate checkeddb2 runstats on table db2inst1.people
Make sure the above two commands finished successfully.
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
10/22
10
We have loaded 10000 rows into the table PEOPLE and we want to see whethera compression dictionary was automatically created. We have provided you witha script, which will query the system catalog tables with the following SQL queryfor you in order to prevent typing errors.
select AVGROWSIZE, PCTROWSCOMPRESSED,AVGROWCOMPRESSIONRATIO, AVGCOMPRESSEDROWSIZE,PCTPAGESSAVED from syscat.tables where tabname ='PEOPLE' and tabschema = 'DB2INST1';
Execute the script as follows:
db2 tvf view_compression_results.sql
Note:From the fresh statistics we can see that no rows were compressed, i.e.AVGCOMPRESSIONROWSIZE is 0 and that the AVGROWSIZE is 136. Thismeans we have not reached the Automatic Dictionary Creation (ADC) thresholdor the amount of data that is in the table equals the threshold. In the latter case,subsequently loaded rows will be compressed using the automatically createddictionary.
In this particular situation, we have not yet reached the threshold and nodictionary has been created yet. Remember that at this point you can still performan offline reorg and compress the records and see the compression results.What we want to see in this exercise is how the Automatic Dictionary Creationwill kick in and compress the incoming rows, which are loaded after the thresholdhas been reached.
Record the statistics observed in the First Data Load Compression StatisticsTable in the Appendix section of this lab.
2. Load the second set of data, which is again about 1.3MB in size, into the tablePEOPLE by following the below steps:
Type in the following command in the terminal window:
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
11/22
11
db2 load from PEOPLE_10001to20000.TBL of del insert intopeople
Check to see that the load utility has successfully loaded 10000 more rows into thePEOPLE table.
Set integrity on the data that was just loaded and update statistical information tothe DB2 optimizer on the table PEOPLE by executing the commands below:
db2 set integrity for people immediate checkeddb2 runstats on table db2inst1.people
Make sure the above two commands complete successfully.
Now that we have loaded another set of 10000 rows into the table, we want to seewhether any of them were compressed and if a dictionary was automaticallycreated. Execute the following command to see the compression results:
db2 tvf view_compression_results.sql
Note:After loading the second set of data into the PEOPLE table, we can seesome compression results. The AVGROWSIZE value (122) has decreased andwe have saved 10% of the pages used. Please keep in mind that thesecompression savings are based on the rows that were compressed and thoserows are only the ones which were loaded AFTER ADC has reached itsthreshold and created the dictionary. Therefore, the data which was loaded withsize up to the threshold value is still uncompressed (this is known as thesampling data). One needs to perform an offline reorg in order to compress ALLdata in the table. Lets load another 10000 rows in the table and see how thecompression results changes.
Record the statistics observed in the Second Data Load CompressionStatistics Table in the Appendix section of this lab.
3. Load the third set of data into the table PEOPLE by following the below steps:
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
12/22
12
Type in the following command in the terminal window:
db2 load from PEOPLE_20001to30000.TBL of del insert intopeople
Check to see that the load utility has successfully loaded 10000 more rows into thePEOPLE table.
Set integrity on the data that was just loaded and update statistical information tothe DB2 optimizer on the table PEOPLE by executing the commands below:
db2 set integrity for people immediate checkeddb2 runstats on table db2inst1.people
Make sure the above two commands complete successfully.
Now we have loaded another set of 10000 rows into the table, we want to see what
the current compression savings is. Type in the following command to look at thestatistics:
db2 tvf view_compression_results.sql
Note: The average size for the rows has decreased to 101 and we have now23% savings in pages. This shows us that each set of rows that get inserted intothe table respects the created dictionary and will be compressed when possible.
Record the statistics observed in the Third Data Load Compression StatisticsTable in the Appendix section of this lab.
With the exercise above, you saw that the Automatic Dictionary Creationfeaturebuilds a compression dictionary automatically and any rows are loaded following thedictionary creation are compressed. If you are interested, you can continue loading dataand monitor the change in the compression results. We have provided you with anadditional 10000 rows that can be loaded into the table in thePEOPLE_30001to40000.TBL file.
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
13/22
13
3.4 Estimate Compression Savings
Now we are going to estimate the compression savings if you were to issue an offlinereorg operation on the table PEOPLE to compress all the data in it.
We are going to use the new table user defined function (UDF) to perform this estimate.Execute the following command in your terminal window:
db2 select * fromtable(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('DB2INST1','PEOPLE','ESTIMATE')) as T
Note:You can see that the estimate UDF suggests that if we were to perform an offlinereorg on the table PEOPLE, we would achieve 55% of pages saved.
3.5 Perform an Offline Reorg
Perform an offline reorg on the table PEOPLE and compare the actual results with theestimate obtained in the previous step.
1. Issue the following command:
db2 reorg table people
2. Now update your statistics by running the following command:
db2 runstats on table db2inst1.people
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
14/22
14
3. Statistic should be updated and we can take a look at the actual compressionstatistics by executing the following:
db2 tvf view_compression_results.sql
Note: We have achieved 55% of pages saved by performing the offline reorg operation.Comparing this result with the estimate we received from the table UDF in the EstimateCompression Savings step shows the estimation was exact to the actual result. Alsocompare the average row size (when using compression) between the estimate obtainedusing the table UDF and the actual compression statistics.
Record the statistics observed in the Offline Reorg Compression StatisticsTable in the Appendix section of this lab.
3.6 Compressing XML data
At this point, you know that row compression allows you to make efficient use of yourstorage space by compressing your table data. One limitation in DB2 9.5 is that only therow data in a table object was eligible for compression. Since XML data is stored in aseparate location (XDA), XML documents could not benefit from row compressionunless they were inlined. Inlined XML documents are stored together with the table rowdata and not in the XDA area. To be inlined, the XML document must have a maximumof 32KB in size, since the whole document has to be stored in the same disk page aspart of the row.
DB2 9.7 extends the row compression mechanism so that any XML document in theXDA area is now eligible for compression as well. Besides that, no special setups areneeded since the XML compression is seamlessly integrated with the row compressionmechanism you know well. For example, our AD_COLUMN table is alreadyenabled for
XML compression since both Data and XML row compression are enabled/disabled bythe COMPRESS YES/NO option of the CREATE TABLE or ALTER TABLE statements.ADC is also available for XML data as well as the compression estimate functions.
The only thing you should be aware of is that there are 2 different compressiondictionaries being used by DB2. One for compressing the base table row data, and onefor compressing XML data located in the XDA area.
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
15/22
15
Follow the steps below to learn how you can use XML compression in DB2 9.7.
1. As mentioned before, XML compression is already enabled for table AD_COLUMNas the COMPRESS YES option was specified when creating the table.
2. Alter the AD_COLUMN table to add a new XML column called INFO.
Type in the following command in the terminal window:
db2 alter table db2inst1.people add column info xml
3. Increase the size of the log buffer in order to avoid any full log errors:
db2 update db cfg for feature1 using LOGFILSIZ 2500db2 terminatedb2stopdb2startdb2 connect to feature1
4. Now insert some XML data into table AD_COLUMN and update the statistics:
cd /home/db2inst1/Documents/LabScripts/Compressiondb2 tvf insertXMLData1.sql
5. You will use the ADMIN_GET_TAB_COMPRESS_INFO_V97 function to check thecurrent compression status, by specifying REPORT as the value for the 3 rdparameter.
Since we are interested specifically on the XML compression, we can filter theresults using the OBJECT_TYPE column. A value of DATA means the row hasinformation about table row data compression. A value of XML means the row
has information about XDA (XML Data Area) compression.
We have provided you with a script that uses this UDF to report the currentstatus of XML compression. You can see the query below:
db2 select * from table(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO_V97('DB2INST1','AD_COLUMN','REPORT')) as T WHERE OBJECT_TYPE='XML'
Execute the script:
db2 tvf report_XML_compression.sql
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
16/22
16
As showed in the figure above, the XML dictionary is NOT BUILT and obviouslythe percentage of pages saved is ZERO. This means that ADC did not kick in asthere is still not enough XML data for sampling a dictionary. As we mentionedbefore, the Data dictionary and the XML dictionary are independent of eachother, so DB2 will only generate a particular dictionary when there is enough datafor sampling. Thus at any point in time, you can have neither created, just one ofthem created, or both existing at the same time.
6. Insert more XML data into PEOPLE and update the statistics:
db2 tvf insertXMLData2.sql
7. Check again the XML compression status.
db2 tvf report_XML_Compression.sql
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
17/22
17
The dictionary has been created and the XML data is now compressed. You cansee the savings on the image above.
8. The same way you can recreate the Data dictionary, you can also recreate the XMLdictionary running the REORG command with the RESETDICTIONARY option. Thisis useful when your data patterns change along the time and the current dictionary isnot so effective anymore.
Run the following script to modify the XML data in your table:
db2 tvf modify_XML_data.sqldb2 runstats on table db2inst1.people
9. Run the REORG command. Notice you must also use the LONGLOBDATA option,since XML is considered large object, otherwise the XML dictionary will not berecreated.
Run the command below:
db2 reorg table db2inst1.people LONGLOBDATA RESETDICTIONARY
10. Check again the XML compression status. You will see a change in the numbers
since DB2 is compressing a different set of XML values.
db2 tvf report_XML_Compression.sql
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
18/22
18
11. Disconnect from the database.
db2 terminate
3.7 Clean Up
We have provided you with a script which can clean up your environment from thecompression exercise. You dont need to use it but it is available for you when you need
it.
/home/db2inst1/Documents/LabScripts/Compression/cleanupPeople.ddl
You can execute it as such:
su - root
cd /home/db2insy1/Documents/Compression
db2 tvf cleanupPeople.sqlEnter the password password when prompted.
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
19/22
19
4. Appendix
Catalog Statistics
AVGROWSIZE will be added to SYSCAT.TABLES and will show the averagephysical row length in a table, including both compressed anduncompressed rows. This will be of use in determining if the row countis approaching the maximum number of rows per page.
PCTPAGESSAVED will be added to SYSCAT.TABLES which will representan estimate of the percentage of pages saved by using compression.
PCTROWSCOMPRESSEDwill be added to SYSCAT.TABLES and SYSSTATand represents the percentage of the number of rows that arecompressed to the total number of rows in the table. This field will beused by the optimizer to help determine the decompression CPU cost.
AVGROWCOMPRESSIONRATIO will be added to SYSCAT.TABLES andSYSSTAT and indicates the average compression ratio for allcompressed rows. This will be used by the optimizer to help determinethe cost of expanding a compressed row.
AVGCOMPRESSEDROWSIZEwill be added to SYSCAT.TABLES and
SYSSTAT and contains the average physical (on-disk) length of allcompressed rows. This will be used by the optimizer to help determine
the cost of expanding a compressed row.
SYSIBM.SYSTABLES
Value Meaning
N No compression is set
V Only Value compression is set
R Only Row compression is set
B Both Value and Row compressionare set
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
20/22
20
Dictionary Build and Retention Behaviours
The following tables illustrate the dictionary build and retention behavioursdepending upon the value of the COMPRESSattribute in the table definition and
the specified reorg table directive RESETDICTIONARYorKEEPDICTIONARY.
The RESETDICTIONARYand KEEPDICTIONARYoptions apply to the REORGtable operation (in DB2 9 and DB2 9.5) and the LOAD REPLACE table operation(in DB2 9.5).
Using the RESETDICTIONARYoption:
Value of COMPRESSAttribute in Table
Definition
DictionaryExists
Result Outcome
YES YES Build new dictionary; rowscompressed
YES NO Build new dictionary; rowscompressed
NO YES Remove dictionary; all rowsuncompressed
NO NO No effect
Using the KEEPDICTIONARYoption:
Value of COMPRESSAttribute in TableDefinition
DictionaryExists
Result Outcome
YES YES Preserve dictionary; rowscompressed
YES NO ** Build dictionary; rowscompressed
NO YES Preserve dictionary; all rowsuncompressed
NO NO No effect
** Provided that the table has reached the Automatic Dictionary Creation (ADC)threshold, and that a sufficient volume of data exists in the table when it reachesthis threshold.
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
21/22
21
PEOPLE Table Compression Statistic Results
First Data Load:
Compression Status
AVGROWSIZE
PCTPAGESSAVED
PCTROWSCOMPRESSED
AVGROWCOMPRESSIONRATIO
AVGCOMPRESSEDROWSIZE
Second Data Load:
Compression Status
AVGROWSIZEPCTPAGESSAVED
PCTROWSCOMPRESSED
AVGROWCOMPRESSIONRATIO
AVGCOMPRESSEDROWSIZE
Third Data Load:
Compression Status
AVGROWSIZE
PCTPAGESSAVED
PCTROWSCOMPRESSEDAVGROWCOMPRESSIONRATIO
AVGCOMPRESSEDROWSIZE
Offline Reorg:
Compression Status
AVGROWSIZE
PCTPAGESSAVED
PCTROWSCOMPRESSED
AVGROWCOMPRESSIONRATIOAVGCOMPRESSEDROWSIZE
-
8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab
22/22