c4.lab01 - db2 9.7 storage optimization lab

Upload: pareja007

Post on 02-Jun-2018

233 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    1/22

    IBM DB2 9.7

    DB2 StorageOptimization

    I

    Data Management Emerging Partnership and Technologies

    IBM Canada Lab

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    2/22

    2

    Contents

    1. OBJECTIVES OF THIS LAB.........................................................................3

    2. DB2 ROW COMPRESSION ..........................................................................3

    2.1 SUGGESTED READING .................................................................................3

    3. DB2 ROW COMPRESSION BASICS AND SETUP......................................4

    3.1 ENVIRONMENT SETUP REQUIREMENTS.........................................................43.2 CREATE ROW COMPRESSION ENABLED TABLE..............................................43.3 LOAD DATA INTO TABLE ..............................................................................9

    3.4

    ESTIMATE COMPRESSION SAVINGS ............................................................133.5 PERFORM AN OFFLINE REORG...................................................................13

    3.6 COMPRESSING XMLDATA .........................................................................143.7 CLEAN UP ................................................................................................18

    4. APPENDIX ..................................................................................................19

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    3/22

    3

    1. Objectives of This Lab

    In this lab, you will learn the basics on how to enable compression for a table and collectvarious statistics.

    Row Compression Compress data by utilizing the automatically created data dictionary. Estimate savings if an offline REORG was to be performed. Perform offline REORG and compare the actual results with the estimate. View compression statistics at various stages.

    2. DB2 Row Compression

    Every business model looks for the best way to minimize production costs in order tomaximize profits. One of the most prominent features introduced in DB2 is the ability toperform row compression. The goal of row compression is to reduce table storage.When using this feature, DB2 is able to shrink row sizes by building and utilizing acompression dictionary, which is used as an alphabet to compress the rows of data inthe database tables. Since more data can reside on a data page, fewer data pages areallocated per table. The net effect is a reduced rate of storage consumption which maylead to fewer disks needed when allocating storage requirements. Since disk storage isone of the most expensive components on any data server, this feature can significantly

    reduce Total Cost of Ownership (TCO).

    2.1 Suggested reading

    Information Management Technical Library at developerWorks

    Contains articles and tutorials.

    http://www.ibm.com/developerworks/views/db2/library.jsp

    An Expert's Guide to DB2 Technology

    Great read and useful information.

    http://blogs.ittoolbox.com/database/technology

    DB2 9 Fundamentals Certification Study Guide (Author: Roger E. Sanders)

    Learn the basics and get ready for certification

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    4/22

    4

    3. DB2 Row Compression Basics and Setup

    In this portion of the lab, we will be taking a closer look at the Row Compression feature

    available since DB2 9. We will see how the Automatic Dictionary Creation (ADC) featureavailable since DB2 9.5 works and utilize the estimate statistics in DB2 9.5. Finally youwill work with the newly added features for XML compression in DB2 9.7.

    3.1 Environment Setup Requirements

    To complete this lab you will need the following:

    DB2 9.7 Bootcamp VMware image

    VMware Player 2.x or VMware Workstation 5.x or later

    For help on how to obtain these components please follow the instructions specified inVMware Basics and Introductionfrom module 1.

    3.2 Create Row Compression Enabled Table

    First login as the db2inst1 user and start the instance. From there you will be able toexecute a script to create the FEATURE1 database that is needed for this lab, create acompression-enabled table and load data into it.

    1. Start the VMware image by clicking the button in VMware.

    2. At the login prompt, login with the following credentials:

    Username: db2inst1

    Password: password

    3. Read and accept the license agreement.

    4. Start the graphical interface by issuing the following command at the prompt:

    db2inst1@db2rules:~> startx

    5. Open a terminal window as follows:

    Right-click on the Desktop area and choose the Open Terminal item:

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    5/22

    5

    6. Start the DB2 Administration Server (DAS) as follows (Note: If you are continuingfrom the previous Range Partitioning exercise, the DAS is already started, youcan skip this step.). Enter the command su - dasusr1 in the terminal window to login as the

    user dasusr1 who controls the DAS. Enter the command db2admin start to start the DAS. Make sure the DAS

    is started successfully. (Note: If you notice a message that says The DB2Administration Server is already active, thats OK. Please continue to thenext step.

    Enter the command exit to exit out of the user dasusr1

    7. Start the Database Manager as follows: Enter the command su db2inst1 in the terminal window to login as the

    user db2inst1, the db2inst1 database instance owning user. Enter the command db2start to start the database instance. Make sure the

    instance is started successfully.

    8. Now you are going to create a table that will be enabled for compression. You want

    to create a table that is large enough to notice a difference when the data iscompressed. We have provided you with such data, and we have also included ascript that will create a database for this exercise.

    Create a working database and a compression-enabled table. Take a look at the

    setupCompressionLab.sqlscript.

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    6/22

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    7/22

    7

    b) Create the table PEOPLE

    create table PEOPLE (EMP_ID integer not null,EMP_NAME varchar(50) not null,EMP_PHONE varchar(20),EMP_EMAIL varchar(50),EMP_DEPT integer not null,EMP_ADDRESS varchar(100),EMP_SALARY integer,EMP_BONUS integer,constraint salaryok

    CHECK (EMP_SALARY >= 0),constraint bonusok

    CHECK (EMP_BONUS >= 0),

    DATE date,

    primary key (EMP_ID)) COMPRESS YES;ALTER TABLE people APPEND ON;

    Note: The COMPRESS YES clause in the create table statementenable data compression on table PEOPLE at table creation time.The table data will not be compressed if the COMPRESS attribute wasnot set.

    If you do not enable COMPRESS YES at table creation time, you canalso enable (or disable) compression by using the COMPRESS YES (orCOMPRESS NO) clause in the ALTER TABLE statement at a later time

    Create the table by executing the script as follows:

    db2 tvf setupCompressionLab.sql

    9. View the new table PEOPLE in your database FEATURE1 by typing the followingcommand:

    db2 list tables for schema db2inst1

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    8/22

    8

    10. Since you have looked at the DDL used to create the PEOPLE table, you knowhow each column is defined and the constraints created.

    However, to see the structure of the PEOPLE table, you can also type the followingcommand:

    db2 describe select * from people

    11. View the compression status of the PEOPLE table by typing the followingcommand:

    db2 select name, compression from sysibm.systableswhere creator = DB2INST1

    Note:The DB2INST1 in the command above must be typed in capital letters.

    The compression status for table PEOPLE is R which means only RowCompression is enabled. See the Appendix section for the more information on thedifferent compression techniques available in DB2.

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    9/22

    9

    3.3 Load Data into Table

    Now we are going to start to load data into the table PEOPLE gradually. We will beloading 3 data files one by one, and at the same time check compression results so thatwe can notice how compression savings increase gradually as more data is added to thetable. In this exercise we utilize the Automatic Dictionary Creation (ADC) feature in DB29.5. As you will see, we do not need to issue an offline REORG command to create thedata dictionary in DB2 9.5.

    1. Load the first set of data, which is about 1.3MB in size, into the table PEOPLE byfollowing the below steps:

    Type the following command into the terminal window:

    db2 load from PEOPLE_1to10000.TBL of del insert into people

    After the load utility has finished with the load. Check the summary to make surethe load operation has successfully loaded 10000 rows into the PEOPLE table.

    Set integrity on the data that was just loaded and update statistical information tothe DB2 optimizer on the table PEOPLE by executing the commands below:

    db2 set integrity for people immediate checkeddb2 runstats on table db2inst1.people

    Make sure the above two commands finished successfully.

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    10/22

    10

    We have loaded 10000 rows into the table PEOPLE and we want to see whethera compression dictionary was automatically created. We have provided you witha script, which will query the system catalog tables with the following SQL queryfor you in order to prevent typing errors.

    select AVGROWSIZE, PCTROWSCOMPRESSED,AVGROWCOMPRESSIONRATIO, AVGCOMPRESSEDROWSIZE,PCTPAGESSAVED from syscat.tables where tabname ='PEOPLE' and tabschema = 'DB2INST1';

    Execute the script as follows:

    db2 tvf view_compression_results.sql

    Note:From the fresh statistics we can see that no rows were compressed, i.e.AVGCOMPRESSIONROWSIZE is 0 and that the AVGROWSIZE is 136. Thismeans we have not reached the Automatic Dictionary Creation (ADC) thresholdor the amount of data that is in the table equals the threshold. In the latter case,subsequently loaded rows will be compressed using the automatically createddictionary.

    In this particular situation, we have not yet reached the threshold and nodictionary has been created yet. Remember that at this point you can still performan offline reorg and compress the records and see the compression results.What we want to see in this exercise is how the Automatic Dictionary Creationwill kick in and compress the incoming rows, which are loaded after the thresholdhas been reached.

    Record the statistics observed in the First Data Load Compression StatisticsTable in the Appendix section of this lab.

    2. Load the second set of data, which is again about 1.3MB in size, into the tablePEOPLE by following the below steps:

    Type in the following command in the terminal window:

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    11/22

    11

    db2 load from PEOPLE_10001to20000.TBL of del insert intopeople

    Check to see that the load utility has successfully loaded 10000 more rows into thePEOPLE table.

    Set integrity on the data that was just loaded and update statistical information tothe DB2 optimizer on the table PEOPLE by executing the commands below:

    db2 set integrity for people immediate checkeddb2 runstats on table db2inst1.people

    Make sure the above two commands complete successfully.

    Now that we have loaded another set of 10000 rows into the table, we want to seewhether any of them were compressed and if a dictionary was automaticallycreated. Execute the following command to see the compression results:

    db2 tvf view_compression_results.sql

    Note:After loading the second set of data into the PEOPLE table, we can seesome compression results. The AVGROWSIZE value (122) has decreased andwe have saved 10% of the pages used. Please keep in mind that thesecompression savings are based on the rows that were compressed and thoserows are only the ones which were loaded AFTER ADC has reached itsthreshold and created the dictionary. Therefore, the data which was loaded withsize up to the threshold value is still uncompressed (this is known as thesampling data). One needs to perform an offline reorg in order to compress ALLdata in the table. Lets load another 10000 rows in the table and see how thecompression results changes.

    Record the statistics observed in the Second Data Load CompressionStatistics Table in the Appendix section of this lab.

    3. Load the third set of data into the table PEOPLE by following the below steps:

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    12/22

    12

    Type in the following command in the terminal window:

    db2 load from PEOPLE_20001to30000.TBL of del insert intopeople

    Check to see that the load utility has successfully loaded 10000 more rows into thePEOPLE table.

    Set integrity on the data that was just loaded and update statistical information tothe DB2 optimizer on the table PEOPLE by executing the commands below:

    db2 set integrity for people immediate checkeddb2 runstats on table db2inst1.people

    Make sure the above two commands complete successfully.

    Now we have loaded another set of 10000 rows into the table, we want to see what

    the current compression savings is. Type in the following command to look at thestatistics:

    db2 tvf view_compression_results.sql

    Note: The average size for the rows has decreased to 101 and we have now23% savings in pages. This shows us that each set of rows that get inserted intothe table respects the created dictionary and will be compressed when possible.

    Record the statistics observed in the Third Data Load Compression StatisticsTable in the Appendix section of this lab.

    With the exercise above, you saw that the Automatic Dictionary Creationfeaturebuilds a compression dictionary automatically and any rows are loaded following thedictionary creation are compressed. If you are interested, you can continue loading dataand monitor the change in the compression results. We have provided you with anadditional 10000 rows that can be loaded into the table in thePEOPLE_30001to40000.TBL file.

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    13/22

    13

    3.4 Estimate Compression Savings

    Now we are going to estimate the compression savings if you were to issue an offlinereorg operation on the table PEOPLE to compress all the data in it.

    We are going to use the new table user defined function (UDF) to perform this estimate.Execute the following command in your terminal window:

    db2 select * fromtable(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('DB2INST1','PEOPLE','ESTIMATE')) as T

    Note:You can see that the estimate UDF suggests that if we were to perform an offlinereorg on the table PEOPLE, we would achieve 55% of pages saved.

    3.5 Perform an Offline Reorg

    Perform an offline reorg on the table PEOPLE and compare the actual results with theestimate obtained in the previous step.

    1. Issue the following command:

    db2 reorg table people

    2. Now update your statistics by running the following command:

    db2 runstats on table db2inst1.people

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    14/22

    14

    3. Statistic should be updated and we can take a look at the actual compressionstatistics by executing the following:

    db2 tvf view_compression_results.sql

    Note: We have achieved 55% of pages saved by performing the offline reorg operation.Comparing this result with the estimate we received from the table UDF in the EstimateCompression Savings step shows the estimation was exact to the actual result. Alsocompare the average row size (when using compression) between the estimate obtainedusing the table UDF and the actual compression statistics.

    Record the statistics observed in the Offline Reorg Compression StatisticsTable in the Appendix section of this lab.

    3.6 Compressing XML data

    At this point, you know that row compression allows you to make efficient use of yourstorage space by compressing your table data. One limitation in DB2 9.5 is that only therow data in a table object was eligible for compression. Since XML data is stored in aseparate location (XDA), XML documents could not benefit from row compressionunless they were inlined. Inlined XML documents are stored together with the table rowdata and not in the XDA area. To be inlined, the XML document must have a maximumof 32KB in size, since the whole document has to be stored in the same disk page aspart of the row.

    DB2 9.7 extends the row compression mechanism so that any XML document in theXDA area is now eligible for compression as well. Besides that, no special setups areneeded since the XML compression is seamlessly integrated with the row compressionmechanism you know well. For example, our AD_COLUMN table is alreadyenabled for

    XML compression since both Data and XML row compression are enabled/disabled bythe COMPRESS YES/NO option of the CREATE TABLE or ALTER TABLE statements.ADC is also available for XML data as well as the compression estimate functions.

    The only thing you should be aware of is that there are 2 different compressiondictionaries being used by DB2. One for compressing the base table row data, and onefor compressing XML data located in the XDA area.

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    15/22

    15

    Follow the steps below to learn how you can use XML compression in DB2 9.7.

    1. As mentioned before, XML compression is already enabled for table AD_COLUMNas the COMPRESS YES option was specified when creating the table.

    2. Alter the AD_COLUMN table to add a new XML column called INFO.

    Type in the following command in the terminal window:

    db2 alter table db2inst1.people add column info xml

    3. Increase the size of the log buffer in order to avoid any full log errors:

    db2 update db cfg for feature1 using LOGFILSIZ 2500db2 terminatedb2stopdb2startdb2 connect to feature1

    4. Now insert some XML data into table AD_COLUMN and update the statistics:

    cd /home/db2inst1/Documents/LabScripts/Compressiondb2 tvf insertXMLData1.sql

    5. You will use the ADMIN_GET_TAB_COMPRESS_INFO_V97 function to check thecurrent compression status, by specifying REPORT as the value for the 3 rdparameter.

    Since we are interested specifically on the XML compression, we can filter theresults using the OBJECT_TYPE column. A value of DATA means the row hasinformation about table row data compression. A value of XML means the row

    has information about XDA (XML Data Area) compression.

    We have provided you with a script that uses this UDF to report the currentstatus of XML compression. You can see the query below:

    db2 select * from table(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO_V97('DB2INST1','AD_COLUMN','REPORT')) as T WHERE OBJECT_TYPE='XML'

    Execute the script:

    db2 tvf report_XML_compression.sql

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    16/22

    16

    As showed in the figure above, the XML dictionary is NOT BUILT and obviouslythe percentage of pages saved is ZERO. This means that ADC did not kick in asthere is still not enough XML data for sampling a dictionary. As we mentionedbefore, the Data dictionary and the XML dictionary are independent of eachother, so DB2 will only generate a particular dictionary when there is enough datafor sampling. Thus at any point in time, you can have neither created, just one ofthem created, or both existing at the same time.

    6. Insert more XML data into PEOPLE and update the statistics:

    db2 tvf insertXMLData2.sql

    7. Check again the XML compression status.

    db2 tvf report_XML_Compression.sql

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    17/22

    17

    The dictionary has been created and the XML data is now compressed. You cansee the savings on the image above.

    8. The same way you can recreate the Data dictionary, you can also recreate the XMLdictionary running the REORG command with the RESETDICTIONARY option. Thisis useful when your data patterns change along the time and the current dictionary isnot so effective anymore.

    Run the following script to modify the XML data in your table:

    db2 tvf modify_XML_data.sqldb2 runstats on table db2inst1.people

    9. Run the REORG command. Notice you must also use the LONGLOBDATA option,since XML is considered large object, otherwise the XML dictionary will not berecreated.

    Run the command below:

    db2 reorg table db2inst1.people LONGLOBDATA RESETDICTIONARY

    10. Check again the XML compression status. You will see a change in the numbers

    since DB2 is compressing a different set of XML values.

    db2 tvf report_XML_Compression.sql

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    18/22

    18

    11. Disconnect from the database.

    db2 terminate

    3.7 Clean Up

    We have provided you with a script which can clean up your environment from thecompression exercise. You dont need to use it but it is available for you when you need

    it.

    /home/db2inst1/Documents/LabScripts/Compression/cleanupPeople.ddl

    You can execute it as such:

    su - root

    cd /home/db2insy1/Documents/Compression

    db2 tvf cleanupPeople.sqlEnter the password password when prompted.

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    19/22

    19

    4. Appendix

    Catalog Statistics

    AVGROWSIZE will be added to SYSCAT.TABLES and will show the averagephysical row length in a table, including both compressed anduncompressed rows. This will be of use in determining if the row countis approaching the maximum number of rows per page.

    PCTPAGESSAVED will be added to SYSCAT.TABLES which will representan estimate of the percentage of pages saved by using compression.

    PCTROWSCOMPRESSEDwill be added to SYSCAT.TABLES and SYSSTATand represents the percentage of the number of rows that arecompressed to the total number of rows in the table. This field will beused by the optimizer to help determine the decompression CPU cost.

    AVGROWCOMPRESSIONRATIO will be added to SYSCAT.TABLES andSYSSTAT and indicates the average compression ratio for allcompressed rows. This will be used by the optimizer to help determinethe cost of expanding a compressed row.

    AVGCOMPRESSEDROWSIZEwill be added to SYSCAT.TABLES and

    SYSSTAT and contains the average physical (on-disk) length of allcompressed rows. This will be used by the optimizer to help determine

    the cost of expanding a compressed row.

    SYSIBM.SYSTABLES

    Value Meaning

    N No compression is set

    V Only Value compression is set

    R Only Row compression is set

    B Both Value and Row compressionare set

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    20/22

    20

    Dictionary Build and Retention Behaviours

    The following tables illustrate the dictionary build and retention behavioursdepending upon the value of the COMPRESSattribute in the table definition and

    the specified reorg table directive RESETDICTIONARYorKEEPDICTIONARY.

    The RESETDICTIONARYand KEEPDICTIONARYoptions apply to the REORGtable operation (in DB2 9 and DB2 9.5) and the LOAD REPLACE table operation(in DB2 9.5).

    Using the RESETDICTIONARYoption:

    Value of COMPRESSAttribute in Table

    Definition

    DictionaryExists

    Result Outcome

    YES YES Build new dictionary; rowscompressed

    YES NO Build new dictionary; rowscompressed

    NO YES Remove dictionary; all rowsuncompressed

    NO NO No effect

    Using the KEEPDICTIONARYoption:

    Value of COMPRESSAttribute in TableDefinition

    DictionaryExists

    Result Outcome

    YES YES Preserve dictionary; rowscompressed

    YES NO ** Build dictionary; rowscompressed

    NO YES Preserve dictionary; all rowsuncompressed

    NO NO No effect

    ** Provided that the table has reached the Automatic Dictionary Creation (ADC)threshold, and that a sufficient volume of data exists in the table when it reachesthis threshold.

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    21/22

    21

    PEOPLE Table Compression Statistic Results

    First Data Load:

    Compression Status

    AVGROWSIZE

    PCTPAGESSAVED

    PCTROWSCOMPRESSED

    AVGROWCOMPRESSIONRATIO

    AVGCOMPRESSEDROWSIZE

    Second Data Load:

    Compression Status

    AVGROWSIZEPCTPAGESSAVED

    PCTROWSCOMPRESSED

    AVGROWCOMPRESSIONRATIO

    AVGCOMPRESSEDROWSIZE

    Third Data Load:

    Compression Status

    AVGROWSIZE

    PCTPAGESSAVED

    PCTROWSCOMPRESSEDAVGROWCOMPRESSIONRATIO

    AVGCOMPRESSEDROWSIZE

    Offline Reorg:

    Compression Status

    AVGROWSIZE

    PCTPAGESSAVED

    PCTROWSCOMPRESSED

    AVGROWCOMPRESSIONRATIOAVGCOMPRESSEDROWSIZE

  • 8/10/2019 C4.LAB01 - DB2 9.7 Storage Optimization Lab

    22/22