
Page 1: Exadata HCC – Case Study - Data Warehouse (PDF, 4/30/2011)

Exadata HCC – Case Study

- By Diwakar Kasibhotla

Sr. Data Architect

Page 2:

About Diwakar Kasibhotla

• Works as Sr. Data Architect for OppenheimerFunds.

• 15 years of experience in data warehouse implementations, in roles spanning data architecture, data modeling, performance tuning, and ETL/BI architecture.

• Currently working as lead architect for Exadata migrations at OppenheimerFunds.

• Author of the blog http://dwarehouse.wordpress.com

Page 3:

About OppenheimerFunds

• OppenheimerFunds is one of the nation’s largest and most respected asset management companies.

• More than $192 billion in assets under management (As of April 30, 2011)

Page 4:

Agenda

• Objective

• Case Study Details

• Options for Exadata HCC

• HCC Limitations

• Using Compression Advisor

• Migrate existing tables/partitions/subpartitions to HCC

• Real life compression ratio

• Query Performance after using HCC

• Load Performance after using HCC

• Effect of sorting on HCC ratio

• Summary

Page 5:

Objective

• Evaluate new Exadata features such as HCC to save disk space.

• Compare Load and Query performance before and after HCC with different options on Production data.

Page 6:

Case Study details

• Exadata specs:

– ¼ rack, single-instance database (size 1 TB), SATA drives

• Table specs:

– Name: COMP_TEST (copy of a production table)

– Partitions: daily partitions (9 partitions)

– Subpartitions: 10 subpartitions for each partition (90 subpartitions)

– No indexes

– Row count = 67,458,438 (production data)
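For illustration, a table matching the specs above might be declared as in this sketch. The column names, partition bounds, and subpartition key are hypothetical; only the structure (daily range partitions with 10 subpartitions each, no indexes) follows the slide:

```sql
-- Hypothetical sketch of the COMP_TEST layout described above.
-- Column names, partition bounds, and the subpartition key are invented.
CREATE TABLE comp_test (
  fu_key    NUMBER,
  fi_key    NUMBER,
  period_1  NUMBER,
  load_date DATE
)
PARTITION BY RANGE (load_date)
SUBPARTITION BY HASH (fu_key) SUBPARTITIONS 10
(
  PARTITION p20101219 VALUES LESS THAN (DATE '2010-12-20'),
  PARTITION p20101220 VALUES LESS THAN (DATE '2010-12-21')
  -- ... one partition per day, 9 in total
);
```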

Page 7:

Exadata HCC - Options

Warehouse Compression – optimized for query performance

– COMPRESS FOR QUERY HIGH – high compression ratio without affecting query times, but with some load-performance hit

– COMPRESS FOR QUERY LOW – lower compression ratio without affecting load times

Archive Compression – optimized for space

– COMPRESS FOR ARCHIVE HIGH – very high compression ratio with slower query times

– COMPRESS FOR ARCHIVE LOW – high compression ratio with slower query response times
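Each of the four options is specified as a storage clause in the DDL, at table, partition, or subpartition level. A minimal sketch (the table and column names here are hypothetical):

```sql
-- HCC level set for the whole table (names are hypothetical).
CREATE TABLE sales_fact_qh (
  fu_key NUMBER,
  fi_key NUMBER
)
COMPRESS FOR QUERY HIGH;

-- The same clause can also be attached to an individual partition, e.g.:
--   PARTITION p20101219 VALUES LESS THAN (...) COMPRESS FOR ARCHIVE LOW
```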

Page 8:

HCC Limitations

• To maximize storage savings with Exadata Hybrid Columnar Compression, data must be loaded using data warehouse bulk loading techniques.

– Insert statements with the APPEND hint

– Parallel DML

– Direct-path SQL*Loader

– Create Table as Select (CTAS)

• Before using HCC, make sure the ETL process uses bulk-loading techniques.
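As a sketch of the first two techniques above (the stage table name is hypothetical; conventional single-row inserts into an HCC table would leave the new rows uncompressed):

```sql
-- Direct-path insert: rows are HCC-compressed as they are written.
INSERT /*+ APPEND */ INTO comp_test
SELECT * FROM comp_test_stage;  -- comp_test_stage is a hypothetical source
COMMIT;

-- Parallel DML must be enabled explicitly in the session:
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(comp_test, 8) */ INTO comp_test
SELECT * FROM comp_test_stage;
COMMIT;
```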

Page 9:

Compression Advisor

• The DBMS_COMPRESSION package provides an interface to facilitate choosing the correct compression level for an application.

• The package takes a table as input and estimates the compression ratio and the resulting size in blocks if the table were compressed.

• We used the DBMS_COMPRESSION.GET_COMPRESSION_RATIO procedure to estimate the compression ratio.

• The GET_COMPRESSION_RATIO procedure creates two copies of your table (compressed and uncompressed) and calculates the ratio.

• To use this procedure there must be at least 1,000,000 rows in your table.

Page 10:

Using Compression Advisor

SET SERVEROUTPUT ON

DECLARE
  blkcnt_comp  PLS_INTEGER;
  blkcnt_uncmp PLS_INTEGER;
  row_comp     PLS_INTEGER;
  row_uncm     PLS_INTEGER;
  comp_ratio   NUMBER;        -- the ratio is returned as a NUMBER
  comp_type    VARCHAR2(30);
BEGIN
  dbms_compression.get_compression_ratio('DW', 'DW', 'COMP_TEST', NULL,
    dbms_compression.COMP_FOR_QUERY_HIGH,
    blkcnt_comp, blkcnt_uncmp, row_comp, row_uncm, comp_ratio, comp_type);

  dbms_output.put_line('Block Count Compressed: ' || TO_CHAR(blkcnt_comp));
  dbms_output.put_line('Block Count UnCompressed: ' || TO_CHAR(blkcnt_uncmp));
  dbms_output.put_line('Row Count Per Block Compressed: ' || TO_CHAR(row_comp));
  dbms_output.put_line('Row Count Per Block UnCompressed: ' || TO_CHAR(row_uncm));
  dbms_output.put_line('Compression Ratio: ' || TO_CHAR(comp_ratio));
  dbms_output.put_line('Compression Type: ' || comp_type);
END;
/

Possible Values

dbms_compression.COMP_FOR_QUERY_HIGH

dbms_compression.COMP_FOR_QUERY_LOW

dbms_compression.COMP_FOR_ARCHIVE_HIGH

dbms_compression.COMP_FOR_ARCHIVE_LOW

Page 11:

Compression Advisor - Output

SQL> /

Compression Advisor self-check validation successful.

select count(*) on both Uncompressed and EHCC Compressed format = 1000001 rows

Block Count Compressed: 2661
Block Count UnCompressed: 8481
Row Count Per Block Compressed: 376
Row Count Per Block UnCompressed: 117
Compression Ratio: 3
Compression Type: "Compress For Query High"

Page 12:

Migrate table data to HCC

• We can use two methods to migrate uncompressed data to HCC:

– Using Data Pump

• Step 1: Create the table structure to be compressed on the Exadata database with the HCC option, using the impdp sqlfile option.

• Step 2: Load the data using Data Pump; the data is compressed during the direct-path load.

– Using the ALTER TABLE MOVE option after the data is migrated to Exadata

• We can move existing uncompressed partitions/subpartitions to HCC by using Alter table statement

SQL> alter table COMP_TEST move subpartition P20101219_ACL compress for query high;

Note: You should drop or disable all the indexes before the move and create or enable them after the move.
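As an alternative to dropping and recreating, the index maintenance in the note can be done with UNUSABLE/REBUILD. A sketch, assuming a local index (the index name here is hypothetical):

```sql
-- Moving a segment invalidates its indexes. For a local index, only
-- the moved subpartition needs a rebuild (index name is hypothetical).
ALTER TABLE comp_test
  MOVE SUBPARTITION p20101219_acl
  COMPRESS FOR QUERY HIGH;

ALTER INDEX comp_test_ix1 REBUILD SUBPARTITION p20101219_acl;
```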

Page 13:

Real life compression ratio

Table size without HCC

SQL> select round((sum(bytes)/1024/1024/1024),2) as size_gig
       from user_segments
      where segment_name = 'COMP_TEST';

  SIZE_GIG
----------
      7.23

Table size after HCC – Compress for Query High

SQL> select round((sum(bytes)/1024/1024/1024),2) as size_gig
       from user_segments
      where segment_name = 'COMP_TEST';

  SIZE_GIG
----------
      1.76

Table size after HCC – Compress for Query Low

SQL> select round((sum(bytes)/1024/1024/1024),2) as size_gig
       from user_segments
      where segment_name = 'COMP_TEST';

  SIZE_GIG
----------
      2.68

Table size after HCC – Compress for Archive High

SQL> select round((sum(bytes)/1024/1024/1024),2) as size_gig
       from user_segments
      where segment_name = 'COMP_TEST';

  SIZE_GIG
----------
      1.35

Table size after HCC – Compress for Archive Low

SQL> select round((sum(bytes)/1024/1024/1024),2) as size_gig
       from user_segments
      where segment_name = 'COMP_TEST';

  SIZE_GIG
----------
      1.65

Page 14:

Real life compression ratio - Summary

Compression Type   Advisor Compression   Actual Compression
QUERY HIGH                 3                   4.1
QUERY LOW                  2                   2.69
ARCHIVE HIGH               4                   5.35
ARCHIVE LOW                3                   4.38

NOTE: Compressing an existing table with ARCHIVE HIGH took twice the amount of time.

Page 15:

Query Performance on HCC tables

Query used to check performance against uncompressed tables.

SQL> select fu_key, fi_key, sum(period_1)
       from comp_test
      group by fu_key, fi_key;

92633 rows selected.

Elapsed: 00:00:26.07

Page 16:

Query Performance on HCC tables

Compression Type   Timing in Seconds   Timing in Seconds
                       without HCC            with HCC
QUERY HIGH                26.07               12.04
QUERY LOW                 26.07               11.12
ARCHIVE HIGH              26.07               13.71
ARCHIVE LOW               26.07               12.03

Page 17:

Changes in ETL Process

• If you are using partition exchange to load data into your fact tables, then you need to make the following changes to ensure that your new partitions use HCC:

– Alter your stage table to use HCC.

– Load data into stage table using bulk load – no change

– If you were not using bulk load then change your ETL to use bulk loading.

– Partition exchange stage table with your target fact table – no change in ETL
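The steps above can be sketched as follows (the stage table, fact table, and partition names here are hypothetical):

```sql
-- 1. Alter the stage table to use HCC, so exchanged data arrives compressed.
ALTER TABLE stage_fact COMPRESS FOR QUERY HIGH;

-- 2. Bulk (direct-path) load into the stage table.
INSERT /*+ APPEND */ INTO stage_fact
SELECT * FROM daily_feed;  -- hypothetical source
COMMIT;

-- 3. Swap the loaded stage segment into the target fact table.
ALTER TABLE fact_table
  EXCHANGE PARTITION p20101219 WITH TABLE stage_fact
  INCLUDING INDEXES WITHOUT VALIDATION;
```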

Page 18:

Load Performance on HCC table

• ETL using Bulk Loading on HCC tables

• Timings include loading Dimensions (uncompressed) and fact tables.

Compression Type   Timing in MM:SS   Timing in MM:SS
                       without HCC          with HCC
QUERY HIGH                23:49             19:02
QUERY LOW                 23:49             22:32
ARCHIVE HIGH              23:49             41:06
ARCHIVE LOW               23:49             20:11

Page 19:

Effect of Sorting on HCC

• It was suggested that loading data into HCC tables after Sorting would increase the compression ratio resulting in more space savings.

• Data in COMP_TEST table was loaded (bulk load) by sorting (order by clause) the data on each of the dimension columns.

• Different sort columns were used:

– Order by FU_KEY,FI_KEY,IS_PROD_KEY,SA_PROD_KEY, SE_PROD_KEY, SUB_PROD_KEY, AT_KEY, BT_KEY

– Order by FU_KEY,FI_KEY,IS_PROD_KEY,SA_PROD_KEY, SE_PROD_KEY, SUB_PROD_KEY, AT_KEY

– Order by FU_KEY,FI_KEY,IS_PROD_KEY,SA_PROD_KEY, SE_PROD_KEY, SUB_PROD_KEY

– Order by FU_KEY,FI_KEY,IS_PROD_KEY,SA_PROD_KEY, SE_PROD_KEY

– Order by FU_KEY,FI_KEY,IS_PROD_KEY,SA_PROD_KEY

– Order by FU_KEY,FI_KEY,IS_PROD_KEY

– Order by FU_KEY,FI_KEY
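One way to perform the sorted bulk loads listed above is a direct-path CTAS with an ORDER BY clause. A sketch, assuming a hypothetical source table name:

```sql
-- Direct-path CTAS keeps the load eligible for HCC, while the ORDER BY
-- clusters similar values together, which in principle can help columnar
-- compression. The source table name is hypothetical.
CREATE TABLE comp_test_sorted
  COMPRESS FOR QUERY HIGH
AS
SELECT *
  FROM comp_test_src
 ORDER BY fu_key, fi_key, is_prod_key, sa_prod_key;
```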

Page 20:

Effect of Sorting on HCC

• There was no significant change in compression ratio by using sorting and loading.

• There was no significant change in query performance by using sorting and loading.

• There was an increase in ETL load times because of sort and load operation.

Page 21:

Conclusion

• Based on the analysis, the "COMPRESS FOR QUERY HIGH" option works best for us, as it gives a good compression ratio with better query and load times than the uncompressed table.

• For archived tables with more than a billion records we also chose "COMPRESS FOR QUERY HIGH", because the difference in compression ratio was not significant and "COMPRESS FOR ARCHIVE HIGH" took almost 10 times longer to compress.

Page 22:

Questions