ids storage optimization - colorado informix...
TRANSCRIPT
IDS Storage OptimizationDeep Dive: Compression, Repack and Shrink
CO Informix Users Group March 4, 2015
The Mark Scranton Group March 4, 2015
CO Informix Users Group Update
2
Website: coiug.weebly.com
The Mark Scranton Group March 4, 2015
Introduction
3
THANKS to:
• Mike Walker • George Wilson • BLM for hosting. • Email list being merged with IIUG list. • Recent list ~40. Mike owns new list with millions of Denver-based faithful Informix users.
• Now for the really important question …
What does Informix stand for?
The Mark Scranton Group March 4, 2015
Introduction
• This presentation will:
• present a brief overview of IDS Storage Optimization
• share recent benchmarks/proofs data from a recent client.
5
The Mark Scranton Group March 4, 2015
History
6
• First came … • small engines/boxes - can’t store data very long. • disk expensive, processing power slow. • XPS - cluster many boxes together to allow more
storage, memory and cpus as one engine. • IDS 7 was ready with PDQ/fragmentation … but the
hardware wasn’t.
Cluster of iPods running XPS, circa 200x
I’m very fast!!
The Mark Scranton Group March 4, 2015
History
• Then came … • more disk, more memory, more cpu power. • data warehousing … store lots of data and run really
long reports. • XPS - reduction of nodes while increasing data
retention. • IDS - handling DW environments better with increases
in horsepower.
7
The Mark Scranton Group March 4, 2015
History
8
• IDS starts inheriting XPS features for larger environments. • IDS can handle what used to be XPS/DW-only
environments. • Now … disk is significantly cheaper.
• buy a lot, store data forever. • “I need analytics”, or “big data” • I/O still major factor in overall performance. • More disk == more data == more I/O. • Thus the need for “storage optimization.”
The Mark Scranton Group March 4, 2015
IDS Storage Optimization Overview
compress repack shrink
• typically work together, but are independent. • if compression is not licensed, repack and shrink can still have a dramatic impact on disk usage. (more on licensing later…)
9
The Mark Scranton Group March 4, 2015
IDS Storage Optimization Overview
10
compress repack shrink
Compress rows reducing the amount of required disk space.
Consolidates free space in partitions.
Returns free space to the dbspace.
+ +
License required. Included in IDS 11.5.xc4, XPS 8.51.xC3 or greater.
Table Index Simple Blobs (Text/Blob)
IDS 11.5.xC4 X
IDS 12.1 X X X
XPS 8.51.xC3 X
The Mark Scranton Group March 4, 2015
IDS Storage Optimization Overview
• Administration Methods:
• SQL / sysadmin API / Scheduler
• OAT - Open Admin Tool - very robust in latest version.
• Server Studio
11
The Mark Scranton Group March 4, 2015
IDS Storage Optimization Overview
• What Can’t Be Compressed!
• Data stored in Blobspace Blobs
• System catalog tables
• Temp tables
• Dictionary tables
• Tables in sysuser, sysmaster, sysutils, syscdr, syscdcv1
12
The Mark Scranton Group March 4, 2015
Compression Overview
• Options: • all compression operations are available “online” • users still have access to partition(s). • “offline” version typically not faster.
• can choose the candidates for compression: • in-row data, simple large objects, or both.
• a customized dictionary of tokens is built for each partition (more details later).
• all partitions with more than 2000 rows can be compressed.
13
The Mark Scranton Group March 4, 2015
Compression Overview
• Index Compression!
• any existing detached B-tree partition may be compressed, repacked, and shrunk while the index remains available to queries.
• available with IDS 12.
• cannot uncompress a compressed index - must drop and rebuild.
• leaf level only compressed.
• 2000 unique keys is needed for a compression dictionary to be created.
14
The Mark Scranton Group March 4, 2015
Compression Overview
• Auto Compression!
• Create the dictionary so compression takes place when there are 2000 rows minimum.
• Rows compressed as data loaded, as well as any new rows/extents.
• Not available for indexes or blobs yet.
15
The Mark Scranton Group March 4, 2015
Compression Overview
• Auto Compression!
• Even if there are not enough rows, return message will indicate “Auto compression is set.”
• compression task - run all the time to always have estimates
• can be implemented with no downtime - leave old data alone, compress anything new coming in.
• automatic compression for newly created structures
16
The Mark Scranton Group March 4, 2015
Compression Algorithm
• Lempel-Ziv (LZ) based algorithm – static dictionary, built by random sampling.
• Frequently repeating patterns replaced with 12-bit symbol numbers.
• Dictionary tries to capture the “best” patterns (frequency x length).
• Any byte that does not match a pattern is also replaced with a 12- bit reserved symbol number.
• Patterns can be up to 15 bytes long. ■ 12-bits means 4,096 symbols:
• 256 reserved symbols for bytes that match no pattern. –3,840 pattern symbols.
17
The Mark Scranton Group March 4, 2015
The Compression Dictionary
18
engine memory
dbspace with compressed partition A
dbspace with compressed partition B
A B75K - 150K each
A dictionary is created by sampling a set of rows from an existing table or table fragment and then creating a lookup table of symbols that represent byte patterns.
The Mark Scranton Group March 4, 2015
Pre-Compress Steps
• create_dictionary (optional)
• creates the compression dictionary.
• any rows inserted or updated afterwards will be compressed.
• previously existing rows will not be compressed.
• estimate_compression
• estimates the compression ratio ie: disk savings.
• if data is already compressed, estimates the current compression ratio.
19
The Mark Scranton Group March 4, 2015
Compress
• creates the dictionary if it does not exist.
• compresses all previously existing rows.
• all tasks done in the background as a small transaction to allow access to partition.
• rows being compressed only lock for a short duration.
• leaves “holes” in pages/slots, but pages still allocated to partition.
20
The Mark Scranton Group March 4, 2015
Repack
• moves rows/partition blobs (v12) within a fragment to consolidate free space at the end.
• Can repack with or without compressing first.
• “repack” - table is fully accessible.
• “repack_offline” – table is locked, no query access
• good repack candidates:
• tables with lots of free pages/space, ie: purged often, maintenance scripts “broken.”
21
The Mark Scranton Group March 4, 2015
Repack
• Warnings:
• If light appends (onstat -g lap) occurs on a fragment while a repack operation is occurring:
• the repack operation does not complete the consolidation of space at the end of a table or fragment.
• To complete the repack process, you must run a second repack operation after light append activity has completed. This second repack operation builds on the work of the first repack operation.
22
The Mark Scranton Group March 4, 2015
Shrink
• returns any free space at end of table or fragment to the dbspace
• normally done after a repack.
• After the rows are repacked, the reclaim operation truncates the unused portion of the partition, and returns the space back to the dbspace where the partition is located. For an index, this returns free space at the end of the index to the dbspace, thus reducing the total size of the index.
23
The Mark Scranton Group March 4, 2015
Memory Benefits
• BUFFERPOOLs: less pages per partition, more available, less I/O.
• Page Mgmt:
• data set size - thrashing when required pages don’t fit into the cache. Less pages = less/no thrashing.
• less dirty pages to flush for replacement.
• more clean pages available, thus less chance of forced fg write or lru write.
• Smaller Btrees - includes less work for btree scanners.
• Note: Rows get uncompressed for processing
• data in virtual portion (sorts, hash tables for HJ, etc.) not compressed.
24
The Mark Scranton Group March 4, 2015
Performance Benefits
25
• less I/O
• less bufferpool page mgmt
• lower cost typically - more attractive.
• reduced forced flushing to disk (lru, fg writes)
• smaller b-trees (v12):
• fewer leaf nodes to scan.
• hit ratio improved
• less I/O, larger/full data set in buffer cache.
The Mark Scranton Group March 4, 2015
Performance Benefits
26
Andreas Weininger, IBM
March 2011
The long list of performance benefits can result in:
The Mark Scranton Group March 4, 2015
Archive Benefits
• Archive/Restore
• no cpu overhead for archive/restore compress/decompress.
• less network bandwidth required.
• faster archive/restore - takes load off engine, lpar, cpu pools. (archive month-end example)
• less disk needed for disk caching and tape targets. ie: logical logs to disk then to tape, dbspaces to tape (TSM example).
27
The Mark Scranton Group March 4, 2015
Benefits
28
Memory Disk PerformanceBackup/Restore HDR Optimizer Network
fewer & smaller
drives
X X
less redundant
storage
X X X X
SSDs X X X
less pages for partition
X X X X X X
less row splits
X X X X
smaller btrees (v12)
X X X X
buffer pg mgmt
X X X X
The Mark Scranton Group March 4, 2015
Warnings
• tables that “stay cached” will perform well without compression.
• rows have to be materialized for each access.
• 9.4 brought “hot pages stay above” approach to caching. (pre you could set a partition to be memory resident).
• tables with “their own bufferpool” will stay cached if they fit.
• Stale pages will not see performance benefits due to I/O and then materialization.
29
The Mark Scranton Group March 4, 2015
Proof: Compress, repack and shrink
30
OP Dur (s) Dur (hrs)
Logs Used (200MB) Numrows
dbspace free GB Extents pgs alloc pgs used data pgs
pgs alloc (GB)
pgs used (GB)
data pgs (GB)
baseline 530,393,437 52 66 10,681,054 10,522,451 10,585,967 40.7 40.1 40.4compress 3404s 0h 56m 624 10,681,054 10,587,270 10,585,967 40.7 40.4 40.4delta 0 -64,819 0 0.0 -0.2 0.0
repack 17340s 4h 49m 52.683 46 4,260,331 10,608,229 4,270,212 16.3 40.5 16.3
delta 6,420,723 -85,778 6,315,755 24.5 -0.3 24.1
shrink 5s 0h 0m 5s 79 46 4,270,738 4,270,738 4,270,212 16.3 16.3 16.3
delta 6,410,316 6,251,713 6,315,755 24.5 23.8 24.1
• 1.6B row table
• 3 fragments - 1 shown below.
The Mark Scranton Group March 4, 2015
Proofs: Repack
• Another large table - started a repack to see what would happen. !
• 05/08/14 13:21:20 SCHAPI Repack for xx:”informix".table_y started
• 05/08/14 13:21:20 Repacked 1 rows in xx:”informix”.table_y
• 05/08/14 13:21:20 SCHAPI Shrink for xx:”informix".table_y started
• 05/08/14 13:21:20 admin_fragment_command('fragment repack shrink ','48234498') succeeded
31
The Mark Scranton Group March 4, 2015
Repack Proof #1
32
sec hrs numrows
dbspace free space (GB) extents pgs alloc pgs used data pages
pgs alloc (GB)
pgs used (GB)
data pgs (GB)
OP 176,768,260 3.591 42 4,521,930 4,521,930 2,488,739 17.2 17.2 9.5repack 3276s 0h 54m 3.591 42 4,521,930 4,521,930 2,325,776 17.2 17.2 8.9
delta 0 0 0 0 162,963 0.000 0.000 0.622shrink 2s 12.58 2,326,063 2,326,063 2,325,776 8.9 8.9 8.9
delta 8.989 0 2,195,867 2,195,867 2,196,154 8.377 8.377 0.622
• 520M row table
• 3 fragments - 1 shown below.
The Mark Scranton Group March 4, 2015
Repack Proof #2
33
Repack Testing
table_x nrows pgs alloc pgs_used data_pages extentsfrag1 pre 4,589,275 882,392 882,392 170,136 40
frag2 pre 4,587,485 853,032 853,032 170,073 37
frag3 pre 4,584,694 916,013 916,013 169,963 35
Pre-repack Totals 13,761,454 2,651,437 2,651,437 510,172 112
frag1 post 4,589,275 171,796 171,796 171,774 17
frag2 post 4,587,485 171,734 171,734 171,712 17
frag3 post 4,584,694 171,671 171,671 171,649 17
Post-repack Totals 13,761,454 515,201 515,201 515,135 51
Deltas 2,136,236 2,136,236 -4,963 61GB 8.15 8.15 -0.0189323425292969
Random Luck Pick
• table not getting purged as scheduled.
• client has 77 tables in similar shape, but not as big.
The Mark Scranton Group March 4, 2015
Row Split Reduction Proof
34
create table split (col1 char(5000));
insert 1 row
The Mark Scranton Group March 4, 2015
Row Split Reduction Proof
35
home page (291:1235)
slot 1: len 4064 (4056 + 4 byte slot + 4 byte fwd ptr)
slot 1: len 940 (936 + 4-byte slot entry)
remainder page (291:1236)
The Mark Scranton Group March 4, 2015
Row Split Reduction Proof
36
03/02/15 11:55:43 SCHAPI Create Dictionary for xx:"informix".split started 03/02/15 11:55:43 Partition 1048662 does not contain enough rows to build a compression dictionary. The partition must contain at least 2000 rows to perform this operation. 03/02/15 11:55:43 rscCommand: rscCreateDictionaryFlag failed, isfd = 3, table = 'xx:"informix".split' 03/02/15 11:55:43 ERROR ba:"informix".split: rscCommand failed for partnum 1048662
do the compress
insert some rows
The Mark Scranton Group March 4, 2015
Row Split Reduction Proof
37
lots of remainder pages
The Mark Scranton Group March 4, 2015
Row Split Reduction Proof
38
do the compress again
home page (291:1235)
pages still allocated
compressed
The Mark Scranton Group March 4, 2015
OBTW
• compress - onstat -p
39
Not “new compress” … this was when the engine would move data rows around on a page to free up contiguous space for a new row.
The Mark Scranton Group March 4, 2015
Compression Catalogs
• sysmaster tables:
• syscompdicts_full table – includes binary dictionary; access restricted to user “informix”
• syscompdicts view – globally accessible; omits binary dictionary for security
40
The Mark Scranton Group March 4, 2015
Fun Facts
• Newsgroup discussion: “…does compression work IF you don’t have the license?”
• YES. A current client on 11.5 - compression works fine without a license.
41
The Mark Scranton Group March 4, 2015
Fun Facts
• Client Example:
• IBM Informix Storage Optimization Feature for Enterprise Edition Processor Value Unit (PVU) License + SW Subscription & Support 12 Months
• cost - $247K for Enterprise AIX; 38 cores; 1680 PVUs
• Many clients using compression as a result of an IWA purchase.
42
The Mark Scranton Group March 4, 2015
Tips
43
execute function sysadmin:task("fragment compress", “0x02a00002"); <=== single fragment only. !03/02/15 08:34:24 SCHAPI Create Dictionary for xx:"informix".invoice_line started 03/02/15 08:34:24 SCHAPI Compress for xx:"informix".invoice_line started
The Mark Scranton Group March 4, 2015
Tips
• Logical logs will get chewed up quickly.
44
marching through logical logs (200MB each) (expected)
The Mark Scranton Group March 4, 2015
Tips
45
onstat -g ppd
Dictionary info • crTS - created timestamp
The Mark Scranton Group March 4, 2015
Tips
• onstat -g dsk
46