a primer on multidimensional clustering for udb luw
DESCRIPTION
A Primer on Multidimensional Clustering for UDB LUW. He once made an SQL statement run faster just by staring at it. He has never had a Network Security firewall rule refuse him access. He had the "Backspace" and "Delete" keys permanently removed from his keyboard. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/1.jpg)
A Primer on Multidimensional
Clustering for UDB LUW
![Page 2: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/2.jpg)
2
The DB2 Optimizer asks him for the best access path.
He wrote an improved version of the DB2 Optimizer, using only 9 lines of code.
He can type all SQL syntax with 100% accuracy from memory.
He taught his dog to prefetch, so that when he throws one ball, the dog returns with 32.
He had the "Backspace" and "Delete" keys permanently removed from his keyboard.
His sysadmins call him daily to ask if they can give him more disk space.
He has never had a Network Security firewall rule refuse him access.
On a slow day, he will reorg large tables completely in his mind.
He once made an SQL statement run faster just by staring at it.
He has never clicked on the “undo” arrow.
The DB2 Optimizer asks him for the best access path.
He wrote an improved version of the DB2 Optimizer, using only 9 lines of code.
He can type all SQL syntax with 100% accuracy from memory.
He taught his dog to prefetch, so that when he throws one ball, the dog returns with 32.
He had the "Backspace" and "Delete" keys permanently removed from his keyboard.
His sysadmins call him daily to ask if they can give him more disk space.
He has never had a Network Security firewall rule refuse him access.
On a slow day, he will reorg large tables completely in his mind.
He once made an SQL statement run faster just by staring at it.
He has never clicked on the “undo” arrow.
The DB2 Optimizer asks him for the best access path.
He wrote an improved version of the DB2 Optimizer, using only 9 lines of code.
He can type all SQL syntax with 100% accuracy from memory.
He taught his dog to prefetch, so that when he throws one ball, the dog returns with 32.
He had the "Backspace" and "Delete" keys permanently removed from his keyboard.
His sysadmins call him daily to ask if they can give him more disk space.
He has never had a Network Security firewall rule refuse him access.
On a slow day, he will reorg large tables completely in his mind.
He once made an SQL statement run faster just by staring at it.
He has never clicked on the “undo” arrow.
The DB2 Optimizer asks him for the best access path.
He wrote an improved version of the DB2 Optimizer, using only 9 lines of code.
He can type all SQL syntax with 100% accuracy from memory.
He taught his dog to prefetch, so that when he throws one ball, the dog returns with 32.
He had the "Backspace" and "Delete" keys permanently removed from his keyboard.
His sysadmins call him daily to ask if they can give him more disk space.
He has never had a Network Security firewall rule refuse him access.
On a slow day, he will reorg large tables completely in his mind.
He once made an SQL statement run faster just by staring at it.
He has never clicked on the “undo” arrow.
The DB2 Optimizer asks him for the best access path.
He wrote an improved version of the DB2 Optimizer, using only 9 lines of code.
He can type all SQL syntax with 100% accuracy from memory.
He taught his dog to prefetch, so that when he throws one ball, the dog returns with 32.
He had the "Backspace" and "Delete" keys permanently removed from his keyboard.
His sysadmins call him daily to ask if they can give him more disk space.
He has never had a Network Security firewall rule refuse him access.
On a slow day, he will reorg large tables completely in his mind.
He once made an SQL statement run faster just by staring at it.
He has never clicked on the “undo” arrow.
![Page 3: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/3.jpg)
A brief bio…• 29 years IT, 19 years of DBA experience
o UDB LUW on AIXo DB2/ZOSo Oracle
• Longest query I ever tuned was over 4 feet long when printed out
• Favorite saying: “Even a blind squirrel finds a nut once in a while”
![Page 4: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/4.jpg)
Agenda• What is clustering?• What is multidimensional clustering (MDC)?• Some design guidelines for MDC
![Page 5: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/5.jpg)
Left Outer Join
JOIN
L
T E
F
![Page 6: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/6.jpg)
Backup
back
Hint: The most important thing to a DBA
![Page 7: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/7.jpg)
Create table in tablespace
tablecreatetablespace
![Page 8: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/8.jpg)
What is Clustering?• Physical sequence of rows in a DB2 table.• Determined by defining one index as the
“clustering index”.• As rows are inserted, DB2 attempts to put them
in correct clustering location• During Reorg, rows are sorted in clustering order
before reloading back into table• Is the table clustered or is the index clustered??
![Page 9: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/9.jpg)
9
Regular non-clustering indexes
Table
IndexOn Region
Index onYear
![Page 10: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/10.jpg)
10
Clustering Index
Table
Clustering IndexOn Region
Index onYear
![Page 11: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/11.jpg)
11
Why are reads faster when a table is clustered??
• The first I/O reads a page into memory which contains many rows with the same key or a range of key valueso Example: App needs 500 rows for a given region….
• If the DBMS knows that it will need to fetch several or many consecutive pages, then it can begin “prefetching” extents (multiple pages) into memory before application needs it
18 IOs 3 IOsvs vs 1 IO
![Page 12: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/12.jpg)
Sequential Prefetch• “Holy Grail” when accessing large numbers of
rows• Significant reduction in I/O• Physical reads vs Logical reads• Tablespace Page size (bytes)• Tablespace Extent size (pages)• Tablespace Prefetch size (pages)
![Page 13: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/13.jpg)
How UDB uses clustering
• Sequential prefetch is turned on if UDB determines cost savings
• Clustered data makes it more likely for sequential prefetch to be turned on
• Optimizer looks at clusterratio and clusterfactor (on syscat.indexes)
• Sequential detection can be turned on dynamically during query execution
![Page 14: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/14.jpg)
So what’s the shortfall with Clustering?
• Clustering deteriorates over time (probably) – requiring reorgs
• Record based indexes with a pointer for every single record, so can become very large in size
• Only get one choice for the clustering index.• If Joe needs the table clustered by timestamp and
Bill needs it clustered by policy #, one of them will probably be unhappy.
![Page 15: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/15.jpg)
Partitioned Database
DA TA BA SE
![Page 16: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/16.jpg)
MultiDimensional Clustering
DimensionalclusteringDimensionalclusteringDimensionalclustering
![Page 17: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/17.jpg)
MultiDimensional Clustering (MDC)
• What if your data could be physically sequenced in more than one way at the same time??
• Great in theory, but how do you make this happen in real life on a real table??
![Page 18: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/18.jpg)
MultiDimensional Clustering
• Data is physically grouped together by “dimensions” into separate blocks, or extents
• Each page belongs to exactly one block• All blocks are of equal size• Tablespace Page size (bytes)• Tablespace Extent size (pages)• Tablespace Prefetch size (pages)
![Page 19: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/19.jpg)
What is an extent?• An extent is a set of contiguous data pages on
disk, specified at tablespace creation time.• Physical size of an extent determined by:
o Extent Size (# of pages)o Page Size (kb)
![Page 20: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/20.jpg)
Color
Year
Age
2002 2003 2004
1
2
3
Red
Blue
Green
MDC with Three Dimensions
![Page 21: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/21.jpg)
What is a (logical) cell?
• Contains all rows for a unique combination of dimension values
• Physically made up of one or more blocks (extents)
• Blocks are only allocated for logical cells which actually have records for a given combination of dimension values
![Page 22: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/22.jpg)
Color
Year
Age
2002 2003 2004
1
2
3
Red
Blue
Green
2002, Red, 1
A “Cell”
![Page 23: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/23.jpg)
What is a Slice?• A slice is a set of blocks having a particular
dimension key.
![Page 24: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/24.jpg)
Color
Year
Age
2002 2003 2004
1
2
3
Red
Blue
Green
2002, Red, 1 2003, Red, 1 2004, Red, 1
A Red “Slice” of the Color
Dimension
![Page 25: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/25.jpg)
Color
Year
Age
2002 2003 2004
1
2
3
Red
Blue
Green
2004, Red, 1
2004, Blue, 1
2004, Green, 1
A 2004 “Slice” of the Year
Dimension
![Page 26: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/26.jpg)
Color
Year
Age
2002 2003 2004
1
2
3
Red
Blue
Green
2004, Red, 1
2004, Blue, 1
2004, Green, 1
2003, Red, 1
2003, Blue, 1
2003, Green, 1
2002, Red, 1
2002, Blue, 1
2002, Green, 1
A 1 “Slice” of the Age
Dimension
![Page 27: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/27.jpg)
How MDC works
• Rows are organized in extents based upon dimensions
Dimension BlockIndex onColor
Red Red
Red Red Red
Blue
Blue BlueBlueBlue
BlueBlue
GreenGreenGreen
Green Green Green
2000
2000
2000
2000
2000
2000
2000
2001
2001
2001
2001
2001
2002
2002
2002
2002
2003
2003
Dimension BlockIndex on Year
![Page 28: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/28.jpg)
MultiDimensional Clustering
• MDC introduces indexes that are block-based – much smaller than record-basedo A pointer for each block instead of a pointer for
each row• MDC allows a table to be physically clustered on
more than one key or dimension • MDC table is able to maintain and guarantee
clustering over all dimensions automatically and continuously
![Page 29: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/29.jpg)
MDC Indexes• A dimension block index is automatically created
for each dimension specified• A composite block index is automatically created
containing all columns across all dimensions• Composite index used to maintain clustering• Much lower overhead for logging
![Page 30: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/30.jpg)
Creating an MDC table
Create table t1 (age int, color char(10), year char(4), c1 int, c2 int)
organize by dimensions (age, color, year)
• Three dimension block indexes (one each for age, color and year).
• A composite block index is also created which includes (age,color, year).
• Traditional “RID” indexes can also be created on an MDC
• Can logical AND/OR between BID and RID indexes
![Page 31: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/31.jpg)
Color
Year
Age
2002 2003 2004
1
2
3
Red
Blue
Green
2004, Red, 1
2004, Blue, 1
2004, Green, 1
2003, Red, 1
2003, Blue, 1
2003, Green, 1
2002, Red, 1
2002, Blue, 1
2002, Green, 1
Select Processing in MDC (ex #1)
Select …From TableWhere Age = ‘1’
![Page 32: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/32.jpg)
Color
Year
Age
2002 2003 2004
1
2
3
Red
Blue
Green
2002, Red, 1 2003, Red, 1 2004, Red, 1
Select …From TableWhere color = ‘Red’
Select Processing in MDC (ex #2)
![Page 33: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/33.jpg)
Color
Year
Age
2002 2003 2004
1
2
3
Red
Blue
Green
2002, Red, 1
Select Processing in MDC (ex #3)
Select …
From Table
Where color = ‘Red’
And Age = 1
And Year = ‘2002’
![Page 34: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/34.jpg)
Insert Processing in MDC
• Probe composite block index to see if this is a new combination of dimensions (new logical cell)
• If existing, search list of BIDs to look for space to insert row
• If new logical cell or all blocks full for an existing cell, then create a new block
![Page 35: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/35.jpg)
Delete Processing in MDC
• If the record being deleted is not the last record in block, UDB just deletes the record and removes its RID from any record based indexes
• If deleting last record in block, UDB frees the block by changing its IN_USE status bit and removing the BID from all block indexes and also remove RID from record based indexes
![Page 36: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/36.jpg)
Update Processing in MDC
• Updates on non-dimension values are done in place just as with regular tableso No need to update block indexes unless no
space is found and a new block needs to be added to cell
• Updates of dimension values are treated as delete/inserto Block indexes will need to be updated
![Page 37: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/37.jpg)
MDC Benefits• Can cluster in multiple dimensions• Clustering is automatically and dynamically
maintained over time.• Reorg not necessary for re-clustering• Block indexes are much smaller and have much
less overhead for maintenance and logging
![Page 38: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/38.jpg)
Design Guidelines for MDC
• MDC is great tool• But, used incorrectly, can make things worse just
as much as it can make things better• Requires knowledge of data and data useage by
users
![Page 39: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/39.jpg)
MDC Design• Most important design criteria for MDC is to select
proper dimension columns and appropriate exent size
• Columns that are used in queries as equality or range predicates
• Low cardinality• Desire high density – blocks are mostly full• Generally no more than 3 or 4 dimensions
![Page 40: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/40.jpg)
MDC Size Considerations
• At least one extent will be allocated for every unique combination of dimensions in the data
• Evaluate dimension volumetrics and row size to establish tablespace extent sizeo Select dimcol1, dimcol2, dimcol3, count(*)
from table• Example: 8k page size * 32 page extent size
gives 256k extent size• If you have 1 million unique dimension
combinations – minimum table size of 256 GB!!
![Page 41: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/41.jpg)
What happens if you choose wrong??
• A high cardinality column(s) will explode the size of your table and destroy performance!!!
• Remember that a block is physically allocated for each unique combination of dimension key values
• NEVER use a high cardinality column or a unique column for an MDC dimension
![Page 42: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/42.jpg)
Down Right StupidStupid
Stupid
Stupid
Stupid
Choosing a unique column as a dimension is just:
![Page 43: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/43.jpg)
Using column expressions with MDC• What if a column is a good dimension candidate,
but cardinality is way too high (ex: timestamp column)
• Create table t1 (c1 timestamp, c2 int, c3 int generated always as year(c1)) organize by dimensions (c2, c3)
• Monotonic – generated column increases/decreases the same as base column
• A non-monotonic column will only allow equality or IN predicates on the base column to use the block index
![Page 44: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/44.jpg)
MDC tables and database partitioning
• DB2 LUW DPF partitioning is just a way to spread the data across partitions (not range partitioning like DB2/ZOS
• The reason for partitioning a table is independent of whether the table is an MDC table or a regular table
• Can partition on a dimension column or a non-dimension columno However, partitioning on a dimension column means
that all rows for a particular dimension value exist on only 1 partition
• If partitioning, remember that logical cells can spread across partitionso Important for sizing of extents
![Page 45: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/45.jpg)
Block Index Considerations
• Composite block index columns are ordered based upon “organize by dimensions” clause
• Create table t1 (c1 int, c2 int, c3 int, c4 int) organize by dimensions (c1, c4, (c3,c1), c2)o Composite index will be (c1,c4,c3,c2)
• Create table t1 (c1 int, c2 int, c3 int, c4 int) organize by dimensions (c1, c2, (c3,c1), c4)o Composite index will be (c1,c2,c3,c4)
![Page 46: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/46.jpg)
The Customer is always right
EverythingElse
Customer
![Page 47: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/47.jpg)
![Page 48: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/48.jpg)
To make a long story short
![Page 49: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/49.jpg)
Questions???
![Page 50: A Primer on Multidimensional Clustering for UDB LUW](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813eb7550346895da91d98/html5/thumbnails/50.jpg)