adams - hash joins oracle

14
Steve Adams [email protected] Ixor a Unix + Oracle = Hash Join Internals

Upload: rockerabc123

Post on 15-Nov-2014

108 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Hash Join Internals

Page 2: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Hash Functions

• A hash function maps arbitrary key values to integer “hash values”

• Data is stored in an array based on the hash values

• Data access is very efficient• just compute the hash value of the key

and lookup the corresponding array element

Page 3: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Hash Buckets

• Too many possible hash values• hash tables would be large and sparse

• Hash values are mapped to hash buckets using the mod() function• 100056732 % 100 = 32

• Each hash table element is a “bucket”

• Hash values that map to the same bucket are stored on a “collision chain”

Page 4: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Numeric keys

• Key data itself is the hash value

• Hash bucket is computed using MOD• binary hashes are cheap to compute

• just a bitwise SHIFT operation

• prime number hashes randomize the distribution of keys to hash buckets

• this prevents uneven or “skew” distributions

Page 5: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Non-numeric Keys

• Internal hash function to get hash value• examples:

• ‘Adams’ 3016007180

• ‘Millsap’ 1765538108

• DBMS_UTILITY.GET_HASH_VALUE()

• Binary MOD() used to map hash values to hash buckets

Page 6: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Hash Tables

Hash Function Hash Value

Key Value

Hash Bucket Number

Bucket 2 Bucket 3 Bucket 4 Bucket 5 Bucket 6 Bucket 7 Bucket 8

Hash Table

Bucket 1hash

bucketheaders

collisionchains

KeyHash Value

KeyHash Value

KeyHash Value

KeyHash Value

KeyHash Value

KeyHash Value

KeyHash Value

KeyHash Value

Page 7: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Hash Joins

• Concept• read first row source and build a hash table• read second row source and join via hash table• applicable to (in)-equality joins & CBO only

• Approach• first “partition” both inputs by hash value• for corresponding pairs of partitions

• build an in-memory hash table from one input

• probe the hash table with rows from the other

Page 8: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Hash Bucket Bitmap

Hash Table PartitioningHash Table Data

Partition 2 Partition 3 Partition 4Partition 1

Hash Bucket Bitmap

hashbucketbitmap

hashtable

partitionbuffers

savedpartition

extents

Page 9: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Bit Vector Filtering

• When partitioning the first input• build a bitmap of non-empty hash buckets

• When partitioning the second input• check the hash bucket bitmap• keys that map to empty hash buckets cannot be

joined• for equality joins these rows can be immediately

excluded

Page 10: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Partition Histogram

• Partition sizes will be uneven if the data distribution is skew

• Partition histogram records for each partition pair• number of keys

• bytes of memory required

• Allows dynamic role reversal• for each partition, the hash table is built from the input

with the smaller memory requirement

• Allows optimum memory use when joining multiple partitions simultaneously

Page 11: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Hash Join Processing

• Phases• initialization (planning memory use)• build input partitioning• probe input partitioning

• may begin to return rows

• joining of saved partitions• may require sub-partitioning of large partitions

• Hash area memory used differently in each phase

Page 12: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Build Input Partitioning

Hash Function

Hash Value

Join Key Non-Key Columns

Join Key Non-Key Columns

Hash Partition Number

Hash Bucket Number

Partition Histogram

Hash Bucket Bitmap

Partition Buffers

Input Buffers

Hash Area

Page 13: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Probe Input Partitioning

Hash Function

Hash Area

Partition Histogram

Hash Bucket Bitmap

Partition Buffers

Input Buffers

Hash TableOutput Rows

Page 14: Adams - Hash Joins Oracle

Steve [email protected]

Unix + Oracle =

Joining Saved Partitions

Partition Histogram

Input Buffers

Hash Area

Hash TableOutput Rows