adams - hash joins oracle
TRANSCRIPT
Steve [email protected]
Unix + Oracle =
Hash Functions
• A hash function maps arbitrary key values to integer “hash values”
• Data is stored in an array based on the hash values
• Data access is very efficient• just compute the hash value of the key
and lookup the corresponding array element
Steve [email protected]
Unix + Oracle =
Hash Buckets
• Too many possible hash values• hash tables would be large and sparse
• Hash values are mapped to hash buckets using the mod() function• 100056732 % 100 = 32
• Each hash table element is a “bucket”
• Hash values that map to the same bucket are stored on a “collision chain”
Steve [email protected]
Unix + Oracle =
Numeric keys
• Key data itself is the hash value
• Hash bucket is computed using MOD• binary hashes are cheap to compute
• just a bitwise SHIFT operation
• prime number hashes randomize the distribution of keys to hash buckets
• this prevents uneven or “skew” distributions
Steve [email protected]
Unix + Oracle =
Non-numeric Keys
• Internal hash function to get hash value• examples:
• ‘Adams’ 3016007180
• ‘Millsap’ 1765538108
• DBMS_UTILITY.GET_HASH_VALUE()
• Binary MOD() used to map hash values to hash buckets
Steve [email protected]
Unix + Oracle =
Hash Tables
Hash Function Hash Value
Key Value
Hash Bucket Number
Bucket 2 Bucket 3 Bucket 4 Bucket 5 Bucket 6 Bucket 7 Bucket 8
Hash Table
Bucket 1hash
bucketheaders
collisionchains
KeyHash Value
KeyHash Value
KeyHash Value
KeyHash Value
KeyHash Value
KeyHash Value
KeyHash Value
KeyHash Value
Steve [email protected]
Unix + Oracle =
Hash Joins
• Concept• read first row source and build a hash table• read second row source and join via hash table• applicable to (in)-equality joins & CBO only
• Approach• first “partition” both inputs by hash value• for corresponding pairs of partitions
• build an in-memory hash table from one input
• probe the hash table with rows from the other
Steve [email protected]
Unix + Oracle =
Hash Bucket Bitmap
Hash Table PartitioningHash Table Data
Partition 2 Partition 3 Partition 4Partition 1
Hash Bucket Bitmap
hashbucketbitmap
hashtable
partitionbuffers
savedpartition
extents
Steve [email protected]
Unix + Oracle =
Bit Vector Filtering
• When partitioning the first input• build a bitmap of non-empty hash buckets
• When partitioning the second input• check the hash bucket bitmap• keys that map to empty hash buckets cannot be
joined• for equality joins these rows can be immediately
excluded
Steve [email protected]
Unix + Oracle =
Partition Histogram
• Partition sizes will be uneven if the data distribution is skew
• Partition histogram records for each partition pair• number of keys
• bytes of memory required
• Allows dynamic role reversal• for each partition, the hash table is built from the input
with the smaller memory requirement
• Allows optimum memory use when joining multiple partitions simultaneously
Steve [email protected]
Unix + Oracle =
Hash Join Processing
• Phases• initialization (planning memory use)• build input partitioning• probe input partitioning
• may begin to return rows
• joining of saved partitions• may require sub-partitioning of large partitions
• Hash area memory used differently in each phase
Steve [email protected]
Unix + Oracle =
Build Input Partitioning
Hash Function
Hash Value
Join Key Non-Key Columns
Join Key Non-Key Columns
Hash Partition Number
Hash Bucket Number
Partition Histogram
Hash Bucket Bitmap
Partition Buffers
Input Buffers
Hash Area
Steve [email protected]
Unix + Oracle =
Probe Input Partitioning
Hash Function
Hash Area
Partition Histogram
Hash Bucket Bitmap
Partition Buffers
Input Buffers
Hash TableOutput Rows
Steve [email protected]
Unix + Oracle =
Joining Saved Partitions
Partition Histogram
Input Buffers
Hash Area
Hash TableOutput Rows