Hash Tables
Briana B. Morrison
Adapted from William Collins
Hashing 2
averageTimeS(n), THE AVERAGE TIME
FOR A SUCCESSFUL SEARCH
averageTimeU(n), … UNSUCCESSFUL …
worstTimeS(n)
worstTimeU(n)
Hashing 3
LET’S START WITH A REVIEW OF EARLIER SEARCH TECHNIQUES:
Hashing 4
Sequential Search
Given a vector of integers:
v = {12, 15, 18, 3, 76, 9, 14, 33, 51, 44}
What is the best case for sequential search? O(1) when value is the first element
What is the worst case? O(n) when value is last element, or value is not in the list
What is the average case? About n/2 comparisons, which is O(n)
Hashing 5
SEQUENTIAL SEARCH IN STL
// Postcondition: if there is an item in the range of iterators
//                from first (inclusive) through last
//                (exclusive) that is equal to value, the
//                iterator returned is the first iterator i in that
//                range such that *i = value. Otherwise,
//                last is returned. The worstTime(n) is O(n).
template <typename InputIterator, typename T>
InputIterator find(InputIterator first, InputIterator last, const T& value)
{
    while (first != last && *first != value)
        ++first;
    return first;
}
Hashing 6
THE worstTimeU(n) IS LINEAR IN n.
DITTO FOR worstTimeS(n), averageTimeU(n), AND averageTimeS(n).
Hashing 7
Binary Search
Given a vector of integers:
v = {3, 9, 12, 14, 15, 18, 33, 44, 51, 76}
What is the best case for binary search? O(1) when element is the middle element
What is the worst case? O(log n) when element is first, last, or not in list
What is the average case? O(log n)
Hashing 8
BINARY SEARCH OF A SORTED CONTAINER:
template <typename ForwardIterator, typename T>
inline bool binary_search (ForwardIterator first, ForwardIterator last, const T& value)
example:
if (binary_search (vector.begin(), vector.end(), value))
Hashing 9
Do you remember how binary search works?
    Distance len = last - first;
    Distance half;
    RandomAccessIterator middle;
    while (len > 0) {
        half = len / 2;
        middle = first + half;
        if (*middle < value) {
            first = middle + 1;
            len = len - half - 1;
        }
        else
            len = half;
    }
    return first;
}
Hashing 10
THE worstTimeU(n) IS LOGARITHMIC IN n.
DITTO FOR worstTimeS(n), averageTimeU(n), AND averageTimeS(n).
Hashing 11
NOW LET’S FOCUS ON AN UNUSUAL BUT VERY EFFICIENT SEARCH TECHNIQUE:
HASHING
Hashing 12
THE CLASS IN WHICH HASHING IS
IMPLEMENTED IS THE hash_map
CLASS. THIS IS NOT YET IN THE
STANDARD TEMPLATE LIBRARY.
Hashing 13
TO A USER, THE hash_map CLASS
IS SIMILAR TO THE map CLASS,
EXCEPT hash_map HAS ONLY A FEW
METHODS, SUCH AS insert, erase, AND
find. AND THE TIMING ESTIMATES
FOR THOSE METHODS ARE LOWER THAN IN THE map CLASS.
Hashing 14
RECALL THAT EACH VALUE (THAT IS, ITEM) IN A MAP IS A PAIR WHOSE
FIRST COMPONENT IS OF TYPE Key
AND WHOSE SECOND COMPONENT IS
OF TYPE T. THE KEYS ARE UNIQUE, THAT IS, NO TWO DISTINCT VALUES HAVE THE SAME KEY.
Hashing 15
HERE ARE THE METHOD
INTERFACES FOR THE hash_map
CLASS:
Hashing 16
1. // Postcondition: this hash_map is empty.
   hash_map( );
2. // Postcondition: the number of items in this hash_map
   //                has been returned.
   int size( );
Hashing 17
3. // Postcondition: If an item with x's key had already been
   //                inserted into this hash_map, the pair
   //                returned consists of an iterator positioned
   //                at the previously inserted item, and false.
   //                Otherwise, the pair returned consists of
   //                an iterator positioned at the newly inserted
   //                item, and true. Timing estimates are
   //                discussed later.
   pair<iterator, bool> insert (const value_type<const key_type, T>& x);
Hashing 18
4. // Postcondition: if this hash_map already contains a value
   //                whose key part is key, a reference to that
   //                value's second component has been
   //                returned. Otherwise, a new value, <key,
   //                T( )>, is inserted into this hash_map. Timing
   //                estimates are discussed later.
   T& operator[ ] (const key_type& key);
Hashing 19
5. // Postcondition: If this hash_map contains a value whose
   //                first component equals key, an iterator
   //                positioned at that value has been returned.
   //                Otherwise, an iterator at the same
   //                position as end() has been returned.
   //                Timing estimates are discussed later.
   iterator find (const key_type& key);
6. // Precondition: itr is positioned at value in this hash_map.
   // Postcondition: the value that itr is positioned at has been
   //                deleted from this hash_map. Timing
   //                estimates are discussed later in this chapter.
   void erase (iterator itr);
Hashing 20
7. // Postcondition: an iterator positioned at the beginning
   //                of this hash_map has been returned.
   //                Timing estimates are discussed later.
   iterator begin( );
8. // Postcondition: an iterator has been returned that can be
   //                used in comparisons to terminate iterating
   //                through this hash_map.
   iterator end( );
9. // Postcondition: the space for this hash_map object has
   //                been deallocated.
   ~hash_map( );
Hashing 21
Map vs. Hashmap
What are the differences between a map and a hashmap?
Interface
Efficiency
Applications
Implementation
Hashing 22
WE’LL STUDY THE TIME ESTIMATES
AFTER WE DEFINE THE METHODS.
BUT BASICALLY, FOR find, insert, AND
erase,
averageTime(n) IS CONSTANT!
Hashing 23
FIELDS IN THE hash_map CLASS
Hashing 24
CONTIGUOUS
array? vector? deque? heap?
LINKED
Linked? list? map?
BUT NONE OF THESE WILL GIVE
CONSTANT AVERAGE TIME FOR
SEARCHES, INSERTIONS AND
REMOVALS.
Hashing 25
HERE IS THE BASIC IDEA:
buckets // an array of values
count // the number of values in the hash_map
Hashing 26
LET’S SEE WHERE THAT LEADS.
SUPPOSE persons IS A HASH MAP THAT WILL HOLD UP TO 1000 VALUES. EACH VALUE CONSISTS OF A UNIQUE 3-DIGIT INTEGER (THE KEY), AND A NAME.
Hashing 27
buckets count 0 1 2 . . . 999
Hashing 28
persons [351] = "Prashant";
persons [108] = "Barrett";
persons [435] = "Lin";
WHERE SHOULD WE STORE THE VALUE WHOSE KEY IS 351?
Hashing 29
[Figure: the buckets array (indices 0 to 999) and count after the three insertions]
buckets
  0:    ?
  ...
  108:  108 Barrett
  ...
  351:  351 Prashant
  ...
  435:  435 Lin
  ...
  999:  ?
count: 3
Hashing 30
NOW FOR SOMETHING SLIGHTLY
DIFFERENT: SUPPOSE persons IS A
HASH MAP THAT HOLDS UP TO 1000
VALUES. EACH VALUE CONSISTS OF
A 10-DIGIT TELEPHONE NUMBER
(THE KEY), AND A NAME.
Hashing 31
persons [9876543210] = “Prashant”;
persons [6103301256] = “Barrett”;
persons [6103309816] = “Lin”;
persons [4153576256] = “Sutey”;
WHERE SHOULD THESE VALUES
BE STORED?
Hashing 32
To make these values fit into the table, we need to mod by the table size; i.e., key % 1000.
9876543210 % 1000 = 210
6103301256 % 1000 = 256
6103309816 % 1000 = 816
4153576256 % 1000 = 256   OOPS!
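The arithmetic on this slide can be checked with a few lines of C++. This is only a sketch: `bucket_index` is a hypothetical helper name, and `std::uint64_t` is assumed for the key because a 10-digit telephone number does not fit in a 32-bit integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the slides' hash_map class):
// reduce a key to a bucket index by taking the remainder modulo the table size.
std::size_t bucket_index(std::uint64_t key, std::size_t table_size) {
    return static_cast<std::size_t>(key % table_size);
}
```

With a table size of 1000, the keys 6103301256 and 4153576256 both yield index 256, which is the clash marked OOPS! above.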
Hashing 33
WHEN TWO DIFFERENT KEYS MAP TO THE SAME INDEX, THAT IS CALLED A COLLISION.
KEYS THAT MAP TO THE SAME INDEX ARE CALLED SYNONYMS.
Hashing 34
HASHING:
AN ALGORITHM THAT TRANSFORMS A KEY INTO AN ARRAY INDEX.
Hashing 35
THE ALGORITHM HAS TWO PARTS:
1. A HASH FUNCTION: AN EASILY COMPUTABLE OPERATION ON THE
KEY THAT RETURNS AN unsigned
long, WHICH IS THEN CONVERTED
INTO AN INDEX IN THE ARRAY
buckets;
2. A COLLISION HANDLER.
Hashing 36
Hash Codes
Suppose we have a table of size N
A hash code is:
  A number in the range 0 to N-1
  We compute the hash code from the key
  You can think of this as a “default position” when inserting, or a “position hint” when looking up
A hash function is a way of computing a hash code
Desire: The set of keys should spread evenly over the N values
When two keys have the same hash code: collision
Hashing 37
Hash Functions
A hash function should be quick and easy to compute.
A hash function should achieve an even distribution of the keys that actually occur across the range of indices for both random and non-random data.
Calculation should involve the entire search key.
Hashing 38
Examples of Hash Functions
Usually involves taking the key, chopping it up, and mixing the pieces together in various ways
Examples:
  Truncation – ignore part of the key, use the remaining part as the index
  Folding – partition the key into several parts and combine the parts in a convenient way (adding, etc.)
After calculating the index, use modular arithmetic: divide by the size of the index range, and take the remainder as the result
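Truncation and folding can be sketched as follows. The function names, the choice of 3-digit groups, and the use of decimal digits are assumptions made for illustration; the slide does not prescribe any of them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Truncation (illustrative): ignore everything except the last three decimal digits.
std::size_t truncate_hash(std::uint64_t key) {
    return static_cast<std::size_t>(key % 1000);
}

// Folding (illustrative): split the key into 3-digit groups, add the groups,
// then take the remainder modulo the table size.
std::size_t fold_hash(std::uint64_t key, std::size_t table_size) {
    std::uint64_t sum = 0;
    while (key > 0) {
        sum += key % 1000;  // lowest remaining 3-digit group
        key /= 1000;        // drop that group
    }
    return static_cast<std::size_t>(sum % table_size);
}
```

For example, 6103301256 and 4153576256 truncate to the same value, 256, but fold to 256 + 301 + 103 + 6 = 666 and 256 + 576 + 153 + 4 = 989, because folding uses every digit of the key.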
Hashing 39
Example Hash Function
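hf(22) = 22, and 22 % 7 = 1, so 22 is stored in tableEntry[1]
hf(4) = 4, and 4 % 7 = 4, so 4 is stored in tableEntry[4]
[Figure: a table with indices 0 to 6; arrows point to tableEntry[1] and tableEntry[4]]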
Hashing 40
Devising Hash Functions
Simple functions often produce many collisions
... but complex functions may not be good either!
It is often an empirical process
Adding letter values in a string: same hash for strings with same letters in different order
Better approach:
size_t hash = 0;
for (size_t i = 0; i < s.size(); ++i)
    hash = hash * 31 + s[i];
Hashing 41
Devising Hash Functions (2)
The String hash is good in that:
  Every letter affects the value
  The order of the letters affects the value
  The values tend to be spread well over the integers
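The difference between adding letter values and the multiply-by-31 approach can be seen on a pair of anagrams. The sketch below is illustrative only: the function names are hypothetical, and characters are read as `unsigned char`, which is an assumption the slides do not state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Additive hash: the order of the letters does not matter, so anagrams collide.
std::size_t additive_hash(const std::string& s) {
    std::size_t hash = 0;
    for (unsigned char c : s)
        hash += c;
    return hash;
}

// Multiply-by-31 hash: each step scales the running total before adding
// the next character, so the position of every letter affects the result.
std::size_t poly31_hash(const std::string& s) {
    std::size_t hash = 0;
    for (unsigned char c : s)
        hash = hash * 31 + c;
    return hash;
}
```

For "stop" and "pots", `additive_hash` returns the same number for both, while `poly31_hash` returns 3540994 and 3446974 respectively.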
Hashing 42
Devising Hash Functions (3)
Guidelines for good hash functions:
Spread values evenly: as if “random”
Cheap to compute
Generally, number of possible values much greater than table size
Hashing 43
Hash Code Maps
Memory address:
  We reinterpret the memory address of the key object as an integer
  Good in general, except for numeric and string keys
Integer cast:
  We reinterpret the bits of the key as an integer
  Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., char, short, int and float on many machines)
Component sum:
  We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)
  Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double on many machines)
Hashing 44
Hash Code Maps (cont.)
Polynomial accumulation:
  We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits): $a_0, a_1, \ldots, a_{n-1}$
  We evaluate the polynomial
    $p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1} z^{n-1}$
  at a fixed value $z$, ignoring overflows
  Especially suitable for strings (e.g., the choice $z = 33$ gives at most 6 collisions on a set of 50,000 English words)
Polynomial $p(z)$ can be evaluated in O(n) time using Horner’s rule:
  The following polynomials are successively computed, each from the previous one in O(1) time
    $p_0(z) = a_{n-1}$
    $p_i(z) = a_{n-i-1} + z\,p_{i-1}(z)$   $(i = 1, 2, \ldots, n-1)$
  We have $p(z) = p_{n-1}(z)$
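Horner's rule translates almost directly into a loop. The following is a minimal sketch, assuming each character of a string is one component a_i and that overflow is ignored by letting unsigned 64-bit arithmetic wrap around; the name `polynomial_hash` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Evaluates p(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1}, where a_i is the
// i-th character of key, using Horner's rule.
std::uint64_t polynomial_hash(const std::string& key, std::uint64_t z = 33) {
    std::uint64_t p = 0;
    // Work from a_{n-1} back to a_0: each step computes a_i + z * (previous p).
    for (std::size_t i = key.size(); i-- > 0; ) {
        p = static_cast<unsigned char>(key[i]) + z * p;
    }
    return p;
}
```

For the two-character key "ab" with z = 33, the loop computes 98 first and then 97 + 33 * 98 = 3331, using one multiplication and one addition per character.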
Hashing 45
HERE IS THE START OF THE
hash_map CLASS:
template<typename Key, typename T, typename HashFunc>
class hash_map {
THE THIRD TEMPLATE PARAMETER
IS A FUNCTION CLASS: A CLASS IN
WHICH THE FUNCTION-CALL
OPERATOR, operator( ), IS
OVERLOADED. THIS IS THE HASH
FUNCTION CLASS.
Hashing 46
THE HEADING FOR operator( ) IS
unsigned long operator( ) (const key_type& key)
FOR EXAMPLE, WE CAN DEFINE A
SIMPLE HASH FUNCTION CLASS IF
EACH KEY IS AN int:
class hash_func
{
    public:
        unsigned long operator( ) (const int& key)
        {
            return (unsigned long)key;
        } // overloaded operator( )
}; // class hash_func
Hashing 47
HERE IS A PROGRAM WITH A
hash_map CLASS IN WHICH EACH VALUE CONSISTS OF A TELEPHONE
EXTENSION AND THE PERSON AT THAT EXTENSION. THE ABOVE
hash_func IS USED.
Hashing 48
int main()
{
    typedef hash_map<int, string, hash_func> hash_class;
    hash_class extensions;
    hash_class::iterator itr;

    extensions [5520] = "Yvonne";
    extensions [5415] = "Jim";
    extensions [5416] = "Penny";
    extensions [5537] = "Chun Wai";
    extensions [5273] = "Jim";

    for (itr = extensions.begin(); itr != extensions.end(); itr++)
        cout << (*itr).first << " " << (*itr).second << endl;
    cout << "The number of items is " << extensions.size() << endl;
Hashing 49
    if (extensions.find (5537) != extensions.end())
    {
        cout << endl << "At extension " << 5537 << " is "
             << extensions [5537] << endl;
        extensions.erase (extensions.find (5537));
    } // if

    for (itr = extensions.begin( ); itr != extensions.end( ); itr++)
        cout << (*itr).first << " " << (*itr).second << endl;
    return 0;
} // main
Hashing 50
HERE IS THE OUTPUT:
5520 Yvonne
5537 Chun Wai
5415 Jim
5416 Penny
5273 Jim
The number of items is 5

At extension 5537 is Chun Wai
5520 Yvonne
5415 Jim
5416 Penny
5273 Jim
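The hash_map class used in this program is the course's own class, so the program will not compile against a standard library alone. As a rough stand-in, the sketch below rebuilds the same table with `std::unordered_map` (standard since C++11), which also provides `operator[]`, `find`, `erase`, and `size`. The names `extension_map` and `make_extensions` are hypothetical, and the iteration order of `std::unordered_map` is unspecified, so printed output may be ordered differently from the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Same idea as the slides' hash_func: an int key is used as its own hash value.
struct hash_func {
    std::size_t operator()(const int& key) const {
        return static_cast<std::size_t>(key);
    }
};

using extension_map = std::unordered_map<int, std::string, hash_func>;

// Builds the five-entry table from the program above.
extension_map make_extensions() {
    extension_map extensions;
    extensions[5520] = "Yvonne";
    extensions[5415] = "Jim";
    extensions[5416] = "Penny";
    extensions[5537] = "Chun Wai";
    extensions[5273] = "Jim";
    return extensions;
}
```

Calling `find(5537)` on the result and passing the returned iterator to `erase` leaves four items, matching the second half of the output.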
Hashing 51
THERE IS NO OBVIOUS ORDER OF THE KEYS. IF THE CONTAINER MUST
ALWAYS BE IN ORDER, USE A map
INSTEAD OF A hash_map.
Hashing 52
HERE IS ANOTHER hash_func CLASS, ONE IN WHICH THE KEY IS A STRING OF UP TO 20 CHARACTERS. BASICALLY, WE ADD UP THE ASCII VALUES OF THE KEY’S CHARACTERS. TO FURTHER SPREAD OUT THE RESULT, PARTIAL TOTALS ARE MULTIPLIED BY 13, AND THE FINAL TOTAL IS MULTIPLIED BY A BIG PRIME.
Hashing 53
class hash_func
{
    public:
        unsigned long operator( ) (const string& key)
        {
            const unsigned long BIG_PRIME = 4294967291;
            unsigned long total = 0;

            for (unsigned i = 0; i < key.length(); i++)
                total += 13 * key [i];
            return total * BIG_PRIME;
        } // operator( )
}; // class hash_func
Hashing 54
THE hash_func CLASS IS SUPPLIED BY
THE USER / CLIENT PROGRAMMER.
THE hash_map CLASS CONVERTS THE
unsigned long RETURNED BY operator( )
INTO AN ARRAY INDEX BY TAKING
THE REMAINDER % CAPACITY OF
buckets.
Hashing 55
EXERCISE: SUPPOSE THE CAPACITY OF buckets IS 203, AND FOR key1, key2, AND key3,
THE unsigned long NUMBERS
RETURNED BY hash_func (const string&
key) ARE 202, 203, AND 204
RESPECTIVELY. AT WHAT
LOCATIONS WOULD THE VALUES
WITH KEYS key1, key2, AND key3 BE
STORED?
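One way to check an answer to this exercise, assuming (as stated on the preceding slide) that the index is simply the hash value's remainder modulo the capacity and that no collision handling comes into play:

```latex
\begin{aligned}
202 \bmod 203 &= 202 \\
203 \bmod 203 &= 0 \\
204 \bmod 203 &= 1
\end{aligned}
```

Under that assumption, the values with keys key1, key2, and key3 would land at indices 202, 0, and 1: consecutive hash values wrap around from the last index to the start of the array.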
Hashing 56
AS YOU MIGHT HAVE GUESSED,
HASHING IS INEFFICIENT WHEN
THERE ARE A LOT OF COLLISIONS.
Hashing 57
USERS OF THE hash_map CLASS “HOPE” THAT THE KEYS ARE
SCATTERED RANDOMLY THROUGHOUT THE TABLE. THIS
HOPE IS FORMALLY STATED AS
FOLLOWS:
Hashing 58
THE UNIFORM HASHING ASSUMPTION
EACH KEY IS EQUALLY LIKELY TO HASH TO ANY ONE OF THE TABLE ADDRESSES, INDEPENDENTLY OF WHERE THE OTHER KEYS HAVE HASHED.
Hashing 59
EVEN IF THE UNIFORM HASHING ASSUMPTION HOLDS, THERE MAY STILL BE COLLISIONS.
Hashing 60
Collision Handlers
NOW WE’LL LOOK AT SPECIFIC COLLISION HANDLERS:
Chaining
Linear Probing (Open Addressing)
Double Hashing
Quadratic Hashing
Hashing 61
Collision Handling
Collisions occur when different elements are mapped to the same cell
Chaining: let each cell in the table point to a linked list of elements that map there
Chaining is simple, but requires additional memory outside the table
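[Figure: a table with cells 0 to 4; one cell points to a list holding 025-612-0001, and another cell points to a list holding 451-229-0004 and 981-101-0004]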
Hashing 62
CHAINING (ALSO CALLED CHAINED
HASHING): AT INDEX i IN buckets,
STORE THE LIST OF ALL VALUES
WHOSE KEYS HASH TO i. HERE ARE THE FIELDS FOR CHAINED
HASHING:
Hashing 63
list <value_type< const key_type, T> >* buckets;
    // at each index in the array buckets,
    // we will store the list of all
    // items whose keys hashed to that index
int count,     // number of items in this hash_map
    length;    // number of buckets in this hash_map
    // these two fields are used to calculate the load to
    // know when to increase the size of the table
hash_func hash; // hash is a function object
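To make the role of these fields concrete, here is a minimal sketch of chained hashing. It is not the hash_map class from the slides: the class name `chained_table` is hypothetical, keys are fixed to `int` and values to `std::string`, a `std::vector` of lists stands in for the raw `buckets` pointer, and resizing is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Illustrative chained hash table: each bucket holds the list of all
// items whose keys hashed to that index.
class chained_table {
public:
    explicit chained_table(std::size_t length) : buckets(length), count(0) {}

    // Returns true if a new item was inserted, false if the key was already present.
    bool insert(int key, const std::string& value) {
        auto& chain = buckets[index_of(key)];
        for (const auto& item : chain)
            if (item.first == key) return false;
        chain.push_back(std::make_pair(key, value));
        ++count;
        return true;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const std::string* find(int key) const {
        const auto& chain = buckets[index_of(key)];
        for (const auto& item : chain)
            if (item.first == key) return &item.second;
        return nullptr;
    }

    std::size_t size() const { return count; }
    std::size_t chain_length(std::size_t i) const { return buckets[i].size(); }

private:
    // The key serves as its own hash value; the index is the remainder
    // modulo the number of buckets.
    std::size_t index_of(int key) const {
        return static_cast<unsigned long>(key) % buckets.size();
    }

    std::vector<std::list<std::pair<int, std::string>>> buckets;
    std::size_t count;  // number of items in the table
};
```

With 11 buckets, the keys 89 and 45 both hash to index 1, so the list at that index ends up holding two items while `size()` reports 2.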
Hashing 64
Chaining with Separate Lists Example
[Figure: an array of list headers, <bucket 0> through <bucket n - 1>, each pointing to its own list]
Example with 11 buckets; (1) and (2) mark the first and second item in a list:
<Bucket 0>   77(1)
<Bucket 1>   89(1)  45(2)
<Bucket 2>   35(1)
<Bucket 3>   14(1)
<Bucket 4>
<Bucket 5>
<Bucket 6>   94(1)
<Bucket 7>
<Bucket 8>
<Bucket 9>
<Bucket 10>  54(1)  76(2)
Hashing 65
Chaining Picture
Two items hashed to bucket 3
Three items hashed to bucket 4
Hashing 66
INSERT VALUES WITH THESE KEYS:
2155551612
7178626358
6103309358
6103309000
7178621359
7178627451
2155554358
6103300451
ASSUME length = 1000. IGNORE 2ND COMPONENT
IN VALUE, IGNORE prev FIELD, USE ‘X’ AT END.
Hashing 67
[Figure: the buckets array after the eight insertions; X marks the end of a list]
buckets
  0:    6103309000 X
  1:    X
  ...
  358:  7178626358 -> 6103309358 -> 2155554358 X
  359:  7178621359 X
  ...
  451:  7178627451 -> 6103300451 X
  ...
  612:  2155551612 X
count: 8
Hashing 68
FOR THE find METHOD,
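averageTimeS(n, m) ≈ n / 2m iterations, WHERE n IS THE NUMBER OF ITEMS AND m IS THE NUMBER OF BUCKETS.
BECAUSE THE LOAD n / m IS KEPT AT OR BELOW 0.75, THIS IS <= 0.75 / 2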
SO averageTimeS(n, m) <= A CONSTANT.
averageTimeS(n, m) IS CONSTANT.
Hashing 69
EVEN IF THE UNIFORM HASHING
ASSUMPTION HOLDS, IT IS POSSIBLE
FOR EACH KEY TO HASH TO THE
SAME INDEX. TO SEARCH THE LIST
AT THAT INDEX TAKES LINEAR-IN-n
TIME.
SO worstTimeS(n, m) IS LINEAR IN n.
Hashing 70
THE SAME RESULTS, CONSTANT
AVERAGE TIME AND LINEAR WORST
TIME, HOLD FOR insert AND erase.
Hashing 71
The next collision handler is Linear Probing (OPEN-ADDRESS HASHING). AT MOST ONE VALUE IS STORED AT
EACH INDEX IN buckets.
Hashing 72
HERE IS HOW THE unsigned long
RETURNED BY hash_func IS
CONVERTED INTO AN INDEX:
int index = hash (key) % length;
THIS IS DONE IN THE hash_map CLASS, BECAUSE ONLY THE hash_map CLASS KNOWS THE LENGTH OF THE ARRAY.
Hashing 73
WHEN A COLLISION OCCURS: SEARCH THE TABLE UNTIL AN
“OPEN” SLOT IN buckets IS FOUND.
THIS IS ALSO KNOWN AS THE “OFFSET-OF-1” COLLISION HANDLER.
Hashing 74
OFFSET-OF-1 COLLISION HANDLER:
IF buckets [index] ALREADY HAS
ANOTHER ELEMENT, TRY
buckets [index + 1], buckets [index + 2], …,
buckets [length – 1], buckets [0],
buckets [1], …, buckets [index – 1].
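The wrap-around order described above can be written as a single modular expression. In this sketch, `probe_order` is a hypothetical helper that lists the slots in the order the offset-of-1 handler would inspect them, beginning with the home index itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the order in which slots are inspected, starting at index and
// wrapping from length - 1 back to 0. Assumes index < length.
std::vector<std::size_t> probe_order(std::size_t index, std::size_t length) {
    std::vector<std::size_t> order;
    order.reserve(length);
    for (std::size_t step = 0; step < length; ++step)
        order.push_back((index + step) % length);
    return order;
}
```

For `index = 3` and `length = 5` this yields 3, 4, 0, 1, 2: every slot is visited exactly once, so an open slot is found whenever one exists.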
Hashing 75
Hash Table Using Open Probe Addressing Example
(mod by table size … % 11)
[Figure: four snapshots of an 11-slot table, indices 0 to 10; the number in parentheses is the number of probes used to place the key]
(a) Insert 54, 77, 94, 89, 14:  0: 77 (1),  1: 89 (1),  3: 14 (1),  6: 94 (1),  10: 54 (1)
(b) Insert 45:  2: 45 (2)
(c) Insert 35:  4: 35 (3)
(d) Insert 76:  5: 76 (7)
Insert 45
Hashing 76
Hash Table Using Open Probe Addressing Example
[Figure: the same four table snapshots, (a) to (d)]
Insert 35: stored at index 4 after 3 probes (snapshot (c))
Hashing 77
Hash Table Using Open Probe Addressing Example
[Figure: the same four table snapshots, (a) to (d)]
Insert 76: stored at index 5 after 7 probes (snapshot (d))
![Page 79: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/79.jpg)
Hashing 79
Linear Probing
Open addressing: the colliding item is placed in a different cell of the table
Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell
Each table cell inspected is referred to as a “probe”
Colliding items lump together, so future collisions cause longer sequences of probes
Example: h(x) = x mod 13
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
Resulting table (index: key): 2: 41, 5: 18, 6: 44, 7: 59, 8: 32, 9: 22, 10: 31, 11: 73; indices 0, 1, 3, 4 and 12 stay empty
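The insertion rule can be sketched in a few lines. This is a minimal illustration, assuming non-negative int keys, the key itself as the hash value, and a hypothetical sentinel `EMPTY` for a free cell; the function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel for an empty cell; assumes keys are non-negative.
const int EMPTY = -1;

// Insert key with linear probing and h(x) = x mod table.size().
// Returns the index where the key landed, or -1 if the table is full.
int linear_probe_insert(std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const std::size_t n = table.size();
    std::size_t index = static_cast<std::size_t>(key) % n;
    for (std::size_t probes = 0; probes < n; ++probes) {
        if (table[index] == EMPTY) {
            table[index] = key;
            return static_cast<int>(index);
        }
        index = (index + 1) % n;  // next cell, wrapping around circularly
    }
    return -1;  // every cell was probed and none was free
}

// Builds the 13-cell table from the example keys, in the given order.
std::vector<int> build_example_table() {
    std::vector<int> table(13, EMPTY);
    const int keys[] = {18, 41, 22, 44, 59, 32, 31, 73};
    for (int key : keys) linear_probe_insert(table, key);
    return table;
}
```

Calling `build_example_table()` should reproduce the layout above: 44 is pushed from cell 5 to cell 6, and 31 has to walk from cell 5 all the way to cell 10.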
![Page 80: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/80.jpg)
Hashing 80
WE NEED TO KNOW WHEN A SLOT IS FULL
OR OCCUPIED.
HOW?
INSTEAD OF STORING JUST T() IN THE BUCKETS (BECAUSE T() COULD BE A VALID VALUE), EACH BUCKET WILL STORE AN INSTANCE OF THE value_type CLASS.
![Page 81: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/81.jpg)
Hashing 81
TO INDICATE WHETHER A LOCATION
IS OCCUPIED, THE value_type CLASS
WILL HAVE bool occupied; IN ADDITION TO T key;
![Page 82: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/82.jpg)
Hashing 82
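| index | key | occupied | note |
|---|---|---|---|
| 0 | ? | false | |
| … | … | false | |
| 54 | 1069 | true | 1069 % 203 = 54 |
| 55 | 460 | true | 460 % 203 = 54 |
| 56 | 1070 | true | 1070 % 203 = 55 |
| 109 | 312 | true | 312 % 203 = 109 |
| 201 | 607 | true | 607 % 203 = 201 |
| 202 | | false | |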
![Page 83: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/83.jpg)
Hashing 83
Retrieve
What about when we want to retrieve?
Consider the previous example….
![Page 84: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/84.jpg)
Hashing 84
Hash Table Using Open Probe Addressing Example
[Figure: an 11-slot table (indices 0 to 10) shown in four panels, with the number of probes beside each key. (a) Insert 54, 77, 94, 89, 14 (1 probe each). (b) Insert 45 (2 probes). (c) Insert 35 (3 probes). (d) Insert 76 (7 probes).]
Contents in panel (d) (index: key): 0: 77, 1: 89, 2: 45, 3: 14, 4: 35, 5: 76, 6: 94, 10: 54
Find the value 35. (% 11)
Now find the value 76.
Now find the value 33.
![Page 85: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/85.jpg)
Hashing 85
Hash Table Using Open Probe Addressing Example
[Figure: an 11-slot table (indices 0 to 10) shown in four panels, with the number of probes beside each key. (a) Insert 54, 77, 94, 89, 14 (1 probe each). (b) Insert 45 (2 probes). (c) Insert 35 (3 probes). (d) Insert 76 (7 probes).]
Contents in panel (d) (index: key): 0: 77, 1: 89, 2: 45, 3: 14, 4: 35, 5: 76, 6: 94, 10: 54
Now delete 35. (% 11)
Now find the value 76.
Now find the value 33.
![Page 86: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/86.jpg)
Hashing 86
Linear Probing
Probe by incrementing the index
If “fall off end”, wrap around to the beginning
Take care not to cycle forever!
1. Compute index as hash_fcn() % table.size()
2. if table[index] == NULL, item is not in the table
3. if table[index] matches item, found item (done)
4. Increment index circularly and go to 2
Why must we probe repeatedly?
The hash function may produce collisions
Taking the remainder by table.size() may produce collisions
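The four steps can be sketched as a single loop. This is only an illustration under stated assumptions: keys are non-negative ints, the key itself stands in for hash_fcn(), and the hypothetical sentinel `NO_KEY` plays the role of NULL. The loop also stops after table.size() probes so that a full table cannot cycle forever.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel playing the role of NULL in the steps above;
// assumes keys are non-negative.
const int NO_KEY = -1;

// Returns the index holding key, or -1 if key is not in the table.
int linear_probe_find(const std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const std::size_t n = table.size();
    std::size_t index = static_cast<std::size_t>(key) % n;        // step 1
    for (std::size_t probes = 0; probes < n; ++probes) {           // at most n probes
        if (table[index] == NO_KEY) return -1;                     // step 2
        if (table[index] == key) return static_cast<int>(index);   // step 3
        index = (index + 1) % n;                                   // step 4
    }
    return -1;  // table is full and key is absent
}

// The 11-cell table from the earlier example after inserting
// 54, 77, 94, 89, 14, 45, 35, 76 with key % 11.
std::vector<int> example_table_mod_11() {
    return {77, 89, 45, 14, 35, 76, 94, NO_KEY, NO_KEY, NO_KEY, 54};
}
```

On that table, a search for 33 starts at index 0 and has to walk the whole cluster before reaching the empty cell at index 7, which is why long clusters hurt unsuccessful searches.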
![Page 87: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/87.jpg)
Hashing 87
Search Termination
Ways to obtain proper termination
Stop when you come back to your starting point
Stop after probing N slots, where N is table size
Stop when you reach the bottom the second time
Ensure table never full
Reallocate when occupancy exceeds threshold
![Page 88: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/88.jpg)
Hashing 88
IN THE SECOND EXAMPLE, SUPPOSE itr IS POSITIONED AT INDEX 54 AND THE MESSAGE IS my_map.erase (itr);
![Page 89: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/89.jpg)
Hashing 89
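| index | key | occupied | note |
|---|---|---|---|
| 0 | ? | false | |
| … | … | false | |
| 54 | 1069 | true | 1069 % 203 = 54 |
| 55 | 460 | true | 460 % 203 = 54 |
| 56 | 1070 | true | 1070 % 203 = 55 |
| 109 | 312 | true | 312 % 203 = 109 |
| 201 | 607 | true | 607 % 203 = 201 |
| 202 | | false | |

Erase value 1069: the occupied field at index 54 becomes false.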
![Page 90: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/90.jpg)
Hashing 90
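| index | key | occupied | note |
|---|---|---|---|
| 0 | ? | false | |
| … | … | false | |
| 54 | 1069 | false | 1069 % 203 = 54 |
| 55 | 460 | true | 460 % 203 = 54 |
| 56 | 1070 | true | 1070 % 203 = 55 |
| 109 | 312 | true | 312 % 203 = 109 |
| 201 | 607 | true | 607 % 203 = 201 |
| 202 | | false | |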
Now search for 460.
![Page 91: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/91.jpg)
Hashing 91
NOW A SEARCH FOR 460 WOULD BE
UNSUCCESSFUL BECAUSE 460
INITIALLY HASHES TO 54, AN
UNOCCUPIED LOCATION.
![Page 92: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/92.jpg)
Hashing 92
SOLUTION: bool marked_for_removal;
THE CONSTRUCTOR FOR value_type SETS EACH bucket’s marked_for_removal FIELD TO false.
insert SETS marked_for_removal TO false; erase SETS marked_for_removal TO true.
SO AFTER THE INSERTIONS:
![Page 93: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/93.jpg)
Hashing 93
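| index | key | occupied | marked_for_removal | note |
|---|---|---|---|---|
| 0 | ? | false | false | |
| … | … | false | false | |
| 54 | 1069 | true | false | 1069 % 203 = 54 |
| 55 | 460 | true | false | 460 % 203 = 54 |
| 56 | 1070 | true | false | 1070 % 203 = 55 |
| 109 | 312 | true | false | 312 % 203 = 109 |
| 201 | 607 | true | false | 607 % 203 = 201 |
| 202 | | false | false | |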
![Page 94: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/94.jpg)
Hashing 94
AFTER DELETING THE VALUE WITH
KEY 1069:
![Page 95: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/95.jpg)
Hashing 95
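| index | key | occupied | marked_for_removal | note |
|---|---|---|---|---|
| 0 | ? | false | false | |
| … | … | false | false | |
| 54 | 1069 | true | true | 1069 % 203 = 54 |
| 55 | 460 | true | false | 460 % 203 = 54 |
| 56 | 1070 | true | false | 1070 % 203 = 55 |
| 109 | 312 | true | false | 312 % 203 = 109 |
| 201 | 607 | true | false | 607 % 203 = 201 |
| 202 | | false | false | |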
![Page 96: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/96.jpg)
Hashing 96
FOR find, AN UNSUCCESSFUL
SEARCH CANNOT STOP UNTIL
buckets[index].marked_for_removal == false.
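A minimal sketch of the marked_for_removal idea, assuming int keys, the key itself as the hash value, and linear probing. `Bucket` and `ProbeSet` are hypothetical names; only the field names come from the slides. In this sketch erase also clears occupied, which is one possible reading of the slides rather than the definitive design.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a bucket; the field names follow the slides, the rest is assumed.
struct Bucket {
    int key = 0;
    bool occupied = false;
    bool marked_for_removal = false;
};

class ProbeSet {
public:
    explicit ProbeSet(std::size_t length) : buckets(length) {
        assert(length > 0);
    }

    // Returns the index of key, or -1 if absent.
    int find(int key) const {
        std::size_t index = home(key);
        for (std::size_t probes = 0; probes < buckets.size(); ++probes) {
            const Bucket& b = buckets[index];
            // A never-used bucket ends the search; a removed one does not.
            if (!b.occupied && !b.marked_for_removal) return -1;
            if (b.occupied && b.key == key) return static_cast<int>(index);
            index = (index + 1) % buckets.size();
        }
        return -1;
    }

    // Assumes key is not already present; returns false if no bucket is free.
    bool insert(int key) {
        std::size_t index = home(key);
        for (std::size_t probes = 0; probes < buckets.size(); ++probes) {
            Bucket& b = buckets[index];
            if (!b.occupied) {  // never used, or previously removed
                b.key = key;
                b.occupied = true;
                b.marked_for_removal = false;
                return true;
            }
            index = (index + 1) % buckets.size();
        }
        return false;
    }

    bool erase(int key) {
        const int index = find(key);
        if (index < 0) return false;
        buckets[index].occupied = false;
        buckets[index].marked_for_removal = true;  // keeps the probe path intact
        return true;
    }

private:
    std::size_t home(int key) const {
        return static_cast<std::size_t>(key) % buckets.size();
    }
    std::vector<Bucket> buckets;
};
```

With a table of length 203, inserting 1069, 460 and 1070 and then erasing 1069 leaves 460 reachable at index 55, because the search steps over the marked bucket at index 54 instead of stopping there.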
![Page 97: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/97.jpg)
Hashing 97
CLUSTER: A SEQUENCE OF NON-EMPTY LOCATIONS
KEYS THAT HASH TO 54 FOLLOW THE SAME COLLISION-PATH AS KEYS THAT HASH TO 55, …
![Page 98: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/98.jpg)
Hashing 98
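| index | key | occupied | marked_for_removal | note |
|---|---|---|---|---|
| 0 | ? | false | false | |
| … | … | false | false | |
| 54 | 1069 | true | false | 1069 % 203 = 54 |
| 55 | 460 | true | false | 460 % 203 = 54 |
| 56 | 1070 | true | false | 1070 % 203 = 55 |
| 109 | 312 | true | false | 312 % 203 = 109 |
| 201 | 607 | true | false | 607 % 203 = 201 |
| 202 | | false | false | |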
![Page 99: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/99.jpg)
Hashing 99
PRIMARY CLUSTERING: THE
PHENOMENON THAT OCCURS WHEN
THE COLLISION HANDLER ALLOWS
THE GROWTH OF CLUSTERS TO
ACCUMULATE.
THIS WILL OCCUR WITH OFFSET-OF-1 OR ANY CONSTANT OFFSET.
![Page 100: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/100.jpg)
Hashing 100
SOLUTION 1: DOUBLE HASHING, THAT IS, OBTAIN BOTH INDICES AND OFFSETS BY HASHING:
unsigned long hash_int = hash (key);
int index = hash_int % length,
    offset = hash_int / length;
NOW THE OFFSET DEPENDS ON THE KEY, SO DIFFERENT KEYS WILL USUALLY HAVE DIFFERENT OFFSETS, SO NO MORE PRIMARY CLUSTERING!
Secondary hash function: offset = hash_int / length
![Page 101: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/101.jpg)
Hashing 101
TO GET A NEW INDEX:
index = (index + offset) % length;
Notice that if a collision occurs, you rehash from the NEW index value.
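As a rough illustration, the probe sequence produced by this rule can be listed for a given key. The sketch assumes hash(key) is the key itself, as in the integer examples on these slides; `probe_sequence` and `max_probes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: hash(key) is the key itself.
// Returns the first max_probes indices visited for key in a table of the given length.
std::vector<std::size_t> probe_sequence(unsigned long key, std::size_t length,
                                        std::size_t max_probes) {
    assert(length > 0);
    const unsigned long hash_int = key;
    std::size_t index = hash_int % length;
    const std::size_t offset = hash_int / length;
    std::vector<std::size_t> visited;
    for (std::size_t i = 0; i < max_probes; ++i) {
        visited.push_back(index);
        index = (index + offset) % length;  // rehash from the NEW index value
    }
    return visited;
}
```

For length 11, key 47 has index 3 and offset 4, so its first three probes are 3, 7 and 0. Note that any key smaller than length gets offset 0 here, so the sequence never moves; that case needs separate handling.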
![Page 102: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/102.jpg)
Hashing 102
EXAMPLE: length = 11

| key | index | offset |
|---|---|---|
| 15 | 4 | 1 |
| 19 | 8 | 1 |
| 16 | 5 | 1 |
| 58 | 3 | 5 |
| 27 | 5 | 2 |
| 35 | 2 | 3 |
| 30 | 8 | 2 |
| 47 | 3 | 4 |

WHERE WOULD THESE KEYS GO IN buckets? (INDICES 0 THROUGH 10)
![Page 103: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/103.jpg)
Hashing 103
| index | key |
|---|---|
| 0 | 47 |
| 1 | |
| 2 | 35 |
| 3 | 58 |
| 4 | 15 |
| 5 | 16 |
| 6 | |
| 7 | 27 |
| 8 | 19 |
| 9 | |
| 10 | 30 |
![Page 104: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/104.jpg)
Hashing 104
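PROBLEM: WHAT IF OFFSET IS A MULTIPLE OF length?
EXAMPLE: length = 11

| key | index | offset |
|---|---|---|
| 15 | 4 | 1 |
| 19 | 8 | 1 |
| 16 | 5 | 1 |
| 58 | 3 | 5 |
| 27 | 5 | 2 |
| 35 | 2 | 3 |
| 47 | 3 | 4 |
| 246 | 4 | 22 |

// BUT 15 IS AT INDEX 4
// FOR KEY 246, NEW INDEX = (4 + 22) % 11 = 4. OOPS!

| index | key |
|---|---|
| 0 | 47 |
| 1 | |
| 2 | 35 |
| 3 | 58 |
| 4 | 15 |
| 5 | 16 |
| 6 | |
| 7 | 27 |
| 8 | 19 |
| 9 | |
| 10 | |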
![Page 105: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/105.jpg)
Hashing 105
SOLUTION:
if (offset % length == 0)
offset = 1;
ON AVERAGE, offset % length WILL
EQUAL 0 ONLY ONCE IN EVERY
length TIMES.
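The guard is small enough to wrap in a helper. A minimal sketch, where `safe_offset` is a hypothetical name and hash_int stands for the value returned by the hash function:

```cpp
#include <cassert>
#include <cstddef>

// Returns a step size that is never a multiple of length, following the rule above.
std::size_t safe_offset(unsigned long hash_int, std::size_t length) {
    assert(length > 0);
    std::size_t offset = hash_int / length;
    if (offset % length == 0)
        offset = 1;  // a multiple of length (including 0) would never leave the home index
    return offset;
}
```

For length 11, key 246 would otherwise get offset 22 and return to index 4 on every probe; with the guard it steps by 1 instead. The same check also covers keys smaller than length, whose raw offset is 0.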
![Page 106: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/106.jpg)
Hashing 106
FINAL PROBLEM: WHAT IF length HAS SEVERAL FACTORS?
EXAMPLE: length = 20

| key | index | offset |
|---|---|---|
| 20 | 0 | 1 |
| 25 | 5 | 1 |
| 30 | 10 | 1 |
| 35 | 15 | 1 |
| 110 | 10 | 5 |

// BUT 30 IS AT INDEX 10
FOR KEY 110, NEW INDEX = (10 + 5) % 20 = 15, WHICH IS OCCUPIED, SO NEW INDEX = (15 + 5) % 20 = 0, WHICH IS OCCUPIED, SO NEW INDEX = ...
![Page 107: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/107.jpg)
Hashing 107
SOLUTION: MAKE length A PRIME.
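One way to see why a prime length helps is to count how many distinct cells a probe sequence can reach. The sketch below (the function name is hypothetical) walks index = (index + offset) % length until it revisits a cell.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many distinct indices the sequence
// index = (index + offset) % length visits before it repeats.
std::size_t distinct_slots_probed(std::size_t start, std::size_t offset,
                                  std::size_t length) {
    assert(length > 0 && start < length);
    std::vector<bool> seen(length, false);
    std::size_t index = start;
    std::size_t count = 0;
    while (!seen[index]) {
        seen[index] = true;
        ++count;
        index = (index + offset) % length;
    }
    return count;
}
```

With length 20 and offset 5 the sequence from index 10 reaches only 4 cells (10, 15, 0, 5), all of them occupied in the example. In general the count is length divided by the greatest common divisor of offset and length, so with a prime length any offset that is not a multiple of length reaches every cell.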
![Page 108: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/108.jpg)
Hashing 108
Example of Double Hashing
Consider a hash table storing integer keys that handles collisions with double hashing
N = 13
h(k) = k mod 13
d(k) = 7 - k mod 7
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

| k | h(k) | d(k) | Probes |
|---|---|---|---|
| 18 | 5 | 3 | 5 |
| 41 | 2 | 1 | 2 |
| 22 | 9 | 6 | 9 |
| 44 | 5 | 5 | 5, 10 |
| 59 | 7 | 4 | 7 |
| 32 | 6 | 3 | 6 |
| 31 | 5 | 4 | 5, 9, 0 |
| 73 | 8 | 4 | 8 |

Resulting table (index: key): 0: 31, 2: 41, 5: 18, 6: 32, 7: 59, 8: 73, 9: 22, 10: 44; indices 1, 3, 4, 11 and 12 stay empty
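The example can be replayed with a short sketch. It hard-codes N = 13, h(k) = k mod 13 and d(k) = 7 - k mod 7 from the slide; the sentinel `FREE` and the function name are assumptions made for the illustration, and keys are taken to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int FREE = -1;  // hypothetical marker for an empty cell; keys assumed non-negative

// Inserts key using h(k) = k mod 13 and d(k) = 7 - k mod 7.
// Returns the cells probed in order (the last one is where the key is stored),
// or an empty vector if no free cell was found.
std::vector<std::size_t> double_hash_insert(std::vector<int>& table, int key) {
    assert(table.size() == 13 && key >= 0);
    std::size_t index = static_cast<std::size_t>(key % 13);
    const std::size_t step = static_cast<std::size_t>(7 - key % 7);  // between 1 and 7, never 0
    std::vector<std::size_t> probes;
    for (std::size_t i = 0; i < table.size(); ++i) {
        probes.push_back(index);
        if (table[index] == FREE) {
            table[index] = key;
            return probes;
        }
        index = (index + step) % 13;
    }
    return std::vector<std::size_t>();
}
```

Because d(k) is always between 1 and 7 and 13 is prime, the step is never a multiple of the table size, so the sequence can reach every cell.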
![Page 109: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/109.jpg)
Hashing 109
THIS VERSION OF OPEN-ADDRESS
HASHING IS FAST. IF THE UNIFORM
HASHING ASSUMPTION HOLDS,
averageTime(n, m) FOR SEARCHING,
INSERTING AND REMOVING IS
CONSTANT O(1).
![Page 110: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/110.jpg)
Hashing 110
ANOTHER SOLUTION: QUADRATIC HASHING, THAT IS, ONCE COLLISION OCCURS AT h, GO TO LOCATION h + 1, THEN IF COLLISION OCCURS THERE GO TO LOCATION h + 4, then h + 9, then h + 16, etc.
unsigned long hash_int = hash (key);
int index = hash_int % length,
    offset = i * i; // i = 1, 2, 3, ... IS THE NUMBER OF COLLISIONS SO FAR
Notice that h stays at the same location. No primary clustering, although keys that hash to the same h still follow the same probe sequence.
![Page 111: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/111.jpg)
Hashing 111
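QUADRATIC REHASHING
EXAMPLE: length = 11

| key | index | offset | final place index |
|---|---|---|---|
| 15 | 4 | | 4 |
| 19 | 8 | | 8 |
| 16 | 5 | | 5 |
| 58 | 3 | | 3 |
| 27 | 5 | 1 | 6 |
| 35 | 2 | | 2 |
| 30 | 8 | 1 | 9 |
| 47 | 3 | 4 | 7 |

buckets after inserting 15, 19, 16 and 58 (index: key): 3: 58, 4: 15, 5: 16, 8: 19; all other indices are empty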
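The quadratic example can be checked with a short sketch. It assumes non-negative int keys, the key itself as the hash value, and a hypothetical sentinel `VACANT`; the probe limit of length attempts is also an assumption, since the slides do not say when to give up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int VACANT = -1;  // hypothetical empty marker; keys assumed non-negative

// Inserts key, probing h, h + 1, h + 4, h + 9, ... (all modulo the table length),
// where h = key % length. Returns the final index, or -1 if no free cell
// turned up within length probes.
int quadratic_insert(std::vector<int>& buckets, int key) {
    assert(!buckets.empty() && key >= 0);
    const std::size_t length = buckets.size();
    const std::size_t h = static_cast<std::size_t>(key) % length;
    for (std::size_t i = 0; i < length; ++i) {
        const std::size_t index = (h + i * i) % length;  // offset i*i from the home index
        if (buckets[index] == VACANT) {
            buckets[index] = key;
            return static_cast<int>(index);
        }
    }
    return -1;
}
```

Unlike linear probing, this sequence does not necessarily visit every cell, so an insertion can fail even when free cells remain. Keeping the length prime and the table less than half full is the usual way to rule that out.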
![Page 112: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/112.jpg)
Hashing 112
Performance
HOW DOES DOUBLE-HASHING COMPARE WITH CHAINED HASHING?
![Page 113: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/113.jpg)
Hashing 113
Performance of Hash Tables
Load factor = # filled cells / table size
Between 0 and 1
Load factor has greatest effect on performance
Lower load factor, better performance
Reduce collisions in sparsely populated tables
Knuth gives expected # probes p for open addressing, linear probing, load factor L: p = ½(1 + 1/(1-L))
As L approaches 1, this zooms up
For chaining, p = 1 + (L/2)
Note: Here L can be greater than 1!
![Page 114: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/114.jpg)
Hashing 114
Performance of Hash Tables (2)
Number of Probes

| L | Linear Probing | Chaining |
|---|---|---|
| 0 | 1.00 | 1.00 |
| 0.25 | 1.17 | 1.13 |
| 0.5 | 1.50 | 1.25 |
| 0.75 | 2.50 | 1.38 |
| 0.85 | 3.83 | 1.43 |
| 0.9 | 5.50 | 1.45 |
| 0.95 | 10.50 | 1.48 |
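The two formulas from the previous slide can be evaluated directly, which makes it easy to try other load factors. The function names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Expected probes for open addressing with linear probing: p = 1/2 (1 + 1/(1 - L)).
// Only defined for 0 <= load < 1.
double expected_probes_linear(double load) {
    assert(load >= 0.0 && load < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - load));
}

// Expected probes for chaining: p = 1 + L/2. Here load may exceed 1.
double expected_probes_chaining(double load) {
    assert(load >= 0.0);
    return 1.0 + load / 2.0;
}
```

At a load factor of 0.75 linear probing averages 2.5 probes against about 1.38 for chaining, and at a load factor of 2 chaining still needs only 2 probes on average.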
![Page 115: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/115.jpg)
Hashing 115
Performance of Hash Tables (3)
Hash table:
Insert: average O(1)
Search: average O(1)
Sorted array:
Insert: average O(n)
Search: average O(log n)
Binary Search Tree:
Insert: average O(log n)
Search: average O(log n)
But balanced trees can guarantee O(log n)
![Page 116: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/116.jpg)
Hashing 116
We know that hashing becomes inefficient as the table fills up. What to do?
EXPAND!
![Page 117: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/117.jpg)
Hashing 117
WHAT ABOUT THE SIZE OF buckets,
AND SHOULD THAT ARRAY EVER BE
RE-SIZED? RE-SIZE WHENEVER THE LOAD FACTOR, THE RATIO OF count TO length, EXCEEDS 0.75.
![Page 118: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/118.jpg)
Hashing 118
TO RE-SIZE, WE WILL DOUBLE THE OLD CAPACITY, PLUS 1. WHY +1?
ANOTHER OPTION: FIND THE NEXT PRIME NUMBER AFTER DOUBLING.
NOTE THAT WE RE-SIZE WHENEVER THE LOAD FACTOR, THAT IS, THE AVERAGE LIST SIZE, EXCEEDS 0.75.
![Page 119: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/119.jpg)
Hashing 119
IN check_for_expansion, IF count >=
length * 0.75, CREATE A NEW ARRAY
OF DOUBLE THE OLD LENGTH (PLUS
1). ITERATE THROUGH THE OLD ARRAY
AND HASH EACH VALUE INTO
THE NEW ARRAY. FINALLY, ERASE
THE OLD ARRAY.
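A minimal sketch of this expansion step, assuming non-negative int keys, the key itself as the hash value, and linear probing for placement. `Slot`, `Table`, `place` and `insert` are hypothetical names; only check_for_expansion, count and the 0.75 threshold come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Slot {  // hypothetical bucket type for this sketch
    int key = 0;
    bool occupied = false;
};

struct Table {
    std::vector<Slot> buckets;
    std::size_t count = 0;
    explicit Table(std::size_t length) : buckets(length) { assert(length > 0); }
};

// Places key by linear probing; assumes at least one free slot.
void place(std::vector<Slot>& buckets, int key) {
    std::size_t index = static_cast<std::size_t>(key) % buckets.size();
    while (buckets[index].occupied)
        index = (index + 1) % buckets.size();
    buckets[index].key = key;
    buckets[index].occupied = true;
}

// Grows to 2 * length + 1 once count reaches 75% of length, rehashing every key.
void check_for_expansion(Table& t) {
    if (t.count < t.buckets.size() * 0.75) return;
    std::vector<Slot> bigger(2 * t.buckets.size() + 1);
    for (const Slot& s : t.buckets)
        if (s.occupied) place(bigger, s.key);  // the index depends on the new length
    t.buckets.swap(bigger);  // the old array is released when bigger goes out of scope
}

void insert(Table& t, int key) {
    check_for_expansion(t);
    place(t.buckets, key);
    ++t.count;
}
```

Every key must be rehashed rather than copied to the same position, because key % length changes when length does. Starting from length 11, the tenth insertion triggers the growth to 23.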
![Page 120: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/120.jpg)
Hashing 120
GROUP EXERCISE: ASSUME THAT length = 13. INSERT THE FOLLOWING KEYS INTO A HASH TABLE USING 1) OPEN ADDRESS, 2) DOUBLE HASHING, and 3) CHAINING:
20, 33, 49, 22, 26, 140, 38, 9, 7, 3, 0, 1
![Page 121: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/121.jpg)
Hashing 121
Summary Slide 1
- Hash Table
  - simulates the fastest searching technique: knowing the index of the required value in a vector or array and applying the index to access the value. It does so by applying a hash function that converts the data to an integer.
  - After obtaining an index by dividing the value from the hash function by the table size and taking the remainder, access the table. Normally, the number of elements in the table is much smaller than the number of distinct data values, so collisions occur.
  - To handle collisions, we must place a value that collides with an existing table element into the table in such a way that we can efficiently access it later.
![Page 122: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/122.jpg)
Hashing 122
Summary Slide 2
- Hash Table (Cont…)
  - average running time for a search of a hash table is O(1)
  - the worst case is O(n)
![Page 123: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/123.jpg)
Hashing 123
Summary Slide 3
- Collision Resolution
  - Types:
    1) linear open probe addressing
      - the table is a vector or array of static size
      - After using the hash function to compute a table index, look up the entry in the table.
      - If the values match, perform an update if necessary.
      - If the table entry is empty, insert the value in the table.
![Page 124: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/124.jpg)
Hashing 124
Summary Slide 4
- Collision Resolution (Cont…)
  - Types:
    1) linear open probe addressing
      - Otherwise, probe forward circularly, looking for a match or an empty table slot.
      - If the probe returns to the original starting point, the table is full.
      - A search may examine table items that hashed to different table locations.
      - Deleting an item is difficult.
![Page 125: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/125.jpg)
Hashing 125
Summary Slide 5
- Collision Resolution (Cont…)
    2) chaining with separate lists
      - the hash table is a vector of list objects
      - Each list is a sequence of colliding items.
      - After applying the hash function to compute the table index, search the list for the data value.
      - If it is found, update its value; otherwise, insert the value at the back of the list.
      - you search only items that collided at the same table location
![Page 126: Hash Tables Briana B. Morrison Adapted from William Collins](https://reader034.vdocuments.us/reader034/viewer/2022051819/5519b19855034660578b45f9/html5/thumbnails/126.jpg)
Hashing 126
Summary Slide 6
- Collision Resolution (Cont…)
  - there is no limitation on the number of values in the table, and deleting an item from the table involves only erasing it from its corresponding list
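The chaining summary can be illustrated with a small sketch: a vector of lists in which each operation touches only the list at the key's own index. `ChainedSet` and its members are hypothetical names, and the sketch assumes non-negative int keys with the key itself as the hash value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Sketch of a chained hash set of non-negative ints; the class name is hypothetical.
class ChainedSet {
public:
    explicit ChainedSet(std::size_t length) : table(length) { assert(length > 0); }

    // Inserts key at the back of its list unless it is already there.
    bool insert(int key) {
        std::list<int>& chain = table[index_of(key)];
        if (std::find(chain.begin(), chain.end(), key) != chain.end()) return false;
        chain.push_back(key);
        return true;
    }

    // Searches only the items that collided at the same table location.
    bool contains(int key) const {
        const std::list<int>& chain = table[index_of(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Deleting only erases the key from its corresponding list.
    bool erase(int key) {
        std::list<int>& chain = table[index_of(key)];
        std::list<int>::iterator it = std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) return false;
        chain.erase(it);
        return true;
    }

private:
    std::size_t index_of(int key) const {
        return static_cast<std::size_t>(key) % table.size();
    }
    std::vector<std::list<int> > table;
};
```

With length 13, keys 20 and 33 both land in list 7; erasing 20 leaves 33 findable without any marker such as marked_for_removal, which is why deletion is simpler here than with open addressing.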