Download - Grokking Hash Tables
Grokking Hash Tables
A hash table is… Just a data structure that’s used to establish
a mapping between arbitrary objects Not to be confused with Map:
An interface that specifies methods that a program could call to get/set/query key/value mappings
Basically, defines what mappings are A hash table is one way (but not the only)
to make a Map And your MondoHashTable is just
implementation of that hash table data struct that happens to match the Map interface
Example:
/**
* MondoHashTable is a hash table based
* implementation of the Map interface
*/
import java.util.Map;
public class MondoHashTable implements Map {
…
}
So what’s a mapping?
A mapping is just a pair relationship between two things
In math, we write a mapping m: xy to denote that m establishes relationships between things of type x and things of type y
Big restriction: every unique x must map to a single y
Essentially: every left hand side has one and only one right hand side
Examples of mappings
376.0827 (abs(sqrt(37))) 79609 “Prof Lane’s office phone” 123456789 studentRecord(“J. Student”) -6.0827 36.999 studentRecord(“J. Student”) gradeSet() largeFunkyDataObject() otherObject() “nigeria” 114 “cs35” 12 Left side is the key; right side is the value
Small integers are easy…
37 6.0827
double[] mapArray=new double[200];mapArray[37]=Math.sqrt(37);
What about non-integers?
Desire: have a table that gives quick lookup for arbitrary objects
Doesn’t require vast space Answer: introduce an intermediate step
Hash function turns key into hash code Use hash code to look up key/val pair in
table table[h(key)]=<key,value>
The big picture…
“nigeria”
MondoHashTable
get(“nigeria”)h(“nigeria”)
1945462417
nigeria114
Some practical questions
1.9 billion? Isn’t that a little much?
Some practical questions
1.9 billion? Isn’t that a little much? A: reduce mod table size int h=hashFunction(Object a) % table.length;
Some practical questions
1.9 billion? Isn’t that a little much? A: reduce mod table size int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
Some practical questions
1.9 billion? Isn’t that a little much? A: reduce mod table size int h=hashFunction(Object a) % table.length;
Where do the hash functions come from? A: java.lang.Object.hashCode()
Some practical questions
1.9 billion? Isn’t that a little much? A: reduce mod table size int h=hashFunction(Object a) % table.length;
Where do the hash functions come from? A: java.lang.Object.hashCode()
How big should the table be, initially?
Some practical questions
1.9 billion? Isn’t that a little much? A: reduce mod table size int h=hashFunction(Object a) % table.length;
Where do the hash functions come from? A: java.lang.Object.hashCode()
How big should the table be, initially? Good choice: pick a prime # Ask the user (arg to constructor)
Some practical questions
1.9 billion? Isn’t that a little much? A: reduce mod table size int h=hashFunction(Object a) % table.length;
Where do the hash functions come from? A: java.lang.Object.hashCode()
How big should the table be, initially? Good choice: pick a prime # Ask the user (arg to constructor)
What happens if the table gets too full?
Some practical questions
1.9 billion? Isn’t that a little much? A: reduce mod table size int h=hashFunction(Object a) % table.length;
Where do the hash functions come from? A: java.lang.Object.hashCode()
How big should the table be, initially? Good choice: pick a prime # Ask the user (arg to constructor)
What happens if the table gets too full? A: resize it!
#1 killer question: Collisions
What happens if (a.hashCode()%tSz)==(b.hashCode()%tSz)
Depends… If a.equals(b), then these are the same key If not…
This is a hash collision Basically, you have two different keys pointing
at the same location in the hash table Have to resolve this somehow -- find unique
storage for every key and don’t lose anything
Collision strategy 1: Chaining
Make each cell in the hash table a “bucket” containing multiple key/value pairs
h(“nigeria”)
nigeria114
h(“viagra”)
viagra29
Collision strategy 2: Open addressing
Each cell of the table actually holds a key/value pair
When you have a collision, rehash to find a new location for the new pair Linear probing: try next cell in line Quadratic probing: try cell h+1, then
h+4, h+9, h+16, h+15 … Double hashing:
h(k)=(h1(k)+i*h2(k)) mod table.size()
Repeat probes until you find an empty spot
Map.keySet()
A common operation on hash tables (Maps): get all of the keys You’ll probably use this in the “dump”
functionality of SpamBGon Map requires:
Set keySet(): Returns a set view of the keys contained in this map.
What’s a view?
Views of data
Different interface to the same underlying data
Doesn’t copy the data -- just provides new methods for accessing it
A “set view of the keys”, then, is an object that behaves like (i.e., implements) a set, but gives you access to all of the keys of the hash table.
The set view picture
MondoHashTable
keySet()
Set
size()
iterator()
contains()