hashing chapters 19-20. 2 what is hashing? a technique that determines an index or location for...

43
Hashing Chapters 19-20

Upload: annabella-armstrong

Post on 17-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

Hashing

Chapters 19-20

Page 2: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

2

What is Hashing?

A technique that determines an index or location for storage of an item in a data structureThe hash function receives the search key• Returns the index of an element in an array

called the hash table• The index is known as the hash index

A perfect hash function maps each search key into a different integer suitable as an index to the hash table

Page 3: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

3

What is Hashing?

A hash function indexes its hash table.

Page 4: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

4

What is Hashing?

Two steps of the hash function• Convert the search key into an integer called

the hash code• Compress the hash code into the range of

indices for the hash table

Typical hash functions are not perfect• They can allow more than one search key to

map into a single index• This is known as a collision

Page 5: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

5

What is Hashing?

A collision caused by the hash function h

(for a table of size 101)

h(555–1163)

Page 6: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

6

Hash Functions

General characteristics of a good hash function• Minimize collisions• Distribute entries uniformly throughout

the hash table• Be fast to compute

Page 7: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

7

Computing Hash Codes

We will override the hashCode method of Object

Guidelines• If a class overrides the method equals, it

should override hashCode• If the method equals considers two objects

equal, hashCode must return the same value for both objects

• If an object invokes hashCode more than once during execution of program on the same data, it must return the same hash code

Page 8: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

8

Computing Hash CodesHash code for a primitive type

• Use the primitive typed key itself• Can cast types byte, short, or char to int

• Manipulate internal binary representations (e.g. use folding)

• e.g. for long, casting would lose 1st 32 bits• but, could divide into two 32-bit halves (by shifting),

then add or XOR (^)the results• e.g. int hashCode = (int)(key^(key>>32))

• For a search key of type double:long bits = Double.doubleToLongBits(key);int hashCode = (int)(bits^(bits>>32))

Page 9: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

9

Computing Hash Codes

The hash code for a string, sint hash = 0;int n = s.length();for (int i = 0; i < n; i++)

hash = g * hash + s.charAt(i); // g is a positive constant

For a string s with n characters having Unicode value ui for the ith character (e.g., u0 u1 u2 … un-1) and positive constant g (e.g., 31 in Java’s String class), the hash code could be:

u0gn-1 + u1gn-2 + … + un-2g + un-1

or(…((u0g+u1)g+u2)g+…+un-2)g+un-1 (Horner’s method)

Note: hash could be negative due to overflow

e.g., public int hashCode() in Java class String

Page 10: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

10

Page 11: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

11

Page 12: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

12

Compressing a Hash CodeMust compress the hash code “c” so it fits into the index range

Typical method is to compute c modulo n• n is a prime number (the size of the table)• Index will then be between 0 and n – 1

private int getHashIndex(Object key) {int hashIndex = key.hashCode() % hashTable.length;if (hashIndex < 0)

hashIndex = hashIndex + hashTable.length;return hashIndex;

} // end getHashIndex

Note: if c is non-negative, 0 <= c%n <= n-1if c is negative, -(n-1) <= c%n <= -1(if c is negative, add n to c%n so 1 <= result <= n-1 )

Page 13: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

13

Resolving Collisions

Options when hash functions returns location already used in the table• Use another location in the table

(“open addressing”)

• Change the structure of the hash table so that each array location can represent multiple values

Page 14: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

14

Open Addressing with Linear Probing

Open addressing scheme locates alternate location• New location must be open, available

Linear probing• If collision occurs at hashTable[k], look

successively at location k + 1, k + 2, …

Page 15: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

15

Open Addressing with Linear Probing

The effect of linear probing after adding four entries whose search keys hash to the same index.

h(555–1163)

Page 16: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

16

Open Addressing with Linear Probing

A revision of the hash table in the previous figure when linear probing resolves collisions; each entry contains a

search key and its associated value

h(555–1163)

555–1163

Page 17: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

17

Removals

A hash table if remove used null to remove entries.

555–1163

h(555–1163)

Page 18: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

18

Removals

We need to distinguish among three kinds of locations in the hash table

1. Occupied• The location references an entry in the dictionary

2. Empty• The location contains null and always did

3. Available• The location's entry was removed from the

dictionary

Page 19: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

19

Open Addressing with Linear Probing

A linear probe sequence (a) after adding an entry; (b) after removing two entries;

Page 20: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

20

Open Addressing with Linear Probing

A linear probe sequence (c) after a search; (d) during the search while adding an entry; (e) after an addition to a

formerly occupied location.

Page 21: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

21

Searches that Dictionary Operations Require

To retrieve an entry• Search the probe sequence for the key• Examine entries that are present, ignore locations in

available state• Stop search when key is found or null reached

To remove an entry• Search the probe sequence same as for retrieval• If key is found, mark location as available

To add an entry• Search probe sequence same as for retrieval• Note first available slot• Use available slot if the key is not found

Page 22: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

22

Open Addressing, Quadratic Probing

Change the probe sequence• Given search key k• Probe to k + 1, k + 22, k + 32, … k + n2

Can reach any location in the hash table if table size is a prime number and if hash table is at most half full

For avoiding primary clustering• But can lead to secondary clustering

Page 23: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

23

Open Addressing, Quadratic Probing

A probe sequence of length 5 using quadratic probing.

Note: for hash index k and table size n, we can improve efficiency by using the recurrence relation

ki+1 = (ki + 2i + 1) modulo n

for i>=0 and k0=k.

Page 24: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

24

Open Addressing with Double Hashing

Resolves collision by examining locations• At original hash index • Plus an increment determined by 2nd function

Second hash function• Different from first• Depends on search key• Returns nonzero value

Reaches every location in hash table if table size is prime

Avoids both primary and secondary clustering

Page 25: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

25

Open Addressing with Double Hashing

The first three locations in a probe sequence generated by double hashing for the search key. Note: sum of the two hash functions must be computed modulo table size.

e.g. h1(key) = key modulo 7 and h2(key) = 5 – key modulo 5

Page 26: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

26

Separate Chaining

Alter the structure of the hash tableEach location can represent multiple values• Each location called a bucket

Bucket can be a(n) • List• Sorted list• Chain of linked nodes• Array• Vector

Page 27: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

27

Separate Chaining

A hash table for use with separate chaining; each bucket is a chain of linked nodes.

Page 28: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

28

Separate Chaining

Where new entry is inserted into linked bucket when integer search keys are (a) duplicate and unsorted;

Page 29: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

29

Separate Chaining

Where new entry is inserted into linked bucket when integer search keys are (b) distinct and unsorted;

Page 30: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

30

Separate Chaining

Where new entry is inserted into linked bucket when integer search keys are (c) distinct and sorted

Page 31: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

31

//Algorithm add(key, value)index = getHashIndex(key)if (hashTable[index] = = null) {

hashTable[index] = new Node(key, value)currentSize++

}else {

Search chain that begins at hashTable[index] for a node that contains keyif (key is found) { // assume currentNode references the node that contains key

oldValue = currentNode.getValue()currentNode.setValue(value)return oldValue

}else { // add new node to end of chain

// assume nodeBefore references the last nodenewNode = new Node(key, value)nodeBefore.setNextNode(newNode)currentSize++

}}

Pseudo-code for Chaining Algorithms(for distinct search keys and unsorted chains)

Page 32: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

32

//Algorithm remove(key)index = getHashIndex(key)Search chain that begins at hashTable[index] for node that contains keyif (key is found) {

Remove node that contains key from chaincurrentSize--return value in removed node

}else

return null

//Algorithm getValue(key)index = getHashIndex(key)Search chain that begins at hashTable[index] for node that contains keyif (key is found)

return value in found nodeelse

return null

Pseudo-code for Chaining Algorithms (continued)

Page 33: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

33

Efficiency Observations

Successful retrieval or removal • Same efficiency as successful search

Unsuccessful retrieval or removal• Same efficiency as unsuccessful search

Successful addition• Same efficiency as unsuccessful search

Unsuccessful addition• Same efficiency as successful search

Page 34: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

34

Load FactorPerfect hash function not always possible or practical• Thus, collisions likely to occur

As hash table fills• Collisions occur more often

Measure for table fullness, the load factor*

Note: max value for load factor depends on type of collision resolution used; for separate chaining, there is no maximum value…

*for open addressing

Page 35: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

35

Cost of Open Addressing

The average number of comparisons required by a search of the hash table for given values of the load factor when using linear probing.

½[1 + 1/(1-λ)2] for an unsuccessful search

½[1 + 1/(1-λ)] for a successful search

(Linear Probing)

Page 36: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

36

Cost of Open Addressing

The average number of comparisons required by a search of the hash table for given values of the load factor when using either quadratic probing or double hashing.

Note: for quadratic probing or double hashing, should

have < 0.5

Note: for quadratic probing or double hashing, should

have < 0.5

1/(1-λ) for an unsuccessful search

(1/λ)log[1/(1-λ)] for a successful search

(Quadratic Probing or Double Hashing)

Page 37: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

37

Cost of Separate Chaining

Average number of comparisons required by search of hash table for given values of load factor when using separate chaining. Note that the load factor here is the # of dictionary entries / # of chains (i.e., the load factor is the average # of dictionary entries per chain).

Note: Reasonable efficiency requires

only < 1

Note: Reasonable efficiency requires

only < 1

Page 38: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

38

Rehashing

When load factor becomes too large• Expand the hash table

Double present size, increase result to next prime number

Place current entries into new hash table in locations (i.e., recompute the index for each entry for the new table size)

Page 39: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

39

Comparing Schemes for Collision Resolution

Average number of comparisons required

by search of hash table versus for 4

techniques when search is

(a) successful; (b) unsuccessful.

Page 40: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

40

A Dictionary Implementation That Uses Hashing

A hash table and one of its entry objects

Page 41: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

41

Beginning of private class TableEntry• Made internal to dictionary class

A Dictionary Implementation That Uses Hashing

private class TableEntry implements java.io.Serializable{ private Object entryKey;

private Object entryValue;private boolean inTable; // true if entry is in hash tableprivate TableEntry(Object key, Object value){ entryKey = key;

entryValue = value;inTable = true;

} // end constructor

. . .

Page 42: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

42

A Dictionary Implementation That Uses Hashing

A hash table containing dictionary entries, removed entries, and null values.

Page 43: Hashing Chapters 19-20. 2 What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function

43

Java Class Library: The Class HashMap

Assumes search-key objects belong to a class that overrides methods hashCode and equals

Hash table is collection of buckets

Constructors• public HashMap()• public HashMap (int initialSize)• public HashMap (int initialSize,

float maxLoadFactor)• public HashMap (Map table)