308-203a introduction to computing ii lecture 11: hashtables fall session 2000

Post on 18-Jan-2016


308-203A Introduction to Computing II

Lecture 11: Hashtables

Fall Session 2000

Dictionary

An abstract class which defines data structures that support:

• void put(Object key, Object value)

• Object get(Object key)

• void remove(Object key)

Implementations for Dictionary?

If we use an unsorted linked list:

put O( 1 ), get O( n ), remove O( n )

Naïve solution: must search all possibilities
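The costs above can be seen in a minimal sketch of a list-backed dictionary. The names ListDictionary and Node are hypothetical, not from the lecture, and the sketch assumes that put may prepend without checking for an existing key (which is what keeps it O( 1 )):

```java
// A minimal sketch of a dictionary backed by an unsorted singly linked list.
// ListDictionary and Node are hypothetical names used only for illustration.
class ListDictionary {
    private static class Node {
        final Object key;
        final Object value;
        Node next;

        Node(Object key, Object value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node head; // newest entry first

    // O(1): prepend without looking for an existing entry with the same key.
    void put(Object key, Object value) {
        head = new Node(key, value, head);
    }

    // O(n): walk the list; the newest entry for a key is found first.
    Object get(Object key) {
        for (Node n = head; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    // O(n): unlink every node carrying the key, including older shadowed ones.
    void remove(Object key) {
        while (head != null && head.key.equals(key)) head = head.next;
        for (Node n = head; n != null && n.next != null; ) {
            if (n.next.key.equals(key)) n.next = n.next.next;
            else n = n.next;
        }
    }
}
```

If put had to replace an existing entry instead of shadowing it, it would need the same O( n ) search as get.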

Implementations for Dictionary?

If we use a binary search tree (assume depth = d):

put O( d ), get O( d ), remove O( d )

Good when the tree is balanced (d is about log n), but an unbalanced tree can have d as large as n…

Implementations for Dictionary?

If we use a heap:

put O( log n ), get O( n ), remove O( n )

Insert is easy, but finding arbitrary elements is hard…

Implementations for Dictionary?

If we use a sorted array:

put O( n ), get O( log n ), remove O( n )

Binary search is easy, but lots of copying is needed
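A rough sketch of where the copying comes from, assuming int keys and a fixed capacity for brevity. SortedArrayDictionary is a hypothetical name; Arrays.binarySearch and System.arraycopy are standard library calls:

```java
import java.util.Arrays;

// Sketch of a dictionary over int keys kept in a sorted array.
// SortedArrayDictionary is a hypothetical name; capacity is fixed for brevity.
class SortedArrayDictionary {
    private final int[] keys;
    private final Object[] values;
    private int size = 0;

    SortedArrayDictionary(int capacity) {
        keys = new int[capacity];
        values = new Object[capacity];
    }

    // O(log n): binary search over the filled prefix of the array.
    Object get(int key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        return i >= 0 ? values[i] : null;
    }

    // O(n): the position is found quickly, but later entries must shift right.
    void put(int key, Object value) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0) { values[i] = value; return; }
        if (size == keys.length) throw new IllegalStateException("full");
        int pos = -(i + 1); // insertion point reported by binarySearch
        System.arraycopy(keys, pos, keys, pos + 1, size - pos);
        System.arraycopy(values, pos, values, pos + 1, size - pos);
        keys[pos] = key;
        values[pos] = value;
        size++;
    }

    // O(n): later entries shift left to close the gap.
    void remove(int key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i < 0) return;
        System.arraycopy(keys, i + 1, keys, i, size - i - 1);
        System.arraycopy(values, i + 1, values, i, size - i - 1);
        size--;
        values[size] = null;
    }
}
```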

Implementations for Dictionary?

If we use an array with enough space for every possible key (not realistic):

put O( 1 ), get O( 1 ), remove O( 1 )

All operations are quick and easy, but this requires enormous (i.e. infinite) memory
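For a small, bounded set of keys the idea does work. A sketch, assuming keys are ints in the range 0 to universeSize - 1 (DirectAddressTable is a hypothetical name):

```java
// Sketch of direct addressing: one slot per possible key.
// Only workable when the key universe is small; DirectAddressTable is hypothetical.
class DirectAddressTable {
    private final Object[] slots;

    DirectAddressTable(int universeSize) {
        // Memory grows with the size of the key universe, not with the
        // number of stored elements.
        slots = new Object[universeSize];
    }

    void put(int key, Object value) { slots[key] = value; } // O(1)

    Object get(int key) { return slots[key]; }              // O(1)

    void remove(int key) { slots[key] = null; }             // O(1)
}
```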

Hashtables

We can try to patch this “perfect solution” so that it is feasible.

The “Perfect” Solution

If we had an array that was infinitely large and each key had its own slot, every access would be O( 1 ) [and we would waste a lot of space on null pointers]

[Figure: an array with slots 1, 2, 3, 4, …, j-1, j, j+1, j+2, …; Key = 3 sits in slot 3 and Key = j sits in slot j]

Hash Function

Definition: A hash function is a function which maps keys to a finite range of integers, called hashcodes:

f: keys → [ 0, m-1 ]

Example

Let the keys be non-negative integers: { 0, 1, … }

Let the hash function be f(x) = x mod 7

For the keys (4, 15, 26):

f(4) = 4, f(15) = 1, f(26) = 5


Fits in an array of size 7:

slot:  0   1   2   3   4   5   6
key:   -   15  -   -   4   26  -
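The example can be reproduced with a few lines of Java. This is only a sketch: ModHash and place are hypothetical names, and no collision handling is attempted yet.

```java
// Sketch of the hash function f(x) = x mod 7 and of placing keys in a
// 7-slot array. ModHash and place are hypothetical names.
class ModHash {
    static final int M = 7; // number of slots

    // Maps a key to a hashcode in [0, M-1]; floorMod keeps the result
    // non-negative even if a negative key is passed in.
    static int f(int x) {
        return Math.floorMod(x, M);
    }

    // Stores each key in the slot given by its hashcode (no collision handling:
    // a later key silently overwrites an earlier one in the same slot).
    static Integer[] place(int... keys) {
        Integer[] table = new Integer[M];
        for (int k : keys) table[f(k)] = k;
        return table;
    }
}
```

Calling ModHash.place(4, 15, 26) leaves 15 in slot 1, 4 in slot 4 and 26 in slot 5, with the other slots null.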

Collisions

Problem: When two or more keys hash to the same slot, they collide.

Open-Addressing

• A simple way to handle collisions

• When a collision occurs, look for an empty slot elsewhere

• Some elements may end up in the slot corresponding to a different hashcode

Linear Probing

Find an alternative slot after a collision by stepping sequentially through the slots. For example, with 15, 4 and 26 already stored:

slot:  0   1   2   3   4   5   6
key:   -   15  -   -   4   26  -

Insert 18: f(18) = 18 mod 7 = 4

• Slot 4 holds 4: collision!

• Slot 5 holds 26: slot 5 is also taken

• Slot 6 is free: 18 is stored in slot 6
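The probing sequence can be sketched as follows. LinearProbingTable is a hypothetical name, keys are assumed to be ints hashed by key mod m, and deletion is left out because it needs extra bookkeeping in open addressing:

```java
// Sketch of open addressing with linear probing over int keys.
// LinearProbingTable is a hypothetical name; removal is deliberately omitted.
class LinearProbingTable {
    private final Integer[] slots; // null marks an empty slot

    LinearProbingTable(int m) {
        slots = new Integer[m];
    }

    private int hash(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Inserts the key and returns the slot it ends up in.
    int insert(int key) {
        int i = hash(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null) { slots[i] = key; return i; }
            if (slots[i] == key) return i;  // already present
            i = (i + 1) % slots.length;     // step to the next slot, wrapping
        }
        throw new IllegalStateException("table is full");
    }

    // Follows the same probe sequence until the key or an empty slot is found.
    boolean contains(int key) {
        int i = hash(key);
        for (int probes = 0; probes < slots.length && slots[i] != null; probes++) {
            if (slots[i] == key) return true;
            i = (i + 1) % slots.length;
        }
        return false;
    }
}
```

With m = 7, inserting 15, 4 and 26 and then 18 returns slots 1, 4, 5 and 6, matching the walk-through.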

Disadvantages

• In open-addressing, the table can fill up; we must have n < m

• Linear probing leads to “primary clustering”: a run of filled slots is more likely to receive more collisions

• Although best-case access is O( 1 ), worst-case access is O( m )

Chaining

A (Better) Solution to Collisions:

Use the flexibility of the linked list, but only when needed, i.e. within a single slot where collisions may occur.

Example (chaining)

slot:   0   1   2   3   4   5   6
chain:  -   15  -   -   4   26  -

Insert 39 into the previous hashtable:

f(39) = 39 mod 7 = 4: collision, so 39 joins the chain in slot 4

slot 4:  4 → 39
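A minimal sketch of chaining, assuming int keys hashed by key mod m and using java.util.LinkedList for the chains. ChainedTable and chainLength are hypothetical names, and new keys are appended to the end of a chain, which is one possible choice:

```java
import java.util.LinkedList;

// Sketch of a hashtable that resolves collisions by chaining.
// ChainedTable is a hypothetical name; keys are ints for simplicity.
class ChainedTable {
    private final LinkedList<Integer>[] slots;

    @SuppressWarnings("unchecked")
    ChainedTable(int m) {
        slots = new LinkedList[m];
        for (int i = 0; i < m; i++) slots[i] = new LinkedList<>();
    }

    private int hash(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Appends the key to the chain of its slot unless it is already there.
    void insert(int key) {
        LinkedList<Integer> chain = slots[hash(key)];
        if (!chain.contains(key)) chain.add(key);
    }

    // Only the chain in one slot is searched.
    boolean contains(int key) {
        return slots[hash(key)].contains(key);
    }

    // Integer.valueOf selects remove(Object) rather than remove(int index).
    void remove(int key) {
        slots[hash(key)].remove(Integer.valueOf(key));
    }

    int chainLength(int slot) {
        return slots[slot].size();
    }
}
```

After inserting 15, 4, 26 and 39 into a table with m = 7, slot 4 holds a chain of length 2.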

Worst-Case

If all elements hash to the same entry we get a linked list:

Therefore put, get and remove are O( n ) worst-case.

Best-Case

0 1 2 3 4 5 6

Equal distribution to each slot

Best Case

Definition: The load factor α for a hashtable with n elements hashed into m slots is the average number of elements per slot:

α = n / m

If every slot contains α elements (uniformly distributed hashing):

put, get and remove are O( α )

If the number of slots is allowed to grow in proportion to n, i.e. m = Θ( n ):

α = n / m = n / Θ( n ) = O( 1 )

put, get and remove are O( 1 )

Average-Case

A more realistic analysis involves determining the statistics of the data and how well it will be hashed.

Example: hashing Summer Olympic years by f(x) = x mod 4 would be a bad idea (these years are all multiples of 4, so they always hash to the same slot)
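A quick way to see the effect is to count how many keys land in each slot. The sketch below uses a hypothetical helper slotCounts and a small sample of Summer Olympic years (1988, 1992, 1996, 2000) chosen for illustration:

```java
// Sketch: count how many keys fall into each slot under f(x) = x mod m.
// OlympicHash and slotCounts are hypothetical names.
class OlympicHash {
    static int[] slotCounts(int[] keys, int m) {
        int[] counts = new int[m];
        for (int k : keys) counts[Math.floorMod(k, m)]++;
        return counts;
    }
}
```

With m = 4 all four sample years fall into slot 0, while with m = 7 each of them lands in a different slot.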

Java Hashtable Class

• Constructor:

Hashtable(int initialCapacity, float loadFactor)

• Default: initialCapacity = 101, loadFactor = 0.75f (current Java versions default to initialCapacity = 11)

• Collision resolution with chaining

Java Hashtable Class

Keys can be objects of any class provided the following are appropriately defined:

• hashCode(): defined in java.lang.Object

• equals(): assumed defined for the entries
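As an illustration, here is a sketch of a key class that overrides both methods so that two distinct but equal objects find the same entry. Point and PointDemo are hypothetical names, and the generic type parameters are a later Java feature than the one the lecture would have used:

```java
import java.util.Hashtable;

// Sketch of a user-defined key class. Point is a hypothetical example.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Two points are equal when their coordinates match.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Equal points must produce equal hashcodes.
    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

class PointDemo {
    static String lookup() {
        Hashtable<Point, String> table = new Hashtable<>();
        table.put(new Point(1, 2), "A");
        // A different Point object with the same coordinates finds the entry.
        return table.get(new Point(1, 2));
    }
}
```

Without the two overrides, the inherited versions from java.lang.Object compare object identity, and the second lookup would return null.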

Java Hashtable Class

Hashtables grow multiplicatively:

• put() checks whether the hashtable contains more than (loadFactor × m) elements and, if so, grows the table: m ← 2m + 1

• Hashtables only grow, never shrink, no matter how many elements you delete
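The growth rule can be simulated in a few lines. This is a sketch of the rule as stated on the slide, not of the library's actual source; GrowthSketch and capacityAfter are hypothetical names, and the exact threshold test inside java.util.Hashtable may differ in detail:

```java
// Sketch: simulate table growth m -> 2m + 1 under a load-factor threshold.
// GrowthSketch and capacityAfter are hypothetical names.
class GrowthSketch {
    // Returns the capacity after inserting n elements one at a time,
    // starting from capacity m and growing whenever the element count
    // exceeds loadFactor * m.
    static int capacityAfter(int n, int m, float loadFactor) {
        for (int count = 1; count <= n; count++) {
            while (count > loadFactor * m) {
                m = 2 * m + 1;
            }
        }
        return m;
    }
}
```

Starting from m = 101 with loadFactor = 0.75f, the capacity stays at 101 for 75 elements and becomes 203 when the 76th is added.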

Java Hashtable Class

Other Features:

elements() returns an enumeration of the values in the table.

This works by keeping references into the table rather than by copying the table itself.
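A short usage sketch of elements() with the standard java.util.Enumeration interface. ElementsDemo and sumValues are hypothetical names, and the generic type parameters postdate the lecture:

```java
import java.util.Enumeration;
import java.util.Hashtable;

// Sketch: walk the values of a Hashtable through elements().
// ElementsDemo and sumValues are hypothetical names.
class ElementsDemo {
    static int sumValues(Hashtable<String, Integer> table) {
        int sum = 0;
        // The enumeration reads from the table itself; no copy is made.
        for (Enumeration<Integer> e = table.elements(); e.hasMoreElements(); ) {
            sum += e.nextElement();
        }
        return sum;
    }
}
```

Because the enumeration reads from the live table, it is safest not to modify the table while iterating.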

Any questions?
