Post on 05-Jan-2016


Hashing

• Hashing is another method for storing and searching data.
  – Hashing makes it easier to add and remove elements from a data structure.
  – The worst-case behavior for locating a key is linear: Θ(n).
  – Java’s standard hash table class is java.util.Hashtable.

• Hashing usually implements a data structure called a hash table.
  – A hash table is an effective data structure.
  – A hash table is a generalization of an array.
  – A hash table requires a key to access data.

  – A hash table uses an array whose length is proportional to the number of keys actually stored.
  – The array index is computed from the key, rather than using the key itself as the index.
    • The key is a unique identifying value.

Hashing Functions

• Hashing requires the use of a hashing function.
  – The purpose of the hashing function is to compute the storage slot from the key.
    • Maps key values to array indices.
  – This calculation reduces the range of array indices that need to be handled.

  – If a hashing function groups key values together, this is called clustering of the keys.
    • A good hashing function distributes the key values uniformly through the array’s index range.
    • Any hashing function that results in clustering should be changed.
    • A good hashing function has an equal likelihood of hashing a key into any of the slots.
    • java.util.Hashtable calls the hashCode method of each key to obtain its hash value.

• The division hash function depends upon the remainder of division.
  – Math.abs(H(k)) % table.length
  – When using the division hash function, it is best to have a table size that is a prime number of the form 4n + 3.
  – Using the division hash function can result in many collisions.
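The division hash function above can be sketched in Java as follows. This is only an illustration: the class name DivisionHash and the method name index are hypothetical, and the guard for Integer.MIN_VALUE is an addition, needed because Math.abs returns a negative number for that one value.

```java
// Sketch of the division hash function: Math.abs(H(k)) % table.length.
// DivisionHash and index are hypothetical names used for illustration.
final class DivisionHash {
    // Maps a key to an array index in the range [0, tableLength).
    static int index(Object key, int tableLength) {
        int h = key.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so map it to 0 first.
        if (h == Integer.MIN_VALUE) {
            h = 0;
        }
        return Math.abs(h) % tableLength;
    }
}
```

With a table length of 11 (a prime of the form 4n + 3, with n = 2), the Integer key 25 maps to slot 25 % 11 = 3.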

• The mid-square hash function converts the key to an integer, then squares it (multiplies it by itself). The function returns the middle digits of the result.
• The multiplicative hash function converts the key to an integer and multiplies it by a constant less than one. The function returns the first few digits of the fractional part of the result.
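Both functions can be sketched in a few lines of Java. The names OtherHashes, midSquare, and multiplicative are hypothetical, as are the choices to work with decimal digits and to scale the fractional part by the table length; the slides do not specify these details.

```java
// Sketches of the mid-square and multiplicative hash functions.
// All names and parameter choices here are illustrative assumptions.
final class OtherHashes {
    // Squares the key and returns the middle `digits` decimal digits of the square.
    static int midSquare(int key, int digits) {
        long squared = (long) key * key;      // long avoids int overflow
        String s = Long.toString(squared);
        if (s.length() <= digits) {
            return (int) squared;             // too short to have a "middle"
        }
        int start = (s.length() - digits) / 2;
        return Integer.parseInt(s.substring(start, start + digits));
    }

    // Multiplies the key by a constant less than one and scales the
    // fractional part of the product to an index in [0, tableLength).
    static int multiplicative(int key, double constant, int tableLength) {
        double product = Math.abs((double) key) * constant;
        double fraction = product - Math.floor(product);
        return (int) (fraction * tableLength);
    }
}
```

For example, 4567 squared is 20857489, so midSquare(4567, 2) returns the middle two digits, 57. With a table length of 100, multiplicative(3, 0.25, 100) takes the fractional part of 0.75 and returns 75.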

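Example

[Figure: the actual keys K (k1 to k5), a subset of the universe of keys U, are mapped by the hash function into table slots 0 to m - 1. The slots shown are labeled H(k1), H(k2), H(k3), and H(k4).]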

Collisions

• A collision occurs when the hashing function calculates the same array index for two different objects and one of them is already stored at that index.
  – Two keys hash to the same slot.

Collision Example

[Figure: keys k1 to k5 from the actual keys K, a subset of the universe of keys U, are hashed into table slots 0 to m - 1. Here H(k2) = H(k5), so k2 and k5 collide.]

Open Addressing

• Open addressing ensures that all elements are stored directly into the hash table.
  – Every table slot contains either data or null.
  – The problem is that the table can fill up.
  – The good thing is that there are no external storage locations for the table elements.
  – Open addressing resolves collisions by probing for another slot, using methods such as linear probing and double hashing.

Linear Probing

• Linear probing resolves collisions by placing the data into the next open slot in the table.
  • If the hashed slot is open, the data is stored in that slot.
  • If the slot is not open, the algorithm looks at the next slot (index), wrapping around to the start of the table, until an open slot is found.

  – It is difficult to delete items from a hash table that uses open addressing.
    • A deleted slot cannot simply be set to null, because a later search would stop at that slot and miss elements stored beyond it. Instead, a special Deleted marker is placed into the emptied slot.
  – If H’(k) is the ordinary hash function, the linear probing hash function is:
    • H(k, i) = (H’(k) + i) % m, where i = 0, 1, 2, …, m - 1 is the probe number and m is the number of elements that can be stored in the table.

  – A problem associated with linear probing is called primary clustering.
    • Primary clustering occurs when many items hash into the same slot or neighboring slots, and long runs of slots are filled up.
    • This results in increased search times.
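The linear probing rules above, including the Deleted marker, can be sketched as a small set of keys. This is a minimal sketch under stated assumptions, not the class used in the course: the names LinearProbingTable, DELETED, probe, and indexOf are hypothetical, and the ordinary hash uses a bit mask instead of Math.abs so that it is never negative.

```java
// Minimal open-addressing table with linear probing and a Deleted marker.
// Names are illustrative; the table stores keys only and never resizes.
final class LinearProbingTable {
    private static final Object DELETED = new Object(); // marks a removed slot
    private final Object[] keys;

    LinearProbingTable(int capacity) {
        keys = new Object[capacity];
    }

    // H(k, i) = (H'(k) + i) % m, with H'(k) made non-negative by masking.
    private int probe(Object key, int i) {
        int ordinary = (key.hashCode() & 0x7fffffff) % keys.length;
        return (ordinary + i) % keys.length;
    }

    // Returns the slot holding key, or -1 if the key is not in the table.
    int indexOf(Object key) {
        for (int i = 0; i < keys.length; i++) {
            int slot = probe(key, i);
            if (keys[slot] == null) {
                return -1;                 // never-used slot: key cannot be further on
            }
            if (keys[slot] != DELETED && keys[slot].equals(key)) {
                return slot;
            }
        }
        return -1;
    }

    boolean contains(Object key) {
        return indexOf(key) >= 0;
    }

    // Stores key in the first open (null or Deleted) slot; false if the table is full.
    boolean add(Object key) {
        if (contains(key)) {
            return true;
        }
        for (int i = 0; i < keys.length; i++) {
            int slot = probe(key, i);
            if (keys[slot] == null || keys[slot] == DELETED) {
                keys[slot] = key;
                return true;
            }
        }
        return false;
    }

    // Replaces the key with the Deleted marker so later searches keep probing.
    boolean remove(Object key) {
        int slot = indexOf(key);
        if (slot < 0) {
            return false;
        }
        keys[slot] = DELETED;
        return true;
    }
}
```

In a table of 7 slots, the Integer keys 3, 10, and 17 all hash to slot 3, so they land in slots 3, 4, and 5. After 10 is removed, slot 4 holds the Deleted marker, and a search for 17 still succeeds because it probes past the marker.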

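[Figure: linear probing example. Keys k1 to k5 from the actual keys K, a subset of the universe of keys U, are hashed into table slots 0 to m - 1. H(k2) = H(k5), and a separate slot labeled H(k5) shows where k5 is placed.]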

Double Hashing

• Double hashing is one of the best methods for dealing with collisions.
  – The slot location is first calculated from the hash function H1(k). If that slot is full, a second hash function H2(k) is calculated and combined with the first, giving H(k, i), to determine a new slot.

  – Assume that:
    • H1(k) = Math.abs(H(k)) % table.length
    • H2(k) = 1 + Math.abs(H(k)) % (table.length - x), where x is a small value: 1, 2, or 3.
  – Then:
    • H(k, i) = (H1(k) + i H2(k)) % m, where m = table.length and i = 0, 1, 2, … is the probe number.
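The probe sequence defined by these formulas can be sketched as below. The class and method names are hypothetical, x is fixed at 2 as one of the permitted small values, and a bit mask stands in for Math.abs so that the hash is never negative.

```java
// Sketch of the double hashing probe sequence H(k, i) = (H1(k) + i * H2(k)) % m.
// Names are illustrative; x = 2 is an assumed choice for the second hash.
final class DoubleHashing {
    private static int nonNegative(Object key) {
        return key.hashCode() & 0x7fffffff;
    }

    // H1(k): the starting slot.
    static int h1(Object key, int m) {
        return nonNegative(key) % m;
    }

    // H2(k): the step size, always at least 1.
    static int h2(Object key, int m) {
        return 1 + nonNegative(key) % (m - 2);
    }

    // Slot examined on probe number i (i = 0 is the first attempt).
    static int probe(Object key, int i, int m) {
        // long arithmetic keeps i * H2(k) from overflowing an int
        return (int) ((h1(key, m) + (long) i * h2(key, m)) % m);
    }
}
```

With m = 11, the Integer keys 25 and 14 both start at slot 3, but their step sizes differ (8 and 6), so their second probes go to slots 0 and 9. Keys that collide on the first probe therefore tend to follow different paths, which is what avoids the long runs seen with linear probing.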

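[Figure: double hashing example. Keys k1 to k5 from the actual keys K, a subset of the universe of keys U, are hashed into table slots 0 to m - 1. H(k2) = H(k5), and a separate slot labeled H(k5) shows where k5 is placed.]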

External Chaining

• In external chaining, each component of the hash table’s array can hold more than one element.
  – Essentially, a multiple dimension array or a linked list of elements can exist for each table slot.
• The typical implementation is that each slot contains a linked list.
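The linked-list version described above can be sketched as follows. The names ChainedTable, slots, and size are hypothetical, and the table stores keys only to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Minimal external chaining table: every slot holds a linked list of keys.
// Names are illustrative, and the table never resizes.
final class ChainedTable {
    private final List<List<Object>> slots = new ArrayList<>();
    private int size;

    ChainedTable(int capacity) {
        for (int i = 0; i < capacity; i++) {
            slots.add(new LinkedList<>());
        }
    }

    // Returns the chain for the slot that key hashes to.
    private List<Object> chainFor(Object key) {
        int index = (key.hashCode() & 0x7fffffff) % slots.size();
        return slots.get(index);
    }

    // Appends key to its chain; returns false if it was already present.
    boolean add(Object key) {
        List<Object> chain = chainFor(key);
        if (chain.contains(key)) {
            return false;
        }
        chain.add(key);
        size++;
        return true;
    }

    boolean contains(Object key) {
        return chainFor(key).contains(key);
    }

    // Removes key from its chain; no Deleted marker is needed.
    boolean remove(Object key) {
        if (chainFor(key).remove(key)) {
            size--;
            return true;
        }
        return false;
    }

    int size() {
        return size;
    }
}
```

Because colliding keys simply share a chain, a table with 3 slots can hold the keys 1, 4, and 7 even though all three hash to slot 1, and removal needs no special marker.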

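[Figure: external chaining example. Keys k1 to k5 from the actual keys K, a subset of the universe of keys U, are hashed into table slots 0 to m - 1, with slot labels H(k1), H(k2), H(k3), and H(k4) and a separate entry labeled H(k5).]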

Load Factor

• The load factor is a fraction that represents the number of elements stored in the table divided by the size of the table’s array.
  – α = (the number of elements stored in the table) / (the size of the table’s array)

  – If open addressing is used, then each table slot holds at most one element; therefore, the load factor can never be greater than 1.
  – If external chaining is used, then each table slot can hold many elements; therefore, the load factor may be greater than 1.

Hashing Analysis

• The worst case analysis for hashing is the case where every key is hashed into the same slot.
  – Θ(n): linear time.
• The average time can be much faster.

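Average Search Analysis

The formulas below give the average number of table elements examined in a successful search, where α is the load factor and n is the number of elements in the table.

• Searching with linear probing.
  – For a table that is not near full:
    • ½ (1 + 1 / (1 - α))
  – For a table that is full or near full:
    • Math.sqrt(n (π / 8))
• Searching with double hashing.
  – (-ln(1 - α)) / α, where ln is the natural logarithm
• Searching with chained hashing.
  – 1 + (α / 2)
• See Figure 11.6 in Main, page 561.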
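To compare the three strategies at a given load factor, the formulas can be evaluated directly. The class SearchCost and its method names are hypothetical helpers written for this illustration; they only restate the formulas above in Java.

```java
// Evaluates the average-search formulas for a given load factor.
// SearchCost and its method names are illustrative assumptions.
final class SearchCost {
    // Load factor: elements stored divided by the size of the table's array.
    static double loadFactor(int elements, int capacity) {
        return (double) elements / capacity;
    }

    // Linear probing, table not near full: 1/2 (1 + 1 / (1 - alpha)).
    static double linearProbing(double alpha) {
        return 0.5 * (1 + 1 / (1 - alpha));
    }

    // Linear probing, table full or near full: sqrt(n * pi / 8).
    static double linearProbingFull(int n) {
        return Math.sqrt(n * Math.PI / 8);
    }

    // Double hashing: -ln(1 - alpha) / alpha.
    static double doubleHashing(double alpha) {
        return -Math.log(1 - alpha) / alpha;
    }

    // Chained hashing: 1 + alpha / 2.
    static double chaining(double alpha) {
        return 1 + alpha / 2;
    }
}
```

At a load factor of 0.5, linear probing gives 1.5, double hashing about 1.39, and chained hashing 1.25. As the load factor approaches 1, the linear probing value grows fastest.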

Coding Example

• Search Times program that demonstrates linear, binary, and hashing searches.
  – The hashing uses the Hashtable class.

Hashing

• Java provides the Hashtable class, but it also provides two other classes.
  – The HashMap class is a hash table implementation of the Map interface.
  – The HashSet class is a hash table implementation of the Set interface.
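A short sketch of the three classes in use is given below. The class JavaHashClasses and its method names are hypothetical; the library calls (put via merge, and the HashSet copy constructor) are standard java.util methods.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Map;
import java.util.Set;

// Illustrative use of Hashtable, HashMap, and HashSet on the same input.
final class JavaHashClasses {
    // Hashtable is synchronized and does not accept null keys or values.
    static Map<String, Integer> countWithHashtable(String[] words) {
        Map<String, Integer> counts = new Hashtable<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // add 1 to the word's count
        }
        return counts;
    }

    // HashMap offers the same Map operations without synchronization.
    static Map<String, Integer> countWithHashMap(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // HashSet keeps only the distinct elements.
    static Set<String> distinctWords(String[] words) {
        return new HashSet<>(Arrays.asList(words));
    }
}
```

For the input {"a", "b", "a"}, both counting methods map "a" to 2 and "b" to 1, and distinctWords returns a set of two elements.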
