
Course: Programming II - Abstract Data Types

Hash Tables

The ADT Hash Table

What is a table?

A collection of items, each including several pieces of information. One of these pieces of information is a search key.

A table is another example of an ADT in which items are inserted, deleted and retrieved by value and not by position.

Items may be kept in order with respect to the search key. Items may or may not have the same search key.

“City” is the search key.

City      Country    Population
Cairo     Egypt      9500000
London    England    9400000
Paris     France     2200000
Rome      Italy      2800000

Retrieval of items is efficient if it is based on the search key value, e.g. “Retrieve the population of London”.



Access Procedures for Tables

i. createTable( ) // post: Creates an empty table.

ii. isEmpty( ) // post: Determines whether a table is empty.

iii. getSize( ) // post: Returns the number of entries in the table.

iv. add(key, newElem) // post: Adds the pair (key, newElem) to the table in its proper order according to its key.

v. remove(key) // post: Removes from the table the entry with the given key. Returns either the value associated with the given key or null if such an entry does not exist.

vi. getValue(key) // post: Retrieves the value that corresponds to a given key from a table. Returns null if no such entry exists in the table.
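The access procedures above could be written down as a Java interface. The sketch below is only an illustration under stated assumptions: the names `TableADT` and `ListTable`, the generic type parameters and the list-backed implementation are not from the slides. The implementation searches by key to show access by value rather than by position, and for brevity it does not keep the entries in key order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring access procedures ii-vi.
// createTable() corresponds to calling a constructor.
interface TableADT<K, V> {
    boolean isEmpty();
    int getSize();
    void add(K key, V newElem);
    V remove(K key);     // returns the removed value, or null if absent
    V getValue(K key);   // returns the value, or null if absent
}

// Minimal unsorted, list-backed implementation (illustrative only).
class ListTable<K, V> implements TableADT<K, V> {
    private final List<K> keys = new ArrayList<>();
    private final List<V> values = new ArrayList<>();

    public boolean isEmpty() { return keys.isEmpty(); }

    public int getSize() { return keys.size(); }

    public void add(K key, V newElem) {
        keys.add(key);
        values.add(newElem);
    }

    public V remove(K key) {
        int i = keys.indexOf(key);   // search by value, not by position
        if (i < 0) return null;
        keys.remove(i);
        return values.remove(i);
    }

    public V getValue(K key) {
        int i = keys.indexOf(key);
        return i < 0 ? null : values.get(i);
    }
}
```

Each operation here takes time proportional to the number of entries, which is the cost that hashing aims to avoid.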



Introducing the ADT Hash Table

Binary Search Tree: a particular type of binary tree that enables easy search of specific items. But:
• the efficiency of the search method depends on the shape of the binary search tree,
• in the best case scenario (balanced trees), for trees with 10,000 items, the operations “retrieve”, “insert” and “delete” would still require O(log2 10,000), i.e. about 13 steps.

Is there a more efficient way of storing, searching and retrieving data?

Hashing: the basic idea

The add operation

[Figure: an element (or its search key) is passed to an address calculator, which returns an index (0, 1, 2, 3, 4, 5, ...) into the array table.]

add(searchKey, newElem) {
    i = the array index that the address calculator gives us for the searchKey of newElem
    table[i] = newElem
}
pseudocode

The method “add” is of the order O(1) (it requires constant time).



The retrieve operation

getValue(searchKey)
// post: Returns the element that has a matching searchKey.
// post: If not found, it returns null.
    i = the array index that the address calculator gives us for the given searchKey
    if (table[i] is not empty and table[i].getKey( ) equals searchKey)
        return table[i].getValue( );
    else return null;
pseudocode

The remove operation

remove(searchKey)
// post: Deletes the element that has a matching searchKey,
// post: and returns true; otherwise returns false.
    i = the array index that the address calculator gives us for the given searchKey
    if (table[i] is not empty and table[i].getKey( ) equals searchKey) {
        delete element from table[i];
        return true;
    }
    else return false
pseudocode

getValue(key) and remove(key) are also operations of the order O(1) (they require constant time).
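The add, retrieve and remove operations can be sketched in Java as follows. This is a minimal sketch under assumptions that are not in the slides: the class names `BasicEntry` and `BasicHashTable` are hypothetical, keys are `int`, values are `String`, and the address calculator is simply the key modulo the array length. Collisions are ignored here, so a second key with the same index overwrites the first.

```java
// Hypothetical entry type pairing a search key with its value.
class BasicEntry {
    final int key;
    final String value;

    BasicEntry(int key, String value) {
        this.key = key;
        this.value = value;
    }
}

class BasicHashTable {
    private final BasicEntry[] table;

    BasicHashTable(int capacity) {
        table = new BasicEntry[capacity];
    }

    // The "address calculator" (assumption: key mod table length).
    private int address(int key) {
        return Math.floorMod(key, table.length);
    }

    // O(1): one index computation and one array write.
    void add(int key, String value) {
        table[address(key)] = new BasicEntry(key, value);
    }

    // O(1): returns the value stored under key, or null if absent.
    String getValue(int key) {
        BasicEntry e = table[address(key)];
        return (e != null && e.key == key) ? e.value : null;
    }

    // O(1): returns true if an entry with this key was deleted.
    boolean remove(int key) {
        int i = address(key);
        if (table[i] != null && table[i].key == key) {
            table[i] = null;
            return true;
        }
        return false;
    }
}
```

Note the explicit null checks: an array location may be empty, so the key comparison is only made when an entry is present.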



Definition

The ADT Hash Table is an array (i.e. array table) of elements (possibly associated with a search key unique for each element), together with a hash function and access procedures.

The hash function determines the location in the table of a new element, using its value (or its search key, if any). In the same way, it is used to locate the position of an existing element.

A hash table can also be empty – no element is stored in the array table.

The access procedures include insertion, deletion, and retrieval of an element by means of the hash function.

The hash function takes a search key and maps it into an integer array index.

A perfect hash function is an ideal function that maps each search key into a unique array index.



Understanding how Hash Functions work

Example: the table is a Directory Database with
- fewer than 10,000,000 people,
- each person having his/her telephone number as search key,
- the telephone number being of type int, e.g. 123 4567.

Store the person with number 123 4567 in table[1234567]: 10,000,000 memory locations to spare!

Numbers are regional. Store the person with number 123 4567 in table[4567]: 10,000 memory locations to spare!

Hash function: 1234567 → 4567

It takes a value of the search key and transforms it into an integer used as an array index value.

Note: the above is an example of a perfect hash function.
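The mapping 1234567 → 4567 amounts to keeping the last four digits of the number. A minimal sketch, assuming the hypothetical names `PhoneHash`, `TABLE_SIZE` and `hash`, and assuming non-negative seven-digit numbers:

```java
class PhoneHash {
    // Assumption: 10,000 table locations, one per four-digit local number.
    static final int TABLE_SIZE = 10_000;

    // Keeps the last four digits of the telephone number.
    static int hash(int phoneNumber) {
        return phoneNumber % TABLE_SIZE;
    }
}
```

`PhoneHash.hash(1234567)` gives 4567. A number from a different region, such as 9994567, would give the same index, so the function is perfect only because all numbers in the directory are assumed to share the same regional prefix.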



A perfect hash function is a hash function that maps each search key into a unique array index. It is possible if we know in advance all the possible search key values.

The collision problem

A collision is when two or more elements with search keys x and y are told by the hash function to be stored in the same array location table[i], where i = h(x) = h(y). The two search keys x and y are said to have collided.

One way of dealing with collisions is to provide an appropriate collision-resolution scheme.

Basic requirements for a “good” hash function:
- be easy and fast to compute
- place elements evenly throughout the hash table (i.e. minimize collisions)

In practice, we don’t know all possible search key values.

A hash function can map two or more search keys into the same integer array index: x ≠ y and h(x) = h(y) = i.



Examples of Hash Functions (1)

Assume hash functions have integers as search keys.

Selecting digits: given a search key number composed of a certain number of digits, the hash function picks digits at specific places in the search key number:

e.g. h(001364825) = 35 (select the fourth and the last digit)

Folding: given a search key number, the function defines the index by adding up all the digits in the search key.

e.g. h(001364825) = 29 (add the digits)

Or by first grouping the digits and then adding them up.

e.g. h(001364825) = 001 + 364 + 825 = 1190 (group the digits and add them up)

Simple and fast, but generally does not distribute the elements evenly in the hash table.

Note: you can apply more than one hash function to a search key
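The digit-selection and folding examples can be checked with a short sketch. The class and method names (`DigitHashes`, `selectDigits`, `foldDigits`, `foldGroups`) are hypothetical, and the key is passed as a string of decimal digits so that the leading zeros of 001364825 are preserved.

```java
class DigitHashes {
    // Selecting digits: take the fourth and the last digit of the key.
    // Assumes the key has at least four digits.
    static int selectDigits(String key) {
        int fourth = key.charAt(3) - '0';
        int last = key.charAt(key.length() - 1) - '0';
        return fourth * 10 + last;
    }

    // Folding: add up all the digits of the key.
    static int foldDigits(String key) {
        int sum = 0;
        for (char c : key.toCharArray()) {
            sum += c - '0';
        }
        return sum;
    }

    // Folding by groups: split the key into groups of groupSize digits
    // and add the groups up as numbers.
    static int foldGroups(String key, int groupSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i += groupSize) {
            int end = Math.min(i + groupSize, key.length());
            sum += Integer.parseInt(key.substring(i, end));
        }
        return sum;
    }
}
```

For the key "001364825", `selectDigits` gives 35, `foldDigits` gives 29 and `foldGroups` with groups of three gives 1190. Applying a second function to a result is also possible: `foldDigits("1190")` gives 11.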



Examples of Hash Functions (2)

Modulo arithmetic: given a search key number, the function defines the index to be the search key value modulo some fixed number.

e.g. h(001364825) = 001364825 mod tableSize

We can get lots of collisions. We can distribute the elements more evenly in the table if tableSize is prime.

Converting a character string to an integer: if the search key is a string, we could first convert it into an integer and then apply the hash function. We could think of using different ways of converting strings into numbers to get better hash function results.

e.g. h(“NOTE”) = 78 + 79 + 84 + 69 = 310, using the ASCII values of the letters. h(“TONE”) = 84 + 79 + 78 + 69 = 310, so the two keys collide.
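Both ideas can be sketched in a few lines of Java. The names `ModuloAndStringHashes`, `modHash`, `sumChars` and `weighted` are hypothetical, and a table size of 101 is used in the usage note only as an example of a prime. The `weighted` method is one possible "different way" of converting a string; it is an illustration chosen here (the same base-31 scheme that Java's `String.hashCode` uses), not something the slides prescribe.

```java
class ModuloAndStringHashes {
    // Modulo arithmetic: the index is the key modulo the table size.
    static int modHash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    // Sum of character codes. Letter order is ignored,
    // so anagrams such as "NOTE" and "TONE" collide.
    static int sumChars(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Position-sensitive conversion (assumption: base 31).
    // Letter order now affects the result.
    static int weighted(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```

`modHash(1364825, 101)` gives 12. The key is written without leading zeros because a Java integer literal starting with 0 is read as octal. `sumChars` returns 310 for both "NOTE" and "TONE", while `weighted` returns different values for the two strings.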



Collision-resolution schemes

Two main approaches:

1. Assign another location within the hash table to the new collided element.
2. Change the structure of the hash table: each table[i] can accommodate more than one element.

1. Open addressing schemes: In case of collision, probe for some other empty location to place the element in. The probe sequence of locations used by the add procedure has to be efficiently reproducible by the delete and retrieve procedures.

Linear probing:

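[Figure: linear probing with h(key) = key mod 101. The keys 7597, 4567, 0628 and 3658 all hash to i = 22 (e.g. i = 7597 mod 101 = 22) and are stored in consecutive locations.]

Location   Key    Probe
22         7597   i
23         4567   i+1
24         0628   i+2
25         3658   i+3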

Table locations have to be defined to be in one of three states: empty, deleted, occupied; otherwise, after deletion, the retrieve operation might stop prematurely.

Elements tend to cluster together. Parts of the table might be too dense and others relatively empty, making the hashing less efficient.
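Linear probing with the three location states can be sketched as below. This is an illustrative sketch, not a full implementation: the class name `LinearProbingTable` and its method names are hypothetical, only `int` keys are stored (no associated values), the hash function is key mod table size, and `add` does not check for duplicate keys.

```java
import java.util.Arrays;

class LinearProbingTable {
    private enum State { EMPTY, DELETED, OCCUPIED }

    private final int[] keys;
    private final State[] states;

    LinearProbingTable(int tableSize) {
        keys = new int[tableSize];
        states = new State[tableSize];
        Arrays.fill(states, State.EMPTY);
    }

    private int hash(int key) {
        return Math.floorMod(key, keys.length);
    }

    // Stores key in the first free location of the probe sequence
    // i, i+1, i+2, ... and returns its index, or -1 if the table is full.
    int add(int key) {
        int i = hash(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (states[i] != State.OCCUPIED) {   // EMPTY or DELETED can be reused
                keys[i] = key;
                states[i] = State.OCCUPIED;
                return i;
            }
            i = (i + 1) % keys.length;
        }
        return -1;
    }

    // Follows the same probe sequence as add. The search stops at an
    // EMPTY location but continues past DELETED ones.
    int find(int key) {
        int i = hash(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (states[i] == State.EMPTY) return -1;
            if (states[i] == State.OCCUPIED && keys[i] == key) return i;
            i = (i + 1) % keys.length;
        }
        return -1;
    }

    // Marks the location as DELETED rather than EMPTY.
    boolean remove(int key) {
        int i = find(key);
        if (i < 0) return false;
        states[i] = State.DELETED;
        return true;
    }
}
```

With a table of size 101, adding 7597, 4567, 628 and 3658 places them at locations 22, 23, 24 and 25. After removing 4567, location 23 is marked deleted, so a search for 3658 still walks past it and finds the key at location 25. Had the location been reset to empty, that search would have stopped at 23.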



Open addressing schemes (continued)

Double hashing: probe sequence is not sequential, but defined using the given search key.

It uses the hash function “h” to calculate the initial index, and a second function “h' ” to calculate the size of the probing step, using the same search key. The function h' has the following properties:

- h'(key) ≠ 0
- h' ≠ h

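Example:

h(key) = key mod 11
h'(key) = 7 - (key mod 7)

- h(58) = 3: location 3 is free, so 58 is stored at index 3.
- h(14) = 3: collision. h'(14) = 7, so i = 3 + 7 = 10 and 14 is stored at index 10.
- h(91) = 3: collision. h'(91) = 7, so i = 3 + 7 = 10: collision again. Then i = (3 + 7 + 7) % 11 = 6 and 91 is stored at index 6.

[Figure: hash table with locations 0 to 10, holding 58 at index 3, 91 at index 6 and 14 at index 10.]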
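The double hashing example can be traced with a small sketch. The class name `DoubleHashingTable` and its method names are hypothetical; the sketch assumes non-negative `int` keys, stores keys only, and uses the functions h(key) = key mod 11 and h'(key) = 7 - (key mod 7) from the example.

```java
class DoubleHashingTable {
    private static final int SIZE = 11;
    private final Integer[] table = new Integer[SIZE];   // null means empty

    // Primary hash function: gives the initial index.
    static int h(int key) {
        return key % SIZE;
    }

    // Secondary hash function: gives the probing step, always in 1..7.
    static int hPrime(int key) {
        return 7 - (key % 7);
    }

    // Stores key at the first free location of the probe sequence
    // h, h + h', h + 2h', ... (mod 11). Returns the index, or -1 if full.
    int add(int key) {
        int i = h(key);
        int step = hPrime(key);
        for (int probes = 0; probes < SIZE; probes++) {
            if (table[i] == null) {
                table[i] = key;
                return i;
            }
            i = (i + step) % SIZE;
        }
        return -1;
    }
}
```

Adding 58, 14 and 91 in that order stores them at indices 3, 10 and 6. Because the table size 11 is prime and the step is never 0, the probe sequence visits every location before repeating.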



Restructuring the Hash Table

Alter the structure of the hash table so as to allow more than one element to be stored at the same location.

Buckets

The array table is defined so that each location table[i] is itself an array, called a bucket. Limitation: how to choose the size of each bucket?

Separate Chaining

The array table is defined as an array of linked lists. Each location table[i] is a reference to a linked list, called the chain, of all the elements that have collided at the same integer i.

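public class ChainNode {
    private keyedElem elem;      // the element stored in this node
    private ChainNode next;      // next node in the chain, or null
    …….
}

public class HashTable {
    private final int TABLESIZE = 101;
    private ChainNode[ ] table;  // table[i] refers to the first node of chain i
    private int size;            // number of elements stored
    …..
}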



A Separate Chaining Structure

[Figure: the array Table, with locations 0, 1, 2, 3, ..., Size-1, each referring to a linked list of elements.]

Each location of the hash table contains a reference to a linked list



Implementing Hash Table with separate chaining

add(key, newElem) {
    searchKey = key;
    i = hashIndex(searchKey);
    node = reference to a new node containing newElem;
    node.setNext(table[i]);
    table[i] = node;
}
pseudocode

getValue(searchKey) {
    i = hashIndex(searchKey);
    node = table[i];
    while ((node != null) && (node.getElem( ).getKey( ) != searchKey)) {
        node = node.getNext( );
    }
    if (node != null) { return node.getElem( ); }
    else return null;
}
pseudocode

“hashIndex” is a protected procedure of the class HashTable.
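A compact Java version of separate chaining is sketched below. It is an assumption-laden illustration rather than the course's implementation: the class `ChainedHashTable`, its nested `Node` class and the generic parameters are hypothetical, `hashIndex` is assumed to be the key's `hashCode()` modulo TABLESIZE, each node stores the key and value directly instead of a keyed element, and `getValue` returns the value rather than the whole element. A `remove` method is included for completeness.

```java
class ChainedHashTable<K, V> {
    // One link of a chain.
    private static final class Node<K, V> {
        final K key;
        final V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private static final int TABLESIZE = 101;
    private final Node<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedHashTable() {
        table = (Node<K, V>[]) new Node[TABLESIZE];
    }

    // Assumption: index derived from hashCode(), kept non-negative.
    protected int hashIndex(K key) {
        return Math.floorMod(key.hashCode(), TABLESIZE);
    }

    // Inserts the new node at the head of chain i, as in the pseudocode.
    void add(K key, V newElem) {
        int i = hashIndex(key);
        table[i] = new Node<>(key, newElem, table[i]);
        size++;
    }

    // Walks chain i until the key is found or the chain ends.
    V getValue(K key) {
        Node<K, V> node = table[hashIndex(key)];
        while (node != null && !node.key.equals(key)) {
            node = node.next;
        }
        return node == null ? null : node.value;
    }

    // Unlinks the node with the given key; returns its value or null.
    V remove(K key) {
        int i = hashIndex(key);
        Node<K, V> prev = null;
        Node<K, V> node = table[i];
        while (node != null && !node.key.equals(key)) {
            prev = node;
            node = node.next;
        }
        if (node == null) return null;
        if (prev == null) table[i] = node.next;
        else prev.next = node.next;
        size--;
        return node.value;
    }

    int getSize() {
        return size;
    }
}
```

Keys are compared with `equals` rather than `!=`, since in Java `!=` on objects compares references. With `Integer` keys, 22 and 123 both map to index 22 and end up in the same chain, yet each can still be retrieved.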



Summary

Hashing is the process that calculates where in an array a data element should be, rather than searching for it. It allows efficient retrievals, insertions and deletions.

A hash function should be easy to compute and should scatter the elements evenly throughout the table.

Collisions occur when two different search keys hash into the same array location. There are two strategies to resolve collisions, using probing and chaining respectively.



Conclusion

What is an Abstract Data Type

Introduce individual ADTs: Lists, Stacks, Queues, Trees, Heaps, AVL Trees, Hash Tables
- Understand the data type abstractly
- Define the specification of the data type
- Use the data type in small applications, basing solely on its specification
- Implement the data type: static approach, dynamic approach

Some fundamental algorithms for some ADTs: pre-order, in-order and post-order traversal, heapsort
