sd c 15 - reezeki2011 · hash function • hashing function should have the following features: –...

DATA STRUCTURES (STRUKTUR DATA)

By: Sri Rezeki Candra Nursari

2 credits (SKS)

Topics
1. Data and Data Structures
2. Array
3. Structures and Records
4. Pointer
5. Linked List
6. Stack
7. Queue
8. Tree
9. AVL Tree
10. Heap and B-Tree
11. Sorting
12. Search
13. Hashing
14. Graph

HASH

Session 15

2 credits (SKS)

Outline

• Hashing
  – Definition
  – Hash function
  – Collision resolution

• Open hashing
  – Separate chaining

• Closed hashing (open addressing)
  – Linear probing
  – Quadratic probing
  – Double hashing

• Primary clustering, secondary clustering
  – Access: insert, find, delete

Hash Tables

• Hashing is used for storing relatively large amounts of data in a table called a hash table ADT.

• The hash table usually has a fixed size, H-size, which is larger than the amount of data that we want to store.

• We define the load factor (λ) to be the ratio of the number of stored data items to the size of the hash table.

• The hash function maps an item into an index in the range 0 to H-size-1.

[Figure: a key is passed to the hash function, which selects the cell (0, 1, 2, 3, ..., H-1) of the hash table where the item is stored.]

Hash Tables (2)

• Hashing is a technique used to perform insertions, deletions, and finds in constant average time.

• To insert or find a data item, we assign a key to each element and use a function, called the hash function, to determine the element's location within the table.

• Hash tables are arrays of cells with fixed size containing data or keys corresponding to data.

• For each key, we use the hash function to map the key into some number in the range 0 to H-size-1.

Hash Function

• A hash function should have the following features:
  – Easy to compute.
  – Two distinct keys map to two different cells in the array. (Not true in general. Why?)
  – This can be achieved by using a direct-address table when the universal set of keys is reasonably small.
  – Distributes the keys evenly among the cells.

• One simple hashing function is to use mod function with a prime number.

• Any manipulation of the key's digits that is cheap to compute and gives a good distribution can be used.

Hash Function: Truncation

• Part of the key is simply ignored, with the remainder truncated or concatenated to form the index.

Phone no.    Index
731-3018     338
539-2309     329
428-1397     217
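The three sample rows are consistent with keeping the 2nd, 4th, and 7th digits of the phone number and ignoring the rest. That rule is inferred from the rows, not stated on the slide, so the sketch below is only one possible reading. The class name `TruncationHash` and the method `index` are hypothetical.

```java
final class TruncationHash {
    // Assumption: the index is built from the 2nd, 4th, and 7th digits
    // of a 7-digit phone number; all other digits are ignored.
    static int index(String phone) {
        String digits = phone.replace("-", "");
        if (digits.length() != 7) {
            throw new IllegalArgumentException("expected 7 digits: " + phone);
        }
        String kept = "" + digits.charAt(1) + digits.charAt(3) + digits.charAt(6);
        return Integer.parseInt(kept);
    }

    public static void main(String[] args) {
        System.out.println(index("731-3018")); // 338
        System.out.println(index("539-2309")); // 329
        System.out.println(index("428-1397")); // 217
    }
}
```

Because whole digits are dropped, keys that differ only in the ignored positions always collide.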

Hash Function: Folding

• The data can be split up into smaller chunks which are then folded together in some form.

Phone no.    3-group        Index
7313018      73+13+018      104
5392309      53+92+309      454
4281397      42+81+397      520
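A minimal sketch of the folding shown in the rows above, assuming a 7-digit key split into groups of 2, 2, and 3 digits that are then added. The class and method names are hypothetical.

```java
final class FoldingHash {
    // Splits a 7-digit key into groups of 2, 2, and 3 digits and adds them.
    static int index(String digits) {
        if (digits.length() != 7) {
            throw new IllegalArgumentException("expected 7 digits: " + digits);
        }
        int first = Integer.parseInt(digits.substring(0, 2));
        int second = Integer.parseInt(digits.substring(2, 4));
        int third = Integer.parseInt(digits.substring(4, 7)); // "018" parses as 18
        return first + second + third;
    }

    public static void main(String[] args) {
        System.out.println(index("7313018")); // 104
        System.out.println(index("5392309")); // 454
        System.out.println(index("4281397")); // 520
    }
}
```

The sum can reach 99 + 99 + 999 = 1197, so for a smaller table the result would still need to be reduced modulo the table size.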

Hash Function: Modular arithmetic

• Convert the data into an integer, divide by the size of the hash table, and take the remainder as the index.

2-group      Index
731+3018     3749 % 100 = 49
539+2309     2848 % 100 = 48
428+1397     1825 % 100 = 25
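The rows above can be reproduced with a few lines. This is a sketch that assumes the phone number is written with a hyphen between its two groups and that the table size is 100, as in the example. `ModularHash` and `index` are hypothetical names.

```java
final class ModularHash {
    // Adds the two groups of the phone number, then takes the remainder
    // after division by the table size.
    static int index(String phone, int tableSize) {
        String[] groups = phone.split("-");
        int sum = Integer.parseInt(groups[0]) + Integer.parseInt(groups[1]);
        return sum % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(index("731-3018", 100)); // 49
        System.out.println(index("539-2309", 100)); // 48
        System.out.println(index("428-1397", 100)); // 25
    }
}
```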

Choosing a hash function

• A good hash function should satisfy two criteria:
  1. It should be quick to compute.
  2. It should minimize the number of collisions.

Example of hash function

• Hash function for a string
  – X = 128
  – $A_3 X^3 + A_2 X^2 + A_1 X^1 + A_0 X^0$
  – $(((A_3 X) + A_2) X + A_1) X + A_0$

• The result of this function can be much larger than the size of the table, so we take the result modulo the size of the hash table.

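Example of hash function

int hash(String key, int tableSize)
{
    int hashVal = 0;

    // Horner's rule with X = 128; reducing at every step keeps hashVal small
    for (int i = 0; i < key.length(); i++)
        hashVal = (hashVal * 128 + key.charAt(i)) % tableSize;

    return hashVal;   // already in the range 0 to tableSize-1
}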

• Modulo
  – (A + B) % C = (A % C + B % C) % C
  – (A * B) % C = (A % C * B % C) % C
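The two modulo identities are what allow the polynomial hash to reduce after every step instead of once at the end. The sketch below checks this by computing the same polynomial (X = 128) two ways: step by step with `int`, and exactly with `BigInteger`. The class name and the sample key are hypothetical.

```java
import java.math.BigInteger;

final class HornerModDemo {
    // Horner's rule, reducing modulo tableSize after every step.
    static int hashStepwise(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal = (hashVal * 128 + key.charAt(i)) % tableSize;
        }
        return hashVal;
    }

    // The full polynomial value, reduced only once at the end.
    static int hashExact(String key, int tableSize) {
        BigInteger value = BigInteger.ZERO;
        BigInteger x = BigInteger.valueOf(128);
        for (int i = 0; i < key.length(); i++) {
            value = value.multiply(x).add(BigInteger.valueOf(key.charAt(i)));
        }
        return value.mod(BigInteger.valueOf(tableSize)).intValue();
    }

    public static void main(String[] args) {
        String key = "hashing";
        System.out.println(hashStepwise(key, 101) == hashExact(key, 101)); // true
    }
}
```

The stepwise version never holds a value larger than about 128 times the table size, so it avoids overflow for any table size that fits comfortably in an `int`.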

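Example of hash function

int hash(String key, int tableSize)
{
    int hashVal = 0;

    // Multiply by 37 without reducing; int overflow may make hashVal negative
    for (int i = 0; i < key.length(); i++)
        hashVal = hashVal * 37 + key.charAt(i);

    hashVal %= tableSize;
    if (hashVal < 0)          // % keeps the sign of a negative dividend
        hashVal += tableSize;

    return hashVal;
}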

Example of hash function

int hash(String key, int tableSize)
{
    int hashVal = 0;

    // Sum of the character codes of the key
    for (int i = 0; i < key.length(); i++)
        hashVal += key.charAt(i);

    return hashVal % tableSize;
}

Collision resolution

• When two keys map into the same cell, we get a collision.

• Collisions can occur during insertion, so we need a procedure (collision resolution) to resolve them.

Closed Hashing

• On a collision, try to find an alternative cell within the table.

• Closed hashing is also known as open addressing.

• For insertion, we try cells in sequence by using an incremented function like:
  – $h_i(x) = (\mathrm{hash}(x) + f(i)) \bmod \text{H-size}$, with $f(0) = 0$

• The function f is used as the collision resolution strategy.

• The table is bigger than the number of data items.

• Different methods for choosing the function f:
  – Linear probing
  – Quadratic probing
  – Double hashing
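The probe formula can be sketched with f passed in as a parameter, which makes it easy to compare strategies. The names `ProbeSequence` and `probes`, and the sample values (home cell 7, table size 11), are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

final class ProbeSequence {
    // Returns the first `count` cells (home + f(i)) mod tableSize for i = 0, 1, 2, ...
    static int[] probes(int home, int tableSize, int count, IntUnaryOperator f) {
        int[] cells = new int[count];
        for (int i = 0; i < count; i++) {
            cells[i] = Math.floorMod(home + f.applyAsInt(i), tableSize);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Linear probing: f(i) = i
        System.out.println(Arrays.toString(probes(7, 11, 4, i -> i)));     // [7, 8, 9, 10]
        // Quadratic probing: f(i) = i * i
        System.out.println(Arrays.toString(probes(7, 11, 4, i -> i * i))); // [7, 8, 0, 5]
    }
}
```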

Linear probing

• Use a linear function f(i) = i.

• Find the first free position in the table for the key that is close to its actual (home) position.

• Least complex function.

• May result in primary clustering.
  – Elements that hash to different locations probe the same alternative cells.

• The complexity of this probing depends on the value of λ (the load factor).

• We do not use this probing if λ > 0.5.
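A minimal sketch of insert and find with linear probing. To keep the cells easy to check by hand it stores `int` keys and uses key mod table size as the hash, which is an assumption (the slides use string keys). The class name and the sample keys are hypothetical.

```java
final class LinearProbingTable {
    private final Integer[] cells;   // null marks an empty cell
    private int count;

    LinearProbingTable(int tableSize) {
        cells = new Integer[tableSize];
    }

    // Inserts the key and returns the cell it occupies, or -1 if the table is full.
    int insert(int key) {
        int home = Math.floorMod(key, cells.length);
        for (int i = 0; i < cells.length; i++) {
            int cell = (home + i) % cells.length;   // f(i) = i
            if (cells[cell] == null) {
                cells[cell] = key;
                count++;
                return cell;
            }
            if (cells[cell] == key) {
                return cell;                        // already present
            }
        }
        return -1;
    }

    // Follows the same probe sequence; an empty cell ends the search.
    boolean contains(int key) {
        int home = Math.floorMod(key, cells.length);
        for (int i = 0; i < cells.length; i++) {
            int cell = (home + i) % cells.length;
            if (cells[cell] == null) return false;
            if (cells[cell] == key) return true;
        }
        return false;
    }

    double loadFactor() {
        return (double) count / cells.length;
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(10);
        for (int key : new int[] {89, 18, 49, 58, 69}) {
            System.out.println(key + " -> cell " + table.insert(key));
        }
        System.out.println("load factor = " + table.loadFactor());
    }
}
```

With these keys, 49, 58, and 69 all end up in the run of cells starting at 9 and wrapping to 0, 1, 2, even though 58 has a different home cell from 49 and 69. That shared run is the primary clustering described above.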

Hashing - insert

[Figure: a 16-cell table (indices 0 to 15) holding the keys alpha, crystal, dawn, emerald, flamingo, hallmark, moon, marigold, and private; the keys cobalt, marigold, and private are shown with question marks.]

Hashing - lookup

Hashing - delete

• lazy deletion - why?

[Figure: the 16-cell table (indices 0 to 15) after "delete emerald" and "delete moon"; the remaining keys are alpha, crystal, dawn, flamingo, hallmark, marigold, and private.]
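One way to see why deletion is lazy: a search in closed hashing stops at the first empty cell, so physically emptying a cell could cut off keys that were placed further along the same probe sequence. The sketch below marks a deleted cell instead of emptying it. It assumes linear probing, `int` keys, and key mod table size as the hash. All names are hypothetical.

```java
import java.util.Arrays;

final class LazyDeletionTable {
    private enum State { EMPTY, ACTIVE, DELETED }

    private final int[] keys;
    private final State[] states;

    LazyDeletionTable(int tableSize) {
        keys = new int[tableSize];
        states = new State[tableSize];
        Arrays.fill(states, State.EMPTY);
    }

    // Returns the cell holding the key, or -1 if it is not in the table.
    int find(int key) {
        int home = Math.floorMod(key, keys.length);
        for (int i = 0; i < keys.length; i++) {
            int cell = (home + i) % keys.length;
            if (states[cell] == State.EMPTY) return -1;   // never used: stop
            if (states[cell] == State.ACTIVE && keys[cell] == key) return cell;
            // DELETED cells are stepped over, so the probe chain stays intact
        }
        return -1;
    }

    // Inserts the key, reusing the first EMPTY or DELETED cell on its probe path.
    boolean insert(int key) {
        if (find(key) >= 0) return true;                  // already present
        int home = Math.floorMod(key, keys.length);
        for (int i = 0; i < keys.length; i++) {
            int cell = (home + i) % keys.length;
            if (states[cell] != State.ACTIVE) {
                keys[cell] = key;
                states[cell] = State.ACTIVE;
                return true;
            }
        }
        return false;                                     // table is full
    }

    // Marks the cell as deleted instead of emptying it.
    boolean delete(int key) {
        int cell = find(key);
        if (cell < 0) return false;
        states[cell] = State.DELETED;
        return true;
    }

    public static void main(String[] args) {
        LazyDeletionTable table = new LazyDeletionTable(10);
        table.insert(89);   // cell 9
        table.insert(49);   // cell 0 (9 is taken)
        table.insert(69);   // cell 1 (9 and 0 are taken)
        table.delete(49);
        System.out.println(table.find(69)); // 1, still reachable past the deleted cell
    }
}
```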

Hashing - operation after delete

[Figure: operations on the 16-cell table (indices 0 to 15) after the deletions: "custom (insert)" and a lookup of marigold; later panels show the table with canary and custom added, followed by cobalt and dark.]

Primary Clustering

• Elements that hash to different locations probe the same alternative cells.

Quadratic probing

• Eliminates primary clustering by selecting $f(i) = i^2$.

• Problems increase once the hash table is more than half full.

• You have to select an appropriate table size that is not the square of a number.

• We can prove that quadratic probing with a prime table size and a table that is at least half empty will always find a location for an element.

• The next probe can be computed incrementally after a collision by noting that $f(i) = i^2 = f(i-1) + 2i - 1$.

• Elements that hash to the same location will probe the same alternative cells (secondary clustering).
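The incremental form means each new probe needs only an addition, not a squaring. A sketch under the assumptions of a table size of 7 (prime) and keys that are all multiples of 7, so they share home cell 0. The names and the keys are hypothetical.

```java
final class QuadraticProbing {
    // Returns the first free cell on the quadratic probe sequence that starts
    // at `home`, or -1 if none is found within tableSize probes.
    static int findFreeCell(boolean[] occupied, int home) {
        int tableSize = occupied.length;
        int cell = home;                            // offset f(0) = 0
        for (int i = 1; i <= tableSize; i++) {
            if (!occupied[cell]) return cell;
            // f(i) = f(i-1) + 2i - 1: step from offset (i-1)^2 to offset i^2
            cell = (cell + 2 * i - 1) % tableSize;
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] occupied = new boolean[7];
        for (int key : new int[] {7, 14, 21, 28}) {  // all hash to cell 0
            int cell = findFreeCell(occupied, key % 7);
            occupied[cell] = true;
            System.out.println(key + " -> cell " + cell); // cells 0, 1, 4, 2
        }
    }
}
```

All four keys follow the same sequence 0, 1, 4, 2, which is the secondary clustering noted in the last bullet.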

Double hashing

• The collision resolution function uses a second hash function, for example $f(i) = i \cdot \mathrm{hash}_2(x)$.

• On each probe, another multiple of $\mathrm{hash}_2(x)$ is added to the home position.

• The second hash function must be chosen carefully so that it never evaluates to zero and so that it probes all the cells.

• It is essential to have a prime-size hash table.
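A sketch of double hashing with one common choice of second hash function, hash2(x) = R - (x mod R) for a prime R smaller than the table size. That choice is an assumption, since the slides do not name a specific function. R = 7, the table size 11, and the keys are also illustrative.

```java
final class DoubleHashing {
    static final int R = 7;   // assumed: a prime smaller than the table size

    // Always in the range 1..R, so the step is never zero.
    static int hash2(int key) {
        return R - Math.floorMod(key, R);
    }

    // Returns the first free cell on the probe sequence
    // (home + i * hash2(key)) mod tableSize, or -1 if none is free.
    static int findFreeCell(boolean[] occupied, int key) {
        int tableSize = occupied.length;
        int home = Math.floorMod(key, tableSize);
        int step = hash2(key);
        for (int i = 0; i < tableSize; i++) {
            int cell = (home + i * step) % tableSize;
            if (!occupied[cell]) return cell;
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] occupied = new boolean[11];
        for (int key : new int[] {11, 22, 33, 44}) {   // all hash to cell 0
            int cell = findFreeCell(occupied, key);
            occupied[cell] = true;
            System.out.println(key + " -> cell " + cell); // cells 0, 6, 2, 5
        }
    }
}
```

The keys share a home cell but step by different amounts (6, 2, and 5), so they do not follow one another along the same probe sequence. With a prime table size, any step between 1 and tableSize-1 eventually visits every cell.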

[Figure: Double Hashing; the example table holding canary, alpha, crystal, dawn, custom, flamingo, hallmark, marigold, and private, with cobalt and dark as the keys being processed.]

Open Hashing

• Collisions are resolved by placing all elements that hash to the same bucket into a single collection of values.

• Open hashing:
  – Keep a linked list of all the elements that hash to the same cell (separate chaining).
  – Each cell in the hash table contains a pointer to a linked list containing the data.

• Functions and analysis of open hashing:
  – Inserting a new element into the table: we add the element at the beginning or the end of the appropriate linked list.
  – The choice depends on whether you want to check for duplicates.
  – It also depends on how frequently you expect to access the most recently added elements.

[Figure: Open Hashing; a table with cells 0 to 5.]

Open Hashing

• For search, we use the hash function to determine which linked list holds the element, and then traverse that list to find the element.

• Deletion is done on the element in the appropriate linked list after we find the element to be deleted.

• We could use other structures, such as a tree or another hash table, for each cell in the hash table to resolve collisions.

• The main advantage of this method is the fact that it can handle any amount of data (dynamic expansion).

• The main disadvantage of this method is the memory usage for each cell.
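A minimal separate-chaining sketch covering insert, search, and delete. It uses `java.util.LinkedList` for the chains and `String.hashCode()` as the hash function, both of which are assumptions rather than the slides' own implementation. The class name is hypothetical.

```java
import java.util.LinkedList;

final class ChainedHashTable {
    private final LinkedList<String>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int tableSize) {
        buckets = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Selects the linked list for the key's cell.
    private LinkedList<String> bucketFor(String key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    // Adds the key at the front of its list; returns false for a duplicate.
    boolean insert(String key) {
        LinkedList<String> list = bucketFor(key);
        if (list.contains(key)) return false;
        list.addFirst(key);
        size++;
        return true;
    }

    boolean contains(String key) {
        return bucketFor(key).contains(key);
    }

    boolean delete(String key) {
        boolean removed = bucketFor(key).remove(key);
        if (removed) size--;
        return removed;
    }

    double loadFactor() {
        return (double) size / buckets.length;
    }

    public static void main(String[] args) {
        ChainedHashTable table = new ChainedHashTable(3);
        for (String key : new String[] {"alpha", "crystal", "dawn", "emerald", "flamingo"}) {
            table.insert(key);
        }
        System.out.println(table.contains("dawn"));
        System.out.println(table.loadFactor());   // above 1: more keys than cells
    }
}
```

Here five keys fit in a three-cell table, which illustrates the dynamic expansion mentioned above. The cost is one list object per cell plus one node per key.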

Analysis of Open Hash

• In general, the average length of a list is the load factor λ.

• The complexity of insertion depends on the hash function and on where in the list the insertion is done; in general it is the cost of inserting into the linked list plus the time to evaluate the hash function.

• For search, the time complexity is the constant time to evaluate the hash function plus the time to traverse the list.

• Worst case O(n) for search.

• The average case depends on λ.

• The general rule for open hashing is to make λ ≈ 1.

• Used for dynamically sized data.

Issues

• Other issues common to all closed hashing resolutions:
  – Confusing after deletion.
  – Simpler than open hashing.
  – Good if we do not expect too many collisions.
  – If a search is unsuccessful, we may have to search the whole table.
  – Requires a table that is large compared to the expected number of data items.

Summary

• Hash table: an array

• Hash function: a function that maps a key into a number in [0, size of hash table)

• Collision resolution
  – Open hashing
    • Separate chaining
  – Closed hashing (open addressing)
    • Linear probing
    • Quadratic probing
    • Double hashing
  – Primary clustering, secondary clustering

Summary

• Advantage
  – Running time
    • O(1) + O(collision resolution)

• Disadvantages
  – Difficult (not efficient) to print all elements in the hash table
  – Inefficient to find the minimum or maximum element
  – Not growable (for closed hashing/open addressing)
  – Wastes some space (load factor)
