cs261 data structures
Post on 15-Jan-2016
CS261 Data Structures
Hash Tables: Open Address Hashing
Goals
• Open Address Hashing
Hash Tables: Resolving Collisions
There are two general approaches to resolving collisions:
1. Open address hashing: if a spot is full, probe for next empty spot
2. Chaining (or buckets): keep a collection at each table entry
Open Address Hashing
• All values are stored in an array
• Hash value is used to find initial index
• If that position is filled, the next position is examined, then the next, and so on until you find the element OR an empty position is found
• The process of looking for an empty position is termed probing, specifically linear probing when we look to the next element
Open Address Hashing: Example
Eight-element table using Amy’s hash function (alphabet position of the 3rd letter of the name, minus 1):
Already added: Amina, Andy, Alessia, Alfred, and Aspen
Note: We’ve shown where each letter of the alphabet maps to for simplicity here (given a table size of 8), so you don’t have to calculate it! E.g., Y is the 25th letter (we use 0-based indexing, so the integer value is 24) and 24 mod 8 is 0.
0-aiqy  Amina
1-bjrz  (empty)
2-cks   (empty)
3-dlt   Andy
4-emu   Alessia
5-fnv   Alfred
6-gow   (empty)
7-hpx   Aspen
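The hash function used in this example can be sketched in C as below. The function name hashThirdLetter is made up for illustration, and the sketch assumes every name has at least three characters and that the third one is a letter.

```c
#include <ctype.h>

/* Hypothetical helper (not from the slides): maps a name to a table index
   using the 0-based alphabet position of its third letter.
   Assumes the name has at least 3 characters and name[2] is a letter. */
int hashThirdLetter(const char *name, int tableSize) {
    int letter = tolower((unsigned char)name[2]) - 'a';  /* a=0, b=1, ..., z=25 */
    return letter % tableSize;
}
```

With a table size of 8, this sends Amina to 0, Andy to 3, Alessia to 4, Alfred to 5, and Aspen to 7, matching the table above.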
Open Address Hashing: Adding
Now we need to add: Aimee
The hashed index position (4) is filled by Alessia, so we probe to find the next free location
Add: Aimee
0-aiqy  Amina
1-bjrz  (empty)
2-cks   (empty)
3-dlt   Andy
4-emu   Alessia   <- Aimee hashes to here
5-fnv   Alfred
6-gow   Aimee     <- placed here
7-hpx   Aspen
Open Address Hashing: Adding (cont.)
Suppose Anne wants to join:
The hashed index position (5) is filled by Alfred:
– Probe to find next free location: what happens when we reach the end of the array?
Add: Anne
0-aiqy  Amina
1-bjrz  (empty)
2-cks   (empty)
3-dlt   Andy
4-emu   Alessia
5-fnv   Alfred    <- Anne hashes to here
6-gow   Aimee
7-hpx   Aspen
Placed: ???
Open Address Hashing: Adding (cont.)
Suppose Anne wants to join:
The hashed index position (5) is filled by Alfred:
– Probe to find next free location
– When we get to end of array, wrap around to the beginning
– Eventually, find position at index 1 open
Add: Anne
0-aiqy  Amina
1-bjrz  Anne      <- placed here
2-cks   (empty)
3-dlt   Andy
4-emu   Alessia
5-fnv   Alfred    <- Anne hashes to here
6-gow   Aimee
7-hpx   Aspen
Open Address Hashing: Adding (cont.)
Finally, Alan wants to join:
The hashed index position (0) is filled by Amina:
– Probing finds last free position (index 2)
– Collection is now completely filled
– What should we do if someone else wants to join? (More on this later)
Add: Alan
0-aiqy  Amina     <- Alan hashes to here
1-bjrz  Anne
2-cks   Alan      <- placed here
3-dlt   Andy
4-emu   Alessia
5-fnv   Alfred
6-gow   Aimee
7-hpx   Aspen
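The adding steps walked through above can be sketched in C as follows. This is a minimal illustration, not the course's implementation: the names openHashAdd, hashThirdLetter, and TABLE_SIZE are assumptions, empty slots are represented by NULL pointers, and duplicates are not checked.

```c
#include <ctype.h>
#include <stddef.h>

#define TABLE_SIZE 8

/* Same hash as the example: 0-based alphabet position of the third letter,
   mod the table size. Assumes name[2] exists and is a letter. */
int hashThirdLetter(const char *name) {
    return (tolower((unsigned char)name[2]) - 'a') % TABLE_SIZE;
}

/* Insert name using linear probing (NULL marks an empty slot).
   Returns the index where the name was placed, or -1 if the table is full. */
int openHashAdd(const char *table[TABLE_SIZE], const char *name) {
    int start = hashThirdLetter(name);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (start + i) % TABLE_SIZE;  /* wrap around past the last slot */
        if (table[idx] == NULL) {
            table[idx] = name;
            return idx;
        }
    }
    return -1;  /* every slot was examined and none was free */
}
```

Adding Amina, Andy, Alessia, Alfred, Aspen, Aimee, Anne, and Alan in that order to an empty table reproduces the placements shown on the slides; a ninth add returns -1 because the probe has gone all the way around.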
Open Address Hashing: Contains
• Hash to find initial index
• Probe forward until
– value is found (return 1), or
– empty location is found (return 0)
• Notice that search time is not uniform
0-aiqy  Amina
1-bjrz  (empty)
2-cks   (empty)
3-dlt   Andy
4-emu   Alessia
5-fnv   Alfred
6-gow   Aimee
7-hpx   Aspen
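A possible C sketch of contains is shown below. The function and parameter names are assumptions for illustration; the optional probes output is not part of the slides and is included only to make the non-uniform search time visible.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8

/* Same hash as the example; assumes name[2] exists and is a letter. */
int hashThirdLetter(const char *name) {
    return (tolower((unsigned char)name[2]) - 'a') % TABLE_SIZE;
}

/* Returns 1 if name is in the table, 0 otherwise (NULL marks an empty slot).
   If probes is not NULL, stores the number of slots examined. */
int openHashContains(const char *table[TABLE_SIZE], const char *name, int *probes) {
    int start = hashThirdLetter(name);
    int examined = 0;
    int found = 0;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (start + i) % TABLE_SIZE;
        examined++;
        if (table[idx] == NULL) {
            break;                          /* empty slot: name cannot be further on */
        }
        if (strcmp(table[idx], name) == 0) {
            found = 1;
            break;
        }
    }
    if (probes != NULL) {
        *probes = examined;
    }
    return found;
}
```

On the table above, Amina is found after examining 1 slot and Aimee after 3, while an unsuccessful search for Anne examines 5 slots (5, 6, 7, 0, then the empty slot 1).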
Open Address Hashing: Remove
• Remove is tricky
• What happens if we delete Anne, then search for Alan?
Before:
0-aiqy  Amina
1-bjrz  Anne
2-cks   Alan
3-dlt   Andy
4-emu   Alessia
5-fnv   Alfred
6-gow   Aimee
7-hpx   Aspen

Remove: Anne, then Find: Alan
0-aiqy  Amina     <- Alan hashes to here
1-bjrz  (empty)   <- probing finds null entry: Alan not found
2-cks   Alan
3-dlt   Andy
4-emu   Alessia
5-fnv   Alfred
6-gow   Aimee
7-hpx   Aspen
Open Address Hashing: Handling Remove
• Simple solution: Don’t allow removal (e.g. words don’t get removed from a spell checker!)
• Better solution: replace removed item with “tombstone”
– Special value that marks deleted entry
– Can be replaced when adding new entry
– But doesn’t halt search during contains or remove
Find: Alan
0-aiqy  Amina     <- Alan hashes to here
1-bjrz  _TS_      <- probing skips tombstone
2-cks   Alan      <- Alan found
3-dlt   Andy
4-emu   Alessia
5-fnv   Alfred
6-gow   Aimee
7-hpx   Aspen
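One way the tombstone idea might look in C is sketched below. All names here (TOMBSTONE, openHashFind, openHashRemove, openHashAdd) are assumptions for illustration. The tombstone is a dedicated array whose address serves as the marker, so it is compared by pointer and can never be confused with a stored name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8

/* Marker for a deleted entry; recognized by its address, not its contents. */
const char TOMBSTONE[] = "_TS_";

/* Same hash as the example; assumes name[2] exists and is a letter. */
int hashThirdLetter(const char *name) {
    return (tolower((unsigned char)name[2]) - 'a') % TABLE_SIZE;
}

/* Index of name, or -1 if absent. Tombstones are skipped;
   only a NULL (never used) slot ends the search early. */
int openHashFind(const char *table[TABLE_SIZE], const char *name) {
    int start = hashThirdLetter(name);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (start + i) % TABLE_SIZE;
        if (table[idx] == NULL) return -1;
        if (table[idx] != TOMBSTONE && strcmp(table[idx], name) == 0) return idx;
    }
    return -1;
}

/* Replace name with a tombstone. Returns 1 if removed, 0 if not present. */
int openHashRemove(const char *table[TABLE_SIZE], const char *name) {
    int idx = openHashFind(table, name);
    if (idx < 0) return 0;
    table[idx] = TOMBSTONE;
    return 1;
}

/* Add may reuse a tombstone slot. Returns the index used, or -1 if full.
   This sketch does not check for duplicates. */
int openHashAdd(const char *table[TABLE_SIZE], const char *name) {
    int start = hashThirdLetter(name);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (start + i) % TABLE_SIZE;
        if (table[idx] == NULL || table[idx] == TOMBSTONE) {
            table[idx] = name;
            return idx;
        }
    }
    return -1;
}
```

Starting from the full table, removing Anne leaves a tombstone at index 1, and a search for Alan still probes past it to index 2. A later add whose probe reaches index 1 can reuse that slot.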
Hash Table Size: Load Factor
Load factor: λ = n / m, where n is the # of elements and m is the size of the table
– Represents the portion of the table that is filled
– For open address hashing, load factor is between 0 and 1 (often somewhere between 0.5 and 0.75)
Want the load factor to remain small in order to avoid collisions: space/speed tradeoff again!
Hash Tables: Algorithmic Complexity
• Assumptions:
– Time to compute hash function is constant
– Worst case analysis: all values hash to same position
– Best case analysis: hash function uniformly distributes the values and there are no collisions
• Find element operation:
– Worst case for open addressing: O(n)
– Best case for open addressing: O(1)
Hash Tables: Average Case
• What about average case for successful, S, and unsuccessful, U, searches?
• If the load factor λ is constant, average case is O(1), but want to keep λ small

λ     S     U
0.5   1.5   2.5
0.75  2.5   8.5
0.9   5.5   50.5
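The formulas behind these numbers are not in the transcript. The values are consistent with the classical approximations for linear probing, S ≈ ½(1 + 1/(1 − λ)) and U ≈ ½(1 + 1/(1 − λ)²), so the sketch below assumes those formulas; the function names are made up for illustration.

```c
/* Expected number of probes for linear probing at load factor lf (0 <= lf < 1),
   using the classical approximations. These formulas are an assumption:
   they are not stated in the slides, but they reproduce the table's values. */

/* Successful search: S = 1/2 * (1 + 1/(1 - lf)) */
double avgProbesSuccessful(double lf) {
    return 0.5 * (1.0 + 1.0 / (1.0 - lf));
}

/* Unsuccessful search: U = 1/2 * (1 + 1/(1 - lf)^2) */
double avgProbesUnsuccessful(double lf) {
    double d = 1.0 - lf;
    return 0.5 * (1.0 + 1.0 / (d * d));
}
```

For example, at λ = 0.75 these give S = 2.5 and U = 8.5. The (1 − λ) term in the denominator is why the cost grows so sharply as the table approaches full.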
Clustering
• Assuming a uniform distribution of hash values, what’s the probability that the next value will end up in index 6? In index 2? In index 1?
• As load factor gets larger, the tendency to cluster increases, resulting in longer search times upon collision
0-aiqy  Amina
1-bjrz  (empty)
2-cks   (empty)
3-dlt   Andy
4-emu   Alessia
5-fnv   Alfred
6-gow   (empty)
7-hpx   Aspen

Answers: index 6: 4/8, index 2: 1/8, index 1: 3/8
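These probabilities can be checked by counting, for each empty slot, how many hash values lead to it under linear probing: the slot itself plus the run of filled slots immediately before it. The helper below is a hypothetical sketch of that count (NULL marks an empty slot).

```c
#include <stddef.h>

#define TABLE_SIZE 8

/* Number of hash values (out of TABLE_SIZE) whose linear probe ends at emptyIdx.
   Assumes table[emptyIdx] is NULL. Divide the result by TABLE_SIZE to get the
   probability under a uniform hash. */
int landingCount(const char *table[TABLE_SIZE], int emptyIdx) {
    int count = 1;                                       /* hashing directly to emptyIdx */
    int idx = (emptyIdx + TABLE_SIZE - 1) % TABLE_SIZE;  /* slot just before it */
    while (table[idx] != NULL && count < TABLE_SIZE) {
        count++;                                         /* a probe from idx slides forward */
        idx = (idx + TABLE_SIZE - 1) % TABLE_SIZE;       /* keep walking backward, wrapping */
    }
    return count;
}
```

For the table above this gives 4 for index 6 (hash values 3, 4, 5, 6), 1 for index 2, and 3 for index 1 (hash values 7, 0, 1). The slot after the longest filled run is the most likely to be filled next, which makes that run longer still.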
Double Hashing
• Rather than use a linear probe (i.e. looking at successive locations)…
– Use a second hash function to determine the probe step
• Helps to reduce clustering
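A minimal sketch of double hashing in C follows. The slides do not give a second hash function, so probeStep here is purely hypothetical: it derives an odd step from the second letter of the name. An odd step is chosen because the table size in the running example is 8, and a step that shares no factor with the table size guarantees the probe sequence visits every slot.

```c
#include <ctype.h>
#include <stddef.h>

#define TABLE_SIZE 8   /* assumed to be a power of two in this sketch */

/* First hash, as in the example; assumes name[2] exists and is a letter. */
int hashThirdLetter(const char *name) {
    return (tolower((unsigned char)name[2]) - 'a') % TABLE_SIZE;
}

/* Hypothetical second hash: an odd step (1, 3, 5, or 7 for size 8)
   taken from the second letter. Assumes name[1] is a letter. */
int probeStep(const char *name) {
    int letter = tolower((unsigned char)name[1]) - 'a';
    return 2 * (letter % (TABLE_SIZE / 2)) + 1;
}

/* Insert using double hashing (NULL marks an empty slot).
   Returns the index used, or -1 if the table is full. */
int doubleHashAdd(const char *table[TABLE_SIZE], const char *name) {
    int start = hashThirdLetter(name);
    int step = probeStep(name);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (start + i * step) % TABLE_SIZE;
        if (table[idx] == NULL) {
            table[idx] = name;
            return idx;
        }
    }
    return -1;
}
```

With only Amina in the table (index 0), Alan also hashes to 0 but has a step of 7, so it lands at index 7 rather than piling up next to Amina at index 1.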
Large Load Factor: What to do?
• Common solution: When load factor becomes too large (say, bigger than 0.75), reorganize:
• Create new table with twice the number of positions
• Copy each element, rehashing using the new table size, placing elements in new table
• Delete the old table
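The reorganization steps above might be sketched in C as follows. The struct layout and all function names are assumptions for illustration, and the sketch resizes just before an add that would push the load factor above 0.75. Names are stored as pointers, with NULL marking an empty slot.

```c
#include <ctype.h>
#include <stdlib.h>

#define MAX_LOAD_FACTOR 0.75

struct OpenHashTable {
    const char **slots;  /* NULL marks an empty slot */
    int capacity;        /* m: size of the table */
    int count;           /* n: number of elements */
};

/* Same hash as the example; assumes name[2] exists and is a letter. */
int hashThirdLetter(const char *name, int capacity) {
    return (tolower((unsigned char)name[2]) - 'a') % capacity;
}

/* Allocate an array of the given capacity with every slot empty (NULL on failure). */
const char **makeSlots(int capacity) {
    const char **slots = malloc((size_t)capacity * sizeof *slots);
    if (slots != NULL) {
        for (int i = 0; i < capacity; i++) slots[i] = NULL;
    }
    return slots;
}

/* Linear-probe placement; assumes at least one slot is free. */
void placeName(const char **slots, int capacity, const char *name) {
    int idx = hashThirdLetter(name, capacity);
    while (slots[idx] != NULL) idx = (idx + 1) % capacity;
    slots[idx] = name;
}

/* Returns 1 on success, 0 if allocation fails. */
int openHashInit(struct OpenHashTable *ht, int capacity) {
    ht->slots = makeSlots(capacity);
    ht->capacity = capacity;
    ht->count = 0;
    return ht->slots != NULL;
}

/* Create a table twice as large, rehash every element into it using the
   new size, then delete the old table. Returns 1 on success, 0 on failure. */
int openHashResize(struct OpenHashTable *ht) {
    int newCapacity = 2 * ht->capacity;
    const char **newSlots = makeSlots(newCapacity);
    if (newSlots == NULL) return 0;
    for (int i = 0; i < ht->capacity; i++) {
        if (ht->slots[i] != NULL) placeName(newSlots, newCapacity, ht->slots[i]);
    }
    free(ht->slots);
    ht->slots = newSlots;
    ht->capacity = newCapacity;
    return 1;
}

/* Add, resizing first if the new load factor n/m would exceed the limit. */
int openHashAdd(struct OpenHashTable *ht, const char *name) {
    if ((double)(ht->count + 1) / ht->capacity > MAX_LOAD_FACTOR) {
        if (!openHashResize(ht)) return 0;
    }
    placeName(ht->slots, ht->capacity, name);
    ht->count++;
    return 1;
}
```

Note that elements generally move during a resize: with 16 slots, Amina (third letter i, value 8) now hashes to index 8 instead of 0, which is why each element must be rehashed rather than copied to the same index.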
Hashing in Practice
• Need to find a good hash function that uniformly distributes keys to all indices
• Open address hashing:
– Need to tell if a position is empty or not
– One solution: store only pointers & check for null (== 0)
Your Turn
• Complete Worksheet #37: Open Address Hashing