part ii chapter 8 hashing introduction consider we may perform insertion, searching and deletion on...
TRANSCRIPT
![Page 1: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/1.jpg)
Part II
Chapter 8 Hashing
![Page 2: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/2.jpg)
IntroductionConsider we may perform insertion,
searching and deletion on a dictionary (symbol table).Array
Linked listTree
SortedNot
Sortedunbalanc
edbalanced
Insertion
O(n) / O(1) O(h) O(h) O(logk n)
Searching
O(log n) / O(1)
O(h) O(h) O(logk n)
Deletion
O(n) / O(1) O(h) O(h) O(logk n)Is it possible to perform these operations in O(1) ?
Is it possible to perform these operations in O(1) ?
![Page 3: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/3.jpg)
IntroductionIf we find a mapping from a key to an
index, then we can locate a record quickly according its key and perform random access.
S1S2S3…
012…
![Page 4: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/4.jpg)
IntroductionThis mapping can be illustrated as
follows:
Hashing: define a function h so that h(Key) = i, where h is called a hash function.
Two kindsStatic hashingDynamic hashing
hhKey
i
![Page 5: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/5.jpg)
8.2 Static Hashing
![Page 6: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/6.jpg)
DefinitionIn static hashing, identifiers/keys are
stored in table with a fixed size that is called hash table.
slot1 slot2
Bucket 0Bucket 1Bucket 2
Bucket n
Bucket: Each bucket has its
own address and is capable of holding a key.
hhx h(x)
Hash function
Identifier Bucket address
![Page 7: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/7.jpg)
DefinitionSlot: Each bucket may consists of s
slots to hold synonym (同義字 )i1 and i2 are synonyms if h(i1) = h(i2).
Distinct synonyms enter into the same bucket as long as the bucket has slots available.
![Page 8: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/8.jpg)
ExampleNumber of buckets:Number of slots for each
bucket:Define hashing function f(x)
f(x) = {i | i is the order of the initial of x}.
A and A2 are synonyms.GA and GB are synonyms.If “Doll” enters, it will be
put at buckect _______ (according to the hash function).
A A2
slot1 slot2
Bucket 0Bucket 1Bucket 2
Bucket 25
DBucket 3
GA GB
![Page 9: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/9.jpg)
Overflow and CollisionOverflow occurs when a new identifier is
mapped into a full bucket.Collision occurs when two non-identical
identifiers are hashed into the same bucket.If the number of slot is 1, then overflow and
collision occur simutaneously.
A A2
slot1 slot2
Bucket 0Bucket 1Bucket 2
If A3 enters bucket 0, A3 collides with A and A2. The bucket overflows as well.
![Page 10: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/10.jpg)
8.2.2 Hash FunctionsIdeally, we expect to find a hash
function that is one-to-one and easy to compute.
The hash function f(x) wheref(x) = {i | i is the order of the initial of x}.The hash function can result in a lot of
collisions because it only considers the initial character.
Key points: use every character in the identifier as possible.
![Page 11: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/11.jpg)
Common ApproachesDivisionMid-squareFoldingDigit Analysis
![Page 12: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/12.jpg)
DivisionThe most widely used hash functionThe key k is divided by some
number D, and the remainder is used as the bucket address.h(k) = k % DSince the bucket address is from 0 to b-1 if there are b buckets, D is usually selected as the number of buckets.
![Page 13: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/13.jpg)
Selecting The DivisorWhen the divisor is an even number, odd
integers hash into odd home buckets and even integers into even home buckets.
20%14 = 6, 30%14 = 2, 8%14 = 815%14 = 1, 3%14 = 3, 23%14 = 9
When the divisor is an odd number, odd (even) integers may hash into any home.
20%15 = 5, 30%15 = 0, 8%15 = 815%15 = 0, 3%15 = 3, 23%15 = 8
The bias in the keys does not result in a bias toward either the odd or even home buckets.
Better chance of uniformly distributed home buckets.
So do not use an even divisor.
![Page 14: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/14.jpg)
Selecting The DivisorSimilar biased distribution of home buckets is
seen, in practice, when the divisor is a multiple of prime numbers such as 3, 5, 7, …
The effect of each prime divisor p of b decreases as p gets larger.
Ideally, choose b so that it is a prime number.Alternatively, choose b so that it has no prime
factor smaller than 20.
![Page 15: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/15.jpg)
Mid-squareSquaring the key and then using an
appropriate number of bits from the middle of the square.
Example:Suppose a character is represented in 6 bits
and the bucket size is 2r.0 1 3 4
A 1
0 0 0 0 0 1 0 1 1 0 1 0 92
92x92=84640 1 0 0 0 0 0 1 0 0 0 01 0 0
r bits
![Page 16: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/16.jpg)
Mid-squareExample
Key = 113586, m =10000, where 9999 is the largest bucket address.
Squaring the key, and then we have
1 2 9 0 1 7 7 9 3 9 6
h(x) = 1779
![Page 17: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/17.jpg)
FoldingThe key k is partitioned into several parts,
all of the same length. These partitions are then added together to obtain the hash address of k.
Two schemesShift foldingFolding at the boundaries
1 2 3 2 0 3 2 4 1 1 1 2 2 0
P1 P2 P3 P4 P5
![Page 18: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/18.jpg)
P1
Folding
P2
P3
P4
P5
1 2 3
2 0 32 4 11 1 2 2 0
6 9 9
Shift folding
P1
P2
P3
P4
P5
1 2 3
3 0 22 4 12 1 1 2 0
8 9 7
Folding at the boundaries
![Page 19: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/19.jpg)
Overflow HandlingAn overflow occurs when the home bucket for a
new pair (key, element) is full.We may handle overflows by:
Search the hash table in some systematic fashion for a bucket that is not full.Linear probing (linear open addressing).Quadratic probing.Rehashing.
Eliminate overflows by permitting each bucket to keep a list of all pairs for which it is the bucket address.Array linear list.Chain.
![Page 20: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/20.jpg)
Linear ProbingAlso called linear opening addressing
Search one by one until a empty slot is found.Procedures: suppose b denotes the bucket
size.1.Compute h(k).2.Examine the hash table buckets in the order
ht[h(k)], ht[(h(k)+1)%b],…, ht[(h(k)+j)%b] until one of the following happens: ht[(h(k)+j)%b] has a pair whose key is k; k is
found. ht[(h(k)+j)%b] is empty; k is not in the table. Return to ht[h(k)]; the table is full.
![Page 21: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/21.jpg)
Linear Probing
divisor = b (number of buckets) = 17.Bucket address = key % 17.
0 4 8 12
16
• Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45
6 12
29
34 28
11
23 70 33
30
45
![Page 22: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/22.jpg)
Linear Probing0 4 8 1
2166 1
229
34 28
11
23 70 33
30
45
Consider: when 51 enters, how many comparisons are required?
Linear opening addressing tends to create “cluster”. These clusters become larger as more synonyms enter.
![Page 23: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/23.jpg)
Quadratic ProbingSuppose i is used as the increment.When overflow occurs, the search is carried
out by examining h(x), (h(x)+i2)%b, and (h(x)-i2)%b.For 1≦i ≦(b-1)/2 and b is a prime number of
4j+3.For example, b=3, 7, 11,…,43, 59..
![Page 24: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/24.jpg)
RehashingIf overflow occurs at hi(x), then try hi+1(x).
Use a series of hash function h1, h2, …, hm to find an empty bucket.
h1 h2 hmx hm(x
)
![Page 25: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree](https://reader035.vdocuments.us/reader035/viewer/2022062712/56649ca75503460f94969833/html5/thumbnails/25.jpg)
Chaining[0]
[4]
[8]
[12]
[16]
12
6
34
29
28
11
237
0
33
30
45
Disadvantage of linear probingComparison of
identifiers with different hash values.
Use linked list to connect the identifiers with the same hash value and to increase the capacity of a bucket.