CPSC 335
Intermediate Information Structures
Computer Science
University of Calgary
Canada
LECTURE 2
HASHING
Jon Rokne
Modified from Marina’s lectures
Outline
- Definition of Hashing
- Did you know that?
- Hash functions
- Collision Resolution
- Analysis of searching with Hash tables
Approaches to Search
1. Sequential and list methods (lists, tables, arrays).
2. Direct access by key value (hashing).
3. Tree indexing methods.
Introduction to Hashing
Definition
Hashing is the process of mapping a key value to a position in a table.
A hash function maps key values to positions.
A hash table is an array that holds the records. Using hashing, a record can be accessed in O(1) time on average, regardless of the number of records stored.
Examples of Usefulness
1. 10 stock items (SKUs). The SKU numbers are between 0 and 1000. Creating a table of all possible SKU numbers in order to index the SKU information would require 1000 storage locations for only 10 items, an obvious waste of memory.
2. Telephone lists. Your contact list holds 100 telephone numbers of the form xxx-xxx-xxxx (standard North American long distance codes). A table indexed by all of the possible numbers would require 10^10 entries.
Applications of Hashing
- Compilers use hash tables to keep track of declared variables.
- A hash table can be used for on-line spelling checkers: if misspelling detection (rather than correction) is important, an entire dictionary can be hashed and words checked in constant time.
- Game-playing programs use hash tables to store positions already seen, thereby saving computation time if a position is encountered again.
- Hash functions can be used to quickly check for inequality: if two elements hash to different values, they must be different. Can they be used to check for equality as well?
- Storing sparse data.
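The inequality check mentioned above can be sketched in a few lines of Python. The hash used here, key mod 11, is a hypothetical toy function chosen only for illustration. Different hash values prove the keys differ, but equal hash values are only a hint, so a full comparison is still needed to confirm equality:

```python
def toy_hash(key: int, table_size: int = 11) -> int:
    """Hypothetical toy hash, for illustration only: key mod table_size."""
    return key % table_size

def definitely_different(a: int, b: int) -> bool:
    """Different hash values guarantee different keys."""
    return toy_hash(a) != toy_hash(b)

def equal(a: int, b: int) -> bool:
    """Equal hash values are only a hint; confirm with a full comparison."""
    return toy_hash(a) == toy_hash(b) and a == b

print(definitely_different(23, 18))  # True: 23 -> 1, 18 -> 7
print(toy_hash(23) == toy_hash(12))  # True: both hash to 1 ...
print(equal(23, 12))                 # False: ... but the keys differ
```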
Did you know that?
- Cryptography was once known only to key people in the National Security Agency and a few academics.
- Until 1996, it was illegal to export strong cryptography from the United States.
- Fast forward to 2006: the Payment Card Industry Data Security Standard (PCI DSS) requires merchants to encrypt cardholder information. Visa and MasterCard can levy fines of up to $500,000 for not complying!
- Among the methods recommended are:
  - strong one-way hash functions (hashed indexes)
  - truncation
  - index tokens and pads (pads must be securely stored)
  - strong cryptography
  [Roger Nebel, Hashing for fun and profit: Demystifying encryption for PCI DSS]
Decrypted Secrets by F. L. Bauer is another good reference.
Did you know that?
- With the FIPS-compliant algorithms policy enabled, the Transport Layer Security (TLS) protocol uses the Rivest, Shamir, and Adleman (RSA) public key algorithm for the TLS key exchange and authentication, and only the Secure Hash Algorithm 1 (SHA-1) for the TLS hashing requirements.
[System cryptography: Use FIPS compliant algorithms for encryption, hashing, and signing, Microsoft TechNet, 2005]
Did you know that?
- Spatial hashing studies performed at Microsoft Research Redmond combine hashing with computer graphics to create a new set of tools for rendering, mesh reconstruction, and collision optimization (see the public poster by Sylvain Lefebvre and Hugues Hoppe on the next slide).
[Poster: "Perfect Spatial Hashing", Sylvain Lefebvre and Hugues Hoppe (Microsoft Research). "We design a perfect hash function to losslessly pack sparse data while retaining efficient random access." A point $p$ of the sparse domain is stored in the hash table $H$ at $h(p) = p + \Phi[p]$, where $\Phi$ is a small offset table and indices are taken modulo the table sizes. Highlights: perfect hash on multidimensional data; no collisions, ideal for the GPU; a single lookup into a small offset table; offsets of only about 4 bits per defined data entry; access in only about 4 instructions on the GPU; optimized spatial coherence. Applications shown (2D and 3D): vector images, sprite maps, alpha compression, 3D textures, 3D painting, simulation, and collision detection.]
Did you know that?
- Combining hashing and encryption provides a much stronger tool for database and password protection.
- http://msdn.microsoft.com/msdnmag/issues/03/08/SecurityBriefs/
[Security Briefs, MSDN Magazine]
How can I store passwords in a custom user database?
There are several options. The simplest might leave you with cleartext passwords. The following example is XML:

    <users>
      <user name='Alice' password='7&y2si(V1dX'/>
      <user name='Bob' password='mary'/>
      <user name='Fred' password='mary'/>
    </users>
After implementing something like this, you'll likely feel rather uncomfortable that all those passwords are sitting there in one file, in the clear. If you don't feel uncomfortable, you should!
The first approach you might take to protect these passwords is to encrypt them. That's better than nothing, but it's not the best solution. In order to validate a user's password, you need the encryption key, which means it needs to be available on the machine where the passwords are processed.
How can I store passwords in a custom user database?
A better solution that doesn't require any key at all is a one-way function!
- A cryptographic hash algorithm like SHA-1 or MD5 is a sophisticated one-way function: like an ordinary hash function, it takes some input and produces a hash value as output, but it is designed to be far more resistant to collisions. (Practical collision attacks have since been demonstrated against both MD5 and SHA-1.)
- It's incredibly unlikely that you'd find two messages that hash to the same value! As a one-way function, it can't be reversed, and there is no key that you need to store. You hash the password before storing it in the database:

    <users>
      <user name='Alice' password='D16E9B18FA038...'/>
      <user name='Bob' password='5665331B9B819...'/>
      <user name='Fred' password='5665331B9B819...'/>
    </users>

Now when you receive the cleartext password and need to verify it, you don't decrypt the stored password for comparison. Instead, you hash the password provided by the user and compare the result with your stored hash. If an attacker manages to steal your password database, he won't be able to use the passwords, as they can't be reversed back into cleartext.
Salt
- But look closely at Bob and Fred's hashed passwords. If the attacker happened to be Fred, he now knows that Bob uses the same password he does. What luck! Even without this sort of luck, a bad guy can perform a dictionary attack against the hashed passwords to find matches.
- The usual way a dictionary attack is performed is to get a list of commonly used passwords, like the lists you'll find at ftp://coast.cs.purdue.edu/pub/dict/wordlists, and calculate the hash for each. Now the attacker can compare the hash values of his dictionary with those in the password database. Once he finds a match, he looks up the corresponding password.
- To slow down the attack, use salt. Salt is a way to season the passwords before hashing them, making the attacker's precomputed dictionary useless. Here's how it's done. Whenever you add an entry to the database, you calculate a random string of characters to be used as salt. When you want to calculate the hash of Alice's password, you look up the salt value for Alice's account, prepend it to the password, and hash them together. The resulting database looks like this:

    <users>
      <user name='Alice' salt='Tu72*&' password='6DB80AE7...'/>
      <user name='Bob' salt='N5sb#X' password='096B1085...'/>
      <user name='Fred' salt='q-V3bi' password='9118812E...'/>
    </users>

- Note that now there is no way to tell that Bob and Fred are using the same password.
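The scheme above can be sketched with Python's standard hashlib, hmac and os modules. This is an illustration under stated assumptions, not the article's code: it swaps in SHA-256 for the SHA-1/MD5 mentioned earlier, uses 16 random bytes as salt, and reuses the user names and passwords from the XML examples; the helper names are hypothetical. Production systems generally prefer a deliberately slow key-derivation function (for example hashlib.pbkdf2_hmac) over a single hash.

```python
import hashlib
import hmac
import os

def plain_hash(password: str) -> str:
    """Unsalted SHA-256 of the password (the weak scheme)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def salted_hash(salt: bytes, password: str) -> str:
    """Prepend the per-user salt to the password and hash them together."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

def verify(salt: bytes, stored: str, candidate: str) -> bool:
    """Re-hash the candidate with the stored salt; compare in constant time."""
    return hmac.compare_digest(salted_hash(salt, candidate), stored)

accounts = [("Alice", "7&y2si(V1dX"), ("Bob", "mary"), ("Fred", "mary")]

# Unsalted: identical passwords give identical hashes, and a precomputed
# dictionary of common passwords recovers them by a simple lookup.
unsalted = {name: plain_hash(pw) for name, pw in accounts}
dictionary = {plain_hash(w): w for w in ["password", "123456", "mary"]}
cracked = {name: dictionary.get(h) for name, h in unsalted.items()}

# Salted: each user gets a fresh random salt, stored next to the hash.
salted = {}
for name, pw in accounts:
    salt = os.urandom(16)
    salted[name] = (salt, salted_hash(salt, pw))

print(unsalted["Bob"] == unsalted["Fred"])    # True
print(cracked["Bob"], cracked["Alice"])       # mary None
print(salted["Bob"][1] == salted["Fred"][1])  # False (with overwhelming probability)
print(verify(*salted["Bob"], "mary"))         # True
```

The salt does not stop a determined attacker from hashing his dictionary once per salt; it only removes the benefit of a single precomputed dictionary, which is why the text says salt slows the attack down.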
Salt: example of usage
Below is a C# example of using a hash library [Keith Brown, Hashing Passwords, The AllowPartiallyTrustedCallers Attribute]. SaltedHash is not a .NET Framework class; the example assumes the helper class from the cited article:

    using System;  // Console
    string password = Console.ReadLine();
    SaltedHash sh = SaltedHash.Create(password);
    // imagine storing the salt and hash in a database
    string salt = sh.Salt;
    string hash = sh.Hash;
    Console.WriteLine("Salt: {0}", salt);
    Console.WriteLine("Hash: {0}", hash);
    // after looking up salt and hash, verify a password
    SaltedHash ver = SaltedHash.Create(salt, hash);
    bool isValid = ver.Verify(password);
Hash Functions
- Hashing is the process of chopping up the key and mixing it up in various ways to obtain an index that is uniformly distributed over the range of indices, hence the name "hashing". Common ways of doing this are:
  - Truncation
  - Folding
  - Modular arithmetic
Hash Functions – Truncation
- Truncation is a method in which parts of the key are ignored and the remaining portion becomes the index: the hash location is produced by taking portions of the given key (truncating the key).
- Example: if a hash table can hold 1000 entries and an 8-digit number is used as the key, the 3rd, 5th and 7th digits from the left of the key could be used to produce the index. For instance, the key 62538194 gives the hash location 589.
- Advantage: simple and easy to implement.
- Problems: clustering and repetition; keys that agree in the selected digits always collide, however different their other digits are.
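A minimal Python sketch of the truncation example, assuming 8-digit non-negative integer keys and the digit positions used above (3rd, 5th, 7th from the left); the function name is hypothetical:

```python
def truncation_hash(key: int, positions=(3, 5, 7)) -> int:
    """Keep only the chosen digits (1-based, from the left) of an 8-digit key."""
    digits = str(key).zfill(8)  # pad so every key has exactly 8 digits
    return int("".join(digits[p - 1] for p in positions))

print(truncation_hash(62538194))  # 589
# Keys that agree in digits 3, 5 and 7 always collide:
print(truncation_hash(11538192) == truncation_hash(62538194))  # True
```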
Hash Functions – Folding
- Folding breaks the key into several parts and combines the parts to form an index.
  - The parts may be recombined by addition, subtraction or multiplication, and the result may have to be truncated as well.
  - Such a process is usually better than truncation by itself, since it produces a better distribution: all of the digits in the key are considered.
  - Breaking the key 62538194 into 3 numbers (the first 3 digits, the next 3, and the last 2) produces 625, 381 and 94. These could be added to get 1100, which could be truncated to 100.
  - They could also be multiplied together, and three digits chosen from the middle of the product.
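The additive version of the folding example can be sketched as follows, assuming the decimal key is split into 3-digit groups from the left and the sum is truncated to three digits (taken modulo 1000); the function name is hypothetical:

```python
def fold_hash(key: int, group: int = 3, table_size: int = 1000) -> int:
    """Split the decimal key into groups of `group` digits, add them, keep the low digits."""
    s = str(key)
    parts = [int(s[i:i + group]) for i in range(0, len(s), group)]  # 625, 381, 94
    return sum(parts) % table_size  # truncate the sum to fit the table

print(fold_hash(62538194))  # 625 + 381 + 94 = 1100 -> 100
```

Unlike truncation of digits 3, 5 and 7, changing any digit of the key (for example the last one, giving 62538195 -> 101) changes the folded value.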
Hash Functions – Modular Arithmetic
- Modular arithmetic ensures that the index produced is within a specified range. The key is converted to an integer, which is divided by the size of the index range; the remainder is the hash value. Uses: biometrics, encryption, compression.
  - If the modulus is a prime number, the distribution of indices obtained tends to be quite uniform.
  - A table size with many factors makes it likely that many keys produce the same index (keys that share a factor with the table size can only reach slots that are multiples of that factor), so the size should be a prime number.
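A small Python experiment illustrating why a prime table size helps, assuming integer keys that share a common factor (multiples of 10, chosen only for illustration). String keys are first converted to an integer; using their UTF-8 bytes is one possible choice, not the lecture's, and the function name is hypothetical:

```python
def mod_hash(key, table_size: int) -> int:
    """Convert the key to an integer if needed, then keep the remainder."""
    if isinstance(key, str):
        key = int.from_bytes(key.encode("utf-8"), "big")
    return key % table_size

keys = list(range(0, 200, 10))             # 20 keys, all multiples of 10
used_10 = {mod_hash(k, 10) for k in keys}  # table size with factors 2 and 5
used_11 = {mod_hash(k, 11) for k in keys}  # prime table size

print(len(used_10))  # 1  (every key lands in slot 0)
print(len(used_11))  # 11 (all slots are used)
```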
Good Hash Functions
- Hash functions that use all of the key are almost always better than those that use only some of it.
  - When only portions are used, information is lost, and therefore the number of possible hash values is reduced.
  - If we work with the integer in its binary form, the number of pieces that the hash function can manipulate is greatly increased.
Collision
- Since there are usually far more possible keys than table positions, no matter what function is used, it may produce an index that duplicates an index already in use. This is a collision. Collision resolution strategies:
  - Open addressing: store the key/entry in a different position.
  - Chaining: chain together several keys/entries in each position.
Collision Resolution
Collision – Example
- Hash table size: 11
- Hash function: key mod hash size
Example sequence of keys: 23, 18, 29, 28, 39, 13, 16, 12, 17. For example, 23 mod 11 = 1, i.e. position 1 in the hash table. The positions hashed to in the hash table for the 9 items are then 1, 7, 7, 6, 6, 2, 5, 1, 6. There are collisions at positions 1, 6 and 7 with this hash function.
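The home positions and collisions for this example can be checked with a short Python sketch (key mod 11, grouping keys by slot). It shows that 23 and 12 also share position 1, in addition to the collisions at positions 6 and 7:

```python
from collections import defaultdict

TABLE_SIZE = 11
keys = [23, 18, 29, 28, 39, 13, 16, 12, 17]

home = defaultdict(list)  # slot -> keys hashing to it, in insertion order
for k in keys:
    home[k % TABLE_SIZE].append(k)

collisions = {slot: ks for slot, ks in sorted(home.items()) if len(ks) > 1}
print(collisions)  # {1: [23, 12], 6: [28, 39, 17], 7: [18, 29]}
```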
Collision Resolution – Open Addressing
- Open addressing resolves a collision by taking the next open space, as determined by rehashing the key according to some algorithm.
- The two main open addressing collision resolution techniques are:
  - Linear probing: increase the position by 1 each time (mod table size!).
  - Quadratic probing: add 1, 4, 9, 16, … to the original position.
  In some cases a key-dependent increment technique is used as well.
Probing: if the table position given by the hashed key is already occupied, increase the position by some amount until an empty position is found.
Collision Resolution – Open Addressing
Linear Probing
new position = (collision position + j) MOD hash size, j = 1, 2, 3, …
(where the collision position is the position determined by the initial hash of the key). Example sequence of keys: 23, 18, 29, 28, 39, 13, 16, 12, 17. Before linear probing the keys hash to positions 1, 7, 7, 6, 6, 2, 5, 1, 6; after linear probing, positions 0 to 10 of the table hold: empty, 23, 13, 12, empty, 16, 28, 18, 29, 39, 17.
Problem: clustering occurs, that is, the used spaces tend to appear in groups, which tend to grow and thus increase the search time to reach an open space.
Example: 39 mod 11 = 6, (39+1) mod 11 = 7, (39+2) mod 11 = 8, (39+3) mod 11 = 9 (positions 6, 7 and 8 are already used, so 39 goes to position 9).
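The linear probing rule can be sketched in Python as follows (the function name is hypothetical). It inserts the example keys 23, 18, 29, 28, 39, 13, 16, 12, 17 into a table of size 11; j = 0 is the home slot itself, and j = 1, 2, … are the probes from the formula:

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using linear probing; return the slot used."""
    m = len(table)
    start = key % m                 # initial (home) position
    for j in range(m):              # j = 0 is the home slot itself
        pos = (start + j) % m
        if table[pos] is None:
            table[pos] = key
            return pos
    raise OverflowError("hash table is full")

table = [None] * 11
slots = {k: linear_probe_insert(table, k) for k in [23, 18, 29, 28, 39, 13, 16, 12, 17]}
print(slots[39])  # 9
print(table)      # [None, 23, 13, 12, None, 16, 28, 18, 29, 39, 17]
```

Positions 5 to 10 end up as one unbroken run of occupied slots, an instance of the clustering problem: 17 has to walk past five occupied slots before it finds position 10.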
Collision Resolution – Open Addressing
- To try to avoid clustering, a method that does not simply take the first open space must be used.
- Two common methods are used:
  - Quadratic probing
  - Key-dependent increments
Collision Resolution – Open Addressing
Quadratic Probing
new position = (collision position + j^2) MOD hash size, j = 1, 2, 3, 4, …
Example: for the same keys, after quadratic probing positions 0 to 10 of the table hold: 17, 23, 13, empty, 12, 16, 28, 18, 29, empty, 39.
Problem: overflow may occur even when there is still space in the hash table.
Example: 39 mod 11 = 6, (39+1) mod 11 = 7, (39+4) mod 11 = 10 (positions 6 and 7 are already used, so 39 goes to position 10).
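A Python sketch of quadratic probing on the same keys and table size 11 (the function name is hypothetical). Because (start + j^2) MOD m repeats with period m, trying j = 0, …, m-1 covers every slot the method can ever reach:

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key using quadratic probing; return the slot used."""
    m = len(table)
    start = key % m
    # (start + j*j) % m repeats with period m, so j = 0..m-1 covers every reachable slot.
    for j in range(m):
        pos = (start + j * j) % m
        if table[pos] is None:
            table[pos] = key
            return pos
    raise OverflowError(f"no free slot reachable for key {key}")

table = [None] * 11
slots = {k: quadratic_probe_insert(table, k) for k in [23, 18, 29, 28, 39, 13, 16, 12, 17]}
print(slots[39])  # 10
print(table)      # [17, 23, 13, None, 12, 16, 28, 18, 29, None, 39]
```

Here 12 is placed only on its sixth probe: positions 1, 2, 5, 10 and 6 are all taken before position 4 is found.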
Collision Resolution – Open Addressing
Quadratic Probing
new position = (collision position + j^2) MOD hash size
Example: given the values {23, 34, 29, 40, 22, 397, 20}, a hash table of size 7, and the hash function h(x) = x mod 7, prove that a hash table for these values, inserted in the above order, cannot be built using quadratic probing.
Before quadratic probing the values hash to positions 2, 6, 1, 5, 1, 5, 6. After quadratic probing, 22 goes to position 3 and 397 to position 0, but 20 cannot be placed, although position 4 is still empty.
Quadratic probing problem.
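Compute (20 + j^2) MOD 7 for j = 0, 1, 2, … The result is never 4, hence the item can never be placed. Subtracting 21 (a multiple of 7) gives:

$$(20 + j^2) \bmod 7 = (j^2 - 1) \bmod 7 = \big((j-1)(j+1)\big) \bmod 7$$

Now look at $j + 7k$ for any integer $k$:

$$\big((j + 7k)^2 - 1\big) \bmod 7 = \big(j^2 + 2 \cdot 7jk + 7^2k^2 - 1\big) \bmod 7 = (j^2 - 1) \bmod 7$$

since $2 \cdot 7jk + 7^2k^2$ is a multiple of 7. Hence the probe sequence repeats after 7 steps, so it suffices to check $j = 0, 1, \dots, 6$, which give positions 6, 0, 3, 1, 1, 3, 0: position 4 is never probed.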
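However, the values can be placed if the order is rearranged: the sequence 20, 397, 22, 40, 29, 34, 23 (the original order reversed) can be hashed using x MOD 7 with quadratic probing, filling positions 0 to 6 with 34, 22, 40, 29, 23, 397, 20.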
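The proof and the rearranged order can be checked mechanically. A Python sketch, assuming the insertion rule (x MOD 7 + j^2) MOD 7 with j = 0, …, 6 (the sequence repeats after 7 steps, so more probes cannot help); the function names are hypothetical:

```python
def quadratic_insert(table: list, key: int):
    """Quadratic probing insert; return the slot, or None if no probe finds space."""
    m = len(table)
    for j in range(m):                  # probes repeat after m steps
        pos = (key % m + j * j) % m
        if table[pos] is None:
            table[pos] = key
            return pos
    return None

def build(keys, m=7):
    """Insert keys in order; return the table and the keys that could not be placed."""
    table, failed = [None] * m, []
    for k in keys:
        if quadratic_insert(table, k) is None:
            failed.append(k)
    return table, failed

print(sorted({(20 + j * j) % 7 for j in range(7)}))  # [0, 1, 3, 6]: slot 4 is never probed

print(build([23, 34, 29, 40, 22, 397, 20]))
# ([397, 29, 23, 22, None, 40, 34], [20])
print(build([20, 397, 22, 40, 29, 34, 23]))
# ([34, 22, 40, 29, 23, 397, 20], [])
```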
Collision Resolution – Open Addressing
Key-dependent Increments
- This technique is used to solve the overflow problem of the quadratic probing method.
- These increments vary according to the key used for the hash function. If the original hash function results in a good distribution, then key-dependent functions work quite well for rehashing, and all locations in the table will eventually be probed for a free position.
- Key-dependent increments are determined by using the key to calculate a new value and then using this as an increment to determine successive probes.
Collision Resolution – Open Addressing
Key-dependent Increments
For example, since the original hash function was key MOD 11, we might choose key DIV 11 as the increment. Thus the rehash function becomes:
new position = (collision position + j * (key DIV 11)) MOD 11, j = 1, 2, 3, …
Example: in the table of positions before and after key-dependent increments, the column "Modified position" should read "Position modifier" (the increment), and "New position" is the final position. For example, 42 starts out in position 9 (42 MOD 11), and then 3 (42 DIV 11) is added repeatedly. The process is sequential: compute the slot for each key and, if it is filled, apply the key-dependent increment.
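A Python sketch of key-dependent increments using the rule above (home position key MOD 11, increment key DIV 11). The fallback to an increment of 1 when key DIV 11 is a multiple of 11 (which includes every key below 11) is a hypothetical choice added for illustration, not part of the lecture; the function names are also illustrative:

```python
def key_dependent_probes(key: int, m: int = 11) -> list:
    """Positions probed for key: (home + j * (key DIV m)) MOD m for j = 0..m-1."""
    home = key % m
    step = (key // m) % m
    if step == 0:
        # Hypothetical fallback: otherwise the increment would be 0 (mod m)
        # and the same slot would be probed forever.
        step = 1
    return [(home + j * step) % m for j in range(m)]

def key_dependent_insert(table: list, key: int) -> int:
    """Insert key into the first free slot along its key-dependent probe sequence."""
    for pos in key_dependent_probes(key, len(table)):
        if table[pos] is None:
            table[pos] = key
            return pos
    raise OverflowError("hash table is full")

print(key_dependent_probes(42))  # [9, 1, 4, 7, 10, 2, 5, 8, 0, 3, 6]
```

With a prime table size and a nonzero increment, the probe sequence for 42 visits all 11 positions exactly once.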
Collision Resolution – Open Addressing
Key-dependent Increments
- In all of the closed hashing (open addressing) functions it is important to ensure that an increment of 0 does not arise.
  - If the increment is equal to the hash size, the same position will be probed all the time, so this value cannot be used either.
- If the hash size is prime, the divisors used by the hash and rehash functions are prime, and the rehash function never produces an increment of 0 (or a multiple of the hash size), then this method will access all positions, as linear probing does.
  - Using a key-dependent method usually reduces clustering, and therefore searches for an empty position should not be as long as with the linear method.
Collision Resolution – Chaining
- Each table position is a linked list.
- Add the keys and entries anywhere in the list (the front is easiest).
Advantages over open addressing:
- Simpler insertion and removal.
- Array size is not a limitation (but collisions should still be minimized: make the table size roughly equal to the expected number of keys and entries).
Disadvantage:
- Memory overhead is large if entries are small.
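A minimal Python sketch of chaining, assuming a table of size 11, the example keys used earlier, and collections.deque standing in for a hand-written linked list (appendleft adds at the front in O(1)). The class and method names are hypothetical:

```python
from collections import deque

class ChainedHashTable:
    """Separate chaining: each slot holds a chain of keys (a deque here)."""

    def __init__(self, size: int = 11):
        self.buckets = [deque() for _ in range(size)]

    def _chain(self, key: int) -> deque:
        return self.buckets[key % len(self.buckets)]

    def insert(self, key: int) -> None:
        chain = self._chain(key)
        if key not in chain:
            chain.appendleft(key)  # add at the front, the easiest place

    def contains(self, key: int) -> bool:
        return key in self._chain(key)

    def remove(self, key: int) -> None:
        chain = self._chain(key)
        if key in chain:
            chain.remove(key)

    def load_factor(self) -> float:
        return sum(len(c) for c in self.buckets) / len(self.buckets)

t = ChainedHashTable(11)
for k in [23, 18, 29, 28, 39, 13, 16, 12, 17]:
    t.insert(k)
print(list(t.buckets[6]))  # [17, 39, 28]
print(t.load_factor())     # about 0.818 (9 keys / 11 slots)
```

Removal only touches one short chain, which is the "simpler insertion and removal" advantage; the price is one deque (and its per-node overhead) per slot.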
Collision Resolution – Chaining Example:
Before chaining: After chaining:
Hash Table Collisions
For a given set of data, one hashing function may distribute the keys more evenly over the address space than another. A hashing function that has a large number of collisions is said to exhibit primary clustering. It is better to have a slightly more expensive hashing function for data that need to be stored on auxiliary storage, since each collision there can cost an extra slow disk access. Another method for reducing collisions is reducing the load factor:
Load factor = (number of records stored) / (total number of storage locations).
- In analyzing search efficiency, the average is usually used. Searching with hash tables is highly dependent on how full the table is, since more rehashes are necessary as the table approaches a full state. The proportion of the table that is full is the load factor from the previous slide.
  - When collisions are resolved using open addressing, the maximum load factor is 1.
  - Using chaining, however, the load factor can be greater than 1, when the table is full and the linked lists attached to the hash addresses have more than one element.
- Chaining consistently requires fewer probes than open addressing.
  - Traversal of the linked list is slow, and if the records are small, it may be just as well to use open addressing.
  - Chaining is best under two conditions: when the number of unsuccessful searches is large, or when the records are large.
  - Open addressing would likely be a reasonable choice when most searches are likely to be successful, the load factor is moderate and the records are relatively small.
Analysis of Searching using Hash Tables
Average number of probes for different collision resolution methods:
[The values are for large hash tables, in this case larger than 430.]
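To see how such averages depend on the load factor α, the following Python sketch evaluates the standard textbook approximations (from Knuth's analysis) for a large table, assuming uniform hashing. "Random probing" is the idealized model usually used to approximate quadratic probing and key-dependent increments, and chaining counts the entries examined. These formulas are added here as an illustration and may not reproduce the slide's numbers exactly; the function name is hypothetical.

```python
import math

def expected_probes(alpha: float) -> dict:
    """Approximate average probes for a large table with load factor alpha.

    Chaining also works for alpha >= 1, but the open-addressing formulas
    need 0 < alpha < 1, so this sketch restricts alpha to that range.
    """
    if not 0 < alpha < 1:
        raise ValueError("need 0 < alpha < 1")
    return {
        "chaining, successful": 1 + alpha / 2,
        "chaining, unsuccessful": alpha,
        "linear probing, successful": 0.5 * (1 + 1 / (1 - alpha)),
        "linear probing, unsuccessful": 0.5 * (1 + 1 / (1 - alpha) ** 2),
        "random probing, successful": (1 / alpha) * math.log(1 / (1 - alpha)),
        "random probing, unsuccessful": 1 / (1 - alpha),
    }

for alpha in (0.5, 0.9):
    for method, probes in expected_probes(alpha).items():
        print(f"alpha={alpha}: {method}: {probes:.2f}")
```

At α = 0.9 the linear-probing estimate for an unsuccessful search is about 50 probes, against fewer than one entry examined with chaining, in line with the observation that chaining requires fewer probes.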
Another hash function
A perfect hash function
HASH FUNCTION SECURITY
Three definitions are from: Cryptography … by Stinson
When are other representations more suitable than hashing?
- Hash tables are very good if there is a need for many searches in a reasonably stable table.
- Hash tables are not so good if there are many insertions and deletions, or if table traversals are needed; in this case, AVL trees are better (AVL is named after G. M. Adelson-Velskii and E. M. Landis).
- If there are more data than available memory, use a B-tree.
- Also, hashing is very slow for any operations that require the entries to be sorted, e.g. finding the minimum key.
Links for interactive hashing examples:
- http://www.engin.umd.umich.edu/CIS/course.des/cis350/hashing/WEB/HashApplet.htm
- http://www.cs.auckland.ac.nz/software/AlgAnim/hash_tables.html
- http://www.cse.yorku.ca/~aaw/Hang/hash/Hash.html
- http://www.cs.pitt.edu/~kirk/cs1501/animations/Hashing.html
Some Links to Hashing Animation