CIS 280 Hashing and Hash Tables

Upload: jayson-robertson

Post on 02-Jan-2016


Page 1: CIS 280 Hashing and Hash Tables. Calendar Today: Hashing and Hash Tables Wednesday: Calculus due, work Friday: Cuckoo hashing and linear hashing Monday:

CIS 280

Hashing and Hash Tables

Page 2:

Calendar

• Today: Hashing and Hash Tables
• Wednesday: Calculus due, work
• Friday: Cuckoo hashing and linear hashing
• Monday: Hashing wrapup
• Wednesday: Work
• Friday: Review, final project due
• Wednesday, December 16, 8am: Final exam

Page 3:

Calculus

• Questions?
• Work an example: x^2 + 3*x - 4

Page 4:

Hashing

Basic idea: create an integer “digest” that identifies a complex value.

• Speeds up equality checks: compare hash codes first, and compare the full values only when the codes match. Example: MD5

• Use as a pseudo array index to create a hash table

An array maps well-organized data (integer ranges) to values.

A hash table maps irregular data to values.
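The "check hash codes first" idea can be sketched with the MD5 example named above. This is a minimal illustration, not part of the slides: the class name DigestEquality and its two methods are hypothetical, and it assumes the digests are computed once and stored next to the values, so later comparisons start with 16 bytes instead of the whole value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compares two strings by digest first.
final class DigestEquality {
    // Returns the 16-byte MD5 digest of the UTF-8 bytes of s.
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available in this runtime", e);
        }
    }

    // Different digests prove the values differ. Equal digests are only
    // strong evidence of equality, so the full comparison still runs.
    static boolean sameContent(String a, byte[] digestA, String b, byte[] digestB) {
        if (!MessageDigest.isEqual(digestA, digestB)) {
            return false;
        }
        return a.equals(b);
    }
}
```

The saving comes from the common case where the values differ: the digest mismatch settles it without reading either value in full.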

Page 5:

Two Different Problems

• Hash functions: mapping data to integers
• Hash tables: mapping hash keys to values in a sparse way (similar to sparse arrays).

http://en.wikipedia.org/wiki/Hash_function

A hash is a sort of fingerprint for a complex object.

Page 6:

Hash Functions

Goals:
• Quick to compute (linear time in the size of the object)
• Random behavior: closely related objects should map to non-related keys

We’ll simplify by hashing only strings, though any structure can be hashed.

Page 7:

String Hashing

Input: a string of characters (bytes), str
Output: a number between 0 and n-1
Use a local variable, state, to hold a “state” that condenses the string, and F, a combining function:

int state = 0;
for (int i = 0; i < str.length(); i++) {
    state = F((int) str.charAt(i), state);
}
return state % n;

Page 8:

Choosing F

• Bad choice (checksum): + (why?)
• Better choices: functions involving bitwise operators (they run fast!), prime numbers (why?)
• Assume arithmetic operators don’t overflow (in Java, int arithmetic silently wraps around mod 2^32!)
• Example: F(v, s) = v*31+s
• There are a lot of ways to choose F; check Wikipedia for some examples.
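To make the checksum question concrete, here is a small sketch that runs the string-hashing loop with two combining functions. The names StringHashDemo, CHECKSUM and TIMES_31 and the table size 1009 are assumptions for illustration. TIMES_31 multiplies the running state (s*31 + v), which is the recurrence behind Java's String.hashCode; that operand order is a choice made here, not the slide's exact formula.

```java
import java.util.function.IntBinaryOperator;

final class StringHashDemo {
    // The loop from the String Hashing slide. F receives
    // (character value, current state), matching F(v, s).
    static int hash(String str, int n, IntBinaryOperator f) {
        int state = 0;
        for (int i = 0; i < str.length(); i++) {
            state = f.applyAsInt((int) str.charAt(i), state);
        }
        // floorMod keeps the result in 0..n-1 even when int
        // arithmetic has wrapped around to a negative value.
        return Math.floorMod(state, n);
    }

    // Checksum: plain addition, so character order is ignored.
    static final IntBinaryOperator CHECKSUM = (v, s) -> v + s;

    // Assumed variant: multiply the running state by the prime 31,
    // then add the character, so position affects the result.
    static final IntBinaryOperator TIMES_31 = (v, s) -> s * 31 + v;

    public static void main(String[] args) {
        int n = 1009; // hypothetical table size
        for (String w : new String[] {"listen", "silent"}) {
            System.out.println(w + " checksum=" + hash(w, n, CHECKSUM)
                    + " times31=" + hash(w, n, TIMES_31));
        }
    }
}
```

Anagrams such as "listen" and "silent" always collide under the checksum because addition is commutative; under the multiplying variant these two land in different slots.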

Page 9:

Example

Suppose A = 1, B = 2, …, what is the hash of:
• “ABC”
• “ACB”
• “CB”
where F(c, s) = 5 * c + s?
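One way to check the exercise is to run it. The sketch below takes F exactly as written, starts the state at 0, and skips the final "% n" because the slide gives no n. The class and method names are hypothetical, and the second method (multiplying the state instead of the character) is an assumed variant included only for comparison.

```java
final class SlideExample {
    // Letter values from the slide: A = 1, B = 2, ...
    static int value(char c) {
        return c - 'A' + 1;
    }

    // F(c, s) = 5 * c + s, exactly as written on the slide.
    static int asWritten(String str) {
        int state = 0;
        for (int i = 0; i < str.length(); i++) {
            state = 5 * value(str.charAt(i)) + state;
        }
        return state;
    }

    // Assumed variant for comparison: F(c, s) = 5 * s + c.
    static int stateMultiplied(String str) {
        int state = 0;
        for (int i = 0; i < str.length(); i++) {
            state = 5 * state + value(str.charAt(i));
        }
        return state;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"ABC", "ACB", "CB"}) {
            System.out.println(w + ": asWritten=" + asWritten(w)
                    + " stateMultiplied=" + stateMultiplied(w));
        }
        // ABC: asWritten=30 stateMultiplied=38
        // ACB: asWritten=30 stateMultiplied=42
        // CB:  asWritten=25 stateMultiplied=17
    }
}
```

Taken literally, F adds 5*c at every step, so the result is five times the plain checksum and “ABC” and “ACB” collide. Multiplying the state instead makes the result depend on character order.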

Page 10:

Hash Tables

The following interface describes a hash table:

public interface HashTable<Key, Value> extends Iterable<Value> {
    public Value get(Key k); // null if not found
    public void set(Key k, Value v);
    public int size();
}

Page 11:

Hash Tables

Let’s simplify by assuming that the Key is a string:

public interface HashTable<Value> extends Iterable<Value> {
    public Value get(String s);
    public void set(String s, Value v);
    public int size();
}
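To see what the interface promises before any real table exists, here is a throwaway stand-in backed by java.util.HashMap. The class name MapBackedTable is hypothetical and this is not the implementation the course builds; it only shows the expected behavior of get, set, size, and iteration.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// The string-keyed interface, repeated so the sketch compiles on its own.
interface HashTable<Value> extends Iterable<Value> {
    Value get(String s);          // null if not found
    void set(String s, Value v);
    int size();
}

// Hypothetical stand-in: delegates everything to java.util.HashMap.
final class MapBackedTable<Value> implements HashTable<Value> {
    private final Map<String, Value> map = new HashMap<>();

    @Override
    public Value get(String s) {
        return map.get(s); // HashMap also returns null for a missing key
    }

    @Override
    public void set(String s, Value v) {
        map.put(s, v); // setting an existing key replaces its value
    }

    @Override
    public int size() {
        return map.size();
    }

    @Override
    public Iterator<Value> iterator() {
        return map.values().iterator();
    }
}
```

Setting the same key twice leaves size() at 1 and get returns the later value. The sketch assumes that replacement behavior, which the interface itself does not spell out.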

Page 12:

Caching the Hash

public class Hashed<Value> {
    public Value value;
    public int hash;
}

Note that we can remove the Value if we only hash strings.

How would you compare hashed values for equality?
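One possible answer, sketched under assumptions: compare the cached integers first and fall back to a full comparison only when they agree. The constructor and the method name sameAs are additions for illustration; the slide's class has only the two fields.

```java
import java.util.Objects;

final class Hashed<Value> {
    public final Value value;
    public final int hash;

    // Assumed constructor: the caller supplies the precomputed hash.
    Hashed(Value value, int hash) {
        this.value = value;
        this.hash = hash;
    }

    // Cheap int comparison first. Equal hashes do not prove equal
    // values (collisions exist), so the values are compared as well.
    boolean sameAs(Hashed<Value> other) {
        return hash == other.hash && Objects.equals(value, other.value);
    }
}
```

The second comparison matters: in Java, "Aa" and "BB" both have hashCode 2112, so a hash-only check would wrongly report them equal.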

Page 13:

Factoring Out The Hasher

public interface HashFunction {
    public HashedString hash(String s);
}

Every hash table needs to keep a hash function around.

We can create families of such functions by parameterizing them.
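A family of hash functions might look like the sketch below, where the multiplier is the parameter. HashedString is not defined in the slides shown so far, so its shape here (the string plus its cached hash) is an assumption, as is the class name MultiplierHash.

```java
// Assumed shape for HashedString: the string and its cached hash.
final class HashedString {
    public final String value;
    public final int hash;

    HashedString(String value, int hash) {
        this.value = value;
        this.hash = hash;
    }
}

interface HashFunction {
    HashedString hash(String s);
}

// One family: state = state * multiplier + character.
// Each multiplier yields a different member of the family.
final class MultiplierHash implements HashFunction {
    private final int multiplier;

    MultiplierHash(int multiplier) {
        this.multiplier = multiplier;
    }

    @Override
    public HashedString hash(String s) {
        int state = 0;
        for (int i = 0; i < s.length(); i++) {
            state = state * multiplier + s.charAt(i); // int arithmetic may wrap
        }
        return new HashedString(s, state);
    }
}
```

With multiplier 31 this reproduces Java's String.hashCode; new MultiplierHash(37) gives a different function over the same strings. A table that keeps its HashFunction in a field can switch members without changing any other code.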

Page 14:

A Naïve Hashtable

Instead of using hash codes to organize the table, we’ll use direct searches.

import java.util.ArrayList;

public class SimpleHashTable<Value> implements HashTable<Value> {
    // Entries are stored in a plain list and found by direct search.
    public ArrayList<Pair<HashedString, Value>> values = new ArrayList<Pair<HashedString, Value>>();
    public HashFunction f;

    public SimpleHashTable(HashFunction f) { this.f = f; }

    public Value get(String s) {