Sets, Maps and Hash Tables

Upload: marcia-jenkins

Post on 30-Dec-2015



Page 1: Sets, Maps and Hash Tables

Sets, Maps and Hash Tables

RHS – SWC 2

Sets

• We have learned that different data structures have different advantages – and drawbacks

• Choosing the proper data structure depends on typical usage patterns

• Array- and list-oriented data structures are appropriate when the order of elements matters – but that is not always the case


Sets

• A Set is a data structure which can hold an unordered collection of elements

• Not having to worry about ordering can improve performance of other operations

• On a Set, we want to be able to
  – Insert an element
  – Delete an element
  – Check if a given element is in the Set


Sets

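import java.util.Iterator;

public interface Set<T>
{
    // Inserts the element into the Set
    void add(T element);

    // Deletes the given element from the Set
    void remove(T element);

    // Checks whether the given element is in the Set
    boolean contains(T element);

    // Returns an iterator over all elements, in no particular order
    Iterator<T> iterator();
}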


Sets

• It turns out that insertion, deletion and check for containment can be done in O(log(n)), or even faster!

• Depends on the underlying implementation of the interface

• In Java, the implementation is either
  – HashSet (based on Hash Tables)
  – TreeSet (based on Trees)
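
As a small illustration of the two implementations, the sketch below runs the same insert, delete and containment operations on a HashSet and a TreeSet from java.util. The class name SetBasicsDemo and the method exercise are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetBasicsDemo {
    // Runs the basic Set operations and returns the remaining elements
    // in the order in which the Set's iterator produces them.
    static List<String> exercise(Set<String> set) {
        set.add("Volvo");
        set.add("Audi");
        set.add("Ford");
        set.add("Audi");       // duplicate: the Set is left unchanged
        set.remove("Ford");
        List<String> seen = new ArrayList<>();
        if (set.contains("Audi")) {
            for (String s : set) {
                seen.add(s);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(exercise(new HashSet<>())); // iteration order is not specified
        System.out.println(exercise(new TreeSet<>())); // sorted order: [Audi, Volvo]
    }
}
```

The TreeSet keeps its elements sorted, which is one example of the extra functionality a tree-based implementation can offer.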


Sets

• A Set iterator is "simpler" than e.g. a List iterator
  – Elements will occur in "random" order
  – No add method – we just call add on the Set itself
  – No previous method – does not make sense

• The Set iterator does however have a remove method (why?)
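
One possible answer, sketched with java.util: removing through the iterator is the safe way to delete elements while iterating. Calling remove on the Set itself inside the loop will typically fail with a ConcurrentModificationException. The class and method names below are made up for this example.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class IteratorRemoveDemo {
    // Deletes all odd numbers from the Set while iterating over it.
    // Returns the number of deleted elements.
    static int removeOdd(Set<Integer> numbers) {
        int removed = 0;
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {
            int n = it.next();
            if (n % 2 != 0) {
                it.remove();   // removes the element last returned by next()
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Set<Integer> numbers = new HashSet<>();
        for (int i = 1; i <= 5; i++) {
            numbers.add(i);
        }
        System.out.println(removeOdd(numbers)); // prints 3
        System.out.println(numbers.size());     // prints 2
    }
}
```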


Sets – Quality tip

• When using a Set, we must choose a specific implementation (HashSet or TreeSet)

• However, the definition should look like:

Set<Car> cars = new HashSet<Car>();


Sets – Quality tip

Set<Car> cars = new HashSet<Car>();

• Why…? We should in general only refer to the interface, not the implementation

• Easy to switch implementation!
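
A minimal sketch of why this helps, assuming a made-up helper countDistinct: code that only mentions the Set interface keeps working when the implementation is swapped, so the switch is a one-line change.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class InterfaceTypeDemo {
    // Depends only on the Set interface, so any implementation can be passed in.
    static int countDistinct(Set<String> target, String[] regNumbers) {
        for (String regNo : regNumbers) {
            target.add(regNo);
        }
        return target.size();
    }

    public static void main(String[] args) {
        String[] seen = {"AB123", "XY789", "AB123"};
        Set<String> cars = new HashSet<>();      // the only line naming the implementation
        Set<String> sortedCars = new TreeSet<>(); // same interface, different implementation
        System.out.println(countDistinct(cars, seen));       // prints 2
        System.out.println(countDistinct(sortedCars, seen)); // prints 2
    }
}
```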


Maps

• A Map is a data structure which stores associations between
  – A collection of keys
  – A collection of values

• Every key maps to exactly one value

• Keys are unique (values are not)


Maps

[Figure: four keys K1, K2, K3, K4 mapped to three values V1, V2, V3]


Map

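public interface Map<K,V>
{
    // Associates the key with the value
    void put(K key, V value);

    // Returns the value associated with the key
    V get(K key);

    // Removes the key and its associated value
    void remove(K key);

    // Returns a Set containing all keys in the Map
    Set<K> keySet();
}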


Map

• The keySet method returns a Set containing all keys in the Map

• You must then iterate through this Set, in order to get all values stored in the Map


Map

Map<String,Car> carMap = new HashMap<String,Car>();
...
Set<String> regNumbers = carMap.keySet();
for (String regNo : regNumbers)
{
    Car aCar = carMap.get(regNo);
    ... // Do something with the Car object
}
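
The simplified Map interface above only offers keySet. The library interface java.util.Map also has entrySet() and values(), which avoid the extra get call per key. The sketch below uses a made-up method totalMileage and Integer values instead of Car objects to stay self-contained.

```java
import java.util.HashMap;
import java.util.Map;

class EntrySetDemo {
    // Sums all values by walking the key/value pairs directly.
    static int totalMileage(Map<String, Integer> mileageByRegNo) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : mileageByRegNo.entrySet()) {
            total += entry.getValue();   // entry.getKey() would give the registration number
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> mileage = new HashMap<>();
        mileage.put("AB123", 1000);
        mileage.put("XY789", 2500);
        System.out.println(totalMileage(mileage)); // prints 3500
    }
}
```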


Hash Tables

• A Set and a Map are both abstract data types – we need a concrete implementation in order to use them

• In the Java library, two implementations of each are available:
  – Sets: HashSet, TreeSet
  – Maps: HashMap, TreeMap


Hash Tables

• The implementations HashSet and HashMap are based on a Hash Table

• A Hash Table is based on the following ideas:
  – Create an array of length N, which can store objects of some type T
  – Find a mapping from T to the interval [0; N-1] (a Hash Function f)
  – Store an object t of type T in the position f(t)


Hash Tables

[Figure: array with positions 0 to 4. f(Car1) = 3, f(Car2) = 0, f(Car3) = 2, so Car2 is stored in position 0, Car3 in position 2 and Car1 in position 3]


Hash Tables

• A Hash Table is thus ”almost” an array

• Instead of having an index directly available, we must calculate it

• If calculation can be done in constant time, then all basic operations (insert, delete, lookup) can be done in constant time!

• Better than tree-based implementations, where these operations take O(log(N)) time


Hash Tables

• However, there are some issues:
  – How do we define a good mapping from the objects to [0; N-1]?
  – What happens if we try to store two objects at the same position?


Hash Functions

• Before finding a good mapping – i.e. a good hash function – we must consider the size of the array

• For good performance, the array should at least be as large as the maximal number of objects stored

• Rule of thumb is about 30 % larger

• Size should be a prime number (???)


Hash Functions

• What if the expected number of objects is unknown in advance?

• We can expand a hash table dynamically

• If the hash table is running out of space, double the capacity

• Start out with a reasonably large array (space is cheap…)
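
Doubling the capacity is more than copying the array: an element's position depends on the array length, so every element has to be placed again. The sketch below is one way to do this, under assumptions of this example only: each position holds a small list (so that elements landing on the same position are all kept), and Math.floorMod turns the hash code into a valid position. All names are made up.

```java
import java.util.ArrayList;
import java.util.List;

class ResizeSketch {
    // Creates a table of the given capacity with an empty list in every position.
    static List<List<String>> newTable(int capacity) {
        List<List<String>> table = new ArrayList<>();
        for (int i = 0; i < capacity; i++) {
            table.add(new ArrayList<>());
        }
        return table;
    }

    // Position of s in a table of the given capacity, always in [0, capacity-1].
    static int position(String s, int capacity) {
        return Math.floorMod(s.hashCode(), capacity);
    }

    static void insert(List<List<String>> table, String s) {
        table.get(position(s, table.size())).add(s);
    }

    // Returns a table of twice the capacity that holds the same elements,
    // each one placed according to the new capacity.
    static List<List<String>> doubled(List<List<String>> old) {
        List<List<String>> bigger = newTable(old.size() * 2);
        for (List<String> bucket : old) {
            for (String s : bucket) {
                insert(bigger, s);
            }
        }
        return bigger;
    }
}
```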


Hash Functions

• Having handled the choice of N, how do we define a proper hash function?

• Properties of a hash function:
  – Must map all objects of type T to the interval [0; N-1]
  – Should map objects as uniformly as possible to the interval [0; N-1]


Hash Functions

• We can enforce the mapping to [0;N-1] by using the modulo operator:

f(t) = g(t) % N

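• g(t) can then produce any integer value (note that in Java, % returns a negative result when g(t) is negative, so the result must then be made non-negative)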

• How do we achieve a uniform distribution?

• Theory for this is complicated, but there are some general rules to follow
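
A small sketch of the modulo step in Java. Because hashCode values can be negative and Java's % keeps the sign of the left operand, this example uses Math.floorMod, which always returns a value in [0; N-1] for positive N. The class and method names are made up.

```java
class IndexDemo {
    // Maps any int value g to a position in [0, n-1], assuming n > 0.
    static int position(int g, int n) {
        return Math.floorMod(g, n);
    }

    public static void main(String[] args) {
        System.out.println(-7 % 5);           // prints -2, not a valid array index
        System.out.println(position(-7, 5));  // prints 3
        System.out.println(position(12, 5));  // prints 2
    }
}
```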


Hash Functions

• A good hash function should be "almost random", but deterministic
  – "Almost random" – values are well distributed in the interval
  – Deterministic – always produce the same output for the same input


Hash Functions

• In Java, all objects have a hashCode method
  – Defined in Object class
  – Can be overridden
  – Returns an integer (the Hash Code)
  – We must use modulo on the value ourselves


Hash Functions

• Hash function for integers:
  – The number itself…

• Hash function for strings:

final int HASH_MULTIPLIER = 31;
int h = 0;
for (int i = 0; i < s.length(); i++)
    h = (HASH_MULTIPLIER * h) + s.charAt(i);
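
Wrapped in a method, the string algorithm can be run and compared with the library. The method name stringHash is made up for this sketch. The documented formula for String.hashCode uses the same multiplier 31, so the two results agree.

```java
class StringHashDemo {
    // Multiply-and-add hash over the characters of s.
    static int stringHash(String s) {
        final int HASH_MULTIPLIER = 31;
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (HASH_MULTIPLIER * h) + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(stringHash("abc"));   // prints 96354
        System.out.println("abc".hashCode());    // prints 96354
    }
}
```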


Hash Functions

• Hash code for an object can be calculated by combining hash codes for instance fields

• Combine values in a way similar to the algorithm used to find string hash codes


Hash Functions

public int hashCode()
{
    final int MULTIPLIER = 31;
    int h1 = regNo.hashCode();
    int h2 = mileage;
    int h3 = model.hashCode();
    int h = h1*MULTIPLIER + h2;
    h = h*MULTIPLIER + h3;
    return h;
}


Hash Functions

• But wait…what about numeric overflow?

• We multiply a ”random” integer value with a number…?

• Does not really matter…

• As long as the algorithm is deterministic, overflow is not a problem

• Just helps ”scrambling” the value
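
A tiny sketch of the point about overflow, with a made-up method step: Java int arithmetic wraps around silently, and it wraps the same way every time, so the result stays deterministic.

```java
class OverflowDemo {
    // One multiply-and-add step; the int arithmetic may overflow and wrap around.
    static int step(int h, int next) {
        return 31 * h + next;
    }

    public static void main(String[] args) {
        int a = step(Integer.MAX_VALUE, 7);
        int b = step(Integer.MAX_VALUE, 7);
        System.out.println(a == b);   // prints true: the wrap-around is deterministic
    }
}
```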


Hash Functions

• Common pitfalls:
  – Remember to define a hashCode function
  – If you forget, the hashCode implementation in Object is used
  – It is based on the identity of the object (typically its memory location), not on its instance fields
  – Two objects with the same value of instance fields will then usually produce different hash codes…


Hash Functions

• Common pitfalls:
  – The hashCode function must be "compatible" with your equals function
  – If a.equals(b) it must hold that a.hashCode() == b.hashCode()
  – If not, a hash-based Set may end up storing duplicates!
  – The reverse condition is not required; two different objects may have the same hash code


Hash Functions

• In general, you must remember to:
  – Either define both the hashCode and the equals method
  – Or define neither of them!
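
The rule can be illustrated with two hypothetical classes, PlainCar and KeyedCar, invented for this sketch. The first defines neither method, so a HashSet treats two cars with the same registration number as different objects. The second defines both from the same field, so the duplicate is rejected.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class that defines neither equals nor hashCode.
class PlainCar {
    final String regNo;
    PlainCar(String regNo) { this.regNo = regNo; }
}

// Hypothetical class that defines both, based on the same field.
class KeyedCar {
    final String regNo;
    KeyedCar(String regNo) { this.regNo = regNo; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof KeyedCar)) return false;
        return regNo.equals(((KeyedCar) other).regNo);
    }

    @Override
    public int hashCode() {
        return regNo.hashCode();   // equal cars get equal hash codes
    }
}

class EqualsHashCodeDemo {
    public static void main(String[] args) {
        Set<PlainCar> plain = new HashSet<>();
        plain.add(new PlainCar("AB123"));
        plain.add(new PlainCar("AB123"));
        System.out.println(plain.size());   // prints 2: the objects are only equal to themselves

        Set<KeyedCar> keyed = new HashSet<>();
        keyed.add(new KeyedCar("AB123"));
        keyed.add(new KeyedCar("AB123"));
        System.out.println(keyed.size());   // prints 1: the duplicate is detected
    }
}
```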


Handling collisions

• Even with a good hash function, we will still experience collisions

• Collision: two different objects t1 and t2 have the same hash code

• We will then try to store both objects in the same position in the array

• Now what…?


Handling collisions

• What we store in each position in the array is not the objects themselves, but a linked list of objects

• Objects with the same hash code h are stored in the linked list in position h

• With a good hash function and a sufficiently large array, the average length of non-empty lists is less than 2


Handling collisions

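[Figure: array with positions 0 to 4, where each position holds a linked list. The six objects Car1 to Car6 are spread over these lists, and objects that collide share a list]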


Handling collisions

• Basic operations (insert, delete, lookup) follow this structure:
  – Calculate hash code for the object
  – Find the corresponding position in the array
    • Insert: Insert element at the end of list
    • Delete/Lookup: Iterate through list until element is found, or end of list is reached
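
The structure above can be sketched as a small hash-based set with one linked list per array position. This is a teaching sketch, not how java.util.HashSet is implemented: the class name ChainedHashSet is made up, the capacity is fixed, and null elements are not supported.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashSet<T> {
    // One linked list per position; colliding elements share a list.
    private final List<LinkedList<T>> buckets = new ArrayList<>();
    private int size = 0;

    // capacity must be greater than 0.
    ChainedHashSet(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Calculates the hash code and finds the corresponding position.
    private LinkedList<T> bucketFor(T element) {
        int position = Math.floorMod(element.hashCode(), buckets.size());
        return buckets.get(position);
    }

    // Lookup: iterates through the list until the element is found or the list ends.
    boolean contains(T element) {
        return bucketFor(element).contains(element);
    }

    // Insert: appends at the end of the list unless the element is already there.
    boolean add(T element) {
        LinkedList<T> bucket = bucketFor(element);
        if (bucket.contains(element)) {
            return false;
        }
        bucket.addLast(element);
        size++;
        return true;
    }

    // Delete: removes the element from its list if present.
    boolean remove(T element) {
        boolean removed = bucketFor(element).remove(element);
        if (removed) {
            size--;
        }
        return removed;
    }

    int size() {
        return size;
    }
}
```

With a capacity of 1 every element collides, and all operations degrade to a linear search through a single list. This is why the array size and the quality of the hash function matter.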


Handling collisions

• Basic operations are thus not done in truly constant time

• However, if a proper hash function is used, running time is constant in practice

• Use hash-based implementations unless special circumstances apply
  – Hard to define hash/equals function
  – More functionality required