nyhoff, adts, data structures and problem solving with c++, second edition, © 2005 pearson...

24
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13- 140909-3 1 Hash Tables, Maps and Sets Chapter 15 Wherein we throw all the data into random array slots and somehow obtain O(1) retrieval time

Upload: vaughn-bitton

Post on 31-Mar-2015

221 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

1

Hash Tables, Maps and SetsChapter 15

Wherein we throw all the data into random array slots and somehow obtain

O(1) retrieval time

Page 2: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

2

Hash Tables

• Recall order of magnitude of searches– Linear search O(n)

– Binary search O(log2n)

– Balanced binary tree search O(log2n)

– Unbalanced binary tree can degrade to O(n)

Page 3: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

3

Hash Tables

• In some situations faster search is needed– Solution is to use a hash function– Value of key field given to hash function– Location in a hash table is calculated

Page 4: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

4

Hash Functions

• Simple function could be to mod the value of the key by some arbitrary integerint h(int i){ return i % someInt; }

• Note the max number of locations in the table will be same as someInt

• Note that we have traded speed for wasted space– Table must be considerably larger than number

of items anticipated

Page 5: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

5

Hash Functions

• Observe the problem with same value returned by h(i) for different values of i– Called collisions

• A simple solution is linear probing– Linear search begins at

collision location– Continues until empty

slot found for insertion

Page 6: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

6

Hash Functions

• When retrieving a valuelinear probe until found– If empty slot encountered

then value is not in table

• If deletions permitted – Slot can be marked so

it will not be empty and cause an invalid linear probe

Page 7: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

7

Hash Functions

• Strategies for improved performance– Increase table capacity (less collisions)– Use different collision resolution technique– Devise different hash function

• Hash table capacity– Size of table must be 1.5 to 2 times the size of

the number of items to be stored– Otherwise probability of collisions is too high

Page 8: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

8

Collision Strategies

• Linear probing can result in primary clustering

• Consider quadratic probing– Probe sequence from location i is

i + 1, i – 1, i + 4, i – 4, i + 9, i – 9, …– Secondary clusters can still form

• Double hashing – Use a second hash function to determine probe

sequence

Page 9: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

9

Collision Strategies

• Chaining– Table is a list or vector of head nodes to linked

lists– When item hashes to location, it is added to that

linked list

Page 10: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

10

Improving the Hash Function

• Ideal hash function– Simple to evaluate– Scatters items uniformly throughout table

• Modulo arithmetic not so good for strings– Possible to manipulate numeric (ASCII) value of

first and last characters of a name

Page 11: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

11

Implementations

    Hash TableResizable

ArrayBalanced

TreeLinked

List

Hash Table + Linked

List

Interfaces

Set HashSet   TreeSet   LinkedHashSet

List   ArrayList   LinkedList  

Map HashMap   TreeMap   LinkedHashMap

Page 12: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

• HashMap -- an associative data structure

• HashSet -- a container for holding values

• TreeMap – just like HashMap only items are ordered by key (internal red-black tree)

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

12

Page 13: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Hash Tables 13

the Map Interface

• Map ADT methods:– get(k): if the map M has an entry with key k, return

its associated value; else, return null – put(k, v): insert entry (k, v) into the map M; if key k

is not already in M, then return null; else, return old value associated with k

– remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null

– size(), isEmpty()– keys(): return an iterator of the keys in M– values(): return an iterator of the values in M

Page 14: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Hash Tables 14

the Set Interface

• Set ADT methods:– add(o): Adds the specified element to this set if it

is not already present. Returns true if the set does not already contain element (new item added)

– contains(o): returnts true if the set contains the specified element

– remove(o): Removes the specified element from this set if it is present. Returns true if the set contained the specified element

– size(), isEmpty()

Page 15: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

15

Page 16: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

16

Page 17: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

17

Page 18: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

18

Page 19: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

19

Page 20: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

20

Page 21: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

21

Page 22: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

22

Page 23: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Using “Chaining” to resolve collisions

• A linked list in every cell of array

• No rehashing required

• Unlimited collisions can be handled

• Lousy hash codes will still reduce performanceNyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson

Education, Inc. All rights reserved. 0-13-140909-3 23

Page 24: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

24