
Data Structures - CSCI 102
Copyright © William C. Cheng

Data Structure Limitations

Vectors, Linked Lists, Stacks, Queues, Deques
    Can’t provide fast insertion/removal and fast lookup at the same time

Binary Search Trees, Heaps
    Provide consistently fast operations, but must maintain an internal ordering

What if we didn’t care about the ordering of the elements at all?
    How can we further improve the performance of lookup, add & removal?

Lookup Tables

For operations where we only care about fast add/remove/search, not fast traversal, we create a table structure to optimize for fast lookup
    Each value in the table has a unique key
    The key is used as a short identifier to look up an entire value in the table

Example
    Your student ID is used to look up your student record (e.g. name, GPA, etc.)

What kind of operations do we need to perform on a lookup table?
    Search(key)
        See if a particular value identified by key is in the table
    Insert(key,value)
        Insert a new value identified by key into the table
    Remove(key)
        Remove the value identified by key from the table

We don’t care as much about traversal (visiting all elements) in this scenario

Sample Object

We want to keep a directory of all the students at USC and be able to look them up by their student ID
Let’s assume ID is a unique integer

    struct Student {
        string name;
        double gpa;
        int id;
    };

Direct Address Table

If we can guarantee that student IDs will always range from 0 to N (e.g. 0 to 4999), we could just store them in an array:

    Student data[5000];   // one slot per ID, indexes 0 through 4999

Then when we want to grab a particular student, we know Student N is at index N:

    int id = 3285;
    Student s = data[id];



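[Figure: direct address table. Each student ID (keys 0, 2, 4 in the picture) is used directly as an index into the Data array (indexes 0 to 4999). Each used slot holds a Student object with Name, GPA and ID: John Doe, 3.20; Jane Doe, 2.62; Some Guy, 3.7.]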

Direct Addressing

Maps keys directly to the indexes in an array
Unused array indexes need to be marked
    Generally use NULL
Operations are fast
    O(1) worst case
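The direct addressing rules above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the names `directTable`, `directInsert`, `directSearch` and `directRemove` are hypothetical, the table stores pointers so that NULL (`nullptr`) can mark an unused index, and the size of 5000 reuses the 0 to 4999 ID range from the earlier example.

```cpp
#include <cassert>
#include <string>

struct Student {
    std::string name;
    double gpa;
    int id;
};

// Assumed range of valid IDs: 0 through 4999.
const int TABLE_SIZE = 5000;

// One slot per possible key. A file-scope array of pointers starts out
// zero-initialized, so every slot begins as NULL (unused).
Student* directTable[TABLE_SIZE];

// Insert: O(1), the key itself is the index.
void directInsert(Student* s) {
    assert(s != nullptr && s->id >= 0 && s->id < TABLE_SIZE);
    directTable[s->id] = s;
}

// Search: O(1). Returns NULL when no student is stored under this ID.
Student* directSearch(int id) {
    if (id < 0 || id >= TABLE_SIZE) return nullptr;
    return directTable[id];
}

// Remove: O(1). Marks the slot unused again; the table does not own the object.
void directRemove(int id) {
    if (id >= 0 && id < TABLE_SIZE) directTable[id] = nullptr;
}
```

Every operation touches exactly one array slot, which is where the O(1) worst case comes from. The cost is the array itself: all 5000 slots exist even if only a handful of students are stored.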

Direct Addressing Issues

Key Restrictions
    Keys must be numeric
    Keys must fall into a nice, uniform range
Array Size
    If there are N possible keys, then data[] must be of size N
    Our array could get HUGE
    What if we’re only using a small number of keys?
    Tons of space is wasted

How can we get around these limitations?

Hash Functions

A function that maps key values to array indexes
    Input records all have a unique key
    The hash function maps key to an array index
    Records are stored at data[hash(key)]
    Ideally every unique key also has a unique hash(key)

Direct Addressing essentially uses a hash function that does nothing

    int directAddressHash(int studentId) {
        return studentId;
    }



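Hash Tables

[Figure: hash table. Student IDs (keys 0, 2, 4) pass through the hash function, and hash(0), hash(2), hash(4) select the slots of the Data array that hold the Student objects (Name, GPA, ID): John Doe, 3.2, 0; Jane Doe, 2.6, 2; Some Guy, 3.7, 4.]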

Hash Functions

How can we avoid having to make our array gigantic to hold all possible keys?

Recall direct addressing:

    int directAddressHash(int studentId) {
        return studentId;
    }

Simple solution: use modular arithmetic
    Size of the backing array is no longer dependent on the number of unique keys

    int modularHash(int studentId) {
        return studentId % ARRAY_SIZE;
    }
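A small, self-contained version of the modular hash makes the effect concrete. The value of `ARRAY_SIZE` (100) is an assumption chosen only for illustration, and the precondition check is an addition: in C++ the `%` operator can return a negative result for a negative left operand, which would not be a valid index.

```cpp
#include <cassert>

// Hypothetical table size, chosen only for this example.
const int ARRAY_SIZE = 100;

// Maps any non-negative student ID into the range [0, ARRAY_SIZE).
int modularHash(int studentId) {
    assert(studentId >= 0);  // a negative ID could produce a negative index
    return studentId % ARRAY_SIZE;
}
```

With this size, `modularHash(3285)` is 85 and `modularHash(1234567)` is 67, so very large IDs fit in a 100-slot array. The price is that distinct IDs can now share an index: `modularHash(185)` is also 85.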

Hash Functions

What makes a good hash function?

Fast
    Hashing is supposed to be faster than a binary search tree. hash(key) needs to be O(1)
Deterministic
    If we have a key K, then hash(K) must always give the same result
Uniform distribution
    The hash function should uniformly distribute keys across all of the available indexes in the storage array

Making a good hash function is hard

Making a hash function
    Map your data into the set of natural numbers
        N = {0, 1, 2, ...}
    For strings, use things like ASCII letter codes
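One common way to turn a string into a natural number is to fold its character codes together one at a time. The sketch below is an assumption-laden illustration rather than the hash any particular library uses: the name `stringHash` and the multiplier 31 are conventional choices that the slides do not prescribe.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Folds the character codes of a string into one natural number, then
// reduces that number to a table index.
std::size_t stringHash(const std::string& key, std::size_t tableSize) {
    assert(tableSize > 0);
    std::size_t h = 0;
    for (unsigned char c : key) {
        h = h * 31 + c;  // unsigned overflow wraps around, which is well defined
    }
    return h % tableSize;
}
```

With a table of size 101, `stringHash("get", 101)` is 18 and `stringHash("gets", 101)` is 67, so the two closely related strings land in different slots. Because each step multiplies before adding, the position of a character matters, which a plain sum of character codes would ignore.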

Hash Functions

Try to be independent of any patterns that may exist in the data
Handle variants of the same pattern
    E.g. make sure "get" and "gets" hash differently
Prime numbers are your friend
    Prime table sizes tend to yield better results

You won’t usually have to write your own, but you should know what the default hash function does
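The advice about patterns and prime table sizes can be checked with a small experiment. The sketch below assumes a plain `key % tableSize` hash and a deliberately patterned data set (every key is a multiple of 10); the helper names `usedSlots` and `multiplesOfTen` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Counts how many distinct slots a set of keys occupies under key % tableSize.
int usedSlots(const std::vector<int>& keys, int tableSize) {
    assert(tableSize > 0);
    std::vector<bool> used(tableSize, false);
    int count = 0;
    for (int k : keys) {
        int slot = k % tableSize;  // keys are assumed to be non-negative
        if (!used[slot]) {
            used[slot] = true;
            ++count;
        }
    }
    return count;
}

// Patterned sample data: 0, 10, 20, ... (n keys in total).
std::vector<int> multiplesOfTen(int n) {
    std::vector<int> keys;
    for (int i = 0; i < n; ++i) keys.push_back(10 * i);
    return keys;
}
```

For 100 such keys, a table of size 10 uses only 1 slot, a table of size 12 uses 6, and a table of the prime size 11 uses all 11. The prime size shares no factor with the stride of the data, so the pattern does not line up with the table.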

Hash Tables

Hashing Issues
    Hash Tables do not maintain any ordering of their internal elements
    Creating a perfect hash function is almost impossible

Collisions
    When two distinct keys generate the same hash value it’s called a collision
        hash(K1) == hash(K2)

Collision Handling

Open Addressing
    If we try to insert a new element and there’s a collision, keep probing the hash table until we find a vacant space
    All data is stored directly in the hash table. No extra data structures are needed.
Probing
    If a collision occurs, use a deterministic algorithm to calculate the next array index to check (based on the initial hash result)
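Linear probing is the simplest such deterministic rule: try the home slot, then the next one, and wrap around at the end of the array. The sketch below shows only the index calculation; the function names `linearProbe` and `probeSequence` are hypothetical, and other probing rules (with different step sizes) are possible.

```cpp
#include <cassert>
#include <vector>

// Index examined on the given attempt (attempt 0 is the home slot).
// Linear probing steps one slot at a time and wraps around at the end.
int linearProbe(int home, int attempt, int tableSize) {
    assert(tableSize > 0 && home >= 0 && home < tableSize && attempt >= 0);
    return (home + attempt) % tableSize;
}

// The full probe sequence for one home slot: visits every slot exactly once.
std::vector<int> probeSequence(int home, int tableSize) {
    std::vector<int> order;
    for (int i = 0; i < tableSize; ++i) {
        order.push_back(linearProbe(home, i, tableSize));
    }
    return order;
}
```

For a table of size 5 and a home slot of 3, the sequence is 3, 4, 0, 1, 2. Because the sequence depends only on the initial hash result, a later search for the same key retraces exactly the same slots.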

Open Addressing (Linear Probing)

Start with an empty Hash Table
    Data: slots 0 to 4, all empty

Insert "John Doe" with ID = 123
    Student: Name = John Doe, GPA = 2.8, ID = 123
    hash(123) = 1
    data[1] is empty, no collision
    store it there

Hash Table contains one item
    data[1]: John Doe, 2.8, 123

Open Addressing (Linear Probing)

Insert "Jane Doe" with ID = 202
    Student: Name = Jane Doe, GPA = 3.4, ID = 202
    hash(202) = 3
    data[3] is empty, no collision
    store it there

Hash Table contains two items
    data[1]: John Doe, 2.8, 123
    data[3]: Jane Doe, 3.4, 202

Open Addressing (Linear Probing)

Insert "Some Guy" with ID = 401
    Student: Name = Some Guy, GPA = 3.5, ID = 401
    hash(401) = 1
    data[1] is non-empty, collision!
    hash(401)+1 = 2
    data[2] is empty, no collision
    store it there

Hash Table contains three items
    data[1]: John Doe, 2.8, 123
    data[2]: Some Guy, 3.5, 401
    data[3]: Jane Doe, 3.4, 202
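The three insertions can be replayed with a minimal open addressing table. This is a sketch under explicit assumptions: the slides never say which function yields hash(123) = 1, hash(202) = 3 and hash(401) = 1, so `slideHash` simply hard-codes those values; the names `ProbingTable` and `Slot` are hypothetical; duplicate keys and removal are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Student {
    std::string name;
    double gpa;
    int id;
};

// Stand-in hash: hard-coded to reproduce the values shown in the walkthrough.
// Any other ID falls back to id % 5 (an arbitrary choice for this sketch).
int slideHash(int id) {
    switch (id) {
        case 123: return 1;
        case 202: return 3;
        case 401: return 1;
        default:  return id % 5;
    }
}

struct Slot {
    bool occupied = false;
    Student value;
};

class ProbingTable {
public:
    explicit ProbingTable(int size) : slots_(size) { assert(size > 0); }

    // Returns the index where the student was stored, or -1 if the table is full.
    int insert(const Student& s) {
        assert(s.id >= 0);
        int size = static_cast<int>(slots_.size());
        int home = slideHash(s.id) % size;
        for (int i = 0; i < size; ++i) {
            int idx = (home + i) % size;   // hash(key), hash(key)+1, ...
            if (!slots_[idx].occupied) {   // vacant slot: store it there
                slots_[idx].occupied = true;
                slots_[idx].value = s;
                return idx;
            }
        }
        return -1;
    }

    // Follows the same probe sequence; an empty slot ends the search.
    const Student* search(int id) const {
        assert(id >= 0);
        int size = static_cast<int>(slots_.size());
        int home = slideHash(id) % size;
        for (int i = 0; i < size; ++i) {
            const Slot& slot = slots_[(home + i) % size];
            if (!slot.occupied) return nullptr;
            if (slot.value.id == id) return &slot.value;
        }
        return nullptr;  // scanned the whole table without finding the key
    }

private:
    std::vector<Slot> slots_;
};
```

On a table of size 5, inserting IDs 123, 202 and 401 in that order stores them at indexes 1, 3 and 2, matching the walkthrough. The early exit in `search` is only valid because this sketch never removes entries; supporting removal would need some way to mark a slot as deleted rather than empty.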

Open Addressing (Linear Probing)

What is the Big O of each of these operations?
    Search(key)
    Insert(key,value)
    Remove(key)

    Average: O(1), Worst Case: O(N)

Operations depend on the table’s load factor
    How big is the table?
    How many slots are taken already?
    load factor = (# of elements) / (size of array)
    "Utilization"

Collision Handling

Chaining
    Each slot in the Hash Table can now contain a list of elements instead of a single element
    When multiple items hash to the same slot, they are placed in the list at that slot
    This requires the overhead of an extra list for each slot that contains one or more elements

Chaining

Hash Table contains two items
    data[1]: John Doe, 2.8, 123
    data[3]: Jane Doe, 3.4, 202

Insert "Some Guy" with ID = 401
    Student: Name = Some Guy, GPA = 3.5, ID = 401
    hash(401) = 1
    data[1] is non-empty, collision!
    Chaining says to add the new entry to the list at data[1]
    Insert Some Guy in the list at data[1]

Hash Table contains three items
    data[1]: John Doe, 2.8, 123 and Some Guy, 3.5, 401 (in the same list)
    data[3]: Jane Doe, 3.4, 202
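The chaining walkthrough can be sketched with a vector of linked lists. As before, this rests on assumptions: `slideHash` hard-codes the hash values shown in the example because the underlying function is not given, the names `ChainedTable` and `chainLength` are hypothetical, and for simplicity every slot owns a (possibly empty) `std::list` instead of allocating one only when needed.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

struct Student {
    std::string name;
    double gpa;
    int id;
};

// Stand-in hash reproducing hash(123) = 1, hash(202) = 3, hash(401) = 1.
int slideHash(int id) {
    switch (id) {
        case 123: return 1;
        case 202: return 3;
        case 401: return 1;
        default:  return id % 5;
    }
}

class ChainedTable {
public:
    explicit ChainedTable(std::size_t size) : buckets_(size), count_(0) {
        assert(size > 0);
    }

    // Adds the student to the chain at its slot; returns that slot's index.
    std::size_t insert(const Student& s) {
        std::size_t idx = indexFor(s.id);
        buckets_[idx].push_back(s);
        ++count_;
        return idx;
    }

    // Walks only the chain at the key's slot.
    const Student* search(int id) const {
        for (const Student& s : buckets_[indexFor(id)]) {
            if (s.id == id) return &s;
        }
        return nullptr;
    }

    // Removes the first entry with this ID; returns false if none was found.
    bool remove(int id) {
        std::list<Student>& chain = buckets_[indexFor(id)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->id == id) {
                chain.erase(it);
                --count_;
                return true;
            }
        }
        return false;
    }

    std::size_t chainLength(std::size_t slot) const { return buckets_[slot].size(); }

    // (# of elements) / (size of array): the average chain length over all slots.
    double loadFactor() const {
        return static_cast<double>(count_) / static_cast<double>(buckets_.size());
    }

private:
    std::size_t indexFor(int id) const {
        assert(id >= 0);
        return static_cast<std::size_t>(slideHash(id)) % buckets_.size();
    }

    std::vector<std::list<Student>> buckets_;
    std::size_t count_;
};
```

After inserting IDs 123, 202 and 401 into a table of size 5, slot 1 holds a chain of two students and slot 3 holds one. Unlike open addressing, a collision never pushes an entry into a different slot, so removal is a plain list erase.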



Chaining

What is the Big O of each of these operations?
    Search(key)
    Insert(key,value)
    Remove(key)

    Average: O(1), Worst Case: O(N)

Operations depend on the average length of a chain (except for insert, which can add to a chain without searching it)
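How far the worst case sits from the average depends entirely on which keys arrive. The sketch below assumes a plain `key % tableSize` hash and measures the longest chain a set of keys would produce; the name `longestChain` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Length of the longest chain when keys are placed by key % tableSize.
int longestChain(const std::vector<int>& keys, int tableSize) {
    assert(tableSize > 0);
    std::vector<int> chainLength(tableSize, 0);
    for (int k : keys) {
        ++chainLength[k % tableSize];  // keys are assumed to be non-negative
    }
    return *std::max_element(chainLength.begin(), chainLength.end());
}
```

With 100 keys and a table of size 10, the keys 0 through 99 give a longest chain of 10, while the keys 0, 10, 20, ..., 990 all land in slot 0 and give a single chain of 100. In the second case a search degenerates into a scan of one linked list, which is the O(N) worst case.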

Collision Handling

The Problem
    If a malicious user knows what hash function you’re using, they can intentionally cause your worst-case behavior
Universal Hashing
    When the Hash Table is created, randomly choose a hash function independent of the keys that are going to be stored
    No single input gives worst-case behavior (just like randomized Quicksort)
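One standard textbook family for universal hashing is h(k) = ((a*k + b) mod p) mod m, where p is a prime larger than any key and a, b are drawn at random when the table is created. The slides do not name a specific family, so the sketch below is an assumption: the class name `UniversalHash`, the choice p = 2^31 - 1, and the restriction to keys below p are all illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// 2^31 - 1, a prime larger than every key this sketch accepts.
const std::uint64_t PRIME = 2147483647ULL;

class UniversalHash {
public:
    // Picks a and b at random, once, when the hash function is created.
    UniversalHash(std::uint64_t tableSize, std::mt19937_64& rng) : m_(tableSize) {
        assert(tableSize > 0);
        std::uniform_int_distribution<std::uint64_t> pickA(1, PRIME - 1);
        std::uniform_int_distribution<std::uint64_t> pickB(0, PRIME - 1);
        a_ = pickA(rng);
        b_ = pickB(rng);
    }

    // Deterministic for the lifetime of this object.
    std::uint64_t operator()(std::uint64_t key) const {
        assert(key < PRIME);
        // a_ and key are both below 2^31, so a_ * key + b_ fits in 64 bits.
        return ((a_ * key + b_) % PRIME) % m_;
    }

private:
    std::uint64_t m_;  // table size
    std::uint64_t a_;  // random multiplier, 1 <= a < PRIME
    std::uint64_t b_;  // random offset, 0 <= b < PRIME
};
```

Each `UniversalHash` object is still deterministic, so lookups keep working, but two tables created with different random draws will generally place the same keys differently. In real use the generator would be seeded unpredictably (for example from `std::random_device`), so an attacker cannot prepare a set of colliding keys in advance.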

Collision Handling

Multi-Level Hashing
    Like chaining, but each element in the hash table holds another hash table with a different hash function
    By hashing multiple times, we can greatly decrease the odds of a collision
Perfect Hashing
    If the set of possible keys is static (never changes), we can develop a perfect multi-level hash to give O(1) worst case performance
    e.g. The reserved keywords in a programming language are a static set of keys

Hash Tables

Other Notes
    Hash Tables generally do provide a way for you to retrieve a list of the known keys
        Just keep in mind there is no guaranteed ordering of the keys
    Before C++11, C++ had no built-in hash table
        C++11 added unordered_map (and unordered_set) to the standard library
    Google Sparse Hash provides C++ hash tables
    Boost C++ Libraries provides hash tables
        http://www.boost.org/
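Compilers that support C++11 or later ship `std::unordered_map`, which covers the Search, Insert and Remove operations discussed throughout. The sketch below is one possible way to build the student directory with it; the helper names `sampleDirectory` and `knownKeys` are hypothetical, and the sample records are the three students from the earlier walkthroughs.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Student {
    std::string name;
    double gpa;
    int id;
};

// Builds a directory keyed by student ID.
std::unordered_map<int, Student> sampleDirectory() {
    std::unordered_map<int, Student> directory;
    directory[123] = Student{"John Doe", 2.8, 123};          // insert via operator[]
    directory.insert({202, Student{"Jane Doe", 3.4, 202}});  // insert a key/value pair
    directory.emplace(401, Student{"Some Guy", 3.5, 401});   // construct in place
    assert(directory.size() == 3);
    return directory;
}

// Collects the keys currently stored. The iteration order is unspecified,
// so callers must not rely on it.
std::vector<int> knownKeys(const std::unordered_map<int, Student>& table) {
    std::vector<int> keys;
    for (const auto& entry : table) {
        keys.push_back(entry.first);
    }
    return keys;
}
```

Lookup uses `find` (or `count`) and removal uses `erase`, for example `directory.find(202)` and `directory.erase(123)`. If the keys are needed in a predictable order, they have to be sorted after collection, since the container itself gives no ordering guarantee.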
