TRANSCRIPT
1
Hash Tables
a hash table is an array of size Tsize
  has index positions 0 .. Tsize-1
two types of hash tables
  open hash table
    array element type is a <key, value> pair
    all items are stored in the array
  chained hash table
    element type is a pointer to a linked list of nodes containing <key, value> pairs
    items are stored in the linked list nodes
keys are used to generate an array index
  home address (0 .. Tsize-1)
2
faster searching
"balanced" search trees guarantee an O(log2 n) search path by controlling the height of the search tree
  AVL tree
  2-3-4 tree
  red-black tree (used by STL associative container classes)
hash table allows for O(1) search performance on average
  search time does not increase as n increases, as long as the load factor stays bounded
3
Considerations
How big an array?
  load factor of a hash table is n/Tsize
Hash function to use?
  int hash(KeyType key)   // returns 0 .. Tsize-1
Collision resolution strategy?
  hash function is many-to-one
4
Hash Function
a hash function is used to map a key to an array index (home address)
  search starts from here
insert, retrieve, update, delete all start by applying the hash function to the key
5
Some hash functions
if KeyType is int: key % Tsize
if KeyType is a string: convert to an integer and then % Tsize
goals for a hash function
  fast to compute
  even distribution
    cannot guarantee no collisions unless all key values are known in advance
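
The two cases above can be sketched in C++ roughly as follows. The names hashInt and hashString, the tsize parameter, and the multiplier 31 are illustrative assumptions, not something the slides specify; the string case is just one common way to "convert to an integer".

```cpp
#include <cstddef>
#include <string>

// Integer keys: key % Tsize. In C++ the % operator can yield a negative
// result for a negative key, so the remainder is shifted back into range.
std::size_t hashInt(int key, std::size_t tsize) {
    long long t = static_cast<long long>(tsize);
    long long r = key % t;
    return static_cast<std::size_t>(r < 0 ? r + t : r);
}

// String keys: fold the characters into one unsigned integer, then % Tsize.
// The multiplier 31 is an assumed, commonly used choice.
std::size_t hashString(const std::string& key, std::size_t tsize) {
    unsigned long long h = 0;
    for (unsigned char c : key) {
        h = h * 31 + c;  // unsigned arithmetic wraps around on overflow
    }
    return static_cast<std::size_t>(h % tsize);
}
```

Both functions are many-to-one: with tsize = 7, the keys 3 and 10 share home address 3, which is why a collision resolution strategy is still needed.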
6
An Open Hash Table
Hash(key) produces an index in the range 0 to 6. That index is the “home address”.
Some insertions: K1 --> 3, K2 --> 5, K3 --> 2

index  key  value
0
1
2      K3   K3info
3      K1   K1info
4
5      K2   K2info
6
7
Handling Collisions
Linear probing collision resolution strategy
Some more insertions: K4 --> 3, K5 --> 2, K6 --> 4

index  key  value
0      K6   K6info
1
2      K3   K3info
3      K1   K1info
4      K4   K4info
5      K2   K2info
6      K5   K5info
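
A minimal sketch of linear probing, assuming the home address is passed in directly so the slide's insertions (K1 --> 3, K4 --> 3, and so on) can be replayed as given. The names OpenTable, Slot, insert, and probesToFind are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One array element of an open hash table: a <key, value> pair plus a flag.
struct Slot {
    bool used = false;
    std::string key;
    std::string value;
};

struct OpenTable {
    std::vector<Slot> slots;

    explicit OpenTable(std::size_t tsize) : slots(tsize) {}

    // Start at the home address and step forward (wrapping around) until a
    // free slot is found. Returns the index used, or -1 if the table is full.
    int insert(const std::string& key, const std::string& value, std::size_t home) {
        for (std::size_t step = 0; step < slots.size(); ++step) {
            std::size_t i = (home + step) % slots.size();
            if (!slots[i].used) {
                slots[i].used = true;
                slots[i].key = key;
                slots[i].value = value;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Number of slots examined to find key, or 0 if the key is not present.
    int probesToFind(const std::string& key, std::size_t home) const {
        for (std::size_t step = 0; step < slots.size(); ++step) {
            std::size_t i = (home + step) % slots.size();
            if (!slots[i].used) return 0;  // an empty slot ends the search
            if (slots[i].key == key) return static_cast<int>(step + 1);
        }
        return 0;
    }
};
```

Replaying the six insertions with Tsize = 7 puts K4 in slot 4, K5 in slot 6, and K6 in slot 0 after wrapping around; finding K5 then takes 5 probes and finding K6 takes 4.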
8
Search Performance
Average number of probes needed to retrieve the value with key K?

index  key  value
0      K6   K6info
1
2      K3   K3info
3      K1   K1info
4      K4   K4info
5      K2   K2info
6      K5   K5info

K    hash(K)  #probes
K1   3        1
K2   5        1
K3   2        1
K4   3        2
K5   2        5
K6   4        4

14/6 = 2.33 (successful)
unsuccessful search?
9
A Chained Hash Table
insert keys: K1 --> 3, K2 --> 5, K3 --> 2, K4 --> 3, K5 --> 2, K6 --> 4
linked lists of synonyms

index  chain
0
1
2      K3 K3info -> K5 K5info
3      K1 K1info -> K4 K4info
4      K6 K6info
5      K2 K2info
6
10
Search Performance
Average number of probes needed to retrieve the value with key K?

index  chain
0
1
2      K3 K3info -> K5 K5info
3      K1 K1info -> K4 K4info
4      K6 K6info
5      K2 K2info
6

K    hash(K)  #probes
K1   3        1
K2   5        1
K3   2        1
K4   3        2
K5   2        2
K6   4        1

8/6 = 1.33 (successful)
unsuccessful search?
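
The chained version can be sketched the same way. Here std::list stands in for the hand-built linked list of nodes, new items are appended to the end of their chain (an assumption that matches the probe counts above), and the names ChainedTable, insert, and probesToFind are hypothetical.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

struct ChainedTable {
    // Each array element heads a list of <key, value> pairs (synonyms).
    std::vector<std::list<std::pair<std::string, std::string>>> chains;

    explicit ChainedTable(std::size_t tsize) : chains(tsize) {}

    // Append the item to the chain at its home address.
    void insert(const std::string& key, const std::string& value, std::size_t home) {
        chains[home].push_back(std::make_pair(key, value));
    }

    // Number of nodes examined to find key, or 0 if the key is not present.
    int probesToFind(const std::string& key, std::size_t home) const {
        int probes = 0;
        for (const auto& item : chains[home]) {
            ++probes;
            if (item.first == key) return probes;
        }
        return 0;
    }
};
```

With the six keys inserted at the home addresses shown, only K4 and K5 need a second probe, giving the total of 8 probes for 6 successful searches.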
11
successful search performance
load factor  open addressing   open addressing   chaining
             (linear probing)  (double hashing)
0.5          1.50              1.39              1.25
0.7          2.17              1.72              1.35
0.9          5.50              2.56              1.45
1.0          ----              ----              1.50
2.0          ----              ----              2.00
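
The slides do not say where these figures come from. They appear to agree, to two decimal places, with the standard textbook approximations for the average number of probes in a successful search at load factor a; treating them as the source is an assumption. The function names below are hypothetical.

```cpp
#include <cmath>

// Linear probing: (1/2) * (1 + 1/(1 - a)), defined for a < 1.
double linearProbingSuccessful(double a) {
    return 0.5 * (1.0 + 1.0 / (1.0 - a));
}

// Double hashing: (1/a) * ln(1/(1 - a)), defined for 0 < a < 1.
double doubleHashingSuccessful(double a) {
    return -std::log(1.0 - a) / a;
}

// Chaining: 1 + a/2, defined for any a >= 0 (the load factor may exceed 1).
double chainingSuccessful(double a) {
    return 1.0 + a / 2.0;
}
```

The open addressing formulas blow up as a approaches 1, which matches the blank entries for load factors 1.0 and 2.0, while the chaining cost grows only linearly.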
12
Factors affecting Search Performance
quality of hash function
  how uniform?
  depends on actual data
collision resolution strategy used
load factor of the hash table
  n/Tsize
  the lower the load factor, the better the search performance
13
Traversal
Visit each item in the hash table
Open hash table
  O(Tsize) to visit all n items
  Tsize is larger than n
Chained hash table
  O(Tsize + n) to visit all n items
Items are not visited in order of key value
14
Deletions?
search for item to be deleted
chained hash table
  find node and delete it
open hash table
  must mark vacated spot as “deleted”
  “deleted” is different from “never used”: a search must continue past a “deleted” spot but can stop at a “never used” one
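
A minimal sketch of the three slot states for an open hash table with linear probing. The names SlotState, Slot, findSlot, and eraseKey are hypothetical, and the home address is passed in directly rather than computed by a hash function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class SlotState { NeverUsed, Occupied, Deleted };

struct Slot {
    SlotState state = SlotState::NeverUsed;
    std::string key;
};

// Linear-probing search: stops at a NeverUsed slot, skips over Deleted ones.
// Returns the index of the key, or -1 if it is not in the table.
int findSlot(const std::vector<Slot>& table, const std::string& key, std::size_t home) {
    for (std::size_t step = 0; step < table.size(); ++step) {
        std::size_t i = (home + step) % table.size();
        if (table[i].state == SlotState::NeverUsed) return -1;
        if (table[i].state == SlotState::Occupied && table[i].key == key) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Deletion only marks the slot; it does not reset it to NeverUsed.
bool eraseKey(std::vector<Slot>& table, const std::string& key, std::size_t home) {
    int i = findSlot(table, key, home);
    if (i < 0) return false;
    table[i].state = SlotState::Deleted;
    return true;
}
```

If K1 (home 3, stored in slot 3) is deleted, a later search for K4 (home 3, stored in slot 4) still succeeds because the probe sequence passes over the “deleted” slot instead of stopping there.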
15
Hash Table Summary
search speed depends on load factor and quality of hash function
  load factor should be less than 0.75 for open addressing
  can be more than 1 for chaining
items not kept sorted by key
very good for fast access to unordered data with known upper bound
  to pick a good Tsize
16
a heap is a binary tree that
  is complete
  has the heap-order property
    max heap - item stored in each node has a key/priority that is >= the priority of the items stored in each of its children
    min heap - item stored in each node has a key/priority that is <= the priority of the items stored in each of its children
efficient data structure for PriorityQueue ADT
  requires the ability to compare items based on their priorities
basis for the heapsort algorithm
17
two heaps
A heap is always a complete binary tree
max heap (level order): 23 18 9 8 12 7 1 4 2
min heap (level order): 1 4 2 9 8 7 18 23 12
18
a complete binary tree can be stored in an array
for the item in A[i]:
  leftChild is in A[2i+1]
  rightChild is in A[2i+2]
  parent is in A[(i-1)/2]   (integer division)

index  0   1   2   3   4   5   6   7   8
A      23  18  9   8   12  7   1   4   2
Size   9
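
The index arithmetic can be written as small helper functions. The function names are illustrative, and isMaxHeap is an added check (not from the slides) that uses parent() to test the heap-order property on an array.

```cpp
#include <cstddef>
#include <vector>

std::size_t leftChild(std::size_t i)  { return 2 * i + 1; }
std::size_t rightChild(std::size_t i) { return 2 * i + 2; }
std::size_t parent(std::size_t i)     { return (i - 1) / 2; }  // only valid for i > 0

// True if every item is <= the item stored at its parent's position.
bool isMaxHeap(const std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        if (a[parent(i)] < a[i]) return false;
    }
    return true;
}
```

For the array above, the item 18 in A[1] has children 8 in A[3] and 12 in A[4], and the item 2 in A[8] has its parent 8 in A[3].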
19
PriorityQueue ADT Data Items
a collection of items which can be ordered by priority
Operations
  constructor - creates an empty PQ
  empty () - returns true iff a PQ is empty
  size () - returns the number of items in a PQ
  push (item) - adds an item to a PQ
  top () - returns the item in a PQ with the highest priority
  pop () - removes the item with the highest priority from a PQ
20
PQ Data structures
unordered array or linked list
  push is O(1)
  top and pop are O(n)
ordered array or linked list
  push is O(n)
  top and pop are O(1)
heap
  top is O(1)
  push and pop are O(log2 n)
STL has a priority_queue class
  is implemented using a heap
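
A short usage sketch of std::priority_queue, which is a max-priority queue by default. The helper name drainByPriority is hypothetical; it pushes a few items and pops them all to show the removal order.

```cpp
#include <queue>
#include <vector>

// Pushes every item, then removes them one by one, highest priority first.
std::vector<int> drainByPriority(const std::vector<int>& items) {
    std::priority_queue<int> pq;  // largest value has the highest priority
    for (int x : items) {
        pq.push(x);
    }
    std::vector<int> out;
    while (!pq.empty()) {
        out.push_back(pq.top());  // look at the highest-priority item
        pq.pop();                 // then remove it
    }
    return out;
}
```

Pushing 7, 23, 1, 18 and draining the queue yields 23, 18, 7, 1, regardless of the insertion order.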
21
PQ operations
top
  return item at A[0]
push and pop must maintain heap-order property
push
  put new item at end (in A[size]) and increment size
  re-establish the heap-order property by moving the new item up to where it belongs
pop
  A[0] is item to delete
  swap A[0] and A[size-1], then decrement size
  move item at A[0] down a path to where it belongs
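
The push and pop steps can be sketched for a max heap of ints stored in a std::vector, where the vector's size plays the role of Size. The names heapPush and heapPop are hypothetical, and this is one straightforward way to write the two loops rather than the only one.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// push: place the new item at the end, then move it up while it is larger
// than its parent.
void heapPush(std::vector<int>& a, int item) {
    a.push_back(item);
    std::size_t i = a.size() - 1;
    while (i > 0 && a[(i - 1) / 2] < a[i]) {
        std::swap(a[i], a[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}

// pop: swap the first and last items, drop the last, then move the item now
// at A[0] down, always toward the larger child. Precondition: a is not empty.
int heapPop(std::vector<int>& a) {
    std::swap(a[0], a[a.size() - 1]);
    int top = a.back();
    a.pop_back();
    std::size_t i = 0;
    while (true) {
        std::size_t left = 2 * i + 1, right = 2 * i + 2, largest = i;
        if (left < a.size() && a[largest] < a[left]) largest = left;
        if (right < a.size() && a[largest] < a[right]) largest = right;
        if (largest == i) break;  // heap-order property holds again
        std::swap(a[i], a[largest]);
        i = largest;
    }
    return top;
}
```

Starting from the array 23 18 9 8 12 7 1 4 2, one call to heapPop returns 23 and leaves 18 12 9 8 2 7 1 4. Each loop follows a single root-to-leaf path, which is where the O(log2 n) cost comes from.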
22
pop( )
before:
index  0   1   2   3   4   5   6   7   8
A      23  18  9   8   12  7   1   4   2
Size   9

23 is removed; 2 moves down the path past 18 and 12

after:
index  0   1   2   3   4   5   6   7
A      18  12  9   8   2   7   1   4
Size   8
23
Balanced Search Trees
several varieties (Ch.13)
  AVL trees
  2-3-4 trees
  Red-Black trees
  B-Trees (used for searching secondary memory)
nodes are added and deleted so that the height of the tree is kept under control
insert and delete take more work, but retrieval (also insert & delete) never takes more than O(log2 n) steps because the height is controlled