the BSTree<TE, KF> class
  - BSTreeNode has the same structure as a binary tree node
  - elements stored in a BSTree are key-value pairs
  - the element type TE must be a class (or a struct) which has
      - a data member for the value
      - a data member for the key
      - a method with the signature: KF key( ) const; where KF is the type of the key
an example
struct treeItem
{
    int id;          // key
    string data;     // value
    int key( ) const { return id; }
};
BSTree<treeItem, int> myBSTree;
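To show why the element type needs a key( ) method, here is a minimal sketch of a tree that orders its nodes only through element.key(). The class name SketchBSTree and its insert/retrieve members are hypothetical stand-ins, not the interface of the BSTree class used in the course.

```cpp
#include <string>

// Element type from the example: id is the key, data is the value.
struct treeItem {
    int id;
    std::string data;
    int key() const { return id; }
};

// Hypothetical, simplified stand-in for BSTree<TE, KF>; not the course class.
template <typename TE, typename KF>
class SketchBSTree {
public:
    SketchBSTree() = default;
    SketchBSTree(const SketchBSTree&) = delete;             // copying is not needed here
    SketchBSTree& operator=(const SketchBSTree&) = delete;
    ~SketchBSTree() { clear(root_); }

    // Insert an element; an element with an equal key is overwritten.
    void insert(const TE& element) { insertAt(root_, element); }

    // Return a pointer to the element whose key() equals searchKey, or nullptr.
    const TE* retrieve(const KF& searchKey) const {
        const Node* p = root_;
        while (p != nullptr) {
            const KF k = p->element.key();
            if (searchKey == k) return &p->element;
            p = (searchKey < k) ? p->left : p->right;
        }
        return nullptr;
    }

private:
    struct Node {
        TE element;
        Node* left = nullptr;
        Node* right = nullptr;
        explicit Node(const TE& e) : element(e) {}
    };

    Node* root_ = nullptr;

    static void insertAt(Node*& p, const TE& element) {
        if (p == nullptr) p = new Node(element);
        else if (element.key() == p->element.key()) p->element = element;
        else if (element.key() < p->element.key()) insertAt(p->left, element);
        else insertAt(p->right, element);
    }

    static void clear(Node* p) {
        if (p == nullptr) return;
        clear(p->left);
        clear(p->right);
        delete p;
    }
};
```

The tree never looks at id or data directly, so any class that supplies key( ) with a comparable return type could be stored the same way.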
basic BST search algorithm
void search (bstree, searchKey)
{
    if (bstree is empty)                            // base case: item not found
        // take needed action
    else if (key in bstree's root == searchKey)     // base case: item found
        // take needed action
    else if (searchKey < key in bstree's root)
        search (leftSubtree, searchKey);
    else
        search (rightSubtree, searchKey);
}
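The pseudocode translates almost line for line into C++. This is a sketch under assumptions: the node layout (a struct with element, left and right) is hypothetical, and the "needed action" is taken to be returning a pointer to the item, or nullptr when it is not found.

```cpp
#include <string>

struct treeItem {
    int id;             // key
    std::string data;   // value
    int key() const { return id; }
};

// Hypothetical node layout; the real BSTreeNode may differ.
struct BSTreeNode {
    treeItem element;
    BSTreeNode* left;
    BSTreeNode* right;
};

const treeItem* search(const BSTreeNode* bstree, int searchKey) {
    if (bstree == nullptr)                              // base case: item not found
        return nullptr;
    else if (bstree->element.key() == searchKey)        // base case: item found
        return &bstree->element;
    else if (searchKey < bstree->element.key())
        return search(bstree->left, searchKey);         // key can only be on the left
    else
        return search(bstree->right, searchKey);        // key can only be on the right
}
```

Each recursive call discards one subtree, so the number of comparisons is bounded by the length of the path from the root to the deepest node visited.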
deletion cases
  - item to be deleted is in a leaf node
      - pointer to its node (in parent) must be changed to NULL
  - item to be deleted is in a node with one empty subtree
      - pointer to its node (in parent) must be changed to the non-empty subtree
  - item to be deleted is in a node with two non-empty subtrees
the easy cases
            36
          /    \
        20      42
       /  \    /  \
     12    24 39    45
          /     \
        21       40
(leaf nodes: 12, 21, 40, 45; nodes with one empty subtree: 24, 39)
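the “hard” case
            36
          /    \
        20      42
       /  \    /  \
     12    24 39    45
          /     \
        21       40
(nodes with two non-empty subtrees: 20, 36, 42)
two ways to delete such an item:
  - replace with smallest in right subtree (inorder successor)
  - replace with largest in left subtree (inorder predecessor)
then delete the node the replacement came from, which is always one of the easy cases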
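A minimal sketch of all three deletion cases, assuming a bare integer-keyed node. Node, insertKey and removeKey are illustrative names, not members of the BSTree class. The hard case is handled here with the inorder successor; the inorder predecessor would work symmetrically.

```cpp
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Standard BST insertion; duplicate keys are ignored in this sketch.
void insertKey(Node*& p, int key) {
    if (p == nullptr) p = new Node(key);
    else if (key < p->key) insertKey(p->left, key);
    else if (key > p->key) insertKey(p->right, key);
}

// p is a reference to the parent's pointer, so assigning to p
// "changes the pointer in the parent". Returns false if key is absent.
bool removeKey(Node*& p, int key) {
    if (p == nullptr) return false;
    if (key < p->key) return removeKey(p->left, key);
    if (key > p->key) return removeKey(p->right, key);

    if (p->left != nullptr && p->right != nullptr) {
        // Hard case: copy the smallest key of the right subtree into this
        // node, then delete that key from the right subtree (an easy case).
        Node* succ = p->right;
        while (succ->left != nullptr) succ = succ->left;
        p->key = succ->key;
        return removeKey(p->right, succ->key);
    }

    // Easy cases: a leaf (both children null) or one empty subtree.
    Node* old = p;
    p = (p->left != nullptr) ? p->left : p->right;   // nullptr for a leaf
    delete old;
    return true;
}
```

With the tree from the slide, removing 36 copies 39 into the root and then removes the old 39 node, whose single child 40 takes its place under 42.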
traversing a binary search tree
  - can use any of the binary tree traversal orders: preorder, inorder, postorder
      - base case is reaching an empty tree
  - inorder traversal visits the elements in order of their key values
  - how would you visit the elements in descending order of key values?
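One possible answer to the question, sketched with an assumed integer-keyed node: an inorder traversal goes left, node, right, so swapping the two recursive calls (right, node, left) visits the keys in descending order. The function names are illustrative.

```cpp
#include <vector>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Left subtree, node, right subtree: ascending key order.
void inorder(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;            // base case: empty tree
    inorder(t->left, out);
    out.push_back(t->key);
    inorder(t->right, out);
}

// Right subtree, node, left subtree: descending key order.
void reverseInorder(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;            // base case: empty tree
    reverseInorder(t->right, out);
    out.push_back(t->key);
    reverseInorder(t->left, out);
}
```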
big Oh of BST operations
  - measured by length of the search path
      - depends on the height of the BST
      - height determined by order of insertion
  - height of a BST containing n items is
      - minimum: floor (log2 n)
      - maximum: n - 1
      - average: ?
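The effect of insertion order on height can be checked directly. In this sketch (illustrative names, integer keys, height counted in edges so that a single node has height 0), inserting 1..7 in sorted order gives the maximum height n - 1 = 6, while the order 4, 2, 6, 1, 3, 5, 7 gives the minimum floor(log2 7) = 2.

```cpp
#include <algorithm>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Standard BST insertion; equal keys are ignored in this sketch.
void insertKey(Node*& t, int key) {
    if (t == nullptr) t = new Node(key);
    else if (key < t->key) insertKey(t->left, key);
    else if (key > t->key) insertKey(t->right, key);
}

// Height in edges: empty tree = -1, single node = 0.
int height(const Node* t) {
    if (t == nullptr) return -1;
    return 1 + std::max(height(t->left), height(t->right));
}

// Free every node of the tree.
void destroy(Node* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}
```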
faster searching
  - "balanced" search trees guarantee O(log2 n) search path by controlling height of the search tree
      - AVL tree
      - 2-3-4 tree
      - red-black tree (used by typical implementations of the STL associative container classes)
  - hash table allows for O(1) average search performance
      - search time does not increase as n increases, provided the load factor is kept bounded
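Both ideas are available in the C++ standard library. The standard specifies complexity rather than a data structure: std::map guarantees logarithmic lookup and is commonly built on a red-black tree, while std::unordered_map is a hash table with constant average lookup. The helper names below are illustrative.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Keys of the ordered container come back in ascending key order.
std::vector<int> keysInOrder(const std::map<int, std::string>& m) {
    std::vector<int> keys;
    for (const auto& entry : m) keys.push_back(entry.first);
    return keys;
}

// Look up a key in the hash-based container; returns "" when it is absent.
std::string lookup(const std::unordered_map<int, std::string>& m, int key) {
    auto it = m.find(key);
    return it == m.end() ? std::string() : it->second;
}
```

Iterating over a std::unordered_map gives no particular key order, which is the price paid for the faster average lookup.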
Hash Table
  - a hash table is an array of size Tsize
      - has index positions 0 .. Tsize-1
  - two types of hash tables (Nyhoff – Ch. 9.3)
      - open hash table
          - array element type is a <key, value> pair
          - all items stored in the array
      - chained hash table
          - element type is a pointer to a linked list of nodes containing <key, value> pairs
          - items are stored in the linked list nodes
  - keys are used to generate an array index
      - home address (0 .. Tsize-1)
Considerations
  - How big an array?
      - load factor of a hash table is n/Tsize
  - Hash function to use?
      - int hash(KeyType key) -> 0 .. Tsize-1
  - Collision resolution strategy?
      - hash function is many-to-one
Hash Function
  - a hash function is used to map a key to an array index (home address)
      - search starts from here
  - insert, retrieve, update, delete all start by applying the hash function to the key
Some hash functions
  - if KeyType is int: key % Tsize
  - if KeyType is a string: convert to an integer and then % Tsize
goals for a hash function
  - fast to compute
  - even distribution
      - cannot guarantee no collisions unless all key values are known in advance
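A sketch of both hash functions. Two details are assumptions that go beyond the slide: negative int keys are shifted into range (in C++, key % Tsize can be negative), and the string is converted to an integer with a multiply-by-31 polynomial, which is one common choice among many.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// int key: key % Tsize, adjusted so the result is always in 0 .. Tsize-1.
std::size_t hashInt(int key, std::size_t tsize) {
    assert(tsize > 0);
    const long long m = static_cast<long long>(tsize);
    long long r = key % m;          // may be negative for a negative key
    if (r < 0) r += m;
    return static_cast<std::size_t>(r);
}

// string key: fold the characters into one integer, then % Tsize.
std::size_t hashString(const std::string& key, std::size_t tsize) {
    assert(tsize > 0);
    unsigned long long h = 0;
    for (unsigned char c : key)
        h = h * 31 + c;             // unsigned arithmetic wraps on overflow
    return static_cast<std::size_t>(h % tsize);
}
```

For example, with Tsize = 7 the key 10 has home address 3, and so do 3, 17 and 24, which is why a collision resolution strategy is needed.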
An Open Hash Table
Hash (key) produces an index in the range 0 to 6. That index is the “home address”.
Some insertions: K1 --> 3, K2 --> 5, K3 --> 2

index  key  value
0
1
2      K3   K3info
3      K1   K1info
4
5      K2   K2info
6
Handling Collisions
Some more insertions: K4 --> 3, K5 --> 2, K6 --> 4
Linear probing collision resolution strategy

index  key  value
0      K6   K6info
1
2      K3   K3info
3      K1   K1info
4      K4   K4info
5      K2   K2info
6      K5   K5info
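The six insertions can be traced with a short sketch of linear probing. Because the slides give the home addresses rather than the hash function, the home address is passed in directly; Slot, OpenTable and insertLinear are illustrative names.

```cpp
#include <array>
#include <string>

constexpr int TSIZE = 7;               // table size used on the slide

struct Slot {
    bool occupied = false;
    std::string key;
    std::string info;
};
using OpenTable = std::array<Slot, TSIZE>;

// Start at the home address and step forward one slot at a time,
// wrapping around at the end of the array, until a free slot is found.
// Returns the index used, or -1 if every slot is occupied.
int insertLinear(OpenTable& table, int home,
                 const std::string& key, const std::string& info) {
    for (int i = 0; i < TSIZE; ++i) {
        const int pos = (home + i) % TSIZE;
        if (!table[pos].occupied) {
            table[pos].occupied = true;
            table[pos].key = key;
            table[pos].info = info;
            return pos;
        }
    }
    return -1;
}
```

K4 (home 3) lands in slot 4, K5 (home 2) has to pass slots 2 to 5 and lands in slot 6, and K6 (home 4) wraps around to slot 0.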
Search Performance
Average number of probes needed to retrieve the value with key K?

index  key  value
0      K6   K6info
1
2      K3   K3info
3      K1   K1info
4      K4   K4info
5      K2   K2info
6      K5   K5info

K    hash(K)  #probes
K1   3        1
K2   5        1
K3   2        1
K4   3        2
K5   2        5
K6   4        4

14/6 = 2.33 (successful)
unsuccessful search?
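The probe counts can be reproduced with a small sketch. It assumes the final table layout implied by the counts above (K6 in slot 0, slot 1 empty), uses an empty string to mark a never-used slot, and counts the probe that hits the empty slot as part of an unsuccessful search. Table and countProbes are illustrative names.

```cpp
#include <array>
#include <string>

constexpr int TSIZE = 7;
using Table = std::array<std::string, TSIZE>;   // "" marks a never-used slot

// Number of slots examined by a linear-probing search that starts at home.
// The search stops when it finds the key or reaches a never-used slot.
int countProbes(const Table& table, int home, const std::string& key) {
    int probes = 0;
    for (int i = 0; i < TSIZE; ++i) {
        const int pos = (home + i) % TSIZE;
        ++probes;
        if (table[pos].empty() || table[pos] == key) break;
    }
    return probes;
}
```

Summing over the six stored keys gives 14 probes, the 14/6 = 2.33 above. For a key that is not in the table, the seven possible home addresses cost 2, 1, 7, 6, 5, 4 and 3 probes, so if every home address is equally likely an unsuccessful search averages 28/7 = 4 probes.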
A Chained Hash Table
insert keys: K1 --> 3, K2 --> 5, K3 --> 2, K4 --> 3, K5 --> 2, K6 --> 4
linked lists of synonyms

index  chain
0
1
2      K3 K3info -> K5 K5info
3      K1 K1info -> K4 K4info
4      K6 K6info
5      K2 K2info
6
Search Performance
Average number of probes needed to retrieve the value with key K?

index  chain
0
1
2      K3 K3info -> K5 K5info
3      K1 K1info -> K4 K4info
4      K6 K6info
5      K2 K2info
6

K    hash(K)  #probes
K1   3        1
K2   5        1
K3   2        1
K4   3        2
K5   2        2
K6   4        1

8/6 = 1.33 (successful)
unsuccessful search?
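A sketch of the chained table, assuming new nodes are appended at the end of a chain (which matches the probe counts above) and that a probe means examining one list node. The home address is passed in because the slides do not give the hash function; all names are illustrative.

```cpp
#include <list>
#include <string>
#include <utility>
#include <vector>

using Chain = std::list<std::pair<std::string, std::string>>;   // <key, value> nodes
using ChainedTable = std::vector<Chain>;                        // one chain per index

struct SearchResult {
    bool found;
    int probes;     // number of list nodes examined
};

// Append the pair to the chain at its home address.
void insertChained(ChainedTable& table, int home,
                   const std::string& key, const std::string& info) {
    table[home].emplace_back(key, info);
}

// Walk only the chain at the home address.
SearchResult searchChained(const ChainedTable& table, int home,
                           const std::string& key) {
    int probes = 0;
    for (const auto& node : table[home]) {
        ++probes;
        if (node.first == key) return {true, probes};
    }
    return {false, probes};
}
```

Under this counting, an unsuccessful search examines exactly as many nodes as the chain at the home address holds: none at indices 0, 1 and 6, and two at indices 2 and 3.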
successful search performance
(average number of probes per successful search)

load factor   open addressing    open addressing    chaining
              (linear probing)   (double hashing)
0.5           1.50               1.39               1.25
0.7           2.17               1.72               1.35
0.9           5.50               2.56               1.45
1.0           ----               ----               1.50
2.0           ----               ----               2.00
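The slides do not say where these figures come from, but they agree with the classical approximations for the expected number of probes in a successful search, usually attributed to Knuth, with a as the load factor: (1 + 1/(1 - a))/2 for linear probing, (1/a) ln(1/(1 - a)) for double hashing, and 1 + a/2 for chaining. A sketch that evaluates them (function names are illustrative):

```cpp
#include <cmath>

// Expected probes for a successful search, a = load factor (0 < a < 1).
double linearProbing(double a) {
    return 0.5 * (1.0 + 1.0 / (1.0 - a));
}

// Expected probes for a successful search, a = load factor (0 < a < 1).
double doubleHashing(double a) {
    return (1.0 / a) * std::log(1.0 / (1.0 - a));
}

// Expected probes for a successful search; a may exceed 1 for chaining.
double chaining(double a) {
    return 1.0 + a / 2.0;
}
```

The open addressing formulas blow up as a approaches 1, which is why the table has no entries there, while the chaining formula stays finite for any load factor.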
Factors affecting Search Performance
  - quality of hash function
      - how uniform?
      - depends on actual data
  - collision resolution strategy used
  - load factor of the hash table
      - n/Tsize
      - the lower the load factor, the better the search performance
Traversal
Visit each item in the hash table
  - Open hash table
      - O(Tsize) to visit all n items
      - Tsize is larger than n
  - Chained hash table
      - O(Tsize + n) to visit all n items
  - Items are not visited in order of key value
Deletions?
  - search for item to be deleted
  - chained hash table
      - find node and delete it
  - open hash table
      - must mark vacated spot as “deleted”
      - “deleted” is different than “never used”
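Why the distinction matters can be seen in a sketch of an open table with three slot states. A search may skip over a “deleted” slot but must stop at a “never used” one; if a vacated slot were reset to “never used”, keys that had probed past it would become unreachable. The names are illustrative, home addresses are passed in directly, and insertKey assumes the key is not already in the table.

```cpp
#include <array>
#include <string>

constexpr int TSIZE = 7;

enum class SlotState { NeverUsed, Occupied, Deleted };

struct Slot {
    SlotState state = SlotState::NeverUsed;
    std::string key;
};
using OpenTable = std::array<Slot, TSIZE>;

// Linear-probing search: returns the slot index of key, or -1.
int findSlot(const OpenTable& t, int home, const std::string& key) {
    for (int i = 0; i < TSIZE; ++i) {
        const int pos = (home + i) % TSIZE;
        if (t[pos].state == SlotState::NeverUsed) return -1;   // end of probe run
        if (t[pos].state == SlotState::Occupied && t[pos].key == key) return pos;
        // a Deleted slot is skipped and the search keeps going
    }
    return -1;
}

// Insert into the first slot that is not occupied (deleted slots are reused).
int insertKey(OpenTable& t, int home, const std::string& key) {
    for (int i = 0; i < TSIZE; ++i) {
        const int pos = (home + i) % TSIZE;
        if (t[pos].state != SlotState::Occupied) {
            t[pos].state = SlotState::Occupied;
            t[pos].key = key;
            return pos;
        }
    }
    return -1;
}

// Mark the slot as deleted instead of resetting it to never used.
bool eraseKey(OpenTable& t, int home, const std::string& key) {
    const int pos = findSlot(t, home, key);
    if (pos < 0) return false;
    t[pos].state = SlotState::Deleted;
    t[pos].key.clear();
    return true;
}
```

In the linear probing example, K5 (home 2) sits in slot 6 after probing past slot 4. Deleting K4 from slot 4 leaves a “deleted” mark there, so a later search for K5 still continues on to slot 6.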
Hash Table Summary
  - search speed depends on load factor and quality of hash function
      - load factor should be less than 0.75 for open addressing
      - load factor can be more than 1 for chaining
  - items not kept sorted by key
  - very good for fast access to unordered data with known upper bound
      - to pick a good Tsize