search trees - carleton universitypeople.scs.carleton.ca/~achan/teaching/2402-notes/comp...comp...
TRANSCRIPT
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-1
Search Trees
• Search Trees are tree structures that are used for(dictionary) search.
• Many kinds of search trees:– Binary Search Trees
– AVL Trees
– 2-3-4 Trees
– Red-Black Trees
– …
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-2
Binary Search Trees
• A Binary Search Tree (BST) is defined as:– a binary tree;
– each internal node stores an item (k, e) of a dictionary;
– any subtree of the tree is also a binary search tree;
– keys stored at nodes in the left subtree of v are less than or equal tok;
– keys stored at nodes in the right subtree of v are greater than orequal to k;
– external nodes do not hold elements but serve as place holders.
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-3
Binary Search Trees
29
28
32
17
8354
76
80
65
44
88
97
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-4
Searches in BST
• A binary search tree T is a decision tree, where thequestion asked at an internal node v is whether the searchkey k is less than, equal to, or greater than the key stored atv.
• We start with the root, keep asking the question, until wefind the node we are looking for; or until we know that thenode we are looking for does not exist.
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-5
Pseudo Code
Algorithm TreeSearch (k, v):
Input: A search key k and a node v of a binary search tree T.
Output: A node w of the subtree T(v) of T rooted at v, such that either
w is an internal node storing key k or w is the external node
encountered in the inorder traversal of T(v) after all the
internal nodes with keys smaller than k and before all the
internal nodes with keys greater than k.
if v is an external node then
return v
if k = key (v) then
return v
if k < key (v) then
return TreeSearch (k, T.leftChild (v))
else { k > key (v) }
return TreeSearch (k, T.rightChild (v))
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-6
Example
• Successful:findElement (76)
• A successfulsearch traversesa path starting atthe root andending at aninternal node 29
28
32
17
8354
76
80
65
44
88
97
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-7
Example
• Unsuccessful:findElement (25)
• An unsuccessfulsearch traversesa path starting atthe root andending at anexternal node 29
28
32
17
8354
76
80
65
44
88
97
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-8
Insertion
• To perform insertItem(k, e), let w be the nodereturned by TreeSearch(k, T.root ())
• If w is external, weknow that k is notstored in T. We canthen callexpandExternal (w) onT and store (k, e) in w.
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-9
Insertion
• If w is internal, weknow another itemwith key k is storedat w. We call thealgorithmrecursively startingat T.rightChild (w)
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-10
Removal
• We locate the node wwhere the key is storedwith algorithm TreeSearch
• If w has an external childz, we remove w and z withremoveAboveExternal (z)
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-11
Removal
• If w has an no externalchildren, we find theinternal node y followingw in inorder traversal
• move the item at y into w
• perform the operationremoveAboveExternal (x),where x is the left child ofy (guaranteed to beexternal)
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-12
Time Complexity
• A search, insertion, or removal, visits the nodes along aroot-to-leaf path, plus possibly the siblings of such nodes
• Time O(1) is spent at each node
• The running time of each operation is O(h), where h is theheight of the tree
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-13
40
30
20
10
Time Complexity
• The height of binary searchtree is O(n) in the worstcase, where a binary searchtree looks like a sortedsequence.
• To achieve good runningtime, we need to keep thetree balanced, i.e., withO(log n) height
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-14
AVL Trees
• Named after its inventors, Adel’son-Vel’skii and Landis,AVL trees are special binary search trees that guarantee thetrees are nearly balanced.
• Since the trees are always nearly balanced, the worse caseperformance using AVL trees is bounded to O(nlogn).
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-15
AVL Trees
• An AVL Tree is defined as:– a binary search tree;
– each internal node stores an item (k, e) of a dictionary;
– any subtree of an AVL tree is also an AVL tree;
– for every internal node v of T, the heights of the children of v candiffer by at most 1.
– external nodes do not hold elements but serve as place holders.
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-16
Example
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-17
Height of AVL Trees
• Proposition: The height of an AVL tree T storing n keys isO(log n).
• Justification: The easiest way to approach this problem isto try to find the minimum number of internal nodes of anAVL tree of height h: N(h).
• We have the base cases that N(1) = 1 and N(2) = 2.
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-18
Induction
• for n≥3, an AVL tree of height h with N(h) minimal contains the rootnode, one AVL subtree of height h-1 and the other AVL subtree ofheight h-2.
• i.e. N(h) = 1 + N(h-1) + N(h-2)
• Knowing N(h-1) > N(h-2), we get N(h) > 2N(h-2)– N(h) > 2N(h-2)
– N(h) > 4N(h-4)
– …
– N(h) > 2iN(h-2i)
• Let i = h/2-1, we get: N(h) > 2h/2-1
• Taking logarithms: h < 2log N(h) + 2
• Thus the height of an AVL tree is O(log n)
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-19
A More Accurate Estimation
• We can have a more accurate estimation on the height ofan AVL tree.
• Before we start, we must look at some mathematicalmaterial:
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-20
Fibonacci Numbers Revisited
• The Fibonacci number is defined as:– F(0) = 0
– F(1) = 1
– F(n) = F(n-1) + F(n-2) for all n ≥ 2
• The Fibonacci numbers form the sequence:– 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, …
• The Fibonacci numbers F(n) grow exponentially with n.That is F(n) = O(2n).
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-21
251
2515
)(
and
where
)(
−
+
−
===
βα
βα nn
nF
1
1)2()(
5
22
−=−+=
++ − hh
hFhNβα
Fibonacci Numbers and AVL Trees
• Using induction, we can prove that the Fibonacci number
• We can also prove that the minimum number of theinternal node in an AVL tree of height h is:
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-22
328.0)1log(44.1
2
)1log(5loglog)2(
)1(5
1
log)1log(5log
25
2
−+≈−≤
++≤++≤
−≥
++
+
+
n
h
nh
n
n
n
h
h
α
α
αα
Height of AVL Trees
• So, now we have:
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-23
Searching the AVL Trees
• Since AVL trees are binary search trees, it is not a surprisethat searching the AVL trees are the same as searching thebinary search trees.
• Insertion and removal, on the other hand, is much moreinvolved.
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-24
Insertions
• A binary search tree T is called balanced if for every nodev, the height of v’s children differ by at most one.
• Inserting a node into an AVL tree involves performing anexpandExternal (w) on T, which changes the heights ofsome of the nodes in T.
• If an insertion causes T to become unbalanced, we travelup the tree from the newly created node until we find thefirst node x such that its grandparent z is unbalanced node.
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-25
Insertions
• Since z became unbalanced by an insertion in the subtreerooted at its child y, height(y) = height(sibling(y)) + 2
• To rebalance the subtree rooted at z, we must perform arestructuring– we rename x, y, and z to a, b, and c based on the order of the nodes
in an in-order traversal.
– z is replaced by b, of which the children are now a and c, of whichthe children, in turn, consist of the four other subtrees that formerlyare children of x, y, and z.
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-26
Insertion
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-27
Insertion
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-28
Restructuring
• When the AVL tree becomes unbalanced due to insertionand/or removal, we have to rebalance the tree so it is stillan AVL tree.
• The rebalancing operation is call restructuring. Thisoperation can be divided into two groups:– single rotations
– double rotations
• Each of these groups has two different scenarios.
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-29
Single Rotation
T0T1
T2
T3
c = xb = y
a = z
T0 T1 T2
T3
c = xb = y
a = zsingle rotation
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-30
Single Rotation
T3T2
TT0
a = xb = y
c = z
T3T2T1
T0
a = xb = y
c = zsingle rotation
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-31
Double Rotation
double rotationa = z
b = xc = y
T0T2
T1
T3 T0
T2T3T1
a = zb = x
c = y
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-32
Double Rotation
double rotationc = z
b = xa = y
T3T1
T2
T0 T3
T1T0 T2
c = zb = x
a = y
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-33
Algorithm restructure (x)
Algorithm restructure (x):
Input: A node x of a binary search tree T that has both a
parent y and a grandparent z
Output: Tree T restructured by a rotation (either single or
double) involving nodes x, y, and z.
let (a, b, c) be an inorder listing of the nodes x, y, z
let (T0, T1, T2, T3) be an inorder listing of the the four subtrees
of x, y, and z not rooted at x, y, or z
replace the subtree rooted at z with a new subtree rooted at b
let a be the left child of b
let T0, T1 be the left and right subtrees of a, respectively.
let c be the right child of b
let T2, T3 be the left and right subtrees of c, respectively.
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-34
Removal
• Performing a removeAboveExternal (w) can cause T tobecome unbalanced.
• Let z be the first unbalanced node encountered whiletravelling up the tree from w. Also, let y be the child of zwith the larger height, and let x be the child of y with thelarger height.
• We can perform operation restructure (x) to restore balanceat the subtree rooted at z.
• As this restructuring may upset the balance of anothernode higher in the tree, we must continue checking forbalance until the root of T is reached.
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-35
Removal
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-36
Removal
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-37
Removal
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-38
Removal
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-39
AVL Trees Summary
• Guaranteed O(log n) performance for all operations:– Search
– Insertion
– Removal
• A bit more complex than binary search:– Restructure through rotation
– Once at most for each insertion
– Up to O(h) or O(log n) for each removal.
Albert Chanhttp://www.scs.carleton.ca/~achan
School of Computer Science, Carleton UniversityCOMP 2002/2402 Introduction to Data Structures and Data Types
Version 03.s9-40
Other Search Trees
• 2-3-4 trees
• Red-black trees
• AA trees
• Splay trees
• B trees
• B+ trees
• …
• Many others