One more definition: A binary tree, T, is balanced if T is empty, or if
abs ( height (left subtree of T) - height (right subtree of T) ) <= 1
and if the left subtree of T and the right subtree of T are balanced.
That is, a binary tree, T, is balanced if for every node N of T,
abs ( height (left subtree of N) - height (right subtree of N) ) <= 1
[Figure: two example trees, one labeled "balanced" and one labeled "Unbalanced at this node"]
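The definition above translates almost directly into code. The following is a minimal sketch, not the course's implementation: the Node class and the function names are hypothetical, and the height of an empty tree is assumed to be -1 so that a single node has height 0.

```python
class Node:
    """Hypothetical binary tree node used only for this illustration."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(t):
    # Assumption: an empty tree has height -1, so a single node has height 0.
    if t is None:
        return -1
    return 1 + max(height(t.left), height(t.right))


def is_balanced(t):
    # Balanced: empty, or subtree heights differ by at most 1
    # and both subtrees are themselves balanced.
    if t is None:
        return True
    return (abs(height(t.left) - height(t.right)) <= 1
            and is_balanced(t.left)
            and is_balanced(t.right))


balanced_tree = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))
print(is_balanced(balanced_tree))  # True
print(is_balanced(chain))          # False
```

This version recomputes heights at every node, which is simple but not the fastest approach; it is meant only to mirror the wording of the definition.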
The basic reason for using a binary search tree for storing data is to optimize search time.
If a binary search tree, T, containing n nodes is full, this optimized search time is realized, and
n = 2^k - 1 for some non-negative integer k.
The height of T is k - 1.
The tree T has k levels.
The tree T has nodes at levels 0 (the root node), 1, 2, 3, . . . , k - 1, and all leaf nodes of T are at level k - 1.
Solving the equation above gives k = log2 (n + 1).
For example, if n = 31, then n = 2^k - 1 gives k = log2 (n + 1) = 5.
The height of the tree is k - 1 = 4
All leaf nodes are at level 4.
At the opposite extreme, a binary search tree with n nodes could be a chain, or vine.
Such a tree has n levels and a height of n - 1.
The maximum number of comparisons in a search of a full binary search tree for a particular node is k = log2 (n + 1), one comparison at each level.
The maximum number of comparisons in a search of a binary search tree that is a chain for a particular node is n, one for each level.
For example, if n = 262,143 = 2^18 - 1:
If the n nodes are stored in a full binary search tree, a maximum of 18 comparisons are needed in a search for a particular node.
If the n nodes are stored in a binary search tree that forms a chain, a maximum of 262,143 comparisons are needed in a search for a particular node.
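The contrast between the two extremes can be observed on a small case. The sketch below, using n = 31 rather than 262,143 to keep it quick, builds one tree by inserting keys median-first (which yields a full tree) and one by inserting keys in sorted order (which yields a chain), then counts the nodes visited per search. All names here are hypothetical helpers written for this illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    # Standard iterative binary search tree insertion.
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
    return root


def comparisons(root, key):
    # Count the nodes visited (one comparison per level) while searching.
    count = 0
    node = root
    while node is not None:
        count += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return count


def median_first(lo, hi):
    # Order of insertion that produces a full tree when hi - lo + 1 = 2^k - 1.
    if lo > hi:
        return []
    mid = (lo + hi) // 2
    return [mid] + median_first(lo, mid - 1) + median_first(mid + 1, hi)


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


n = 31
full = build(median_first(1, n))
chain = build(range(1, n + 1))
print(max(comparisons(full, key) for key in range(1, n + 1)))   # 5
print(max(comparisons(chain, key) for key in range(1, n + 1)))  # 31
```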
Clearly a full binary search tree realizes the optimal search time. But a full tree has no room to grow, or shrink.
Between these two extremes, a balanced tree or a complete tree yields close to the optimal search time, and still has room to grow.
The minimum height of a binary tree with n nodes is
ceiling ( log2 ( n + 1) ) - 1
And the number of levels is
ceiling (log2 ( n + 1) )
To show this:
Let k be the smallest integer for which n <= 2^k - 1.
Then 2^(k-1) - 1 < n <= 2^k - 1.
Add one to all three parts of this inequality and take the log2 of all three parts:
k - 1 < log2 ( n + 1) <= k
If the equality holds, the tree is full, and
k = log2 ( n + 1) = the number of levels
height of the tree = log2 ( n + 1) - 1
Otherwise, log2 ( n + 1) is not an integer; round it up, and
k = ceiling (log2 ( n + 1) ) = the number of levels
and ceiling (log2 ( n + 1) ) - 1 = the height of the tree.
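The formula can be computed without floating-point logarithms. As a sketch: the smallest k with n <= 2^k - 1 is the number of bits needed to write n in binary, which Python exposes as int.bit_length(). The function names below are hypothetical.

```python
def min_levels(n):
    # Smallest k with n <= 2^k - 1, i.e. ceiling(log2(n + 1)).
    return n.bit_length()


def min_height(n):
    # Minimum height of a binary tree with n nodes.
    return min_levels(n) - 1


print(min_levels(31))       # 5  (full tree: 31 = 2^5 - 1)
print(min_levels(32))       # 6  (one node too many for 5 levels)
print(min_height(300_000))  # 18
```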
Suppose T is a binary search tree with 300,000 nodes having minimal height; for instance, T may be a complete tree.
The smallest integer, k, for which
n <= 2^k - 1, where n = 300,000,
is 19:
2^19 - 1 = 524,287
2^18 - 1 = 262,143
So 2^(k-1) - 1 < n <= 2^k - 1 with k = 19,
and the maximum number of comparisons in a search of this binary search tree for a particular node is 19.
And if T is a complete tree, there are 150,000 leaf nodes, of which more than 112,000 are at the next-to-lowest level, so the tree can grow without degrading search times.
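The figures in this example can be checked with a few lines of arithmetic. The sketch below assumes a complete tree with n >= 1 nodes, in which all levels above the lowest are full and the number of leaves is ceiling(n / 2); the function name and dictionary keys are hypothetical.

```python
def complete_tree_profile(n):
    # Assumes n >= 1 and a complete binary tree.
    levels = n.bit_length()              # smallest k with n <= 2^k - 1
    upper = 2 ** (levels - 1) - 1        # nodes in levels 0 .. k-2 (all full)
    bottom = n - upper                   # nodes on the lowest level
    leaves = (n + 1) // 2                # ceiling(n / 2)
    return {
        "levels": levels,
        "bottom_level_nodes": bottom,
        "leaves": leaves,
        "leaves_on_next_to_lowest_level": leaves - bottom,
    }


print(complete_tree_profile(300_000))
# {'levels': 19, 'bottom_level_nodes': 37857, 'leaves': 150000,
#  'leaves_on_next_to_lowest_level': 112143}
```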
In practice, searching a set of data occurs MUCH MORE frequently than adding a new item of data, or removing an existing item of data.
The algorithm presented in the text follows the premise that whenever a node is added to, or removed from, a balanced tree, the tree is tested, and if it is unbalanced, the tree is rebalanced with one or more rotation operations.
A newer algorithm that will be presented in class follows a different premise.
The tree is initially built as a complete binary search tree. As nodes are added and removed (following the algorithms illustrated in class), the tree may become closer to a chain, and further from a complete tree. Consequently, the search times become degraded. A statistical utility tracks the search times, and when the average number of comparisons per search exceeds log2 (n + 1) by some percentage, a rebalancing utility is called to re-form the binary search tree into a complete binary search tree.
So rebalancing occurs only when performance is suffering.
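One way such a statistical utility might look is sketched below. This is purely illustrative: the class name, its methods, and the default tolerance of 25% are assumptions, since the slides do not specify the percentage or the bookkeeping used in class.

```python
import math


class SearchMonitor:
    """Hypothetical tracker for the average number of comparisons per search."""

    def __init__(self, tolerance=0.25):
        # tolerance is an assumed value: 0.25 means "25% above log2(n + 1)".
        self.tolerance = tolerance
        self.searches = 0
        self.comparisons = 0

    def record(self, comparisons):
        # Call once per search with the number of comparisons it took.
        self.searches += 1
        self.comparisons += comparisons

    def needs_rebalance(self, n):
        # n is the current number of nodes in the tree.
        if self.searches == 0 or n == 0:
            return False
        average = self.comparisons / self.searches
        return average > math.log2(n + 1) * (1 + self.tolerance)


monitor = SearchMonitor()
for cost in (4, 5, 5):
    monitor.record(cost)
print(monitor.needs_rebalance(31))  # False: average is below 5 * 1.25

for cost in (20, 25, 30):
    monitor.record(cost)
print(monitor.needs_rebalance(31))  # True: average is now well above 6.25
```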
The rebalancing algorithm:
1. Converts the tree to a vine. A vine is a binary tree in which the left child of every node is NULL.
2. Converts the vine to a complete binary search tree.
Step One - converting the tree to a vine:
For each node, N, in the tree
    if N has a left child
        rotate N and its left child to the right (clockwise). If the
        left child of N has a right subtree, that subtree becomes
        the left subtree of N.
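Step One can be sketched as follows. This is one possible reading of the pseudocode, not necessarily the version shown in class: it walks down the right chain and, at each position, keeps rotating right until the node at that position has no left child. The Node class, the temporary pseudo-root, and the helper vine_keys are assumptions made for the illustration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def tree_to_vine(root):
    # A temporary pseudo-root avoids special-casing rotations at the real root.
    pseudo_root = Node(None, right=root)
    tail = pseudo_root      # last node known to be part of the finished vine
    rest = root             # node currently being examined
    while rest is not None:
        if rest.left is None:
            # Nothing to rotate here; this node joins the vine.
            tail = rest
            rest = rest.right
        else:
            # Right rotation: the left child moves up, and its right
            # subtree becomes the left subtree of the node moving down.
            child = rest.left
            rest.left = child.right
            child.right = rest
            rest = child
            tail.right = child
    return pseudo_root.right


def vine_keys(root):
    # Collect keys by following right links only.
    keys = []
    while root is not None:
        keys.append(root.key)
        root = root.right
    return keys


tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(vine_keys(tree_to_vine(tree)))  # [1, 2, 3, 4, 5, 6, 7]
```

Because each rotation preserves the binary search tree ordering, the resulting vine lists the keys in sorted order.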
Step Two - converting the vine to a complete binary tree.
The sequence { 2^k - 1 : k >= 1 } = { 1, 3, 7, 15, 31, . . . } plays an important role in this step.
Let n = the number of nodes in the vine.
Let k be the smallest integer so that
2^(k-1) - 1 < n <= 2^k - 1
Case I: n = 2^k - 1. The resulting complete tree will be a full tree.
1. At every second node, N, (and its parent), rotate to the left (counterclockwise). If N has a left subtree, it becomes the right subtree of N’s parent. The number of rotations = 2^(k-1) - 1 (a value in the sequence above).
2. Repeat these rotations at every second node in the right chain for 2^(k-2) - 1 nodes (the next smaller value in the sequence above), and so on down the sequence. At the last repetition, perform a single left rotation at the second node and its parent.
Case II: 2^(k-1) - 1 < n < 2^k - 1
The resulting tree will be complete, but not full.
1. Do a left rotation about every second node for a total of n - (2^(k-1) - 1) nodes. This is the number of nodes that will be in the lowest level of the complete tree. The resulting chain of right children will contain 2^(k-1) - 1 nodes.
2. Apply Case I to the resulting chain.
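Both cases of Step Two can be sketched in one routine. The code below follows the approach commonly known as the Day-Stout-Warren (DSW) algorithm, which appears to match the steps described above; the function names, the Node class, and the pseudo-root are assumptions of this sketch rather than the version presented in class.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def compress(pseudo_root, count):
    # Perform `count` left rotations, one at every second node of the right chain.
    scanner = pseudo_root
    for _ in range(count):
        parent = scanner.right          # first node of the pair
        second = parent.right           # "every second node", N
        scanner.right = second          # N moves up
        parent.right = second.left      # N's left subtree becomes parent's right
        second.left = parent
        scanner = second


def vine_to_tree(root, n):
    pseudo_root = Node(None, right=root)
    # Largest value 2^m - 1 that does not exceed n.
    full = 2 ** ((n + 1).bit_length() - 1) - 1
    # Case II, step 1: n - full rotations (zero when the tree will be full).
    compress(pseudo_root, n - full)
    # Case I: rotation counts walk down the sequence ..., 7, 3, 1.
    size = full
    while size > 1:
        size //= 2
        compress(pseudo_root, size)
    return pseudo_root.right


def make_vine(n):
    # Build a vine holding keys 1..n, linked through right children only.
    root = None
    for key in range(n, 0, -1):
        root = Node(key, right=root)
    return root


def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))


def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)


tree = vine_to_tree(make_vine(7), 7)
print(tree.key, height(tree))   # 4 2
tree = vine_to_tree(make_vine(5), 5)
print(tree.key, height(tree))   # 4 2
```

For n = 7 the rotation counts are 3 and then 1, matching Case I with k = 3. For n = 5 the first pass performs 5 - 3 = 2 rotations, leaving a right chain of 3 nodes to which Case I is applied.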