on the costs of optimal and near-optimal binary search trees

Acta Informatica 18, 255-263 (1982)

© Springer-Verlag 1982

On the Costs of Optimal and Near-Optimal Binary Search Trees*

Brian Allen

8 Windridge Drive, Markham, Ontario L3P 1T8, Canada

Summary. We show that the cost of an optimal binary search tree can vary substantially, depending only on the left-to-right order imposed on the probabilities. We also prove that the costs of some common classes of near-optimal trees cannot be bounded above by the cost of an optimal tree plus a constant.

1. Introduction

A binary search tree is a well-known structure for information storage and retrieval. As a binary tree, it is either empty or else composed of a root node together with left and right subtrees, which are themselves binary trees. A node with two empty subtrees is called a leaf. Search keys $K_1 < K_2 < \dots < K_n$ from an ordered set are assigned to the nodes such that keys in the left subtree are less than the key at the root and those in the right subtree are greater. Figure 1(a) shows a binary search tree with 6 nodes. Searching for a key in the tree proceeds from the root along the unique path to the node labeled by that key. The number of edges in this path is called the depth of the node. In Fig. 1(a), the node labeled $K_3$ has depth 2. The cost $c_T(K)$ of searching for key $K$ in tree $T$ is defined to be one greater than the depth of the node in $T$ which is labeled $K$.

A non-negative weight $w_i$ can be associated with each key $K_i$, and thereby with each node of the search tree. When the sum of these weights is 1, they may be interpreted as probabilities and written $p_i$. The cost of a binary search tree $T$ is defined to be

$$C(T) = \sum_{i=1}^{n} w_i \, c_T(K_i).$$

* This work was supported by the National Research Council of Canada, while the author was at the University of Waterloo


Fig. 1. A binary search tree: (a) showing keys; (b) showing weights

In Fig. 1(a), if $(w_1, w_2, w_3, w_4, w_5, w_6) = (6, 3, 5, 5, 1, 2)$, then the cost of the tree is 52. Since we shall not be concerned with the actual keys in the tree, this weighted tree can be visualized as in Fig. 1(b). Given an ordered $n$-tuple of weights $w = (w_1, \dots, w_n)$, there are many possible weighted binary search trees corresponding to it. Let $C_{OPT}(w)$ be the minimum of the costs of all these possible trees. The trees which have this minimal cost are called optimal. The entropy of the discrete probability distribution $p = (p_1, \dots, p_n)$ is defined as

$$H(p) = \sum_{i=1}^{n} p_i \log (1/p_i).^{1}$$

Notice that $C_{OPT}(p)$ depends on the order of the probabilities within $(p_1, \dots, p_n)$, whereas $H(p)$ does not. For more information about binary search trees, the reader is referred to [4].
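The two definitions above can be checked on the example of Fig. 1 with a short sketch. The tree shape used here is an assumption: the figure is not reproduced in this text, and the shape below ($K_4$ at the root, $K_2$ and $K_5$ as its children, $K_1$ and $K_3$ below $K_2$, $K_6$ below $K_5$) is a reconstruction consistent with the stated facts that $K_3$ has depth 2 and that the cost is 52. The names `tree_cost`, `entropy`, and `leaf` are illustrative only.

```python
from math import log2

# A tree is None (empty) or a tuple (left, key_index, right); key indices are 1-based.
def tree_cost(tree, weights, depth=0):
    """Sum of w_i * (depth of K_i + 1) over all nodes of the tree."""
    if tree is None:
        return 0
    left, i, right = tree
    return (weights[i - 1] * (depth + 1)
            + tree_cost(left, weights, depth + 1)
            + tree_cost(right, weights, depth + 1))

def entropy(p):
    """H(p) = sum of p_i * log2(1 / p_i); zero probabilities contribute nothing."""
    return sum(x * log2(1 / x) for x in p if x > 0)

def leaf(i):
    return (None, i, None)

# Assumed shape for Fig. 1 (a reconstruction, not taken from the figure itself).
fig1 = ((leaf(1), 2, leaf(3)), 4, (None, 5, leaf(6)))
w = (6, 3, 5, 5, 1, 2)
fig1_cost = tree_cost(fig1, w)

# Normalising the weights gives a probability distribution with an entropy.
p = [x / sum(w) for x in w]
fig1_entropy = entropy(p)
```

Reordering `w` changes the set of admissible trees and hence possibly the optimal cost, while `entropy` is unaffected by the order of its argument.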

2. Dependence of Optimal Cost on Ordering Constraints

Bayer [1] has proved the following bounds on the cost of an optimal binary search tree:

$$H(p) - \log H(p) + 1 - \log e < C_{OPT}(p) \le H(p) + 1, \quad \text{for } H(p) > 1. \tag{1}$$

We will show that these bounds are essentially the best possible, if we restrict our attention to bounds which do not take into account the order of the probabilities within the tuple $(p_1, \dots, p_n)$. We will accomplish this by exhibiting a sequence of multisets of probabilities, for which the optimal cost can vary over all but a small bounded amount of the range of (1), depending on the order imposed. It is also important that the corresponding sequence of entropies will be unbounded; otherwise, the range of (1), being about $\log H(p)$, would itself be bounded by some constant.

Let F(n) be the cost of an optimal tree with n nodes, each having weight 1. Then

$$F(n) = \sum_{i=1}^{n} \left( \lfloor \log i \rfloor + 1 \right), \quad \text{for } n \ge 1.$$

$^{1}$ log means logarithm base 2

By a simple induction,

$$F(2^j) = (j-1)2^j + j + 2, \quad \text{for } j \ge 0. \tag{2}$$
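The closed form (2) is easy to confirm numerically. The sketch below evaluates the defining sum directly and compares it with (2); the function names are illustrative.

```python
def F(n):
    """Cost of an optimal tree on n unit-weight nodes: sum of floor(log2 i) + 1."""
    # For every integer i >= 1, i.bit_length() equals floor(log2(i)) + 1.
    return sum(i.bit_length() for i in range(1, n + 1))

def F_closed(j):
    """Closed form (2) for F(2**j)."""
    return (j - 1) * 2**j + j + 2

# Compare the two for the first few powers of two.
agree = all(F(2**j) == F_closed(j) for j in range(12))
```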

For $k \ge 1$, we define multisets of probabilities, $P_k$, as follows. For $0 \le i \le k-1$, $P_k$ contains $2^i$ probabilities of magnitude $1/(2^i k)$. Hence, the sum of the probabilities in $P_k$ is

$$\sum_{i=0}^{k-1} \frac{2^i}{2^i k} = 1.$$

The entropy of the probabilities in $P_k$ is

$$\sum_{i=0}^{k-1} \frac{2^i}{2^i k} \log (2^i k) = \sum_{i=0}^{k-1} \frac{i + \log k}{k} = \frac{k-1}{2} + \log k.$$

Hence, the log of the entropy of these probabilities is $\log k - 1 + o(1)$. We will consider two possible orderings on the probabilities in $P_k$. First, consider trees $T_k$ with weights from $P_k$, in which all the nodes of weight $1/(2^i k)$ are at depth $i$. The distribution implicitly defined by these trees is

$$p_k = \left( \dots, \frac{1}{4k}, \dots, \frac{1}{2k}, \dots, \frac{1}{4k}, \dots, \frac{1}{k}, \dots, \frac{1}{4k}, \dots, \frac{1}{2k}, \dots, \frac{1}{4k}, \dots \right).$$

For example,

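$$p_4 = \left( \frac{1}{8k}, \frac{1}{4k}, \frac{1}{8k}, \frac{1}{2k}, \frac{1}{8k}, \frac{1}{4k}, \frac{1}{8k}, \frac{1}{k}, \frac{1}{8k}, \frac{1}{4k}, \frac{1}{8k}, \frac{1}{2k}, \frac{1}{8k}, \frac{1}{4k}, \frac{1}{8k} \right).$$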

In this case,

$$C_{OPT}(p_k) = \sum_{i=0}^{k-1} \frac{i+1}{k} = \frac{k+1}{2} = H(p_k) - \log H(p_k) + O(1).$$
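The quantities defined so far can be reproduced exactly for small $k$. The following sketch builds the multiset $P_k$, lists the in-order weights $p_k$ of the complete tree $T_k$, and evaluates the cost of $T_k$ and the entropy. The helper names are hypothetical, and exact rational arithmetic is used only to make the comparisons exact.

```python
from fractions import Fraction
from math import log2

def multiset_P(k):
    """2**i copies of 1/(2**i * k) for i = 0..k-1, listed in non-increasing order."""
    return [Fraction(1, 2**i * k) for i in range(k) for _ in range(2**i)]

def inorder_p(k, depth=0):
    """In-order weights of the complete tree T_k whose depth-i nodes weigh 1/(2**i * k)."""
    if depth == k:
        return []
    sub = inorder_p(k, depth + 1)       # both subtrees of a node look the same
    return sub + [Fraction(1, 2**depth * k)] + sub

def cost_T(k):
    """Cost of T_k: the 2**i nodes at depth i each contribute weight * (i + 1)."""
    return sum(2**i * Fraction(1, 2**i * k) * (i + 1) for i in range(k))

def entropy(p):
    """Entropy in bits of a list of positive probabilities."""
    return -sum(float(x) * log2(float(x)) for x in p)

k = 8
H = entropy(multiset_P(k))              # should be close to (k - 1)/2 + log2(k)
C = cost_T(k)                           # should equal (k + 1)/2
```

For $k = 8$ the entropy is $3.5 + 3 = 6.5$ and the cost of $T_k$ is $4.5$, so the cost already sits about $\log H(p_k)$ below the entropy, in line with the lower end of (1).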

The other order we analyze is non-increasing order, for which the distribution is

$$q_k = \left( \frac{1}{k}, \frac{1}{2k}, \frac{1}{2k}, \frac{1}{4k}, \frac{1}{4k}, \frac{1}{4k}, \frac{1}{4k}, \dots, \frac{1}{2^{k-1}k} \right).$$

Consider any tree for this distribution. Note that all nodes of the same weight $1/(2^i k)$ must lie in a subtree whose root $r_i$ has weight $1/(2^i k)$. Therefore, the cost of this tree is at least as large as the cost calculated by imagining that all nodes of weight $1/(2^i k)$ form an optimal subtree with root $r_i$. Let $d(r_i)$ be the depth of the node $r_i$. The cost of an optimal tree with $2^i$ nodes, each of weight $1/(2^i k)$, is $F(2^i)/(2^i k)$. When it occurs as a subtree with root $r_i$, it contributes an additional amount to the overall cost of the tree, namely its total weight $1/k$ times the depth of $r_i$. Therefore,

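$$C_{OPT}(q_k) \ge \sum_{i=0}^{k-1} \left[ \frac{F(2^i)}{2^i k} + \frac{d(r_i)}{k} \right].$$

But $\sum_{i=0}^{k-1} d(r_i)$ must be at least $F(k) - k$, since the $r_i$ are $k$ distinct nodes of a single binary tree, so that

$$C_{OPT}(q_k) \ge \sum_{i=0}^{k-1} \frac{F(2^i)}{2^i k} + \frac{F(k) - k}{k}.$$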

For purposes of calculation, we assume that k is a power of 2. Then, using (2), we get

$$\begin{aligned}
C_{OPT}(q_k) &\ge \sum_{i=0}^{k-1} \frac{(i-1)2^i + i + 2}{2^i k} + \frac{(\log k - 1)k + \log k + 2}{k} - 1 \\
&= \frac{k-1}{2} - 1 + \frac{1}{k} \sum_{i=0}^{k-1} \frac{i+2}{2^i} + \log k - 1 + \frac{\log k + 2}{k} - 1 \\
&= \frac{k-1}{2} + \log k - 3 + o(1) \\
&= H(q_k) - 3 + o(1).
\end{aligned}$$

Therefore, we have proved the following.

Theorem 1. There exists a sequence, $P_k$, of multisets of probabilities, and two particular orders, $p_k$ and $q_k$, of the elements of $P_k$, such that

$$C_{OPT}(p_k) = H(p_k) - \log H(p_k) + O(1)$$

and

$$C_{OPT}(q_k) \ge H(q_k) - 3 + o(1),$$

while

$$\lim_{k \to \infty} H(p_k) = \lim_{k \to \infty} H(q_k) = \infty.$$

In view of (1), this theorem is essentially the strongest possible of its type.
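Theorem 1 can be illustrated for small $k$ by computing both optimal costs directly. The sketch below uses a plain cubic-time dynamic program over contiguous ranges of weights, which is an assumption made for brevity and is not Knuth's quadratic algorithm [3]; all function names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache

def optimal_cost(w):
    """Minimum cost over all binary search trees on the ordered weights w."""
    prefix = [0]
    for x in w:
        prefix.append(prefix[-1] + x)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Optimal cost of a tree on w[i:j]. Choosing a root pushes every node of
        # the range one level down, which adds the total weight of the range.
        if i >= j:
            return 0
        total = prefix[j] - prefix[i]
        return total + min(best(i, r) + best(r + 1, j) for r in range(i, j))

    return best(0, len(w))

def F(n):
    """Optimal cost for n unit weights (sum of floor(log2 i) + 1)."""
    return sum(i.bit_length() for i in range(1, n + 1))

def p_order(k, depth=0):
    """In-order weights of the complete tree with weight 1/(2**i * k) at depth i."""
    if depth == k:
        return []
    sub = p_order(k, depth + 1)
    return sub + [Fraction(1, 2**depth * k)] + sub

def q_order(k):
    """The same multiset in non-increasing order."""
    return [Fraction(1, 2**i * k) for i in range(k) for _ in range(2**i)]

def lower_bound_q(k):
    """The bound sum F(2**i)/(2**i k) + (F(k) - k)/k derived in the text."""
    return sum(Fraction(F(2**i), 2**i * k) for i in range(k)) + Fraction(F(k) - k, k)

k = 4
cost_p = optimal_cost(p_order(k))   # the text gives (k + 1)/2 for this order
cost_q = optimal_cost(q_order(k))   # must be at least lower_bound_q(k)
```

Both calls use the same 15 probabilities; only the left-to-right order differs, which is the point of the theorem.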

3. Near-Optimal Trees not Within an Additive Constant of Optimal

The best algorithm known for building an optimal tree is due to Knuth [3]. However, its requirements of $O(n^2)$ time and space are often prohibitive. Moreover, the weights being used might be only estimates of the true values. As a result, several people have proposed classes of trees which have a nearly optimal cost, but which can be constructed more efficiently. These trees are all defined by top-down descriptions. In weight-balanced trees, first suggested by Knuth [3], the root of each subtree is chosen to make the weights of the resulting left and right subtrees as near to being equal as possible. Mehlhorn [5] defined bisection trees as follows. The root of the entire tree is chosen closest to the 50th percentile of the cumulative weight distribution. Its left and right sons are chosen closest to the 25th and 75th percentiles, respectively, and so on. Min-max trees were introduced by Bayer [1]. In these trees, the root of each subtree is chosen to minimize the maximum of the weights of the resulting left and right subtrees. In each of these three informal definitions, the tree in question is not uniquely specified for certain weight distributions. For weight-balanced and bisection trees, this nondeterminism has generally been allowed in the definitions. However, Bayer makes a particular choice in his definition of min-max trees (see [1] for details). Therefore, we will employ the term "essentially min-max" to describe trees which satisfy the above informal definition, and we will use "min-max" in the more restrictive sense of Bayer.

Fredman [2] has given a search technique which leads to linear time algorithms for constructing all these classes of near-optimal trees. We will use the notation $C_{WB}(p)$, $C_{BI}(p)$, and $C_{MM}(p)$ to refer to the costs of weight-balanced, bisection, and min-max trees, respectively, for a distribution $p$.
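The informal top-down definitions of weight-balanced and essentially min-max trees can be sketched as a single recursive procedure parameterised by the rule used to score a candidate root. This is a naive scan over all candidate roots, not Fredman's linear-time technique [2], and breaking ties in favour of the leftmost candidate is an assumption made here; the definitions leave ties open. Bisection trees are not covered by this sketch.

```python
def build_cost(w, score):
    """Cost of the tree built top-down on weights w, choosing each root to minimise score."""
    prefix = [0]
    for x in w:
        prefix.append(prefix[-1] + x)

    def rec(i, j, depth):
        # Build the subtree on w[i:j] whose root sits at the given depth.
        if i >= j:
            return 0

        def sides(r):
            # Total weight to the left and to the right of candidate root r.
            return prefix[r] - prefix[i], prefix[j] - prefix[r + 1]

        # min() returns the first minimiser, so ties go to the leftmost candidate.
        root = min(range(i, j), key=lambda r: score(*sides(r)))
        return (w[root] * (depth + 1)
                + rec(i, root, depth + 1)
                + rec(root + 1, j, depth + 1))

    return rec(0, len(w), 0)

def weight_balanced(left, right):
    """Weight-balanced rule: make the two subtree weights as equal as possible."""
    return abs(left - right)

def min_max(left, right):
    """Essentially min-max rule: minimise the heavier of the two subtrees."""
    return max(left, right)

# Seven equal weights: both rules pick the middle key at every level.
uniform_wb = build_cost([1] * 7, weight_balanced)
uniform_mm = build_cost([1] * 7, min_max)
```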

Bayer proved that, if $H(p) > 1$,

$$H(p) - \log H(p) + 1 - \log e < C_{OPT}(p) \le C_{MM}(p) \le H(p) + 1.$$

This implies that

$$C_{MM}(p) \le C_{OPT}(p) + \log H(p) + \log e.$$

He conjectured that an inequality of the form

$$C_{MM}(p) \le C_{OPT}(p) + \text{constant}$$

would be possible. Since Bayer also proved that

$$C_{WB}(p) \le H(p) + 2,$$

and Mehlhorn proved that

$$C_{BI}(p) \le H(p) + 1,$$

similar conjectures could be made for weight-balanced and bisection trees. However, we shall demonstrate that all of these conjectures are false.

We begin by introducing a sequence of weighted trees, which are weight-balanced, bisection, and essentially min-max. For $k \ge 1$, we define the tree $T_k$ in three stages.

(i) For $0 \le i \le k-1$, there are $2^i$ nodes at depth $i$, all of weight 0 (later, we shall modify the trees to make all weights positive);

(ii) for $0 \le j \le 2^{k-1}$, there are $2^j$ "groups" of nodes of weight 0 at depth $k+j-1$, the smallest of these groups containing $2^{k-1} - j$ nodes. We specify the nodes at depth $k+j-1$ inductively. For $j = 0$, there is one group (as defined in (i)), which consists of all nodes at depth $k-1$. For $0 \le j < 2^{k-1}$, consider each group at depth $k+j-1$ in turn. Suppose it contains $m$ nodes. Then each of these has two nodes as children. Of these $2m$ nodes, all have weight 0, except the $m$'th, which has weight $2^{-(k+j)}$. The first $m-1$ and the last $m$ form two groups at depth $k+j$;

(iii) each of the nodes of weight 0 at depth $k + 2^{k-1} - 1$ has two nodes as children, each of weight $2^{-(k+2^{k-1})}$.

Fig. 2. $T_3$ (the nodes labeled $P$ have weight $P = 1/128$)

The tree $T_3$ is shown in Fig. 2. In $T_k$, every node at depth $d$ is the root of a subtree of weight $2^{-d}$. This can be proven by induction, proceeding from the leaves to the root of $T_k$. If a node at depth $d$ has a non-zero weight, then it is a leaf and its weight is $2^{-d}$, from the definition of $T_k$. If a node has weight 0, then it has two nodes as children; hence, the result follows by induction. This means that, for $k \ge 1$, the weight of $T_k$ is $2^{-0} = 1$. Let $p_k$ be the probability distribution implicitly defined by $T_k$. The above observations show that, given the distribution $p_k$, $T_k$ is the unique choice for a weight-balanced tree, and a possible choice for a bisection and essentially min-max tree.
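The three-stage definition of $T_k$ can be followed level by level in code, and the claim that every node at depth $d$ roots a subtree of weight $2^{-d}$ can then be checked mechanically for small $k$. The sketch below is one reading of stages (i) to (iii); the class and function names are hypothetical, and empty groups are simply skipped.

```python
from fractions import Fraction

class Node:
    def __init__(self, weight=Fraction(0)):
        self.weight = weight
        self.children = []              # [] for a leaf, otherwise [left, right]

def add_children(nodes):
    """Give every node two weight-0 children; return the children left to right."""
    kids = []
    for node in nodes:
        node.children = [Node(), Node()]
        kids.extend(node.children)
    return kids

def build_T(k):
    root = Node()
    level = [root]
    for _ in range(k - 1):              # stage (i): depths 0 .. k-1, all weight 0
        level = add_children(level)
    groups = [level]                    # j = 0: one group, all nodes at depth k-1
    for j in range(2 ** (k - 1)):       # stage (ii): split each group of m nodes
        next_groups = []
        for group in groups:
            m = len(group)
            if m == 0:                  # an empty group has nothing to split
                continue
            kids = add_children(group)
            kids[m - 1].weight = Fraction(1, 2 ** (k + j))   # the m-th child
            next_groups += [kids[:m - 1], kids[m:]]
        groups = next_groups
    leaf_weight = Fraction(1, 2 ** (k + 2 ** (k - 1)))
    for group in groups:                # stage (iii): two leaf children each
        for kid in add_children(group):
            kid.weight = leaf_weight
    return root

def subtree_weights_ok(node, depth=0):
    """Return (weight, ok); ok holds if every subtree at depth d here weighs 2**-d."""
    total, ok = node.weight, True
    for child in node.children:
        w, child_ok = subtree_weights_ok(child, depth + 1)
        total += w
        ok = ok and child_ok
    return total, ok and total == Fraction(1, 2 ** depth)

total_weight, all_ok = subtree_weights_ok(build_T(3))
```

Because every left and right subtree of a weight-0 node then carries exactly the same weight, each such node is a perfectly balanced choice of root, which is what makes $T_k$ weight-balanced.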

The entropy of the distribution $p_k$ can be calculated as follows. For $0 \le j \le 2^{k-1} - 1$, the probability $2^{-(k+j)}$ occurs $2^j$ times, so the total weight of the nodes at depth $k+j$ is $2^{-k}$. Hence, the weight of the nodes at depth $k + 2^{k-1}$ must be

$$1 - 2^{-k} \, 2^{k-1} = \frac{1}{2}.$$

Since each of these nodes has probability $2^{-(k+2^{k-1})}$, there must be $2^{k+2^{k-1}-1}$ of them. Therefore,


$$\begin{aligned}
H(p_k) &= \sum_{j=0}^{2^{k-1}-1} 2^j \, 2^{-(k+j)} (k+j) + 2^{k+2^{k-1}-1} \, 2^{-(k+2^{k-1})} (k + 2^{k-1}) \\
&= 2^{k-1} \, 2^{-k} k + 2^{-k} \sum_{j=0}^{2^{k-1}-1} j + \frac{k + 2^{k-1}}{2} \\
&= \frac{k}{2} + \frac{2^{-k} (2^{k-1} - 1) 2^{k-1}}{2} + \frac{k}{2} + 2^{k-2} \\
&= 2^{k-2} + 2^{k-3} + k - \tfrac{1}{4}.
\end{aligned}$$

Hence, $H(p_k) < 2^k$ and $\log H(p_k) < k$.

Fig. 3. $S_3$ (the nodes labeled $P$ have weight $P = 1/128$)
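The closed form for $H(p_k)$ can be confirmed with exact arithmetic, since every probability in $p_k$ is a power of 2 and so $\log(1/p)$ is an integer. The sketch below sums the entropy from the multiplicities stated in the text; the function names are illustrative.

```python
from fractions import Fraction

def entropy_p(k):
    """Exact entropy of p_k, using log2(1/p) = d for a probability p = 2**-d."""
    top = 2 ** (k - 1)
    # 2**j nodes of probability 2**-(k+j), for j = 0 .. 2**(k-1) - 1.
    h = sum(2**j * Fraction(k + j, 2 ** (k + j)) for j in range(top))
    # 2**(k + 2**(k-1) - 1) nodes of probability 2**-(k + 2**(k-1)).
    deepest = k + top
    h += 2 ** (deepest - 1) * Fraction(deepest, 2 ** deepest)
    return h

def closed_form(k):
    """2**(k-2) + 2**(k-3) + k - 1/4."""
    return Fraction(2) ** (k - 2) + Fraction(2) ** (k - 3) + k - Fraction(1, 4)

matches = all(entropy_p(k) == closed_form(k) for k in range(1, 8))
below_bound = all(closed_form(k) < 2**k for k in range(1, 8))
```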

Now we present another sequence of trees, $S_k$, $k \ge 1$, such that the probability distribution defined by $S_k$ is also $p_k$, and the cost of $S_k$ is substantially less than the cost of $T_k$. For $0 \le i \le 2^{k-1} - 1$, there are $2^i$ nodes at depth $i$, each of weight $2^{-(k+i)}$. The nodes of weight $2^{-(k+2^{k-1})}$ are situated appropriately in optimal subtrees with roots at depth $2^{k-1}$. The largest of these, the rightmost, contains $2^k$ nodes. The nodes of weight 0 are located as if they were inserted after all the other nodes. $S_3$ is shown in Fig. 3.

The deepest node of weight $2^{-(k+2^{k-1})}$ in $S_k$ is at depth $2^{k-1} + k$. Hence none of the nodes of weight $2^{-(k+2^{k-1})}$ is deeper in $S_k$ than in $T_k$. Moreover, all the other nodes with non-zero weight are $k$ levels closer to the root in $S_k$. Therefore,

$$C(T_k) \ge C(S_k) + \frac{k}{2} > C(S_k) + \frac{\log H(p_k)}{2}.$$

Fig. 4. $T_2(\varepsilon)$

This contradicts the conjectures that the costs of weight-balanced, bisection, or essentially min-max trees are within an additive constant of the optimal cost.
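The gap can be observed directly for very small $k$ by comparing the cost of $T_k$ with the optimal cost for the same ordered distribution. The sketch below makes two assumptions that should be read with care. First, it lists only the non-zero weights of $T_k$ in left-to-right order, on the grounds that weight-0 keys can be attached as leaves without changing the cost, as in the description of $S_k$. Second, it uses a plain cubic-time dynamic program rather than Knuth's algorithm [3]. All names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache

def leaves_T(k):
    """In-order list of (weight, depth) for the non-zero nodes (the leaves) of T_k."""
    last = 2 ** (k - 1)                  # stage (ii) runs for j = 0 .. last - 1

    def below(j, m, t):
        # Leaves under the t-th node (0-based) of a group of m weight-0 nodes
        # at depth k + j - 1; its children are at depth k + j.
        depth = k + j
        if j == last:                    # stage (iii): two leaf children
            return [(Fraction(1, 2 ** depth), depth)] * 2
        out = []
        for c in (2 * t, 2 * t + 1):     # positions of the two children among 2m
            if c == m - 1:               # the m-th child is a leaf
                out.append((Fraction(1, 2 ** depth), depth))
            elif c < m - 1:              # the first m - 1 children form one group
                out += below(j + 1, m - 1, c)
            else:                        # the last m children form the other group
                out += below(j + 1, m, c - m)
        return out

    return [item for t in range(last) for item in below(0, last, t)]

def optimal_cost(w):
    """Minimum cost over all binary search trees on the ordered weights w."""
    prefix = [0]
    for x in w:
        prefix.append(prefix[-1] + x)

    @lru_cache(maxsize=None)
    def best(i, j):
        if i >= j:
            return 0
        total = prefix[j] - prefix[i]
        return total + min(best(i, r) + best(r + 1, j) for r in range(i, j))

    return best(0, len(w))

def gap(k):
    """C(T_k) minus the optimal cost for the same ordered non-zero weights."""
    leaves = leaves_T(k)
    cost_T = sum(weight * (depth + 1) for weight, depth in leaves)
    return cost_T - optimal_cost([weight for weight, _ in leaves])

gaps = {k: gap(k) for k in (1, 2)}
```

Since the optimal cost is at most $C(S_k)$, the inequality in the text predicts `gap(k)` to be at least $k/2$.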

To obtain a counterexample to the conjecture for min-max trees requires only a small change to the sequences $T_k$ and $S_k$. For $0 \le \varepsilon \le 1/(2^{k+2^{k-1}} - 1)$, we define $T_k(\varepsilon)$ by modifying $T_k$. Nodes of weight 0 in $T_k$ have weight $\varepsilon$ in $T_k(\varepsilon)$, while nodes of weight $2^{-d}$ have weight $2^{-d}(1 - (2^d - 1)\varepsilon)$. Hence, $T_k(0) = T_k$. The tree $T_2(\varepsilon)$ is shown in Fig. 4. Every node at depth $d$ in $T_k(\varepsilon)$ is the root of a subtree of weight $2^{-d}(1 - (2^d - 1)\varepsilon)$. This is again proved by induction, starting from the leaves, since

$$2 \cdot 2^{-d} (1 - (2^d - 1)\varepsilon) + \varepsilon = 2^{-(d-1)} (1 - (2^{d-1} - 1)\varepsilon).$$
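The induction step is a one-line identity, and it can be spot-checked with exact rationals. In the sketch below, `subtree_weight` is a hypothetical helper encoding the claimed weight of a subtree rooted at depth $d$, and the sample value of $\varepsilon$ is arbitrary.

```python
from fractions import Fraction

def subtree_weight(d, eps):
    """Claimed weight of the subtree rooted at a depth-d node of T_k(eps)."""
    return Fraction(1, 2**d) * (1 - (2**d - 1) * eps)

eps = Fraction(1, 5000)   # arbitrary small sample value

# An internal node of weight eps at depth d - 1 with two children at depth d
# should itself root a subtree of the claimed weight.
step_holds = all(
    2 * subtree_weight(d, eps) + eps == subtree_weight(d - 1, eps)
    for d in range(1, 12)
)
```

At depth 0 the formula gives total weight 1 for every $\varepsilon$, so $T_k(\varepsilon)$ still defines a probability distribution.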

For $\varepsilon > 0$, these observations show that $T_k(\varepsilon)$ is a min-max tree, since it is the only possible essentially min-max tree. Let $p_k(\varepsilon)$ be the distribution defined by $T_k(\varepsilon)$, and let $S_k(\varepsilon)$ be the modified version of $S_k$. $H(p_k(\varepsilon))$, $C(T_k(\varepsilon))$, and $C(S_k(\varepsilon))$ are continuous functions of $\varepsilon$. Therefore, for $k \ge 1$ and $\varepsilon > 0$ sufficiently small, where $\varepsilon$ is a function of $k$,

$$C(T_k(\varepsilon)) > C(S_k(\varepsilon)) + \frac{\log H(p_k(\varepsilon))}{2}.$$

$T_k$ was only one possible choice for a bisection tree for the probability distribution $p_k$. We can eliminate this non-uniqueness by defining $T_k'(\varepsilon)$ in the following way, for $0 \le \varepsilon < 2^{-(k+2^{k-1})}$. Nodes of weight 0 in $T_k$ have weight $\varepsilon$ in $T_k'(\varepsilon)$. Nodes of weight $2^{-d}$ have weight $2^{-d} - \varepsilon$, except for the leftmost and rightmost nodes of this category, which are reduced in weight by only $\varepsilon/2$. Then $T_k'(0) = T_k$. The tree $T_2'(\varepsilon)$ is shown in Fig. 5. For $\varepsilon > 0$, $T_k'(\varepsilon)$ is the unique choice for a bisection tree. Let $p_k'(\varepsilon)$ be the distribution defined by $T_k'(\varepsilon)$, and let $S_k'(\varepsilon)$ be the modified version of $S_k$. Again, by continuity, for $k \ge 1$ and $\varepsilon > 0$ sufficiently small,

$$C(T_k'(\varepsilon)) > C(S_k'(\varepsilon)) + \frac{\log H(p_k'(\varepsilon))}{2}.$$


Fig. 5. $T_2'(\varepsilon)$

4. Conclusion

This paper has answered certain questions in the theory of binary search trees. However, the negative results of Sect. 3 leave open the question of whether there exists a class of trees, constructible in linear time, whose cost is within an additive constant of optimal.

5. References

1. Bayer, P.J.: Improved bounds on the costs of optimal and balanced binary search trees. Project MAC Technical Memorandum 69, M.I.T. Cambridge, MA., 1975

2. Fredman, M.L.: Two applications of a probabilistic search technique: sorting X + Y and building balanced search trees. Proc. 7th Ann. ACM Symp. Theor. Comput. 1975

3. Knuth, D.E.: Optimum binary search trees. Acta Informat. 1, 14-25 (1971)

4. Knuth, D.E.: The art of computer programming, Volume 3: Sorting and searching. Reading, MA.: Addison-Wesley 1973

5. Mehlhorn, K.: A best possible bound for the weighted path length of binary search trees. SIAM J. Comput. 6, 235-239 (1977)

Received October 18, 1978/July 11, 1980
