Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Upload: frederick-golden

Post on 25-Dec-2015


Page 1: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Speaker: Dana Moshkovitz

Page 2: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Outline

• Minimum Cost Binary Trees: The problem’s description

• The Garsia-Wachs algorithm

• Kingston’s proof for the Garsia-Wachs algorithm

Page 3: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees: Description

Given a list of n weights (nonnegative real numbers) p1,...,pn, let us look at the class of binary trees whose leaves carry these weights in the same order as given.

We would like to find the tree with the minimum weighted external path length, i.e. the tree T for which

$\sum_{i=1}^{n} p_i h_i$

is minimal. Here hi, the level of pi, is the number of arcs in T along the path from the root to the leaf whose weight is pi.

Let’s observe an example...

Page 4: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Presenting the problem

Example

Suppose these are the weights we get:

10 7 2 4 12 6

We can construct the following tree:

[Figure: a binary tree with leaves 10 and 7 at level 2 and leaves 2, 4, 12 and 6 at level 3.]

Now let us add to each leaf its weighted external path length:

20 14 6 12 36 18

Total: 106

Page 5: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

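Minimum Cost Binary Trees

Presenting the problem

Example

The tree we constructed has a total of 106:

Weights: 10 7 2 4 12 6
Costs:   20 14 6 12 36 18

Clearly, this tree is not in fact minimal, because we can construct a “better” tree, such as this one:

[Figure: a binary tree with leaves 10, 12 and 6 at level 2, leaf 7 at level 3, and leaves 2 and 4 at level 4.]

Costs: 20 21 8 16 24 12

Total: 101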
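The two totals in the example can be recomputed directly from the definition of the weighted external path length. The sketch below assumes the leaf order 10, 7, 2, 4, 12, 6 and the leaf levels read off the two trees; the function name is only illustrative.

```python
def weighted_path_length(weights, levels):
    """Return sum(p_i * h_i) for leaf weights p_i sitting at levels h_i."""
    if len(weights) != len(levels):
        raise ValueError("weights and levels must have the same length")
    return sum(p * h for p, h in zip(weights, levels))


# Assumed leaf order and levels of the two example trees.
weights = [10, 7, 2, 4, 12, 6]
first_tree_levels = [2, 2, 3, 3, 3, 3]
better_tree_levels = [2, 3, 4, 4, 2, 2]

print(weighted_path_length(weights, first_tree_levels))   # 106
print(weighted_path_length(weights, better_tree_levels))  # 101
```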

Page 6: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Intuition

Our first observation is that the smaller the weight, the lower it should be in the tree.

The above obvious observation is all we needed in the unordered version.

Let us recall Huffman’s simple greedy algorithm. The general idea behind his solution was to construct the tree from the bottom and always choose the two smallest weights to be siblings.

Let us demonstrate...

Page 7: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Intuition - Back to Huffman’s Algorithm

Again we have these weights:

10 7 2 4 12 6

We repeatedly choose the two least expensive fatherless weights and create a father for them.

The fathers created, in order, weigh 6, 12, 17, 24 and 41.

Total: 100
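The greedy step just described can be sketched with a priority queue. This is a minimal illustration of Huffman's idea for the unordered version only; the function name and the use of `heapq` are choices made here, not part of the talk. The total is the sum of the weights of all fathers created, which equals the weighted external path length of the resulting tree.

```python
import heapq


def huffman_cost(weights):
    """Return (total cost, weights of the fathers in creation order)."""
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    fathers = []
    while len(heap) > 1:
        a = heapq.heappop(heap)  # cheapest fatherless weight
        b = heapq.heappop(heap)  # second cheapest
        fathers.append(a + b)
        total += a + b
        heapq.heappush(heap, a + b)
    return total, fathers


print(huffman_cost([10, 7, 2, 4, 12, 6]))  # (100, [6, 12, 17, 24, 41])
```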

Page 8: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Intuition

The question arising here is whether, and how, we can apply the same idea to the ordered case.

The problem now is that we cannot arbitrarily choose the two cheapest leaves. The two leaves we choose must be neighbors. Unfortunately, this demand makes the naive greedy algorithm incorrect. For example, suppose the weights are 4,3,4,4. The pairs <4,3> and <3,4> both have the minimal sum 7, and the correctness of the result depends on which of them we choose to be the minimal pair.
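The 4,3,4,4 example can be checked mechanically. The sketch below implements the naive rule "always merge the cheapest pair of neighbors" with a hypothetical `prefer` parameter that decides how ties are broken; it is meant only to show that the outcome depends on the tie-break, not as a proposed algorithm.

```python
def adjacent_greedy_cost(weights, prefer="left"):
    """Merge the cheapest adjacent pair until one node is left; return the cost."""
    items = list(weights)
    total = 0
    while len(items) > 1:
        sums = [items[k] + items[k + 1] for k in range(len(items) - 1)]
        best = min(sums)
        candidates = [k for k, s in enumerate(sums) if s == best]
        # Tie-break: leftmost or rightmost cheapest pair.
        k = candidates[0] if prefer == "left" else candidates[-1]
        items[k:k + 2] = [best]
        total += best
    return total


print(adjacent_greedy_cost([4, 3, 4, 4], prefer="left"))   # 30
print(adjacent_greedy_cost([4, 3, 4, 4], prefer="right"))  # 33
```

Choosing <4,3> first leads to the balanced tree of cost 30, while choosing <3,4> first leads to a tree of cost 33.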

So, what do we do??

Page 9: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Toward a Solution

The feeling is that we have some pretty useful ideas:

• Constructing the tree from the bottom

• Finding siblings in each iteration

However, as we have already seen, they still need some refinement.

Page 10: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Toward a Solution

We need an additional observation:

Suppose we have weights p1,...,pn and levels h1,...,hn. There is at most one binary tree determined by this data, and that tree can be computed efficiently.

Let’s demonstrate and explain this:

Page 11: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Toward a Solution

Given feasible weights p1,...,pn and levels h1,...,hn, let us see how we can construct the binary tree they represent.

Weights: 10 7 2 4 12 6
Levels:  2 3 4 4 2 2

Algorithmically we can obtain this by maintaining an array, which holds in its i-th entry a pointer to a node in the i-th level lacking a right son.
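One way to carry out this reconstruction is sketched below. Note that it uses a stack of finished subtrees rather than the per-level array described on the slide: whenever the two topmost subtrees sit at the same level, they are made siblings. The nested-tuple representation and the function name are assumptions made for the illustration.

```python
def tree_from_levels(weights, levels):
    """Rebuild the ordered binary tree whose leaves have the given levels.

    Leaves are represented by their weight, internal nodes by (left, right).
    Raises ValueError if the levels do not describe a binary tree.
    """
    stack = []  # entries are (level, subtree), left to right
    for p, h in zip(weights, levels):
        stack.append((h, p))
        # Two neighboring subtrees at the same level must be siblings.
        while len(stack) >= 2 and stack[-1][0] == stack[-2][0]:
            level, right = stack.pop()
            _, left = stack.pop()
            stack.append((level - 1, (left, right)))
    if len(stack) != 1 or stack[0][0] != 0:
        raise ValueError("levels do not describe a binary tree")
    return stack[0][1]


print(tree_from_levels([10, 7, 2, 4, 12, 6], [2, 3, 4, 4, 2, 2]))
# ((10, (7, (2, 4))), (12, 6))
```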

Page 12: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Toward a Solution

We conclude that it suffices to solve the unordered version under the constraint that the level of each leaf in the resulting tree equals the level of the corresponding leaf of an ordered solution.

What we need to explain now is how to solve the unordered problem under this constraint.

The question is how to choose the right pair of neighbors.

Page 13: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Right Minimal Pairs

Let us define the following:

A pair of leaves $p_{i-1}, p_i$ is right minimal (briefly, R.M.) if

(i) $1 < i \le n$

(ii) $p_{i-2} + p_{i-1} \ge p_{i-1} + p_i$

(iii) $p_{i-1} + p_i < p_{j-1} + p_j$ for all $j > i$

In other words, two neighbors are right minimal if their sum is strictly smaller than every pair sum to their right and is not larger than the pair sum immediately to their left.

Remark: From now on, we shall treat our list of weights as though it contains two additional sentinels, namely ∞ and ∞+1, placed in the leftmost and rightmost positions of the list respectively. This is done to ease the discussion; proving that the modification does not alter the solution is rather trivial (see Gilbert and Moore’s result).
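The three conditions translate into a direct right-to-left scan. The sketch below is a plain transcription of the definition, with no attempt at efficiency; the function name is illustrative, and both sentinels are represented by `math.inf`, which is an assumption that is sufficient for this check.

```python
import math


def rightmost_rm_pair(weights):
    """Return the 1-based index i of the rightmost R.M. pair (p_{i-1}, p_i)."""
    n = len(weights)
    if n < 2:
        raise ValueError("need at least two weights")
    # p[0] and p[n + 1] are the sentinels; p[1..n] are the real weights.
    p = [math.inf] + list(weights) + [math.inf]
    for i in range(n, 1, -1):  # condition (i): 1 < i <= n
        pair = p[i - 1] + p[i]
        left_ok = p[i - 2] + p[i - 1] >= pair  # condition (ii)
        right_ok = all(pair < p[j - 1] + p[j]  # condition (iii)
                       for j in range(i + 1, n + 2))
        if left_ok and right_ok:
            return i
    raise AssertionError("an R.M. pair always exists")


print(rightmost_rm_pair([10, 7, 2, 4, 12, 6]))  # 4, i.e. the pair (2, 4)
print(rightmost_rm_pair([10, 7, 6, 12, 6]))     # 5, i.e. the pair (12, 6)
```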

Page 14: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Some Interesting Facts about R.M. Pairs

Lemma 1: Suppose we have a sequence of at least three weights $p_a, p_{a+1}, \dots, p_b$ such that $p_{j-1} + p_j < p_j + p_{j+1}$ for $a < j < b$. Then $h_a \ge h_{a+1} \ge \dots \ge h_{b-1}$ in every minimal tree containing these weights.

Example:

Weights:   3 4 4 5 10 6
Pair sums: 7 < 8 < 9 < 15 < 16

Page 15: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Some Interesting Facts about R.M. Pairs

Proof. Suppose $h_{j-1} < h_j$ for some $a < j < b$. Then $p_j$ must be a left child, and the transformation

[Figure: the leaves p_{j-1}, p_j and the subtree R before and after the transformation.]

is both legal and less expensive: since $p_{j+1}$ must be in R, $|R| \ge p_{j+1} > p_{j-1}$.

Therefore the lemma holds.

Page 16: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Some Interesting Facts about R.M. Pairs

Lemma 2: If $p_{i-1}, p_i$ is the rightmost R.M. pair, then $h_{i-1} \ge h_i \ge \dots \ge h_n$ in every minimal tree.

Proof. Follows directly from the choice of the pair and from lemma 1.

Further explanation: notice that when a pair is the rightmost R.M. pair, the sums of the pairs to its right form a monotone increasing series. (Otherwise the pair that “breaks” the monotonicity would be the rightmost R.M. pair.)

Page 17: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Some Interesting Facts about R.M. Pairs

Lemma 3: If $p_{i-1}, p_i$ is the rightmost R.M. pair, then $h_{i-1} = h_i$ in some minimal tree.

Proof. By lemma 2 it suffices to show that $h_{i-1} \le h_i$ in some minimal tree. Suppose we have a minimal tree such that $h_{i-1} > h_i$. In this tree $p_{i-1}$ is a right child, so we can use the following transformation to get another minimal tree:

[Figure: the leaves p_{i-1}, p_i and the subtree R before and after the transformation.]

Again, $|R| \ge p_{i-2} \ge p_i$, so the new tree is necessarily minimal.

Page 18: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

... And finally...

Having clarified the most important notion of this lecture, namely R.M. pairs, and proved some of their interesting properties, let us finally observe the desired algorithm.

Afterwards we shall prove the correctness of our solution.

Page 19: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

The Algorithm

Execute the following two steps n-1 times:

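(1) Locate the rightmost R.M. pair of entries and combine its two entries into a single node whose weight is their sum.

(2) Find the first entry to the right of the pair whose weight is greater than or equal to that sum, and move the new node to the left of this entry.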
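The two steps can be transcribed almost literally into code. The following is a minimal sketch that rescans the list in every iteration, so it makes no attempt to reach the running time of an efficient implementation. It returns the level of every original leaf; the function name, the `items` bookkeeping and the use of `math.inf` for both sentinels are assumptions made for the illustration.

```python
import math


def garsia_wachs_levels(weights):
    """Return the leaf levels produced by the two-step procedure."""
    n = len(weights)
    levels = [0] * n
    # Each entry: (weight, indices of the original leaves below it).
    items = [(w, [k]) for k, w in enumerate(weights)]
    for _ in range(n - 1):
        m = len(items)
        w = [math.inf] + [entry[0] for entry in items] + [math.inf]
        # Step 1: rightmost R.M. pair (w[i-1], w[i]), scanning right to left.
        for i in range(m, 1, -1):
            pair = w[i - 1] + w[i]
            if w[i - 2] + w[i - 1] >= pair and all(
                pair < w[j - 1] + w[j] for j in range(i + 1, m + 1)
            ):
                break
        (wa, leaves_a), (wb, leaves_b) = items[i - 2], items[i - 1]
        leaves = leaves_a + leaves_b
        for k in leaves:
            levels[k] += 1  # every leaf below the new node sinks one level
        del items[i - 2:i]
        # Step 2: insert the new node before the first entry >= its weight.
        pos = i - 2
        while pos < len(items) and items[pos][0] < wa + wb:
            pos += 1
        items.insert(pos, (wa + wb, leaves))
    return levels


print(garsia_wachs_levels([10, 7, 2, 4, 12, 6]))  # [2, 3, 4, 4, 2, 2]
```

On the running example the nodes created weigh 6, 18, 13, 23 and 41, and the returned levels give a total cost of 101.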

Page 20: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Simulating the algorithm

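We start with the same input. In each step we locate the rightmost R.M. pair, pair its entries and move the new node to its right place.

10 7 2 4 12 6

The nodes created, in order, weigh 6, 18, 13, 23 and 41.

[Figure: the tree built by the algorithm, with 2 and 4 under 6, 12 and 6 under 18, 7 and 6 under 13, 10 and 13 under 23, and 18 and 23 under the root 41.]

The resulting levels are:

Weights: 10 7 2 4 12 6
Levels:  2 3 4 4 2 2

These levels yield the tree we have already seen: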

Page 21: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the Solution

Lemma 4: Let $p_{i+k+1}$ be the first node to the right of the rightmost R.M. pair $p_{i-1}, p_i$ such that $p_{i+k+1} \ge p_{i-1} + p_i$. In some minimal tree T for which $h_{i-1} = h_i$, either

(a) $h_{i+k} = h_i - 1$, or

(b) $h_{i+k} = h_i$, and $p_{i+k}$ is a right child.

Proof. Begin with T, the minimal tree of lemma 3. By lemma 2 we merely need to show that $h_{i+k} \ge h_i - 1$ in T. Let us suppose $h_{i+k} < h_i - 1$. Let $p_m$ ($m \le i+k$) be the first node to the right of $p_i$ such that $h_m < h_i - 1$. Since $m \le i+k$, we have $p_m < p_{i-1} + p_i$.

Page 22: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the Solution

There are two possibilities:

(a) $p_{i-1}, p_i$ are siblings in T

(b) $p_i, p_{i+1}$ are siblings in T

Let p,q denote the two siblings. We can transform T as follows and improve the cost by $p + q - p_m \ge p_{i-1} + p_i - p_m > 0$:

[Figure: the siblings p, q and the leaf p_m before and after the transformation.]

This contradicts the hypothesis that T was minimal. Hence $h_{i+k} \ge h_i - 1$.

Page 23: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the Solution

It remains to prove that when $h_{i+k} = h_i$, $p_{i+k}$ can be made a right child. If $p_{i+k}$ is already a right child, we are done. Otherwise, if k>0, we can transform T:

[Figure: the leaves p_{i-1}, p_i, ..., p_{i+k}, p_{i+k+1} before and after the transformation for k>0.]

Page 24: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the Solution

While if k=0, we can use this transformation:

[Figure: the leaves p_{i-1}, p_i and p_{i+k+1} before and after the transformation for k=0.]

In both cases, our claim holds.

Page 25: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the solution

Lemma 5: For every rearrangement done in an iteration of the algorithm, the cost for the new series cannot exceed the cost for the original series of weights.

Proof. Let k be the number of places we move the R.M. pair in a specific iteration. We will exhibit a tree T’ for the new arrangement whose leaves have the same levels as in T, the minimum tree of lemma 4. This, of course, will prove the lemma.

If k=0, pi-1 and pi are siblings in T, so we may take T’=T.

Page 26: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the solution

Otherwise k>0. First we will transform T so that $p_{i-1}$ and $p_i$ (the current rightmost R.M. pair) are siblings, if they are not so already:

[Figure: the leaves p_{i-1}, p_i and p_{i+1} before and after the transformation. Annotations: "Lemma 4 states that they have the same level" and "Lemma 4 states p_{i+1} is a right child".]

Page 27: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the solution

Since k>0, the first weight greater than or equal to $p_{i-1} + p_i$, namely $p_{i+k+1}$, is still to the right of $p_i$. By lemma 4, there are merely two possibilities: $h_{i+k} = h_i$ or $h_{i+k} = h_i - 1$. In each case all we need to do is “slide” the new node to the right until it passes $p_{i+k}$.

[Figure: for each of the two cases, the pair p_{i-1}, p_i and the leaves up to p_{i+k} before and after sliding the new node to the right.]

Page 28: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the solution

Lemma 6: For every rearrangement done in an iteration of the algorithm, the cost for the new series is at least the cost for the original series of weights, and if equality holds, then the two solutions have corresponding levels.

Proof. Now - given a minimum tree T’ for the new arrangement - we need to construct a minimum tree T for the original series (while preserving the levels). Again we denote the number of places we move the pair by k.

If k=0, T=T’.

Page 29: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the solution

Otherwise k>0. We know that $p_{i+k} < p_{i-1} + p_i \le p_{i+k+1}$, so we can use lemma 1 in T’ with $p_{i+1}, \dots, p_{i+k}, p_x, p_{i+k+1}$ ($p_x$ is the father of the R.M. pair) to obtain $h_{i+1} \ge \dots \ge h_{i+k} \ge h_x$ in T’. Hence, one of two cases holds:

[Figure: the two possible configurations of the leaves p_{i+1}, ... and the subtree Tx, whose leaves are p_{i-1} and p_i.]

In the first case the treatment is simple: we need to move Tx to the left.

Page 30: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the solution

The treatment of the second case consists of two phases:

(a) Slide Tx to the left across all the leaves in its level.

(b) Rotate all the nodes two places to the right.

[Figure: phase (a), showing Tx (with leaves p_{i-1}, p_i) and the leaves p_{i+1}, ... before and after the slide.]

Page 31: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the solution

How do we perform the rotation?

A simulation

[Figure: simulation of the rotation on Tx and the leaves p_{i-1}, p_i and p_{i+1}.]

Page 32: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the solution

For every level the pair drops downward (possibly 0), some other pair moves up one level. But we have taken an R.M. pair (for which all the pairs with greater indices weigh more than it does). This implies that $w(T') = w(T_a) \ge w(T)$, so the weight of T’ is at least the weight of a minimum cost tree for the original series of weights.

If equality holds, the level of the pair is not changed by the rotation, so T is a minimum cost tree for the original series, which preserves the levels of T’.

Page 33: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Proving the solution

Those two final lemmas allow us to finally state the following:

The algorithm we presented is correct.
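On small inputs, the optimal cost can be cross-checked against an exhaustive interval dynamic program. The sketch below is not part of the proof and is not the algorithm of the talk; it is a straightforward cubic-time recurrence over contiguous ranges of leaves, with an illustrative function name, useful only as an independent reference value.

```python
from functools import lru_cache
from itertools import accumulate


def optimal_ordered_cost(weights):
    """Minimum weighted external path length over all order-preserving trees."""
    if not weights:
        raise ValueError("need at least one weight")
    prefix = [0] + list(accumulate(weights))

    @lru_cache(maxsize=None)
    def best(i, j):
        # Cheapest tree whose leaves are weights[i..j], in this order.
        if i == j:
            return 0
        span = prefix[j + 1] - prefix[i]  # every leaf sinks one level
        return span + min(best(i, k) + best(k + 1, j) for k in range(i, j))

    return best(0, len(weights) - 1)


print(optimal_ordered_cost([10, 7, 2, 4, 12, 6]))  # 101
print(optimal_ordered_cost([4, 3, 4, 4]))          # 30
```

For the running example the reference value is 101, the total of the tree obtained in the simulation.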

Page 34: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

Solving the problem

Another way to view this algorithm

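We start with the same input. In each step we locate the rightmost R.M. pair, pair its entries and move the new node to its right place, while recording the current level of every original leaf.

Working list       Levels of 10 7 2 4 12 6
10 7 2 4 12 6      0 0 0 0 0 0
10 7 6 12 6        0 0 1 1 0 0
10 7 6 18          0 0 1 1 1 1
10 13 18           0 1 2 2 1 1
18 23              1 2 3 3 1 1
41                 2 3 4 4 2 2

Weights: 10 7 2 4 12 6
Levels:  2 3 4 4 2 2

These levels yield the tree we have already seen: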

Page 35: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

References

This overview was based on:

[1] J. H. Kingston, A new proof of the Garsia-Wachs algorithm, J. of Algorithms, 9, 129-136, 1988.

The Garsia-Wachs algorithm:

[2] A. M. Garsia and M. L. Wachs, A new algorithm for minimum cost binary trees, SIAM J. Comput., 6(4), 622-642, 1977.

Huffman’s algorithm:

[3] D. Huffman, A method for the construction of minimum-redundancy codes, Proc. IRE, 1098-1101, 1952.

Page 36: Minimum Cost Binary Trees Speaker: Dana Moshkovitz

Minimum Cost Binary Trees

References

You might be interested in previous results concerning this problem:

The original O(n^3) algorithm:

[4] E. N. Gilbert and E. F. Moore, Variable length binary encodings, Bell System Tech. J., 38, 933-968, 1959.

An O(n^2) algorithm:

[5] D. E. Knuth, Optimum binary search trees, Acta Inform. 1, 14-25, 1971.

The Hu-Tucker algorithm (O(n log n)):

[6] T. C. Hu and A. C. Tucker, Optimal computer search trees and variable-length alphabetical codes, SIAM J. Appl. Math. 21, 514-532, 1971.

[7] T.C. Hu, A new proof of the T-C algorithm, SIAM J. Appl. Math. 25, 83-94, 1973.