
    1.1. TREES AND OPTIMIZATION

The word tree suggests branching out and never completing a cycle. Many computational applications use trees to organize data or decisions. Cleverly managed trees provide data structures that enable many algorithms to run quickly. In this section, we explore optimization problems concerning trees. First we need definitions and basic properties.

1.1.1. DEFINITION. A forest is a graph with no cycle (an acyclic graph). A tree is a connected forest. A leaf (or pendant vertex) is a vertex of degree 1. A spanning subgraph is a subgraph containing all vertices. A spanning tree is a spanning subgraph that is a tree.

1.1.2. PROPOSITION. A tree with at least two vertices has at least two leaves. If v is a leaf of a tree G, then G − v is a tree.

Proof: The first statement holds because the endpoints of a maximal nontrivial path in an acyclic graph have degree 1. Next, since a cut-vertex v of a connected graph G has a neighbor in each component of G − v, a leaf cannot be a cut-vertex. Thus when v is a leaf of a tree G, the graph G − v is connected and acyclic.

    PROPERTIES OF TREES

    Trees have many equivalent definitions. Verifying any one shows

    that a graph is a tree, and then the others are available for use.

1.1.3. THEOREM. For a graph G on n vertices, the following properties are equivalent (and define the class of trees on n vertices).
A) G is connected and has no cycles.
B) G is connected and has n − 1 edges.
C) G has n − 1 edges and no cycles.
D) G has exactly one u,v-path whenever u, v ∈ V(G).

Proof: We prove the equivalence of A, B, and C by showing that any two of {connected, acyclic, n − 1 edges} imply the third.

A ⇒ {B, C}. Use induction on n; for n = 1, an acyclic graph has no edge. For n > 1, Proposition 1.1.2 provides a leaf x and implies that G − x is a tree with n − 1 vertices. By the induction hypothesis, G − x has n − 2 edges. Since d(x) = 1, it follows that G has n − 1 edges.

B ⇒ {A, C}. Delete edges of cycles in G until the resulting graph G′ is acyclic. Since G is connected and no edge of a cycle is a cut-edge (Lemma 0.44), G′ is connected. Since G′ is acyclic and A ⇒ {B, C}, the graph G′ has n − 1 edges. Hence no edges were deleted, and G itself is acyclic.

C ⇒ {A, B}. Let the components of G be G_1, . . . , G_k, with |V(G_i)| = n_i for all i. Since G is acyclic, each G_i is connected and acyclic, so |E(G_i)| = n_i − 1. Thus |E(G)| = Σ_{i=1}^k |E(G_i)| = n − k. Since |E(G)| = n − 1, we have k = 1, and G is connected.

A ⇒ D. Since G is connected, for u, v ∈ V(G) there is a u,v-path. To prohibit a second path, we use extremality. Over all pairs of distinct paths with the same endpoints, let {P, Q} be a pair with minimum total length. By this choice, P and Q have no common vertices other than their endpoints. Hence P ∪ Q is a cycle in G, which contradicts condition A.

D ⇒ A. Existence of the paths implies that G is connected. Uniqueness of the paths prohibits cycles.
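Property B yields a particularly cheap computational test for treeness: count the edges and check connectivity with one search. Below is a minimal Python sketch (the adjacency-dict representation and the name is_tree are our illustration, not part of the text):

```python
from collections import deque

def is_tree(adj):
    """Test whether an undirected graph is a tree, using property B of
    Theorem 1.1.3: connected with n - 1 edges.
    adj: dict mapping each vertex to a list of its neighbors.
    """
    n = len(adj)
    if n == 0:
        return False
    # Each edge appears twice in an adjacency list.
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    if m != n - 1:
        return False
    # Check connectivity by breadth-first search from an arbitrary vertex.
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n

# The path on 4 vertices is a tree; the 4-cycle is not.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
assert is_tree(path) and not is_tree(cycle)
```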

To characterize trees among multigraphs, one must add to Theorem 1.1.3(D) a prohibition of loops. The rest remains the same. The defining properties have many applications.

1.1.4. PROPOSITION. If T is a tree with k edges and G is a graph with δ(G) ≥ k, then T is a subgraph of G. Also, this inequality is sharp.

Proof: Since K_k has minimum degree k − 1 and contains no tree with k edges, no value of δ(G) less than k can force the appearance of T.

The sufficiency of δ(G) ≥ k follows by induction on k. When k = 0, the graph G has a vertex and the claim holds. When k > 0, let T′ be the tree on k vertices obtained from T by deleting a leaf v with neighbor u. Since δ(G) ≥ k > k − 1, the induction hypothesis applies to yield T′ as a subgraph of G. Let x be the vertex in this copy of T′ that represents u. Because T′ has only k − 1 vertices other than u, some y ∈ N_G(x) does not appear in this copy of T′. Adding the edge xy to represent uv enlarges this copy of T′ in G to a copy of T in G.


[Figure: a copy of T′ in G at vertex x, extended by the edge xy to a copy of T.]

Since edges of cycles are not cut-edges, we can delete edges from a connected graph to obtain a spanning tree. We next use two properties of trees to prove a result about spanning trees: 1) Since a tree has no cycles, every edge is a cut-edge. 2) Since a tree has a unique connecting path for each vertex pair, adding any edge creates exactly one cycle. We use subtraction and addition to indicate deletion and inclusion of single edges.
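Property 2) can be made concrete: the unique cycle created by adding an edge uv to a spanning tree is the tree's u,v-path together with the new edge. A minimal sketch under the same hypothetical adjacency-dict representation as above (names ours):

```python
def tree_path(adj, u, v):
    """The unique u,v-path in a tree (Theorem 1.1.3(D)), found by DFS."""
    parent = {u: None}
    stack = [u]
    while stack:
        x = stack.pop()
        if x == v:
            break
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                stack.append(y)
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def fundamental_cycle(tree_adj, u, v):
    """The unique cycle created by adding the edge uv to a spanning tree."""
    return tree_path(tree_adj, u, v) + [u]

# Adding the edge {0, 3} to the path 0-1-2-3 creates exactly one cycle:
path_tree = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(fundamental_cycle(path_tree, 0, 3))   # [0, 1, 2, 3, 0]
```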

1.1.5. PROPOSITION. If T and T′ are two spanning trees of a connected graph G and e ∈ E(T) − E(T′), then there exists e′ ∈ E(T′) − E(T) such that T − e + e′ and T′ + e − e′ are both spanning trees of G.

Proof: The figure below illustrates T and T′ sharing two edges, with E(T) bold and E(T′) solid. The specified edge e is a cut-edge of T; let U and U′ be the vertex sets of the components of T − e. Adding e to T′ creates a unique cycle C. The path C − e contained in T′ has its endpoints in U and U′, so it has an edge e′ with endpoints in U and U′.

Since e is the only edge of T joining U and U′, we have e′ ∈ E(T′) − E(T). Since e′ joins the components of T − e, the graph T − e + e′ is a spanning tree. Since e′ is on the path from U to U′ in T′, it lies on the unique cycle formed by adding e to T′. Hence T′ + e − e′ also is a spanning tree.

[Figure: the components U and U′ of T − e, with e ∈ E(T) joining them and e′ ∈ E(T′) also crossing between them.]

    In any graph, the maximal subgraphs not having cut-vertices are

    useful subgraphs that form a tree-like structure.

    1.1.6. DEFINITION. A block of a graph G is a maximal connected

    graph H such that H is a subgraph of G and has no cut-vertex. If

    G itself is connected and has no cut-vertex, then G is a block.

1.1.7. Example. Blocks. If H is a block of G, then H has no cut-vertex, but H may contain cut-vertices of G. For example, the graph below has five blocks: three copies of K_2, one copy of K_3, and one subgraph that is neither a cycle nor complete.

    1.1.8. REMARK. Properties of blocks. An edge of a cycle cannot itself

    be a block, because it belongs to a larger subgraph having no cut-vertex.

Hence an edge is a block of G if and only if it is a cut-edge of G (the blocks of a tree are its edges). If a block has more than two vertices, then it is 2-connected. The blocks of a graph are its isolated vertices, its cut-edges, and its maximal 2-connected subgraphs.

1.1.9. PROPOSITION. Two blocks in a graph share at most one vertex.

Proof: Given blocks B_1 and B_2 sharing at least two vertices, choose any x ∈ V(B_1 ∪ B_2). Since each block has no cut-vertex, deleting x leaves a path within B_i from every vertex of B_i − x to each vertex of (B_1 ∩ B_2) − x, and this set is nonempty because B_1 and B_2 share two vertices. Hence B_1 ∪ B_2 − x is connected. Now B_1 ∪ B_2 is a subgraph with no cut-vertex, which contradicts the maximality of B_1 and B_2.

    Thus the blocks of a graph G form a decomposition of G. When two

    blocks of G share a vertex, it must be a cut-vertex of G. The interaction

    between blocks and cut-vertices is described by an auxiliary graph.

1.1.10. DEFINITION. The block-cutpoint graph of a graph G is a bipartite graph H in which one partite set consists of the cut-vertices of G, and the other has a vertex b_i for each block B_i of G. We include vb_i as an edge of H if and only if v ∈ B_i.

[Figure: a graph G with vertices a through j and x, having cut-vertices a, e, and x, and its block-cutpoint graph H with block vertices b_1, . . . , b_5.]

The block-cutpoint graph of a connected graph G is a tree (Exercise 13) whose leaves are blocks of G. A graph G with connectivity 1 has at least two leaf blocks that each contain exactly one cut-vertex of G.
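Blocks and cut-vertices of a connected graph can be computed in linear time by depth-first search with lowpoints, the classic method of Hopcroft and Tarjan. The sketch below is one hedged rendering (names and representation ours); isolated vertices, which form single-vertex blocks, are not handled.

```python
import sys

def blocks(adj):
    """Blocks and cut-vertices of a connected graph by one DFS,
    in the style of the Hopcroft-Tarjan lowpoint algorithm.
    adj: dict mapping each vertex to a list of neighbors.
    Returns (list of blocks as edge lists, set of cut-vertices).
    """
    sys.setrecursionlimit(max(1000, 2 * len(adj) + 10))
    disc, low = {}, {}
    edge_stack, found, cuts = [], [], set()
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:
                if disc[v] < disc[u]:          # back edge to an ancestor
                    edge_stack.append((u, v))
                    low[u] = min(low[u], disc[v])
            else:                              # tree edge
                edge_stack.append((u, v))
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:          # u separates v's subtree
                    if parent is not None or children > 1:
                        cuts.add(u)
                    block = []
                    while True:
                        e = edge_stack.pop()
                        block.append(e)
                        if e == (u, v):
                            break
                    found.append(block)

    dfs(next(iter(adj)), None)
    return found, cuts

# Two triangles sharing the cut-vertex 2:
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
blks, cuts = blocks(adj)
print(len(blks), cuts)   # 2 {2}
```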


    OPTIMAL SPANNING TREES

In a connected graph with many spanning trees (see Chapter 6 for enumeration), which is best? For example, the Minimum Connector Problem seeks a connected subgraph with minimum total weight in a graph with weighted edges. For nonnegative weights, the solution is a spanning tree. Naively, we iteratively include an edge of smallest weight that creates no cycle. Locally optimal heuristics are often called greedy algorithms. This is one of the rare instances where a greedy algorithm finds an optimal solution.

1.1.11. ALGORITHM. (Kruskal's Algorithm; Minimum-Weight Spanning Trees)
Input: A weighted connected graph.
Idea: Maintain an acyclic spanning subgraph H, enlarging it by edges with low weight to form a spanning tree. Consider edges in nondecreasing order of weight, breaking ties arbitrarily.
Initialization: Set E(H) = ∅.
Iteration: If the next cheapest edge joins two components of H, then include it; otherwise, discard it. Terminate when H is connected.

No added edge creates a cycle, so each new edge connects two components. We begin with n components and reduce this number by one with each step. As long as more than one component remains, there are edges joining components (since the input graph is connected), and we have not yet considered them (since every edge considered is in H or completes a cycle with H). Thus n − 1 steps are performed and produce a subgraph that is a tree. We will prove that this tree has minimum weight.

1.1.12. Example. Kruskal's Algorithm uses only the order of the weights, not their magnitudes. In the example below, edges are labeled (and considered) in increasing order of weight. Edges of equal weight may be examined in any order; the resulting trees have the same cost. Here, the four cheapest edges are selected, but then we cannot take the fifth or sixth.

[Figure: a weighted graph with twelve edges labeled 1 through 12 in order of weight; edges 1 through 4 are selected, while edges 5 and 6 would complete cycles.]

1.1.13. THEOREM. (Kruskal [1956]) In a connected weighted graph G, Kruskal's Algorithm constructs a minimum-weight spanning tree.

Proof: Let T be a tree produced by Kruskal's Algorithm, and let T* be a minimum spanning tree. If T ≠ T*, let e be the first edge chosen for T that is not in T*. Adding e to T* creates one cycle, which contains an edge e* ∉ E(T), since T has no cycle. Now T* + e − e* is a spanning tree.

Since T* contains e* and all the edges of T chosen before e, both e and e* are available when the algorithm chooses e, and hence w(e) ≤ w(e*). Thus T* + e − e* is a spanning tree with weight at most that of T* that contains a longer initial segment of T. Since T is finite, iterating this switch leads to a minimum-weight spanning tree containing T. Phrased extremally, we have proved that a minimum-weight spanning tree agreeing with T for the longest initial segment must be T itself.

To implement Kruskal's Algorithm, first sort the m edges by weight. Maintain for each vertex the label of the current component containing it. Accept the next cheapest edge if its endpoints have different labels. Merge the two components joined by an accepted edge by assigning one of the labels to every vertex having the other label. By always merging the smaller component into the larger, each vertex's label will change at most log₂ n times, and the total number of label changes is at most n log₂ n.

In this implementation, the time complexity is governed by the sorting of edge weights. Analysis of Kruskal's Algorithm often assumes pre-sorted weights; otherwise, other algorithms may do better. Prim's Algorithm (Exercise 25) grows a spanning tree from a single vertex by iteratively adding the cheapest edge that incorporates a new vertex. It and Kruskal's Algorithm are comparable when weights are pre-sorted.
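For comparison, a minimal sketch of Prim's Algorithm using the standard-library heap, skipping stale entries rather than deleting them (representation and names ours):

```python
import heapq

def prim(adj):
    """Prim's Algorithm: grow a minimum-weight spanning tree from an
    arbitrary vertex, always absorbing the cheapest edge to a new vertex.
    adj: dict mapping each vertex to a list of (weight, neighbor) pairs.
    """
    root = next(iter(adj))
    absorbed = {root}
    heap = [(w, root, v) for w, v in adj[root]]
    heapq.heapify(heap)
    tree = []
    while heap and len(absorbed) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in absorbed:
            continue                 # stale entry; v was absorbed earlier
        absorbed.add(v)
        tree.append((u, v, w))
        for w2, x in adj[v]:
            if x not in absorbed:
                heapq.heappush(heap, (w2, v, x))
    return tree

# The same 4-cycle-with-chord example as for Kruskal, in adjacency form:
adj = {0: [(1, 1), (4, 3), (5, 2)], 1: [(1, 0), (2, 2)],
       2: [(2, 1), (3, 3), (5, 0)], 3: [(3, 2), (4, 0)]}
print(prim(adj))   # three edges of total weight 6
```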

Both Borůvka [1926] and Jarník [1930] posed and solved the minimum spanning tree problem. Modern improvements use clever data structures to merge components quickly. Fast versions appear in Tarjan [1984] for when the edges are pre-sorted and in Gabow–Galil–Spencer–Tarjan [1986] for when they are not. Thorough discussion and further references appear in Ahuja–Magnanti–Orlin [1993, Chapter 13]. More recent developments appear in Karger–Klein–Tarjan [1995].

Next we seek a spanning tree with the most leaves. When our graph models a communication network, we seek the smallest set of vertices to protect so that all surviving vertices after an attack can communicate. The non-leaf vertices in a spanning tree form a set S such that G[S] is connected and every vertex outside S is adjacent to some vertex of S.


Such a set S is a connected dominating set. A smallest connected dominating set is always the set of non-leaves in a spanning tree with the most leaves; this equivalence was perhaps first noted in Hedetniemi–Laskar [1984].

The problem of maximizing the number of leaves in a spanning tree of G is NP-complete (Garey–Johnson [1979, p. 206]), so we may be content to find a spanning tree with many leaves. NP-completeness increases the value of constructive proofs for general bounds. A constructive proof that all graphs in some class have spanning trees with at least t leaves becomes an algorithm to produce such a tree for graphs in this class.

Ding–Johnson–Seymour [2001] solved this extremal problem in terms of the numbers of vertices and edges. If n ≠ t + 2 and G has at least $n + \binom{t}{2}$ edges, then G has a spanning tree with more than t leaves. The result is sharp: some n-vertex graph with $n + \binom{t}{2} - 1$ edges has no spanning tree with more than t leaves (Exercise 31).

Earlier, the extremal problem was studied in terms of the number of vertices and the minimum degree. Let l(n, k) be the largest t such that every connected n-vertex graph with minimum degree at least k has a spanning tree with at least t leaves. When G = C_n, spanning trees have only 2 leaves, so l(n, 2) = 2. For k ≥ 2, the cycle generalizes to a k-regular graph that has a linear number of non-leaves in each spanning tree.

1.1.14. Example. l(n, k) ≤ ((k − 2)/(k + 1))n + 2. Let s = ⌊n/(k + 1)⌋. We construct a graph G_{n,k}, with n vertices and minimum degree k, whose spanning trees all have at least 3s − 2 non-leaf vertices. Begin with complete graphs R_1, . . . , R_s, each of order at least k + 1, together having n vertices. Choose x_i, y_i ∈ R_i; let W = {x_1, . . . , x_s} ∪ {y_1, . . . , y_s}. Delete the edges x_iy_i for 1 ≤ i ≤ s. Let Z = {x_i y_{(i+1) mod s} : 1 ≤ i ≤ s}, and add the edges in Z to complete G_{n,k}. Note that δ(G_{n,k}) = k.

Consider a spanning tree T. Any two edges in Z form an edge cut, so T lacks at most one edge of Z. If x_j y_{j+1} ∉ E(T), then T contains an x_i,y_i-path in R_i for each i. Now the non-leaves include some vertex of R_i − W for each i, plus all of W − {x_j, y_{j+1}}. If Z ⊆ E(T), then T lacks an x_i,y_i-path in R_i for exactly one value of i, say j. This forces at least 3(s − 1) non-leaves outside R_j, and k ≥ 2 forces an additional non-leaf at x_j or y_j.

[Figure: the clique R_i with its marked vertices x_i and y_i.]

For k = 3, there are several proofs that the construction in Example 1.1.14 is optimal. For k = 4, the optimal bound l(n, 4) = (2n + 8)/5 was proved in Griggs–Wu [1990] and in Kleitman–West [1991] (two small graphs have no tree with (2/5)n + 2 leaves). Griggs–Wu [1990] also proved that l(n, 5) = n/2 + 2. The proofs are algorithmic, constructing a tree with at least this many leaves. We present a proof for k = 3.

1.1.15. THEOREM. (Linial–Sturtevant [unpub.], Griggs–Wu [1990], Kleitman–West [1991]) Every connected N-vertex graph G with δ(G) ≥ 3 has a spanning tree with at least N/4 + 2 leaves.

Proof: We provide an algorithm to grow such a tree. Let T denote the current tree, with n vertices and l leaves. If x is a leaf of T, then the external degree of x, denoted d′(x), is |N_G(x) − V(T)|. The operation of expansion at x consists of adding to T the d′(x) edges from x to N_G(x) − V(T). We grow T by operations, where each operation consists of some number of expansions. Note that expansion preserves the property that all edges from T to G − V(T) are incident to leaves of T.

A leaf x of T with d′(x) = 0 is dead; no expansion is possible at a dead leaf, and it remains a leaf in the final tree. Let m be the number of dead leaves in T. An expansion that makes y a dead leaf kills y. We call an operation admissible if its effect on T satisfies the augmentation inequality 3Δl + Δm ≥ Δn, where Δl, Δm, and Δn denote the changes in the numbers of leaves, dead leaves, and vertices of T, respectively.

We grow a spanning tree by admissible operations. If G is not 3-regular, we begin with the edges at a vertex of maximum degree. If G is 3-regular and every edge belongs to a triangle, then G = K_4, and the claim holds. Otherwise, G is 3-regular and has an edge in no triangle; in this case we begin with such an edge and the four edges incident to it.

If T is grown to a spanning tree with L leaves by admissible operations, then all leaves eventually die. The final operation will kill at least two leaves not counted by the augmentation inequality, so the total of Δm from the augmentation inequalities for the operations will be at most L − 2. We begin with 4 leaves and 6 vertices if G is 3-regular; otherwise


with r leaves and r + 1 vertices for some r > 3. Summing the augmentation inequalities over all operations yields 3(L − 4) + (L − 2) ≥ N − 6 if G is 3-regular and 3(L − r) + (L − 2) ≥ N − r − 1 otherwise. These simplify to 4L ≥ N + 8 and 4L ≥ N + 2r + 1 ≥ N + 9, respectively, which yield L ≥ N/4 + 2.

    It remains to present admissible operations that can be applied until

    T absorbs all vertices and to show that the last operation kills two extra

    leaves. We use the three operations shown below, applying O2 only when

    O1 is not available.

[Figure: operations O1, O2, and O3, each expanding the tree at a leaf x; O3 also expands at the outside neighbor y of x.]

O1: If d′(x) ≥ 2 for some current leaf x, then expanding at x yields Δl = d′(x) − 1, Δn = d′(x), and Δm ≥ 0. The augmentation inequality reduces to 2d′(x) ≥ 3, which is satisfied when d′(x) ≥ 2.

O2: If d′(x) ≤ 1 for every current leaf x and some vertex outside T has at least two neighbors in T, then expanding at one of those neighbors yields Δl = 0 and Δm ≥ 1 = Δn, and the augmentation inequality holds.

O3: If y is the only neighbor of x outside T and y has r neighbors not in T, where r ≥ 2, then expanding at x and then at y yields Δl = r − 1, Δn = r + 1, and Δm ≥ 0. The augmentation inequality reduces to 3(r − 1) ≥ r + 1, which holds when r ≥ 2.

    Because k = 3, every vertex outside T has at least two neighbors in T

    or at least two neighbors outside T . Hence at least one of these operations

    is available until T becomes a spanning tree.

Now consider the final operation. If it is O1 or O3, then the two new leaves are dead, which contributes an extra 2 to Δm not counted by the augmentation inequality. If the final operation is O2, then the new leaf is dead and has at least two neighbors that are current leaves of T and also become dead. Therefore, the contribution to Δm is at least 3 instead of at least 1. In each case, the contributions to Δm counted by the augmentation inequalities sum to at most L − 2.

The graph G_{n,k} of Example 1.1.14 contains many copies of K_{k+1} − e, the graph obtained from K_{k+1} by deleting one edge. Forbidding this induced subgraph forces more of the vertices to be leaves; Griggs–Kleitman–Shastri [1989] proved that every (K_4 − e)-free connected n-vertex graph with minimum degree at least 3 has a spanning tree with at least (n + 4)/3 leaves. The proof is difficult. Exercise 34 considers an easier variation.

The construction of Example 1.1.14 is optimal for k ≤ 5. It was long thought to be essentially optimal for all k. However, Alon [1990] showed probabilistically that for large n some k-regular graph has no dominating set of size less than (1 + o(1))·((1 + ln(k + 1))/(k + 1))·n. Since connected dominating sets are dominating sets, the number of leaves that can be guaranteed therefore cannot grow faster than about (1 − ln(k + 1)/(k + 1))n (noted by Mubayi).

Kleitman and West [1991] showed that Alon's probabilistic construction is close to optimal. The connection between the two results apparently was not noticed until seven years after they were proved. For large k, one cannot avoid having at least about (n ln k)/k non-leaves, and trees this good exist. The proof below is simpler than that in Kleitman–West [1991] and makes the result asymptotically sharp. Here 1 + ε replaces a constant greater than 2.5 in the original result.

1.1.16.* THEOREM. (Caro–West–Yuster [2000]) Given ε > 0 and k large in terms of ε, every connected graph with order N and minimum degree k has a spanning tree with more than (1 − (1 + ε)(ln k)/k)N leaves.

Proof: We grow such a tree. Begin with a star at a vertex of degree k and iteratively expand the current tree T, which has n vertices, l leaves, and external degree d′(x) at each leaf x. Expansion at a leaf adds all outside neighbors, so only leaves have outside neighbors. Each operation combines one or more expansions to satisfy the augmentation inequality rΔl + ΔM ≥ (r − 1)Δn, where r is a parameter to be chosen in terms of k, and M is a measure of the total deadness of the leaves.

A leaf is more dead as it has fewer external neighbors. We will choose λ_0, . . . , λ_r with λ_0 ≥ · · · ≥ λ_r = 0 and say that a leaf x with d′(x) = i has deadness λ_i (let λ_i = 0 for i > r). Set M = Σ_{i=0}^{r−1} λ_i m_i, where T has m_i leaves with external degree i.

For the final tree, M = λ_0 L, and initially M ≥ 0. When we grow a tree from the initial star, summing the augmentation inequalities yields r(L − k) + λ_0 L ≥ (r − 1)(N − k − 1). Thus L ≥ [(r − 1)N + k + 1 − r]/(r + λ_0). When r ≤ k, we can discard k + 1 − r from the numerator. Dividing top and bottom by r and applying 1/(1 + λ_0/r) > 1 − λ_0/r then yields

L > (1 − 1/r)(1 − λ_0/r)N > (1 − (1 + λ_0)/r)N,

so we will choose r and λ_0 so that (1 + λ_0)/r ≤ (1 + ε)(ln k)/k. [. . .] Admissibility of O_i follows if c_i λ_i(k − 2r − i) ≥ r − i + λ_i.

We specify r and nonincreasing c_1, . . . , c_r to satisfy this inequality for all i. Set c_i = b/i for 1 ≤ i ≤ r (we will choose b in terms of ε). We then want b(k − 2r) ≥ r − i + (1 + b)λ_i. Since r − i ≤ r − 1 and λ_i ≤ λ_1, it suffices to establish the inequality when i = 1, where it simplifies to λ_1 ≤ [bk + 1 − (2b + 1)r]/(b + 1). Our choice of c_i yields λ_0 = b Σ_{i=1}^r 1/i ≤ b[ln r + 1/(2r) + .577] (see Knuth [1973, p. 73–78] for the bound on the harmonic number Σ_{i=1}^r 1/i).

Since λ_1 = λ_0 − b, when r ≥ 2 we have λ_1 ≤ b[ln r + 1/(2r) − .423] < b ln r < b ln k. Therefore, b ln k ≤ [bk − (1 + 2b)r]/(1 + b) suffices, so we set r = ⌊(b/(1 + 2b))(k − (1 + b) ln k)⌋. Now the augmentation inequality holds for each O_i.

With these choices, we have proved that l(n, k) ≥ (1 − (1 + λ_0)/r)n. Since λ_0 < b ln k, choosing b < ε/2 yields (1 + λ_0)/r < (1 + ε)(ln k)/k when k is sufficiently large. [. . .] Caro–West–Yuster [2000] also gave a probabilistic algorithm for connected domination that does as well as Theorem 1.1.16. For each fixed k with k ≥ 6, the exact value of l(n, k) in terms of n remains unknown.

    OPTIMAL SEARCH TREES AND CODING

    Trees are used in computer science to model hierarchical structures.

1.1.17. DEFINITION. A rooted graph is a graph with one vertex distinguished as a root. In a rooted tree, the neighbor of a vertex on the path from it to the root is its parent, and the other neighbors are its children. An ordered tree is a rooted tree in which the children of each vertex are given a fixed (left-to-right) order. A binary tree is an ordered tree in which every vertex has zero or two children.

The root in a rooted tree has no parent. Ordered trees are also called rooted plane trees or planted trees, since the ordering of children yields a natural drawing in the plane. In a binary tree, the left subtree and right subtree are the subgraphs obtained by deleting the root r; they are rooted at the left and right children of r, respectively. Some applications of binary trees allow vertices to have one child, still designated as left or right. In most discussions of k-ary trees, each vertex has 0 or k children. Vertices in rooted trees are often called nodes.

    1.1.18. Example. Below are the five binary trees with four leaves.


    Binary trees support data storage for efficient access. If we associate

    each item with a leaf, then we can access them by a search from the root

    that always says which subtree at the current node contains the desired

    leaf. Given access probabilities among n items, we want to associate the

    items with the n leaves of a binary tree to minimize the expected search

    length. The length of a search is the distance from the root to the leaf.

Alternatively, with large computer files and limited storage, we want binary codes for characters to minimize total length. The relative character frequencies define probabilities. Treating the items as messages with probabilities p_1, . . . , p_n, we want to assign binary codewords to minimize the expected message length. The problems of minimizing expected search length and expected message length are equivalent.

The length of codewords may vary, so a way to recognize the end of a codeword is needed. If no codeword is a prefix of another, then the current word ends when the bits since the end of the previous word form a codeword. This prefix-free condition allows the codewords to correspond to the leaves of a binary tree by associating left with 0 and right with 1.
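Decoding a prefix-free code is then a walk down this tree: follow 0 left and 1 right, and restart at the root each time a leaf is reached. A minimal sketch (the nested-tuple representation and names are ours):

```python
def decode(root, bits):
    """Decode a bit string using a prefix-free code stored as a binary
    tree: internal nodes are pairs (left, right), leaves are symbols.
    Reaching a leaf marks the end of the current codeword.
    """
    node, out = root, []
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if not isinstance(node, tuple):   # leaf: codeword complete
            out.append(node)
            node = root
    return out

# Codewords: a = 0, b = 10, c = 11.
tree = ("a", ("b", "c"))
print(decode(tree, "100011"))   # ['b', 'a', 'a', 'c']
```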

The expected length of a message is Σ p_i l_i, where the ith word has probability p_i and code-length l_i. Constructing the optimal code is surprisingly easy (n = 1 can also be used as the basis).

1.1.19. ALGORITHM. (Huffman's Algorithm [1952]; Prefix-free Coding)
Input: Weights (frequencies or probabilities) p_1, . . . , p_n.
Output: Prefix-free code (equivalently, a binary tree).
Idea: Infrequent messages should have longer codes; put infrequent messages deeper by combining them into parent nodes.
Initial case: If n = 2, the optimal length is one, and 0, 1 are the codes assigned to the two messages (the tree consists of a root and two leaves).
Recursion: If n > 2, replace the two least likely items p and p′ with a single item q having weight p + p′. Solve the smaller problem with n − 1 items. Give children with weights p and p′ to the leaf for q. That is, replace the codeword computed for q with its extensions by 1 and 0, assigned to the items that were replaced.
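A minimal sketch of the algorithm using a heap of weighted groups; rather than building the tree explicitly, each merge prepends one bit to the codewords of all items below it (representation and names ours).

```python
import heapq
from itertools import count

def huffman(weights):
    """Huffman's Algorithm: an optimal prefix-free code for the given
    weights. Returns a dict mapping each item index to its codeword.
    The counter breaks weight ties so tuples never compare the lists.
    """
    tie = count()
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    code = {i: "" for i in range(len(weights))}
    while len(heap) > 1:
        w1, _, items1 = heapq.heappop(heap)   # least likely group
        w2, _, items2 = heapq.heappop(heap)   # second least likely group
        for i in items1:                      # extend codewords by one bit
            code[i] = "1" + code[i]
        for i in items2:
            code[i] = "0" + code[i]
        heapq.heappush(heap, (w1 + w2, next(tie), items1 + items2))
    return code

# The frequencies of Example 1.1.20:
freq = [5, 1, 1, 7, 8, 2, 3, 6]
code = huffman(freq)
cost = sum(freq[i] * len(code[i]) for i in range(8))
print(cost, sum(freq))   # 90 33: expected length 90/33 < 3
```

The codewords differ from those in Example 1.1.20 below, since ties and the 0/1 assignment are arbitrary, but the total cost of 90 agrees.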

1.1.20. Example. Huffman coding. Suppose the relative frequencies of 8 messages are 5, 1, 1, 7, 8, 2, 3, 6. The algorithm iteratively combines lightest items to form the tree on the left below, working from the bottom up. The tree is redrawn on the right with leaves labeled by frequencies and codewords. Placed in the original order, the codewords are 100, 00000, 00001, 01, 11, 0001, 001, and 101. For the expected length, we compute Σ p_i l_i = 90/33 < 3; the expected length of a code using eight words of length 3 would be 3.

[Figure: left, the Huffman merging tree for the frequencies 5, 1, 1, 7, 8, 2, 3, 6, with internal weights 2, 4, 7, 11, 14, 19, 33; right, the same tree with leaves labeled frequency:codeword as 1:00000, 1:00001, 2:0001, 3:001, 7:01, 5:100, 6:101, 8:11.]

1.1.21. THEOREM. For a probability distribution p_1, . . . , p_n, Huffman's Algorithm produces a prefix-free code with minimum expected length.

    Proof: We use induction on n. For n = 2, the algorithm produces the

    only binary tree. Consider n > 2. Given a tree with n leaves, greedily

    assigning messages to leaves with depths in reverse order to probabilities

    minimizes the expected length. Thus every optimal code has two least

    likely messages at leaves of greatest depth. Since every leaf at maximum

    depth has another leaf as its sibling, we may thus assume that the least

    likely messages appear as siblings at greatest depth; permuting items at

    a given depth does not change the expected length.

Let T be an optimal tree, having the items with least probabilities p_n and p_{n−1} as sibling leaves of greatest depth. Let T′ be the tree obtained from T by deleting these leaves. The tree T′ yields a code for q_1, . . . , q_{n−1}, where q_{n−1} = p_{n−1} + p_n and otherwise q_i = p_i. Let k be the depth of the leaf for q_{n−1} in T′. The cost for T is the cost for T′ plus q_{n−1}, since we lose kq_{n−1} and gain (k + 1)(p_{n−1} + p_n) in changing T′ to T.

This holds no matter which sibling pair at greatest depth we combine to form T′, so we optimize T by optimizing T′ for q_1, . . . , q_{n−1}. By the induction hypothesis, T′ is optimized by applying Huffman's Algorithm to {q_i}. Since the replacement of {p_{n−1}, p_n} by q_{n−1} is the first step of Huffman's Algorithm for {p_i}, we conclude that Huffman's Algorithm generates the optimal tree for p_1, . . . , p_n.

[Figure: the trees T and T′; the siblings p_n and p_{n−1} hang at depth k + 1 in T below the position of the leaf q_{n−1} at depth k in T′.]


Huffman's Algorithm computes an optimal code, but how does it compare to a balanced tree with every codeword having length ⌊lg n⌋ or ⌈lg n⌉? If n = 2^k and the words are equally likely, then the balanced tree with all leaves at depth k is optimal, as produced by the algorithm. With p_i = 1/n, the resulting expected length is Σ kp_i = −Σ p_i lg p_i; the latter quantity is called the entropy of the discrete probability distribution {p_i}. The formula is no coincidence; entropy is a lower bound on the expected length. This holds for all codes with binary codewords, not just prefix-free codes.

1.1.22. THEOREM. (Shannon) For every probability distribution {p_i} on n messages and every binary code for these messages, the expected length of codewords is at least −Σ p_i lg p_i.

Proof: We use induction on n. For n = 1 = p_1, the entropy is zero, as is the expected length for the optimal code, since there is no need to use any digits. For n > 1, let W be the set of words in an optimal code, with W_0 and W_1 being the subsets starting with 0 and 1, respectively. If all words start with the same bit, then deleting the first bit of each reduces the expected length, and the code is not optimal.

Hence W_0 and W_1 are codes for smaller sets. Let q_j be the sum of the probabilities for the messages in W_j; normalizing the given probabilities by q_j gives the probability distribution for code W_j. Since the words within W_j all start with the same bit, the expected length is at least 1 more than the optimal expected length for the normalized distribution over its words.

Applying the induction hypothesis to both W_0 and W_1, we find that the expected length for W is at least

$$q_0\left(1 - \sum_{i\in W_0}\frac{p_i}{q_0}\lg\frac{p_i}{q_0}\right) + q_1\left(1 - \sum_{j\in W_1}\frac{p_j}{q_1}\lg\frac{p_j}{q_1}\right)$$
$$= 1 - \sum_{i\in W_0} p_i(\lg p_i - \lg q_0) - \sum_{j\in W_1} p_j(\lg p_j - \lg q_1)$$
$$= 1 + q_0\lg q_0 + q_1\lg q_1 - \sum_{i\in W} p_i\lg p_i.$$

It suffices to prove that 1 + q_0 lg q_0 + q_1 lg q_1 ≥ 0 when q_0 + q_1 = 1. Let f(x) = x lg x. The function f is convex for x > 0 (since f″ is positive), so 1 + f(x) + f(1 − x) ≥ 1 + 2f(1/2) = 0.

Huffman's Algorithm comes close to Shannon's bound. If each p_i is a power of 1/2, then the Huffman code achieves the entropy bound (Exercise 36). When the probabilities vary greatly, Huffman coding is much more efficient than a code whose words have equal length. The length of a computer file coded for compactness may be only half its length under ASCII coding, which assigns 7 digits per character. Coding individual files accentuates this; the distribution of characters in a program source file may be much different from that of a document or another program.
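A quick numeric check of Shannon's bound on the distribution of Example 1.1.20, with the codeword lengths read off from that example (names ours):

```python
from math import log2

def entropy(probs):
    """Shannon's lower bound: -sum p_i lg p_i."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Example 1.1.20: frequencies 5,1,1,7,8,2,3,6 have Huffman codeword
# lengths 3,5,5,2,2,4,3,3, giving expected length 90/33.
freq = [5, 1, 1, 7, 8, 2, 3, 6]
lengths = [3, 5, 5, 2, 2, 4, 3, 3]
total = sum(freq)
probs = [f / total for f in freq]
expected = sum(p * l for p, l in zip(probs, lengths))
print(round(entropy(probs), 3), round(expected, 3))   # 2.695 2.727
```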

However, we may want the codewords in the same order as the message words. This makes searching easy, because we can store at each internal vertex a word between the largest message word at a leaf of the left subtree and the smallest message word at a leaf of the right subtree. The expected length will be longer than in Huffman coding, but these alphabetic prefix-free codes can be easier to use while almost as efficient.

Since the items must appear at leaves in left-to-right order, the leaves in any subtree get one of the $\binom{n}{2}$ sets of consecutive messages. The final merge to complete an optimal alphabetic tree must combine an optimal tree for the first k leaves and an optimal tree for the last n − k leaves, for some k. To choose the best k, we solve the subproblems for all consecutive segments. This algorithmic technique of solving all subproblems is called dynamic programming.

1.1.23. ALGORITHM. (Optimal Alphabetic Trees)
Input: Frequencies p_1, . . . , p_n and a fixed left-to-right ordering of the leaves.
Initialization: Set c(S) = 0 for each singleton leaf set S.
Iteration: For i from 2 through n, compute a cost c(S) for each segment S of i consecutive leaves. With S_k being the first k of these and S′_k = S − S_k, the cost is c(S) = (Σ_{j∈S} p_j) + min_k [c(S_k) + c(S′_k)].
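A direct rendering of the dynamic program (array layout and names ours); prefix sums make each combining cost available in constant time, as in the proof of Theorem 1.1.24 below.

```python
def alphabetic_tree_cost(p):
    """Dynamic program of Algorithm 1.1.23: minimum cost sum(p_i * l_i)
    of an alphabetic binary tree whose leaves carry the weights p in
    left-to-right order. c[i][j] is the optimal cost for p[i..j]. O(n^3).
    """
    n = len(p)
    # prefix[i] = p[0] + ... + p[i-1], so segment sums are constant-time.
    prefix = [0] * (n + 1)
    for i, w in enumerate(p):
        prefix[i + 1] = prefix[i] + w
    c = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):            # segment length
        for i in range(n - length + 1):
            j = i + length - 1
            seg = prefix[j + 1] - prefix[i]   # combining cost over S
            c[i][j] = seg + min(c[i][k] + c[k + 1][j] for k in range(i, j))
    return c[0][n - 1]

# The frequencies of Example 1.1.26; the depths 3,3,3,3,2,4,4,3 found
# there give cost 68, which the dynamic program confirms is optimal.
print(alphabetic_tree_cost([3, 2, 2, 3, 6, 2, 3, 2]))   # 68
```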

1.1.24. THEOREM. Algorithm 1.1.23 computes optimal alphabetic trees in time O(n³).

Proof: When two adjoining segments are merged, the search path to each leaf lengthens by 1. This explains the combining cost Σ_{j∈S} p_j in the algorithm. Since an optimal tree must merge optimal subtrees, induction on i proves that the algorithm finds an optimal tree.

A separate dynamic program computes the $\binom{n}{2}$ combining costs Σ_{j∈S} p_j in advance, in increasing order of |S|, using constant time per computation. The algorithm then computes a potential combination for each choice of two adjoining segments, in increasing order of the size of the union. Such a pair is specified by the start and end of the first segment (possibly equal) and the end of the second segment. The algorithm performs $\binom{n}{3} + \binom{n}{2}$ constant-time computations to find all values c(S), keeping always the best value found among the |S| − 1 candidates for c(S).


Knuth [1971] showed how to manage the computation more cleverly to do it in quadratic time, even for a more general problem. Yao [19??] later found further refinements. Nagaraj [1997] is a tutorial on optimal binary search trees with a good bibliography.

Hu and Tucker developed a faster algorithm for optimal alphabetic codes. It computes a not-necessarily-alphabetic tree, discards that tree while keeping its depth information for leaves, and then finds an alphabetic tree with leaves at those depths.

1.1.25. ALGORITHM. (Hu–Tucker Algorithm)
Input: Frequencies and a fixed left-to-right ordering of the leaves.
Step 1: Iteratively merge two compatible items with least total weight, where items are compatible if all items between them have already participated in merges. Replace the merged items by a parent with their total weight, placed between their former positions in the list.
Step 2: The output of Step 1 is a binary tree. Compute the depth of each original item (the number of merges it participated in).
Step 3: Construct an alphabetic tree with the leaves at these depths by iteratively pairing leftmost adjacent items with the largest depths, replacing them with one item having the next smaller depth.
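Steps 1 and 2 admit a straightforward cubic-time simulation, sketched below with leftmost tie-breaking (names and representation ours; the fast implementation is discussed after Example 1.1.26). Step 3 then realizes the returned depths by an alphabetic tree.

```python
def hu_tucker_depths(weights):
    """Naive rendering of Steps 1-2 of the Hu-Tucker Algorithm: repeatedly
    merge the compatible pair of least total weight (leftmost on ties),
    where a pair is compatible when every item strictly between them is
    crossable, i.e., arose from a merge. Returns each leaf's depth,
    the number of merges above it.
    """
    items = [(w, False, [i]) for i, w in enumerate(weights)]
    depth = [0] * len(weights)
    while len(items) > 1:
        best = None
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                if all(items[t][1] for t in range(i + 1, j)):
                    s = items[i][0] + items[j][0]
                    if best is None or s < best[0]:
                        best = (s, i, j)
        s, i, j = best
        leaves = items[i][2] + items[j][2]
        for leaf in leaves:                # one more merge above each leaf
            depth[leaf] += 1
        items[i] = (s, True, leaves)       # parent replaces its left child
        del items[j]
    return depth

print(hu_tucker_depths([3, 2, 2, 3, 6, 2, 3, 2]))
# [3, 3, 3, 3, 2, 4, 4, 3], the depths of Example 1.1.26 below
```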

1.1.26. Example. Consider input frequencies 3, 2, 2, 3, 6, 2, 3, 2 in order. In Step 1 of the Hu–Tucker Algorithm, we first combine the leftmost 2s, but the other 2s are not compatible. We combine one of these with the 3 between them and then combine two 3s across the initially combined item. Proceeding yields the tree on the left below.

This tree is not alphabetic. The depths computed in Step 2 are 3, 3, 3, 3, 2, 4, 4, 3 in order. To form the corresponding alphabetic tree, combine the 4s, then three pairs of neighboring 3s, etc.

[Figure: left, the Step-1 tree on the weights 3, 2, 2, 3, 6, 2, 3, 2 with internal weights 4, 5, 6, 7, 10, 13, 23; right, the alphabetic tree with the same leaf depths.]

Step 1 maintains a shrinking list of the current weights, with the crossable ones (results of merges) marked. Which pairs are later compatible is not affected by where, between its two inputs, the output of a merge is placed. Hence the location also does not affect the resulting depths. If the new item replaces its left child in the list, and the algorithm breaks ties by merging the lexicographically leftmost least compatible pair, then the merged pair always consists of two consecutive items. This produces a fast implementation (Garsia–Wachs [1977]).

The proof of correctness is surprisingly hard (it may be skipped without loss of continuity). Hu [1973] shortened the original Hu–Tucker [1971] proof. We follow the still shorter proof in Hu–Kleitman–Tamaki [1979], which permits a more general optimization criterion.

The proof has two main steps. Feasibility: the depths resulting from Step 2 are realizable by an alphabetic tree. Optimality: the tree resulting from Step 1 has minimum cost among a class of trees that includes all alphabetic trees. The final alphabetic tree is then also optimal, because Σ p_i l_i does not change when we rearrange the tree to make it alphabetic without changing the depths of the leaves.

    The proof of feasibility requires technical lemmas about the weights

    of nodes and their order of formation. As Step 1 proceeds, the items in a

segment bounded by noncrossable nodes (including the boundary nodes) are pairwise compatible. When a noncrossable item is merged, the two compatibility sets involving it combine. Therefore, if u and v are compatible and both are crossable, then they are compatible with the same set of nodes as long as they both exist.

Let T be the tree produced by Step 1 of the algorithm. We write w(u) for the weight of a node u in T, and u ∼ v to mean that u and v are compatible. [. . .] Suppose that w(u) > w(x) for some node x ∼ v; let x have the least such weight (leftmost if there is a tie).

By the induction hypothesis, subsequent merges are LMCPs (local minimum compatible pairs). First suppose that before x merges, v merges with some node y. Since v ∼ x, also x is compatible with v and with every node compatible with v (regardless of whether x is crossable). In particular, x ∼ y, but w(y) ≥ w(u) > w(x), which contradicts the choice of {v, y} as an LMCP.

Next suppose that x eventually merges with v′, the node formed by the merge {u, v}. We wish to replace the first merge {u, v} with {v, x} (forming x′) and replace the {v′, x} merge with {u, x′}; this would yield the same list of depths of merges and yield a cheaper tree, since w(u) > w(x). Since v ∼ x, this list will also be a list of compatible merges unless u is noncrossable and some intervening merge in forming T crosses the position of u. We must eliminate this possibility.

We consider this together with the remaining case, which is when x merges with some y other than v′ before v′ merges. Here replacing the initial {u, v} merge by {v, x} and the later {x, y} merge by {u, y} would again yield the same list of depths of merges, and it would produce either a cheaper tree or one with the same cost but a cheaper first merge. Again these pairs are compatible at the time of merge, and the new list will be a list of compatible merges unless u is initially noncrossable and some intervening merge crosses the position of u.

If u is noncrossable, then the hypothesis v ∼ x implies that v and x are on the same side of u. Suppose that {r, s} is the first merge crossing the position of u, with s on the same side of u as v and x. If s ∼ v initially, then the choice of x implies w(x) ≤ w(s), and thus {r, s} is not an LMCP when it merges. We prove that w(x) ≤ w(z) whenever z is a node on the v side of u that is not initially compatible with v but becomes compatible with v before x merges. In particular, s has this property and could not form an LMCP with r.

We use induction on the number of merges until z becomes compatible with v. We have already argued that w(x) ≤ w(z) if no such merges occur. Otherwise, z becomes compatible with v via some merge {a, b} in which b ∼ z. If a ∼ v initially, then the choice of x implies that w(x) ≤ w(a). If instead a became compatible with v before the {a, b} merge, then the induction hypothesis yields w(x) ≤ w(a) again. In either case, {a, b} being an LMCP implies that w(a) ≤ w(z).

We have proved that there is no first merge crossing over a noncrossable u before x merges. Thus the choice of the initial cheapest deepest compatible merge must be an LMCP, which completes the proof.


    EXERCISES

1.1.1. () Prove that a tree with maximum degree k has at least k leaves.

1.1.2. () Prove that every graph with n vertices and k edges has at least n − k components.

1.1.3. () Prove C ⇒ {A, B} in Theorem 1.1.3 by adding edges to connect components.

1.1.4. () Characterization of trees.
a) Prove that a multigraph is a tree if and only if it is connected and every edge is a cut-edge.
b) Prove that a multigraph is a tree if and only if every way of adding an edge (without adding a vertex) creates exactly one cycle.
c) Explain why (b) fails if the condition applies only to nonadjacent pairs.

1.1.5. () Prove that a connected n-vertex graph has exactly one cycle if and only if it has exactly n edges.

1.1.6. () Every tree is bipartite. Prove that every tree has a leaf in its larger partite set (in both if they have equal size).

1.1.7. () Let T be a tree in which every vertex has degree 1 or degree k. Determine the possible values for |V(T)|.

1.1.8. () Let T be a tree in which all vertices adjacent to leaves have degree at least 3. Prove that T has some pair of leaves with a common neighbor.

1.1.9. () Let G be a tree. Prove that there is a partition of V(G) into two nonempty sets such that each vertex has at least half of its neighbors in its own set in the partition if and only if G is not a star.

1.1.10. () There are five cities in a network. The cost of building a road directly between i and j is the entry a_{i,j} in the matrix below. Note that a_{i,j} = ∞ indicates that there is a mountain in the way and the road cannot be built. Determine the least cost of making all the cities reachable from each other.

  0   3   5  11   9
  3   0   3   9   8
  5   3   0   ∞  10
 11   9   ∞   0   7
  9   8  10   7   0

1.1.11. () In the graph K_1 ∨ C_4 (a 4-cycle plus one vertex joined to all of it), assign the weights 1, 1, 2, 2, 3, 3, 4, 4 to the edges in two ways: one way so that the minimum-weight spanning tree is unique, and another way so that the minimum-weight spanning tree is not unique.

1.1.12. () Compute a code with minimum expected length for a set of ten messages whose relative frequencies are 1, 2, 3, 4, 5, 5, 6, 7, 8, 9. What is the expected length of a message in this optimal code?

1.1.13. Prove that the block-cutpoint graph (Definition 1.1.10) of a connected graph G is a tree in which all leaves correspond to blocks of G.

    1.1.14. Prove that a connected graph having exactly two vertices that are not

    cut-vertices is a path.

1.1.15. Let T be a tree with k vertices. Let G be a graph that contains neither K_3 nor K_{2,t}. Prove that if δ(G) > (k − 2)(t − 1), then T occurs as an induced subgraph of G. (Zaker [2011])

1.1.16. () Let G be an n-vertex graph with n ≥ 4. Prove that if G has at least 2n − 3 edges, then G has two cycles of the same length. (Comment: Chen–Jacobson–Lehel–Shreve [1999] strengthens this.)

1.1.17. () Prove that every n-vertex graph with n + 1 edges has a cycle of length at most ⌊(2n + 2)/3⌋. For each n, construct an example achieving this bound.

1.1.18. () Prove that every n-vertex graph with n + 2 edges has a cycle of length at most ⌊(n + 2)/2⌋. For each n, construct an example with no shorter cycle.

1.1.19. () Let T be a tree with k edges, and let G be an n-vertex graph with more than $n(k-1) - \binom{k}{2}$ edges. Use Proposition 1.1.4 to prove that T ⊆ G if n > k.

1.1.20. Give a proof or infinitely many counterexamples for each of these statements:
a) If T is a minimum-weight spanning tree of a weighted graph G, then the u,v-path in T is a minimum-weight u,v-path in G.
b) One can produce a minimum-weight spanning path in a complete graph with nonnegative edge weights by iteratively selecting the edge of least weight such that the edges selected so far form a forest with maximum degree 2.

1.1.21. Suppose that in the hypercube Q_k, each edge whose endpoints differ in coordinate i is given weight 2^i. Compute the minimum weight of a spanning tree.

1.1.22. Let G be a weighted graph with distinct edge weights. Without using Kruskal's Algorithm, prove that G has only one minimum-weight spanning tree.

1.1.23. Let G be a weighted connected graph. Prove that no matter how ties are broken in choosing the next edge for Kruskal's Algorithm, the list of weights of a minimum spanning tree (in nondecreasing order) is unique.

1.1.24. Let F be a spanning forest of a connected weighted graph G. Among all the edges of G having endpoints in different components of F, let e be one of minimum weight. Prove that among all the spanning trees of G that contain F, there is one of minimum weight that contains e. Use this to give another proof that Kruskal's Algorithm works.

1.1.25. () Prim's Algorithm grows a spanning tree from an arbitrary vertex of a weighted graph G, iteratively adding the cheapest edge between a vertex already absorbed and a vertex not yet absorbed, finishing when the other n − 1 vertices of G have been absorbed. (Ties are broken arbitrarily.) Prove that Prim's Algorithm


produces a minimum-weight spanning tree of G. (Jarník [1930], Prim [1957], Dijkstra [1959], independently)

1.1.26. Let v be a vertex in a connected graph G. Obtain an algorithm for finding, among all minimum spanning trees of G, one that minimizes the degree of v. Prove that it works.

1.1.27. () A minimax or bottleneck spanning tree is a spanning tree in which the maximum weight of the edges is as small as possible. Prove that every minimum-weight spanning tree is a bottleneck spanning tree.

1.1.28. Let T be a minimum-weight spanning tree in a weighted connected graph G. Prove that T omits some heaviest edge from every cycle in G.

1.1.29. Given a connected weighted graph, iteratively delete a heaviest non-cut-edge until the resulting graph is acyclic. Prove that the subgraph remaining is a minimum-weight spanning tree.

1.1.30. () Let T be a minimum-weight spanning tree in G, and let T′ be another spanning tree in G. Prove that T′ can be transformed into T using steps that exchange one edge of T′ for one edge of T, such that the edge set is always a spanning tree and the total weight never increases.

1.1.31. Form a graph G by replacing an edge of K_{t+1} with a path of length n − t connecting its endpoints, through n − t − 1 new vertices. Prove that G has n vertices and $n + \binom{t}{2} - 1$ edges but has no spanning tree with more than t leaves. (Comment: Ding–Johnson–Seymour [2001] showed that every n-vertex graph with at least $n + \binom{t}{2}$ edges has a spanning tree with more than t leaves.)

1.1.32. () Upper bounds on l(n, k).
a) Form an n-vertex graph G from a cyclic arrangement of cliques of sizes ⌈k/2⌉, ⌊k/2⌋, 1, . . . , ⌈k/2⌉, ⌊k/2⌋, 1 (in order) by letting every vertex be adjacent to all vertices in the clique before it and the clique after it (G is k-regular). In terms of k and n, determine the maximum number of leaves in a spanning tree of G.
b) For k even, place n vertices on a circle and let each vertex be adjacent to the k/2 closest vertices in each direction. For 3k/2 + 2 ≤ n < 5(k + 1)/3, prove that this graph has no spanning tree with at least (k − 2)n/(k + 1) + 2 leaves.

1.1.33. Let l(G) be the maximum number of leaves in a spanning tree of G, and let f(G) = Σ_{v∈V(G)} (d(v) − 2)/(d(v) + 1). Linial conjectured that l(G) ≥ f(G) for all G; this is false. Prove that f(G) − l(G) can be arbitrarily large by considering the graph G_m with 10m vertices formed by adding a matching joining K_{5m} and mC_5.

1.1.34. () Let G be an n-vertex graph other than K_3 in which every edge belongs to a triangle. Prove that G has a spanning tree with at least (n + 5)/3 leaves and that this is sharp for an infinite family of graphs. (Hint: Grow a tree by operations satisfying an appropriate augmentation inequality. To get the constant right, an extra dead leaf may need to be guaranteed at the beginning.)

1.1.35. Prove that the number of binary trees with n + 1 leaves equals the number of ordered trees with n + 1 vertices (Example 1.1.18 shows the five binary trees with four leaves). (Hint: Show that the two families satisfy the same recurrence.)

1.1.36. () Suppose that n messages occur with probabilities p_1, . . . , p_n and that each p_i is a power of 1/2 (each p_i > 0 and Σ p_i = 1).
a) Prove that the two least likely messages have equal probability.
b) Prove that the expected message length of the Huffman code for this distribution is −Σ p_i lg p_i.

1.1.37. Given frequencies p_1, . . . , p_n, Huffman's Algorithm finds the prefix-free encoding that minimizes Σ p_i l_i, where l_i is the length of the word assigned to item i (equivalently, the distance from the root to the ith leaf in the corresponding binary tree). Consider instead the objective function max_i {p_i t^{l_i}}, where t is a real number in the interval [1, 2]. (Hu–Kleitman–Tamaki)
a) An analogue of Huffman's Algorithm builds a binary tree from the bottom up by iteratively replacing the two least frequent items, having weights p and p′, by a single item with frequency t·max{p, p′}. Prove that the resulting binary tree minimizes max_i {p_i t^{l_i}}.
b) (+) Use the algorithm in part (a) to prove that a tree always exists with max_i {p_i t^{l_i}} ≤ t Σ_{j=1}^n p_j. (Hint: Use induction on n. Consider an optimal tree built by the algorithm. Focus on the deepest level where it has more than two nodes, if any exists. Modify the part of the input corresponding to leaves at that level.)

1.1.38. () The Fibonacci tree T_k with F_k leaves is the rooted tree defined as follows. Let T_1 and T_2 consist of the root only. For k > 2, let the left subtree be T_{k−1} and the right subtree be T_{k−2}. For a path from the root, let left branches cost 1 and right branches cost c, with c > 0. (Here F_k is the adjusted Fibonacci number with F_0 = F_1 = 1.)
a) Let T be a binary tree with n leaves in which every vertex has 0 or 2 children (not necessarily a Fibonacci tree). Prove that the difference between the total cost of paths to leaves and the total cost of paths to non-leaves is (n − 1)(1 + c).
b) For c = 2, prove that T_k has the minimum total cost of paths to leaves among all rooted plane binary trees with F_k leaves. (Hint: Prove that the cost to each internal vertex of T_k is less than the cost to every potential vertex that is not internal to T_k.) (Horibe [1983])

1.1.39. Lopsided binary trees. Fix p ∈ (0, 1). From an internal node of a tree, assign probability p to the left branch and 1 − p to the right branch. The probability of reaching any node is the product of the probabilities along the path from the root. The entropy function H is given by H(p_1, . . . , p_n) = −Σ p_i lg p_i. A p-maximal tree is a tree maximizing the entropy of the leaf probability distribution over all trees with the same number of leaves.
a) Prove that H(p_1, . . . , p_n) equals H(p, 1 − p) times the sum of the probabilities of internal nodes, and that the sum of the probabilities of internal nodes equals the expected path length to a leaf.
b) A binary tree with left-branch cost 1 and right-branch cost c > 1 is c-minimal if the total cost of paths to leaves is minimal among all binary trees with the same number of leaves. Prove that if p^c = 1 − p, then a binary tree is c-minimal if and only if it is p-maximal. (Comment: for c = 2, the corresponding p is (√5 − 1)/2.) (Horibe [1988])