

Three minimum spanning tree algorithms

Jinna Lei

Submitted for Math 196, Senior Honors Thesis

University of California, Berkeley

May 2010


Contents

1 Introduction
  1.1 History and Content
  1.2 The problem, formally
    1.2.1 Some definitions
    1.2.2 Statement
    1.2.3 Simplifying assumptions
    1.2.4 Limited computation model
  1.3 Important properties
    1.3.1 Cuts and cycles
    1.3.2 About trees
    1.3.3 Existence and Uniqueness
    1.3.4 The cut and cycle properties
  1.4 Graph representation

2 Classic algorithms
  2.1 The union-find data structure
  2.2 Kruskal’s
  2.3 Dijkstra-Jarník-Prim

3 Iterative algorithms
  3.1 Contractions
  3.2 Borůvka’s algorithm
    3.2.1 One iteration of Borůvka’s algorithm
    3.2.2 The algorithm
  3.3 Fredman-Tarjan
    3.3.1 One iteration of Fredman-Tarjan
    3.3.2 The complete algorithm

4 An algorithm for verification
  4.1 Verification: problem definition and reduction
    4.1.1 Narrowing down the search space
    4.1.2 Reduction
  4.2 We can verify with a linear number of comparisons
    4.2.1 Proof of complexity for a full branching tree
    4.2.2 Turning every tree into a full branching tree
    4.2.3 We can use B instead of T

5 A randomized algorithm
  5.1 Overview
    5.1.1 The subgraph passed to the second recursion is sparse
  5.2 A tree formulation
    5.2.1 Some facts about vertices and the recursion tree
    5.2.2 Some facts about edges and the recursion tree
  5.3 Runtime analysis
    5.3.1 The expected running time
    5.3.2 A guaranteed running time
    5.3.3 High-probability proof

6 A deterministic, non-greedy algorithm
  6.1 The Ackermann function and its inverse
  6.2 The Soft Heap
    6.2.1 Bad and corrupted edges
    6.2.2 Consequences for the MST algorithm
  6.3 Strong contractibility and weak contractibility
    6.3.1 Strong contractibility
    6.3.2 Weak contractibility
    6.3.3 Strong contractibility on minors
  6.4 Overview revisited
  6.5 Motivation for Build-T
    6.5.1 What we already know
    6.5.2 The recursion formula
  6.6 Build-T
    6.6.1 A hierarchy of minors
    6.6.2 Building the tree
    6.6.3 Determining which sibling to visit next
    6.6.4 When a node runs out of children
    6.6.5 Data structures and corruption
    6.6.6 Error rate and running time
  6.7 Correctness
  6.8 Runtime
    6.8.1 Density games
  6.9 And, Finally

7 The Optimal One
  7.1 Decision trees and optimality
    7.1.1 Breaking up the decision tree
  7.2 DenseCase
  7.3 Building and storing decision trees
    7.3.1 Keep the parameters small
    7.3.2 Emulating table lookups
  7.4 Partitioning
    7.4.1 Relevance
    7.4.2 Finding partitions
  7.5 Putting things together
  7.6 Time complexity


Since this marks the denouement of my college career, I suppose acknowledgements are in order.

I dedicate this to my family, who encouraged and supported me throughout college and gave me the inspiration to keep going. Also a very big thanks to Professor Karp, Luqman Hodgkinson, Yehonatan Sella, Ian Henderson, and everybody else who listened to me blathering on about spanning trees and soft heaps. And my dad, who not only listened but read part of it!


1 Introduction

1.1 History and Content

In 1926 Otokar Borůvka attacked the problem of finding the most efficient electricity network for the now-nonexistent nation of Moravia. [1] Distilled into a mathematical form, this is the problem of finding the subgraph of least cost that is still connected. Since then, the task of finding a minimum spanning tree has become a staple of the algorithms repertoire. A few others proposed better solutions, the methods of Kruskal and Dijkstra-Jarník-Prim (more commonly known as Prim’s algorithm) being the most intuitive and popular.

The classical greedy algorithms – Kruskal’s, Dijkstra-Jarník-Prim, and Borůvka’s – build the graph incrementally, at all times maintaining a correct partial result – correct in the sense that every edge in the intermediate result ends up in the final tree. On the other hand, the fastest algorithms so far all maintain intermediate results that are supersets of the correct answer, instead of subsets. We shall investigate why this approach is powerful by examining four algorithms: one that checks whether a given spanning tree actually is minimal, and three for actually constructing the MST.

The verification algorithm comes first; it is the cumulative result of quite a few papers, probably the earliest of which came from János Komlós in 1984. Next we will examine a randomized algorithm by David Karger, Philip Klein, and Robert Tarjan (1995) that runs in linear time with high probability. The third result is a slightly superlinear algorithm by Bernard Chazelle (2000) that uses the wonderful Soft Heap. The last algorithm we will look at was put together by Seth Pettie and Vijaya Ramachandran (2002), and although they show that its asymptotic running time cannot be beaten (at least for a comparison-based algorithm), no one really knows what that asymptotic running time actually is, except that it is at least linear in the input size.

This is a review of the papers I found interesting. I try to frame things in new and interesting ways, and introduce some coherence between them. I hope it can be of help to anyone surveying the minimum spanning tree literature.

1.2 The problem, formally

1.2.1 Some definitions

Definition. A graph G consists of a vertex set V and an edge set E. Every element of E is an unordered pair of vertices. We will write undirected edges as {u, v}.


Definition. A subgraph H of G has an edge set E′ ⊆ E, and a vertex set induced by E′.

Definition. A path in G is a sequence of vertices v_0, v_1, v_2, ..., v_k such that there is an edge between any two adjacent vertices v_i, v_{i+1} in the sequence. We will sometimes refer to the edges in the path, although it is formally defined as a sequence of vertices.

Definition. A graph is connected if there is a path between any two vertices in the graph.

Definition. A tree is a graph that is minimally connected – that is, any tree T is a connected graph, but removing any edge will disconnect it.

Definition. A spanning tree of G is a subgraph of G that is a tree and that covers every vertex in G.

Definition. If G is a graph whose edges have weights w(e), the cost of a subgraph H is Σ_{e∈H} w(e), or, more informally, the sum of the weights of all its edges.

Definition. A forest is a set of trees.

Definition. An edge is incident to a connected component, a vertex, or another edge if exactly one of its endpoints lies in the connected component, equals the vertex, or is also an endpoint of the other edge.

1.2.2 Statement

Given a graph G = (V, E) and a weight function w over the edges, find a spanning tree T of G such that its cost is minimal. That is, for every spanning tree U of G, Σ_{e∈T} w(e) ≤ Σ_{e∈U} w(e).

We will denote the true minimum spanning tree of G as MST(G).

1.2.3 Simplifying assumptions

Let it be known that m generally denotes the number of edges in the input graph and n the number of vertices. If there is a possibility of ambiguity, we will strive to clarify whether m and n refer to the parameters of the original input, or those of a recursive call.

Unless otherwise stated, we will assume that all edge weights are distinct. Usually it is a simple matter to generalize to non-distinct weights. In addition, we will often assume in correctness proofs that all edge weights are integers in [1, m]. This will not change the minimum spanning tree. In fact, we only need an ordering on the edge weights to find the MST, as the Cut and Cycle properties below show.

We will also assume that the original input graph G is connected. If an input graph is not connected, we can find the connected components in linear (O(m + n)) time and feed the components separately to our algorithms. It is not hard to find an algorithm that finds connected components – for instance, depth-first search will suffice. Since all the algorithms we deal with are linear or superlinear, this stipulation causes no loss of generality and doesn’t affect our running time analyses. It also implies that m ≥ n − 1, or n = O(m) and log n = O(log m).

The original input graph G will always be simple – that is, if an edge connects two vertices, it is the only edge between those two vertices, and there are no self-loops. We lose no generality here because we can clean up a non-simple graph, keeping only the lightest among redundant edges, in O(m) time. We will give the algorithm later. This assumption gives m ≤ n(n − 1)/2, which implies m = O(n²) and log m = O(log n). In some recursive calls the simple-graph requirement is dropped to make analysis simpler, and it will be clearly stated whenever this occurs.

In addition, we talk about graphs on labeled vertices. For a graph on n vertices, each vertex is labeled with a number (or any arbitrary symbol, as long as it is unique) from 1 to n. Two graphs G and G′ are taken to be equal if {i, j} is an edge in G if and only if {i, j} is also an edge in G′. There are 2^(n(n−1)/2) unique unweighted graphs on n vertices.

1.2.4 Limited computation model

The literature makes a distinction between comparison-based algorithms and algorithms which are allowed full access to bit representations of data. There is a specific model of computation that is favored, the pointer machine. The main limitation of a pointer machine is that it does not allow arithmetic on pointers. That means no constant-time table lookups, since calculating a hash function requires the ability to manipulate pointers. A pointer may be dereferenced and checked against another pointer for equality, nothing more [2]. The full range of arithmetic operations is allowed on any other data type, and has unit cost.

We need to acknowledge the elephant in the room: given models of computation that do allow bit arithmetic, the MST problem already has a linear time solution! For example, Fredman and Willard give an algorithm and data structure that finds MSTs in linear time on a unit-cost RAM with bit arithmetic [8]. Pettie’s algorithm from Section 7 also runs in linear time if pre-computed MST solutions (instead of decision trees) are allowed to be cached and retrieved in constant time.


We will focus on pointer machine algorithms in this review. They tend to reveal more about the nature of the MST problem, and the resulting insights are frequently applicable to matroid optimization in general. In addition, the search for a linear-time comparison-based minimum spanning tree algorithm has motivated ideas and data structures that are useful in general to computer science, some of which we will describe.

1.3 Important properties

1.3.1 Cuts and cycles

Definition. A cut of a graph G is a subset S of vertices and its complement S̄ such that neither S nor S̄ is empty.

Although we will formally define a cut as a set of vertices and its complement, keep in mind that a cut is really just a way of dividing the graph. In many ways the edges that cross a cut are more important than the vertices themselves.

Definition. An edge e crosses a cut (S, S̄) if one of its endpoints is in S and the other is in S̄.

Often we will talk of a specific cut, one that results from removing an edge from a spanning tree.

Definition. Let T be a spanning tree of G and e an edge in T. Removing e from T divides T into two connected components, and every vertex in G is in one of them. We say (S, S̄) is the cut defined by e if S is the set of vertices in one of the components, and denote it cut(T, e).

Definition. A cycle in G is a path whose endpoint is the same as its start point.

Since a spanning tree T of G is connected, there is a path involving only edges in T between any two vertices in G, and since it is a tree, this path is unique.

Definition. Let T (u, v) denote the unique path between u and v in T .

As with cuts, a spanning tree and an edge can define a specific cycle.

Definition. If T is a spanning tree of G and e = {u, v} is an edge not in T, then T(u, v) and {u, v} form a cycle. We call this the cycle e makes with T.


1.3.2 About trees

We will reiterate without proof some basic facts about trees and go on to some MST properties.

Fact. A tree T with n vertices has n − 1 edges and no cycles.

In the other direction,

Fact. Any two of the following properties suffice to prove that T is a tree: connectedness, acyclicity, and having n − 1 edges.

1.3.3 Existence and Uniqueness

Now we have the tools to prove that the minimum spanning tree does indeed exist for all graphs, and is furthermore unique if our edge weights are distinct.

Theorem 1. For any connected graph G, the minimum spanning tree of G exists and is unique.

Proof. Existence: A spanning tree of G exists, since we can keep removing edges until G no longer has any cycles. Removing any edge that is part of a cycle does not disconnect G, since any path that went through the removed edge {u, v} may instead go through the remaining part of the cycle. There also exists a spanning tree with minimal weight: since our graphs are finite, we can enumerate all spanning trees and their costs. The set of spanning tree costs is also finite, so there must be a minimum.

Uniqueness: Suppose we have two spanning trees, T_1 and T_2, with the same weight. Let e be the heaviest edge in T_1 ∪ T_2 − T_1 ∩ T_2. Suppose without loss of generality that e ∈ T_1. Across cut(T_1, e), there is no other edge of T_1, otherwise there would be a cycle. However, since a spanning tree must be connected, there is at least one edge in T_2 that crosses this cut. Let f be such an edge of T_2. Since f is not in T_1, it must be in T_1 ∪ T_2 − T_1 ∩ T_2, and since e was the heaviest in this set, w(f) < w(e). If we replace e with f in T_1, the resulting graph is connected, since the graphs on either side of the cut were connected. It is also acyclic, since after removing e from T_1 there was no path between vertices on opposite sides of the cut, so adding in f created no cycles. Thus replacing e by f results in a spanning tree with total cost less than cost(T_1) = cost(T_2), so neither of these is minimal. Therefore a spanning tree with minimal cost must be the only spanning tree with that cost.


1.3.4 The cut and cycle properties

The properties of being the lightest edge across some cut and the heaviest edge on some cycle have a curious dual relationship:

Lemma 2. There exists a cut across which e is lightest ⇐⇒ there is no cycle on which e is heaviest.

Proof. (⇒) Let e be an edge that is the lightest across a cut (S, S̄), and suppose that C is a cycle containing e. Removing e from the cycle leaves a path between the two endpoints of e; call them u and v. Since one (say u) is in S and the other is in S̄, the remainder of C must cross from S to S̄. Let f be an edge in C − e that crosses the cut. Since e is the lightest across the cut, e must be lighter than f, and so e cannot be the heaviest in the cycle.

(⇐) We argue the contrapositive. Suppose e is heaviest on a cycle C. Let (S, S̄) be a cut that e crosses. Since C is a cycle, it must cross the cut at least twice. Let f be another edge that crosses the cut. We know e is heavier than f, so e cannot be the lightest across this cut.

The cut property and the cycle property are ways of characterizing all edges in the MST, and indeed either one actually defines the edges of the MST.

Theorem 3 (Cycle Property). An edge e is not in the minimum spanning tree if and only if it is the heaviest edge on some cycle.

Proof. (⇒) Let T* be the minimum spanning tree for G, and let e = {u, v} not be in T*. Since T* connects all the vertices of G, there is a path in T* that connects u and v. Adding e to this path creates a cycle C. If there existed an edge e′ on C that is heavier than e, then we could replace e′ with e to get a lighter tree T**, which is impossible by our choice of T*. So e is the heaviest edge on C.

(⇐) Suppose e is heaviest on the cycle C. Then by Lemma 2 there is no cut across which it is the lightest. Let T be a spanning tree of G that includes e. Removing e splits T into two connected components, defining a cut of G. There is a lighter edge f across this cut, and replacing e by f yields a tree T′ that is lighter than T. So no spanning tree containing e is minimal.

By the cut-cycle duality, it is easy to see that this implies the cut property:

Theorem 4 (Cut Property). An edge e is in the minimum spanning tree if and only if it is the lightest across some cut.

Proof. This follows directly from Lemma 2 and Theorem 3.
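As a sanity check, the cycle property can be verified by brute force on a tiny graph. The Python sketch below (with a small hypothetical example graph, and distinct weights as assumed throughout) builds an MST and confirms that every non-MST edge is the heaviest on the cycle it makes with the tree:

```python
def mst_kruskal(n, edges):
    """Minimum spanning tree by Kruskal's algorithm; edges are (w, u, v)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def tree_path(tree, start, end):
    """Unique path between start and end in a tree, as a list of edges."""
    adj = {}
    for w, u, v in tree:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    stack, seen = [(start, [])], {start}
    while stack:
        node, path = stack.pop()
        if node == end:
            return path
        for nxt, w in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [(w, node, nxt)]))
    return None

# A small example graph on 4 vertices with distinct weights.
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]
mst = mst_kruskal(4, edges)
for w, u, v in edges:
    if (w, u, v) not in mst:
        # Cycle property: a non-MST edge is heaviest on the cycle it
        # makes with the tree path between its endpoints.
        cycle_weights = [pw for pw, _, _ in tree_path(mst, u, v)] + [w]
        assert max(cycle_weights) == w
```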


1.4 Graph representation

A graph, being mathematically defined as a set of vertices V and a subset E of V × V, still needs to have some kind of concrete representation on a computer. We can realize this with an adjacency list: each vertex v maintains a list of pointers to edge objects for which v is an endpoint. We will define both edges and vertices to be data structures, with a vertex storing at minimum its unique identifier. An edge stores a pointer to the endpoint with the lesser identifier, and a pointer to the endpoint with the greater identifier, as well as its weight. Vertices and edges are also capable of storing an additional constant amount of data, which we will describe as needed.
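For concreteness, the representation just described might be sketched as follows in Python; the class and field names are illustrative, not from the thesis:

```python
from dataclasses import dataclass, field

@dataclass(eq=False)           # identity comparison; avoids recursive equality
class Vertex:
    ident: int                 # unique identifier
    edges: list = field(default_factory=list)  # adjacency list of Edge objects

@dataclass(eq=False)
class Edge:
    lesser: "Vertex"           # endpoint with the smaller identifier
    greater: "Vertex"          # endpoint with the larger identifier
    weight: float

def add_edge(u: Vertex, v: Vertex, w: float) -> Edge:
    """Create an edge and register it in both endpoints' adjacency lists."""
    lo, hi = (u, v) if u.ident < v.ident else (v, u)
    e = Edge(lo, hi, w)
    u.edges.append(e)
    v.edges.append(e)
    return e
```

Pointers become ordinary object references here; each edge appears once in each endpoint's list, matching the adjacency-list scheme above.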

2 Classic algorithms

2.1 The union-find data structure

The classic greedy algorithms heavily use set operations, in particular asking whether two objects are in the same set as well as taking the union of two sets. The union-find data structure supports the operations makeset(u), find(u), and union(u, v): makeset(u) creates a new set whose one element is u, find(u) returns the unique representative of the set to which u belongs, and union(u, v) combines the sets containing u and v into one.

The implementation of the union-find structure is outside the scope of this review. However, the running times per operation are important enough to emphasize here: makeset and union both run in O(1) time. For any sequence of union and find operations that includes k finds, the finds take at most O(kα(k)) time in total, averaging O(α(k)) or better per find, where α(·) refers to one form of the inverse of the Ackermann function. The Ackermann function grows extremely quickly, and its inverse grows extremely slowly – α(number of atoms in the observable universe) = 4. The Ackermann function and a different form of its inverse (one that takes two arguments) will reappear later, when we discuss Chazelle’s algorithm.
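Although the implementation is outside our scope, a standard sketch (union by rank plus path compression, the combination behind the inverse-Ackermann bounds cited above) looks like this; note that this union calls find internally, so its cost is accounted to the finds:

```python
class UnionFind:
    """Disjoint-set forest with union by rank and path compression."""
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def makeset(self, u):
        self.parent[u] = u
        self.rank[u] = 0

    def find(self, u):
        # Find the root, then point every node on the path at it.
        root = u
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[u] != root:
            self.parent[u], u = root, self.parent[u]
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1
```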

2.2 Kruskal’s

Kruskal’s algorithm follows a simple intuition: in order to minimize the final cost, include the lightest possible edges. In particular, start by grabbing the lightest legal edge available, and repeat until you have a spanning tree.


Algorithm 1 Kruskal

Require: Input G = (V, E)
Ensure: Output T ⊆ E, the minimum spanning tree
 1: sort E
 2: T ← ∅
 3: for all v ∈ V do
 4:   makeset(v)
 5: end for
 6: for all edges {u, v} ∈ E do
 7:   if find(u) != find(v) then
 8:     add {u, v} to T
 9:     union(u, v)
10:   end if
11: end for
12: return T

Let’s take a high-level look at Kruskal’s. When edge e is processed, if e is not in the MST, Kruskal’s ignores it, and if e is in the MST, Kruskal’s puts it in T. This is easily proved using the cut and cycle properties. Basically, if the endpoints of e are in the same connected component, then e is heaviest on the cycle it creates with the existing edges in T, because we are processing the edges in sorted order. On the other hand, if the two endpoints are in different components C_1 and C_2, and if S_1 is the vertex set of C_1, then e is lightest across the cut (S_1, S̄_1).

At the time of processing an edge, Kruskal’s algorithm does exactly the right thing with it – if e belongs in MST(G), then Kruskal’s includes it. If not, Kruskal’s throws it out.
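Algorithm 1 transcribes almost directly into Python. This sketch inlines a minimal union-find (path halving only, for brevity) and represents edges as (weight, u, v) tuples, assuming distinct weights:

```python
def kruskal(vertices, edges):
    """Kruskal's algorithm; edges are (weight, u, v) with distinct weights."""
    parent = {v: v for v in vertices}          # makeset for every vertex

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]      # path halving
            u = parent[u]
        return u

    tree = []
    for w, u, v in sorted(edges):              # process edges in sorted order
        ru, rv = find(u), find(v)
        if ru != rv:                           # endpoints in different components
            tree.append((w, u, v))
            parent[ru] = rv                    # union
    return tree
```

With the sort dominating, this runs in O(m log m) = O(m log n) time; the union-find work is nearly linear.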

2.3 Dijkstra-Jarník-Prim

This is more commonly known as Prim’s algorithm, but I will follow Pettie’s example in calling this the Dijkstra-Jarník-Prim algorithm, or DJP for short. It was first developed in 1930 by Jarník, and independently discovered by Prim and Dijkstra in the late 1950’s.

From a distance DJP seems very similar to Kruskal’s: it also grabs the lightest edge possible at every step. Instead of iterating through the edges in sorted order, it uses a heap to keep track of which vertex would be cheapest to add to a growing tree. It only keeps track of one edge per candidate vertex, calling on the heap decreasekey operation if necessary. The running time is therefore heavily dependent on the heap used. With a standard binary heap that performs insertions, deletions, and decreasekey operations in O(log N) time, where N is the number of elements in the heap, the running time is O(m log n).
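A compact Python sketch of DJP with a binary heap follows. Instead of decreasekey it uses the common lazy-deletion variant (stale heap entries are simply skipped when popped), which keeps the same O(m log n) bound for binary heaps:

```python
import heapq

def djp(adj, root):
    """Dijkstra-Jarník-Prim with a binary heap.
    adj maps each vertex to a list of (weight, neighbor) pairs."""
    in_tree = {root}
    tree, total = [], 0
    # Lazy deletion stands in for decreasekey: we may push several
    # entries per vertex and discard the stale ones on pop.
    heap = [(w, root, v) for w, v in adj[root]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue                      # stale entry
        in_tree.add(v)
        tree.append((w, u, v))
        total += w
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return tree, total
```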

3 Iterative algorithms

The two iterative algorithms we shall describe are Borůvka’s and the Fredman-Tarjan algorithm. Both define iterative steps that are important in the later, more sophisticated algorithms.

3.1 Contractions

The purpose of the contraction is mostly to present things cleanly. Instead of speaking of a collection of intermediate subgraphs, it allows us to speak of the vertices of a contracted graph.

Contraction is exactly what it sounds like: we merge two or more vertices into one supervertex, whose incident edges are the union of all the edges incident to the original vertices that make it up. More formally, given a (usually disconnected) subgraph H, contracting the graph across H means to make every connected component of H into a supervertex.

The implementation given in Algorithm 2 requires each vertex to store an integer in the field “component.”

Let m_F, n_F be the numbers of edges and vertices in F, and m_G, n_G be the numbers of edges and vertices in G. Since we can find connected components in O(m_F + n_F) time, the entire subroutine takes O(m_G + n_G) time: we iterate through the vertices once and through the edges once, and m_F ≤ m_G and n_F ≤ n_G.

Contractions have the messy side effect of potentially returning a non-simple graph. There is a simple clean-up routine, using a lexicographic sort, to ensure that the contracted graph is simple, keeping the lightest edge when there are redundant edges. After lexicographically sorting the edges by component identifiers, redundant edges show up next to each other and we only need to scan the sorted list of edges to extract the edge of lowest cost among duplicated edges. This can be done in O(m_G) on a pointer machine [7].
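The clean-up routine can be sketched as follows. For brevity this Python version uses a comparison sort, O(m log m), rather than the linear-time lexicographic sort the thesis has in mind:

```python
def simplify(edges):
    """Remove self-loops and keep only the lightest among parallel edges.
    Edges are (u, v, w) with u, v being component identifiers."""
    # Normalize endpoint order so parallel edges compare equal, then
    # sort by endpoints and weight; duplicates become adjacent.
    normalized = sorted(
        (min(u, v), max(u, v), w) for u, v, w in edges if u != v
    )
    result = []
    for u, v, w in normalized:
        # The first copy of each endpoint pair is the lightest.
        if result and result[-1][0] == u and result[-1][1] == v:
            continue
        result.append((u, v, w))
    return result
```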

Definition. If G′ was obtained by contracting edges of G, then G′ is called a minor of G.

Remark. A minor of a minor is also a minor. That is, if G′ is a minor of G and G′′ is a minor of G′, then G′′ is a minor of G.


Algorithm 2 Contract

Require: Input: G = (V, E), a subgraph H of edges to contract
Ensure: Output: G′, the contracted graph

  If not every vertex in V is represented in H, put the missing vertices in H
  V′ ← ∅, E′ ← ∅
  connectedComponents ← find-connected-components(H)
  i ← 0
  for all C ∈ connectedComponents do
    put i in V′
    for all v ∈ C do
      v.component ← i
    end for
    i ← i + 1
  end for
  for all {u, v} ∈ E do
    put {u.component, v.component} in E′
  end for
  return G′ = (V′, E′)
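A Python sketch of Contract, using depth-first search for the connected components and integer component labels; as discussed above, the returned edge list may be non-simple:

```python
def contract(graph_edges, n, forest_edges):
    """Contract a graph across a subgraph: each connected component of
    forest_edges becomes one supervertex.  Vertices are 0..n-1; edges
    are (u, v, w).  Returns the contracted edges and a component map."""
    # Find connected components of the subgraph with depth-first search.
    adj = {v: [] for v in range(n)}
    for u, v, _ in forest_edges:
        adj[u].append(v)
        adj[v].append(u)
    component = {}
    label = 0
    for start in range(n):
        if start in component:
            continue
        component[start] = label       # missing vertices get their own label
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in component:
                    component[y] = label
                    stack.append(y)
        label += 1
    # Re-express every edge in terms of supervertex labels.
    contracted = [(component[u], component[v], w) for u, v, w in graph_edges]
    return contracted, component
```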

In terms of MST algorithms, certain subgraphs are safer to contract than others.

Definition (Contractible). A subgraph C of G is contractible if MST(G) = MST(C) ∪ MST(G \ C).

That is, treating the entire collection of vertices in C as one does not affect the correctness of an MST algorithm. All partial MST results are contractible – we could stop Kruskal’s or DJP at any time, for instance, contract G across the intermediate result, and carry on.

Remark. C is contractible and connected ⇐⇒ C ∩ MST(G) is connected.

Definition. If G′ is a minor of G, and v′ is a vertex in G′, then the supervertex v′ contains one or more vertices of G. Let the expansion of v′ be the subgraph of G with vertex set {v ∈ G : v maps to v′} and edge set {{u, v} : u, v both map to v′}. We write the expansion of v′ as C_{v′}.

3.2 Borůvka’s algorithm

Like Kruskal’s algorithm, Borůvka’s partitions the vertices into partial trees and merges them incrementally. However, unlike Kruskal’s, which merges two components in a step, in a single step of Borůvka’s algorithm every component is involved in a merger. Like DJP, Borůvka’s grows the intermediate result by taking the lightest edge coming out of a component, but unlike DJP, which only tracks one component, Borůvka’s does so for many.

Of course the cost of taking multiple edges in a step is that the steps are longer and more complex.

3.2.1 One iteration of Borůvka’s algorithm

At the start of iteration i, we have a graph G_i, with G_0 = G. During one iteration, each vertex selects the lightest edge incident to it and contracts that edge. At the end of the iteration, we have some contraction G_{i+1} of G_i, and the set F of contracted edges.

In the implementation given in Algorithm 3, we need to store fields “minEdge” and “minEdgeWeight” for each vertex.

Algorithm 3 Borůvka-step

Require: Input: G_i = (V_i, E_i)
Ensure: Output: a forest F of MST edges and a contracted graph G_{i+1}

  F ← ∅
  for all {u, v} ∈ E_i do
    if w(u, v) < u.minEdgeWeight then
      u.minEdgeWeight ← w(u, v)
      u.minEdge ← {u, v}
    end if
    if w(u, v) < v.minEdgeWeight then
      v.minEdgeWeight ← w(u, v)
      v.minEdge ← {u, v}
    end if
  end for
  for all v ∈ V_i do
    put v.minEdge in F
  end for
  G_{i+1} ← contract(G_i, F)
  return G_{i+1}, F

Iterating through the edge set and vertex set takes O(m + n) time, and contract is O(m), so Borůvka-step takes O(m) time in all.

16

Page 17: Three Minimum Spanning Tree Algorithms

8/13/2019 Three Minimum Spanning Tree Algorithms

http://slidepdf.com/reader/full/three-minimum-spanning-tree-algorithms 17/50

3.2.2 The algorithm

Borůvka’s algorithm simply performs Borůvka phases until the entire graph is contracted into one vertex. It stores a running result T, and appends the contracted edges F to T after every iteration. It is easy to see correctness by noting that taking the lightest edge out of a vertex v′ in G′ is equivalent to taking the lightest edge out of the cut (S_{v′}, S̄_{v′}), where S_{v′} is the vertex set of C_{v′}. And we iterate until G is contracted to a single vertex, so the final T is connected.

3.3 Fredman-Tarjan

Over time, various modifications to Kruskal's, DJP, and Borůvka's algorithms have been proposed, lowering the running time by various degrees. The use of a Fibonacci heap, and a slight but important modification to DJP, lowers the running time from O(m log n) to O(m β(m, n)), where β is a form of the iterated logarithm: the least number of times the logarithm function must be applied to n before it drops below m/n. Formally, β(m, n) = min{ i : log^(i) n ≤ m/n }. The approach is more thoroughly described in [7].
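β can be computed directly from its definition. The small sketch below (our illustration, not part of the original analysis) uses base-2 logarithms, which shifts the value by at most a constant:

```python
import math

def beta(m, n):
    """beta(m, n) = min{ i : log^(i) n <= m/n }: repeatedly take the
    logarithm of n until the value drops to m/n or below, counting the
    applications. Assumes m >= n >= 2 so the loop terminates."""
    i, x = 0, float(n)
    while x > m / n:
        x = math.log2(x)
        i += 1
    return i
```

For instance, `beta(16, 16)` is 3 (16 → 4 → 2 → 1 ≤ 1), while a dense graph with m = n² already has `beta(n*n, n) == 0` once m/n ≥ n.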

The Fibonacci heap performs the insert, decreasekey, and meld operations in constant amortized time, and both deletemin and delete in O(log N) amortized time, where N is the number of items in the heap.

3.3.1 One iteration of Fredman-Tarjan

One iteration of the Fredman-Tarjan algorithm results in a contracted graph, wherethe number of contracted vertices is at most 2m/k, with k being an input parameter.In addition, it generates a set of partial MSTs C such that

1. C covers every vertex.

2. The members of C are edge-disjoint

3. The number of connected components in ∪C ∈CC is at most 2m/k.

The basic flow is pretty simple: start with all vertices unmarked. Picking an arbitrary vertex, expand outward, DJP-style, until the heap of candidate vertices reaches size k. Mark all the vertices in the current component, and start afresh with an unmarked vertex. For components other than the first, expansion can stop before the heap grows large enough, if the current component collides with an old one – that is, the last vertex added to the component was already part of another component. Pseudocode is given in Algorithm 4.


Algorithm 4 Fredman-Tarjan-iteration

Require: G = (V, E)
Ensure: F = a subset of MST edges; G = G \ F

while there are still unmarked vertices do
  Initialize a new heap
  Pick an arbitrary unmarked vertex v0
  Put all adjacent vertices u in the heap with key w(u, v0)
  while the heap has fewer than k elements do
    v ← heap.deletemin()
    if v is already in the currently growing component then
      Continue without doing anything
    end if
    Add {v, x} to F, where w(v, x) was the last key of v in the heap
    for all u adjacent to v do
      If u is not in the heap, insert u with key w(u, v). If u has a greater key in the heap than w(u, v), then decrease the key to w(u, v)
    end for
  end while
end while
Contract G across the edges of F (without clean-up)

This ensures 1) that every time we retrieve the lightest edge from the heap, it takes time in O(log k), and 2) that whenever we stop growing a component, either it has k or more other vertices adjacent to it, or it shares a vertex with another component.

However, the first component in any set of components linked by common vertices must have stopped growth when the heap reached critical size. Therefore every connected component of F has at least k edges coming out of it. Contracting across F gives us at least k edges coming out of each vertex. Since the total number of edge endpoints over all vertices is 2m, this gives at most 2m/k vertices in the contracted graph.

Again, the running time is highly dependent on the particular heap implementation. With a Fibonacci heap, one iteration runs in O(m + n log k).

Another consequence of this is that we can raise the density, m/n, of a graph to an arbitrary value D in O(m + n log D) time. This comes from the fact that the new density satisfies m'/n' ≥ k/2, so by setting k = 2D and running a Fredman-Tarjan iteration we have the desired result.


3.3.2 The complete algorithm

Again, we perform iterations, contracting the connected components after each step, until the graph becomes trivial. Setting k = 2^(2m/n) for each iteration will give us the promised time bound. We refer the reader to [7] for details.

4 An algorithm for verification

I’ll begin with a verification algorithm because it nicely illustrates both the usage of the cycle property and a couple of other tricks.

A bit of history and acknowledgements: János Komlós first observed that verification can be done in a linear number of comparisons, although a linear-time implementation proved more elusive. Valerie King distilled Komlós's result into a simpler form, in addition to managing to implement Komlós's algorithm in linear time and space. Slightly before King's result, Dixon, Rauch, and Tarjan gave a completely different algorithm based on massaging the input so that a previous method of Tarjan's runs in linear time. Adam Buchsbaum produced the first purely comparison-based verifier by replacing the RAM-dependent portion of Dixon et al. with a pointer method.

Here I will talk about Komlós's information-theoretic result and King's refinement. It is important for later algorithms to know that Buchsbaum's algorithm exists, but I will not go into detail about it.

4.1 Verification: problem definition and reduction

The inputs are a graph G and a spanning tree T of G. A correct verifier accepts if T is the minimum spanning tree of G and rejects if T is not.

4.1.1 Narrowing down the search space

How do we know if T is the MST? The cut and cycle properties tell us exactly whichedges are in the MST. We present the holistic cycle and cut properties. They areholistic in the sense that they apply to an entire spanning tree.

Theorem 5 (Holistic cut property). If T is a spanning tree of G, then removing any edge splits T into two connected components, which between them cover all the vertices of G. This defines a cut of G. The holistic cut property states that T is the minimum spanning tree if and only if every edge in T is the lightest across the cut defined by removing it from T.


Remark. For the cut defined by removing e ∈ T, e is the only edge from T across that cut.

Likewise the cycle property can be used to evaluate an entire spanning tree.

Definition. Let G be a graph and T a spanning tree of G. Given any two vertices u and v in G, there is a unique path between them that only uses the edges in T. This is a consequence of the definition of a spanning tree. Define T(u, v) to be this unique path.

Theorem 6 (Holistic cycle property). If T is a spanning tree of G, then for every edge {u, v} that is not in T, putting {u, v} together with T(u, v) creates a cycle. T is the MST if and only if every edge {u, v} not in T is heaviest in the cycle it creates with T(u, v). To simplify notation, we will at times speak of the cycle e creates with T.

Proof. The cut-cycle duality entails that the forward direction of the holistic cutproperty is equivalent to the holistic cycle property, and the same for the backwarddirection.

For the forward direction, if T is the MST, and f is an edge not in T, then f must be the heaviest in the cycle it creates with T, since otherwise we could replace the heaviest edge in the cycle with f, obtaining a spanning tree lighter than T. For the backward direction, if every edge in T is lightest across the cut it defines, then the ordinary cut property guarantees that every edge in T is in the MST.

The holistic cut and cycle properties seem nearly like tautologies, given the or-dinary cut and cycle properties. Their significance comes from the fact that theyspecify the exact cut or cycle that we should look at. The ordinary cut and cycleproperties only said, “if there exists a cut,” or “if there exists a cycle.” The holisticproperties make it so we don’t have to look at all cuts or all cycles, just one.

4.1.2 Reduction

Applying the holistic cut and cycle properties yields the following equivalent formu-lations of the MST verification problem:

1. Given a graph G and a spanning tree T , then for every e ∈ T , is e the lightestacross the cut defined by removing e from T ?

2. Given a graph G and a spanning tree T , then for every e /∈ T , is e the heavieston the cycle it makes with T ?
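The second formulation yields an immediate, if slow, brute-force verifier. The sketch below is our illustration (assuming distinct weights and a connected graph), not the linear-time algorithms discussed in this section:

```python
def is_mst(n, edges, tree):
    """Brute-force verifier for formulation 2: `tree` (a spanning tree,
    given as (weight, u, v) triples drawn from `edges`) is the MST iff
    every non-tree edge is strictly heaviest on the cycle it closes
    with the tree path between its endpoints."""
    adj = {i: [] for i in range(n)}
    for w, u, v in tree:
        adj[u].append((v, w))
        adj[v].append((u, w))

    def max_on_tree_path(u, v):
        # DFS from u; each node is reached along its unique tree path,
        # carrying the heaviest edge weight seen so far.
        stack, seen = [(u, float("-inf"))], {u}
        while stack:
            x, m = stack.pop()
            if x == v:
                return m
            for y, w in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append((y, max(m, w)))

    tree_set = set(tree)
    return all(w > max_on_tree_path(u, v)
               for w, u, v in edges if (w, u, v) not in tree_set)
```

Each query here costs O(n); the point of Komlós's and King's work is to answer all such queries with only a linear number of comparisons in total.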


Komlós chooses to attack the second question, breaking it up into two parts. The first task is to find the maximum weight on T(u, v) for all vertex pairs u, v. The second is to test w(u, v) against this maximum weight for all edges {u, v} not in T.

4.2 We can verify with a linear number of comparisons

Komlós notes that one can turn any spanning tree into a rooted tree by distinguishingan arbitrary leaf node. Given this natural order on the vertices and edges of the inputtree, then, we can break any query path into two half-paths:

Definition. If one end of a path is an ancestor of the other end, then this path is ahalf-path .

Komlós inductively finds the maximum weight on every possible half-path, and stores the result in a lookup table. For every node v on level d of the tree, we will construct an ordered list M(v) = [m_0(v), m_1(v), ..., m_{d−1}(v)], where m_i(v) is the maximum weight on the directed path starting at level i and going to v. For example, if p(v) is the parent of v, m_{d−1}(v) equals the weight of the only edge between p(v) and v.

Lemma 7. For every v at level d, we can find M (v) in less than or equal to log dcomparisons.

Proof. If d = 1, i.e. v is a child of the root, we define M (v) = [m0(v)] = [w(v, r)].This takes zero comparisons, and 0 ≤ log 1 = 0.

Let u be a node and let a_i(u) denote its ancestor at level i, of course constraining i to be less than depth(u). For any depth i and any node u, m_i(u) ≤ m_{i−1}(u), because the directed path from a_i(u) to u is a subset of the directed path from a_{i−1}(u) to u. Thus [m_0(v), m_1(v), ..., m_{d−1}(v)] is an ordered list. When constructing M(v), since for every path from an ancestor a_i(v) to v we already know the maximum over all edges except {p(v), v}, we only need to compare w(p(v), v) with m_i(p(v)). However, since M(p(v)) is an ordered list, we only need to find the point at which w(p(v), v) becomes greater than m_i(p(v)). This is binary search, which takes log(d − 1) comparisons.

To actually construct M(v), however, is a little more expensive. We take the index i* returned by binary search, and set m_i(v) = m_i(p(v)) for all i < i*, and m_i(v) = w(p(v), v) if i ≥ i*.

Definition (Full branching tree). A rooted tree in which all leaves are at the same level, and every internal (non-leaf) node has at least two children.
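The top-down construction of the lists M(v) just described can be sketched as follows (our illustration, not King's implementation; nodes are supplied as (node, parent, weight) triples in parent-before-child order):

```python
def half_path_maxima(nodes):
    """Komlós-style half-path maxima. `nodes` lists (v, parent, weight)
    triples top-down (parent is None for the root; weight is
    w(parent, v)). Returns M with M[v][i] = max weight on the path from
    v's level-i ancestor down to v. Each M[v] is non-increasing, so one
    binary search plus a copy suffices per node."""
    M = {}
    for v, p, w in nodes:
        if p is None:
            M[v] = []
            continue
        mp = M[p]
        # Binary search for the first index where mp[i] < w: earlier
        # entries keep their old maxima, the rest become w.
        lo, hi = 0, len(mp)
        while lo < hi:
            mid = (lo + hi) // 2
            if mp[mid] < w:
                hi = mid
            else:
                lo = mid + 1
        M[v] = mp[:lo] + [w] * (len(mp) - lo + 1)
    return M
```

On a root 0 with child 1 (weight 5) and grandchildren 2 (weight 3) and 3 (weight 7), this gives M[2] = [5, 3] and M[3] = [7, 7], as the definition demands.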


4.2.1 Proof of complexity for a full branching tree

The total number of comparisons needed, then, is

    Σ_i L_i,

for L_i the total number of comparisons needed for all the vertices at level i, which is given by

    Σ_v log(|M(v)| + 1)    (1)

for all v on level i.

Following Eisner's lead [6], we rewrite this as an average of logs and use Jensen's inequality. Equation (1) becomes

    n_i · (1/n_i) Σ_v log(|M(v)| + 1)
        ≤ n_i log( (Σ_v |M(v)|)/n_i + 1 )
        ≤ n_i log( (n + Σ_v |M(v)|)/n_i )
        ≤ n_i log( (n + 2m)/n_i )
        = n_i ( log((n + 2m)/n) + log(n/n_i) ).

The sum over all levels is then

    O( n log((m + n)/n) ),

using the fact that, since this is a full branching tree, the number of nodes at depth i is at most n/2^i.

4.2.2 Turning every tree into a full branching tree

By building a tree that documents a run of Borůvka's algorithm, King gives us a way to turn every spanning tree into a full branching tree with at most 2n vertices. Each level of the full branching tree represents the state of the graph before one iteration of Borůvka's, with nodes at level i corresponding to the contracted vertices in Gi. There is an edge between a node at level i − 1 and a node at level i if the (i − 1)-node becomes part of the i-node during that iteration. More formally, if T is a spanning tree over n vertices, we will build a full branching tree B:

1. Start B as the empty graph.

2. Put all the vertices in T as the leaves of B . That’s the end of the 0th Borůvkaiteration.

3. Repeat until Gi is contracted to a single vertex: If, at the beginning of the ith iteration, we have vertices v_1, ..., v_k, and at the end, we have vertices u_1, ..., u_l in the contracted graph, then put all u_j in B as nodes. Draw a directed edge from v_i to u_j if v_i was contracted into u_j; the weight of that edge is the weight of the edge selected by v_i during that Borůvka iteration.

Note that since T was a tree to begin with, Borůvka's algorithm trivially returns T. An immediate consequence of this is that every edge in T was selected at some point. In addition, even if we assume weights in T are unique, weights in B may not be. There is a natural surjective map from edges in B to edges in T, and a one-to-many mapping from edges in T to edges in B, namely the map that associates each edge in T with all the edges of the same weight in B.
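King's construction can be sketched by literally recording a Borůvka run on T. The code below is our illustration (not King's linear-time implementation): it returns B as parent pointers with edge weights, and the query helper walks two leaves upward in lockstep, which is valid because all leaves of B sit at the same level.

```python
def boruvka_tree(n, tree_edges):
    """Record a Borůvka run on the spanning tree T (edges as (w, u, v)
    triples with distinct weights). Returns (parent, pweight): for each
    B-node c, parent[c] is the node it contracted into and pweight[c]
    the weight of the edge c selected. Leaves of B are 0..n-1."""
    label = list(range(n))        # current B-node of each original vertex
    parent, pweight = {}, {}
    live = set(range(n))          # components of the current contraction
    next_id = n
    remaining = list(tree_edges)
    while len(live) > 1:
        # Every component selects its lightest incident edge.
        choice = {}
        for w, u, v in remaining:
            cu, cv = label[u], label[v]
            if cu == cv:
                continue
            for c in (cu, cv):
                if c not in choice or w < choice[c][0]:
                    choice[c] = (w, u, v)
        # Group components joined by the selected edges (tiny union-find).
        uf = {c: c for c in live}
        def find(c):
            while uf[c] != c:
                uf[c] = uf[uf[c]]
                c = uf[c]
            return c
        for w, u, v in choice.values():
            ru, rv = find(label[u]), find(label[v])
            if ru != rv:
                uf[ru] = rv
        # One fresh B-node per group; hang each old component under it.
        group_id = {}
        for c in live:
            r = find(c)
            if r not in group_id:
                group_id[r] = next_id
                next_id += 1
            parent[c] = group_id[r]
            pweight[c] = choice[c][0]
        live = set(group_id.values())
        for v in range(n):
            label[v] = parent[label[v]]
        remaining = [(w, u, v) for w, u, v in remaining if label[u] != label[v]]
    return parent, pweight

def b_path_max(parent, pweight, x, y):
    """Max weight on B(x, y) for leaves x, y, climbing in lockstep."""
    m = float("-inf")
    while x != y:
        m = max(m, pweight[x], pweight[y])
        x, y = parent[x], parent[y]
    return m
```

On the path 0–1–2–3–4 with weights 1, 10, 2, 20, the maxima returned by `b_path_max` match the path maxima in T, as Theorem 11 below asserts.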

Claim. B is a full branching tree.

Proof. B is clearly a rooted tree, with the node of B corresponding to the entire T as the root. Since, after an iteration, every connected component is the result of joining at least two other connected components, condition 2 for a full branching tree is satisfied. And we can prove by induction on the height of B that all leaves are at the same level (that level being the number of iterations needed for running Borůvka's on T).

4.2.3 We can use B instead of T

Recall that for a spanning tree T , T (x, y) denotes the unique path in T between xand y. In the same way, let B(x, y) denote the unique path in B between leaves xand y .

Lemma 8. u is on B(x, y) if and only if u is the lowest common ancestor of both x and y in B, or u is an ancestor of x but not of y, or vice versa.


Proof. Suppose v is the lowest common ancestor of x and y. We can see that by joining the path from v to x and the path from v to y and ignoring orientation, we get an undirected path from x to y. Since B is a tree, this is the only path. Any node on the path from v to x is an ancestor of x but not y (otherwise we contradict the lowest-ness of v), and vice versa for any node on the path from v to y. On the other hand, if u is a common ancestor of x and y but not the lowest, then u is not included on the path defined earlier in the paragraph, which is unique.

Lemma 9. If e′ is an edge of B(x, y), then there is an edge e in T(x, y) with w(e) = w(e′). As a matter of fact, e is the same edge from which e′ derived its weight.

Proof. Let e be the T-edge whose selection gave rise to e′ in B. If we show that e is on T(x, y), then we're done.

Suppose e′ = (u, v), so e′ is incident to v. Since v is on B(x, y) and it is not the highest node (u is higher), by the previous lemma, the subgraph expansion of v contains exactly one of x and y. Disconnecting e would partition T into C_v and T \ C_v, one of which contains x and the other of which contains y. Since x and y would no longer be connected, e must be on T(x, y).

Lemma 10. If e is heaviest on T(x, y), there must be an edge of the same weight in B(x, y).

Proof. We will show that the expansion of any contracted vertex that selects e contains x or y but not both. First, let C_v be the expansion of a vertex in one of the Gi, and suppose C_v contains neither x nor y. Let x = u_0, u_1, ..., u_k = y be T(x, y). If e is incident to C_v, then T(x, y) ∩ C_v is nonempty, since one endpoint of e is in C_v and a vertex in T(x, y). Also, T(x, y) ∩ C_v is clearly connected, because otherwise there would be a cycle, and T is a tree. So T(x, y) ∩ C_v = u_i, ..., u_j. Since neither x nor y is in C_v, u_i ≠ x and u_j ≠ y. Thus {u_{i−1}, u_i} and {u_j, u_{j+1}} are both incident to C_v, and one of these is e. However, since there are two edges in T(x, y) incident to C_v, there is an edge lighter than e incident to C_v, so C_v does not select e.

To see that any C_v containing both x and y cannot select e, note that disconnecting any edge incident to C_v leaves C_v intact. If C_v contains both x and y, disconnecting an incident edge leaves T(x, y) connected, so no incident edges of C_v are part of T(x, y).

Therefore, let v be any vertex that selects e over the course of Borůvka's algorithm. We noted above that every T-edge is selected by at least one component. C_v contains exactly one of x or y, and so by Lemma 8, v is part of B(x, y). Since C_v does not contain both x and y, the parent of v is also a node in B(x, y), so the edge from v to its parent, which has weight w(e), is part of B(x, y).


This brings us to our final result:

Theorem 11. If e is the heaviest edge on T(x, y) and f is the heaviest edge on B(x, y), then w(e) = w(f).

Proof. By Lemma 9, there is an edge f′ on T(x, y) with w(f′) = w(f), so w(f) = w(f′) ≤ w(e). By Lemma 10, there is an edge e′ on B(x, y) with w(e′) = w(e), so w(e) = w(e′) ≤ w(f). Therefore w(e) = w(f).

Therefore we can use Komlos’s algorithm for full branching trees instead of generaltrees, which we have just proved to take a linear number of comparisons.

A note The ideas in this section, I thought, nicely illustrated a use of the cycle property for determining MSTs. In addition, the idea of constructing a tree of contracted components, where each vertex is the child of the vertex it contracted into, turns up again in Chazelle's algorithm.

5 A randomized algorithm

Karger, Klein, and Tarjan introduce a randomized algorithm that always returns the same answer for any input but whose running time varies. MSF-random, as we will call it, is expected to run in O(m + n) time for any given graph with m edges and n vertices, although it could conceivably get very unlucky and take up to O(m log n + n²) time.

Since I found that applying big-O notation to a randomized running time is a little bewildering, let's restate that: there is a magic number c such that when MSF-random is run on any graph G a large number of times, the average running time is less than or equal to c · (m + n) units of time. However, there is another magic number d such that MSF-random always finishes in under d · (m log n + n²) units of time.

5.1 Overview

The “F” in the name “MSF-random” comes from the fact that it works for graphs thatare not connected, hence it returns a minimum spanning forest instead of a minimumspanning tree.

The algorithm is roughly sketched below for a graph G having n vertices and medges.

MSF-random:
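The recursion follows a sample-and-filter structure: find the MSF of a random half of the edges, then use it to discard edges that cannot be in the answer. The sketch below is our illustrative simplification, not the published algorithm: it assumes distinct weights, replaces the linear-time F-heavy filter with a brute-force path query, and falls back to Kruskal's algorithm wherever the real algorithm's Borůvka contraction steps would guarantee progress (those steps matter for the O(m + n) bound, not for correctness).

```python
import random

def kruskal(n, edges):
    """Plain Kruskal's algorithm; used here as a base case."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest

def max_on_forest_path(n, forest, u, v):
    """Heaviest weight on the u-v path in the forest, or None if u and v
    lie in different components (brute force)."""
    adj = {i: [] for i in range(n)}
    for w, a, b in forest:
        adj[a].append((b, w))
        adj[b].append((a, w))
    stack, seen = [(u, float("-inf"))], {u}
    while stack:
        x, m = stack.pop()
        if x == v:
            return m
        for y, w in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append((y, max(m, w)))
    return None

def msf_random(n, edges):
    """Sample-and-filter recursion on (weight, u, v) edges with distinct
    weights. Always returns the exact MSF; only the runtime is random."""
    if len(edges) <= 8:
        return kruskal(n, edges)
    h = [e for e in edges if random.random() < 0.5]  # flip a coin per edge
    if len(h) == len(edges):
        return kruskal(n, edges)
    f = msf_random(n, h)  # first recursive call: F = MSF(H)
    # Discard F-heavy edges: by the cycle property they are not in the MSF.
    light = [(w, u, v) for w, u, v in edges
             if (m := max_on_forest_path(n, f, u, v)) is None or w <= m]
    if len(light) == len(edges):
        return kruskal(n, edges)
    return msf_random(n, light)  # second recursive call, on F-light edges
```

Because F-heavy edges can never be minimum spanning forest edges, the output is the same no matter how the coins land.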


An auxiliary procedure Consider the modification to Kruskal’s algorithm, givenin Algorithm 5.

Algorithm 5 Count-F-light

Require: G = (V, E); p
Ensure: H is a subsampled graph of G; F = MSF(H)
 1: Sort E
 2: numFLight ← 0
 3: numF ← 0
 4: H ← (V_H, E_H) ← (∅, ∅)
 5: F ← (V_F, E_F) ← (∅, ∅)
 6: for all e ∈ E do
 7:   X ← coinFlip(p)
 8:   if X is heads then
 9:     Put e in H
10:   end if
11:   if e is F-light then
12:     numFLight ← numFLight + 1
13:     if X is heads then
14:       Put e in F
15:       numF ← numF + 1
16:     end if
17:   end if
18: end for

Claim. After running Algorithm 5, F = MSF (H ) and H is a sampled graph witheach edge being included independently with probability p.

Proof. The second part of the claim follows directly from lines 7 to 10. The first partcomes from the fact that we process the edges in increasing order of weight. If e isincluded in F , then it does not create a cycle with lighter edges (by the definition of F -light), and thus it is safe to include it in F , since any edge added to H afterwardmust necessarily be heavier.

Fact. Suppose we have a coin that comes up heads with probability p. Let Z be a random variable representing the number of times we must flip the coin to achieve n heads. Then E[Z] = n/p.

More formally, Z has the negative binomial distribution parameterized by n and p. The expectation of such a distribution is well-known to be n/p.
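This expectation can be checked numerically by summing the negative binomial pmf directly (a truncated sum; the helper name is ours):

```python
from math import comb

def negbin_mean(n, p, zmax=400):
    """Numerical check of E[Z] = n/p for the negative binomial:
    P(Z = z) = C(z-1, n-1) * p^n * (1-p)^(z-n) for z >= n, where Z is
    the number of flips needed to see n heads. Truncated at zmax; the
    tail is negligible for the small parameters used here."""
    return sum(z * comb(z - 1, n - 1) * p**n * (1 - p)**(z - n)
               for z in range(n, zmax + 1))
```

For example, `negbin_mean(3, 0.5)` is 6 to within rounding, matching n/p = 3/0.5.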


Claim. The variable numFLight is bounded above by a random variable Z havingthe negative binomial distribution with parameters n and p.

Proof. Suppose we flip a coin every time we increment numFLight – but we already do! We flip a coin and store it in the variable X, and numF counts the number of times we get heads when we increment numFLight. So numF is the number of heads we have gotten from our numFLight coin flips. However, numF must be less than n, since the maximum number of edges in the forest is n − 1. Suppose we keep on flipping after count-F-light finishes, until numF plus the number of heads we got afterwards is n. Let Z be the total number of flips we had to make. That is, Z is numFLight plus the number of extra flips we had to make. By construction Z has a negative binomial distribution, and Z is necessarily at least as large as numFLight. So n/p = E[Z] ≥ E[numFLight].

The proof of Theorem 12 follows directly from the previous two claims.

This allows us to expect that the number of edges passed into the second recursivecall is proportional to the number of vertices, not the original number of edges. Sincea Borůvka iteration halves the number of vertices, and we perform two of them atthe start of each call, this is very good news for the running time.

5.2 A tree formulation

MSF-random is a divide-and-conquer algorithm in the sense that it calls itself multipletimes, with the input to each subcall being smaller than the input of the parent call.The divide is not quite clean, though, and only the output of the last call is used in thefinal recursion, making the subcalls more like a sequence of refinements. Nevertheless,like all divide-and-conquer algorithms, it can be represented by a recursion tree withthe original problem at the root. Each node has two children, one for each recursivesubproblem. The first (randomly sampled) we’ll say is the left child, and the secondthe right.

5.2.1 Some facts about vertices and the recursion tree

The Borůvka iterations reduce the number of vertices by a factor of 4, so each subproblem has at most 1/4 the number of vertices of its parent. Therefore, a subproblem at depth d has at most n/4^d vertices. Each subproblem has at most two children; therefore the number of subproblems at depth d is at most 2^d. Using these facts, we see that the total number of vertices in all the subproblems at depth d is at most n/2^d. Summing over all levels, we obtain an upper bound of 2n vertices in all subproblems combined.


5.2.2 Some facts about edges and the recursion tree

Definition. A left-path is a path on the recursion tree consisting of all left edges. Acomplete left-path is a left-path headed by either the root or a right child.

Note that left-paths correspond to a recursion chain of only the first recursive call– that is, finding a minimum spanning forest of a randomly sampled subgraph. Alsonote that different complete left-paths are disjoint, and that every vertex on a treeis a member of a complete left-path. In other words, the complete left-paths form apartition of the tree. Also, every right child heads a complete left-path.

It's pretty trivial to prove that if X_0 is the number of edges at the head of a left-path, and X_i the number of edges at the ith node of the path, then E[X_i] ≤ E[X_0]/2^i, since each edge has a 1/2 chance of being sampled, and we remove many edges at the Borůvka stage.

We sum over all subproblems on the complete left-path and see that the expectation for this number is Σ_{i=0}^{∞} E[X_0]/2^i = 2E[X_0].

Theorem 13. The expected number of edges in all the combined subproblems is 2m + n.

Proof. Suppose I have a subproblem with n vertices and m edges, and let H_L and H_R denote my left and right subproblems. By the fact above, E[m_R] ≤ 2n. Note that at any depth d, there are at most 2^d total subproblems and 2^{d−1} right subproblems. Recall that each subproblem has at most n/4^d vertices. Summing over all depths, we see that all the right subproblems combined have at most n/2 vertices. Therefore, by Theorem 12, the total expected number of edges in all the right subproblems is 2(n/2) = n. The expected number of edges in the complete left-path headed by the root is 2m, so the expected number of edges in the entire recursion tree is 2m + n.

5.3 Runtime analysis

5.3.1 The expected running time

For a problem with n vertices and m edges, the running time T(m, n) breaks down into

1. Two iterations of Borůvka’s: O(m).

(a) Recursive call + finding F -heavy edges: T (mL, nL).


(b) Finding F -heavy edges + recursive call: O(m) + T (mR, nR).

2. Concatenate the edges found in previous steps: O(1).

T(m, n) = T(m_L, n_L) + T(m_R, n_R) + O(m).

The running time depends solely on the number of edges processed in each subproblem:

T(m) = T(m_L) + T(m_R) + O(m)    (2)

By above, the expected total number of edges is 2m + n, which is O(m).

5.3.2 A guaranteed running time

In the worst case, the sampling does nothing, and all the work is done by the Borůvka iterations. This gives us a bound of O(m log n), from a maximum recursion depth of log n and m edges in all the subproblems at one level. Furthermore, a subproblem at depth d contains fewer than (1/2)(n/4^d)² = (1/2) · n²/2^{4d} edges. This gives us at most (1/2) · 2^d · n²/2^{4d} < (1/2) · n²/2^d total edges in a level, and at most n² edges in all subproblems at all levels. This gives us a guarantee that even in the event that MSF-random makes very, very unlucky choices, it is no worse (asymptotically) than a classical algorithm like DJP or Borůvka's.

5.3.3 High-probability proof

The algorithm finishes in O(m) time with probability 1 − exp(−Ω(m)).

First, we deal with the right subproblems. We're going to prove that the number of edges in all the right subproblems is ≤ 3m with high probability. We toss a nickel for every edge that could be F-light. If it's heads, the edge goes into the right subproblem. Since the number of edges in a spanning forest is less than the number of vertices, and the number of vertices in all right subproblems is less than n/2, the total number of F-light edges that come up heads is at most n/2. Then, the probability that there are more than 3m F-light edges is less than the probability that fewer than n/2 heads appear in 3m coin tosses. The authors apply a Chernoff bound and the inequality m ≥ n/2 to get a probability of exp(−Ω(m)) [9].

Now for the left subproblems. If we define m′ to be the total number of edges in all right subproblems, and m∗ to be the total number of edges in left subproblems, this can be thought of as m′ heads appearing in m∗ coin tosses. Therefore P(m∗ > 3m′) is the probability of getting only m′ heads in more than 3m′ coin tosses, and a Chernoff bound shows that P(m∗ > O(m)) decays as exp(−Ω(m)).


6 A deterministic, non-greedy algorithm

This algorithm, published by Chazelle in 2000 [4], held the record for the fastest asymptotic runtime for two years, until Pettie and Ramachandran came up with an algorithm that is by nature upper-bounded by any comparison-based algorithm, including this one. However, since no one has been able to prove a lesser bound for the latter, Chazelle's analysis still yields the lowest asymptotic runtime for the MST problem of which we are aware. In this review I will follow a technical report by Pettie [10] that simplifies Chazelle's analysis. In addition, I will not include most of the details Chazelle describes in [4], and will focus more on the intuition driving the algorithm.

This algorithm, at its heart, consists of three parts:

1. Identifying subproblems

2. Recursing on subproblems

3. Refining the result from Number 2.

The reason Number 3 is needed is that we will use a data structure, the soft heap,that renders the results of the subproblems inexact. While choosing the subproblems,we use a soft heap, which picks good but not perfect subproblems.

6.1 The Ackermann function and its inverse

The main thing to know about the Ackermann function is that it grows extremely quickly. Therefore, its inverse grows extremely slowly. The Ackermann function is defined on a 2D table, as follows:

A(1, j) = 2^j                      (j ≥ 1)
A(i, 1) = A(i − 1, 2)              (i > 1)
A(i, j) = A(i − 1, A(i, j − 1))    (i, j > 1)

The base cases are sometimes given differently; I have followed [10]. To give an idea of how fast the Ackermann function grows, the first few values are given in Table 1. It is estimated that the observable universe contains fewer than 2^515 atoms, which is in turn less than A(2, 4).

There are two flavors of inverse. The first takes only one argument.

α(k) = min{ i | A(i, i) > k }.


Table 1: Values of the Ackermann function

i\j   1    2         3        4          5              6
1     2    4         8        16         32             64
2     4    16        65536    2^65536    2^(2^65536)    2^(2^(2^65536))
3     16   A(2, 16)  ...      ...        ...            ...

The second takes two:

α(m, n) = min{ i | A(i, m/n) > log n }.

Note that α(·, ·) is decreasing in its first argument: as m/n grows, i needn't go as high for A(i, m/n) to top log n. We will mostly use this second form in our analyses.
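The definitions above can be transcribed directly. The sketch below (ours) only ever evaluates A on tiny arguments, since anything larger is astronomically big; `alpha2` takes the floor of m/n, an assumption on our part, since rounding is immaterial at this scale.

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def A(i, j):
    """The Ackermann function with the base cases used above. Values
    explode almost immediately -- only call this with tiny arguments."""
    if i == 1:
        return 2 ** j
    if j == 1:
        return A(i - 1, 2)
    return A(i - 1, A(i, j - 1))

def alpha(k):
    """One-argument inverse: min{ i : A(i, i) > k }. Only evaluate for
    k < 16 = A(2, 2); beyond that the search would touch A(3, 3)."""
    i = 1
    while A(i, i) <= k:
        i += 1
    return i

def alpha2(m, n):
    """Two-argument inverse: min{ i : A(i, m // n) > log2(n) }."""
    i = 1
    while A(i, max(m // n, 1)) <= log2(n):
        i += 1
    return i
```

As in Table 1, A(2, 3) is already 65536, while α stays at 2 or 3 for any input of physical size.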

6.2 The Soft Heap

Recall that heaps support the following operations:

• insert(item, k) puts item in the heap with key k.

• delete(item) takes the item away from the heap.

• deletemin() returns the item with minimum key and removes it from the heap.

• meld(otherHeap) combines two heaps.

The soft heap, an earlier invention of Chazelle's [5], plays a central part in lowering the running time bound. We have seen, as in Kruskal's and Prim's, that insisting on correctness at every step leads to unnecessary overhead. In Kruskal's, sorting the edges was extra work, and in Prim's we incurred overhead from maintaining a sorted heap. The soft heap sacrifices correctness in exchange for speed. At any time it may contain corrupted elements, elements whose keys have been raised from their original values. The soft heap is controlled by a user-defined parameter ε, the error parameter, and guarantees

1. deletemin, delete, and meld take constant amortized time

2. insert takes O(log(1

)) amortized time3. The number of corrupted elements in the heap at any time is at most N , where

N is insertions so far.


4. An additional operation, dismantle, takes O(N) time. This is explained in the next section.

6.2.1 Bad and corrupted edges

Every item in a soft heap has two keys, original and current. The soft heap uses the current key to "bubble up" elements, and the return value of the deletemin operation is based on the current key. However, given any heap element, we can find out whether it is corrupt by comparing the current and original keys.

When we dismantle a soft heap, we will often want to find out which items in it are corrupt. This is why the dismantle operation takes O(N) time – we need to look at all the elements currently in the heap and decide whether each is corrupt.

Note that corruption may only raise the weights of edges, not change them arbitrarily. Although it is possible to tell exactly how much each edge weight was corrupted, we will not need this information for the MST algorithm, only the fact that the soft heap thought the weight was higher than it should be.
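The two-key bookkeeping can be pictured with a toy sketch. This is only the corruption-detection convention described above, not Chazelle's actual data structure:

```python
class Item:
    """An entry as stored in a soft heap: the key given at insert time plus
    the possibly-raised key the heap actually operates on."""
    def __init__(self, name, key):
        self.name = name
        self.original = key   # key supplied by insert(item, key)
        self.current = key    # the heap may later raise this ("corruption")

    def is_corrupt(self):
        return self.current > self.original

def dismantle(items):
    """One O(N) pass over the heap's contents, separating out corrupt items."""
    good = [it for it in items if not it.is_corrupt()]
    bad = [it for it in items if it.is_corrupt()]
    return good, bad

# Simulate what a soft heap might have done internally:
e1, e2, e3 = Item("e1", 5), Item("e2", 7), Item("e3", 2)
e3.current = 6   # the heap raised e3's key; weights are only ever raised
good, bad = dismantle([e1, e2, e3])
```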

6.2.2 Consequences for the MST algorithm

When we pick subproblems, we will use a soft heap to define subsets of the graph on which to recurse. Ideally, we would like to pick perfectly contractible components. However, since the soft heap corrupts edges as it goes, we have to settle for a different sort of contractibility on a corrupted graph.

6.3 Strong contractibility and weak contractibility

Recall that for a contractible subgraph C, MST(G) = MST(C) ∪ MST(G \ C). Let C be a subgraph of a weighted graph G.

6.3.1 Strong contractibility

Definition. C is strongly contractible with respect to a weighted graph G if there exists a vertex v_0 in C such that, if the DJP algorithm starts at v_0, it will construct the MST of C after some number of iterations.

Definition. The maximum weight of a path is the maximum weight among the edges in the path.

Claim. Let C be strongly contractible for a corruption G′ of G, and let M_C be those edges which are both corrupt in G′ and incident to C. Then if u, v, x, y are such


that u, x ∈ C and v, y ∉ C, and neither {u, v} nor {x, y} is in M_C, then all edges on the path between u and x in MST(C) have weight less than max(w(u, v), w(x, y)).

Proof. If T_C is the minimum spanning tree of C, let e be the heaviest edge on the path T_C(u, x). When we run the DJP algorithm from a suitable vertex v_0, we end up with a minimum spanning tree of C. Consider a step of the DJP algorithm while it is constructing the MST of C, and let p = z_0, z_1, ..., z_k be the part of T_C(u, x) already selected by the algorithm so far. If neither endpoint of p is equal to u or x, then there are two edges in T_C(u, x) that have not been selected yet. Since e is heavier than any other edge on T_C(u, x), it is impossible for the algorithm to select e at this step.

Remark. The converse is false. In particular, consider the graph with vertices a, b, c, d, e, f and edges {a, b}, {b, c}, {c, e}, {b, d}, {d, f} with weights 1, 2, 5, 3, 4 respectively. Then the subgraph C made of vertices b, c, d and edges {b, c}, {b, d} is its own MST and cannot be constructed by starting DJP at b, c, or d, since any attempt will run into b first and select {a, b}, which has weight 1. However, it is impossible to find two edges incident to C that are lighter, since the three incident edges have weights 1, 4, 5 and the edges in C have weights 2 and 3.
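The remark's graph is small enough to check mechanically. A quick sketch, with DJP simulated as Prim's lightest-outgoing-edge rule and the weights as given above:

```python
import heapq

# Vertices a..f; C = {b, c, d} with internal edges {b,c} (2) and {b,d} (3).
weights = {("a", "b"): 1, ("b", "c"): 2, ("c", "e"): 5,
           ("b", "d"): 3, ("d", "f"): 4}
adj = {}
for (u, v), w in weights.items():
    adj.setdefault(u, []).append((w, v))
    adj.setdefault(v, []).append((w, u))

def djp_edge_order(start):
    """Edges in the order DJP (Prim) selects them starting from `start`."""
    in_tree, order = {start}, []
    heap = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue
        in_tree.add(v)
        order.append(frozenset((u, v)))
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return order

mst_of_C = {frozenset("bc"), frozenset("bd")}
for s in "bcd":
    first_two = set(djp_edge_order(s)[:2])
    # {a, b} always sneaks in before MST(C) can be completed
    assert frozenset("ab") in first_two and first_two != mst_of_C
```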

6.3.2 Weak contractibility

Weak contractibility guarantees us a composition formula similar to what we get with ordinary contractibility. Suppose G is our original graph, and G′ is the identical graph except that some edge weights have been raised. Recall that G \ C is the graph resulting from contracting G across C, so G \ C − M_C is the contracted graph less the edges in M_C.

Theorem. If C is strongly contractible with respect to G′, then MST(G) ⊆ MST(C) ∪ MST(G \ C − M_C) ∪ M_C.

Note that MST(C) and MST(G \ C − M_C) refer to the MST with the edge weights of G, not G′. The only time G′ is relevant is when we specify that C is strongly contractible with respect to G′.

The following proof is due to Pettie [11].

Proof. We want to show that any edge not in MST(C) ∪ MST(G \ C − M_C) ∪ M_C must also not be in MST(G). The only way an edge fails to be in MST(C) ∪ MST(G \ C − M_C) ∪ M_C is if it is in C but not MST(C), or if it is in (G \ C − M_C) but not in MST(G \ C − M_C).


Case 1: If e ∈ C and e ∉ MST(C), then there exists some cycle in C for which e is heaviest. This cycle also exists in G, so e ∉ MST(G).

Case 2a: If e ∈ (G \ C − M_C) and e ∉ MST(G \ C − M_C), and there exists some cycle in (G \ C − M_C) that does not involve C (we are loosely using C to mean the contracted vertex) for which e is heaviest, then that cycle also exists in G.

Case 2b: Now suppose the only cycles in (G \ C − M_C) for which e is heaviest include C. Let P be such a cycle. Then in the noncontracted graph G there is a cycle P′ consisting of P and edges in MST(C). Let {u, v} and {x, y} be the two edges in P that have exactly one vertex in C. Applying Claim 6.3.1 and the fact that e has maximum weight on P, we get w(e) ≥ max(w(u, v), w(x, y)) ≥ w(f) for all f ∈ C ∩ P′. Thus e is heavier than any edge in P′, which is a cycle in the original graph. So e ∉ MST(G).

If the conclusion of Claim 6.3.1 holds for C, we will say C is weakly contractible.

We will engage in a slight abuse of notation: if C is a collection of subgraphs, MST(C) denotes ∪_{C∈C} MST(C).

6.3.3 Strong contractibility on minors

If G_0 is a graph, and C_0 is a set of components of G_0 that is weakly contractible, then let G_1 = G_0 \ C_0 − M_{C_0}. If C_1 is a set of weakly contractible components of G_1, then MST(G_0) ⊆ MST(C_0) ∪ MST(C_1) ∪ M_{C_0} ∪ M_{C_1}. This follows by induction. Thus the set C = C_0 ∪ C_1 is weakly contractible.

6.4 Overview revisited

Now that we have the vocabulary and machinery, we can give a more detailed overview.

1. If the input graph is small enough to run DJP within a fixed time bound, run DJP.

2. Find a set C of subgraphs that is weakly contractible, and let M_C be the corresponding set of bad edges.

3. For each subgraph x ∈ C, preprocess x to increase its density to m/n. As explained in Section 3.3, this can be done in time O(m + n log D), where D is the desired density.

4. For each subgraph x ∈ C, recurse on x.

5. Preprocess MST(C) ∪ M_C to increase its density to m/n.


6. Recurse on MST(C) ∪ M_C.

Although the subroutine calls to raise the densities may seem worrisome, they turn out not to affect the O(mα(m, n)) running time.

6.5 Motivation for Build-T

Build-T, the subroutine that will give us our set C, is the key to this algorithm. First we establish what we want out of it:

1. Acceptable subproblems. That is, MST(G) ⊆ MST(C) ∪ M_C, the subgraphs in C are edge-disjoint, and C covers all vertices of G.

2. It runs in O(m) time.

3. Subproblems small enough that recursing on them does not overwhelm the running time.

4. Not too many bad edges, so that the final recursion does not overwhelm the running time.

The rest of this section is dedicated to elaborating on the last two points. In the remainder of this section, suppose C is a set of subgraphs such that

MST(G) ⊆ MST(C) ∪ M_C.

Let m_L be the total number of edges passed to all the recursive calls except the last, and let m_R, n_R be the number of edges and vertices passed to the final recursion.

6.5.1 What we already know

The number of vertices in MST(C) ∪ M_C is exactly n, and the number of edges is exactly m_B + n − 1, where m_B is the number of edges in M_C. This follows from the fact that C covers all vertices and is edge-disjoint, so MST(C) is a spanning tree of G. After raising the density of MST(C) ∪ M_C via a Fredman-Tarjan iteration, the number n_R of vertices is at most (m_B/m)·n.

If we do not clean up after the Fredman-Tarjan iteration, then m_R = m_B + n − 1 ≥ m_B + (m_B/m)·n = m_B(1 + n/m) for a big enough graph. Here we are making the (possibly big) assumption that we have managed to corrupt only a fraction of the edges in the graph, so m_B/m < 1 (which gives (m_B/m)·n ≤ n − 1 for a large enough graph). So m_R ≥ (1 + 1/D)·m_B, where D = m/n.


6.5.2 The recursion formula

Let T(m, n) be the maximum running time for any graph with m edges and n vertices, and let t(m, n) = T(m, n)/(cm) for a constant c. Below, let the total overhead, including the time it takes to find subproblems, be O(S(m, n)), and let s(m, n) = S(m, n)/b for a constant b.

Then the recursive formula for the running time of MST-hierarchical can be written as

T(m, n) ≤ Σ_{x∈C} T(m_x, n_x) + T(m_R, n_R) + b·s(m, n)    (3)

= Σ_{x∈C} c·m_x·t(m_x, n_x) + c·m_R·t(m_R, n_R) + b·s(m, n)

≤ Σ_{x∈C} c·m_x·t_1 + c·m_R·t_2 + b·s(m, n)    [see below]

= c·m_L·t_1 + c·m_R·t_2 + b·s(m, n)

= (c·m_L·t_2 − c·m_L·(t_2 − t_1)) + c·m_R·t_2 + b·s(m, n)

= (c·m·t_2 + c·(m_L + m_R − m)·t_2) − c·m_L·(t_2 − t_1) + b·s(m, n)

= c·m·t_2 + c·((m_R + m_L − m)·t_2 − m_L·(t_2 − t_1) + (b/c)·s(m, n))    (4)

In the above, t_1 = max_x t(m_x, n_x) and t_2 = t(m_R, n_R).

To have the entire thing run in O(m·f(m, n)), it suffices to have the following restrictions on t(·, ·) and s(·, ·):

t(m_R, n_R) ≤ f(m, n)    (5)

∀x, t(m_x, n_x) ≤ f(m, n) − 1    (6)

s(m, n) = O(m),    (7)

and the following restriction on m_L and m_R:

(m_R + m_L − m)·a − m_L + (b/c)·m = m_R·a + m_L·(a − 1) + ((b/c) − a)·m ≤ 0,    (8)

with a being f(m, n).

We therefore look for a procedure that will guarantee the last requirement and run in O(m), since the first two requirements follow by induction. To see this, note that if all four of the above hold, substituting t_2 with a and t_1 with a − 1 in (3) yields


an expression equal to or greater than (3). Propagating this replacement down to (4),

c·m·a + c·((m_R + m_L − m)·a − m_L + (b/c)·m)    [from (5), (6), (7)]
≤ c·m·a.    [from (8)]

We could have replaced α(·, ·) with any function f(m, n); as long as we can find subproblems that allow (5) through (8) to be fulfilled, we will have an algorithm that runs in O(m·f(m, n)). What is special about α(m, n) is that, as we will show, if f(m, n) = α(m, n) + 2, then we can fulfill all these requirements.
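Incidentally, the rearrangement leading to (4) is pure algebra, and can be sanity-checked numerically with random values (nothing here is algorithm-specific):

```python
import random

random.seed(0)
for _ in range(1000):
    c, b = random.uniform(0.1, 5), random.uniform(0.1, 5)
    mL, mR, m = (random.uniform(1, 100) for _ in range(3))
    t1, t2, s = (random.uniform(1, 10) for _ in range(3))
    # The bound after replacing each t(m_x, n_x) by t1 and t(m_R, n_R) by t2 ...
    lhs = c * mL * t1 + c * mR * t2 + b * s
    # ... equals the regrouped form (4)
    rhs = c * m * t2 + c * ((mR + mL - m) * t2 - mL * (t2 - t1) + (b / c) * s)
    assert abs(lhs - rhs) < 1e-6
```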

6.6 Build-T

6.6.1 A hierarchy of minors

It has already been noted that if G_0, G_1, ..., G_N is a sequence of contractions, with G_0 = G and G_{i+1} defined recursively as G_i \ C_i − M_{C_i}, where C_i is a set of weakly contractible subgraphs of G_i, then

MST(G) ⊆ MST(C_0 ∪ ... ∪ C_N) ∪ M_{C_0} ∪ ... ∪ M_{C_N}.

It is our job to find the C_i so that the conditions described in the previous section hold.

This formulation leads to a hierarchical representation of the subgraphs in the C_i's. Each vertex v in G_i really represents a subgraph of G_{i−1}, which contains multiple vertices of G_{i−1}. Therefore, we make v the parent of all the vertices in G_{i−1} that it contains, which likewise are subgraphs that themselves contain vertices of G_{i−2}, and so on. Thus we obtain a hierarchy of subgraphs, with a node at height i representing both a vertex of G_i and a subgraph of G_{i−1}, whose children are its component vertices, and whose parent is the subgraph of G_i of which this node is a part. This hierarchy is clearly a tree, since every node has one parent, and there are no links other than parent-child ones. Call this hierarchy T.

6.6.2 Building the tree

It turns out that building T layer by layer, minor by minor, will not be as efficient as building it in postorder [10]. Recall that in a postorder traversal, all children have lower traversal numbers than their parents, and left children have lower traversal numbers than right children.


Therefore the first subgraph we want to define is the one at the bottom of the leftmost path of T, which is a vertex of G_0 (all leaves of T are vertices of G_0) – call it v_0. This is rather trivial, so we "visit" its siblings (by defining them to be subgraphs in C_{−1}), also vertices of G_0, until we run out of siblings and "visit" the parent. Having come at last to the parent, we know exactly which vertices are in it, and so we are able to gather them up and put the entire component in C_0.

Now the parent C has siblings too, so we start again, visiting vertices of G_0 until we have visited enough and are able to define another component C′. When we have determined that we have defined enough subgraphs of G_0 to make a subgraph of G_1, we stop and throw all the previously defined C's – which are subgraphs of G_0 but, more importantly, vertices of G_1 – into a component and put it into C_1. Then we start again at the bottom, with a vertex of G_0.

We are not building T so much as discovering it, and recording the nodes we discover. There are still two unknowns, however. One, how do we know that a node has "run out of children" and may be visited? Two, how do we know which sibling to visit next?

6.6.3 Determining which sibling to visit next

While building a component of G_0 this is an easy question, and the answer is to take the lightest edge coming out of the component. With subgraphs of later minors the same is still true. After finishing a vertex of G_i (a subgraph of G_{i−1}), we need to find an appropriate vertex of G_0 with which to start again. We will do this with soft heaps. Each node of T on the active path – the path from the root G_N to the node just visited – maintains a heap (actually several heaps, as we will see later) that stores the vertices of G_0 to which its known descendant leaves are adjacent.

If v is the vertex of G_i which we have just visited, and C_v is its expansion in G_0, let u be the parent of v in T, i.e. u is the subgraph of G_i of which v will become a part. Then the heap associated with u now tracks every vertex in C_v, keyed by the weight of the lightest edge that leads out of C_v. In addition, all ancestors of u also keep similar heaps. Note that a heap for a node in T only exists once we have visited one of its children. The next vertex of G_0 that we visit will be the minimum element over all these heaps. Let this vertex from G_0 be v_0.

However, we don't always just start a new bottom-level component with v_0 and propagate the built components back to v. We also keep track of the min-link between every pair of components on the active path. The min-link between a node v and its ancestor w is the lightest edge between v and any visited relative for which w is the lowest common ancestor. The min-links keep track of internal edge costs,


while the heaps keep track of external costs. At all times we want to maintain the following invariant:

Invariant 1. The next edge taken is lighter than all the min-links on the active path.

At the time v_0 is selected, the edge leading to it may be heavier than an existing min-link. To preserve the invariant, we contract subgraphs until the edge is indeed lighter than any min-link. That is, if w is the highest ancestor whose min-link is heavier than the edge we selected, then for every partially completed subgraph z between u and w, call z finished and put it in the appropriate C_i, except the direct child of w. All of these z, then, will have no other children. We have to do something special with the direct child w′ of w, since we don't want to trigger the signal that causes the algorithm to think w′ is finished before we get a chance to add v_0 (this will be explained below). The min-link between w′ and w leads out of w′ and into another child w″ of w. Take these two children of w, and create a new child node fuse(w′, w″) whose children are w′ and w″.

At this point we are ready to add v_0 to the newest bottom-level component.

Note that, because of the invariant, the min-links coming out of a higher node in T are always heavier than the min-links coming out of lower nodes.

6.6.4 When a node runs out of children

One reason to decide that a node has no more children, and thus should be visited, was described in the last section: a component's growth is cut short to preserve the invariant. The only other time we stop and decide to finish visiting a node is when the subgraph gets big enough. Specifically, a node that is a subgraph of G_i has no more than A(t, i + 1) children. The parameter t is defined to be

t = min{ t : A(t, ⌈(m/n)^{1/4}⌉) ≥ n }.

Remark. The leftmost child of any node terminated its growth due to the size constraint, because while that child's subtree was being traversed, the parent had no other descendants that were not in the child. This means that any non-terminal node has at least one child of size A(t, i + 1), where G_i is the minor to which the node belongs.

Remark (2). The previous remark implies that the total number of vertices in G_i that did not end up in one-element subgraphs (resulting from premature termination during expansion) is no more than 2n/A(t, i). Pettie proves this in [10].


6.6.5 Data structures and corruption

Each component on the active path maintains a list of soft heaps. It should be clear that there is only one component per minor under construction at a time. Let X_i denote the active component for G_i. (Note: Chazelle and Pettie number the X_i in the opposite direction, with the component of G_0 being X_k and the sole node at the root of T being X_0.)

Recall that the number of corrupt items in a soft heap is at most εN, where N is the total number of inserts. As Pettie and Chazelle both point out, once we delete K items from the heap, the heap is free to corrupt another K elements without violating its εN corruption constraint. To alleviate the corruption that would be caused by continuously deleting and re-inserting elements into several heaps, we instead maintain many different heaps.

X_i maintains a heap H(i) and additional heaps H_j(i) for all j > i, as well as a special heap H_∞(i). An edge is put into H_j(i) if the endpoint not in X_i is also incident to X_j via another edge, and not incident to any X_l for i < l < j. If an edge is in H_∞(i), then it is not incident to any ancestor X_j. An edge is put into H(i) if its other endpoint is already accounted for in one of the H_j(i)'s.

After we grab a new vertex v from G_0, we insert all its incident border edges into the appropriate heap. In addition, adding v to the current component changes some edges from external to internal; we delete those from their respective heaps.

When we finish visiting a node and its descendants, we put all edges in the heaps maintained by X_i into the appropriate X_j heap and discard the corrupt edges in X_j. If an edge is eligible for H(i + 1) then it is inserted there; otherwise redundant edges are threshed out and H_j(i) is melded with H_j(i + 1). There are further details involving finding the minimum edge among redundant edges; the reader is invited to look at [4] and [10].

In addition, X_i maintains a list of min-links for all j < i. Chazelle notes that, every time we finish visiting a node or grab a new one, the min-links can be updated in time quadratic in the length of the active path.

6.6.6 Error rate and running time

Setting ε = 1/8 gives us a total of at most m/2 + d³n corrupt border edges, since on average each edge is inserted and re-inserted into a heap at most four times. The total cost of the heap operations – the inserts, deletes, melds, and min-link comparisons – is O(m log(1/ε) + d²n), and maintaining the min-links contributes O(m) time [10].


1. It’s simple(r than the last one).

2. It’s theoretically interesting because it shows that a minimum spanning treecan be found in time proportional to the least number of comparisons needed,

on a pointer machine.

We will refer to this algorithm as MST-decision-tree.

7.1 Decision trees and optimality

We’ve all worked with decision trees at some level. A decision tree can chart thecourse of an algorithm, with each internal node representing a possible branchingpoint, and each leaf containing a possible output. In the case of sorting a list, forexample, each edge weight comparison is a node with two children, one for the ≤result, and one for the > result, and at each leaf a string which is in sorted order if

the decision tree is correct.In general, pointer-machine MST algorithms have binary comparison as their

basic action. Any instance of a deterministic algorithm can be distilled into a decisiontree with the internal nodes representing edge weight comparisons, and two childrenper node.

Let’s take a look at Kruskal’s algorithm. Every time two weights are comparedduring the initial sort there is a node and a two possible child paths. If we knowwhich edges are present (and therefore know beforehand which edges will createcycles if they are added), then the sort order fully determines the MST. Then, wecan say that Kruskal’s algorithm really represents a class of decision trees, or a way

to generate a decision tree for an input graph topology.The height of a decision tree is the maximum length of a path in the tree. Thatis, on a decision tree for a particular unweighted graph, the height is the numberof comparisons we make in the worst-case permuation of edge weights. We shallsay a decision tree is optimal if it is correct and there is no correct decision tree of lesser height. Let T ∗(U ) denote the optimal decision tree height for an unweightedgraph U . Kruskal’s does not always generate an optimal decision tree. For example,given a connected graph of n vertices and n − 1 edges, Kruskal’s makes at least(n−1) log(n−1) comparisons during the initial sort, when it could have just returnedthe set of all edges without doing any work!
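To make this concrete, here is a hand-built optimal decision tree for one tiny graph (my own example, not from [11]): the triangle on vertices a, b, c. With distinct weights, the MST is everything except the heaviest edge, and a tree of height 2 suffices, whereas sorting the three edges first can cost three comparisons.

```python
def triangle_mst(w_ab, w_bc, w_ca):
    """A decision tree of height 2 for the 3-cycle on a, b, c: each branch
    is one edge-weight comparison, each leaf is the MST, i.e. all edges
    except the heaviest (weights assumed distinct)."""
    if w_ab <= w_bc:
        if w_bc <= w_ca:
            return {"ab", "bc"}   # ca is heaviest
        return {"ab", "ca"}       # bc is heaviest
    if w_ab <= w_ca:
        return {"ab", "bc"}       # ca is heaviest
    return {"bc", "ca"}           # ab is heaviest
```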

Call the class of a graph G the set of all graphs with the same number of edges and vertices, denoted by G_{m,n}. For reference, there are ( (n choose 2) choose m ) such graphs in G_{m,n}.

We are interested in all the decision trees generated by MST-decision-tree for any particular class. Define T*(m, n) to be max{ T*(U) : U ∈ G_{m,n} }. That is, if some


hypothetical MST algorithm makes the optimal number of comparisons for each graph, T*(m, n) is the worst-case number of comparisons possible for a graph with m edges and n vertices. The big result of [11] and this section is that there is an algorithm, MST-decision-tree, that runs in O(T*(m, n)) time for any graph in G_{m,n}.

The hypothetical algorithm above makes the optimal number of comparisons for any graph G. However, this should not be taken to mean that Pettie and Ramachandran's algorithm does the same. The promise is that the number of comparisons and the amount of time taken for a graph with m edges and n vertices is under T*(m, n), which depends only on the class, not the individual graph. For example, one graph in G_{100,100} is the cycle on 100 vertices, in which case 100 comparisons are needed to find the edge to exclude. On the other hand, if G is a path on 100 vertices plus an extra edge between the last vertex on the path and the third-to-last, then the longest cycle is three edges long and only three comparisons are needed. MST-decision-tree guarantees the same time bound for both.

7.1.1 Breaking up the decision tree

Lemma 14. Suppose G is a graph with m edges and n vertices, and let C be an edge-disjoint collection of subgraphs. Then

Σ_{C∈C} T*(C) ≤ T*(m, n).

Sketch of proof. The main idea is that by taking the union of all the C's we create a graph H that 1) has at most m edges and n vertices, and 2) has MST equal to the union of all the MST(C)'s. T*(H) is then clearly at most T*(m, n). The second main idea is that, by stacking the optimal decision trees for the C's, we can create an optimal decision tree for H. For details, see [11].

This will allow us to recurse on strongly contractible subgraphs without messing up the time bound.

7.2 DenseCase

Pettie and Ramachandran note that several previous superlinear algorithms are guaranteed to run in linear time if the graphs are kept sufficiently dense.

DJP with a Fibonacci heap, for example, runs in O(m + n log n) time, so by ensuring that k(m/n) ≥ log n for some fixed k, we also make sure that n log n = O(m), so the entire thing runs in O(m). For the MST-hierarchical procedure described in the previous section, log n < A(k, m/n) implies α(m, n) < k, bringing the O(mα(m, n)) bound down to O(km) = O(m) for fixed k. For this algorithm, Pettie


and Ramachandran single out a relatively simple algorithm with an easy density requirement, due to Fredman and Tarjan [7] and introduced in the same paper that debuted the Fibonacci heap (also described in 3.3). It runs in time O(mβ(m, n)), where β(m, n) = min{ i : log^{(i)} n ≤ m/n }, so we only need m/n ≥ log^{(k)} n for β(m, n) ≤ k. We'll call the Fredman-Tarjan algorithm DenseCase. DenseCase may also operate on graphs that have self-loops and multiple edges without affecting the running time analysis.
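β is easy to evaluate directly. A sketch, using base-2 logarithms since the text leaves the base unspecified:

```python
import math

def beta(m, n):
    """beta(m, n) = min{ i : log^(i) n <= m/n }, where log^(i) denotes the
    logarithm iterated i times."""
    x, i = float(n), 0
    while x > m / n:      # keep applying log until we drop to m/n
        x = math.log2(x)
        i += 1
    return i
```

For instance, β(m, n) is already 1 whenever m/n ≥ log n, matching the requirement m/n ≥ log^{(k)} n for β(m, n) ≤ k.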

However, when speaking of the asymptotic runtime, we must account for graphs that do not meet these requirements. Therefore we'll only run DenseCase after enough processing to guarantee the density requirement.

7.3 Building and storing decision trees

7.3.1 Keep the parameters small

Pettie and Ramachandran calculate a set of optimal decision trees for all graphs on r vertices. For any number r, the time needed to build all the decision trees on r vertices is a little horrendous. Pettie and Ramachandran go over this calculation, but in short it is upper-bounded by 2^{2^{4r²}}. This number was obtained by hypothesizing a brute-force calculation – building all possible decision trees for all possible graphs on r vertices, testing them for correctness by trying out every permutation of edge weights, and eventually taking the shortest correct tree. However, they also note that if r < log^{(3)} n, then the entire calculation runs in O(n)!

7.3.2 Emulating table lookups

This subsection explains how, given k subgraphs, each of which has r or fewer vertices, we can find the corresponding decision trees in O(kr² + n) time, for r < log^{(3)} n. These methods are unique to the pointer machine model, and are only relevant because we can't do table lookups – the important things to take away are the final running time of this retrieval process and the fact that we can achieve it on a pointer machine.

We've built 2^{r²} decision trees, and now we need to be able to retrieve them when they are needed. The obvious strategy is to store them in a table and simply access the table entry when we need to. However, the pointer machine model disallows such a method, or any method requiring the computation of a machine address.

Theorists running pointer machines have found a way to emulate table lookups by sorting. Sorting takes longer than a table lookup, but under certain circumstances it can run fast enough.


The intuition is this: if I have N things that are orderable, and I have another thing that is identical to exactly one of my N things, then if I sort these N + 1 things, my extra thing will show up next to the original thing it matches. Thus the time it takes to find the original thing is the time it takes to sort the collection of N + 1 things. Clearly if I only had one thing to look up, I wouldn't bother with the sort; I would just scan the original list of N things. But if I have k extra things, I can play the same trick, and scan through the sorted sequence once to find the matches to all my query objects. So the time it takes to find matches for k extra things is the time to sort a total of N + k things.

Buchsbaum et al. [3] encode a graph on r vertices as a string of r² symbols, basically by listing the edges present and padding short strings with nulls.

Then we throw our 2^{r²} original graphs and k query graphs together and perform a bucket sort, returning our items in lexicographic order (there is a natural ordering on the vertex identifiers, and the encodings are basically strings of vertex identifiers).

How long does this take? The bucket sort performs r² passes, one for each symbol in an encoding. In each pass we need to put each of our elements into a bucket, and we have 2^{r²} + k items. The total time taken is O(r²·2^{r²} + r²k) = O(n + r²k).

As a final note, we can implement bucket sort with linked lists instead of arrays,ensuring that we do not violate the rules of the pointer machine. See [3] for details.
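Here is a sketch of that sort-based lookup, with encodings as fixed-length tuples of small integers and Python lists standing in for the linked-list buckets of [3] (the names are mine):

```python
def bucket_match(originals, queries, num_symbols):
    """Match each query encoding to its precomputed payload using stable
    bucket (LSD radix) passes.  `originals` is a list of (encoding, payload)
    pairs with distinct encodings; each query equals exactly one original
    encoding.  Encodings are equal-length tuples of ints in
    range(num_symbols)."""
    length = len(originals[0][0])

    def stable_pass(items, key, nbuckets):
        buckets = [[] for _ in range(nbuckets)]
        for it in items:
            buckets[key(it)].append(it)
        return [it for b in buckets for it in b]

    # Tag originals 0 and queries 1, so after the stable sort each original
    # comes immediately before the queries sharing its encoding.
    items = [(enc, 0, payload) for enc, payload in originals]
    items += [(enc, 1, i) for i, enc in enumerate(queries)]
    items = stable_pass(items, lambda it: it[1], 2)
    for pos in range(length - 1, -1, -1):            # one pass per symbol
        items = stable_pass(items, lambda it, p=pos: it[0][p], num_symbols)

    # Single scan of the sorted sequence: remember the last original seen.
    result, payload = [None] * len(queries), None
    for enc, tag, val in items:
        if tag == 0:
            payload = val
        else:
            result[val] = payload
    return result
```

With 2^{r²} originals and k queries of length r² each, the passes give exactly the O(r²·2^{r²} + r²k) bound above.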

7.4 Partitioning

We have three components that seem relevant: DenseCase, the optimal decision trees, and the strong contraction rule. As the last step, we will show how to find an edge-disjoint collection of subgraphs D such that:

1. For some corruption G′ of G, all D ∈ D are strongly contractible with respect to G′.

2. Every vertex in G falls in at least one D ∈ D.

3. Let D′ be the collection of subgraphs obtained by merging any subgraphs of D that share a vertex; that is, D′ is the set of connected components of ∪_{D∈D} D. Then every element of D′ has at least log^{(3)} n vertices.

4. The process of finding D takes O(m) time.


7.4.1 Relevance

We can use DenseCase. Suppose C is a collection of subgraphs such that every C ∈ C has at least log^{(3)} n vertices and C partitions the vertices of G. First, contracting G across C without removing redundant edges yields a graph with m edges and n′ < n/log^{(3)} n vertices. Then

n′ < n/log^{(3)} n  ⟹  m/n′ > (m/n)·log^{(3)} n  ⟹  m/n′ > log^{(3)} n.

Thus we can run DenseCase on the contracted graph, and it will finish in O(m).

(Even if we do remove duplicate edges, running DenseCase on the cleaned-up graphwill be strictly faster than running it on the more complex graph. The stipulationthat we do not clean up the graph prior to passing it in to DenseCase is purely tomake the analysis simpler.)

We can use decision trees. Now if D is an edge-disjoint collection of subgraphs such that every D ∈ D has at most log^{(3)} n vertices, then we have a precomputed optimal decision tree for every subgraph in D. So we can find ∪_{D∈D} MST(D) in Σ_{D∈D} T*(D) ≤ T*(m, n) time, by Lemma 14.

Furthermore, if D1, D2, . . . , Dj is any subset of D, then the MST of their union is MST(D1) ∪ MST(D2) ∪ . . . ∪ MST(Dj), since they are edge-disjoint. So when we create D′ by merging subgraphs that share vertices, we don't need to do any extra work to find the MSTs of the subgraphs in D′.

Now if we add the requirement that D covers every vertex, then by combining any components of D that share vertices, we can obtain a set of subgraphs that partitions the vertices, as required above.

7.4.2 Finding partitions

We perform one Fredman-Tarjan iteration, described in Section 3.3, with two modifications: we use a soft heap instead of a Fibonacci heap, and instead of stopping the growth of a component when the heap gets too large, we stop growth when the component size reaches r = log^(3)(n). In addition, after we finish growing a component, we store the set of vertices in that component and put that set in D. The use of a soft heap entails corruption; after finishing a component, we discard all border edges that have been corrupted. Another consequence of using a soft heap is that there is no decrease-key operation. Instead, we just insert all the edges we find and trust that the heap will return the "minimum"-weight edge.

Since a Fredman-Tarjan iteration doesn't stop until all vertices are marked, i.e. put into a component, D covers all vertices. Components stop growing when or before they reach r vertices, so every member of D has r or fewer vertices, making it eligible to have its optimal decision tree applied. Finally, every component in D stopped growing either when it reached r vertices, or when it collided with another component. As in the Fredman-Tarjan iteration, the first component of a set of components linked by shared vertices must have reached its mature size, so that entire set of components must collectively have r or more vertices.

The use of a soft heap ensures that the procedure runs in O(m) time, and also that it generates a corruption G̃ of G. The corrupted graph G̃ has at most 2εm corrupt edges, where ε is the soft heap's error parameter, since every edge is inserted at most twice.
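A highly simplified sketch of this partitioning procedure follows (my own illustration: an ordinary binary heap stands in for the soft heap, so corruption and the discarding of corrupted border edges are not modeled):

```python
import heapq

def partition(adj, r):
    """adj: {v: [(weight, u), ...]} adjacency lists of an undirected graph.
    Returns a list of vertex sets covering every vertex; each set stops
    growing at size r or upon touching an already-marked vertex."""
    marked = set()
    collection = []
    for start in adj:
        if start in marked:
            continue
        comp = {start}
        heap = list(adj[start])
        heapq.heapify(heap)
        while len(comp) < r and heap:
            w, u = heapq.heappop(heap)
            if u in comp:
                continue          # stale edge back into the component
            comp.add(u)
            if u in marked:
                break             # collided with an earlier component
            for edge in adj[u]:
                heapq.heappush(heap, edge)   # no decrease-key: just insert
        marked |= comp
        collection.append(comp)
    return collection
```

On a weighted five-vertex path with r = 3, this grows one full-size component and two smaller ones that overlap their predecessors at a shared vertex, mirroring the "collision" behavior described above.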

7.5 Putting things together

We are now ready to put everything together. The algorithm is as follows:

1. Precompute the decision trees for all graphs with fewer than log^(3) n vertices. Store the result in the variable dectree.

2. Run Partition. Store the corrupted graph in G̃ and the collection of subgraphs in D.

3. Use the sorting trick to retrieve the decision trees of the graphs in D.

4. Apply the decision trees to get the MST of each subgraph. The result is ∪_{D∈D} MST(D).

5. Combine subgraphs that share vertices to get a new collection of subgraphs D′.

6. Contract G̃ across D′ to get G̃ \ D′. Remove the bad edges M_C to get G̃ \ D′ − M_C.

7. Run DenseCase(G̃ \ D′ − M_C) to get MST(G̃ \ D′ − M_C).

8. Perform two Borůvka iterations.

9. Recurse: MST-decision-tree(∪_{D∈D} MST(D) ∪ MST(G̃ \ D′ − M_C) ∪ M_C).


7.6 Time complexity

The time taken for each step is as follows:

1. Precomputing decision trees – O(n).

2. Partitioning graph – O(m + n). Contracting graph – O(m + n).

3. Sorting – O(m + n).

4. Applying decision trees – O(T*(m, n)).

5. Finding connected components of ∪_{D∈D} D – O(m + n).

6. Contracting across D′ – O(m + n).

7. DenseCase – O(m).

8. Borůvka iterations – O(m + n).

9. Recursion – O(T*(m/2, n/4)).
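Collecting these costs, the running time obeys a recurrence of roughly the following shape (a sketch only; the constants and the treatment of T* under recursion are handled carefully in [11]):

```latex
T(m, n) \;\le\; T(m/2,\, n/4) \;+\; c\,\bigl(T^*(m, n) + m + n\bigr)
```

Since T*(m, n) = Ω(m) and the subproblem sizes shrink geometrically, the top level dominates, and under the assumption that T* is suitably well-behaved the recurrence solves to T(m, n) = O(T*(m, n)).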

References

[1] Otokar Borůvka. Wikipedia.

[2] A.M. Ben-Amram. What is a pointer machine? ACM SIGACT News, 26(2):88–95, 1995.

[3] Adam L. Buchsbaum, Haim Kaplan, Anne Rogers, and Jeffery R. Westbrook. Linear-time pointer-machine algorithms for least common ancestors, MST verification, and dominators. In STOC '98: Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 279–288, New York, NY, USA, 1998. ACM.

[4] Bernard Chazelle. A minimum spanning tree algorithm with inverse-Ackermann type complexity. J. ACM, 47(6):1028–1047, 2000.

[5] Bernard Chazelle. The soft heap: an approximate priority queue with optimal error rate. J. ACM, 47(6):1012–1027, 2000.

[6] Jason Eisner. State-of-the-art algorithms for minimum spanning trees – a tutorial discussion. Master's thesis, University of Pennsylvania, 1997.


[7] Michael L. Fredman and Robert Endre Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM, 34(3):596–615, 1987.

[8] Michael L. Fredman and Dan E. Willard. Trans-dichotomous algorithms for minimum spanning trees and shortest paths. Journal of Computer and System Sciences, 48(3):533–551, 1994.

[9] David R. Karger, Philip N. Klein, and Robert E. Tarjan. A randomized linear-time algorithm to find minimum spanning trees. J. ACM, 42(2):321–328, 1995.

[10] Seth Pettie. Finding minimum spanning trees in O(m α(m, n)) time. Technical report, The University of Texas at Austin, 1999.

[11] Seth Pettie and Vijaya Ramachandran. An optimal minimum spanning tree algorithm. Journal of the ACM, 49(1):16–34, 2002.
