
Algorithms and Data Structures

Spring 2019

Alexis Maciel
Department of Computer Science

Clarkson University

Copyright © 2019 Alexis Maciel


Contents

1 Analysis of Algorithms
  1.1 Introduction
  1.2 Measuring Exact Running Times
  1.3 Analysis
  1.4 Asymptotic Analysis
  1.5 Other Asymptotic Relations
  1.6 Some Common Asymptotic Running Times
  1.7 Basic Strategies
  1.8 Analyzing Summations
  1.9 Worst-Case and Average-Case Analysis
  1.10 The Binary Search Algorithm

2 Recursion
  2.1 The Technique
  2.2 When to Use Recursion
  2.3 Tail Recursion
  2.4 Analysis of Recursive Algorithms

3 Sorting
  3.1 Selection Sort
  3.2 Insertion Sort
  3.3 Mergesort
  3.4 Quicksort
  3.5 Analysis of Quicksort
  3.6 Partitioning Algorithm
  3.7 A Selection Algorithm
  3.8 A Lower Bound for Comparison-Based Sorting
  3.9 Sorting in Linear Time

4 Heaps
  4.1 Priority Queues
  4.2 The Heap Data Structure
  4.3 Heap Operations
  4.4 Heapsort
  4.5 Building a Heap in Linear Time

5 Hash Tables
  5.1 Maps
  5.2 Direct-Address Tables
  5.3 The Hash Table Data Structure
  5.4 Analysis

6 Binary Search Trees
  6.1 The Data Structure
  6.2 Basic Operations
  6.3 Insertion and Deletion

7 Red-Black Trees
  7.1 The Data Structure
  7.2 Basic Operations
  7.3 Insertion
  7.4 Deletion

Index

Chapter 1

Analysis of Algorithms

In this chapter, we will learn how to analyze algorithms in order to evaluate their efficiency. We will also discuss the relative advantages and disadvantages of analysis compared to measuring exact running times through testing.

1.1 Introduction

In general, an algorithm is efficient if it uses a small amount of computational resources. The two resources that are most often considered are running time and memory space. An example of another resource is randomness.¹

In this chapter, we will learn how to evaluate the efficiency of algorithms. We will focus on running time, but the main concepts and techniques we will learn also apply to other resources. We will learn that the efficiency of algorithms can be evaluated by analyzing them. The analysis can be done from pseudocode, which allows us to choose efficient algorithms without having to implement the inefficient ones.

¹ Algorithms that use randomness are usually studied in a course such as CS447 Computer Algorithms.

Note that algorithm analysis is also useful for the analysis of data structures, if only because data structure operations are algorithms.

Study Questions

1.1.1. What does it mean for an algorithm to be efficient?

1.1.2. What two computational resources are most often considered?

1.2 Measuring Exact Running Times

When choosing or designing an algorithm for a particular problem, there are two questions that can be asked: Is the algorithm fast enough? Is it as fast as possible?

The first question is perhaps the more pragmatic. To be able to answer that question, however, we need to know exactly what is meant by fast enough. One possibility would be precise time targets such as 5 ms. Now, the running time of an algorithm depends on several factors, including what data it is used on, what computer it runs on and exactly how it is coded. (The input data could be arguments, input files or data entered by the user.) If all that information is available, then tests can be run to accurately determine if the algorithm is fast enough.

But very often, there are no precise time targets to meet. In that case, the safest approach is to choose the fastest algorithm among the available alternatives. So how can we determine which of several possible algorithms is fastest?

              n = 10     n = 10³    n = 10⁶
   log n µs   3 µs       10 µs      20 µs
   n µs       10 µs      1 ms       1 s
   n² µs      100 µs     1 s        12 days

Table 1.1: Running times of three algorithms

An obvious way is to implement each of the algorithms, run them and measure their running times. The choice of what computer to use probably doesn't matter much since if an algorithm is significantly faster than another on one computer, the same is probably true on most if not all computers.

A more delicate issue is what inputs to use for the tests. Very often, we need an algorithm that will run well on a wide variety of inputs. So we could run tests on various inputs and compute the average running time of each algorithm. But the running time of an algorithm can vary greatly, especially as a function of the size of the input.

For example, suppose that three algorithms have running times log n µs, n µs and n² µs, where n is the size of the input.² Table 1.1 shows these running times for various input sizes. When the input size is only 10, the difference between the running times of these three algorithms is not that large. But at n = 10³, the difference is significant, and at n = 10⁶, it is huge. Therefore, when comparing algorithms by measuring their running times, it is important to use a wide range of input sizes.

² In these notes, as is customary in computer science, when the base of a log is not specified, it is assumed to be 2.
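As a quick check on the largest entry of Table 1.1 (this arithmetic is not in the original notes, but it is easy to verify): for n = 10⁶, the quadratic algorithm needs

    n² µs = (10⁶)² µs = 10¹² µs = 10⁶ s ≈ 11.6 days ≈ 12 days

while the logarithmic algorithm needs only log₂(10⁶) ≈ 20 µs, which is the value shown in the table.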

So we can determine which of several algorithms will be the fastest as follows: implement the algorithms, run them on a wide variety of inputs, and measure the running times. Of course, for the comparisons to be valid, the algorithms must be coded in the same language, run on the same computer and tested on similar inputs.

This approach has several significant disadvantages. First, it requires that all the algorithms be implemented, even those that will end up not being used. Second, writing test drivers and running tests takes time, especially since we must test on a good number of inputs of each size to make sure we have a representative sample. Third, because all the algorithms being compared must be implemented in the same language and tested on the same computer and on similar inputs, earlier tests done on different computers, with different inputs, or using different programming languages often need to be repeated.

In the rest of this chapter, we will learn that it is possible to evaluate the running time of an algorithm in a way that addresses these problems.

Study Questions

1.2.1. When comparing the efficiency of algorithms, why is it usually important to compare running times over a wide range of input sizes?

1.2.2. What are three significant weaknesses of comparing algorithms by measuring exact running times?

1.3 Analysis

Our goal is to find a way to assess the running time of an algorithm without having to implement and test it. We also want this assessment to be valid for all implementations of the algorithm and for all the computers on which the algorithm may run. And, of course, to be useful, this assessment should allow us to compare the running time of various algorithms.

    for i = 0 to n−1
        print a[i]

Figure 1.1: Printing the contents of an array

Let's consider an example. Figure 1.1 shows pseudocode for an algorithm that prints the contents of an array. The running time of this algorithm can be determined as follows. Before the first iteration of the loop, i is initialized and its value is compared to n−1. At every iteration of the loop, an array element is accessed, then printed, i is incremented and then again compared to n. The loop is executed n times. Therefore, the running time of the algorithm is

    t(n) = c_assign + c_comp + (c_index + c_print + c_incr + c_comp) n

where the c constants are the running times of the various basic operations performed by the algorithm. For example, c_assign is the time it takes to assign a value to an integer variable.

We can simplify this expression by letting a = c_index + c_print + c_incr + c_comp and b = c_assign + c_comp. The running time of the algorithm can then be written as

t(n) = an+ b

If we knew the exact values of the constants a and b, this expression would allow us to determine the exact running time of the algorithm on inputs of any size. But the values of these constants depend on exactly how the algorithm is implemented and on which computer the algorithm will run. Recall that we want to assess the running time of an algorithm without having to implement it. We also want this assessment to be valid for all computers. Therefore, we will not determine the values of the constants and instead focus on the “general form” of the running time as a function of n.

In our example, the running time of the printing algorithm is a linear function of n. Is that useful information? Knowing that the running time is a linear function doesn't allow us to determine the exact running time of the algorithm for any input size. But suppose that another algorithm has a running time that's a quadratic function of n, for example. Then we know that when n is large enough, the printing algorithm runs faster, much faster, than this other algorithm. This basic fact about linear and quadratic functions is apparent in the numbers that were given in Table 1.1. Therefore, it is useful to know that the running time of the printing algorithm is a linear function of n.

So analyzing an algorithm to determine the general form of its running time is a useful alternative to the measurement of exact running times through testing. It is useful because it can be used to determine that an algorithm will be faster than another one on every input that is large enough.

Analysis has three main advantages over measuring exact running times through testing. First, analysis can be carried out from pseudocode, without having to implement the algorithms. Second, analysis does not require writing test drivers or performing possibly time-consuming tests. Third, each algorithm needs to be analyzed only once because the results of the analysis are valid for every (reasonable) implementation of the algorithm and every computer and data the algorithm may run on.

On the other hand, analysis has three main disadvantages over measuring exact running times. First, it is not as precise. For example, it does not allow us to distinguish between two linear-time algorithms or to determine if an algorithm meets specific time targets. Second, analysis is valid only for large enough inputs, not for small ones. Third, analysis may require difficult mathematics, although this is usually not the case with reasonably simple algorithms.

In general, analysis is a convenient and reliable way of quickly identifying large differences in running times. When more accuracy is needed, or when the analysis is too difficult, which can happen, we must then resort to measuring exact running times through testing.

In the following sections, we will learn more about the analysis of algorithms. But first, we end this section with a couple of general notes about analysis.

First, when analyzing the running time of the array printing algorithm, we gave separate names to the running times of the various basic operations. This is not really necessary. For example, we could simply consider that these constants are all equal to 1. This is acceptable because it doesn't change the general form of the function t(n); it only changes the values of the constant factors, which are unknown anyway. For example, for the array printing algorithm, we would have still obtained a linear function. (Note that considering that the running times of the basic operations are all 1 means that t(n) equals the number of operations performed by the algorithm.)

Second, when analyzing the running time of the array printing algorithm, we said that the running times of basic operations such as integer operations and array indexing are constant, meaning that they are independent of the size of the array. It turns out that this isn't exactly true. For example, the time it takes to assign a value to an integer variable depends on the number of bits used to store that integer. And the number of bits required to store an integer x is approximately log |x|.³

³ At Clarkson, the representation of integer values in a computer's memory, as well as some aspects of the implementation of the basic operations on those integers, are covered in a course such as CS241 Computer Organization.

Now, there are two types of integers in the array printing algorithm. On one hand, there are the integers i and n, whose values are at most n. On the other hand, there are the array elements, whose values could in principle be much larger than that. In general, we consider that integers are small if their values are at most polynomial in n, that is, at most n^c for some constant c, where n is the size of the input or the parameter of interest. This implies that operations on small integers can be performed in time approximately c log n, where c is a constant. If we assume that the arrays contain only small numbers, which is a common assumption, then the running time of the array printing algorithm is really of the form

    t(n) = (an + b) c log n

Since this extra factor of log n usually occurs in every algorithm that solves the same problem, it is common practice to omit it. In other words, it is common to do as we did earlier and pretend that all basic operations can be executed in constant time. But note that this can only be done for basic operations performed on small numbers. For example, if the arrays could contain large numbers, we would need to know how large and then take this into account in our analysis.

Study Questions

1.3.1. As described in this section, what does analysis seek to determine?

1.3.2. What are three advantages and three disadvantages of analysis over the measurement of exact running times through testing?

1.4 Asymptotic Analysis

In the previous section, we saw that the general form of the running time of an algorithm, when expressed as a function of a parameter such as its input size, is a useful measure of the efficiency of the algorithm. For example, if we determine that an algorithm has a linear running time, then we know that it will run faster than any quadratic-time algorithm on every input that is large enough.

But what should we make of a running time of the form t(n) = an + b log n + c? How does that compare to linear and quadratic running times, for example?

The key point to remember is that analysis allows us to compare the running times of algorithms when the input is large enough. When n is large enough, the terms b log n and c are insignificant compared to an. In other words, the dominant term an is the one that will essentially determine the running time for large enough values of n. This means that when n is large enough, an + b log n + c will behave essentially like the linear function an.

We can make this more precise by taking the limit of the ratio of t(n) and its dominant term an:

    lim_{n→∞} t(n)/(an) = lim_{n→∞} an/(an) + lim_{n→∞} (b log n)/(an) + lim_{n→∞} c/(an) = 1 + 0 + 0 = 1

Once again, this says that when n is large enough, t(n) grows essentially like an. This is useful information because, for example, it tells us that when n is large enough, t(n) will be much smaller than any quadratic running time.

Therefore, to compare t(n) to other running times, it is useful to simplify t(n) to an. In fact, since we don't know the value of a, we can simplify t(n) even further to just n. Note that

    lim_{n→∞} t(n)/n = a

and that a > 0 since running times are positive.

In general, to make it easier to compare a running time t(n) to other running times, we find a simple function f(n) such that

    lim_{n→∞} t(n)/f(n) = c

for some constant c > 0.

A problem occurs when the above limit does not exist. Consider, for example, the possibility that the running time of an algorithm is the following function:

    t(n) = n    if n is odd
           2n   if n is even

This running time seems similar to n, but the limit lim_{n→∞} t(n)/n does not exist. Now, it is still true that t(n) grows like n. The key is the following pair of inequalities:

    n ≤ t(n) ≤ 2n

In other words, t(n) is sandwiched between two constant factors of n. This forces t(n) to grow like n. In particular, when n is large enough, t(n) will be much smaller than any quadratic running time because t(n) is no larger than the linear function 2n.

This idea of a running time being sandwiched between two constant factors of a simpler function is therefore an important concept. It can be defined precisely as follows.

Definition 1.1 Suppose that f and g are functions that map positive integers to nonnegative real numbers (f, g : Z>0 → R≥0). We say that f(n) is Θ(g(n)) (“Theta of g of n”) if there are positive constants a and b and a number n₀ such that for every n ≥ n₀,

    a g(n) ≤ f(n) ≤ b g(n)


This definition says that when n is large enough, f(n) is sandwiched between two constant multiples of g(n). And this forces f(n) to grow like g(n), which means that when n is large enough, f(n) will be much smaller than any function that grows much faster than g(n) and much larger than any function that grows much slower than g(n). For example, in the case of our previous running time t(n), we have that t(n) is Θ(n) and, when n is large enough, t(n) is much smaller than any quadratic function.
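As a quick illustration of the definition, consider the function t(n) = 2n + 5 (an example added here for concreteness, not taken from the notes). We have

    2n ≤ 2n + 5 ≤ 3n   for every n ≥ 5

so t(n) is Θ(n), with the definition satisfied by the constants a = 2, b = 3 and n₀ = 5.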

We can now revise what we said earlier: when analyzing an algorithm, to make it easier to compare its running time t(n) to other running times, we find a simple function f(n) such that t(n) is Θ(f(n)). This type of analysis is called asymptotic analysis. The function f(n) can be called the asymptotic running time of the algorithm. (Note that this is not a precisely defined concept since we haven't defined what we mean by “simple”.)

When a function f(n) is Θ(g(n)), we say that f(n) is asymptotically equivalent to g(n). Some people also say that f(n) is order g(n). And when f(n) is Θ(g(n)), we often write f(n) = Θ(g(n)). But note that this equal sign is not a real equal sign. In particular, it doesn't make sense to write Θ(g(n)) = f(n).

Note the conditions on the functions f and g in the definition of Θ: these functions map positive integers to nonnegative real numbers. It is possible to define the Θ notation in a more general way, but it would be less convenient. And this restriction on the functions is not a problem for our purposes since our functions will either be running times, which are always nonnegative, or simple functions like log n, n or n², which are also nonnegative. Note that we exclude 0 from the domain of the functions because log n is not defined when n = 0, and we include 0 in the range of the functions because log n = 0 when n = 1.

Now, when analyzing running times of algorithms, limits are still useful because they often give us an easy way to establish that a function is Θ of another one:


Theorem 1.2 Suppose that f, g : Z>0 → R≥0. If

    lim_{n→∞} f(n)/g(n) = c

for some constant c > 0, then f(n) is Θ(g(n)).

For example, the first running time of this section was t(n) = an + b log n + c. We saw earlier that

    lim_{n→∞} (an + b log n + c)/n = a

In addition, a must be positive for this running time to be positive. Therefore, t(n) is Θ(n).

To prove the theorem, we would need a precise definition of the concept of a limit. We won't go that far, but with an intuitive understanding of what a limit is, we can still give a fairly convincing sketch of the proof.

Proof (Sketch.) Suppose that

    lim_{n→∞} f(n)/g(n) = c

for some constant c > 0. Intuitively, this means that as n grows, f(n)/g(n) gets closer and closer to c. Eventually, it must be that f(n)/g(n) is no larger than 2c and no smaller than c/2, and that it remains that way forever. Say that this happens when n ≥ n₀. Then we have that for every n ≥ n₀,

    c/2 ≤ f(n)/g(n) ≤ 2c

which implies that

    (c/2) g(n) ≤ f(n) ≤ 2c g(n)

This proves that f(n) is Θ(g(n)) (with a = c/2 and b = 2c). □

Often, there is no need to explicitly compute limits because it is clear what the dominant term of the running time is. This can be stated precisely as follows:

Theorem 1.3 Suppose that f, g : Z>0 → R≥0. If c g(n) is the dominant term of f(n), in the sense that f(n) = c g(n) + r(n), with c > 0, and

    lim_{n→∞} r(n)/(c g(n)) = 0

then f(n) is Θ(g(n)).

Proof By using the previous theorem, all we need to do is compute the limit of the ratio of the two functions:

    lim_{n→∞} f(n)/g(n) = lim_{n→∞} c g(n)/g(n) + lim_{n→∞} r(n)/g(n) = c + 0 = c

Since c > 0, f(n) is Θ(g(n)). □

For example, consider again the running time t(n) = an + b log n + c. The term an dominates in the sense of the theorem since

    lim_{n→∞} (b log n + c)/(an) = 0

Therefore, t(n) is Θ(n).

To summarize, in the previous section, we said that the goal of analysis is to determine the general form of the running time t(n) of an algorithm. We can now be more precise: the goal is to find a simple function f(n) that's asymptotically equivalent to t(n). In other words, a function f(n) such that t(n) is Θ(f(n)). This is called asymptotic analysis and the function f(n) can be called the asymptotic running time of the algorithm. The function f(n) can often be found by simply finding the dominant term of the running time. In other cases, we search for a function f(n) such that

    lim_{n→∞} t(n)/f(n) = c > 0

When this limit doesn't exist, we have to work directly with the inequalities of the Θ definition; that is, we find positive constants a and b such that

    a f(n) ≤ t(n) ≤ b f(n)

when n is large enough.

Several examples of asymptotic running times and how they compare to each other will be given in a coming section. Later in this chapter, we will learn basic strategies for analyzing the running time of simple algorithms.

Study Questions

1.4.1. What does it mean for a running time to be asymptotically equivalent to a function f(n)?

1.4.2. What does f (n) = Θ(g(n)) mean?

1.4.3. What is asymptotic analysis?

1.4.4. What is the asymptotic running time of an algorithm?

1.4.5. What is the main advantage of simplifying the running time of an algorithm?


1.4.6. How can a limit be used to determine the asymptotic running time of an algorithm?

1.4.7. How can the dominant term of a running time allow us to determine the asymptotic running time of an algorithm?

Exercises

1.4.8. Below is a series of statements of the form f(n) = Θ(g(n)). Prove each of these statements. First do it by computing limits. Then do it by explicitly finding, in each case, positive constants a and b and a number n₀ such that a g(n) ≤ f(n) ≤ b g(n) for every n ≥ n₀. Justify your answers.

a) n + 10 = Θ(n).

b) n² + n = Θ(n²).

c) 3n² − n = Θ(n²).

d) 3n² − n + 10 = Θ(n²).

1.4.9. Show that if c and d are any two numbers greater than 1, then log_c n = Θ(log_d n). (This implies that when using the Θ notation, it is not necessary to specify the base of logarithms.)

1.5 Other Asymptotic Relations

Besides Θ, there are a few other asymptotic relations that are useful when comparing running times of algorithms. The following definition includes Θ for completeness.


Definition 1.4 Suppose that f, g : Z>0 → R≥0.

1. We say that f(n) is Θ(g(n)) (“Theta of g of n”) if there are positive constants a and b and a number n₀ such that for every n ≥ n₀,

    a g(n) ≤ f(n) ≤ b g(n)

2. We say that f(n) is O(g(n)) (“big-Oh of g of n”) if there is a positive constant b and a number n₀ such that for every n ≥ n₀,

    f(n) ≤ b g(n)

3. We say that f(n) is Ω(g(n)) (“big-Omega of g of n”) if there is a positive constant a and a number n₀ such that for every n ≥ n₀,

    f(n) ≥ a g(n)

4. We say that f(n) is o(g(n)) (“little-oh of g of n”) if

    lim_{n→∞} f(n)/g(n) = 0

5. We say that f(n) is ω(g(n)) (“little-omega of g of n”) if

    lim_{n→∞} f(n)/g(n) = ∞

Intuitively, Θ says that when n is large enough, f(n) is about the same as g(n), O says that f(n) is not much larger than g(n), Ω says that f(n) is not much smaller than g(n), o says that f(n) is much smaller than g(n), and ω says that f(n) is much larger than g(n). In other words, Θ is essentially like =, O is like ≤, Ω is like ≥, o is like <, and ω is like >.

For example, n is o(n²) because

    lim_{n→∞} n/n² = lim_{n→∞} 1/n = 0

And consider the function

    f(n) = n    if n is odd
           n²   if n is even

All we can say is that f(n) is O(n²) and f(n) is Ω(n).

These asymptotic relations have many interesting and useful properties. For example, we know that if a < b, then it is also true that a ≤ b. So if little-oh is like < and big-Oh is like ≤, then the following should be true:

Theorem 1.5 If f(n) is o(g(n)), then f(n) is O(g(n)).

Proof (Sketch.) Suppose that f(n) is o(g(n)). This means that

    lim_{n→∞} f(n)/g(n) = 0

Once again, intuitively, this means that as n grows, f(n)/g(n) gets closer and closer to 0. Eventually, it must be that f(n)/g(n) becomes no larger than 1 and remains that way forever. Say that this happens for n ≥ n₀. Then for every n ≥ n₀, f(n)/g(n) ≤ 1, which implies that f(n) ≤ g(n). This shows that f(n) is O(g(n)) (with b = 1). □

Here’s another example. We know that if a < b, then it is not true that a ≥ b.

18 CHAPTER 1. ANALYSIS OF ALGORITHMS

So if little-oh is like < and big-Omega is like ≥, then the following should betrue:

Theorem 1.6 If f (n) is o(g(n)), then f (n) is not Ω(g(n)) (and therefore notΘ(g(n)) either).

Proof Suppose that f(n) is o(g(n)). This means that

    lim_{n→∞} f(n)/g(n) = 0

Now, suppose that f(n) is also Ω(g(n)). This means that there exists a > 0 such that f(n) ≥ a g(n) when n is large enough. This implies that

    lim_{n→∞} f(n)/g(n) ≥ lim_{n→∞} a g(n)/g(n) = a > 0

Clearly, the limit of the ratio f(n)/g(n) cannot be both equal to 0 and greater than 0. So if f(n) is o(g(n)), then f(n) cannot be Ω(g(n)). □

In the previous section, we said that if a running time t(n) is Θ(n), then, when n is large enough, t(n) is much smaller than any quadratic running time. We can now be more precise: t(n) is little-oh of any quadratic running time.

Intuitively, this should be clear. The fact that t(n) is Θ(n) means that t(n) is about the same as n. If another running time t₂(n) is Θ(n²), then that running time is about the same as n². Since n is o(n²), this should imply that t(n) is o(t₂(n)).

This intuition can be verified as follows. We want to show that

    lim_{n→∞} t(n)/t₂(n) = 0


So we want to show that the ratio t(n)/t₂(n) is small. To do that, we can show that t(n) is small while t₂(n) is large.

The fact that t(n) is Θ(n) implies that there exists b > 0 such that t(n) ≤ bn when n is large enough. The fact that t₂(n) is Θ(n²) implies that there exists a > 0 such that t₂(n) ≥ an² when n is large enough. Therefore, when n is large enough,

    t(n)/t₂(n) ≤ bn/(an²) = b/(an)

This implies that

    lim_{n→∞} t(n)/t₂(n) ≤ lim_{n→∞} b/(an) = 0

On the other hand,

    lim_{n→∞} t(n)/t₂(n) ≥ 0

because both t(n) and t₂(n) are running times, so they're nonnegative. Therefore,

    lim_{n→∞} t(n)/t₂(n) = 0

which means that t(n) is o(t₂(n)).

Study Questions

1.5.1. What are the five asymptotic relations mentioned in this section, and what do they mean, both intuitively and formally (precisely)?

Exercises

1.5.2. Show that if t₁(n) ≤ t₂(n) and t₂(n) is O(n²), then t₁(n) is O(n²).


  RUNNING TIME                   COMMON NAME    TYPICAL EXAMPLE
  Θ(1)                           constant       a single basic operation
  Θ(log n)                       logarithmic    fast searching algorithms
  Θ(n)                           linear         simple searching algorithms
  Θ(n log n)                     n log n        fast sorting algorithms
  Θ(n²)                          quadratic      simple sorting algorithms
  O(n^k), for k > 0              polynomial     most algorithms that are fast enough
                                                to be useful in practice
  Θ(c^(n^k)), for c > 1, k > 0   exponential    exhaustive searches of very large sets

Table 1.2: Some common running times

1.5.3. Is it true that if t₁(n) ≤ t₂(n) and t₂(n) is Θ(n²), then t₁(n) is Θ(n²)? Justify your answer. (To justify a no answer, give a counterexample.)

1.5.4. Show that if t(n) is Θ(n log n), then t(n) is o(n²).

1.6 Some Common Asymptotic Running Times

Table 1.2 gives a list of common asymptotic running times. In this table, both c and k are constants.


The expression Θ(1) may seem a bit odd. According to the definition of Θ, a running time t(n) is Θ(1) if, when n is large enough, a ≤ t(n) ≤ b, for some a, b > 0. In other words, when n is large enough, t(n) is bounded above and below by constants.

Note that we call a Θ(1) running time constant even though, strictly speaking, the running time may not be a constant function. The same comment applies to the other running times in this table. For example, we call a Θ(n) running time linear even though it may not be a function of the form an + b.

In contrast to the other running times in this table, polynomial is typically viewed as an upper bound, so that, for example, an algorithm that runs in logarithmic time is also viewed as running in polynomial time.

The running times of Table 1.2 are listed in increasing order in the sense that each running time is little-oh of the next, except for Θ(n²), which is just a subset of O(n^k) when k ≥ 2.

We already saw numbers that show how large a difference there is between logarithmic, linear and quadratic running times (Table 1.1). Table 1.3 provides some numbers that compare linear, quadratic and exponential running times. Table 1.1 makes it clear that quadratic-time algorithms are usually impractical on large inputs, while Table 1.3 makes it clear that exponential-time algorithms are typically useless even for relatively small inputs.

Exercises

1.6.1. To each of Tables 1.1 and 1.3, add rows for the running times n log2 n and n³.


            n = 10    n = 20    n = 40     n = 60            n = 80
  n µs      10 µs     20 µs     40 µs      60 µs             80 µs
  n² µs     0.1 ms    0.4 ms    1.6 ms     3.6 ms            6.4 ms
  2ⁿ µs     1 ms      1 s       13 days    37 × 10³ years    38 × 10⁹ years

Table 1.3: More execution times

1.7 Basic Strategies

In this section, we will do several examples of analysis that will illustrate some important basic strategies.

Algorithms often consist of a sequence of steps performed by other algorithms. The running time of such an algorithm is simply the sum of the running times of those other algorithms.

Example 1.7 Suppose that an algorithm consists of three steps, performed by algorithms A, B and C, in that order. Let T_A(n), T_B(n) and T_C(n) denote the running times of these algorithms. Then the running time of the overall algorithm is

    T(n) = T_A(n) + T_B(n) + T_C(n)

Now, suppose that the running times of these algorithms are Θ(n), Θ(1) and Θ(n), respectively. Then we can write that

    T(n) = Θ(n) + Θ(1) + Θ(n)

This simply means that T(n) is the sum of three functions and these functions are Θ(n), Θ(1) and Θ(n), respectively. Since the Θ(n) terms dominate, we get that T(n) is Θ(n). □

Intuitively, it should be clear that the last sentence of this example is correct: T(n) is Θ(n) because the Θ(n) terms dominate. But, if necessary — because we want to be completely sure or to convince someone else — this intuition can be verified as follows. When n is large enough, we have that

    T(n) ≤ b₁n + b₂ + b₃n ≤ (b₁ + b₂ + b₃)n

Similarly, again when n is large enough,

    T(n) ≥ a₁n + a₂ + a₃n ≥ a₁n

because a₂ and a₃ are positive. Therefore, when n is large enough,

    a₁n ≤ T(n) ≤ (b₁ + b₂ + b₃)n

which implies that T(n) is Θ(n).

Here's another example with different running times.

Example 1.8 Suppose that the running times of the algorithms A, B and C are now Θ(n), Θ(n²) and Θ(1), respectively. Then

    T(n) = Θ(n) + Θ(n²) + Θ(1) = Θ(n²)

because the Θ(n²) function will dominate. □

Algorithms often contain loops. Some loops are very easy to analyze while others require a bit more work. Here are some examples, in increasing order of complexity.


    for i = 0 to n−1
        print a[i]

Figure 1.2: Printing the contents of an array

    for i = 0 to n−1
        for j = 0 to n−1
            print a[i,j]

Figure 1.3: Printing the contents of a two-dimensional array

Example 1.9 Figure 1.2 shows an algorithm we considered earlier in this chapter. This algorithm prints the contents of an array of size n.

A key observation about this algorithm is that the operations that control the loop as well as the operations in the body of the loop all run in constant time. Some of these operations are performed only once (the assignment i = 0, for example), while other operations are executed n times (the test i < n and the operations in the body of the loop, for example). This implies that the running time of the loop is of the form T(n) = an + b. Clearly, T(n) is Θ(n). □

Example 1.10 Figure 1.3 shows an algorithm that prints the contents of a two-dimensional array. The algorithm consists of two nested loops.

The usual strategy for analyzing nested loops is to work from the inside out. The inner loop can be analyzed as in the previous example. Its running time T₁(n) is Θ(n).

The operations that control the outer loop run in constant time but its body, which is the inner loop, does not run in constant time. It runs in linear time. As before, the total running time of the operations that control the outer loop is Θ(n). Since the inner loop always runs in time T₁(n), the total running time of the inner loop is nT₁(n). Therefore, the running time of the outer loop is

    T(n) = Θ(n) + nT₁(n)
         = Θ(n) + nΘ(n)

It should be clear that the term nΘ(n) is Θ(n²). This implies that T(n) is Θ(n²). □

    for i = 0 to n−1
        for j = 0 to i−1
            print a[i,j]

Figure 1.4: Printing the lower left triangle of a two-dimensional array

The fact that a function that's nΘ(n) is also Θ(n²) should be clear, intuitively. But it can also be easily justified as follows. Let f(n) be the Θ(n) function in the term nΘ(n). When n is large enough, we have that an ≤ f(n) ≤ bn. Therefore, an² ≤ nf(n) ≤ bn², which implies that nf(n) is Θ(n²).

The next example shows that sometimes the analysis of loops involves summations.

Example 1.11 Figure 1.4 shows an algorithm that prints the lower left triangle of a two-dimensional array.

Once again, we analyze the inner loop first. The only difference compared to the previous example is that the inner loop repeats i times instead of n times. This means that the running time of the inner loop varies with i. In fact, the running time of the inner loop is Θ(i). This has the important consequence that we cannot simply multiply the running time of the inner loop by the number of times it is repeated. Instead, we need to add the running times of all the executions of the inner loop. In other words, if T₁(i) is the running time of the inner loop, then the running time of the outer loop is

    T(n) = Θ(n) + Σ_{i=0}^{n−1} T₁(i)

where the Θ(n) term is, as before, the total running time of the operations that control the outer loop.

So we need to analyze the summation

    Σ_{i=0}^{n−1} T₁(i)

We’re going to take advantage of the fact that we know the exact form of therunning time of the inner loop: T1(i) = ai + b. Therefore,

n−1∑

i=0

T1(i) =n−1∑

i=0

(ai + b) = an−1∑

i=0

i +n−1∑

i=0

b = an−1∑

i=0

i + bn

We can then use the well-known formula

    Σ_{i=0}^{k} i = k(k+1)/2

This implies that

    Σ_{i=0}^{n−1} T₁(i) = a(n−1)n/2 + bn = (a/2)n² − (a/2)n + bn

Therefore, the summation is Θ(n²) and the running time of the outer loop is Θ(n) + Θ(n²), which is Θ(n²). □


    if (n < 10)
        sort using simple sorting algorithm
    else
        sort using fast sorting algorithm

Figure 1.5: A hybrid sorting algorithm

The last example of this section concerns hybrid algorithms, which are algorithms that combine at least two other algorithms and include a strategy for choosing which of the other algorithms to use.

Example 1.12 A sorting algorithm is an algorithm that arranges the elements in a sequence according to some order. For example, the elements of an array could be sorted in nondecreasing order. Later in these notes, we will learn that there are simple sorting algorithms that run in time Θ(n²) and more complex sorting algorithms that run much faster, in time Θ(n log n). On small inputs, however, the simple sorting algorithms often run faster than the more complex ones. The hybrid sorting algorithm shown in Figure 1.5 takes advantage of that fact by choosing a simple sorting algorithm when the input is small and a fast sorting algorithm when the input is large.

Now, what is the overall running time of this algorithm, Θ(n²) or Θ(n log n)? The important thing to remember is that the asymptotic running time of an algorithm is determined by its running time on large inputs. When n ≥ 10, the running time of the hybrid algorithm equals the running time of the fast sorting algorithm, and when n is large enough, that running time is bounded below and above by constant multiples of n log n. Therefore, the running time of the hybrid algorithm is Θ(n log n). □
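As a concrete sketch of the strategy of Figure 1.5, here is one way a hybrid sort could be written in C++. This code is illustrative and not from the notes: the cutoff 10, the function names and the use of std::sort as the "fast" algorithm are assumptions.

#include <algorithm>
#include <vector>

// Simple Θ(n²) sort: grows a sorted prefix one element at a time.
void insertion_sort(std::vector<int> & a)
{
    for (std::size_t i = 1; i < a.size(); ++i) {
        int x = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > x) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = x;
    }
}

// Hybrid strategy of Figure 1.5: simple sort on small inputs,
// fast Θ(n log n) library sort otherwise.
void hybrid_sort(std::vector<int> & a)
{
    if (a.size() < 10)
        insertion_sort(a);
    else
        std::sort(a.begin(), a.end());
}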


Exercises

1.7.1. What is the asymptotic running time of each of the following algorithms, as a function of n? Don't forget to simplify and use the Θ notation. Justify your answers.

a)  for i = 1 to n
        for j = 1 to 2n+1
            print '∗'

b)  for i = 1 to 10
        for j = 1 to n
            print '∗'

c)  for i = 1 to n
        for j = i to i+5
            print '∗'

d)  for i = 1 to n
        for j = i to n
            print '∗'

e)  for i = 1 to n
        for j = 1 to 2∗i+1
            print '∗'

f)  for i = 1 to n∗n
        for j = 1 to i
            print '∗'


    A
    if (n < 100)
        B
    else
        for j = 1 to n
            C

Figure 1.6: Algorithm for Exercise 1.7.2

1.7.2. Consider the algorithm shown in Figure 1.6. Let T_A(n), T_B(n) and T_C(n) denote the running times of algorithms A, B and C, respectively. What is the asymptotic running time of this algorithm, as a function of n, under each of the following sets of assumptions? Justify your answers.

a) T_A(n) = Θ(n), T_B(n) = Θ(n²) and T_C(n) = Θ(log n).

b) T_A(n) = Θ(n²), T_B(n) = Θ(n²) and T_C(n) = Θ(log n).

c) T_A(n) = Θ(n²), T_B(n) = Θ(n³) and T_C(n) = Θ(log n).

1.7.3. Justify the fact that if T(n) = Θ(n) + Θ(n²) + Θ(1), then T(n) = Θ(n²).

1.7.4. Suppose that T(n) = Θ(log n)Θ(n). That is, T(n) is the product of two functions that are Θ(log n) and Θ(n), respectively. Show that T(n) = Θ(n log n).


1.8 Analyzing Summations

In one of the examples of the previous section, the running time contained the summation

    Σ_{i=1}^{n−1} i

We used a well-known formula to get the exact value of this summation. But what if we didn't know this formula, or what if one didn't exist for the particular summation we were dealing with? In this section, we will learn a technique that can sometimes be used to get asymptotic bounds on summations.

Getting an upper bound on this summation is easy. Since every term in the summation is at most n − 1, we get that

    Σ_{i=1}^{n−1} i ≤ Σ_{i=1}^{n−1} (n − 1) = (n − 1)² = n² − 2n + 1

This implies that the summation is O(n²).

We can get a lower bound in a similar way. Every term in the summation is at least 1. Therefore,

    Σ_{i=1}^{n−1} i ≥ Σ_{i=1}^{n−1} 1 = n − 1

This implies that the summation is Ω(n).

Unfortunately, this lower bound is much smaller than the O(n²) upper bound. Typically, what we need are matching upper and lower bounds. So we need either to improve the O(n²) upper bound (by lowering it) or to improve the Ω(n) lower bound (by increasing it). Or both.

Let’s try to improve the lower bound. One way to do that is to split the

1.8. ANALYZING SUMMATIONS 31

summation in the middle:

n−1∑

i=1

i =n/2−1∑

i=1

i +n−1∑

i=n/2

i

In the last summation, every term is at least n/2. Therefore,

n−1∑

i=1

i ≥n−1∑

i=n/2

n2=

n−n2

n2=

n2

4

This implies that the summation is Ω(n2). Since this lower bound matches theO(n2) upper bound, we can now conclude that the summation is Θ(n2).

All this is correct except for one small technicality: if n is not even, then n/2is not an integer and we cannot have a summation start at i = n/2 when n/2 isnot an integer.

Usually, the most convenient way of dealing with this kind of issue is to usethe floor and ceiling notations. In general, the floor of x is the greatest integerless than or equal to x . The ceiling of x is the smallest integer greater than orequal to x . The floor and ceiling of x are denoted bxc and dxe. For example,b3/2c= 1 and d3/2e= 2. On the other hand, b5c= d5e= 5.

In our example, we can split the summation at ⌈n/2⌉. We then get that

    Σ_{i=1}^{n−1} i ≥ Σ_{i=⌈n/2⌉}^{n−1} i ≥ Σ_{i=⌈n/2⌉}^{n−1} ⌈n/2⌉ = (n − ⌈n/2⌉) ⌈n/2⌉

Using the fact that n/2 ≤ ⌈n/2⌉ ≤ n/2 + 1/2, we get

    Σ_{i=1}^{n−1} i ≥ (n − (n/2 + 1/2)) (n/2) = (n/2 − 1/2) (n/2) = n²/4 − n/4


This implies the desired Ω(n²) lower bound.

Exercises

1.8.1. Obtain an Ω(n²) lower bound on the summation Σ_{i=1}^{n−1} i by splitting the summation at ⌊n/2⌋ (instead of ⌈n/2⌉).

    for i = 0 to n−1
        if (a[i] == x)
            return i
    return −1

Figure 1.7: A sequential search of an array

1.9 Worst-Case and Average-Case Analysis

Consider the sequential search algorithm shown in Figure 1.7. What is the running time of this algorithm? The accurate answer is that it depends on the location of the first occurrence of x in the array.

We can talk of at least three different running times for a given algorithm. All are functions of the input size. The best-case running time is the minimum running time required on inputs of size n. In the case of the sequential search algorithm, the best case occurs when x is the first element of the array. In that case, the running time is constant.

The worst-case running time is the maximum running time required on inputs of size n. In our example, the worst case occurs when x is not found. In that case, the running time is linear in n.

The average-case running time is the average running time required on inputs of size n. This running time is usually more difficult to determine, in part because it requires knowing how likely each input of size n is. For example, for the sequential search, how likely is it that x will not be found? Given that it is found, how likely is it that it will be found in each of the possible positions?

In this example, one possible approach is to determine the average-case running time for the two separate cases of a successful and an unsuccessful search. If the search is unsuccessful, the running time will always be the same, so the average and worst-case running times are the same: Θ(n).

In the case of a successful search, a common approach when lacking any more precise knowledge of the particular application we have in mind is to assume that each location is equally likely. It is easy to see that the running time of the search is of the form ak + b, where k is the position — a number from 1 to n — of the first occurrence of x. The average running time can then be calculated by taking the average over all possible positions k:

    (1/n) Σ_{k=1}^{n} (ak + b) = (1/n) (a Σ_{k=1}^{n} k + bn)
                               = (1/n) (a n(n+1)/2 + bn)
                               = a(n+1)/2 + b

Therefore, the average running time of a successful search is Θ(n).

In general, the best-case running time is not very useful. The worst-case running time is much more useful and has the advantage of giving us a guarantee because it is an upper bound on the running time required for all inputs (that are large enough). A possible disadvantage of the worst-case running time is that this upper bound may be much larger than the running time required by most inputs. In other words, the worst-case running time can be overly pessimistic.

An example of this occurs with the quicksort algorithm, one of the fast sorting algorithms we will study later in these notes. This algorithm has a worst-case running time of Θ(n²), while the mergesort algorithm, another fast sorting algorithm, has a Θ(n log n) worst-case running time. This might indicate that quicksort is much slower than mergesort. However, in practice, quicksort usually runs faster than mergesort.

This apparent contradiction can be explained in part by the fact that the average-case running time of quicksort is Θ(n log n), just like the worst-case running time of mergesort. And the fact that quicksort tends to run faster than mergesort in practice probably indicates that the inputs that cause quicksort to take quadratic time occur only rarely.

This illustrates how the average-case running time can be more realistic than the worst-case running time. However, as we said earlier, the average-case running time can be more difficult to determine because it requires knowledge of the probability distribution of the inputs. In addition, average-case analysis usually requires additional calculations. This was the case with the sequential search algorithm, although the calculations there were fairly easy. The average-case analysis of quicksort, on the other hand, is significantly more complicated than its worst-case analysis. (We will do both later in these notes.)

In cases where even the worst-case analysis of an algorithm proves difficult, it is possible to get an estimate of its asymptotic running time by testing the algorithm on randomly generated inputs of various sizes and seeing what kind of function best fits the data. But note that this gives an estimate of the average-case running time, since there is no guarantee that randomly generated inputs will include the worst-case ones. This kind of “empirical analysis” can be especially useful if the average-case analysis is difficult and we suspect that the worst-case running time may be too pessimistic.

We sometimes say that the running time of an algorithm is O(n), without specifying if we are talking about the worst-case, average-case or best-case running time. Strictly speaking, this statement is ambiguous. However, it is usually understood as giving a bound on the running time of all inputs of length n. Since big-Oh is an upper bound, this statement is therefore equivalent to saying that the worst-case running time of the algorithm is O(n).

Similarly, when we say that the running time of an algorithm is Ω(n), we're saying that the best-case running time of the algorithm is Ω(n). And when we say that the running time T(n) of an algorithm is Θ(n), we're saying both things: that the best-case running time is Ω(n) and that the worst-case running time is O(n). Note that this implies that all three running times — worst-case, average-case and best-case — are Θ(n).

Study Questions

1.9.1. What are the best-case, worst-case and average-case running times of an algorithm?

1.9.2. What is an advantage and a disadvantage of the worst-case running time compared to the average-case running time?

Exercises

1.9.3. Consider an algorithm that runs a test on its input x. If the test succeeds, the algorithm runs algorithm A on x. If the test fails, the algorithm runs algorithm B on x. Suppose that algorithms A and B run in time Θ(n) and Θ(n²), respectively. Assuming that for every n, inputs of length n are equally likely to pass or fail the test, what are the worst-case, average-case and best-case running times of this algorithm?


1.10 The Binary Search Algorithm

It is fairly obvious that searching a collection of data for a particular element, or for an element that satisfies a particular property, is a frequent operation. In this section, we will learn that under certain conditions, it is possible to search very efficiently by using an algorithm called binary search. We will also analyze the running time of this algorithm.

The simplest way of searching a sequence such as an array or a vector is to scan it from one end to the other, examining elements one by one. This is the sequential search we analyzed in the previous section. We found that its running time is linear in the length of the sequence.

If the sequence happens to be ordered, then the search can be done more quickly. For example, consider an array of integers sorted in increasing order. When looking for a particular integer, we can stop searching as soon as we find the integer we are looking for or an integer that is larger than the integer we are looking for. The running time of this modified sequential search is still linear, but we can expect unsuccessful searches to be 50% faster, on average.
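To make the idea concrete, here is one possible C++ version of this modified sequential search (a sketch in the spirit of Figure 1.11, not code from the notes; the function name and the use of int elements are assumptions):

#include <vector>

// Sequential search in a vector sorted in increasing order.
// Returns the index of e, or -1 if e is not in the vector.
// The search stops early as soon as an element larger than e is seen.
int sorted_sequential_search(const std::vector<int> & a, int e)
{
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == e)
            return static_cast<int>(i);   // found e
        if (a[i] > e)
            return -1;                    // passed the place where e would be
    }
    return -1;                            // reached the end without finding e
}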

A much more dramatic improvement in the running time can be obtained for sorted sequences that provide constant-time access to their elements, such as arrays and vectors. The idea is to go straight to the middle of the sequence and compare the element we are looking for with the middle element of the sequence. Because the sequence is sorted, this comparison tells us if the element we are looking for should be located in the first or second half of the sequence. We then only need to search that half.

This searching algorithm is called a binary search. The algorithm is described in Figure 1.8. In this high-level description of the algorithm, s is a sorted sequence and e is the element being searched for.

Figure 1.9 shows a sample run of the algorithm on a sequence of integers.


    while (s contains more than one element)
        locate middle of s
        if (e < middle element of s)
            s = left half of s
        else
            s = right half of s
    compare e to only element in s

Figure 1.8: The binary search algorithm

    e = 25
    s = [12 16 25 37 38 42 60 73]   middle = 38
        [12 16 25 37]               middle = 25
        [25 37]                     middle = 37
        [25]
    Found!

Figure 1.9: A run of the binary search algorithm

The middle element is taken to be the one at the middle or to the immediate right of the middle.

The pseudocode of Figure 1.8 is very general. If we wanted to implement the algorithm, we would need to make it more precise and use some care. One issue is that we need to specify how the middle element of the sequence can be located. This depends on the type of sequence we're searching. Another issue is that the lines

    s = left half of s

and

    s = right half of s

give the impression that the algorithm copies half of the sequence. This can't be done, otherwise the algorithm would run in linear time, no better than a sequential search.

Figure 1.10 shows more detailed pseudocode for a version of binary search that works on arrays. The algorithm takes four arguments: a, i, j and e. The argument a is an array and e is the element we're searching for. The arguments i and j are indices that specify the portion of the array that should be searched. Following a common convention, we consider that i and j specify an interval that's closed on the left and open on the right: [i, j). In other words, the portion of the array that will be searched starts at index i and ends just before index j. We can represent that as a[i..j). Note that the number of elements in a[i..j) is j − i.

The algorithm of Figure 1.10 returns the location of the element in the array, or −1 if the element is not found. Figure 1.11 shows a generic C++ implementation of this algorithm.

We now analyze the running time of the binary search algorithm under the following two assumptions:

1. The middle element of the sequence can be accessed in constant time.

2. Elements can be compared in constant time.

For example, these assumptions are satisfied in the case of arrays and vectors that contain small integers or small strings.

Let T(n) be the running time of the binary search algorithm on an array of size n. Consider the pseudocode shown in Figure 1.10. It should be clear that T(n) = Θ(r), where r is the number of iterations of the loop. We need to determine what r is.


    binary_search(a, i, j, e)
        // Search a[i..j) for e
        while (j − i >= 2)
            mid = floor((i + j) / 2)
            if (e < a[mid])
                j = mid
            else
                i = mid
        if (j − i = 1 and e = a[i])
            return i
        else
            return −1

Figure 1.10: The binary search algorithm for arrays


template <class T>
int binary_search(const T a[], int start, int stop, const T & e)
// Performs a binary search in a for e. Returns the
// index of e in the range [start,stop). Returns -1 if
// e is not found in that range.
//
// PRECONDITION: The indices are valid and the
// elements in the range [start,stop) are sorted in
// increasing order.
//
// ASSUMPTION ON TEMPLATE ARGUMENT: Values of type T
// can be compared by using the < operator.
{
    while (stop - start >= 2) {
        int middle = (start + stop) / 2;
        if (e < a[middle])
            stop = middle;
        else
            start = middle;
    }
    if (stop - start == 1 && e == a[start])
        return start;
    else
        return -1;
}

Figure 1.11: An implementation of binary search for arrays
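Assuming the function of Figure 1.11 is in scope, it could be called as follows (the array and the searched values are made up for illustration):

#include <iostream>

int main()
{
    int a[] = {12, 16, 25, 37, 38, 42, 60, 73};   // sorted, as in Figure 1.9

    // Search the whole array, that is, the range [0, 8).
    std::cout << binary_search(a, 0, 8, 25) << '\n';   // prints 2
    std::cout << binary_search(a, 0, 8, 40) << '\n';   // prints -1

    return 0;
}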


Let’s first do this for the case when n is a power of 2. So suppose that n= 2k.Let ni be the size of the portion of the array that is left to search after iterationi of the loop. Initially, n0 = n. The loop terminates as soon as ni reaches 1. Inother words, r is the smallest value for which nr = 1.

At every iteration of the loop, the size is cut in half. Therefore, after i it-erations, we have that ni = n/2i = 2k−i. This will be equal to 1 when i = k.Therefore, r = k. Since k = log n, we get that T (n) = Θ(log n).

Let’s now generalize this to account for the fact that n may not be a powerof 2. Suppose that 2k−1 < n≤ 2k. Let’s first work on the upper bound. After oneiteration, n1 ≤ dn/2e. Since n/2 ≤ 2k−1, an integer, we have that dn/2e ≤ 2k−1,which implies that n1 ≤ 2k−1. After two iterations, n2 ≤ dn1/2e. Since n1/2 ≤2k−2, we get that dn1/2e ≤ 2k−2 and that n2 ≤ 2k−2. It should be clear that thiscan be repeated to show that for any i, ni ≤ 2k−i. Since 2k−i = 1 when i = k,we have that nk ≤ 1 so it must be that when ni = 1 for the first time, i ≤ k.Therefore, r ≤ k.

We need to relate r to n. Since we want an upper bound on r, we wantan upper bound on k. This is given to us by the inequality 2k−1 < n, whichimplies that k − 1 < log n and that k < log n+ 1. Therefore, r < log n+ 1 andT (n) = O(log n).

A similar argument shows that r ≥ log n− 1. Therefore, we also have thatT (n) = Ω(log n), which implies that T (n) = Θ(log n).
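To put this bound in perspective (a quick calculation, not from the notes): for an array of n = 10⁶ elements, a sequential search may examine up to 10⁶ elements, while binary search performs about

    log₂(10⁶) ≈ 20

iterations, which matches the log n row of Table 1.1.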

Once again, note that our analysis of the binary search algorithm relies critically on the fact that the middle element of the sequence can be accessed in constant time. The binary search algorithm can also be used on other sorted sequences but, in that case, the running time may not be logarithmic. (An exercise asks you to explore this issue.)


Exercises

1.10.1. Run the binary search algorithm on an array containing the following elements:

11 27 28 30 36 42 58 65

Search for elements 42 and 30. Illustrate each run of the algorithm aswas done in Figure 1.9.

1.10.2. Finish the analysis of the binary search algorithm by proving that r ≥log n− 1.

1.10.3. Suppose that computing the location of the middle element of a se-quence takes time linear in the number of elements in the range currentlybeing searched. (This is the case with linked lists.) Show that the runningtime of the binary search algorithm is linear in this case.

1.10.4. Suppose that an array contains multiple copies of an element beingsearched for. As described in this section, the binary search algorithm willfind the last occurrence of that element. Modify the algorithm so it findsthe first occurrence. Verify your work by revising the implementation ofthe algorithm and testing it.

Chapter 2

Recursion

In this chapter, we will learn about recursion, a technique that greatly simplifies the design and implementation of many algorithms, including the fast sorting algorithms we will learn later in these notes.

2.1 The Technique

Recursion is a technique for designing algorithms. We will see examples of the usefulness of recursion when we study sorting algorithms later in these notes. For now, however, we introduce recursion using simple examples where recursion is neither needed nor a particularly good idea. These examples are only meant to illustrate the technique.

Consider the problem of printing a line containing n copies of a given character c. An algorithm for this problem can be designed very simply by putting the statement print c in a loop that executes n times, as shown in Figure 2.1.

An alternative algorithm can be designed as follows. First, print one c. Then ask, what is left to do? The answer is, to print a line containing n − 1 copies of


print(n, c)

    for i = 1 to n
        print c
    print \n

Figure 2.1: A simple iterative algorithm

c. And here is the central idea of recursion: this subtask can be performed by using the algorithm that is being designed as if it was already available:

    print c
    print(n−1, c)

The function call print(n−1, c) is recursive because it occurs in print itself. This recursive call does not create a trivial infinite loop because the function is not being called with the same arguments.

However, as is, this recursive algorithm won’t work: it will just keep on calling itself. What we need is a base case, a case where recursion is not used. We also need to make sure that the base case will eventually be reached. The algorithm in Figure 2.2 achieves both these objectives. The base case is when n ≤ 0. In that case, we print an empty line.

Now, how can we be sure that this recursive algorithm works, that it actually does what it is supposed to do? The key statement we’re interested in is this:

P(n): print(n, c) prints a line containing n copies of c.

We want to prove that this statement is true for every n ≥ 0.

We can do this as follows. First, notice that P(0) is true. That’s because the base case of the algorithm correctly prints an empty line.


print(n, c)

    if (n <= 0)
        print \n
    else
        print c
        print(n−1, c)

Figure 2.2: A recursive version of the printing algorithm
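As with the other algorithms in this section, the algorithm of Figure 2.2 can be written in C++. Here is one possible sketch (the use of std::cout is just one way to do the printing):

#include <iostream>

// One possible C++ version of the recursive print algorithm of Figure 2.2.
void print(int n, char c)
{
    if (n <= 0)
        std::cout << '\n';   // base case: just end the line
    else {
        std::cout << c;      // print one copy of c
        print(n - 1, c);     // the recursive call prints the other n-1 copies
    }
}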

Now, we wish to prove that P(1) is true. When called with n = 1, print prints a c and then calls itself with n = 0. This causes an empty line to be printed, as we just established, so that the result is a line containing 1 copy of c. Therefore, P(1) is true.

What about P(2)? We can use essentially the same argument. When called with n = 2, print prints a c and then calls itself with n = 1. Because we know that P(1) is true, this causes a line with one c to be printed, so that the result is a line containing 2 copies of c. Therefore, P(2) is true.

This argument can then be repeated for P(3), P(4), . . . , showing that P(n) is true for every n ≥ 0.

This type of argument is called a proof by induction. Its two key ingredients are as follows:

1. Basis: Show that P(0) is true.

2. Step: Show that P(i − 1) implies P(i), for every i ≥ 1.

The basis is the starting point of the argument. The step is what allows us toprove P(1) from P(0), P(2) from P(1), P(3) from P(2), and so on, for everyi ≥ 1.


It should be clear that the basis and step imply that P(n) is true for everyn≥ 0. This fact is called the Principle of Mathematical Induction.

Theorem 2.1 (Principle of Mathematical Induction) If P(0) and, for every i ≥1, P(i − 1) implies P(i), then, for every n≥ 0, P(n).

The statement P(0) is called the inductive basis. The statement “P(i−1) impliesP(i)” is the inductive step.

The Principle of Mathematical Induction has several variants, which means that proofs by induction can be carried out in several different ways. For example, the basis doesn’t need to be for n = 0. We could start the argument at any n_0. The proof would then show that P(n) is true for every n ≥ n_0.

Another variant is that in the step, when showing that P(i) is true, we can use the fact that not only is P(i − 1) true, but all of P(0), P(1), . . . , P(i − 1) are true. In other words, the step becomes the following: for every i ≥ 1, P(0), P(1), . . . , P(i − 1) imply P(i). This is called strong induction.

A third variant is that a proof by induction can have multiple bases. For example, we could start by proving P(0) and P(1). The step would then be for i ≥ 2. Multiple bases can be convenient and they are often combined with strong induction.

Now, in practice, it is not always necessary to write down detailed proofs by induction when trying to convince ourselves that a recursive algorithm works. We can often rely on the following general properties of correct recursive algorithms:

1. The algorithm has at least one base case, one where the problem is solved directly, without a recursive call.

2. Every recursive call gets closer to a base case, in such a way that a base case is eventually reached. (Usually this is based on some notion of integer distance from the base case.)


3. The algorithm works when you assume that the recursive calls work.

For example, consider print. First, the algorithm has a base case (n ≤ 0) that correctly prints an empty line. Second, every time the algorithm calls itself, the first argument decreases by 1. Since this recursive call is only made when n is positive, the first argument is getting closer to 0 and will eventually be 0. At that point, we will reach the base case. Third, if we assume that the recursive call correctly prints a line containing n − 1 copies of c, this algorithm does correctly print a line containing n copies of c. Therefore, print satisfies the three properties and we can conclude that the algorithm is correct.

Note that the first two properties guarantee that the algorithm eventually terminates. The first property also gives us the basis for a proof by induction that the algorithm does what it is supposed to do. The second and third properties give us the step of that induction.

It is important to realize that each recursive call executes independently from the others. In particular, each recursive call has its own arguments and its own set of local variables. For example, the execution of print(2, ’∗’) can be illustrated as in Figure 2.3.

We end this section with some additional examples of recursive algorithms. Figure 2.4 shows a recursive algorithm that displays the contents of an array a of size n. Figure 2.5 shows a variant of this algorithm where the arguments i and j specify the portion of the array that should be printed. As usual, i and j specify an interval that’s closed on the left and open on the right ([i, j)) so that the portion of the array that is printed is a[i..j).

An algorithm that displays an array in reverse order can be easily obtained by printing the last element of the array first, as shown in Figure 2.6. When reasoning about the correctness of this algorithm, it is important to keep in mind that if the recursive call works, then it prints a[i..j−1) in reverse.

It is interesting to note that an algorithm that displays an array in reverse


print(2,’∗’)
------------
n = 2
c = ’∗’

print(2,’∗’)  -->  print(1,’∗’)
------------       ------------
n = 2               n = 1
c = ’∗’             c = ’∗’

print(2,’∗’)  -->  print(1,’∗’)  -->  print(0,’∗’)
------------       ------------       ------------
n = 2               n = 1               n = 0
c = ’∗’             c = ’∗’             c = ’∗’

print(2,’∗’)  -->  print(1,’∗’)  -->  print(0,’∗’)
------------       ------------       ------------
n = 2               n = 1               n = 0
c = ’∗’             c = ’∗’             c = ’∗’

                                       (return)
print(2,’∗’)  -->  print(1,’∗’)
------------       ------------
n = 2               n = 1
c = ’∗’             c = ’∗’

                    (return)
print(2,’∗’)
------------
n = 2
c = ’∗’
(return)

Figure 2.3: A sample run of a recursive algorithm


display(a, n)

    if (n > 0)
        display(a, n−1)
        print a[n−1]

Figure 2.4: A recursive algorithm that displays the contents of an array

display(a, i, j)

    if (j > i)
        print a[i]
        display(a, i+1, j)

Figure 2.5: A recursive algorithm that displays a[i..j)

display_reverse(a, i, j)

    if (j > i)
        print a[j−1]
        display_reverse(a, i, j−1)

Figure 2.6: A recursive algorithm that displays a[i..j) in reverse


display_reverse(a, i, j)

    if (j > i)
        display_reverse(a, i+1, j)
        print a[i]

Figure 2.7: Another recursive algorithm that displays a[i..j) in reverse

sum(a, i, j)

    if (j > i)
        return a[i] + sum(a, i+1, j)
    else
        return 0

Figure 2.8: A recursive algorithm that computes the sum of the elements of an array of numbers

order can also be obtained by switching the two statements in the recursive case of the original algorithm, as shown in Figure 2.7. Once again, it is important to keep in mind that the recursive call prints a[i+1..j) in reverse.

Figure 2.8 shows a recursive algorithm that computes the sum of the elements in an array of numbers. If the array is empty, 0 is returned. If the array is not empty, the first element is added to the sum of the others. That sum is computed recursively.

Figure 2.9 shows a recursive version of the binary search algorithm. In the previous chapter, we described this algorithm as a loop (see Figure 1.10). The idea behind the recursive version is that after e has been compared to the middle


binary_search(a, i, j, e)

    if (j − i >= 2)
        mid = floor((i + j) / 2)
        if (e < a[mid])
            return binary_search(a, i, mid, e)
        else
            return binary_search(a, mid, j, e)

    if (j − i = 1 and e = a[i])
        return i
    else
        return −1

Figure 2.9: A recursive version of the binary search algorithm

element of a[i,j), what is left to do is search one of the halves of a[i,j). That problem can be solved recursively. Even though the difference between the two versions is not dramatic, many people find the recursive version more natural and easier to understand.

C++ implementations of the recursive algorithms display, sum and binary_search are shown in Figures 2.10, 2.11 and 2.12.

Study Questions

2.1.1. What are the three properties of a correct recursive algorithm?


template <class T>
void display(const T a[], int start, int stop)
// Displays the elements of a in the range [start,
// stop). Elements are separated by one blank space.
//
// PRECONDITION: The indices are valid indices in a.
//
// ASSUMPTION ON TEMPLATE ARGUMENT: Values of type T
// can be displayed using the << operator.
{
    if (start < stop) {
        cout << a[start] << ' ';
        display(a, start+1, stop);
    }
    // if start >= stop, do nothing
}

Figure 2.10: An implementation of the recursive algorithm display


template <class T>
T sum(const T a[], int start, int stop)
// Adds the elements of a in the range [start, stop).
// The sum is returned.
//
// PRECONDITION: The indices are valid indices in a.
//
// ASSUMPTION ON TEMPLATE ARGUMENT: Values of type T
// can be added using the + operator and 0 can be
// converted to a value of type T.
{
    if (start < stop)
        return a[start] + sum(a, start+1, stop);
    else // start >= stop
        return 0;
}

Figure 2.11: An implementation of the recursive algorithm sum


template <class T>
int binary_search(const T a[], int start, int stop,
                  const T & e)
// Performs a binary search in a for e. Returns the
// index of e in the range [start, stop). Returns -1
// if e is not found in that range.
//
// PRECONDITION: The indices are valid and the
// elements in the range [start,stop) are sorted in
// increasing order.
//
// ASSUMPTION ON TEMPLATE ARGUMENT: Values of type T
// can be compared by using the < operator.
{
    if (stop - start >= 2) {
        int middle = (start + stop) / 2;
        if (e < a[middle])
            return binary_search(a, start, middle, e);
        else
            return binary_search(a, middle, stop, e);
    }

    if (stop - start == 1 && e == a[start])
        return start;
    else
        return -1;
}

Figure 2.12: An implementation of the recursive version of the binary searchalgorithm
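As a quick check of these functions, here is one possible test program; it assumes that the templates of Figures 2.10 to 2.12 are declared above it, and the values in the comments follow from the small sorted array used below.

#include <iostream>

int main()
{
    int a[] = {11, 27, 28, 30, 36, 42, 58, 65};

    display(a, 0, 8);                                  // 11 27 28 30 36 42 58 65
    std::cout << '\n';
    std::cout << sum(a, 0, 8) << '\n';                 // 297
    std::cout << binary_search(a, 0, 8, 42) << '\n';   // 5
    std::cout << binary_search(a, 0, 8, 40) << '\n';   // -1
    return 0;
}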


Exercises

2.1.2. Verify that the recursive algorithms shown in this section satisfy the three properties of a correct recursive algorithm.

2.1.3. Write a recursive function that computes the number of occurrences of a given element in an array. The function takes as arguments the array, a start index, a stop index and an element. The function returns the number of times the element occurs in the range [start,stop).

2.1.4. Write a recursive function that finds the maximum element in a nonempty array. The function takes as arguments the array, a start index and a stop index. The function returns the maximum value that occurs in the range [start,stop).

2.1.5. Modify the function of the previous exercise so that it returns the index of the first occurrence of the maximum value.

2.1.6. Write a recursive function that takes as argument an integer n and prints the numbers n, n − 1, . . . , 3, 2, 1. The function should do nothing if n < 1.

2.1.7. Repeat the previous question but this time print the numbers in increasing order.

2.1.8. Repeat again, this time printing n, n−1, . . . , 3, 2, 1, 2, 3, . . . , n−1, n. Write a single function.

2.2 When to Use Recursion

First, why use recursion? The main advantage of recursive algorithms is thatthey can be simpler than non-recursive algorithms that solve the same problem.


This means that recursive algorithms can be easier to find and design, as well as easier to understand, implement and modify. We will soon study efficient sorting algorithms and these will be good examples where recursion makes the algorithms simpler and easier to design.

However, it is not always a good idea to use recursion. The main disadvantage of recursive algorithms is that they can generate a lot of function calls. Function calls take more time than most other operations. But for most recursive functions, the additional time taken by the recursive calls is not very significant. What is usually more important is that a recursive function always uses an amount of memory space at least proportional to the number of recursive calls. This should be clear from the sample run shown in Figure 2.3.

In general, the memory requirements of recursive functions lead to the following guidelines:

1. Try to avoid recursion if the number of recursive calls can be large.

2. Don’t use recursion if the number of recursive calls can be large and thereis a simple loop that solves the problem.

What large means depends on the context and the size of the input. But, typically, anything at least linear in the input size is considered large while anything logarithmic in the input size is considered small.

To summarize, you don’t want to use recursion if you already have a simple, efficient non-recursive algorithm that solves the problem. You want to use recursion to design an algorithm when you suspect, or hope, that it will be easier that way and that you may get a simpler algorithm. But then, once you have designed the recursive algorithm, you need to check that it doesn’t use much more memory than necessary.

In light of these comments, the recursive algorithms print, display, display_reverse and sum of the previous section are actually examples


where recursion should not be used because we have simple loops that can solve these problems using only a constant amount of memory. The case of binary search is not as clear-cut. Many would argue that the recursive version is more natural and that the extra logarithmic space shouldn’t be a problem, even for very large input sizes.

Note that there are programming languages in which recursion is the normal mechanism for creating repetition because those languages don’t have general-purpose loops. Examples of such languages are Scheme, Haskell and Prolog.¹

Study Questions

2.2.1. What is the main advantage and the main disadvantage of recursive al-gorithms?

Exercises

2.2.2. Which of the algorithms you wrote for the exercises of the previous section should not have been designed recursively?

2.3 Tail Recursion

We know that the amount of memory used by a recursive function is at least proportional to the number of recursive calls it makes. If that number is large, the algorithm will use a lot of space. In such cases, we probably want to look for a non-recursive algorithm.

¹At Clarkson, some of these languages are normally studied in the course CS341 Programming Languages.


Sometimes it is fairly easy to directly transform a recursive algorithm into a non-recursive one. A recursive algorithm is said to be tail recursive if every time it runs, at most one recursive call is made and that call is the very last action that the algorithm performs. Among the recursive algorithms we saw in this chapter, print, display(a, i, j), the first version of display_reverse and binary_search are tail recursive while display(a, n), the second version of display_reverse and sum are not.

A tail recursive algorithm can be transformed into a loop by following these three general steps:

1. Turn the recursive case of the algorithm into the body of a loop that executes until the base case is reached.

2. Replace the recursive call by statements that update the arguments of the algorithm.

3. Place the base case so it is executed after the loop terminates.

For example, applying these steps to print produces the iterative version shown in Figure 2.13. Note that this loop uses a constant amount of memory while the recursive version of print uses an amount that’s linear in n.
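As another illustration, here is one possible C++ version of the first display_reverse algorithm (Figure 2.6) with its tail recursion removed by the same three steps; the output format is just one possible choice:

#include <iostream>

template <class T>
void display_reverse(const T a[], int start, int stop)
// Displays a[start..stop) in reverse order, iteratively.
{
    // Step 1: the recursive case becomes the body of a loop.
    while (stop > start) {
        std::cout << a[stop - 1] << ' ';
        // Step 2: the recursive call display_reverse(a, start, stop - 1)
        // becomes an update of the arguments.
        --stop;
    }
    // Step 3: the base case (do nothing) is what runs after the loop.
}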

Some compilers are able to make tail recursive functions execute efficiently, as if they had been transformed into loops. In general, whenever a function calls another one at the very end of its execution, any memory space used by the calling function can be deallocated immediately because the calling function has nothing left to do. Some compilers are able to compile these “last calls” in this way. This is called last-call optimization.

Study Questions

2.3.1. What is a tail recursive algorithm?


print(n, c)

    while (n > 0)
        print c
        n = n−1

    print \n

Figure 2.13: A version of print with the tail recursion removed

2.3.2. How can a tail recursive algorithm be transformed into a loop?

2.3.3. What is last-call optimization?

Exercises

2.3.4. Transform the recursive algorithms display(a, i, j) and binary_search into loops by removing the tail recursion as described in this section.

2.3.5. Among the functions you wrote for the exercises of the first section of this chapter, which ones are tail recursive? Transform them into loops by removing the tail recursion.

2.4 Analysis of Recursive Algorithms

Consider again the recursive print algorithm from the beginning of this chap-ter. The algorithm is shown again in Figure 2.14.


print(n, c)

    if (n <= 0)
        print \n
    else
        print c
        print(n−1, c)

Figure 2.14: A recursive algorithm

This is a simple recursive algorithm that can be analyzed essentially like a loop. The algorithm makes exactly n recursive calls. The last of those calls is the base case, which runs in constant time. All of the other calls, including the original call, take constant time to perform work that’s not done by the other recursive calls (the test n <= 0 and the statement print c). Therefore, the running time T(n) of this algorithm is nΘ(1) + Θ(1), which is Θ(n).

Many other recursive algorithms cannot be analyzed so easily. In those cases, it is useful to express the running time of the algorithm in terms of the running time of the recursive calls. In the case of print, this gives the following equations:

T (n) = T (n− 1) +Θ(1), if n> 0

T (0) = Θ(1)

These equations together are called a recurrence relation because the first equation expresses the value of T(n) in terms of the value of T on a smaller argument. Recurrence relations are therefore similar to recursive algorithms and they come up naturally in the analysis of recursive algorithms.


What we need to do now is extract from the recurrence relation an equation that gives us the value of T(n) without mentioning T again. This is called solving the recurrence relation.

This particular recurrence relation is easy to solve. First, replace the asymptotics with actual functions. In the case of print, both instances of Θ(1) stand for actual constants so we have that

T (n) = T (n− 1) + b, if n> 0

T (0) = a

Then, note that the recurrence relation implies the following set of equations:

T(n) = T(n − 1) + b
T(n − 1) = T(n − 2) + b
...
T(1) = T(0) + b
T(0) = a

We can now add all of these equations to get

T(n) = bn + a
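To see why the addition produces this, note that every term T(i) with 0 ≤ i ≤ n − 1 appears once on each side and cancels:

\[
T(n) + \sum_{i=0}^{n-1} T(i) \;=\; \sum_{i=0}^{n-1} T(i) + nb + a
\qquad\Longrightarrow\qquad
T(n) = bn + a .
\]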

Therefore, T(n) is Θ(n).

With some recursive algorithms, this technique may not work. We will see examples soon when we analyze some of the fast sorting algorithms. In those cases, induction is a useful tool.

As we saw earlier, proofs by induction are the normal way to prove the cor-rectness of recursive algorithms. It turns out that they can also be used to analyze


their running times. Recall the key fact that these proofs rely on:

Theorem 2.2 (Principle of Mathematical Induction) If P(0) and, for every i ≥1, P(i − 1) implies P(i), then, for every n≥ 0, P(n).

When induction is used to prove the correctness of a recursive algorithm, P(n) is a statement about what the algorithm does. When induction is used to analyze a recursive algorithm, P(n) is normally an inequality: an upper or lower bound on the running time of the algorithm. Since this is fairly different, let’s first see an example of how induction can be used to prove an equation. Here’s a formula we have used before:

Theorem 2.3 For every n ≥ 1, $\sum_{i=1}^{n} i = n(n+1)/2$.

Proof We prove this equation by induction on n. The basis is for n = 1. In that case, $\sum_{i=1}^{n} i = 1$ and n(n + 1)/2 = 1. Therefore, the equation holds for n = 1.

The inductive step is for n ≥ 2. We need to show that if the equation holds for n − 1, then it holds for n. So suppose that the equation holds for n − 1. That is, suppose that $\sum_{i=1}^{n-1} i = (n-1)n/2$. This is called the inductive hypothesis.

We now use this assumption to show that the equation holds for n. We have

\[
\sum_{i=1}^{n} i \;=\; \sum_{i=1}^{n-1} i + n \;=\; \frac{(n-1)n}{2} + n \;=\; \frac{(n-1)n + 2n}{2} \;=\; \frac{(n+1)n}{2}
\]

Therefore, the equation holds for n and we have shown, by induction, that the equation holds for every n ≥ 1. □

Let’s now see how induction can be used to solve recurrence relations. Here’s


again the running time of print:

T (n) = T (n− 1) + b, if n> 0

T (0) = a

Note that it is still necessary to remove asymptotics from the recurrence relation.

To use induction to solve a recurrence relation, we need a candidate solution. In this case, we already know that T(n) is Θ(n). So let’s use induction to show that T(n) ≤ cn for every n ≥ 0. (A similar argument can be used to show a matching lower bound. An exercise asks you to work out the details.)

Note that we don’t know what the value of c is. This is something we will discover while doing the proof. This may seem a bit backwards but it’s fine: as we do the proof, we will discover what conditions c must satisfy for the proof to work. Then, at the end, we can choose a value for c that meets those conditions.

The basis is for n = 0. We need to show that the inequality T(n) ≤ cn holds when n = 0. But that’s not possible since, in that case, cn = 0 and T(0) > 0 because it’s a running time.

This is something that can happen when using induction to find an asymptotic solution to a recurrence relation. The way to fix this is to adjust our goal: let’s prove that T(n) ≤ cn but only for every n ≥ 1. That will still allow us to conclude that T(n) is O(n).

The basis is now for n = 1. We want T(1) ≤ c · 1 = c. This will be true as long as we choose c to be at least T(1). This is our first condition on the value of c.

The inductive step is for n ≥ 2. Suppose that the bound holds for n − 1: T(n − 1) ≤ c(n − 1). Then,

T (n) = T (n− 1) + b ≤ c(n− 1) + b = cn− c + b


Therefore, to show that T (n)≤ cn, all we need to do is show that

cn− c + b ≤ cn

This is equivalent to c ≥ b. This is our second condition on c.

So we choose c = max(T(1), b). With this value, the basis and inductive step both work and we have a proof, by induction, that T(n) ≤ cn for every n ≥ 1.

As we said earlier, to use induction to solve a recurrence relation, we first need a candidate solution. This candidate can be a guess based on intuition, or a guess based on a rough analysis (we’ll see an example of that in the next chapter), or it can be a shot in the dark. But then, what happens if we guess wrong?

For example, consider again the recurrence relation

T (n) = T (n− 1) + b, if n> 0

T (0) = a

We now know that T(n) is Θ(n). What if we tried to use induction to prove that T(n) ≤ c log n? Since T(n) is actually Ω(n), this upper bound is not valid and the proof should fail. Let’s see exactly how.

So we’re trying to show that T(n) ≤ c log n. We will do this for every n ≥ 2 because when n = 1, log n = 0.

The basis is for n = 2. We want T(2) ≤ c log 2 = c. So for the basis to work, all we need is c ≥ T(2). (You may be starting to suspect that the basis always works. And it’s true: we can always choose c to make the basis work. In fact, many people will simply skip the basis and focus on the inductive step.)

Now, the inductive step is for n ≥ 3. Suppose that the bound holds for n − 1:


T (n− 1)≤ c log(n− 1). Then,

T (n) = T (n− 1) + b ≤ c log(n− 1) + b

Therefore, to show that T (n)≤ c log n, all we need to do is show that

c log(n− 1) + b ≤ c log n

And at this point, we get stuck. Because this would mean that for every n ≥ 3,

log n − log(n − 1) ≥ b/c

Intuitively, this is hard to believe since log n − log(n/2) = 1, which seems to imply that log n − log(n − 1) is about 1/(n/2) = 2/n. In fact, it can be shown that log n − log(n − 1) is o(1), which implies that it cannot be bounded below by a constant such as b/c.
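One quick way to make this precise is the following small calculation (with logarithms in base 2, and using ln(1 + x) ≤ x):

\[
\log n - \log(n-1) \;=\; \log\!\left(1 + \frac{1}{n-1}\right) \;\le\; \frac{1}{(n-1)\ln 2},
\]

which tends to 0 as n grows, so it cannot stay above any fixed constant b/c.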

So this is how the proof of an invalid bound fails: we are unable to find a value for c to make the inductive step work. Note that the fact that our proof of T(n) ≤ c log n failed does not mean that the upper bound is not valid. It may be that we need to use a different technique. Or stronger math. The only way to know that the upper bound is not valid is to prove a larger lower bound, such as T(n) ≥ cn.

Exercises

2.4.1. Consider again the recurrence relation of this section:

T (n) = T (n− 1) + b, if n> 0

T (0) = a


Use induction to show that T (n) is Ω(n).

2.4.2. Consider the recurrence relation

T (n) = T (n− 1) + bn, if n≥ 1

T (0) = a

Find an asymptotic solution to this recurrence relation. Do it first by “writing out” the recurrence relation and then by using induction. (When using induction, do both the upper and lower bound.)

Chapter 3

Sorting

Two of the most frequent operations performed on a collection of data are to search the collection for a particular element and to sort the data by arranging the elements in some order. And these two operations are related: as we have already seen, sorted data can be searched much more quickly by using algorithms such as the binary search. In this chapter, we will learn four sorting algorithms. The first two are simple but inefficient. The last two, mergesort and quicksort, are more complicated but much more efficient. These two algorithms are also good examples of the usefulness of recursion.

3.1 Selection Sort

Our first sorting algorithm is called selection sort. The idea is simple: find the largest element of the array and move it to the last position. Then repeat for the rest of the array. The algorithm is shown in Figure 3.1. A sample run of the algorithm is illustrated in Figure 3.2. The first line shows the initial contents of the array. The second line shows the result of the swap and the subarray that will


selection_sort(a, i, j)

    if (j − i >= 2)
        k = index of maximum element in a[i, j)
        swap a[k] and a[j−1]
        selection_sort(a, i, j−1)

Figure 3.1: The selection sort algorithm

[60 12 37 42 25 38 16]
[16 12 37 42 25 38] 60
[12 16 25 37 38 42] 60
[12 16 25 37 38 42 60]

Figure 3.2: A run of the selection sort algorithm (top level of recursion)


[60 12 37 42 25 38 16]
[16 12 37 42 25 38] 60
[16 12 37 38 25] 42 60
[16 12 37 25] 38 42 60
[16 12 25] 37 38 42 60
[16 12] 25 37 38 42 60
[12] 16 25 37 38 42 60
[12] 16 25 37 38 42 60
[12 16] 25 37 38 42 60
[12 16 25] 37 38 42 60
[12 16 25 37] 38 42 60
[12 16 25 37 38] 42 60
[12 16 25 37 38 42] 60
[12 16 25 37 38 42 60]

Figure 3.3: A run of the selection sort algorithm (entire recursion)

be recursively sorted. The third line shows the result of sorting that subarray. The fourth line shows the final contents of the array.

Figure 3.3 illustrates the same run of selection sort but this time, the entire recursion is shown, not just the top level. The first half of the lines shows the contents of the array after each swap, before the recursive call. The second half shows these arrays at the end of the recursive calls. The portion of the array being sorted by the current recursive call is shown between the two brackets.

Since selection sort is tail recursive, we can easily turn it into a more efficient loop, as shown in Figure 3.4. An implementation of selection sort for arrays is shown in Figure 3.5. Note the use of the STL generic algorithms swap and max_element.

The analysis of selection sort is simple. Consider first the iterative version, which is the one we would use in practice. Let n be the total number of elements


selection_sort(a, i, j)

    while (j − i >= 2)
        k = index of maximum element in a[i, j)
        swap a[k] and a[j−1]
        −−j

Figure 3.4: An iterative version of the selection sort algorithm

template <class T>
void selection_sort(T a[], int start, int stop)
// Sorts elements in a in increasing order using the
// selection sort algorithm. Sorts elements in the
// range [start, stop). Sorts according to the <
// operator.
//
// PRECONDITION: The indices are valid and start occurs
// before stop.
//
// ASSUMPTION ON TEMPLATE ARGUMENT: Values of type T
// can be compared using the < operator.
{
    while (stop - start > 1) {
        auto itr_max =
            std::max_element(a + start, a + stop);
        std::swap(*itr_max, a[stop - 1]);
        --stop;
    }
}

Figure 3.5: An implementation of selection sort for arrays


to be sorted (that’s the initial value of j − i). The algorithm is a loop but the running time of the body of that loop depends on the size of the interval currently being sorted. That’s because finding the maximum element in a[i, j) requires a second loop that examines every element in the interval.

Let m be the size of the interval currently being sorted; that is, let m be the current value of j − i. Then the body of the loop runs in time am + b, for some constants a and b. Since m varies from n to 2, the running time of the algorithm is

\[
T(n) = \sum_{m=2}^{n} (am + b)
\]

As we have seen before, this summation is Θ(n²). Therefore, selection sort runs in quadratic time.
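For concreteness, the summation can be evaluated directly:

\[
\sum_{m=2}^{n} (am + b) \;=\; a\left(\frac{n(n+1)}{2} - 1\right) + b\,(n-1) \;=\; \frac{a}{2}\,n^2 + \Theta(n),
\]

which is indeed Θ(n²).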

Now, for the sake of the exercise, let’s analyze the recursive version of selection sort. If A is an array of size n ≥ 2, the running time of selection sort on A is given by the recurrence

T(A) = T(A_{n−1}) + Θ(n)

where A_{n−1} is the array that consists of the first n − 1 elements of A and the Θ(n) term is the total running time of all the operations except for the recursive call.¹

Now, let T(n) be the worst-case running time of selection sort. The above recurrence implies that when n ≥ 2,

T(n) ≤ T(n − 1) + Θ(n)

As usual, to solve this recurrence relation, we first remove the asymptotics. The

¹In the case of selection sort, these other operations are easy to analyze and we know that their total running time is not just Θ(n) but actually of the form an + b. But things are not always this simple so, for the sake of practicing, we’ll start with a recurrence relation that contains asymptotics.


Θ(n) term stands for a function f(n) that satisfies the following: there exist a, b > 0 and n_0 such that for every n ≥ n_0, an ≤ f(n) ≤ bn. Therefore, when n is greater than or equal to both 2 and n_0, we have that

T(n) ≤ T(n − 1) + bn

Let n_1 = max(2, n_0). Then the above recurrence is valid for every n ≥ n_1.

We can now write out the recurrence relation:

T(n) ≤ T(n − 1) + bn
T(n − 1) ≤ T(n − 2) + b(n − 1)
...
T(n_1) ≤ T(n_1 − 1) + bn_1

(Note how we stop at n_1. That’s because we don’t know if the inequality holds for values of n smaller than n_1.) Adding all these inequalities gives us

\[
T(n) \;\le\; T(n_1 - 1) + \sum_{i=n_1}^{n} b\,i \;\le\; T(n_1 - 1) + b \sum_{i=1}^{n} i
\]

Since the value of the last summation is Θ(n²), this implies that the worst-case running time of selection sort is O(n²).

To show that the worst-case running time of selection sort is also Ω(n²), we could find, for each n, an array that causes the algorithm to take that much time. But in the case of selection sort, we can show much more. The recurrence

T(A) = T(A_{n−1}) + Θ(n)


implies that the best-case running time satisfies the following recurrence:

T_best(n) ≥ T_best(n − 1) + Θ(n)

This recurrence is identical to the one for the worst-case running time except that it is a lower bound instead of an upper bound. The recurrence can be solved in essentially the same way to show that the best-case running time of selection sort is Ω(n²). This implies that the running time of selection sort on any array of size n is Θ(n²). In particular, the best-case, average and worst-case running times of selection sort are all Θ(n²).

Exercises

3.1.1. Run the iterative version of the selection sort algorithm on an array con-taining the following elements:

12 37 25 60 16 42 38

Use brackets to indicate the portion of the array currently being sorted, asin the first half of Figure 3.3.

3.1.2. Rewrite the implementation of selection sort given in this section so thatthe arguments of the function are iterators instead of indices, just as theynormally are in the STL.

3.1.3. Show that the best-case running time of selection sort is Ω(n2).


insertion_sort(a, i, j)

    if (j − i >= 2)
        insertion_sort(a, i, j−1)
        insert a[j−1] into a[i..j−1)

Figure 3.6: The insertion sort algorithm

[60 12 37 42 25 38 16]
[60 12 37 42 25 38] 16
[12 25 37 38 42 60] 16
[12 16 25 37 38 42 60]

Figure 3.7: A run of the insertion sort algorithm (top level of recursion)

3.2 Insertion Sort

In the recursive version of the selection sort algorithm, the recursive sorting of the subarray is done at the end. What if we tried to do it at the beginning? After the recursive call, we would only have to move the last element of the array to its correct position. This gives us a sorting algorithm called insertion sort, which is shown in Figure 3.6. Note that inserting a[j−1] into a[i, j−1) causes one of the elements of the subarray to overflow onto index j−1.

Figure 3.7 illustrates a run of insertion sort. The first line shows the initial contents of the array. The second line shows the subarray that will be recursively sorted. The third line shows the result of sorting that subarray. The fourth line shows the final contents of the array, after the insertion of the last element into the sorted subarray.


[60 12 37 42 25 38 16]
[60 12 37 42 25 38] 16
[60 12 37 42 25] 38 16
[60 12 37 42] 25 38 16
[60 12 37] 42 25 38 16
[60 12] 37 42 25 38 16
[60] 12 37 42 25 38 16
[60] 12 37 42 25 38 16
[12 60] 37 42 25 38 16
[12 37 60] 42 25 38 16
[12 37 42 60] 25 38 16
[12 25 37 42 60] 38 16
[12 25 37 38 42 60] 16
[12 16 25 37 38 42 60]

Figure 3.8: A run of the insertion sort algorithm (entire recursion)

Figure 3.8 illustrates the same run of insertion sort but this time, the entire recursion is shown, not just the top level. The first half of the lines shows the contents of the array at the beginning of every recursive call. The second half shows these arrays at the end of the recursive calls. The portion of the array being sorted by the current recursive call is shown between the two brackets.

The insertion sort algorithm is not tail recursive. So we cannot use the standard technique to easily turn the recursion into a loop. But we can design an iterative version of insertion sort by focusing on the second half of the recursion, that is, on the work that is done after we return from the recursive calls. Essentially, this means writing a loop that performs insertions as in the second half of Figure 3.8. The resulting algorithm is shown in Figure 3.9.

Figure 3.10 shows that same algorithm with the insertion step expanded into pseudocode. The while loop shifts elements to the right as long as they are


insertion_sort(a, i, j)

    for k = i+1 to j−1
        insert a[k] into a[i..k)

Figure 3.9: An iterative version of the insertion sort algorithm

insertion_sort(a, i, j)

    for k = i+1 to j−1
        // insert a[k] into a[i..k)
        x = a[k]
        p = k−1
        while (p >= i and x < a[p])
            a[p+1] = a[p]
            −−p
        a[p+1] = x

Figure 3.10: An iterative version of the insertion sort algorithm

larger than x, the element to be inserted. The loop stops if x is greater than or equal to a[p]. In that case, x needs to be copied to a[p+1]. The loop also stops if p ever reaches i−1. This happens if x is smaller than every element in a[i..k). In that case, x needs to be copied to a[i], which happens to be the same element as a[p+1].

The analysis of insertion sort is a little more complicated than that of selection sort. Consider the iterative version of insertion sort. On any given array, the running time of insertion sort is essentially the total running time of all the


insertions. But the running time of the insertions depends on how far a[k] needs to move into a[i..k). If r is the number of positions that a[k] must move left, then the running time of the insertion is of the form ar + b.

Let m be the size of a[i..k). The maximum value of r is m. Therefore, since m varies from 2 to n − 1, the running time of insertion sort on any array is bounded above by

\[
\sum_{m=2}^{n-1} (am + b)
\]

Once again, this summation is Θ(n²), which implies that the running time of insertion sort on any array is O(n²).

Now, is there an array that causes insertion sort to take that much time? The answer is yes: an array containing distinct elements sorted in decreasing order. With this particular array, in every insertion, a[k] is smaller than every element in a[i..k) so it needs to move all the way to the beginning. This implies that each insertion runs in time am + b and the running time of insertion sort on this array is Θ(n²). Therefore, the worst-case running time of insertion sort is Θ(n²).

Let’s now do a best-case analysis. In each insertion, the minimum number of positions that a[k] may need to move to the left is 0. In that case, each insertion takes time b. Therefore, the running time of insertion sort on any array is bounded below by

\[
\sum_{m=2}^{n-1} b
\]

This summation is Θ(n), which implies that the running time of insertion sort on any array is Ω(n).

Is there an array that causes insertion sort to take this little time? Once again, the answer is yes: a sorted array. On a sorted array, in every insertion, a[k] is greater than or equal to every element in a[i..k). This implies that a[k]


does not need to move and each insertion runs in time b. Therefore, the running time of insertion sort on a sorted array is Θ(n) and the best-case running time of insertion sort is Θ(n).

Earlier in these notes, we said that the best-case running time of an algorithm is not very useful. That’s usually the case but insertion sort is an exception. That’s because in practice, data that needs to be sorted is often already at least partially sorted. Selection sort always runs in quadratic time, even if the array is sorted, but insertion sort is able to take advantage of order and run faster on partially sorted arrays.

Since the best-case and worst-case running times of insertion sort are not the same, let’s do an average-case analysis. To keep things simple, let’s consider only arrays containing distinct elements. We will assume that each possible array is equally likely.

We need to determine the average value of r, that is, the average number of positions that a[k] needs to move left into a[i..k). Consider the set of elements present in a[i..k] (including a[k]). Since these elements are equally likely to be in any order, a[k] is equally likely to be the smallest element, or the second smallest, or the third smallest, and so on. In other words, r is equally likely to be any number from 0 to m. Therefore, the average value of r is

\[
\frac{1}{m+1} \sum_{r=0}^{m} r \;=\; \frac{1}{m+1} \cdot \frac{m(m+1)}{2} \;=\; \frac{m}{2}
\]

This implies that the average running time of each insertion is a(m/2) + b, so the average running time of insertion sort is

\[
\sum_{m=2}^{n-1} \left( \frac{am}{2} + b \right)
\]


Therefore, the average running time of insertion sort is Θ(n²). Asymptotically, this is the same as in the worst case but, in practice, we can reasonably expect the average running time to be half of the worst case.
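To see where this factor of one half comes from, the average-case summation can be evaluated roughly:

\[
\sum_{m=2}^{n-1} \left( \frac{am}{2} + b \right) \;=\; \frac{a}{2}\left(\frac{(n-1)n}{2} - 1\right) + b\,(n-2) \;\approx\; \frac{a}{4}\,n^2,
\]

compared with roughly (a/2)n² for the worst-case summation.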

Exercises

3.2.1. Run the iterative version of insertion sort on an array containing the fol-lowing elements:

12 37 25 60 16 42 38

Use brackets to indicate the portion of the array that’s already sorted, asin the second half of Figure 3.8.

3.2.2. Implement the iterative version of the insertion sort algorithm.

3.3 Mergesort

It is possible to design a faster sorting algorithm by considering that insertion sort, like a recursive version of the sequential search algorithm, recurses on a subarray that’s only one smaller than the initial array. The binary search algorithm, on the other hand, divides the array in half. This reduces the number of levels in the recursion to Θ(log n) and this reduction is the key factor in the speed of the algorithm. If we could design a sorting algorithm that divides the array in half, the depth of the recursion would be Θ(log n) and this may allow the algorithm to run in time Θ(n log n).

Figure 3.11 presents such an algorithm. It’s called mergesort. After recursively sorting the two halves of the array, the algorithm merges these sorted


mergesort(a, i, j)

    if (j − i >= 2)
        mid = floor((i + j) / 2)
        mergesort(a, i, mid)
        mergesort(a, mid, j)
        merge a[i..mid) and a[mid..j)

Figure 3.11: The mergesort algorithm

[60 12 37 42 25 38 16]
[60 12 37] [42 25 38 16]
[12 37 60] [16 25 38 42]
[12 16 25 37 38 42 60]

Figure 3.12: A run of the mergesort algorithm (top level of recursion)

arrays back into a single sorted array. Note that this is our first example of a recursive algorithm that makes two recursive calls.

Figure 3.12 illustrates a run of mergesort. The first line shows the initial contents of the array. The second line shows the two subarrays that will be recursively sorted. The third line shows the result of sorting these subarrays. The fourth line shows the final contents of the array, after the merging of the two sorted subarrays.

Figure 3.13 illustrates that same run of mergesort but it shows the entire recursion, not just the top level. The first half of the lines show the initial contents of the array at each level of the recursion, as if all the recursive calls at that level were executed simultaneously. The second half of Figure 3.13 shows, for each


[60 12 37 42 25 38 16]
[60 12 37] [42 25 38 16]
[60] [12 37] [42 25] [38 16]
[60] [12] [37] [42] [25] [38] [16]
[60] [12 37] [25 42] [16 38]
[12 37 60] [16 25 38 42]
[12 16 25 37 38 42 60]

Figure 3.13: A run of the mergesort algorithm (entire recursion)

level of the recursion, the final contents of the array, after the merging step.

It is possible to describe mergesort non-recursively, but this requires more effort and the resulting algorithm is more complicated and no more efficient than the recursive version. Therefore, mergesort is an excellent example of the usefulness of recursion.

The performance of mergesort relies on an efficient merging algorithm. Two sorted arrays can be merged by repeatedly choosing the smallest among the leading elements of both subarrays. A run of this algorithm is illustrated in Figure 3.14. The algorithm is described in Figure 3.15. It is clear that the running time of the algorithm is Θ(n), where n is the total size of the arrays to be merged.

Note that mergesort is an example of a divide-and-conquer algorithm. The idea is that the initial problem is divided into smaller problems, these problems are solved and then the solutions to these subproblems are combined to produce a solution to the initial problem.

An implementation of mergesort for arrays is shown in Figure 3.16. This implementation uses the STL generic algorithm inplace_merge. Figures 3.17 and 3.18 show a possible implementation of the merging algorithm.

We now analyze the running time of mergesort. If A is an array of size n ≥ 2,


First array        Second array     Resulting array
[12 37 42 60]      [16 25 38]       []
[37 42 60]         [16 25 38]       [12]
[37 42 60]         [25 38]          [12 16]
[37 42 60]         [38]             [12 16 25]
[42 60]            [38]             [12 16 25 37]
[42 60]            []               [12 16 25 37 38]
[]                 []               [12 16 25 37 38 42 60]

Figure 3.14: A run of the merging algorithm

the running time of mergesort on A is given by the recurrence

T(A) = T(L_{⌊n/2⌋}) + T(R_{⌈n/2⌉}) + Θ(n)

where L_{⌊n/2⌋} is the array that consists of the first ⌊n/2⌋ elements of A, R_{⌈n/2⌉} is the array that consists of the last ⌈n/2⌉ elements of A and the Θ(n) term is the total running time of all the operations except for the recursive calls. This includes the merging step.

If T(n) is the worst-case running time of mergesort, then the above recurrence implies that when n ≥ 2,

T(n) ≤ T(⌊n/2⌋) + T(⌈n/2⌉) + Θ(n)

Floors and ceilings make it difficult to solve a recurrence by writing it out, as we did when analyzing the recursive version of selection sort. But to use induction, we first need a candidate solution.

One way to get such a candidate is to perform a rough analysis based on some simplifying assumptions. In this case, we’re going to assume that n is a


merge(a, i, j, k)
    // merge sorted subarrays a[i..j) and a[j..k)

    p = i    // index of next element in a[i..j)
    q = j    // index of next element in a[j..k)

    b = array of size k−i    // to store result of merging
    r = 0                    // index of next available position in b

    while (p < j and q < k)
        if (a[p] < a[q])
            b[r] = a[p]
            ++p
        else
            b[r] = a[q]
            ++q
        ++r

    // at least one of the subarrays has been
    // completely copied to b
    copy a[p..j) to b[r..k−i)
    copy a[q..k) to b[r..k−i)

    copy b to a[i..k)

Figure 3.15: An algorithm for the merging step of mergesort


template <class T>
void mergesort(T a[], int start, int stop)
// Sorts elements in a in increasing order using the
// mergesort algorithm. Sorts elements in the range
// [start,stop). Sorts according to the < operator.
//
// PRECONDITION: The indices are valid and start occurs
// before stop.
//
// ASSUMPTION ON TEMPLATE ARGUMENT: Values of type T
// can be compared using the < operator.
{
    if (stop - start > 1) {
        int middle = (start + stop) / 2;

        mergesort(a, start, middle);
        mergesort(a, middle, stop);

        std::inplace_merge(a + start, a + middle,
                           a + stop);
    }
}

Figure 3.16: An implementation of mergesort for arrays


template <class T>
void merge(T a[], int start, int middle, int stop)
// Merges two sorted, consecutive subarrays. Runs in
// linear time.
//
// PRECONDITION: The indices are valid and occur in
// the following order: start, middle, stop. The
// elements in each of the ranges [start,middle) and
// [middle,stop) are sorted in increasing order.
// There is enough memory available for allocating
// stop - start values of type T.
//
// POSTCONDITION: The elements in the range [start,
// stop) are sorted in increasing order according to
// the < operator.
//
// ASSUMPTION ON TEMPLATE ARGUMENT: Values of type T
// can be compared using the < operator.
{
    int i1 = start;   // index of next element to be
                      // considered in first subarray
    int i2 = middle;  // same for second subarray

    T * result = new T[stop - start];
        // temporary array to store merged elements

    int j = 0;  // index of next available position in
                // result

    ...

Figure 3.17: An implementation of the merging algorithm (part 1 of 2)


template <class T>
void merge(T a[], int start, int middle, int stop)
{
    ...

    while (i1 < middle && i2 < stop) {
        if (a[i1] < a[i2]) {
            result[j] = a[i1];
            ++i1;
        }
        else {
            result[j] = a[i2];
            ++i2;
        }
        ++j;
    }

    // At this point, at least one of the two
    // subarrays has been completely copied to result.

    // Copy the rest of the first subarray to result.
    std::copy(a + i1, a + middle, result + j);

    // Copy the rest of the second subarray to result.
    std::copy(a + i2, a + stop, result + j);

    // Copy result to original array
    std::copy(result, result + (stop - start),
              a + start);

    delete [] result;
}

Figure 3.18: An implementation of the merging algorithm (part 2 of 2)


power of 2 so that the size of the arrays is always even and, therefore, floors and ceilings are not needed. We’re also going to assume that the Θ(n) term is bounded above by bn for every n ≥ 2, not just when n ≥ n_0. Then the recurrence becomes

T(n) ≤ 2T(n/2) + bn

We could solve this recurrence relation by writing it out and using a trick. Instead, let’s use another technique that doesn’t require tricks. This technique is also useful for some messy recurrences that are hard to write out and hard to solve by induction. (We’ll see an example in the next section.)

The idea is to represent the execution of mergesort as a tree. Each node in this tree corresponds to one of the recursive calls of mergesort. The top node of this recursion tree corresponds to the initial call, whose input is an array of size n. All the other nodes correspond to calls to mergesort on smaller arrays. For example, the top node has two children that each correspond to a call to mergesort on an array of size n/2. The bottom nodes of the tree correspond to the base cases of mergesort, that is, calls on arrays of size 1. These nodes don’t have children because they don’t generate further recursive calls. We call these nodes the leaves of the tree. The other nodes are called internal nodes.

Now, associate to each node the running time of the corresponding call to mergesort minus the running time of the recursive calls. In the case of an internal node, this is essentially the running time of the merging step. In the case of a leaf, this is the running time of a base case. The running time of mergesort is the sum of the running times associated to every node in the tree. So we want to compute this total.

At the top level of the tree, which we call level 0, we have one node that corresponds to an array of size n. The running time associated to this node is at most bn. At the next level, level 1, we have 2 nodes that correspond to arrays of size n/2. The total time associated with these nodes is at most 2b(n/2) = bn.


At level 2, we have 4 nodes that correspond to arrays of size n/4. The total time associated with these nodes is at most 4b(n/4) = bn. It should be clear that this pattern will continue and that the total time associated with each of the levels that contain internal nodes will be at most bn. The bottom level, which contains the leaves of the tree, is different. At that level, we have n nodes corresponding to arrays of size 1. The total running time associated with these nodes is an, where a = T(1).

To compute the total time associated with the nodes in the tree, we need to know how many levels of internal nodes it contains. These levels correspond to arrays of size n, n/2, n/4, . . . , 2. We’re assuming that n is a power of 2, say n = 2^k. Since n/2 = 2^(k−1), n/4 = 2^(k−2), . . . , 2 = 2^1, there are k levels of internal nodes. Therefore, T(n) ≤ kbn + an. Since k = log n, this implies that T(n) is O(n log n).
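To summarize the computation: level i, for 0 ≤ i ≤ k − 1, contains 2^i internal nodes, each associated with time at most b(n/2^i), and the leaf level contributes an, so

\[
T(n) \;\le\; \sum_{i=0}^{k-1} 2^i \cdot b\,\frac{n}{2^i} \;+\; a\,n \;=\; k\,b\,n + a\,n \;=\; b\,n\log n + a\,n .
\]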

Now that we have a candidate upper bound, we can remove our simplifying assumptions and use induction to solve the original recurrence relation:

T(n) ≤ T(⌊n/2⌋) + T(⌈n/2⌉) + Θ(n), if n ≥ 2

As usual, the first step is to remove the asymptotics. We know that there exist b > 0 and n_0 such that for every n ≥ n_0, the Θ(n) term is bounded above by bn. Therefore, when n is greater than both 2 and n_0, we have that

T(n) ≤ T(⌊n/2⌋) + T(⌈n/2⌉) + bn

Let’s try to show that T(n) ≤ cn log n for every n ≥ 2. We start at 2 because when n = 1, log n = 0 and running times are positive.

In the induction step, we will have that n ≥ 3 since the case n = 2 will be covered by the basis. But it is also convenient to have n ≥ n_0 in the inductive step. This way, the inductive step will be based on the above inequality and we


will avoid having to do a special case for when n < n_0. So let n_1 = max(3, n_0). The inductive step will be for n ≥ n_1. This implies that the inductive basis needs to cover all the values of n from 2 to n_1 − 1.

So the basis is for n ∈ [2, n_1 − 1]. We want T(n) ≤ cn log n. This will be true as long as we choose c to be greater than or equal to all the values T(n)/(n log n) for n ∈ [2, n_1 − 1]. In other words,

c ≥ max{ T(n)/(n log n) | n ∈ [2, n_1 − 1] }

The inductive step is for n ≥ n_1. Suppose that

T(n − 1) ≤ c(n − 1) log(n − 1)

This is our inductive hypothesis. We have that

T(n) ≤ T(⌊n/2⌋) + T(⌈n/2⌉) + bn

And now we have two problems. First, when n ≥ 3, this inequality involves values T(k) with k < n − 1, so we can’t apply the inductive hypothesis on those values. Second, when n = 3, this inequality involves T(1) and the basis starts at 2. In fact, we’re not even trying to prove an upper bound on T(1). We could deal with T(1) as a constant but this would create a special case for n = 3.

To address the first problem, we will use strong induction. The inductive hypothesis will then be that T(k) ≤ ck log k for every k ∈ [2, n − 1].

To address the second problem, we need to ensure that ⌊n/2⌋ ≥ 2. This is true if n ≥ 4. So we will make sure that in the inductive step, n ≥ 4. The easiest way to do this is to redefine n_1 = max(4, n_0).

So, back to the inductive step. It is still for n ≥ n_1. Assume that T(k) ≤ ck log k for every k ∈ [2, n − 1]. Then, since ⌊n/2⌋ ≥ 2 and ⌈n/2⌉ < n, we have


that

T(n) ≤ c⌊n/2⌋ log⌊n/2⌋ + c⌈n/2⌉ log⌈n/2⌉ + bn
     ≤ c⌊n/2⌋ log⌈n/2⌉ + c⌈n/2⌉ log⌈n/2⌉ + bn
     = c(⌊n/2⌋ + ⌈n/2⌉) log⌈n/2⌉ + bn
     = cn log⌈n/2⌉ + bn

It is easy to show that when n ≥ 2, ⌈n/2⌉ ≤ (2/3)n (indeed, ⌈n/2⌉ ≤ (n + 1)/2 ≤ (2/3)n when n ≥ 3, and ⌈2/2⌉ = 1 ≤ 4/3). Therefore,

T(n) ≤ cn log((2/3)n) + bn = cn log n + cn log(2/3) + bn

This is at most cn log n if cn log(2/3) + bn ≤ 0. Note that since 2/3 < 1, log(2/3) is negative. Therefore, this last inequality is equivalent to −cn log(3/2) + bn ≤ 0, that is, cn log(3/2) ≥ bn. For this to hold, all we need to do is choose c ≥ b/log(3/2). This completes the inductive step and our proof that T(n) ≤ cn log n. Therefore, the worst-case running time of mergesort is O(n log n).

A similar analysis shows that the best-case running time of mergesort isΩ(n log n). Therefore, the running time of mergesort on any array of size nis Θ(n log n).

Exercises

3.3.1. Run mergesort on an array containing the following elements:

22 37 25 60 16 42 38 46 19

Show the top level of the recursion, as in Figure 3.12. Then show theentire recursion, as in Figure 3.13.


3.3.2. Run the merging algorithm on the following two sorted arrays:

First array          Second array
[16 22 25 37 60]     [19 38 42 46]

Illustrate this run as in Figure 3.14.

3.3.3. Use induction to prove that the best-case running time of mergesort isΩ(n log n).

3.4 Quicksort

Any sorting algorithm must examine every element in the input array so the running time of any sorting algorithm is Ω(n). (Note that this statement applies to every sorting algorithm, not just one of them. It is a statement about the computational complexity of the sorting problem itself, not about the running time of a particular algorithm.)

Is it possible to sort in time Θ(n)? Anything smaller than Θ(n log n), meaning anything o(n log n)? If the elements to be sorted are small numbers, then there are algorithms such as counting sort that can sort in linear time (in exchange for possibly using a lot of memory). We will cover counting sort later in these notes.

But a generic sorting algorithm should make as few assumptions as possible about the elements to be sorted. One possibility is to do what the STL does and only require that elements be comparable (with the < operator, for example). Such sorting algorithms are called comparison-based. Selection sort, insertion sort and mergesort are all examples of comparison-based sorting algorithms. Counting sort, on the other hand, is not comparison-based.
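To make this concrete, here is a small C++ sketch showing that std::sort, a comparison-based STL algorithm, only needs elements to support the < operator. The Date type is an invented example, not something from these notes.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// A user-defined type becomes sortable by a comparison-based
// algorithm as soon as it provides operator<.
struct Date {
    int year, month, day;
    bool operator<(const Date& other) const {
        if (year != other.year) return year < other.year;
        if (month != other.month) return month < other.month;
        return day < other.day;
    }
};

int main() {
    std::vector<std::string> words = {"pear", "apple", "orange"};
    std::sort(words.begin(), words.end());   // uses operator< on strings

    std::vector<Date> dates = {{2019, 5, 1}, {2018, 12, 25}, {2019, 1, 15}};
    std::sort(dates.begin(), dates.end());   // uses Date::operator<

    for (const auto& w : words) std::cout << w << " ";
    std::cout << "\n";
    for (const auto& d : dates)
        std::cout << d.year << "-" << d.month << "-" << d.day << " ";
    std::cout << "\n";
    return 0;
}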

So, is it possible for a comparison-based sorting algorithm to run in time Θ(n)? Anything o(n log n)? Later in these notes, we will show that the answer is no: the running time of every comparison-based sorting algorithm is Ω(n log n). Even on average. This means that, asymptotically, no comparison-based sorting algorithm can run faster than mergesort.

However, there is a sorting algorithm that in practice tends to run faster than mergesort. (Of course, it can only be faster by a constant factor.) We have studied three sorting algorithms so far: selection sort, insertion sort and mergesort. All three algorithms work by dividing the array in two parts, recursively sorting one or two subarrays and combining these sorted subarrays. In the case of selection sort and insertion sort, the array is divided very unevenly into a single element on one side and all other elements on the other side. This leads to quadratic running times. Mergesort achieves Θ(n log n) by dividing the array as evenly as possible.

It is interesting to note, however, that insertion sort and mergesort have something in common: both algorithms divide the array quickly and then spend most of the effort combining the two sorted subarrays. In the case of insertion sort, one of these subarrays is just a single element. Selection sort, on the other hand, spends most of the effort dividing the array and essentially no effort combining the resulting sorted subarray with the element that was set aside. Can we design an algorithm that divides the array as evenly as possible, like mergesort, but that spends most of the effort dividing the array, like selection sort, so that essentially no effort is required to combine the two sorted halves?

A key observation is that combining two sorted halves would be trivial if all the elements in one half were smaller than all the elements in the other half. This observation leads to the quicksort algorithm, which is shown in Figure 3.19. First, an array element is chosen to play the role of a pivot. Second, the array is divided according to that pivot: all elements smaller than the pivot are moved to the left, all the others are moved to the right. Note that this may cause the pivot to move. Third, the two subarrays are recursively sorted. After that, there is nothing left to do.

quicksort(a, i, j)
    if (j − i >= 2)
        choose a pivot in a[i..j)
        partition a[i..j) around the pivot
        k = new index of the pivot
        quicksort(a, i, k)
        quicksort(a, k+1, j)

Figure 3.19: The quicksort algorithm

[60 12 37 42 25 38 16]
[12 37 25 38 16] 42 [60]
[12 16 25 37 38] 42 [60]
[12 16 25 37 38 42 60]

Figure 3.20: A run of the quicksort algorithm (top level of recursion)

Figure 3.20 illustrates a run of quicksort. The first line shows the initial contents of the array. The second line shows the two subarrays that will be recursively sorted. Element 42 was used as pivot. The third line shows the result of sorting these subarrays. The fourth line shows the final contents of the array.

Note that if the pivot ends up being the smallest or largest element of the array, then one of the subarrays will be empty while the other will contain the other elements. This is the reason why the pivot is excluded from the subarrays: it ensures that the recursive calls make progress towards the base case even if the pivot ends up being the smallest or largest element of the array.


[60 12 37 42 25 38 16]
[12 37 25 38 16] 42 [60]
[12] 16 [37 25 38] 42 [60]
[12] 16 [25] 37 [38] 42 [60]
[12] 16 [25 37 38] 42 [60]
[12 16 25 37 38] 42 [60]
[12 16 25 37 38 42 60]

Figure 3.21: A run of the quicksort algorithm (entire recursion)

Figure 3.21 illustrates that same run of quicksort but by showing the entire recursion, not just the top level. The first half of the lines show the initial contents of the array at every level of the recursion, as if all the recursive calls at a certain level were executed simultaneously. The elements that were chosen as pivots are 42, 16 and 37. The second half of Figure 3.21 shows, for every level of the recursion, the final contents of the array, after the recursive sorting of the subarrays.

Like mergesort, quicksort is an example of a recursive algorithm that makes two recursive calls. And as with mergesort, it is also possible to implement quicksort non-recursively but this requires more effort and the resulting algorithm is more complicated and no more efficient than the recursive version. Therefore, quicksort is another good example of recursion.

As you may have realized from the above illustrations, the performance of quicksort depends heavily on the choice of pivot. We want a pivot that splits the array as evenly as possible but also a pivot that doesn't take too long to find.

In terms of splitting, the ideal pivot is the median element of the array. And there is an algorithm that can find the median element of an array in linear time. Therefore, with this choice of pivot, the running time of quicksort on an array A is given by the recurrence

T(A) = T(L⌊(n−1)/2⌋) + T(R⌈(n−1)/2⌉) + Θ(n)

where, in general, Lk is the array that consists of the smallest k elements of A and Rk is the array that consists of the largest k elements of A. The Θ(n) term is essentially the computation of the median and the partitioning of the array. This is virtually the same recurrence relation we had for mergesort and its solution is also Θ(n log n). Therefore, quicksort runs in time Θ(n log n) when the median element is used as pivot. However, the linear-time algorithm that computes the median is not that simple, with the consequence that the hidden constant factors are somewhat large. So it's unlikely that this version of quicksort would run faster than mergesort in practice.

A much faster option is to simply pick the first element of the array as pivot. But if the array is already sorted, the smallest element would be used as pivot, leading to the most uneven partition possible. What happens next depends on the details of the partitioning step. For example, suppose that elements are not reordered unnecessarily during the partitioning. Then, the array will be partitioned into an empty array on one side and a sorted array on the other. This implies that the smallest element will always be used as pivot and that every partition will be as uneven as possible. In this case, the running time of quicksort is given by

T (A) = T (Rn−1) +Θ(n)

where the Θ(n) term is essentially the time it takes to partition the array. This is essentially the same recurrence relation that gave the running time of selection sort. This implies that when the first element is used as pivot, the running time of quicksort on a sorted array is Θ(n²).

In practice, it is not that uncommon for at least large portions of the input array to already be sorted. Therefore, using the first element as pivot would lead quicksort to often run slowly. Similar problems occur when the last or middle elements are used as pivots.

Another idea is to use as pivot the median of the first, middle and last elements of the array. If the array is sorted, then this would cause the median to be used as pivot, leading to the best possible partitions and an optimal running time of Θ(n log n). It's not hard to construct arrays that would be unevenly partitioned under this choice of pivot (see one of the exercises) but they're probably unlikely to occur in practice.

The safest option is probably to use a random element as pivot. This gives a bad partition only if the pivot is among the smallest or largest elements of the array. The probability that this happens very often should be small. In fact, in the next section, we will show that using a random element as pivot leads to an average running time of Θ(n log n). The average is taken over all the possible pivot choices and is the same for every input array.
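For concreteness, a random pivot index could be chosen in C++ with the <random> library along the following lines. The function name and interface are just one possible choice, not an implementation from these notes.

#include <random>

// Returns a uniformly random index in [i, j), assuming j > i.
// One possible pivot-selection strategy for randomized quicksort.
int random_pivot_index(int i, int j) {
    static std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<int> dist(i, j - 1);
    return dist(gen);
}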

An algorithm that uses randomness is said to be a randomized algorithm. (In contrast, the other versions of quicksort are called deterministic.) Even though the randomized version of quicksort has an average running time of Θ(n log n), it is still possible that the algorithm would repeatedly choose the smallest element as pivot while running on some array. This is unlikely but possible. In that case, the running time of quicksort would once again degenerate to the Θ(n²) running time of selection sort.

Because this quadratic running time occurs when the partitions are as uneven as possible, it is tempting to conclude that the worst-case running time of quicksort is Θ(n²). But the truth is we haven't ruled out the possibility that quicksort may run even slower for some other type of partition. Fortunately, our intuition turns out to be correct: in the next section, we will also show that the worst-case running time of quicksort is Θ(n²).


It is important to point out that even though the worst-case running time of quicksort is Θ(n²), which is as bad as the running time of selection sort and insertion sort, in practice, quicksort tends to run faster than even mergesort. The fact that the average running time of quicksort is Θ(n log n), which is the same as mergesort, explains why this is possible.

Study Questions

3.4.1. What is the fastest possible asymptotic running time of a comparison-based sorting algorithm?

3.4.2. What are the worst-case and average-case running times of quicksort?

Exercises

3.4.3. Run quicksort on an array containing the following elements:

22 37 25 60 16 42 38 46 19

Show the top level of the recursion, as in Figure 3.20. Then show the entire recursion, as in Figure 3.21. Use the first element as pivot.

3.4.4. Consider the version of quicksort that chooses the median of the first, middle and last elements as pivot. Assume that elements are not reordered unnecessarily during the partitioning step. Describe arrays that would be partitioned in the best and worst possible ways by this choice of pivot.

3.4.5. Implement quicksort.


3.5 Analysis of Quicksort

In the previous section, we saw that when quicksort splits the array unevenly, its running time is Θ(n²). In this section, we will show that it can't be any worse: quicksort always runs in time O(n²). This result will apply to both the deterministic and randomized versions of quicksort. We will also show that when the pivot is chosen at random, the average running time of quicksort is Θ(n log n), which is the same as the running time of mergesort. The average is taken over all the possible pivot choices and, asymptotically, is the same for every input array.

We start with the worst-case running time. If A is an array of size n ≥ 2, the running time of quicksort is given by the recurrence

T (A) = T (Lk) + T (Rn−k−1) +Θ(n)

where k is the index of the pivot after the first partition, Lk is the array that consists of the smallest k elements of A and Rn−k−1 is the array that consists of the largest n − k − 1 elements of A. The Θ(n) term is essentially the partitioning of the array.

As usual, we first remove the asymptotics. Since we are interested in an upper bound on the running time, we get that

T (A)≤ T (Lk) + T (Rn−k−1) + bn

if n is greater than or equal to both 2 and some number n0.

Now, consider a recursion tree that represents the execution of quicksort on A. Internal nodes correspond to recursive calls that aren't base cases. These calls are on arrays of size at least 2. Each of these calls uses an element of the array as a pivot. Because the pivot is not included in further recursive calls, each of the internal nodes essentially “consumes” a pivot. This implies that there can be no more than n internal nodes.

The running time associated with these internal nodes is the partition time. If the array at an internal node is of size at least n0, then that time is at most bn. Otherwise, let T(k) be the worst-case running time of quicksort on an array of size k. The running time associated with internal nodes where the array is of size less than n0 is at most max(T(2), . . . , T(n0 − 1)). Let a be this number. Then the total time associated with internal nodes is at most n(bn) + na = bn² + an.

The leaves of the recursion tree correspond to the base cases, which are for arrays of size 0 or 1. Each leaf is the child of an internal node. Since there are at most n internal nodes, there are at most 2n leaves. The running time associated with each leaf is either T(0) or T(1). Let c = max(T(0), T(1)). Then the total running time associated with all the leaves is at most (2n)c = 2cn.

The running time of quicksort on A is the total time associated with all the nodes in the recursion tree. The above analysis shows that T(A) ≤ bn² + an + 2cn. This implies that the worst-case running time of quicksort is O(n²).

In the previous section, we mentioned that when quicksort splits the array evenly, its running time is Θ(n log n). Recursion trees can also be used to show that quicksort can't run any faster: its best-case running time is Ω(n log n). (An exercise asks you to work out the details.) This result also applies to both the deterministic and randomized versions of quicksort.

We now turn to an average-case analysis of the randomized version of quicksort, the one that chooses a random element as pivot. In the case of deterministic algorithms, the average running time T(n) is the average running time of the algorithm over all inputs of size n. In the case of a randomized algorithm, the running time T(A) is a random variable that depends on the outcome of a random event. In the case of quicksort, this event is the choice of pivot. With randomized algorithms, we are interested in the expected value of T(A), which is the average value of T(A) over all possible outcomes of this random event. We call that the expected running time of the algorithm and denote it E(T(A)).

Recall that if A is an array of size n greater than or equal to both 2 and n0, then the running time of quicksort is given by the recurrence

T (A) = T (Lk) + T (Rn−k−1) + bn

where k is the index of the pivot after the first partition. To simplify the analysis, assume that all the elements of A are distinct. Then since each element is equally likely to be chosen as pivot, we get that

E(T(A)) = (1/n) ∑_{k=0}^{n−1} ( E(T(Lk)) + E(T(Rn−k−1)) + bn )

Let T (n) be the maximum expected running time over all arrays of size n. Then

T(n) ≤ (1/n) ∑_{k=0}^{n−1} ( T(k) + T(n − k − 1) + bn )

By separating the summation, we get

T(n) ≤ (1/n) ( ∑_{k=0}^{n−1} T(k) + ∑_{k=0}^{n−1} T(n − k − 1) + ∑_{k=0}^{n−1} bn )

Now, the first two summations turn out to contain the same exact terms, just in reverse order. And the third summation equals bn². Therefore,

T(n) ≤ (1/n) ( 2 ∑_{k=0}^{n−1} T(k) + bn² ) = (2/n) ∑_{k=0}^{n−1} T(k) + bn


Recall from the previous section that we're hoping that T(n) = O(n log n). So let's try to show that T(n) ≤ cn log n for every n ≥ 2.

As was the case in the analysis of mergesort, it's convenient for the inductive step to be for values of n that are at least n0 so that the above inequality applies to T(n). Since the basis will take care of the case n = 2, in the inductive step, we will also have n ≥ 3. Let n1 = max(3, n0). Then the basis will be for n ∈ [2, n1 − 1] and the inductive step will be for n ≥ n1.

So the basis is for n ∈ [2, n1 − 1]. We want T(n) ≤ cn log n for every n in that range. This is easy to ensure: simply choose c ≥ max(T(2), . . . , T(n1 − 1)).

The inductive step is for n ≥ n1. Suppose that T(k) ≤ ck log k for every k ∈ [2, n − 1]. Then

T(n) ≤ (2/n) ( T(0) + T(1) + ∑_{k=2}^{n−1} ck log k ) + bn

Let a = max(T(0), T(1)). Then

T(n) ≤ 4a/n + (2c/n) ∑_{k=2}^{n−1} k log k + bn

We need an upper bound on the summation ∑_{k=2}^{n−1} k log k.

A first idea is n² log n, because each term in this summation is bounded above by n log n. By using this upper bound, we get that

T(n) ≤ 4a/n + (2c/n)(n² log n) + bn = 4a/n + 2cn log n + bn

Recall that we’re trying to show that T (n)≤ cn log n. We’re obviously not goingto get that from the above inequality. We need a tighter upper bound on the

102 CHAPTER 3. SORTING

summation∑n−1

k=2 k log k.

A second idea is to use the fact that each log k in the summation is at mostlog n:

n−1∑

k=2

k log k ≤ log nn−1∑

k=2

k = log n(n− 1)n

2≤

n2

2log n

This gives us

T(n) ≤ 4a/n + (2c/n)(n²/2) log n + bn = 4a/n + cn log n + bn

This is still not small enough. So we need an even tighter upper bound on the summation ∑_{k=2}^{n−1} k log k.

A third idea is to split the summation in half and get independent upper bounds on each of the halves. As mentioned earlier in these notes, this is a standard and useful technique.

∑_{k=2}^{n−1} k log k = ∑_{k=2}^{⌊n/2⌋} k log k + ∑_{k=⌊n/2⌋+1}^{n−1} k log k

In the first summation, each log k is at most log⌊n/2⌋. In the second summation, each log k is at most log n. Therefore,

∑_{k=2}^{n−1} k log k ≤ log⌊n/2⌋ ∑_{k=2}^{⌊n/2⌋} k + log n ∑_{k=⌊n/2⌋+1}^{n−1} k


Since log⌊n/2⌋ ≤ log(n/2) = log n − 1, we get

∑_{k=2}^{n−1} k log k ≤ (log n − 1) ∑_{k=2}^{⌊n/2⌋} k + log n ∑_{k=⌊n/2⌋+1}^{n−1} k

By distributing (log n− 1), we get

∑_{k=2}^{n−1} k log k ≤ log n ∑_{k=2}^{⌊n/2⌋} k − ∑_{k=2}^{⌊n/2⌋} k + log n ∑_{k=⌊n/2⌋+1}^{n−1} k

Then, by combining the first and third summations, we get

∑_{k=2}^{n−1} k log k ≤ log n ∑_{k=2}^{n−1} k − ∑_{k=2}^{⌊n/2⌋} k

It’s easy to get an upper bound on the first summation:

∑_{k=2}^{n−1} k = (n − 1)n/2 − 1 ≤ (n − 1)n/2

Because the second summation is negated, we need a lower bound for it. We have

∑_{k=2}^{⌊n/2⌋} k = ⌊n/2⌋(⌊n/2⌋ + 1)/2 − 1

Since ⌊n/2⌋ ≥ n/2 − 1/2, we get

∑_{k=2}^{⌊n/2⌋} k ≥ (n/2 − 1/2)(n/2 + 1/2)/2 − 1


Therefore,

∑_{k=2}^{⌊n/2⌋} k ≥ (n − 1)(n + 1)/8 − 1 = (n² − 1)/8 − 1 = n²/8 − 9/8

By putting the two bounds together, we get

∑_{k=2}^{n−1} k log k ≤ log n · (n − 1)n/2 − (n²/8 − 9/8)
                   = (n² log n)/2 − (n log n)/2 − n²/8 + 9/8

We can simplify this. In the inductive step, n ≥ 3. This implies that

(n log n)/2 > 3/2 > 9/8

Therefore,

∑_{k=2}^{n−1} k log k ≤ (n² log n)/2 − n²/8

Putting this back in the recurrence relation, we get

T(n) ≤ 4a/n + (2c/n)((n² log n)/2 − n²/8) + bn = 4a/n + cn log n − cn/4 + bn

We will have T(n) ≤ cn log n if

4a/n − cn/4 + bn ≤ 0

which is equivalent to

cn/4 ≥ 4a/n + bn


and

c ≥ 16a/n² + 4b

Since n ≥ 3, this will happen if

c ≥ 16a/9 + 4b

Therefore, all we need to do is choose c ≥ 16a/9 + 4b. This completes the inductive step and our proof that T(n) ≤ cn log n for every n ≥ 2.

Therefore, for every array A of size n, the expected running time of the randomized version of quicksort is O(n log n). As mentioned earlier, the best-case running time of the deterministic and randomized versions of quicksort is Ω(n log n). This implies that the expected running time of the randomized version of quicksort is Θ(n log n).

Exercises

3.5.1. Show that the best-case running time of quicksort is Ω(n log n). Hint: Starting at the top, consider the minimum number of elements that must be present at every level of a recursion tree.

3.6 Partitioning Algorithm

Up until now, we haven't looked in detail at how quicksort partitions the array. A simple algorithm uses a temporary array and performs two passes of the input array. The first pass places all the elements less than or equal to the pivot at the beginning of the temporary array. The pivot is placed after those elements. The second pass places all the elements larger than the pivot after the pivot in the temporary array. The temporary array is then copied back to the original array. This can also be done in a single pass by placing the larger elements in reverse order at the end of the temporary array.

42 12 25 16 37
42 12 37 16 25

||42 12 37 16 |25
|42 |12 37 16 |25
12 |42 |37 16 |25
12 |42 37 |16 |25
12 16 |37 42||25
12 16 25 42 37

Figure 3.22: A run of the in-place partitioning algorithm

The above algorithm has the important weakness of requiring the use of a temporary array. In this section, we will learn an algorithm that avoids that. The only extra memory used by the algorithm will be a constant number of variables. Such an algorithm is called in-place.

The idea is to gradually grow the portions of the array that contain the smaller and larger elements. First, the pivot is placed at the end of the array by swapping it with the last element. This is illustrated by the first two lines of Figure 3.22 where we assume that 25 was chosen as the pivot.

Next, we split the array into four regions in this order: elements known to be less than or equal to the pivot, elements known to be greater than the pivot, elements whose status is unknown, and the pivot. Initially, the smaller and larger regions are empty and all the elements except for the pivot are in the unknown region. This is illustrated by the third line of Figure 3.22. Vertical bars are used to separate the regions.

We then perform a single pass of the unknown region. When an element is larger than the pivot, we add it to the larger region by moving the boundary between the larger and unknown regions, as illustrated by the fourth line of Figure 3.22. When an element in the unknown region is less than or equal to the pivot, we swap it with the first element of the larger region and move all the boundaries one position to the right, as illustrated by the fifth line of Figure 3.22. This continues until the unknown region is empty, as illustrated by the next two lines of Figure 3.22. The last step is to swap the pivot with the first element of the larger region, as illustrated by the last line of Figure 3.22.

Figure 3.23 gives pseudocode for this algorithm. As usual, the portion of the array to be partitioned is specified by indices i and j. The argument k is the index of the pivot. Since the partitioning will likely cause the pivot to move, this argument is updated by the algorithm. This is necessary because, as can be seen in Figure 3.19, quicksort needs to know where the pivot is after the array has been partitioned.

The algorithm uses indices p, q and r to split the array into the four regions. The smaller region is a[i..p). The larger region is a[p..q). The unknown region is a[q..r). The pivot is at a[r].

Exercises

3.6.1. Run the in-place partitioning algorithm on an array containing the following elements:

22 37 25 60 16 42 38 46 19

Use 37 as the pivot. Show the result of the various steps as in Figure 3.22.

3.6.2. Implement the in-place partitioning algorithm.


partition(a, i, j, k)
    p = i
    q = i
    r = j − 1
    swap a[k] with a[r]
    pivot = a[r]
    while (q < r)
        if (a[q] <= pivot)
            swap a[p] with a[q]
            ++p
        ++q
    swap a[p] with a[r]
    k = p

Figure 3.23: The in-place partitioning algorithm
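A fairly direct C++ translation of this pseudocode might look as follows. Passing the pivot index by reference is just one way of returning its new position; it is an assumption of this sketch, not the notes' code.

#include <utility>
#include <vector>

// In-place partition of a[i..j) around the element at index k.
// On return, k holds the new index of the pivot.
void partition(std::vector<int>& a, int i, int j, int& k) {
    int p = i;              // a[i..p) : elements <= pivot
    int q = i;              // a[p..q) : elements > pivot
    int r = j - 1;          // a[q..r) : unknown region, pivot stored at a[r]
    std::swap(a[k], a[r]);
    int pivot = a[r];
    while (q < r) {
        if (a[q] <= pivot) {
            std::swap(a[p], a[q]);
            ++p;
        }
        ++q;
    }
    std::swap(a[p], a[r]);  // put the pivot between the two regions
    k = p;
}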


3.7 A Selection Algorithm

The Selection Problem is as follows: given an array A and a number r, what element has rank r in A? The rank of an element in A is the position it would occupy if A was sorted. For example, when r = 1, that element is the smallest element of A. When r = n, that element is the largest element of A. And when r = ⌊n/2⌋, that element is the (lower) median of A.

The Selection Problem is related to sorting because an obvious algorithm for the Selection Problem is to sort the array and then look up the element at index r − 1. Using mergesort, this gives us a Θ(n log n) algorithm.

But sorting seems like overkill. A sorting algorithm does a lot more work than is apparently necessary since it allows us to retrieve the element with any rank in A, not just the element with rank r. So the question is, can we solve the Selection Problem faster than in time Θ(n log n)?

The idea is to partially sort the array. For example, we could modify selection sort to place the minimum element at the beginning of the array. After r passes, the algorithm would have found the element with rank r. This gives us an algorithm that runs in time Θ(rn). When r is a constant, this is linear, which is faster than Θ(n log n). But when r = ⌊n/2⌋, for example, this gives us a quadratic-time algorithm, which is worse than just sorting with mergesort.

It’s not clear how we can apply this partial sorting idea to mergesort becauseonce the array is split in two, we have no idea which half contains the elementwith rank r.

So let’s consider quicksort. Let k be the index of the pivot after the firstpartition. Then the pivot has rank q = k + 1. The elements to left of the pivotare all the elements that are smaller than the pivot. So if r < q, we know thatthe element with rank r is in the left subarray. If r = q, then the element withrank r is the pivot. If r > q, then the element with rank r is in the right subarray.


[60 12 37 42 25 38 16]   r = 4, pivot = 42
[12 37 25 38 16] 42 60   r = 4, pivot = 16
12 16 [37 25 38] 42 60   r = 2, pivot = 37
12 16 25 37 38 42 60     return 37

Figure 3.24: A run of the selection algorithm

But note that in that subarray, the rank of that element is not r, it’s r − q.

Figure 3.24 shows a sample run of this algorithm. The subarray currently under consideration is indicated by brackets.

The resulting algorithm is shown in Figure 3.25, with some adjustments to account for the fact that the input array is a[i..j), not all of a. The algorithm assumes that r is valid, that is, a number in the range [1, j − i].

We now analyze the running time of this algorithm. Let T(A, r) be the running time of the selection algorithm on array A and rank r. If A is of size n ≥ 2, then

T (A, r)≤max(T (Lq−1, r), T (Rn−q, r − q)) +Θ(n)

where q is the rank of the pivot after the first partition, Lq−1 is the array that consists of the smallest q − 1 elements of A and Rn−q is the array that consists of the largest n − q elements of A. The Θ(n) term is essentially the partitioning of the array.

As usual, we first remove the asymptotics. Since we are interested in an upper bound on the running time, we get that

T (A, r)≤max(T (Lq−1, r), T (Rn−q, r − q)) + bn

if n is greater than or equal to both 2 and n0.

It should be clear that the worst-case running time occurs when the algorithm partitions the array as unevenly as possible.

select(a, i, j, r)
    if (j − i >= 2)
        choose a pivot in a[i..j)
        partition a[i..j) around the pivot
        k = new index of the pivot
        q = k − i + 1    // rank of pivot
        if (r = q)
            return a[k]
        else if (r < q)
            return select(a, i, k, r)
        else
            return select(a, k + 1, j, r − q)
    else
        return a[i]

Figure 3.25: A selection algorithm based on quicksort


That's because this will cause the recursion to be as deep as possible and, at the same time, cause each of the partitions to take as long as possible. In that case, if T(n) is the worst-case running time of the algorithm, we get the recurrence

T (n)≤ T (n− 1) + bn

This is the same recurrence we got for selection sort and, as we saw earlier, its solution is O(n²).

So we now know that the worst-case running time of the selection algorithm is O(n²). But is it actually that large? The answer depends on how the pivot is chosen. If the pivot is always the first element of the array, then, if the array is sorted in reverse and r = 1, the pivot will always be the largest element of the array and the algorithm will always choose to go left, resulting in the following recurrence:

T (A, r) = T (Ln−1, r) +Θ(n)

This implies that in this case, the running time is Θ(n²). Therefore, the worst-case running time of the algorithm is Θ(n²) for this choice of pivot.

A similar argument shows that the worst-case running time is also Θ(n²) if the pivot is chosen to be the median of the first, middle and last elements of the array. And if the pivot is chosen at random, it is unlikely but possible for the pivot to always end up being the largest element of the array. Therefore, all three versions of the selection algorithm have a worst-case running time of Θ(n²).

We now analyze the expected running time of the randomized version of the selection algorithm. First, what are we hoping for? As with quicksort, it's reasonable to think that a random pivot will split the array somewhat evenly most of the time. If the array is always split evenly, then, assuming that n is a power of 2, the recurrence for the worst-case running time of the algorithm would become

T (n)≤ T (n/2) + bn

Writing this out, we get

T(n) ≤ T(n/2) + bn
T(n/2) ≤ T(n/4) + b(n/2)
T(n/4) ≤ T(n/8) + b(n/4)
...
T(2) ≤ T(1) + b·2

Adding all these inequalities gives us

T (n)≤ T (1) + b(2n)

which implies that T(n) is Θ(n). So it's reasonable to hope that the expected running time of the selection algorithm is Θ(n).

Now let’s see if that’s really the case. We said earlier that

T (A, r)≤max(T (Lq−1, r), T (Rn−q, r − q)) + bn

where q is the rank of the first pivot. We’re going to need to be more precise:

T(A, r) ≤ T(Lq−1, r) + bn       if r < q
T(A, r) ≤ a + bn                if r = q
T(A, r) ≤ T(Rn−q, r − q) + bn   if r > q

where a is a constant.


To simplify the analysis of the expected running time, assume that all the elements of A are distinct. The rank q of the pivot is a number from 1 to n. Then since each element is equally likely to be chosen as pivot, the rank of the pivot is equally likely to be any number in the interval [1, n]. Therefore,

E(T(A, r)) ≤ (1/n) ( ∑_{q=1}^{r−1} E(T(Rn−q, r − q)) + a + ∑_{q=r+1}^{n} E(T(Lq−1, r)) ) + bn
           = (1/n) ( ∑_{q=1}^{r−1} E(T(Rn−q, r − q)) + ∑_{q=r+1}^{n} E(T(Lq−1, r)) ) + a/n + bn

Once again, this holds if n is greater than or equal to both 2 and n0.

Since we’re hoping for a linear expected running time, let’s try to show thatE(T (A, r))≤ cn, for every n≥ 1.

Once again, it’s convenient for the inductive step to be for n≥ n0 so that theabove inequality applies to T (n). Since the basis will take care of the case n= 1,we also have that n ≥ 2 in the inductive step. Let n1 = max(2, n0). Then thebasis should will be for n ∈ [1, n1 − 1] and the inductive step will be for n≥ n1.

So the basis is for n ∈ [1, n1 − 1]. We want E(T (A, r)) ≤ cn. This is easy toensure: simply choose c to be the maximum value of T (A, r) for every A of sizen ∈ [1, n1 − 1].

The inductive step is for n≥ n1. Suppose that E(T (B, s))≤ ck for every arrayB of size k ∈ [1, n− 1]. Recall that

E(T(A, r)) ≤ (1/n) ( ∑_{q=1}^{r−1} E(T(Rn−q, r − q)) + ∑_{q=r+1}^{n} E(T(Lq−1, r)) ) + a/n + bn

In the first summation, the size of the array R ranges from n − 1 down to n − (r − 1), and n − (r − 1) ≥ 1. In the second summation, the size of L ranges from r to n − 1, and r ≥ 1. Therefore, the inductive hypothesis can be applied to the terms of these summations. The result is

E(T(A, r)) ≤ (1/n) ( ∑_{q=1}^{r−1} c(n − q) + ∑_{q=r+1}^{n} c(q − 1) ) + a/n + bn

Now, it is useful to simplify the terms in these summations. First, the constant c can be factored out. Second, we can set k = n − q in the first summation and k = q − 1 in the second summation. The result is

E(T(A, r)) ≤ (c/n) ( ∑_{k=n−(r−1)}^{n−1} k + ∑_{k=r}^{n−1} k ) + a/n + bn

So each of these summations contains some of the trailing terms of the sequence 1, 2, . . . , n − 2, n − 1. The first summation has r − 1 terms while the second summation has n − r terms. It's not hard to see that the sum of these two summations is largest when the number of terms in the two summations is as even as possible.

Since the total number of terms in both summations is (r − 1) + (n − r) = n − 1, the number of terms is even when each summation contains (n − 1)/2 terms. This is only possible if n is odd. In that case, each summation must start at

k = n − (n − 1)/2 = n − n/2 + 1/2 = n/2 + 1/2 = ⌈n/2⌉

If n is even, then the number of terms is as even as possible when one summation has (n − 1)/2 − 1/2 = n/2 − 1 terms while the other has (n − 1)/2 + 1/2 = n/2 terms. We can get an upper bound on the sum of the two summations by making both summations contain n/2 terms. This implies that both summations start at k = n − n/2 = n/2, which still equals ⌈n/2⌉ because n is even.


Therefore, in either case, we have an upper bound on the sum of the two summations when both summations start at k = ⌈n/2⌉. This implies that

E(T(A, r)) ≤ (2c/n) ∑_{k=⌈n/2⌉}^{n−1} k + a/n + bn

We now need an upper bound on this summation.

A first idea is to use the fact that

∑_{k=⌈n/2⌉}^{n−1} k ≤ ∑_{k=1}^{n−1} k = (n − 1)n/2

By using this upper bound, we get that

E(T(A, r)) ≤ (2c/n) · (n − 1)n/2 + a/n + bn = c(n − 1) + a/n + bn = cn − c + a/n + bn

Recall that we're trying to show that E(T(A, r)) ≤ cn. We're not going to be able to get that from the above inequality. So we need a smaller upper bound on the summation.

A second idea is to use the fact

∑_{k=⌈n/2⌉}^{n−1} k = ∑_{k=1}^{n−1} k − ∑_{k=1}^{⌈n/2⌉−1} k

Then,

∑_{k=⌈n/2⌉}^{n−1} k = (n − 1)n/2 − (⌈n/2⌉ − 1)⌈n/2⌉/2


Since ⌈n/2⌉ ≥ n/2,

∑_{k=⌈n/2⌉}^{n−1} k ≤ (n − 1)n/2 − (n/2 − 1)(n/2)/2

from which we get

∑_{k=⌈n/2⌉}^{n−1} k ≤ (3/8)n²

Using this upper bound in our inequality for E(T (A, r)), we get that

E(T(A, r)) ≤ (2c/n)(3/8)n² + a/n + bn = (3/4)cn + a/n + bn

We will have E(T(A, r)) ≤ cn if

(3/4)cn + a/n + bn ≤ cn

which is equivalent to

cn/4 ≥ a/n + bn

and

c ≥ 4a/n² + 4b

Since we are in the inductive step of the proof, we have that n ≥ 2. This implies that the right side of this last inequality is largest when n = 2. Therefore, all we need to do is choose c ≥ 4a/2² + 4b = a + 4b.

This completes the inductive step and our proof that E(T(A, r)) ≤ cn for every n ≥ 1. Therefore, for every array A of size n ≥ 1, the expected running time of the selection algorithm is O(n). Since the algorithm performs at least a partition on A, we have that the best-case running time is Ω(n). Therefore, the expected running time of the algorithm is Θ(n).

Note that there is a deterministic selection algorithm that has a Θ(n) worst-case running time. It achieves that by choosing the pivot in a clever way. But this algorithm is more complicated and in practice slower than the randomized algorithm we learned in this section.
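Incidentally, the STL already provides a selection algorithm of this general kind: std::nth_element, whose average running time is linear. A brief usage sketch (the 1-based rank r is converted to a 0-based position):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> a = {22, 37, 25, 60, 16, 42, 38, 46, 19};
    int r = 5;   // rank we are looking for (1-based, as in these notes)

    // After the call, a[r-1] holds the element of rank r; the elements
    // before it are all <= a[r-1] and those after it are all >= a[r-1].
    std::nth_element(a.begin(), a.begin() + (r - 1), a.end());
    std::cout << "element of rank " << r << " is " << a[r - 1] << "\n";
    return 0;
}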

Exercises

3.7.1. Run the selection algorithm on an array containing the following elements:

22 37 25 60 16 42 38 46 19

Use the first element of each subarray as pivot. After each partition, show the new subarray under consideration, the new r and the new pivot, as in Figure 3.24. Do this for r = 1, 5 and 7.

3.8 A Lower Bound for Comparison-Based Sorting

We have seen in this chapter that it is possible to sort n elements in time O(n log n). Mergesort always runs that fast while quicksort runs that fast on average. Now, is it possible to sort faster? In other words, is there an algorithm that can sort n elements in time o(n log n)?

In this section, we will show that this is not possible, at least for an important class of sorting algorithms.

The four sorting algorithms we studied in this chapter (selection sort, insertion sort, mergesort and quicksort) have an important property: the only assumption they make about the elements being sorted is that these elements can be compared. This implies that these algorithms can be used on many different types of elements, for example, numbers, strings, times and dates.

In other words, these sorting algorithms are generic, in the sense that they can be used on more than one type of element. In fact, comparison-based sorting algorithms, as they are called, are as generic as possible, because if elements are to be sorted, then we must be able to compare them.

In this section, we will show that every comparison-based sorting algorithm has a worst-case running time that's Ω(n log n).

The execution of a comparison-based sorting algorithm on arrays of a particular size can be represented by a decision tree. Each node in this tree corresponds to a particular ordering of the array and specifies a comparison to be performed on two elements of the array. The top node of the tree corresponds to the initial ordering of the array and specifies the first comparison to be performed.

Depending on the outcome of that comparison, the algorithm either goes left or right. The next node specifies a reordering of the elements and the next comparison. This goes on until a terminal node is reached. Such a node is called a leaf of the decision tree. Each leaf specifies a new final reordering of the elements but no new comparison. The algorithm halts when it reaches a leaf.

For example, consider the iterative version of insertion sort on an array of size 3. The top node corresponds to the initial ordering [x, y, z] of the array and specifies that the first comparison should be y < x. If the answer is yes, then the next node specifies the new ordering [y, x, z] as well as the next comparison: z < x. If the answer is yes again, then the next node specifies the ordering [y, z, x] and the comparison z < y. If the answer is no, then the next node is a leaf that specifies the same ordering [y, z, x]. The array is sorted.

There are two important observations to be made about decision trees that represent sorting algorithms. The height of a tree is the maximum number of steps required to go from the top node of the tree to any of its leaves. The first observation is that the height of the tree provides a lower bound on the worst-case running time of the algorithm: if a tree has height h, then the algorithm makes at least h comparisons in the worst case, so its running time is Ω(h).

Recall that our goal is to show that the worst-case running time of any comparison-based sorting algorithm is Ω(n log n). Therefore, one way to achieve that would be to show that the height of these decision trees is Ω(n log n).

Now, consider arrays of size n that contain the numbers 1 through n in some order. When the algorithm runs on one of these arrays, it will eventually reach a leaf that specifies how the elements of the array should be reordered for the array to be sorted. But this reordering can only work for one array. For example, consider the reordering [y, z, x]. If an array containing 1, 2, 3 is sorted by this reordering, it must be that y = 1, z = 2 and x = 3. In other words, the only array that can be sorted by this reordering is [3, 1, 2].

This implies that the number of leaves must be greater than or equal to the number of possible arrays that contain the numbers 1 through n. And that number is n!. This is the second important observation about these decision trees: they must contain at least n! leaves.

So these trees have lots of leaves. Our goal is to show that the height of these trees is large. Intuitively, it seems clear that shallow trees shouldn't be able to have that many leaves. We're going to prove that this intuition is correct.

A binary tree is a tree in which every node has at most two nodes directly below it. Our decision trees are binary trees because all the comparisons have only two possible answers: yes or no.

Theorem 3.1 Every binary tree of height h has at most 2^h leaves.

Proof By induction on the height of the tree. The base case is when the tree has height 0. We need to show that such a tree has at most one leaf because 2^0 = 1. But a tree of height 0 has only one node and that node is both the top node of the tree and its only leaf. So the result holds for trees of height 0.

The inductive step is for trees of height h ≥ 1. Suppose that the result holds for every tree of height less than h. Consider a tree of height h. That tree contains two subtrees, one to the left of the top node and one to the right of the top node. These subtrees are both of height at most h − 1. This implies that they each contain at most 2^(h−1) leaves. Since every leaf of the original tree is a leaf of one of these two subtrees, it must be that the number of leaves in the original tree is at most 2^(h−1) · 2 = 2^h. □

Now, suppose that a decision tree represents the execution of a comparison-based algorithm on arrays of size n. If h is the height of that tree, then the number of leaves the tree contains is both at most 2^h and at least n!. This implies that

2^h ≥ n!

Taking the log on both sides, we get that

h≥ log(n!)

Recall that n! = n(n − 1)(n − 2) · · · 2 · 1. Therefore,

h ≥ ∑_{i=1}^{n} log i

We can get a lower bound on this summation by splitting it in half, as we've done before:

h ≥ ∑_{i=⌈n/2⌉}^{n} log i ≥ (n − ⌈n/2⌉ + 1) log⌈n/2⌉


Now, n/2 ≤ ⌈n/2⌉ ≤ n/2 + 1/2. In particular, this implies that

n − ⌈n/2⌉ + 1 ≥ n − (n/2 + 1/2) + 1 = n/2 + 1/2 ≥ n/2

Therefore,

h ≥ (n/2) log(n/2) = (n/2)(log n − 1)

This implies that h= Ω(n log n) and completes the proof of the following result:

Theorem 3.2 The worst-case running time of every (deterministic) comparison-based sorting algorithm is Ω(n log n).

Note that it is possible to extend this result to the average-case running time of deterministic comparison-based sorting algorithms, as well as the expected running time of randomized comparison-based sorting algorithms.

3.9 Sorting in Linear Time

In the previous section, we showed that comparison-based sorting algorithms cannot sort any faster than in time Ω(n log n). In this section, we will learn that it is possible to sort faster than that when we make additional assumptions on the values to be sorted, and use on those values operations other than comparisons.

Our first sorting algorithm assumes that the elements to be sorted are non-negative integers in the range [0, M] for some value M. For example, consider Figure 3.26. The first line gives the initial contents of array A. In this example, M = 5. The idea behind the algorithm is to count the number of occurrences of each possible element in A. We do this by using another array C in which C[x] will be used to count the number of x's in A. Since the elements in A are in the range [0, 5], C needs to be of size 6 so it has indices ranging from 0 to 5.


A: 2 5 3 0 3 4 3 5

     0 1 2 3 4 5
C:   1 0 1 3 1 2

A: 0 2 3 3 3 4 5 5

Figure 3.26: A run of the counting sort algorithm

To do the counting, we first initialize C to all zeros. Then, we scan A and, for each element, add one to the appropriate count. For example, the first element of A is 2, so we add one to C[2]. Once the counts are computed, we scan C from left to right and copy the correct number of copies of each element to A. The result is shown on the last line of Figure 3.26.

This algorithm is called counting sort. It is described in pseudocode in Figure 3.27. Note that the value M is given as argument to the algorithm. If it is not available, it can of course be easily computed.

The analysis of counting sort is simple. It essentially consists of three loops. The first loop runs in time Θ(M). The second loop runs in time Θ(n), where n is the number of elements to be sorted. The third loop is a nested loop. The body of the inner loop will run exactly once for each element in A, so the running time of the third loop is also Θ(n). Therefore, the running time of counting sort is Θ(n + M).

If the numbers to be sorted are very small, for example, if M is O(n), then counting sort will run in time O(n). But if M is, say, Ω(n²), then counting sort will run in time Ω(n²). So counting sort is typically only useful if M is O(n).
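A possible C++ version of counting sort, following the pseudocode of Figure 3.27; the interface (a vector and the value M) is just one choice made for this sketch.

#include <vector>

// Counting sort for non-negative integers in the range [0, M].
void counting_sort(std::vector<int>& a, int M) {
    std::vector<int> counts(M + 1, 0);   // counts[x] = number of x's in a
    for (int x : a)
        ++counts[x];
    int p = 0;
    for (int x = 0; x <= M; ++x)
        for (int c = 0; c < counts[x]; ++c)
            a[p++] = x;                  // write x back counts[x] times
}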

Our second algorithm is a generalization of counting sort. It is very common that the values to be sorted consist of two parts, an element and a key, and that those elements need to be sorted only with respect to the keys.


counting_sort(A, i, j, M)
    C = array of size M + 1
    initialize all of C to 0

    for p = i to j−1
        ++C[A[p]]

    p = i
    for x = 0 to M
        repeat C[x] times
            A[p] = x
            ++p

Figure 3.27: The counting sort algorithm

For example, consider Figure 3.28. Each value in A consists of a number and a letter. Suppose that the numbers are the keys. As with counting sort, we assume that the keys are non-negative integers in the range [0, M] for some value M. In this example, we again have M = 5.

Now, we can't simply count the number of occurrences of each key. We need to somehow keep track of which elements occur with each of the keys. This can be done by replacing the array of counts by an array of lists. This is shown as array B in Figure 3.28. The list B[x] will be used to record all the values that have key x. To fill the lists, we simply scan A from left to right and add each value to the appropriate list. We then scan B from left to right and copy the values to A.

The lists in array B are typically called buckets. For this reason, this algorithm is called bucket sort.


A: 2A 5B 3C 0D 3E 4F 3G 5H

     0    1    2    3    4    5
B:   0D        2A   3C   4F   5B
                    3E        5H
                    3G

A: 0D 2A 3C 3E 3G 4F 5B 5H

Figure 3.28: A run of the bucket sort algorithm

Pseudocode for the algorithm is given in Figure 3.29. This description assumes that the values to be sorted are objects with a data member called key.

The analysis of bucket sort is very similar to the analysis of counting sort. If linked lists are used, then adding an element to a list takes constant time. In this case, bucket sort also runs in time Θ(n + M). As was the case with counting sort, this will be linear if the keys are small, in the sense that M is O(n).

Bucket sort has an interesting additional property. As can be seen in Figure 3.28, in the resulting array, elements with the same key occur in the same order that they occurred in the original array. This will always happen as long as we add elements to the end of the lists in B. A sorting algorithm with this property is said to be stable. Mergesort can be easily implemented to be stable. Quicksort can too but it usually isn't since quicksort implementations are usually optimized for speed.
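A possible C++ version of bucket sort. It follows the pseudocode of Figure 3.29 except that vectors are used instead of linked lists; the Item type and its field names are invented for the example.

#include <string>
#include <vector>

// A value with an integer key in [0, M] and some associated data.
struct Item {
    int key;
    std::string data;
};

// Stable bucket sort by key.
void bucket_sort(std::vector<Item>& a, int M) {
    std::vector<std::vector<Item>> buckets(M + 1);
    for (const Item& x : a)
        buckets[x.key].push_back(x);   // appending keeps the sort stable
    int p = 0;
    for (const auto& bucket : buckets)
        for (const Item& x : bucket)
            a[p++] = x;
}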

Exercises

3.9.1. Run counting sort on an array containing the following elements:


bucket_sort(A, i, j, M)
    B = array of size M + 1
    initialize all of B to empty lists

    for p = i to j−1
        add A[p] to list B[A[p].key]

    p = i
    for each list L in B
        for each x in L
            A[p] = x
            ++p

Figure 3.29: The bucket sort algorithm

2 3 5 6 3 4 3 4 1

Show the array of counts and the resulting sorted array, as in Figure 3.26.

3.9.2. Run bucket sort on an array containing the following elements:

5A 4B 2C 8D 1E 4F 3G 4H 1I

Each array element consists of a number and a letter. Sort according to the numbers. Show the array of buckets and the resulting sorted array, as in Figure 3.28.

Chapter 4

Heaps

The main focus of this chapter is the heap data structure. Heaps can be used to implement priority queues and they also form the basis of the heapsort algorithm.

4.1 Priority Queues

A priority queue is similar to a queue but with the difference that the next element removed from a priority queue is the one with the highest priority. For example, a priority queue could contain printing jobs that should be processed in order of highest priority. The basic priority queue operations are described in Figure 4.1.

Priority queues can be implemented in a variety of ways. Table 4.1 gives some possibilities. The first one is a sorted linked list. If the elements are in decreasing order of priority, then the element with highest priority is at the front of the list, which means it can be accessed and removed in constant time. But a new element needs to be inserted into its correct position. This requires linear time.



insert(e) Adds a copy of element e to the queue.

front() Returns the element with highest priority in the queue.

pop() Removes the element with highest priority from the queue.

Figure 4.1: Basic priority queue operations

                              insert      front    pop
Sorted linked list            Θ(n)        Θ(1)     Θ(1)
Unsorted linked list          Θ(1)        Θ(n)     Θ(n)
Heap                          Θ(log n)    Θ(1)     Θ(log n)
Balanced binary search tree   Θ(log n)    Θ(1)     Θ(log n)

Table 4.1: Some possible implementations of priority queues


To avoid linear-time insertions, we can use an unsorted linked list. A new element can then be added at either end of the list in constant time. But now a search is required to find the element with highest priority in the list. This requires linear time.

Arrays would give similar running times. With a sorted array, we would put the element with highest priority at the end of the array so it could be removed in constant time. With an unsorted array, the constant running time of insert would be amortized since the array may need to grow.

So with both arrays and linked lists, some of the operations run in linear time. There are at least two data structures that allow all the priority queue operations to run in less than linear time.

In the next two sections, we will learn about a data structure called a heap. When a heap is used to implement a priority queue, insert and pop run in logarithmic time while front runs in constant time. Heaps are usually implemented by using an array so the logarithmic running time of insert is amortized because, once again, that array may need to grow.
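In C++, the STL's std::priority_queue is a heap-based priority queue of exactly this kind. A brief usage sketch (the job priorities are made up):

#include <iostream>
#include <queue>

int main() {
    // By default, the largest element (here, the highest priority) is on top.
    std::priority_queue<int> jobs;
    jobs.push(3);                       // insert, O(log n)
    jobs.push(10);
    jobs.push(1);
    std::cout << jobs.top() << "\n";    // front/top, O(1): prints 10
    jobs.pop();                         // pop, O(log n)
    std::cout << jobs.top() << "\n";    // prints 3
    return 0;
}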

Another possibility is a balanced binary search tree, a data structure that we will learn later in these notes. A balanced binary search tree implementation of priority queues achieves the same asymptotic running times as a heap. But balanced binary search trees are more complex than heaps because they support more operations. The result is that in practice, they run slower than heaps. (One example of an operation supported by balanced binary search trees but not by heaps is a general find operation.)

4.2 The Heap Data Structure

Figure 4.2 shows an example of a heap. A heap is a binary tree that satisfies two properties:


[A binary tree with 16 at the root, 14 and 10 as its children, 8 and 7 below 14, and 9 below 10.]

Figure 4.2: An example of a heap

1. All levels in a heap are full except possibly for the last one, which is then filled from left to right. (Such a binary tree is called nearly complete.)

2. Every node in a heap is greater than or equal to its children.

The first property is a structural property. The second property is called the max-heap property. It guarantees that the largest element is always at the top of the heap. A similar min-heap property would guarantee that the smallest element is always at the top of the heap. Unless otherwise indicated, all our heaps will be max-heaps.

Heaps can be implemented as a linked structure, just like a linked list. Each element would be stored in a node that would also contain three pointers: one to the parent node, one to the left child and one to the right child.

But heaps are usually implemented with an array. This often requires less space and fewer memory allocations than a linked structure. For example, when a heap is used to implement a priority queue, once the queue has reached its peak size, no further memory allocations will be needed with an array, while a linked structure would continue to require a memory allocation for every insert.


When stored in an array, the top element of a heap is usually placed at index 1. The remaining elements follow, level by level, from left to right. For example, the heap of Figure 4.2 would be stored in an array as follows:

16 14 10 8 7 9

The structural property of heaps then allows us to find the parent and children of every node as follows. Suppose that i is the index of an element. Then

left(i) = 2*i
right(i) = 2*i + 1
parent(i) = floor(i/2)

These operations can be implemented very efficiently by bit shifting.
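For example, with 1-based indices these index computations could be written in C++ as follows; the function names are simply chosen to match the pseudocode above.

// 1-based heap indexing, as used in these notes.
inline int left(int i)   { return i << 1; }        // 2*i
inline int right(int i)  { return (i << 1) | 1; }  // 2*i + 1
inline int parent(int i) { return i >> 1; }        // floor(i/2)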

The basic heap operations are similar to the priority queue operations: insert(e), top() and pop(). In the next section, we will learn that insert and pop can be implemented in logarithmic time while top can be implemented in constant time. As discussed in the previous section, this means that heaps give us an efficient way of implementing priority queues.

Heaps can also be used as the basis of a sorting algorithm called heapsort. The idea is simple: to sort n elements, add them one by one to a heap and then remove them one by one. Since the elements will be removed in decreasing order, they will be sorted. Each insert and pop will be executed on a heap of size at most n. Therefore, each of those operations will run in time O(log n). This implies that the running time of heapsort is O(n log n), which, as we know, is optimal for comparison-based sorting algorithms. Heapsort will be discussed in more detail later in this chapter.


4.3 Heap Operations

In this section, we will learn how to carry out the basic heap operations. Those operations will be given as operations on the array that stores the heap:

insert_heap(A, n, x)
top_heap(A, n)
pop_heap(A, n)

Each operation is given an array A and the number n of elements currently stored in the heap. We assume that the array is of size at least n + 1 so that the heap elements can be stored at indices 1 through n. In the case of insert, we assume that the array is of size at least n + 2 so it is large enough to hold the new element.

These operations can be used to implement a class of heaps that would also manage the size of the array. As we will see in the next section, these operations can also be used to implement heapsort.

The top operation is trivial to implement:

top_heap(A, n)
    return A[1]

The insert and pop operations are more interesting. Let's start with insert. Consider again the heap of Figure 4.2. Suppose we want to add 20 to this heap. The idea is to first add the new element to the next available position in the heap, which is as the right child of 10, as shown in the first heap of Figure 4.3.

Now, 20 cannot stay there because its parent (10) now violates the heap property. So we swap 20 with its parent, as shown in the second heap of Figure 4.3. Since 20 is larger than 10, we know that 20 is larger than its previous sibling (9) so the heap property holds at 20.


[Three trees: 20 is first added as the right child of 10, then swapped with its parent 10, then swapped with its new parent 16 so that it ends up at the top of the heap.]

Figure 4.3: Inserting element 20 into a heap


insert_heap(A, n, x)
    i = n + 1
    A[i] = x
    j = parent(i)
    while (i > 1 and A[i] > A[j])
        swap A[i] and A[j]
        i = j
        j = parent(i)

Figure 4.4: The insert_heap operation
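A possible C++ version of this operation, assuming the 1-based layout used in these notes (index 0 of the vector is unused and the vector has room at index n + 1):

#include <utility>
#include <vector>

// Adds x to a max-heap stored in A[1..n].
void insert_heap(std::vector<int>& A, int n, int x) {
    int i = n + 1;
    A[i] = x;
    int j = i / 2;                     // parent(i)
    while (i > 1 && A[i] > A[j]) {     // move the new element up
        std::swap(A[i], A[j]);
        i = j;
        j = i / 2;
    }
}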

But 20 cannot stay there either because it is again larger than its new parent (16). So we again swap 20 with its parent, as shown in the third heap of Figure 4.3. Since 20 is larger than 16, we know that 20 is larger than its previous sibling (14) so the heap property holds at 20. And the heap property also holds at 16, at its new location, because its children were already its descendants in the original heap, which implies that they cannot be larger than 16.

To summarize, we add the new element at the next available position and then keep moving it up until the new element is no longer larger than its parent. The complete algorithm is described in pseudocode in Figure 4.4.

We now turn to pop. Suppose we want to remove the top element of the heap of Figure 4.5. We start by replacing the top element by the last element of the heap (9), as shown in the first heap of Figure 4.6.

Now, 9 cannot stay there because it is smaller than its children. So we swap 9 with its largest child (14), as shown in the second heap of Figure 4.6. Since 14 was larger than its previous sibling (10), the heap property holds at 14.

But 9 cannot stay there either because it is smaller than one of its children.


[A heap with 16 at the root, 14 and 10 as its children, 8 and 12 below 14, and 9 below 10.]

Figure 4.5: Another heap

So we again swap 9 with its largest child (12), as shown in the third heap of Figure 4.6. The heap property holds at 12 because, once again, it was the larger sibling. And the heap property also holds at the new parent of 12 (14) because 12 was already a descendant of its new parent in the original heap, which implies that it cannot be larger than that new parent.

To summarize, we replace the top element by the last element of the heap and then keep moving it down until it is no longer smaller than either of its children. And every time we move the element down, we swap it with its largest child.

Figure 4.7 describes the pop operation in pseudocode. A new operation called heapify takes care of fixing the heap by moving the top element to its proper position. The third argument of heapify is the index of the element that needs to be moved down.

The heapify operation is described in Figure 4.8. It assumes that the heap rooted at A[i] is valid except that A[i] may be smaller than one (or both) of its children. In this pseudocode, to test that A[i] is not a leaf, we can check if it has a child by testing if left(i) is less than or equal to n, the index of the last element of the heap.


[Three trees: the top element has been replaced by the last element 9, then 9 is swapped with its largest child 14, then 9 is swapped with its largest child 12.]

Figure 4.6: Fixing the heap


pop_heap(A, n)
    A[1] = A[n]
    heapify(A, n−1, 1)

Figure 4.7: The pop_heap operation

heapify(A, n, i)
    if (A[i] is not a leaf
            and A[i] < at least one child)
        j = index of largest child of A[i]
        swap A[i] and A[j]
        heapify(A, n, j)

Figure 4.8: The heapify operation
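Here is a Python sketch of these two operations (again an added illustration, with the heap stored 1-based and A[0] unused). It mirrors Figures 4.7 and 4.8, with heapify written recursively.

def heapify(A, n, i):
    """Sift A[i] down within the max-heap A[1..n]."""
    left, right = 2 * i, 2 * i + 1
    if left <= n:                                # A[i] is not a leaf
        j = left                                 # index of the largest child
        if right <= n and A[right] > A[left]:
            j = right
        if A[i] < A[j]:
            A[i], A[j] = A[j], A[i]
            heapify(A, n, j)

def pop_heap(A, n):
    """Remove the top of the max-heap A[1..n]; the heap then occupies A[1..n-1]."""
    top = A[1]
    A[1] = A[n]            # move the last element to the top
    heapify(A, n - 1, 1)   # and sift it down to its proper position
    return top

A = [None, 16, 14, 10, 8, 12, 9]   # the heap of Figure 4.5
print(pop_heap(A, 6), A[1:6])      # 16 [14, 12, 10, 8, 9]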


build_heap(A, n)

    for i = 2 to n
        insert_heap(A, i-1, A[i])

Figure 4.9: Building a heap

4.4 Heapsort

The heap operations of the previous section can be used to describe the heapsort algorithm mentioned earlier. First, assume that the n elements to be sorted are already stored in an array starting at index 1. Heapsort then consists of two steps: first, turn those elements into a heap; then, remove the elements one by one to get them in sorted order.

Figure 4.9 describes the first step in pseudocode. It simply consists of repeated insertions.

Figure 4.10 describes the second step in pseudocode. It essentially consists of repeated pops. The only difference is that the top element is not simply removed: it takes the place of the last element of the heap. The result is that the array will contain the elements in increasing order.

Figures 4.11 and 4.12 show a sample run of the sorting step. The original heap is shown first. The heap is then shown after each iteration of the loop, together with the elements that have already been removed from the heap and placed at the end of the array.


sort_heap(A, n)

    for i = n to 2
        swap A[1] with A[i]
        heapify(A, i-1, 1)

Figure 4.10: Sorting a heap
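In Python, the sorting step might look like this (an added sketch; it repeats the heapify function from the earlier illustration so that it can stand on its own, and again stores the heap 1-based with A[0] unused).

def heapify(A, n, i):
    # Same sift-down as in the sketch after Figure 4.8.
    left, right = 2 * i, 2 * i + 1
    j = i
    if left <= n and A[left] > A[j]:
        j = left
    if right <= n and A[right] > A[j]:
        j = right
    if j != i:
        A[i], A[j] = A[j], A[i]
        heapify(A, n, j)

def sort_heap(A, n):
    """Sort A[1..n] in increasing order, assuming A[1..n] is already a max-heap."""
    for i in range(n, 1, -1):
        A[1], A[i] = A[i], A[1]   # the largest remaining element goes to position i
        heapify(A, i - 1, 1)      # restore the heap in A[1..i-1]

A = [None, 16, 14, 10, 8, 12, 9]   # the heap of Figure 4.5
sort_heap(A, 6)
print(A[1:])                        # [8, 9, 10, 12, 14, 16]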

4.5 Building a Heap in Linear Time

The first step of the heapsort algorithm is to build a heap out of the elements given in an array. In the previous section, that step was carried out by an algorithm that simply performs repeated insertions. The running time of that algorithm is Θ(n log n). In this section, we will learn that a heap can be built in linear time.

The idea is to start from the bottom. Consider the first tree in Figure 4.13. It shows the contents of the array. The lines are dashed since the elements do not yet constitute a heap.

Now, the leaves in this tree are valid heaps of size 1. So we can run the heapify operation on the parents of those leaves, which are the Level 2 nodes. The result is the second tree of Figure 4.13. We can then repeat at the root of the tree, to produce the third tree of Figure 4.13.

Figure 4.14 describes the algorithm in pseudocode. The loop starts at index ⌊n/2⌋ because n is the index of the last element in A and ⌊n/2⌋ is the index of its parent in a heap. That parent is the rightmost internal node of the tree and also the rightmost node of the second-to-last level that's not a leaf.


Figure 4.11: Sorting a heap. Array contents, with the heap part followed by the already-sorted part: initially 16, 14, 10, 8, 12, 9; after one iteration 14, 12, 10, 8, 9 | 16; after two iterations 12, 9, 10, 8 | 14, 16.


Figure 4.12: Sorting a heap (continued). After three iterations: 10, 9, 8 | 12, 14, 16; after four: 9, 8 | 10, 12, 14, 16; after five: 8 | 9, 10, 12, 14, 16.


Figure 4.13: Building a heap from the bottom up. The array, in level order: 12; 23, 16; 8, 30, 5 (not yet a heap). After heapifying the Level 2 nodes: 12; 30, 16; 8, 23, 5. After heapifying the root: 30; 23, 16; 8, 12, 5.


build_heap(A, n)

    for i = floor(n/2) to 1
        heapify(A, n, i)

Figure 4.14: Building a heap in linear time
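A Python sketch of the bottom-up construction (an added illustration; heapify is the same sift-down function used in the earlier sketches, repeated here so the example is self-contained):

def heapify(A, n, i):
    # Sift A[i] down within the max-heap A[1..n] (1-based, A[0] unused).
    left, right = 2 * i, 2 * i + 1
    j = i
    if left <= n and A[left] > A[j]:
        j = left
    if right <= n and A[right] > A[j]:
        j = right
    if j != i:
        A[i], A[j] = A[j], A[i]
        heapify(A, n, j)

def build_heap(A, n):
    """Turn A[1..n] into a max-heap, working from the last internal node up to the root."""
    for i in range(n // 2, 0, -1):
        heapify(A, n, i)

A = [None, 12, 23, 16, 8, 30, 5]   # the array shown in Figure 4.13
build_heap(A, 6)
print(A[1:])                        # [30, 23, 16, 8, 12, 5]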


Chapter 5

Hash Tables

In this chapter, we start our exploration of map implementations. This chapter will cover hash tables while balanced binary search trees will be covered in the following chapters.

5.1 Maps

A map is a collection of pairs that each consist of a key and an element. The keys are used to identify and access the map elements. For example, a basic map operation is to find the element that has a given key. The three basic map operations are described in Figure 5.1.

As an abstract data structure, maps are incredibly useful. For example, a map can store the entries of a phone book or address book using names as keys. A map can also store a dictionary: the keys would be words and the elements would be definitions. Or student records: the elements would be student records and the keys could be names or ID numbers. In fact, very often, as in the case of student records, we may want multiple maps that each use a different key so that the elements can be accessed easily and efficiently in multiple ways.


find(k) Returns the element with key k in the map.

insert(k, e) Adds to the map a pair that consists of key k and element e.

erase(k) Removes the element with key k from the map.

Figure 5.1: Basic map operations

Maps can be restricted to have at most one element associated with each key,

or they can be allowed to have multiple elements associated with the same key. (If multiple elements are allowed, the find and erase map operations shown in Figure 5.1 need to be adjusted.) In these notes, we will focus on the single-element case but the map implementations we will learn can all be adapted to the multiple-element case.

Maps can be implemented in a variety of ways. Table 5.1 gives some possibilities. The first one is an unsorted linked list of pairs. In this case, searches and deletions require a linear-time sequential search to find the element with the given key. Insertions require a scan of the list to ensure that no element with the given key is already in the map. An unsorted array would give the same running times. (If multiple elements with the same key are allowed, then insertions can be performed in constant time by simply adding the new element to the end of the list. In the case of an array, the running time would be amortized because the array may need to grow.)

Searches can be sped up by using a sorted array and the binary search algorithm. Deletions still take linear time because elements need to be shifted to fill the hole left by the deleted element. (Insertions would now require linear time even in the multiple-element case because pairs would have to be shifted to make room for the new one.)


                                 find        insert      erase
Unsorted linked list or array    Θ(n)        Θ(n)        Θ(n)
Sorted array                     Θ(log n)    Θ(n)        Θ(n)
Balanced binary search tree      Θ(log n)    Θ(log n)    Θ(log n)
Hash table                       Θ(1)        Θ(1)        Θ(1)

Table 5.1: Some possible implementations of maps


There are two categories of data structures that can give us fast searches, insertions and deletions. One of them is the balanced binary search tree. This category of data structures includes, for example, red-black trees and AVL trees. Balanced binary search trees support all three operations in logarithmic time.

Another option is the hash table. With hash tables, all operations run in constant time on average. In the case of insertions, this is an amortized running time because hash tables use an array that may need to grow. The worst-case running time of all three operations is linear. But the low average means that the total running time of a long sequence of n operations is Θ(n). This compares to Θ(n log n) for balanced binary search trees. Therefore, hash tables may be the best option when the total running time needs to be minimized while balanced binary search trees may be better when the running time of each operation must be guaranteed to be as low as possible. In these notes, we will learn both hash tables and balanced binary search trees.

Note that hash tables and balanced binary search trees can also be used to implement sets since a set can be viewed simply as a map in which the elements are the keys.


find(x) Determines if element x is in the set.

insert(x) Adds element x to the set.

erase(x) Removes element x from the set.

Figure 5.2: Basic set operations

The basic map operations can be reinterpreted as set operations as shown in Figure 5.2.

5.2 Direct-Address Tables

Before learning about hash tables, it's important to note that in some special situations, it is possible to use a simple array to implement a map in a very efficient way. Suppose, for example, that the keys are integers in the range [0, M) for some value M. Then the map elements can be stored in an array of size M in which the indices are used as keys. If the map contains no element with a given key, then that location in the array is marked as empty.

This implementation of a map is sometimes called a direct-address table. Its main advantage is that it allows all three map operations to run in constant time, and that's in the worst case, not just on average. However, if M is much larger than the number of elements to be stored in the map, then a direct-address table would waste a lot of memory space. As we will see in the next section, hash tables can be viewed as a way to address this problem.
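A minimal sketch of a direct-address table in Python (an illustration added to these notes; it stores one element per possible key and uses None to mark empty locations):

class DirectAddressTable:
    """Map whose keys are integers in the range [0, M)."""
    def __init__(self, M):
        self.slots = [None] * M     # None marks an empty location

    def find(self, k):
        return self.slots[k]

    def insert(self, k, e):
        self.slots[k] = e

    def erase(self, k):
        self.slots[k] = None

table = DirectAddressTable(100)
table.insert(42, "some element")
print(table.find(42))    # some element

All three operations clearly run in constant time in the worst case.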

In principle, direct-address tables can also be used with non-integer keys. The idea is to encode each key as a non-negative integer. For example, if the keys are strings, then a string can be associated to a number by using a numerical code for each character and essentially putting those numbers side by side to form a larger number.


If there are 16 = 2^4 possible character codes, then a string of length r can be encoded by a number in the range [0, 2^(4r)) so that a direct-address table of size 2^(4r) can be used. If r is very small, say r = 3, then 2^(4r) = 2^12, or about 4,000, which may be reasonable. But if r is only slightly larger, say r = 10, then 2^(4r) = 2^40, or about 1 trillion, which is almost certainly excessively large. Therefore, even though direct-address tables can be used with non-integer keys, this is only practical in fairly limited situations.

An example in which a direct-address table would be efficient is when a time-intensive algorithm running on a graph (such as a network) needs frequent access to information about the nodes in that graph. If there are n nodes in the graph, it may be worth relabeling the graph nodes 1 through n so they can then be stored in a direct-address table and accessed in constant time.

5.3 The Hash Table Data Structure

As mentioned in the previous section, hash tables can be viewed as a way to reduce the memory space used by a direct-address table. Suppose again that the keys are integers in the range [0, M) for some value M. And now consider using a much smaller array.

Let m be the size of the array. What we now need is a way to map keys in the range [0, M) to indices in the range [0, m). One simple strategy is to reduce the value of the key modulo m. That is, to divide the key by m and use the remainder as the index. This works because the remainder is guaranteed to be a number in the range [0, m).

A function that maps keys to indices is called a hash function. There are different possible strategies for hash functions. The above idea is perhaps the simplest one. It is sometimes called the division method.
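In code, the division method is a one-line function; here is a small Python illustration (added to these notes):

def h(k, m):
    """Division-method hash function: map key k to an index in the range [0, m)."""
    return k % m

print(h(19, 7))   # 5
print(h(24, 7))   # 3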


Keys and elements to store: 9A, 19B, 3C, 0D, 24E, 11F, 10G, 16H.

Contents of the table (index: chain): 0: 0D; 1: empty; 2: 9A, 16H; 3: 3C, 24E, 10G; 4: 11F; 5: 19B; 6: empty.

Figure 5.3: A hash table of size 7 that uses separate chaining

Now, as you may have realized, it is possible that a hash function will map (or hash) two different keys to the same index. For example, if m = 7, then keys 9 and 16 both map to index 2. This is called a collision. When working with hash tables, we need a strategy for dealing with collisions. Such a strategy is often called a collision resolution strategy.

A simple collision resolution strategy is to simply store, at each location of the array, a list of all the elements that hashed to that location. For example, suppose that we need to store the elements shown at the top of Figure 5.3 in a hash table of size 7. The numbers are the keys while the letters represent the associated elements. Those elements would be stored in the hash table as indicated at the bottom of Figure 5.3.

The basic map operations are easy to implement with hash tables, as shown in Figure 5.4. In that (high-level) pseudocode, h is the hash function and A is the array used by the hash table.
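As a concrete sketch (added to these notes, not the notes' own code), a separate-chaining table with the three map operations of Figure 5.4 might look like this in Python, using the division method as the hash function:

class ChainedHashTable:
    """Hash table with separate chaining: each slot holds a list of (key, element) pairs."""
    def __init__(self, m):
        self.m = m
        self.A = [[] for _ in range(m)]

    def _h(self, k):
        return k % self.m              # division method

    def find(self, k):
        for pair in self.A[self._h(k)]:
            if pair[0] == k:
                return pair
        return None

    def insert(self, k, e):
        chain = self.A[self._h(k)]
        for pair in chain:
            if pair[0] == k:           # an element with key k is already in the map
                return
        chain.append((k, e))

    def erase(self, k):
        chain = self.A[self._h(k)]
        for i, pair in enumerate(chain):
            if pair[0] == k:
                del chain[i]
                return

t = ChainedHashTable(7)
for key, elem in [(9, "A"), (19, "B"), (3, "C"), (0, "D")]:
    t.insert(key, elem)
print(t.find(9))    # (9, 'A')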

5.4 Analysis

Whether the lists are stored in arrays or linked lists, the running time of the hash table operations depends on the length of the lists. In the worst case, it is always possible that all the elements hash to the same location in the table. This results in operations that run in linear time.


find(k)

    search list A[h(k)] for a pair with key k
    if found
        return that pair

insert(k, e)

    search list A[h(k)] for a pair with key k
    if not found
        add pair (k, e) to the list

erase(k)

    search list A[h(k)] for a pair with key k
    if found
        delete that pair from the list

Figure 5.4: Map operations in a hash table implementation



What about on average? Suppose that the hash function distributes elements as evenly as possible. This is called uniform hashing. If n elements are in the table, then the average length of the lists is n/m. This number is called the load factor of the table.

Intuitively, it seems clear that we want the load factor to be small. This implies that as n grows, m will need to grow too. One possible strategy is to keep the load factor below 1 so that the average length of the lists is at most 1. This can be done, for example, by doubling m whenever n reaches m. When that happens, a new array is created and all the elements are rehashed. The time spent rehashing the elements will be amortized constant, just like when we efficiently grow arrays.
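The rehashing step can be sketched as follows in Python (an added illustration that assumes a chained table object like the ChainedHashTable sketch from the previous section; that class name comes from the illustration, not from these notes):

def rehash(t):
    """Double the array of the chained table t and re-insert every pair.
    A real table would trigger this when the load factor n/m reaches 1, and
    its cost is amortized over the insertions that preceded it."""
    old_pairs = [pair for chain in t.A for pair in chain]
    t.m *= 2
    t.A = [[] for _ in range(t.m)]
    for key, elem in old_pairs:
        t.A[key % t.m].append((key, elem))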

Now, if the average length of the lists is n/m, it's tempting to say that the hash table operations will search lists of average length n/m. This turns out not to be quite right. To analyze this correctly, it's necessary to consider separately successful and unsuccessful searches.

If a search is unsuccessful, then the key we are searching for has an equal chance of hashing to any index. Therefore, the average length of the list at that location in the table is n/m, as we expected.

But if a search is successful, things are slightly different because the key won't hash to just any index: it will hash to the index of the list that contains the element we are looking for. Apart from that element, the list contains some of the other elements. Since there are n − 1 other elements, and since those elements are distributed evenly by the hash function, we can expect the list to contain (n − 1)/m other elements. Therefore, in total, the list contains 1 + (n − 1)/m elements. This is equal to 1 + n/m − 1/m. When n is large, m is also large, which implies that 1/m is very small and that the number of elements is about 1 + n/m.


For example, if n/m = 1, then, in an unsuccessful search, the average length of the list is 1 but in a successful search, it's approximately 2.

If the elements inserted into a hash table are random, then the simple division hash function we considered in the previous section will spread the elements evenly in the hash table. But, in practice, elements are usually not random. In that case, a division hash function may end up spreading the elements very unevenly. There are hash functions that spread the elements evenly even if the elements are not random. The idea is to choose the hash function randomly from a carefully constructed set of possible hash functions. This approach is called universal hashing.

We won’t cover universal hashing in these notes but even when using thesimple division hash function, there is something we can do to help with non-random elements.

As in the previous section, let m be the size of the array. Since h(k) is the remainder of the division of k by m, we have that k = qm + h(k), where q is the quotient of the division of k by m. Now, suppose that d divides m and that d also happens to divide k. Then d will also divide h(k). Therefore, if many keys happen to be divisible by d, then all those keys will map to indices that are divisible by d, resulting in lists that are longer than ideal.

For example, suppose that m is a multiple of 10 and that all the keys are multiples of 10. Then all the elements would hash to locations with indices that are multiples of 10. This would result in only a tenth of the table being used. And the lists would be, on average, 10 times longer than ideal.

In practice, it is not unusual for many elements to have keys that are divisible by the same number. One way to avoid this problem is to choose m to be a prime number. If n is large, then m will be a large prime number and the chance that many keys are divisible by that number is probably low.
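A quick illustration (added here) of why a prime table size helps when many keys share a divisor:

keys = [10, 20, 30, 40, 50, 60, 70]

# m = 10 shares the divisor 10 with every key: all the keys land in one location.
print([k % 10 for k in keys])   # [0, 0, 0, 0, 0, 0, 0]

# m = 11 is prime: the same keys spread across distinct locations.
print([k % 11 for k in keys])   # [10, 9, 8, 7, 6, 5, 4]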


Chapter 6

Binary Search Trees

6.1 The Data Structure

A binary search tree is a binary tree that satisfies the following order property:

For every node x in a binary search tree, every node in the left subtree of x has a key that's no larger than x's, and every node in the right subtree of x has a key that's no smaller than x's.

6.2 Basic Operations

The basic operations search, min, inorder and successor are described in Figures 6.1 to 6.4.


search(x, k)

    while (x != null and x->key != k)
        if (k < x->key)
            x = x->left
        else
            x = x->right
    return x

Figure 6.1: The search operation

min(x)
// x is not null.

    while (x->left != null)
        x = x->left
    return x

Figure 6.2: The min operation

inorder(x, visit)
// visit is a unary function.

    if (x != null)
        inorder(x->left, visit)
        visit(x->element)
        inorder(x->right, visit)

Figure 6.3: The inorder operation


successor(x)
// x is not null.

    if (x->right != null)
        return min(x->right)
    else
        y = x->parent
        while (y != null and x = y->right)
            x = y
            y = x->parent
        return y

Figure 6.4: The successor operation
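For concreteness, here is a Python sketch of the node structure together with the search, min and inorder operations of Figures 6.1 to 6.3 (an illustration added to these notes):

class Node:
    def __init__(self, key, element=None):
        self.key = key
        self.element = element
        self.left = self.right = self.parent = None

def search(x, k):
    """Return the node with key k in the subtree rooted at x, or None."""
    while x is not None and x.key != k:
        x = x.left if k < x.key else x.right
    return x

def tree_min(x):
    """Return the node with the smallest key in the non-empty subtree rooted at x."""
    while x.left is not None:
        x = x.left
    return x

def inorder(x, visit):
    """Apply visit to every element of the subtree rooted at x, in key order."""
    if x is not None:
        inorder(x.left, visit)
        visit(x.element)
        inorder(x.right, visit)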

6.3 Insertion and Deletion

The operations insert and erase are described in Figures 6.5 and 6.6. The replace_subtree operation used by erase is described in Figure 6.7.


insert(root, z)
// z points to the node to be inserted.
// That node is assumed to have all pointers set to null.
// root is passed by reference.

    if (root = null)
        root = z
    else
        q = root
        done = false
        while (not done)
            if (z->key < q->key)
                if (q->left = null)
                    q->left = z
                    done = true
                else
                    q = q->left
            else
                if (q->right = null)
                    q->right = z
                    done = true
                else
                    q = q->right
        z->parent = q

Figure 6.5: The insert operation
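A Python version of this insertion (an added sketch; the Node class is the same as in the illustration for Section 6.2) could be written as follows. Since Python has no pass-by-reference, the function returns the possibly new root instead.

class Node:
    # Same Node class as in the sketch for Section 6.2.
    def __init__(self, key, element=None):
        self.key = key
        self.element = element
        self.left = self.right = self.parent = None

def insert(root, z):
    """Insert node z into the tree rooted at root and return the (possibly new) root."""
    if root is None:
        return z
    q = root
    while True:
        if z.key < q.key:
            if q.left is None:
                q.left = z
                break
            q = q.left
        else:
            if q.right is None:
                q.right = z
                break
            q = q.right
    z.parent = q
    return root

root = None
for k in [16, 14, 20, 15]:
    root = insert(root, Node(k))
print(root.key, root.left.key, root.right.key, root.left.right.key)   # 16 14 20 15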


erase(root, z)
// root is passed by reference.

    if (z->left == null)
        replace_subtree(root, z, z->right)
    else if (z->right == null)
        replace_subtree(root, z, z->left)
    else
        r = z->right
        y = min(r)
        if (y != r)
            replace_subtree(root, y, y->right)
            y->right = r
            r->parent = y
        replace_subtree(root, z, y)
        p = z->left
        y->left = p
        p->parent = y
    z->left = z->right = null

Figure 6.6: The erase operation


replace_subtree(root, u, v)
// Replaces subtree u by subtree v.
// u is not null but v could be.
// u's parent pointer is set to null.
// root is passed by reference.

    if (u = root)
        root = v
        if (v != null)
            v->parent = null
    else
        q = u->parent
        if (u = q->left)
            q->left = v
        else
            q->right = v
        if (v != null)
            v->parent = q
    u->parent = null

Figure 6.7: The replace_subtree operation

Chapter 7

Red-Black Trees

7.1 The Data Structure

A full binary tree is a binary tree in which every node is either a leaf or has exactly two children. A red-black tree is a full, non-empty binary search tree that satisfies the following properties:

1. Every node is either red or black.

2. The root is black.

3. The leaves are all black.

4. If a node is red, then it has two black children.

5. Every path from a node to any of its descendant leaves contains the same number of black nodes.


6. Data is stored only in internal (non-leaf) nodes.

The fact that a red-black tree is always non-empty implies that the smallest possible red-black tree is a single black leaf that doesn't store any data.

The black height of a node is the number of black nodes below that node on any path to any one of its descendant leaves. (This concept is well-defined because of Property 5 above.) The black height of a red-black tree is the black height of its root.

We are interested in red-black trees because they are guaranteed to be of logarithmic height:

Theorem 7.1 A red-black tree with n internal nodes has height at most 2 log(n+1).

Proof. If n = 0, then the height of the tree is 0 and this is at most 2 log 1 = 0. So, suppose that a red-black tree has n ≥ 1 internal nodes. Let b be the black height of the tree. Since n ≥ 1, b ≥ 1. Every path from the root to a leaf contains exactly b + 1 black nodes. This implies that no leaf can occur in the first b levels of the tree. In other words, the first b levels of the tree contain only internal nodes.

Now, consider those b levels. The first one contains the root of the tree so it is complete. The second level is complete because the root has two children. The third level is complete because the previous one is complete and contains internal nodes that each have two children. And so on. This shows that the first b levels of the tree are complete.

Since those levels contain only internal nodes, we get a lower bound on the number of internal nodes in the tree:

n ≥ 1 + 2 + ··· + 2^(b-1) = 2^b − 1


Therefore, b ≤ log(n + 1).

Now, let h be the height of the tree and consider one of the longest paths in the tree.

This path has length h so it contains h + 1 nodes. Since a red node cannot have a red child, and since the root and all the leaves are black, this path contains b + 1 black nodes and at most b red nodes. Therefore, h + 1 ≤ 2b + 1 and h ≤ 2b ≤ 2 log(n + 1). □

7.2 Basic Operations

Since red-black trees are binary search trees, in principle, the basic binary search tree operations search, min, inorder and successor can also be used on red-black trees. However, these operations need to be modified to take into account the fact that elements are stored only in internal nodes. The resulting algorithms are shown in Figures 7.1 to 7.4. In all cases, the argument x is a pointer to a node that's assumed not to be null. In some cases, as documented in the pseudocode, that pointer is also assumed to point to an internal node (one that contains data).

7.3 Insertion

To insert a new node into a red-black tree, we start by inserting the node as if the tree was just a plain binary search tree. And we make that new node red. We also need to give that node two black leaves as children.

The insertion itself can be performed by the same algorithm we used for binary search trees, modified to account for the fact that leaves do not hold elements. The modified algorithm is shown in Figure 7.5.

Once the new node is inserted, the tree is guaranteed to be a valid red-black tree except for the fact that z's parent may also be red.


search(x, k)

    while (x is not a leaf and x->key != k)
        if (k < x->key)
            x = x->left
        else
            x = x->right
    if (x is a leaf)
        return null
    else
        return x

Figure 7.1: The search operation

min(x)
// x is an internal node.

    while (x->left is not a leaf)
        x = x->left
    return x

Figure 7.2: The min operation


inorder(x, visit)
// visit is a unary function.

    if (x is an internal node)
        inorder(x->left, visit)
        visit(x->element)
        inorder(x->right, visit)

Figure 7.3: The inorder operation

successor(x)
// x is an internal node.

    if (x->right is not a leaf)
        return min(x->right)
    else
        y = x->parent
        while (y != null and x = y->right)
            x = y
            y = x->parent
        return y

Figure 7.4: The successor operation


insert(root, z)
// z points to the node to be inserted.
// That node is assumed to be red and to have two black
// leaves as children.
// root is passed by reference.

    if (root is a leaf)
        root = z
    else
        q = root
        done = false
        while (not done)
            if (z->key < q->key)
                if (q->left is a leaf)
                    q->left = z
                    done = true
                else
                    q = q->left
            else
                if (q->right is a leaf)
                    q->right = z
                    done = true
                else
                    q = q->right
        z->parent = q
    insert_fix(root, z)

Figure 7.5: The insert operation


Figure 7.6: A red-black tree: node x has left subtree α and right child y, and y has subtrees β and γ.

Figure 7.7: The red-black tree of Figure 7.6 left-rotated at x: node y now has left child x and right subtree γ, and x has subtrees α and β.

The key to fixing this problem is an operation called a rotation. Consider the tree shown in Figure 7.6. (In this picture, x and y are pointers to nodes, not the values stored in those nodes. And α, β and γ represent the subtrees of those two nodes.) A left rotation at x produces the tree shown in Figure 7.7. And a right rotation at y on the tree of Figure 7.7 produces the tree shown in Figure 7.6.

It is easy to verify that if the original tree is a red-black tree, then a rotation produces a tree that's still a valid binary search tree. For example, with the above left rotation, we already had that all the nodes in α were less than x, which was less than all the nodes in β, which were less than y, which was less than all the nodes in γ.



But after a rotation is performed in a red-black tree, the resulting tree may not be a valid red-black tree, for two reasons. Consider the tree of Figure 7.6. It is possible for x and y's left child (the root of β) to both be red. In that case, after the rotation, as shown in Figure 7.7, x will be red and have a red child. The other way in which the resulting tree may not be a valid red-black tree is that it may no longer satisfy the black-height property.

Therefore, after performing a rotation in a red-black tree, we may need to adjust the colors of some nodes to ensure that the resulting tree is still a valid red-black tree.

The complete algorithm for fixing a red-black tree after an insertion is shown in Figure 7.8. This algorithm is called as the last step of the insert algorithm of Figure 7.5.

Figure 7.9 gives pseudocode for a left rotation. This pseudocode simply moves nodes without adjusting their colors. Any necessary adjustment is done by the algorithm insert_fix.

7.4 Deletion

To delete a node from a red-black tree, we also start by deleting it as if the tree was just a plain binary search tree. This can be done by the same algorithm we used for binary search trees, once again modified to account for the fact that leaves do not hold elements. But once the node is deleted, we need to make sure that the number of black nodes on any path from the root to any of the leaves remains the same (the black-height property). One way to deal with this is to (temporarily) add an extra color to one of the nodes. In the pseudocode of Figure 7.10, node x is the one with the extra color.


insert_fix(root, z)
// z is red and its parent may be red.
// root is passed by reference.

    if (z is the root)
        make it black
        return
    if (z's parent is black)
        return
    if (z's uncle is red) // case 1
        change the colors of z's parent, grandparent and uncle
        insert_fix(root, z's grandparent)
    else // z's uncle is black
        if (z, its parent and grandparent are not in a straight line) // case 3
            y = z->parent
            rotate at y to make z move up
            z = y
        // z, its parent and grandparent are now in a straight line (case 2)
        rotate at z's grandparent to make z move up
        change the colors of z's parent and new sibling

Figure 7.8: The operation insert_fix


left_rotate(root, x)
// x is the node where the rotation will be performed.

    y = x->right
    q = x->parent

    p = y->left
    x->right = p
    p->parent = x

    y->left = x
    x->parent = y

    y->parent = q
    if (q = null) // x was the root
        root = y
    else if (x = q->left)
        q->left = y
    else
        q->right = y

Figure 7.9: Pseudocode for a left rotation


Next, we need to deal with the extra color. If that extra color is red, then there's nothing to do because we can just ignore it and leave x with its original color. If the extra color is black, then x is either black-black or red-black. The complete algorithm for fixing a red-black tree after a deletion is shown in Figure 7.11. This algorithm is called as the last step of the erase algorithm of Figure 7.10.

there’s nothing to do because we can just ignore it and leave x with its origi-nal color. If the extra color is black, then x is either black-black or red-black.The complete algorithm for fixing a red-black tree after a deletion is shown inFigure 7.11. This algorithm is called as the last step of the erase algorithm ofFigure 7.10.


erase(root, z)
// z points to the node to be deleted.
// This node is assumed to be an internal node.
// root is passed by reference.

    if (z->left is a leaf)
        x = z->right
        extra_color = z->color
        replace_subtree(root, z, x)
    else if (z->right is a leaf)
        x = z->left
        extra_color = z->color
        replace_subtree(root, z, x)
    else
        r = z->right
        y = min(r)
        x = y->right
        extra_color = y->color
        if (y != r)
            replace_subtree(root, y, x)
            y->right = r
            r->parent = y
        replace_subtree(root, z, y)
        y->color = z->color
        p = z->left
        y->left = p
        p->parent = y
    z->left = z->right = null
    if (extra_color = black)
        erase_fix(root, x)

Figure 7.10: The erase operation


erase_fix(root, x)
// x has two colors and one is black.
// root is passed by reference.

    if (x is red-black or x is the root)
        make x black
        return
    // x is black-black
    if (x's sibling is red) // case 4
        rotate at x's parent to make x move down
        change the colors of x's parent and new grandparent
    // x's sibling is now black
    if (x's nieces are both black) // case 1
        make x black
        make x's sibling red
        add an extra black to x's parent
        erase_fix(root, x's parent)
    else // one of x's nieces is red
        if (x's far niece is black) // case 3
            rotate at x's sibling to make x's far niece move down
            change the colors of x's new sibling and new far niece
        // x's far niece is now red (case 2)
        rotate at x's parent to make x move down
        make x and its new uncle black
        swap the colors of x's parent and new grandparent

Figure 7.11: The operation erase_fix


Index

O notation, 16
Ω notation, 16
Θ notation, 10, 16
ω notation, 16
o notation, 16
asymptotic analysis, 11
asymptotic equivalence, 11
asymptotic relations, 16
balanced binary search tree, 147
Big-Oh notation (O), 16
Big-Omega notation (Ω), 16
binary search, 36
binary search tree, 155
binary tree, full, 161
black height, 162
bucket sort, 125
collision, 150
computational complexity, 91
counting sort, 123
deterministic algorithm, 96
direct-address table, 148
divide and conquer, 81
expected running time, 100
full binary tree, 161
hash function, 149
hash table, 147
heap, 129
in-place algorithm, 106
insertion sort, 74
last-call optimization, 58
little-oh notation (o), 16
little-omega notation (ω), 16
load factor, 152
map, 145
mathematical induction, 46
mergesort, 79
priority queue, 127
proof by induction, 45
quicksort, 92
randomized algorithm, 96
recurrence relation, 60
recursion, 43
recursion tree, 87
red-black tree, 161
rotation, 167
running time, asymptotic, 11
running time, average-case, 33
running time, worst-case, 32
Selection Problem, 109
selection sort, 67
set, 147
stable sorting algorithm, 125
strong induction, 46
tail recursion, 58
Theta notation (Θ), 10, 16
tree, binary search, 155
tree, red-black, 161
uniform hashing, 152
universal hashing, 153