
Algorithms

Instructor: Ming Li
David R. Cheriton School of Computer Science

University of Waterloo
http://www.cs.uwaterloo.ca/~cs341/

CS341, Winter, 2011

The last ½ century has witnessed the development of a beautiful and elegant new scientific field: the design and analysis of algorithms.

This course will teach you how to “design” and “analyze” algorithms.

History, Nature, …

The name “algorithm” comes from al-Khowarizmi, a 9th-century Persian mathematician and astronomer. He wrote a book, eventually translated into Latin in the 12th century as Algoritmi de numero Indorum ("al-Khowarizmi's book on Indian numbers"), that gave our modern pencil-and-paper algorithms for addition, subtraction, multiplication, and division.

Every algorithm should have: 1. Input, 2. Output, 3. Determinism, 4. Effectiveness, 5. Finiteness. Analogy: a cookbook recipe.

http://pages.cs.wisc.edu/~roy/ApplePie/long.html

We did not invent algorithms. Nature did: DNA.

Shellsort: http://portal.acm.org/citation.cfm?doid=368370.368387

Why study algorithms

Many aspects of “programming” are more important than algorithms: user-friendliness, correctness, extensibility, maintainability, simplicity, scalability, cost, …

But a good algorithm is fundamental to all of the above.

Warm up: Insertion-sort

Reference: CLR, Section 2.1.

INSERTION-SORT (A)
1 for j := 2 to n do
2     key := A[j]
3     i := j-1
4     while (i>0) and (A[i] > key) do
5         A[i+1] := A[i]
6         i := i-1
7     A[i+1] := key

What's the worst-case running time of this algorithm? The worst case occurs when the numbers are in decreasing order. In that case, lines 1-3 and line 7 take n-1 steps each, while each line of the loop in lines 4-6 takes (1+2+...+(n-1)) = n(n-1)/2 steps. The total cost is therefore 4(n-1) + 3n(n-1)/2, which is Θ(n^2).
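The worst-case count can be checked empirically. The following is a minimal Python sketch, not the course's reference code: it uses 0-based indexing instead of the 1-based pseudocode, and the function name insertion_sort and the shift counter are assumptions added for illustration. On a decreasing input, the number of shifts should equal n(n-1)/2.

```python
def insertion_sort(a):
    """Sort the list a in place and return how many element shifts were made."""
    shifts = 0
    for j in range(1, len(a)):            # pseudocode line 1, shifted to 0-based
        key = a[j]                        # line 2
        i = j - 1                         # line 3
        while i >= 0 and a[i] > key:      # line 4
            a[i + 1] = a[i]               # line 5: shift one slot to the right
            i -= 1                        # line 6
            shifts += 1
        a[i + 1] = key                    # line 7
    return shifts


n = 100
worst = list(range(n, 0, -1))             # decreasing order: the worst case
shifts = insertion_sort(worst)
print(shifts, n * (n - 1) // 2)           # the two counts should agree
```

The counter only tracks executions of line 5, so it reflects the n(n-1)/2 term rather than the full step count.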

To appreciate this, consider sorting a file of social insurance numbers for all Canadians (so n = 34,000,000). Then n^2 is about 10^15, so even with a computer that can do a billion operations per second, this will take 10^6 seconds, or about 11 days. On the other hand, an algorithm running in n lg n time will take only about a second on the same file. Algorithms matter!
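The estimate can be reproduced with a few lines of arithmetic. This is a rough sketch under the same assumptions as the text (one operation per step, a billion operations per second); the variable names are illustrative. Note that the text rounds n^2 down to 10^15; without that rounding the quadratic figure comes out closer to 13 days, which does not change the conclusion.

```python
import math

n = 34_000_000            # number of records, as in the text
ops_per_second = 1e9      # assumed machine speed: one billion operations per second

quadratic_seconds = n ** 2 / ops_per_second
nlogn_seconds = n * math.log2(n) / ops_per_second

print(f"n^2    : {quadratic_seconds:.3g} s, about {quadratic_seconds / 86400:.1f} days")
print(f"n lg n : {nlogn_seconds:.2f} s")
```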

Maximum sub-range sum

More warm-up: consider the problem of the maximum sub-range sum. Here we are given an array x[1..n] of integers (possibly negative) and we want to find the sub-block (contiguous entries) with the largest sum. This has applications in biology (finding stretches of DNA with more G's and C's than A's and T's). We will see several different algorithms for this problem.

Note: throughout the course we will use the notation x[m..n] to denote the substring of length n-m+1 starting at position m and ending at position n.

First try: enumerate all possible sub-blocks of the array:

Maxsubrangesum1(x,n);
    max := 0;
    for l := 1 to n do
        for u := l to n do
            sum := 0
            for i := l to u do
                sum := sum+x[i];
            if sum > max then max := sum;
    return(max);
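For readers who want to run the first attempt, here is a minimal Python sketch of the same idea. It uses 0-based indexing, and the function name max_subrange_sum_cubic and the sample array are assumptions for illustration, not part of the course material. As in the pseudocode, the empty sub-block counts, so the result is never negative.

```python
def max_subrange_sum_cubic(x):
    """Brute force: sum every sub-block x[l..u] from scratch."""
    best = 0                              # the empty sub-block has sum 0
    n = len(x)
    for l in range(n):
        for u in range(l, n):
            total = 0
            for i in range(l, u + 1):     # recompute the whole sum each time
                total += x[i]
            if total > best:
                best = total
    return best


sample = [31, -41, 59, 26, -53, 58, 97, -93, -23, 84]   # hypothetical input
print(max_subrange_sum_cubic(sample))
```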

Each loop is O(n), so the total time is O(n^3), and one can also prove that it is Θ(n^3). Can we do better?

Maximum sub-range sum: second try

Yes, if we recognize that the inner loop is doing a lot of re-computation each time:

Maxsubrangesum2(x,n);
    max := 0;
    for l := 1 to n do
        sum := 0;
        for u := l to n do
            sum := sum + x[u];
            if (sum > max) then max := sum;
    return(max);

Here we've improved our algorithm to Θ(n^2) by recognizing that the subrange sum for x[l..u+1] can be obtained from the subrange sum for x[l..u] by simply adding x[u+1] to it.
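The incremental update described above can be sketched in Python as follows. This is an illustrative version with 0-based indexing; the function name max_subrange_sum_quadratic and the sample input are assumptions.

```python
def max_subrange_sum_quadratic(x):
    """For each left end l, extend the right end u and reuse the running sum."""
    best = 0                      # the empty sub-block has sum 0
    n = len(x)
    for l in range(n):
        total = 0
        for u in range(l, n):
            total += x[u]         # sum of x[l..u] from sum of x[l..u-1]
            if total > best:
                best = total
    return best


sample = [31, -41, 59, 26, -53, 58, 97, -93, -23, 84]   # hypothetical input
print(max_subrange_sum_quadratic(sample))
```

Dropping the innermost loop is the whole change: each (l, u) pair now costs one addition instead of up to n.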

Maximum sub-range sum: third try

There's another approach that also produces a Θ(n^2) algorithm: namely, first do some pre-computation to produce the "running-sum" array rs, in which rs[i] is just x[1]+x[2]+...+x[i]. Then to find the subrange sum of x[l..u], just do rs[u]-rs[l-1], which costs only one subtraction. This gives the following algorithm:

Maxsubrangesum3(x,n);
    rs[0] := 0;
    for i := 1 to n do
        rs[i] := rs[i-1] + x[i];    // prefix sum
    max := 0;
    for l := 1 to n do
        for u := l to n do
            sum := rs[u] - rs[l-1];
            if sum > max then max := sum;
    return(max);
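A possible Python rendering of the running-sum approach is sketched below. It builds rs with itertools.accumulate (the initial keyword requires Python 3.8 or later), so rs[i] holds the sum of the first i entries and rs[0] is 0. The function name and sample input are assumptions for illustration. The loop variables l and u are kept 1-based to match the formula rs[u]-rs[l-1].

```python
from itertools import accumulate


def max_subrange_sum_prefix(x):
    """Precompute running sums, then get each sub-block sum by one subtraction."""
    rs = list(accumulate(x, initial=0))   # rs[i] = sum of the first i entries
    best = 0                              # the empty sub-block has sum 0
    n = len(x)
    for l in range(1, n + 1):             # 1-based block boundaries
        for u in range(l, n + 1):
            total = rs[u] - rs[l - 1]     # sum of the l-th through u-th entries
            if total > best:
                best = total
    return best


sample = [31, -41, 59, 26, -53, 58, 97, -93, -23, 84]   # hypothetical input
print(max_subrange_sum_prefix(sample))
```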

Maximum sub-range sum: 4th try

Now let's try to apply an algorithmic paradigm to this problem: divide-and-conquer. We'll break the array up into two pieces of equal size, find the maximum subrange sum in each part, and then take the maximum. But we also need to consider subranges that straddle the midpoint. Such a subrange has a maximum sum if it is formed by a maximum-sum suffix of the first half, and a maximum-sum prefix of the second half.

Maxsubrangesum4(x,l,u);
    if l > u then return(0);
    if l = u then return(max(0,x[l]));
    m := floor( (l+u)/2 );
    suml := 0; maxleft := 0;
    for i := m downto l do
        suml := suml + x[i];
        maxleft := max(maxleft,suml);      // max suffix of the left
    sumr := 0; maxright := 0;
    for i := m+1 to u do
        sumr := sumr + x[i];
        maxright := max(maxright,sumr);    // max prefix of the right
    maxa := maxsubrangesum4(x, l, m);
    maxb := maxsubrangesum4(x, m+1, u);
    return(max(maxa, maxb, maxleft+maxright));
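The divide-and-conquer procedure translates fairly directly into Python. The sketch below is illustrative only: it uses 0-based inclusive bounds, and the function name max_subrange_sum_dc, the default arguments, and the sample input are assumptions added for convenience.

```python
def max_subrange_sum_dc(x, l=0, u=None):
    """Divide and conquer on x[l..u] (inclusive, 0-based)."""
    if u is None:
        u = len(x) - 1                    # default: the whole array
    if l > u:
        return 0                          # empty range
    if l == u:
        return max(0, x[l])               # single element, or the empty block
    m = (l + u) // 2

    # Best suffix of the left half x[l..m], scanning from m down to l.
    suml = maxleft = 0
    for i in range(m, l - 1, -1):
        suml += x[i]
        maxleft = max(maxleft, suml)

    # Best prefix of the right half x[m+1..u].
    sumr = maxright = 0
    for i in range(m + 1, u + 1):
        sumr += x[i]
        maxright = max(maxright, sumr)

    maxa = max_subrange_sum_dc(x, l, m)
    maxb = max_subrange_sum_dc(x, m + 1, u)
    return max(maxa, maxb, maxleft + maxright)


sample = [31, -41, 59, 26, -53, 58, 97, -93, -23, 84]   # hypothetical input
print(max_subrange_sum_dc(sample))
```

Calling the function with only the list covers the whole array, which corresponds to the top-level call with the full index range.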

To start, call Maxsubrangesum4(x,1,n). Letting T(n) denote the time needed for n items, this algorithm's running time satisfies the recurrence T(n) = 2T(n/2) + O(n), so T(n) = O(n log n). Much better! Can we do even better? Next time we'll see a linear-time algorithm.

Take home message

From the sub-range problem, you can see that a problem often has many algorithmic solutions: some are good, some are bad …

Good algorithms are beautiful.
