Warmup Sorting Selection Closest pair Multiplication FFT
CSE 548: (Design and) Analysis of Algorithms
Divide-and-conquer Algorithms
R. Sekar
1 / 81
Divide-and-Conquer: A versatile strategy
Steps
Break a problem into subproblems that are smaller instances of the same problem
Recursively solve these subproblems
Combine these answers to obtain the solution to the problem

Benefits
Conceptual simplification
Speed-up: rapidly (exponentially) reduce the problem space; exploit commonalities in subproblem solutions
Parallelism: divide-and-conquer algorithms are amenable to parallelization
Locality: their depth-first nature increases locality, which is extremely important for today's processors.
Topics
1. Warmup: Overview; Search; H-Tree; Exponentiation
2. Sorting: Mergesort; Recurrences; Fibonacci Numbers; Quicksort; Lower Bound; Radix sort
3. Selection: Select k-th min; Priority Queues
4. Closest pair
5. Multiplication: Matrix Multiplication; Integer multiplication
6. FFT: Fourier Transform; DFT; FFT Algorithm; Fast multiplication
Binary Search
Problem: Find a key k in an ordered collection.
Examples:
Sorted array A[0..n−1]: Compare k with A[n/2], then recursively search A[0..(n/2 − 1)] (if k < A[n/2]) or A[n/2..n − 1] (otherwise).
Binary search tree T: Compare k with root(T); based on the result, recursively search the left or right subtree of the root.
B-Tree: A hybrid of the above two. The root stores an array M of m keys and has m + 1 children. Use binary search on M to identify which child can contain k, then recursively search that subtree.
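The sorted-array version above can be sketched iteratively in Python; the function name and interface are illustrative, not from the slides:

```python
def binary_search(a, k):
    """Search for key k in sorted list a; return its index, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2        # compare against the middle element
        if a[mid] == k:
            return mid
        elif k < a[mid]:
            hi = mid - 1            # recurse (iteratively) on the left half
        else:
            lo = mid + 1            # recurse on the right half
    return -1
```

Each iteration halves the search range, matching the T(n) = T(n/2) + O(1) recurrence.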
H-tree: Planar embedding of full binary tree
Key properties
fractal geometry, reflecting the divide-and-conquer structure
n nodes in O(n) area
all root-to-leaf paths equal in length

Applications
compact embedding of binary trees in VLSI
hardware clock distribution
Divide-and-conquer Construction of H-tree
MkHtree(l, b, r, t, n)
  horizLine((b + t)/2, l + (l + r)/4, r − (l + r)/4)
  vertLine(l + (l + r)/4, b + (b + t)/4, t − (b + t)/4)
  vertLine(r − (l + r)/4, b + (b + t)/4, t − (b + t)/4)
  if n ≤ 4 return
  MkHtree(l, (b + t)/2, (l + r)/2, t, n/4)
  MkHtree((l + r)/2, (b + t)/2, r, t, n/4)
  MkHtree(l, b, (l + r)/2, (b + t)/2, n/4)
  MkHtree((l + r)/2, b, r, (b + t)/2, n/4)
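A Python sketch of the construction that collects line segments instead of drawing them. It interprets the slide's quarter-point expressions as the quarter positions of the bounding box (for the unit box the two readings coincide); the segment-list interface is an assumption for testability:

```python
def mk_htree(l, b, r, t, n, segs):
    """Collect the segments of an H-tree for n nodes inside [l, r] x [b, t].

    Each call draws one 'H' (a horizontal bar and two vertical bars),
    then recurses into the four quadrants until n <= 4.
    """
    xm, ym = (l + r) / 2, (b + t) / 2        # center of the box
    x1, x2 = (3 * l + r) / 4, (l + 3 * r) / 4  # x-positions of the vertical bars
    y1, y2 = (3 * b + t) / 4, (b + 3 * t) / 4  # endpoints of the vertical bars
    segs.append(((x1, ym), (x2, ym)))        # horizontal bar of the H
    segs.append(((x1, y1), (x1, y2)))        # left vertical bar
    segs.append(((x2, y1), (x2, y2)))        # right vertical bar
    if n <= 4:
        return
    mk_htree(l, ym, xm, t, n // 4, segs)     # top-left quadrant
    mk_htree(xm, ym, r, t, n // 4, segs)     # top-right quadrant
    mk_htree(l, b, xm, ym, n // 4, segs)     # bottom-left quadrant
    mk_htree(xm, b, r, ym, n // 4, segs)     # bottom-right quadrant
```

For n = 16 the top-level H plus four base-case H's yields 15 segments in total.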
Divide-and-conquer Construction of H-tree
Questions
How compact is the embedding? (Ratio of the minimum distance between nodes to the average area per node.)
What is the root-to-leaf path length? Can we do better?
Finally, how can we show that the algorithm is correct?
MkHtree(l, b, r, t, n)
  horizLine((b + t)/2, l + (l + r)/4, r − (l + r)/4)
  vertLine(l + (l + r)/4, b + (b + t)/4, t − (b + t)/4)
  vertLine(r − (l + r)/4, b + (b + t)/4, t − (b + t)/4)
  if n ≤ 4 return
  MkHtree(l, (b + t)/2, (l + r)/2, t, n/4)
  MkHtree((l + r)/2, (b + t)/2, r, t, n/4)
  MkHtree(l, b, (l + r)/2, (b + t)/2, n/4)
  MkHtree((l + r)/2, b, r, (b + t)/2, n/4)
Exponentiation
How many multiplications are required to compute xn?
Can we use a divide-and-conquer approach to make it faster?
ExpBySquaring(n, x)
  if n > 1
    y = ExpBySquaring(⌊n/2⌋, x²)
    if odd(n) y = x ∗ y
    return y
  else return x
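A direct Python transcription of ExpBySquaring (for integer n ≥ 1); it uses O(log n) multiplications:

```python
def exp_by_squaring(n, x):
    """Compute x**n (n >= 1) using repeated squaring."""
    if n <= 1:
        return x
    y = exp_by_squaring(n // 2, x * x)  # solve the half-size problem on x^2
    if n % 2 == 1:                      # odd n needs one extra factor of x
        y = x * y
    return y
```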
Merge Sort
Binary search
The ultimate divide-and-conquer algorithm is, of course, binary search: to find a key k in a large file containing keys z[0, 1, . . . , n − 1] in sorted order, we first compare k with z[n/2], and depending on the result we recurse either on the first half of the file, z[0, . . . , n/2 − 1], or on the second half, z[n/2, . . . , n − 1]. The recurrence now is T(n) = T(n/2) + O(1), which is the case a = 1, b = 2, d = 0. Plugging into our master theorem we get the familiar solution: a running time of just O(log n).

2.3 Mergesort
The problem of sorting a list of numbers lends itself immediately to a divide-and-conquer strategy: split the list into two halves, recursively sort each half, and then merge the two sorted sublists.

function mergesort(a[1 . . . n])
Input: An array of numbers a[1 . . . n]
Output: A sorted version of this array
  if n > 1:
    return merge(mergesort(a[1 . . . n/2]), mergesort(a[n/2 + 1 . . . n]))
  else:
    return a

The correctness of this algorithm is self-evident, as long as a correct merge subroutine is specified. If we are given two sorted arrays x[1 . . . k] and y[1 . . . l], how do we efficiently merge them into a single sorted array z[1 . . . k + l]? Well, the very first element of z is either x[1] or y[1], whichever is smaller. The rest of z[·] can then be constructed recursively.

function merge(x[1 . . . k], y[1 . . . l])
  if k = 0: return y[1 . . . l]
  if l = 0: return x[1 . . . k]
  if x[1] ≤ y[1]:
    return x[1] ◦ merge(x[2 . . . k], y[1 . . . l])
  else:
    return y[1] ◦ merge(x[1 . . . k], y[2 . . . l])

Here ◦ denotes concatenation. This merge procedure does a constant amount of work per recursive call (provided the required array space is allocated in advance), for a total running time of O(k + l). Thus merges are linear, and the overall time taken by mergesort is

T(n) = 2T(n/2) + O(n),
or O(n log n).
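The mergesort and merge routines above can be sketched in Python, using lists and slicing in place of explicit index ranges:

```python
def merge(x, y):
    """Merge two sorted lists into one sorted list in O(len(x) + len(y))."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            out.append(x[i]); i += 1
        else:
            out.append(y[j]); j += 1
    return out + x[i:] + y[j:]      # at least one of the tails is empty


def mergesort(a):
    """Sort list a: split in half, sort each half, merge the results."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))
```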
Looking back at the mergesort algorithm, we see that all the real work is done in merging, which doesn't start until the recursion gets down to singleton arrays. The singletons are
Merge Sort Illustration
Figure 2.4 The sequence of merge operations in mergesort.
[figure: the input 10 2 5 3 7 13 1 6 is merged pairwise into sorted runs of length 2, then 4, and finally the fully sorted array 1 2 3 5 6 7 10 13.]
merged in pairs, to yield arrays with two elements. Then pairs of these 2-tuples are merged, producing 4-tuples, and so on. Figure 2.4 shows an example.

This viewpoint also suggests how mergesort might be made iterative. At any given moment, there is a set of "active" arrays (initially, the singletons) which are merged in pairs to give the next batch of active arrays. These arrays can be organized in a queue, and processed by repeatedly removing two arrays from the front of the queue, merging them, and putting the result at the end of the queue.

In the following pseudocode, the primitive operation inject adds an element to the end of the queue while eject removes and returns the element at the front of the queue.

function iterative-mergesort(a[1 . . . n])
Input: elements a1, a2, . . . , an to be sorted
  Q = [ ] (empty queue)
  for i = 1 to n:
    inject(Q, [ai])
  while |Q| > 1:
    inject(Q, merge(eject(Q), eject(Q)))
  return eject(Q)
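A Python sketch of the queue-based iterative mergesort, with `collections.deque` standing in for the queue and a helper `merge_runs` in place of the merge primitive:

```python
from collections import deque


def merge_runs(x, y):
    """Standard linear-time merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            out.append(x[i]); i += 1
        else:
            out.append(y[j]); j += 1
    return out + x[i:] + y[j:]


def iterative_mergesort(a):
    """Bottom-up mergesort driven by a queue of 'active' sorted runs."""
    if not a:
        return []
    q = deque([x] for x in a)       # inject each element as a singleton run
    while len(q) > 1:
        # eject two runs from the front, merge, inject at the back
        q.append(merge_runs(q.popleft(), q.popleft()))
    return q.popleft()
```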
Merge sort time complexity
mergesort(A) makes two recursive invocations of itself, each with
an array half the size of A
merge(A, B) takes time that is linear in |A|+ |B|
Thus, the runtime is given by the recurrence
T(n) = 2T(n/2) + n
In divide-and-conquer algorithms, we often encounter recurrences
of the form
T(n) = aT(n/b) + O(n^d)

Can we solve them once and for all?
Master Theorem
If T(n) = aT(n/b) + O(n^d) for constants a > 0, b > 1, and d ≥ 0, then

T(n) = O(n^d)          if d > log_b a
T(n) = O(n^d log n)    if d = log_b a
T(n) = O(n^{log_b a})  if d < log_b a
Proof of Master Theorem

[Figure 2.3: the recursion tree. Each problem of size n is divided into a subproblems of size n/b; the branching factor is a, the depth is log_b n, and the width at the leaves is a^{log_b n} = n^{log_b a}.]

At depth k of the tree there are a^k subproblems, each of size n/b^k (Figure 2.3). The total work done at this level is

a^k × O((n/b^k)^d) = O(n^d) × (a/b^d)^k.

As k goes from 0 (the root) to log_b n (the leaves), these numbers form a geometric series with ratio a/b^d. Finding the sum of such a series in big-O notation is easy (Exercise 0.2), and comes down to three cases.

1. The ratio is less than 1. Then the series is decreasing, and its sum is just given by its first term, O(n^d).

2. The ratio is greater than 1. The series is increasing and its sum is given by its last term, O(n^{log_b a}):

n^d (a/b^d)^{log_b n} = n^d · a^{log_b n} / (b^{log_b n})^d = a^{log_b n} = a^{(log_a n)(log_b a)} = n^{log_b a}.

3. The ratio is exactly 1. In this case all O(log n) terms of the series are equal to O(n^d).

These cases translate directly into the three contingencies in the theorem statement.

The theorem can be proved by induction, or by summing up the series where each term represents the work done at one level of this tree.
What if the Master Theorem can't be applied?

Guess and check (prove by induction):
  expand the recursion for a few steps to make a guess;
  in principle, this can be applied to any recurrence.
Akra-Bazzi method (not covered in class):
  handles recurrences that can be much more complex than those of the Master theorem.
More on time complexity: Fibonacci Numbers
function fib1(int n)
if n = 0 return 0;
if n = 1 return 1;
return fib1(n− 1) + fib1(n− 2)
Is this algorithm correct? Yes: it follows the definition of the Fibonacci numbers.
What is its runtime?
T(n) = T(n − 1) + T(n − 2) + 3, with T(k) ≤ 2 for k < 2.
The solution is an exponential function of n. (Prove this by induction!)
Can we do better?
Structure of calls to fib1

[Figure 0.1: The proliferation of recursive calls in fib1. The call tree rooted at F_n branches into F_{n−1} and F_{n−2}, which branch in turn into F_{n−2}, F_{n−3}, F_{n−4}, and so on.]
As with fib1, the correctness of this algorithm is self-evident because it directly uses the definition of Fn. How long does it take? The inner loop consists of a single computer step and is executed n − 1 times. Therefore the number of computer steps used by fib2 is linear in n. From exponential we are down to polynomial, a huge breakthrough in running time. It is now perfectly reasonable to compute F200 or even F200,000. As we will see repeatedly throughout this book, the right algorithm makes all the difference.

More careful analysis
In our discussion so far, we have been counting the number of basic computer steps executed by each algorithm and thinking of these basic steps as taking a constant amount of time. This is a very useful simplification. After all, a processor's instruction set has a variety of basic primitives (branching, storing to memory, comparing numbers, simple arithmetic, and so on), and rather than distinguishing between these elementary operations, it is far more convenient to lump them together into one category.

But looking back at our treatment of Fibonacci algorithms, we have been too liberal with what we consider a basic step. It is reasonable to treat addition as a single computer step if small numbers are being added, 32-bit numbers say. But the nth Fibonacci number is about 0.694n bits long, and this can far exceed 32 as n grows. Arithmetic operations on arbitrarily large numbers cannot possibly be performed in a single, constant-time step. We need to audit our earlier running time estimates and make them more honest. We will see in Chapter 1 that the addition of two n-bit numbers takes time roughly proportional to n; this is not too hard to understand if you think back to the grade-school procedure.

(Footnote: To better appreciate the importance of this dichotomy between exponential and polynomial algorithms, the reader may want to peek ahead to the story of Sissa and Moore, in Chapter 8.)
The call tree is essentially a complete binary tree of depth n, containing about 2^n calls to fib1.
But there are only n distinct Fibonacci numbers being computed!
Each Fibonacci number is computed an exponential number of times!
Improved Algorithm for Fibonacci
function fib2(n)
int f [max(2, n + 1)];
f [0] = 0; f [1] = 1;
for (i = 2; i ≤ n; i++)
f [i] = f [i − 1] + f [i − 2];
return f [n]
Linear-time algorithm!
But wait! We are operating on very large numbers: the nth Fibonacci number requires approx. 0.694n bits. (Prove this by induction!)
Operations on k-bit numbers require O(k) time;
i.e., computing Fn requires about n × 0.694n = O(n²) bit operations.
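A Python sketch of fib2; Python's built-in big integers handle the large values transparently, though each addition still costs time proportional to the operands' bit lengths:

```python
def fib2(n):
    """Iterative Fibonacci: n - 1 big-integer additions, no recomputation."""
    if n == 0:
        return 0
    f_prev, f_cur = 0, 1            # F(0), F(1)
    for _ in range(2, n + 1):
        f_prev, f_cur = f_cur, f_prev + f_cur
    return f_cur
```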
Quicksort
qs(A, l, h)  /* sorts A[l . . . h] */
  if l >= h return
  (h1, l2) = partition(A, l, h)
  qs(A, l, h1)
  qs(A, l2, h)

partition(A, l, h)
  k = selectPivot(A, l, h); p = A[k]
  swap(A, h, k)  /* move the pivot to the end */
  i = l − 1; j = h
  while true do
    do i++ while A[i] < p
    do j−− while A[j] > p
    if i ≥ j break
    swap(A, i, j)
  swap(A, i, h)  /* put the pivot in its final place */
  return (j, i + 1)
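An illustrative in-place quicksort in Python. For brevity it uses last-element pivoting with Lomuto partitioning, a simpler scheme than the two-pointer partition shown above:

```python
def quicksort(a, l=0, h=None):
    """Sort a[l..h] in place (Lomuto partition, last-element pivot)."""
    if h is None:
        h = len(a) - 1
    if l >= h:
        return
    p = a[h]                        # pivot = last element
    i = l
    for j in range(l, h):           # move elements smaller than p forward
        if a[j] < p:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[h] = a[h], a[i]         # place the pivot at its final position
    quicksort(a, l, i - 1)          # recurse on the two sides of the pivot
    quicksort(a, i + 1, h)
```

A fixed last-element pivot exhibits the worst-case behavior described below on already-sorted input.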
Analysis of Runtime of qs
General case: given by the recurrence T(n) = n + T(n1) + T(n2), where n1 and n2 are the sizes of the two sub-arrays after partition.

Best case: n1 = n2 = n/2. By the master theorem, T(n) = O(n log n).

Worst case: n1 = 1, n2 = n − 1, i.e., T(n) = n + T(n − 1), which solves to T(n) = O(n²).
A fixed choice of pivot index, say h, leads to worst-case behavior on common inputs, e.g., when the input is already sorted.

Lucky/unlucky split: alternate between worst- and best-case splits:
T(n) = n + T(1) + T(n − 1)                  (unlucky split)
     = n + 1 + (n − 1) + 2T((n − 1)/2)      (lucky split at the next level)
     = 2n + 2T((n − 1)/2),
which has an O(n log n) solution.

Three-fourths split: T(n) = n + T(0.25n) + T(0.75n) = O(n log n), since each level of the recursion tree does at most n work and the depth is log_{4/3} n.
Average case analysis of qs
Define input distribution: all permutations equally likely.

Simplifying assumption: all elements are distinct. (Nonessential.)

Set up the recurrence: when all permutations are equally likely, the selected pivot has an equal chance of ending up at the ith position in the sorted order, for each 1 ≤ i ≤ n. Thus we have the following recurrence for the average case:

T(n) = n + (1/n) Σ_{i=1}^{n−1} (T(i) + T(n − i))

Solve recurrence: we cannot apply the master theorem, but since we seem to get an O(n log n) bound even in seemingly bad cases, we can try to establish a c·n log n bound by induction.
Establishing average case of qs
Establish base case. (Trivial.)

The induction step involves a summation of the form Σ_{i=1}^{n−1} i log i.

Attempt 1: bound log i above by log n. (Induction fails.)
Attempt 2: split the sum into two parts,
  Σ_{i=1}^{n/2} i log i + Σ_{i=n/2+1}^{n−1} i log i,
and apply the approximation to each half. (Succeeds with c ≥ 4.)
Attempt 3: replace the summation with the upper bound
  ∫_{x=1}^{n} x log x dx = [ (x²/2)(log x − 1/2) ]_{x=1}^{n}.
(Succeeds with the constraint c ≥ 2.)
Randomized Quicksort
Picks a pivot at random
What is its complexity?
For randomized algorithms, we talk about expected complexity, which
is an average over all possible values of the random variable.
If pivot index is picked uniformly at random over the interval [l, h],
then:
every array element is equally likely to be selected as the pivot
every partition is equally likely
thus, expected complexity of randomized quicksort is given by the
same recurrence as the average case of qs.
Lower bounds for comparison-based sorting
Sorting algorithms can be depicted as trees: each leaf identifies the input permutation that yields a sorted order.

An n log n lower bound for sorting
Sorting algorithms can be depicted as trees. The one in the following figure sorts an array of three elements, a1, a2, a3. It starts by comparing a1 to a2 and, if the first is larger, compares it with a3; otherwise it compares a2 and a3. And so on. Eventually we end up at a leaf, and this leaf is labeled with the true order of the three elements as a permutation of 1, 2, 3. For example, if a2 < a1 < a3, we get the leaf labeled "2 1 3."
[figure: the comparison tree for sorting three elements. Internal nodes ask "a1 < a2?", "a2 < a3?", "a1 < a3?"; the six leaves are labeled with the permutations 1 2 3, 1 3 2, 2 1 3, 2 3 1, 3 1 2, 3 2 1.]
The depth of the tree, i.e., the number of comparisons on the longest path from root to leaf (in this case 3), is exactly the worst-case time complexity of the algorithm. This way of looking at sorting algorithms is useful because it allows one to argue that mergesort is optimal, in the sense that Ω(n log n) comparisons are necessary for sorting n elements.

Here is the argument: Consider any such tree that sorts an array of n elements. Each of its leaves is labeled by a permutation of 1, 2, . . . , n. In fact, every permutation must appear as the label of a leaf. The reason is simple: if a particular permutation is missing, what happens if we feed the algorithm an input ordered according to this same permutation? And since there are n! permutations of n elements, it follows that the tree has at least n! leaves.

We are almost done: This is a binary tree, and we argued that it has at least n! leaves. Recall now that a binary tree of depth d has at most 2^d leaves (proof: an easy induction on d). So, the depth of our tree, and hence the complexity of our algorithm, must be at least log(n!). And it is well known that log(n!) ≥ c · n log n for some c > 0. There are many ways to see this. The easiest is to notice that n! ≥ (n/2)^{n/2}, because n! = 1 · 2 · · · n contains at least n/2 factors larger than n/2; and to then take logs of both sides. Another is to recall Stirling's formula
n! ≈ √(π(2n + 1/3)) · n^n · e^{−n}.
Either way, we have established that any comparison tree that sorts n elements must make, in the worst case, Ω(n log n) comparisons, and hence mergesort is optimal!

Well, there is some fine print: this neat argument applies only to algorithms that use comparisons. Is it conceivable that there are alternative sorting strategies, perhaps using sophisticated numerical manipulations, that work in linear time? The answer is yes, under certain exceptional circumstances: the canonical such example is when the elements to be sorted are integers that lie in a small range (Exercise 2.20).
The tree has n! leaves, and hence a height of log(n!). By Stirling's approximation, n! ≈ √(2πn) (n/e)^n, so log(n!) = Θ(n log n).

No comparison-based sorting algorithm can do better!
Bucket sort
Overview
Divide: partition the input into intervals (buckets), based on key values;
  a linear scan of the input drops each element into the appropriate bucket.
Recurse: sort each bucket.
Combine: concatenate the bucket contents.

Example

Images from Wikipedia Commons
Bucket sort (Continued)
Bucket sort generalizes quicksort to multiple partitions.
Combination = concatenation.
The worst-case quadratic bound still applies.
But performance can be much better if the input distribution is uniform.
Exercise: what is the runtime in this case?
Used by letter-sorting machines in post offices.
Counting Sort
Special case of bucket sort where each bin corresponds to an
interval of size 1.
No need to recurse. Divide = conquered!
Makes sense only if range of key values is small (usually constant)
Thus, counting sort can be done in O(n) time!
Hmm. How did we beat the O(n log n) lower bound?
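A minimal counting sort sketch in Python, assuming integer keys in a known range [0, key_range); the parameter names are illustrative:

```python
def counting_sort(a, key_range):
    """Sort integers in [0, key_range) in O(n + key_range) time.

    One bucket per key value; divide = conquered, no recursion needed.
    """
    counts = [0] * key_range
    for x in a:                     # linear scan: tally each key
        counts[x] += 1
    out = []
    for v in range(key_range):      # concatenate the (implicit) buckets
        out.extend([v] * counts[v])
    return out
```

There is no contradiction with the lower bound: counting sort never compares keys, so the comparison-tree argument does not apply to it.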
Radix Sorting
Treat an integer as a sequence of digits.
Sort digits using counting sort.

LSD sorting: sort on the least significant digit first, and the most significant digit last. After each round of counting sort, the results can simply be concatenated and given as input to the next stage.

MSD sorting: sort on the most significant digit first, and the least significant digit last. Unlike LSD sorting, we cannot concatenate after each stage.

Note: radix sort does not divide the input into smaller subsets.
If you think of the input as multi-dimensional data, then we break the problem down by dimension.
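An LSD radix sort sketch in Python for non-negative integers; each pass is a stable bucket pass on one digit, and the bucket contents are concatenated between passes as described above:

```python
def lsd_radix_sort(a, base=10):
    """Sort non-negative integers digit by digit, least significant first."""
    if not a:
        return a
    ndigits = len(str(max(a)))           # passes needed for the longest key
    place = 1
    for _ in range(ndigits):
        buckets = [[] for _ in range(base)]
        for x in a:                      # stable: preserves order within a bucket
            buckets[(x // place) % base].append(x)
        a = [x for b in buckets for x in b]  # concatenate the buckets
        place *= base
    return a
```

Stability of each pass is exactly what makes the later (more significant) passes preserve the earlier ordering.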
Stable sorting algorithms
Stable sorting algorithms don't change the order of equal elements.
Merge sort and LSD sort are stable. Quicksort is not stable.

Why is stability important?
The effect of stably sorting on attribute A and then on B is the same as sorting on the compound key ⟨B, A⟩.
LSD sort won't work without this property!
Other examples: sorting spreadsheets, or tables on web pages.
Images from Wikipedia Commons
Sorting strings
Can use LSD or MSD sorting.
Easy if all strings are of the same length; variable-length strings require a bit more care.

Starting point: use a special terminator character t with t < a for every valid character a.

It is easy to devise an O(nl) algorithm, where n is the number of strings and l is the maximum length of any string. But such an algorithm is not linear in the input size.

Exercise: devise a linear-time string sorting algorithm. Given a set S of strings, your algorithm should sort in O(|S|) time, where |S| = Σ_{s∈S} |s|.
Select kth smallest element

Obvious approach: sort, pick the kth element. Wasteful: O(n log n).
Better approach: recursive partitioning, searching only on one side.

qsel(A, l, h, k)
  if l = h return A[l]
  (h1, l2) = partition(A, l, h)
  if k ≤ h1 return qsel(A, l, h1, k)
  else return qsel(A, l2, h, k)
Complexity
Best case: splits are even: T(n) = n + T(n/2), which has an O(n) solution.
Skewed 10%/90% split: T(n) ≤ n + T(0.9n), still linear.
Worst case: T(n) = n + T(n − 1), quadratic!
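A Python sketch of quickselect with a random pivot; for clarity it partitions into three lists rather than in place:

```python
import random


def qsel(a, k):
    """Return the k-th smallest element of a (k is 1-based).

    A uniformly random pivot gives expected linear time.
    """
    assert 1 <= k <= len(a)
    p = random.choice(a)
    lo = [x for x in a if x < p]    # elements smaller than the pivot
    eq = [x for x in a if x == p]   # the pivot (and duplicates)
    hi = [x for x in a if x > p]    # elements larger than the pivot
    if k <= len(lo):
        return qsel(lo, k)          # answer is on the small side
    if k <= len(lo) + len(eq):
        return p                    # the pivot itself is the answer
    return qsel(hi, k - len(lo) - len(eq))
```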
Worst-case O(n) Selection
Intuition: Spend a bit more time to select a pivot that ensures
reasonably balanced partitions
MoM Algorithm [Blum, Floyd, Pratt, Rivest and Tarjan 1973]
O(n) Selection: MoM Algorithm
Quick select (qsel) takes no time to pick a pivot, but then spends O(n) to partition.
Can we spend more time upfront to make a better selection of the pivot, so that we can avoid highly skewed splits?

Key Idea
Use the selection algorithm itself to choose the pivot:
  Divide the input into sets of 5 elements.
  Compute the median of each set (O(5), i.e., constant time per set).
  Use selection recursively on these n/5 medians to pick their median, i.e., choose the median of medians (MoM) as the pivot.
  Partition using MoM, and recurse to find the kth smallest element.
O(n) Selection: MoM Algorithm
Theorem: MoM-based split won’t be worse than 30%/70%
Result: Guaranteed linear-time algorithm!
Caveat: The constant factor is non-negligible; use as fall-back if
random selection repeatedly yields unbalanced splits.
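The key idea above can be sketched in Python (k-th smallest, 1-based); the group size of 5 follows the slide, and the three-way partition is an illustrative choice:

```python
def mom_select(a, k):
    """k-th smallest element of a, with the median-of-medians pivot rule.

    The MoM pivot guarantees a bounded split, hence O(n) worst case.
    """
    if len(a) <= 5:
        return sorted(a)[k - 1]
    # median of each group of (up to) 5 elements
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    # use selection recursively to find the median of the medians
    pivot = mom_select(medians, (len(medians) + 1) // 2)
    lo = [x for x in a if x < pivot]
    eq = [x for x in a if x == pivot]
    hi = [x for x in a if x > pivot]
    if k <= len(lo):
        return mom_select(lo, k)
    if k <= len(lo) + len(eq):
        return pivot
    return mom_select(hi, k - len(lo) - len(eq))
```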
Selecting maximum element: Priority Queues
Heap
A tree-based data structure for priority queues.

Heap property: H is a heap if for every subtree h of H,
  ∀k ∈ keys(h): root(h) ≥ k,
where keys(h) includes all keys appearing within h.

Note: no ordering among siblings or cousins.

Supports insert, deleteMax and max. Typically implemented using arrays, i.e., without an explicit tree data structure.

The task of maintaining the max is distributed to subsets of the entire set; alternatively, it can be thought of as maintaining several parallel queues with a single head.
Images from Wikimedia Commons
Binary heap
Array representation: store heap elements in breadth-first order in the array. Node i's children are at indices 2i and 2i + 1.
Conceptually, we are dealing with a balanced binary tree.

Max: the element at the root of the array; extracted in O(1) time.

DeleteMax: overwrite the root with the last element of the heap, then fix up the heap. Takes O(log n) time, since only a single root-to-leaf path needs to be fixed up.

Insert: append the element to the end of the array, then fix up the heap along the path to the root.

MkHeap: fix up the entire heap. Takes O(n) time.

Heapsort: an O(n log n) algorithm: MkHeap followed by n calls to DeleteMax.
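A minimal array-backed max-heap in Python. Note it uses 0-based indexing (children of i at 2i + 1 and 2i + 2) instead of the slide's 1-based scheme:

```python
class MaxHeap:
    """Binary max-heap stored in a flat list, breadth-first order."""

    def __init__(self):
        self.a = []

    def insert(self, x):
        """Append x, then bubble it up toward the root: O(log n)."""
        self.a.append(x)
        i = len(self.a) - 1
        while i > 0 and self.a[(i - 1) // 2] < self.a[i]:
            self.a[(i - 1) // 2], self.a[i] = self.a[i], self.a[(i - 1) // 2]
            i = (i - 1) // 2

    def delete_max(self):
        """Move the last element to the root, then sift down: O(log n)."""
        a = self.a
        top = a[0]
        last = a.pop()
        if a:
            a[0] = last
            i = 0
            while True:
                c = 2 * i + 1
                if c >= len(a):
                    break
                if c + 1 < len(a) and a[c + 1] > a[c]:
                    c += 1              # pick the larger child
                if a[i] >= a[c]:
                    break               # heap property restored
                a[i], a[c] = a[c], a[i]
                i = c
        return top
```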
Finding closest pair of points
Problem: Given a set of n points in a d-dimensional space, identify the two that have the smallest Euclidean distance between them.

Applications: A central problem in graphics, vision, air-traffic control, navigation, molecular modeling, and so on.
Images from Wikipedia Commons
Divide-and-conquer closest pair (2D)
Divide: Identify k such that the line x = k divides the points evenly. (Median computation; takes O(n) time.)

Recursive case: Find the closest pair in each half.

Combine:
  Can't just take the min of the closest pairs from the two halves.
  We also need to consider pairs that straddle the dividing line, and it seems this will take O(n²) time!
Speeding up search for cross-region pairs
Observation (Key Observation 1)
Let δ1 and δ2 be the minimum distances within each half. We need only consider points within δ = min(δ1, δ2) of the dividing line.

We expect that only a small number of points will fall within such a narrow strip.
But in the worst case, every point could be within the strip!
Sparsity condition
Consider a point p in the left δ-strip. How many points q1, ..., qr in the right δ-strip could be within δ of p?

Observation (Key Observation 2)
q1, ..., qr must all lie within a 2δ × δ rectangle, as shown.
r can't be too large: otherwise q1, ..., qr would crowd together, closer than δ apart.

Theorem: r ≤ 6.

We need to consider at most 6n cross-region pairs!
The count remains O(1) per point, and hence O(n) overall, in higher dimensions as well.
Closest pair: Summary
Recurrence: T (n) = 2T (n/2) + Ω(n), since median computation is
already linear-time. Thus, T (n) = Ω(n log n).
To get to O(n log n), we need to:
1. Compute the δ-strip in O(n) time:
   keep the points in each region sorted by x-coordinate;
   this takes an additional O(n log n) time, with no change to the overall complexity.
2. Compute q1, ..., q6 in O(1) time:
   keep the points in each region sorted also by y-coordinate;
   maintain this order while deleting points outside the δ-strip;
   in this list, for each p, consider only 12 neighbors, 6 on each side of the divide.
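A Python sketch of the divide-and-conquer closest pair. For simplicity it re-sorts the strip by y at every level, giving O(n log² n); the O(n log n) version maintains the y-order through the recursion as described above:

```python
import math


def closest_pair(pts):
    """Closest pair of 2D points; pts is a list of (x, y) tuples."""
    pts = sorted(pts)                       # sort once by x-coordinate

    def solve(p):                           # p is sorted by x
        if len(p) <= 3:                     # brute force at the base
            best = (float("inf"), None)
            for i in range(len(p)):
                for j in range(i + 1, len(p)):
                    best = min(best, (math.dist(p[i], p[j]), (p[i], p[j])))
            return best
        mid = len(p) // 2
        xm = p[mid][0]                      # dividing line x = xm
        best = min(solve(p[:mid]), solve(p[mid:]))
        d = best[0]
        # points within d of the dividing line, ordered by y
        strip = sorted((q for q in p if abs(q[0] - xm) < d),
                       key=lambda q: q[1])
        for i, q in enumerate(strip):       # only O(1) neighbors per point
            for r in strip[i + 1:i + 8]:
                best = min(best, (math.dist(q, r), (q, r)))
        return best

    return solve(pts)
```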
Matrix Multiplication
The product Z of two n × n matrices X and Y is given by
  Z_ij = Σ_{k=1}^{n} X_ik Y_kj,
which leads to an O(n³) algorithm.
2.5 Matrix multiplication
The product of two n × n matrices X and Y is a third n × n matrix Z = XY, with (i, j)th entry

Z_ij = Σ_{k=1}^{n} X_ik Y_kj.

To make it more visual, Z_ij is the dot product of the ith row of X with the jth column of Y. In general, XY is not the same as YX; matrix multiplication is not commutative.

The preceding formula implies an O(n³) algorithm for matrix multiplication: there are n² entries to be computed, and each takes O(n) time. For quite a while, this was widely believed
42 / 81
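The formula above translates directly into three nested loops; a minimal sketch (not the slides' code):

```python
# Naive O(n^3) matrix multiplication: Z[i][j] is the dot product of
# row i of X with column j of Y.
def mat_mul(X, Y):
    n = len(X)
    Z = [[0] * n for _ in range(n)]
    for i in range(n):          # n^2 entries to compute ...
        for j in range(n):
            for k in range(n):  # ... each taking O(n) time
                Z[i][j] += X[i][k] * Y[k][j]
    return Z
```

For example, `mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` yields `[[19, 22], [43, 50]]`.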
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Divide-and-conquer Matrix Multiplication
Divide X and Y into four n/2 × n/2 submatrices:

    X = [ A  B ]      Y = [ E  F ]
        [ C  D ]          [ G  H ]

Recursively invoke matrix multiplication on these submatrices:

    XY = [ A  B ] [ E  F ]  =  [ AE + BG   AF + BH ]
         [ C  D ] [ G  H ]     [ CE + DG   CF + DH ]

Divided, but did not conquer! T(n) = 8T(n/2) + O(n^2), which is still O(n^3)
43 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Strassen’s Matrix Multiplication
Strassen showed that 7 multiplications are enough:

    XY = [ P6 + P5 + P4 − P2    P1 + P2           ]
         [ P3 + P4              P1 − P3 + P5 − P7 ]

where

    P1 = A(F − H)       P5 = (A + D)(E + H)
    P2 = (A + B)H       P6 = (B − D)(G + H)
    P3 = (C + D)E       P7 = (A − C)(E + F)
    P4 = D(G − E)

Now the recurrence T(n) = 7T(n/2) + O(n^2) has an O(n^(log2 7)) = O(n^2.81) solution!
Best-to-date complexity is about O(n^2.4), but this algorithm is not very practical.
44 / 81
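The seven products above can be checked with a compact recursive sketch. This is a minimal illustration assuming n is a power of 2; the helper names `add`, `sub`, and `strassen` are mine, not the slides':

```python
# Strassen's matrix multiplication: 7 recursive products instead of 8.
def add(X, Y): return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]
def sub(X, Y): return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def strassen(X, Y):
    n = len(X)
    if n == 1:
        return [[X[0][0] * Y[0][0]]]
    h = n // 2
    # Split both matrices into quadrants, as on the previous slide
    A = [r[:h] for r in X[:h]]; B = [r[h:] for r in X[:h]]
    C = [r[:h] for r in X[h:]]; D = [r[h:] for r in X[h:]]
    E = [r[:h] for r in Y[:h]]; F = [r[h:] for r in Y[:h]]
    G = [r[:h] for r in Y[h:]]; H = [r[h:] for r in Y[h:]]
    P1 = strassen(A, sub(F, H))
    P2 = strassen(add(A, B), H)
    P3 = strassen(add(C, D), E)
    P4 = strassen(D, sub(G, E))
    P5 = strassen(add(A, D), add(E, H))
    P6 = strassen(sub(B, D), add(G, H))
    P7 = strassen(sub(A, C), add(E, F))
    # Reassemble: [P6+P5+P4-P2  P1+P2; P3+P4  P1-P3+P5-P7]
    top = [u + v for u, v in zip(add(sub(add(P5, P4), P2), P6), add(P1, P2))]
    bot = [u + v for u, v in zip(add(P3, P4), sub(sub(add(P1, P5), P3), P7))]
    return top + bot
```

A real implementation would switch to the naive algorithm below some cutoff size; the recursion to 1 × 1 blocks here is only for clarity.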
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Karatsuba’s Algorithm
Same high-level strategy as Strassen, but it predates Strassen.
Divide: split the n-digit numbers into halves, each with n/2 digits:

    a = a1 a0 = 2^(n/2) a1 + a0
    b = b1 b0 = 2^(n/2) b1 + b0
    ab = 2^n a1 b1 + 2^(n/2) (a1 b0 + b1 a0) + a0 b0

Key point: instead of 4 multiplications, we can get by with 3, since

    a1 b0 + b1 a0 = (a1 + a0)(b1 + b0) − a1 b1 − a0 b0

Recursively compute a1 b1, a0 b0 and (a1 + a0)(b1 + b0).
The recurrence T(n) = 3T(n/2) + O(n) has an O(n^(log2 3)) = O(n^1.59) solution!
Note: this trick for using 3 (not 4) multiplications was noted by Gauss (1777-1855) in the context of multiplying complex numbers.
45 / 81
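The split and the three-multiplication trick can be sketched as follows. This is a minimal version working on Python integers, splitting on bits rather than decimal digits; the cutoff of 16 is an arbitrary choice of mine:

```python
# Karatsuba multiplication of nonnegative integers: 3 recursive
# multiplications (Gauss's trick) instead of 4.
def karatsuba(a, b):
    if a < 16 or b < 16:                   # small base case: multiply directly
        return a * b
    m = max(a.bit_length(), b.bit_length()) // 2
    a1, a0 = a >> m, a & ((1 << m) - 1)    # a = 2^m * a1 + a0
    b1, b0 = b >> m, b & ((1 << m) - 1)    # b = 2^m * b1 + b0
    hi = karatsuba(a1, b1)
    lo = karatsuba(a0, b0)
    mid = karatsuba(a1 + a0, b1 + b0) - hi - lo   # = a1*b0 + b1*a0
    return (hi << (2 * m)) + (mid << m) + lo
```

For example, `karatsuba(123456789, 987654321)` agrees with the built-in product.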
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Toom-Cook Multiplication
Generalize Karatsuba: divide each number into N > 2 parts.
It can be more easily understood when integer multiplication is viewed as polynomial multiplication.
46 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Integer Multiplication Revisited
An integer represented using digits a_(n−1) . . . a_0 over a base d (i.e., 0 ≤ a_i < d) is very similar to the polynomial

    A(x) = Σ_{i=0}^{n−1} a_i x^i

Specifically, the value of the integer is A(d).
Integer multiplication follows the same steps as polynomial multiplication:

    a_(n−1) . . . a_0 × b_(n−1) . . . b_0 = (A(x) × B(x))(d)
47 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Polynomials: Basic Properties
Horner’s rule
An nth degree polynomial Σ_{i=0}^{n} a_i x^i can be evaluated in O(n) time:

    ((· · · ((a_n x + a_(n−1)) x + a_(n−2)) x + · · · + a_1) x + a_0)

Roots and Interpolation
An nth degree polynomial A(x) has exactly n roots r_1, ..., r_n. In general, the r_i are complex and need not be distinct.
A(x) can be written as a product of linear factors using these roots:

    A(x) = Σ_{i=0}^{n} a_i x^i = a_n Π_{i=1}^{n} (x − r_i)

Alternatively, A(x) can be specified uniquely by specifying n + 1 points (x_i, y_i) on it, i.e., A(x_i) = y_i.
48 / 81
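Horner's rule above is one line of code; a minimal sketch with coefficients given lowest degree first:

```python
# Horner's rule: evaluate a_0 + a_1 x + ... + a_n x^n with n multiplications
# and n additions, by nesting: ((a_n x + a_(n-1)) x + ...) x + a_0.
def horner(coeffs, x):
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result
```

For example, `horner([1, 2, 3], 10)` evaluates 1 + 2x + 3x^2 at x = 10, giving 321.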
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Operations on Polynomials
    Representation    Add     Mult
    Coefficients      O(n)    O(n^2)
    Roots             ?       O(n)
    Points            O(n)    O(n)

Note: the point representation is the best for computation! But usually, only the coefficients are given.
Solution: convert to point form by evaluating A(x) at selected points.
But conversion defeats the purpose: it requires O(n) evaluations, each taking O(n) time, so we are back to O(n^2) total time.
Toom (and FFT) idea: choose the evaluation points judiciously to speed up evaluation
49 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Matrix representation of Polynomial Evaluation
Given a polynomial

    A(x) = Σ_{t=0}^{n−1} a_t x^t

choose m + 1 points x_0, . . . , x_m for its evaluation.
Evaluation can be expressed using matrix multiplication:

    [ p_0 ]     [ 1  x_0  x_0^2  · · ·  x_0^(n−1) ] [ a_0     ]
    [ p_1 ]     [ 1  x_1  x_1^2  · · ·  x_1^(n−1) ] [ a_1     ]
    [ ... ]  =  [ ...                             ] [ ...     ]
    [ p_m ]     [ 1  x_m  x_m^2  · · ·  x_m^(n−1) ] [ a_(n−1) ]
50 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Multiplication using Point Representation
Let A(x) and B(x) be polynomials representing two numbers.
Evaluate both polynomials at the chosen points x_0, ..., x_m:

    P = XA    Q = XB

where P, X, A, Q and B denote the matrices from the last page.
Compute the point-wise product:

    [ r_0 ]     [ p_0 · q_0 ]
    [ ... ]  =  [ ...       ]
    [ r_m ]     [ p_m · q_m ]

Compute the polynomial C corresponding to R:

    R = XC  ⇒  C = X^(−1) R

Since C has degree degree(A) + degree(B), we need at least degree(A) + degree(B) + 1 evaluation points to determine C (and hence R) uniquely.
51 / 81
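The evaluate / point-wise multiply / interpolate pipeline can be sketched end to end. This is an illustration of the idea, not a fast algorithm: it uses arbitrary integer points and Lagrange interpolation with exact rationals, and the helper names are mine:

```python
# Polynomial multiplication via point representation:
# evaluate at deg(A)+deg(B)+1 points, multiply point-wise, interpolate.
from fractions import Fraction

def evaluate(coeffs, x):                 # Horner evaluation, low degree first
    r = 0
    for a in reversed(coeffs):
        r = r * x + a
    return r

def interpolate(xs, ys):
    """Coefficients (low degree first) of the polynomial through (xs, ys)."""
    n = len(xs)
    coeffs = [Fraction(0)] * n
    for i in range(n):
        basis = [Fraction(1)]            # numerator of Lagrange basis L_i
        denom = Fraction(1)
        for j in range(n):
            if j == i:
                continue
            denom *= xs[i] - xs[j]
            new = [Fraction(0)] * (len(basis) + 1)
            for k, c in enumerate(basis):    # multiply basis by (x - x_j)
                new[k + 1] += c
                new[k] -= xs[j] * c
            basis = new
        for k, c in enumerate(basis):
            coeffs[k] += c * ys[i] / denom
    return coeffs

def poly_mul(A, B):
    m = len(A) + len(B) - 1              # number of product coefficients
    xs = list(range(m))                  # any m distinct points work
    ys = [evaluate(A, x) * evaluate(B, x) for x in xs]
    return [int(c) for c in interpolate(xs, ys)]
```

For example, `poly_mul([1, 1], [1, 1])` returns `[1, 2, 1]`, i.e., (1 + x)^2 = 1 + 2x + x^2. Naive interpolation is O(n^2) or worse; the following slides show how Toom and the FFT choose the points to make both conversions cheap.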
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Improving complexity ...
Key problem: Complexity of computing X and its inverse X−1
Toom strategy:
Use low-degree polynomials, e.g., Toom-2 = Karatsuba uses degree 1: it represents an n-bit number as a 2-digit number over a large base d = 2^(n/2).
Fix the evaluation points for a given degree so that X and X^(−1) can be precomputed. For Toom-2, x_0 = 0, x_1 = 1, x_2 = ∞. (Define A(∞) = a_(n−1).)
Choose points so that expensive multiplications can be avoided while computing P = XA, Q = XB and C = X^(−1) R.
Toom-N on n-digit numbers needs 2N − 1 multiplications on n/N-digit numbers:

    T(n) = (2N − 1) T(n/N) + O(n)

which, by the Master theorem, has an O(n^(log_N (2N−1))) solution.
52 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Karatsuba revisited as Toom-2
Given the evaluation points x_0 = 0, x_1 = 1, x_2 = ∞,

    X = [ 1 0 0 ]
        [ 1 1 1 ]
        [ 0 1 0 ]

    XA = [ 1 0 0 ] [ a_0 ]     [ a_0       ]
         [ 1 1 1 ] [ a_1 ]  =  [ a_0 + a_1 ]
         [ 0 1 0 ] [ 0   ]     [ a_1       ]

Similarly

    XB = [ b_0, b_0 + b_1, b_1 ]^T

Point-wise multiplication yields:

    R = [ a_0 b_0, (a_0 + a_1)(b_0 + b_1), a_1 b_1 ]^T

and so on ...
53 / 81
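The whole Toom-2 pipeline (evaluate at 0, 1, ∞; multiply point-wise; interpolate; evaluate at the base) fits in a few lines. A sketch under my own conventions: `toom2` is a hypothetical helper, and the inputs are assumed already split into two digits over base d:

```python
# Toom-2 (= Karatsuba) via the evaluation points x = 0, 1, infinity.
# a = a1*d + a0, b = b1*d + b0.
def toom2(a0, a1, b0, b1, d):
    # Evaluate: P = X A = (a0, a0 + a1, a1); A(infinity) = leading coeff a1
    p = (a0, a0 + a1, a1)
    q = (b0, b0 + b1, b1)
    # Point-wise products: the only three multiplications
    r0, r1, r2 = p[0] * q[0], p[1] * q[1], p[2] * q[2]
    # Interpolate: C = X^-1 R gives c0 = r0, c2 = r2, c1 = r1 - r0 - r2
    c0, c1, c2 = r0, r1 - r0 - r2, r2
    # The product is C(d) = c0 + c1*d + c2*d^2
    return c0 + c1 * d + c2 * d * d
```

For example, `toom2(7, 5, 4, 3, 10)` computes 57 × 34 = 1938 using only the three products 7·4, 12·7, and 5·3.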
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Limitations of Toom
In principle, complexity can be reduced to n1+ε for arbitrarily small
positive ε by increasing N
In reality, the algorithm itself depends on the choice of N .
Specifically, constant factors involved increase rapidly with N .
As a practical matter, N = 4 or 5 is where we stop.
Question: Can we go farther?
54 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
FFT and Schonhage-Strassen
Key idea: evaluate polynomial on the complex plane
Choose powers of Nth complex root of unity as the points for
evaluation
Enables sharing of operations in computing XA so that it can be
done in O(N logN) time, rather than O(N2) time needed for the
naive matrix-multiplication based approach
55 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
FFT to the Rescue!
Matrix form of DFT and its interpretation as polynomial evaluation:

    [ s_0     ]     [ 1  1        1           · · ·  1              ] [ a_0     ]
    [ s_1     ]     [ 1  ω        ω^2         · · ·  ω^(N−1)        ] [ a_1     ]
    [ ...     ]  =  [ ...                                           ] [ ...     ]
    [ s_j     ]     [ 1  ω^j      ω^(2j)      · · ·  ω^(j(N−1))     ] [ a_j     ]
    [ ...     ]     [ ...                                           ] [ ...     ]
    [ s_(N−1) ]     [ 1  ω^(N−1)  ω^(2(N−1))  · · ·  ω^((N−1)(N−1)) ] [ a_(N−1) ]

Voila! FFT computes A(x) at N points (x_i = ω^i) in O(N log N) time!

O(N log N) integer multiplication:
Convert to point representation using FFT        O(N log N)
Multiply in point representation                 O(N)
Convert back to coefficients using FFT^(−1)      O(N log N)
56 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
FFT to the Rescue!
(The same DFT matrix equation, with n points x_i = ω^i.)

FFT can be thought of as a clever way to choose points:
Evaluations at many distinct points “collapse” together.
This is why we are left with 2T(n/2) work after division, instead of 4T(n/2) for a naive choice of points.
57 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
FFT-based multiplication: More careful analysis ...
Computations on complex or real numbers can lose precision.
For integer operations, we should work in some other ring; usually, we choose a ring based on modular arithmetic.
Ex: in mod 33 arithmetic, 2 is a 10th root of 1, i.e., 2^10 ≡ 1 (mod 33).
More generally, 2 is an nth root of unity modulo 2^(n/2) + 1.
Point-wise additions and multiplications are not O(1):
we are adding up to n numbers (“digits”), so we need Ω(log n) bits.
So the total cost increases by at least a log n factor, i.e., O(n log^2 n).
[Schonhage-Strassen ’71] developed O(n log n log log n) algorithm:
recursively apply their technique for “inner” operations.
58 / 81
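The modular-arithmetic facts above are easy to check with Python's built-in `pow`; a small verification sketch (the helper name is mine):

```python
# Why an FFT over integers can use the ring Z mod (2^(n/2) + 1) instead of C.
assert pow(2, 10, 33) == 1        # 2 is a 10th root of unity mod 33 = 2^5 + 1

def is_nth_root_of_unity(omega, n, mod):
    # omega^n = 1, and no smaller positive power gives 1 (i.e., primitive)
    return pow(omega, n, mod) == 1 and all(
        pow(omega, k, mod) != 1 for k in range(1, n))

# 2 is an nth root of unity modulo 2^(n/2) + 1, because
# 2^(n/2) = -1 in that ring, and hence 2^n = 1.
for n in (8, 16, 32):
    assert is_nth_root_of_unity(2, n, 2 ** (n // 2) + 1)
```

Using 2 as the root of unity is attractive because multiplying by its powers is just bit shifting.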
Warmup Sorting Selection Closest pair Multiplication FFT Matrix Multiplication Integer multiplication
Integer Multiplication Summary
Algorithms implemented in libraries for arbitrary precision
arithmetic, with applications in public key cryptography, computer
algebra systems, etc.
GNU MP is a popular library, uses various algorithms based on
input size: naive, Karatsuba, Toom-3, Toom-4, or
Schonhage-Strassen (at about 50K digits).
Karatsuba is Toom-2. Toom-N is based on:
evaluating a polynomial at 2N − 1 points,
performing point-wise multiplication, and
interpolating to get back the polynomial, while
minimizing the operations needed for interpolation.
59 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Fast Fourier Transformation
One of the most widely used algorithms — yet most people are
unaware of its use!
Solving differential equations: applied to many computational
problems in engineering, e.g., heat transfer
Audio: MP3, digital audio processors, music/speech synthesizers,
speech recognition, ...
Image and video: JPEG, MPEG, vision, ...
Communication: modulation, filtering, radars, software-defined
radios, H.264, ...
Medical diagnostics: MRI, PET, ultrasound, ...
Quantum computing: See text Ch. 10
Other: Optics, data compression, seismology, ...
60 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Fourier Series
Theorem (Fourier Theorem)
Any (sufficiently smooth) function with a period T can be expressed as the sum of a series of sinusoids with periods T/n for integral n:

    a(t) = Σ_{n=0}^{∞} (d_n sin(2πnt/T) + e_n cos(2πnt/T))
Images from Wikimedia Commons. 61 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Fourier Series
Using the identity

    e^(ix) = cos x + i sin x

the Fourier series becomes

    a(t) = Σ_{n=−∞}^{∞} c_n e^(2πint/T)

It can be shown that

    c_n = (1/T) ∫_0^T a(t) e^(−2πint/T) dt

For real a(t), c_n = c*_(−n).

Example: touch-tone button 1.
[Figure: the button-1 signal in the time domain and in the frequency domain.]
Images from Cleve Moler, Numerical Computing with MATLAB
62 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Fourier Transform
What if a is not periodic?
Maybe we can start with the Fourier series definition for c_n,

    c_n = (1/T) ∫_0^T a(t) e^(−2πint/T) dt

and let T → ∞?
The frequencies are no longer discrete, as the “fundamental frequency” f = 1/T → 0.
Instead of discrete coefficients c_n, we will have a continuous function, call it s(f):

    s(f) = ∫_{−∞}^{∞} a(t) e^(−2πift) dt

F(a) denotes a’s Fourier transform.
F is almost self-inverting: F(F(a(t))) = a(−t)
63 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
How do Fourier Series/Transform help?
Differential equations: turn non-integrable functions into a sum of easily integrable ones.
Some problems are easier to solve in the frequency domain:
Filtering: filter out noise, tuning, ...
Compression: eliminate high-frequency components, ...
Convolution: convolution in the time domain becomes (simpler) multiplication in the frequency domain.

Definition (Convolution)

    (a ∗ b)(t) = ∫_{−∞}^{∞} a(t − x) b(x) dx

Theorem (Convolution)

    F((a ∗ b)(t)) = F(a(t)) F(b(t))
64 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Discrete Fourier Transform
Real-world signals are typically sampled
DFT is a formulation of FT applicable to such samples
Nyquist rate: a signal with highest frequency n/2 can be losslessly reconstructed from n samples.
DFT of time-domain samples a_0, . . . , a_(n−1) yields frequency-domain samples s_0, . . . , s_(n−1):

    s_f = Σ_{t=0}^{n−1} a_t e^(−2πift/n)    cf.  s(f) = ∫_{−∞}^{∞} a(t) e^(−2πift) dt

Note: the DFT formulation can be derived from the FT by treating the sampling process as multiplication by a sequence of impulse functions separated by the sampling interval.
65 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Background: Complex Plane, Polar Coordinates
Figure 2.6 (from the text): The complex roots of unity are ideal for our divide-and-conquer scheme.

The complex plane: z = a + bi is plotted at position (a, b).
Polar coordinates: rewrite as z = r(cos θ + i sin θ) = re^(iθ), denoted (r, θ).
- length r = √(a^2 + b^2)
- angle θ ∈ [0, 2π): cos θ = a/r, sin θ = b/r
- θ can always be reduced modulo 2π

    Examples:     Number        −1      i         5 + 5i
                  Polar coords  (1, π)  (1, π/2)  (5√2, π/4)
66 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Polar Coordinates and Multiplication
Figure 2.6, continued: multiplying is easy in polar coordinates.

Multiply the lengths and add the angles:

    (r_1, θ_1) × (r_2, θ_2) = (r_1 r_2, θ_1 + θ_2)

For any z = (r, θ):
- −z = (r, θ + π), since −1 = (1, π)
- if z is on the unit circle (i.e., r = 1), then z^n = (1, nθ)
67 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Roots of unity on Complex Plane
Figure 2.6, continued: the nth complex roots of unity are the solutions to the equation z^n = 1.

By the multiplication rule, the solutions are z = (1, θ), for θ a multiple of 2π/n (the figure shows n = 16: angles 2π/n, 4π/n, . . .).
For even n:
- these numbers are plus-minus paired: −(1, θ) = (1, θ + π)
- their squares are the (n/2)nd roots of unity

Divide-and-conquer step (n a power of 2): evaluate A(x) at the nth roots of unity (paired); evaluate A_e(x), A_o(x) at the (n/2)nd roots (still paired); divide and conquer.
68 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Matrix representation of DFT
Given time-domain samples a_t for t = 0, 1, . . . , n − 1, compute the frequency-domain samples s_f for f = 0, 1, . . . , n − 1:

    s_f = Σ_{t=0}^{n−1} a_t e^(−2πift/n) = Σ_{t=0}^{n−1} a_t (e^(−2πi/n))^(ft) = Σ_{t=0}^{n−1} a_t ω^(ft)

where ω = e^(−2πi/n) is an nth complex root of unity.

    [ s_0     ]     [ 1  1        1           · · ·  1              ] [ a_0     ]
    [ s_1     ]     [ 1  ω        ω^2         · · ·  ω^(n−1)        ] [ a_1     ]
    [ s_2     ]     [ 1  ω^2      ω^4         · · ·  ω^(2(n−1))     ] [ a_2     ]
    [ ...     ]  =  [ ...                                           ] [ ...     ]
    [ s_j     ]     [ 1  ω^j      ω^(2j)      · · ·  ω^(j(n−1))     ] [ a_j     ]
    [ ...     ]     [ ...                                           ] [ ...     ]
    [ s_(n−1) ]     [ 1  ω^(n−1)  ω^(2(n−1))  · · ·  ω^((n−1)(n−1)) ] [ a_(n−1) ]
69 / 81
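The matrix equation above translates into a direct O(n^2) computation; a minimal sketch, useful as a baseline for the FFT that follows:

```python
# Naive O(n^2) DFT: multiply the matrix with (f, t) entry w^(f*t),
# where w = e^(-2*pi*i/n), by the vector of samples.
import cmath

def dft(a):
    n = len(a)
    omega = cmath.exp(-2j * cmath.pi / n)
    return [sum(a[t] * omega ** (f * t) for t in range(n)) for f in range(n)]
```

For example, `dft([1, 1, 1, 1])` is approximately `[4, 0, 0, 0]`: a constant signal has all its energy at frequency 0.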
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Matrix representation of DFT
Note that e^(−2πi/n) represents an nth root of 1, denoted ω.

(The same matrix equation as on the previous slide.)

Possible interpretations of these matrix equations:
Simultaneous equations that can be solved
Change of basis (rotate the coordinate system)
Evaluation of the polynomial Σ_{k=0}^{n−1} a_k x^k at x = ω^j, 0 ≤ j < n
70 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Speeding up FFT Computation
The matrix multiplication formulation has an obvious divide-and-conquer implementation:

    M_n A_(0...(n−1)) = [ M11_(n/2)  M12_(n/2) ] [ A_(0 ··· n/2−1)   ]
                        [ M21_(n/2)  M22_(n/2) ] [ A_(n/2 ··· (n−1)) ]

But this algorithm still takes O(n^2) time.
... but wait! There are only O(n) distinct elements in the square matrix M_n.
Each element of M_n repeats O(n) times, so there is significant scope for sharing operations on the submatrices!
71 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Observations about M(ω)
[From the text: M_n(ω) has (j, k)th entry ω^(jk), with row and column count starting at zero. Segregating its columns into evens (2k) and odds (2k + 1): in row j, even columns hold ω^(2jk) and odd columns hold ω^j · ω^(2jk); in row j + n/2, these become ω^(2jk) and −ω^j · ω^(2jk), using ω^(n/2) = −1 and ω^n = 1.]
Two successive columns differ by a factor of ω^j in the jth row.
Rows that are n/2 apart differ by a factor of ω^(kn/2) in the kth column.
Note that ω^(n/2) = −1, so such rows differ by a factor of −1 in odd columns, and are identical in even columns.
72 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
DFT Matrix Multiplication, Rearranged ...
[From the text: M_n(ω) · a, rearranged into even and odd columns.]
The top left n/2 × n/2 submatrix is M_(n/2)(ω^2), as is the one on the bottom left. The top and bottom right submatrices are almost the same as M_(n/2)(ω^2), but with their jth rows multiplied through by ω^j and −ω^j, respectively. Therefore the product of M_n(ω) with (a_0, . . . , a_(n−1)), a size-n problem, can be expressed in terms of two size-n/2 problems: the product of M_(n/2)(ω^2) with (a_0, a_2, . . . , a_(n−2)) and with (a_1, a_3, . . . , a_(n−1)).
Only two subproblems of size n/2: multiply M_(n/2) by A_odd and by A_even:
T(n) = 2T(n/2) + O(n), with an O(n log n) solution.
But wait! M_n has O(n^2) size; how can we operate on it in O(n) time?
73 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
FFT Algorithm

Figure 2.9 The fast Fourier transform

    function FFT(a, ω)
    Input:  an array a = (a_0, a_1, . . . , a_(n−1)), for n a power of 2;
            a primitive nth root of unity, ω
    Output: M_n(ω) a

    if ω = 1: return a
    (s_0, s_1, . . . , s_(n/2−1))    = FFT((a_0, a_2, . . . , a_(n−2)), ω^2)
    (s′_0, s′_1, . . . , s′_(n/2−1)) = FFT((a_1, a_3, . . . , a_(n−1)), ω^2)
    for j = 0 to n/2 − 1:
        r_j       = s_j + ω^j s′_j
        r_(j+n/2) = s_j − ω^j s′_j
    return (r_0, r_1, . . . , r_(n−1))

In short, the product of M_n(ω) with vector (a_0, . . . , a_(n−1)), a size-n problem, can be expressed in terms of two size-n/2 problems: the product of M_(n/2)(ω^2) with (a_0, a_2, . . . , a_(n−2)) and with (a_1, a_3, . . . , a_(n−1)). This divide-and-conquer strategy yields a running time of T(n) = 2T(n/2) + O(n) = O(n log n).
74 / 81
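The pseudocode of Figure 2.9 can be transcribed into Python over complex numbers almost verbatim; a minimal sketch (the floating-point base case tests n = 1 rather than ω = 1, since ω is only approximately representable):

```python
# Recursive FFT: multiplies M_n(w) by a, for n a power of 2, in O(n log n).
import cmath

def fft(a, omega):
    n = len(a)
    if n == 1:
        return list(a)
    s = fft(a[0::2], omega * omega)        # transform of even-index entries
    sp = fft(a[1::2], omega * omega)       # transform of odd-index entries
    r = [0] * n
    for j in range(n // 2):
        w = omega ** j
        r[j] = s[j] + w * sp[j]            # the "+ w^j" butterfly branch
        r[j + n // 2] = s[j] - w * sp[j]   # the "- w^j" butterfly branch
    return r

# Evaluate A(x) = 1 + 2x + 3x^2 + 4x^3 at the four 4th roots of unity:
spectrum = fft([1, 2, 3, 4], cmath.exp(-2j * cmath.pi / 4))
```

Here `spectrum[0]` is A(1) = 10, and the remaining entries are A at the other powers of ω.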
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Convolution in the Discrete World
    (A_n ∗ B_m)_t = Σ_{x=0}^{m−1} a_(t−x) b_x    cf.  (a ∗ b)(t) = ∫_{−∞}^{∞} a(t − x) b(x) dx

Linear convolution: a_(t−x) = 0 if x > t.
Circular convolution: a_(t−x) = a_(t−x+n) if x > t. (Equivalent to treating A as a periodic function.)
Zero-extended convolution: first extend A and B to have m + n − 1 samples, by letting a_y = 0 for n ≤ y < m + n − 1 and b_z = 0 for m ≤ z < m + n − 1.
With zero-extension, the definitions of linear and circular convolution match, and hence become equivalent. So we will deal only with zero-extended convolution.

Theorem (Discrete Convolution): F(A_n ∗ B_m) = F(A_n) F(B_m)
75 / 81
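The three discrete variants above are easy to compare directly; a small sketch with helper names of my choosing:

```python
# Linear (zero-extended) convolution: a_{t-x} is 0 outside the samples.
def convolve(a, b):
    n, m = len(a), len(b)
    out = [0] * (n + m - 1)
    for t in range(n + m - 1):
        for x in range(m):
            if 0 <= t - x < n:
                out[t] += a[t - x] * b[x]
    return out

# Circular convolution: inputs of equal length n, indices wrap modulo n.
def circular_convolve(a, b):
    n = len(a)
    return [sum(a[(t - x) % n] * b[x] for x in range(n)) for t in range(n)]

# With zero-extension to n + m - 1 samples, linear and circular agree:
a, b = [1, 2, 3], [4, 5]
ext = len(a) + len(b) - 1
assert convolve(a, b) == circular_convolve(a + [0] * (ext - len(a)),
                                           b + [0] * (ext - len(b)))
```

Note that `convolve([1, 2, 3], [4, 5])` is exactly the coefficient list of (1 + 2x + 3x^2)(4 + 5x), which is why polynomial and integer multiplication are convolutions.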
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Why this fascination with convolution?
Computationally, convolution is a loop that adds up products.
The convolution theorem says we can replace this O(n) loop by a single operation on the DFT. That is fascinating!
Wait a minute! What about the cost of computing F first?
If we use the FFT, then the computation of F and its inverse will still be O(n log n), not quadratic.
Can we use the FFT as a building block to speed up algorithms for other problems?
Integer multiplication looks like a convolution, and usually takes O(n^2). Can we make it O(n log n)?
76 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
FFT to the Rescue!
Matrix form of DFT and its interpretation as polynomial evaluation:

    [ s_0     ]     [ 1  1        1           · · ·  1              ] [ a_0     ]
    [ s_1     ]     [ 1  ω        ω^2         · · ·  ω^(n−1)        ] [ a_1     ]
    [ ...     ]  =  [ ...                                           ] [ ...     ]
    [ s_j     ]     [ 1  ω^j      ω^(2j)      · · ·  ω^(j(n−1))     ] [ a_j     ]
    [ ...     ]     [ ...                                           ] [ ...     ]
    [ s_(n−1) ]     [ 1  ω^(n−1)  ω^(2(n−1))  · · ·  ω^((n−1)(n−1)) ] [ a_(n−1) ]

Voila! FFT computes A(x) at n points (x_i = ω^i) in O(n log n) time!

O(n log n) integer multiplication:
Convert to point representation using FFT        O(n log n)
Multiply in point representation                 O(n)
Convert back to coefficients using FFT^(−1)      O(n log n)
77 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
FFT to the Rescue!
(The same DFT matrix equation as on the previous slide.)

FFT can be thought of as a clever way to choose points:
Evaluations at many distinct points “collapse” together.
This is why we are left with 2T(n/2) work after division, instead of 4T(n/2) for a naive choice of points.
78 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
FFT-based Multiplication: Summary
FFT works with 2^k points: this increases the work by up to 2x.
The product of two degree-n polynomials has degree 2n, so we need to work with 2n points, i.e., up to a 4x increase in time.
Requires inverting back to the coefficient representation after multiplication:

    S_n = M_n(ω) A_n
    M_n^(−1)(ω) S_n = M_n^(−1)(ω) M_n(ω) A_n = A_n

It is easy to show that M_n^(−1)(ω) = M_n(ω^(−1))/n, and hence:

    A_n ∗ B_n = FFT(FFT(A_2n, ω) · FFT(B_2n, ω), ω^(−1))/n

We are back to the convolution theorem!
79 / 81
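The whole pipeline (zero-extend to a power of 2, FFT, point-wise multiply, inverse FFT via ω^(−1) and division by n, then recombine digits) can be sketched end to end. This is an illustration, not production code: the helper names are mine, and rounding the floating-point results is safe only for modest input sizes, which is exactly the precision issue the next slide raises:

```python
# FFT-based integer multiplication over complex numbers.
import cmath

def fft(a, omega):
    n = len(a)
    if n == 1:
        return list(a)
    s = fft(a[0::2], omega * omega)
    sp = fft(a[1::2], omega * omega)
    r = [0] * n
    for j in range(n // 2):
        w = omega ** j
        r[j] = s[j] + w * sp[j]
        r[j + n // 2] = s[j] - w * sp[j]
    return r

def multiply(x, y, base=10):
    a = [int(d) for d in str(x)][::-1]   # digits, least significant first
    b = [int(d) for d in str(y)][::-1]
    n = 1
    while n < len(a) + len(b):           # room for all product digits
        n *= 2
    a += [0] * (n - len(a))              # zero-extend both inputs
    b += [0] * (n - len(b))
    omega = cmath.exp(-2j * cmath.pi / n)
    fc = [u * v for u, v in zip(fft(a, omega), fft(b, omega))]
    # Inverse transform: M_n(omega^-1)/n, then round back to integers
    c = [round((z / n).real) for z in fft(fc, 1 / omega)]
    return sum(d * base ** i for i, d in enumerate(c))
```

For example, `multiply(57, 34)` returns 1938. (Python's big integers absorb the carries in the final sum.)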
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
More careful analysis ...
Computations on complex or real numbers can lose precision.
For integer operations, we should work in some other ring; usually, we choose a ring based on modular arithmetic.
Ex: in mod 33 arithmetic, 2 is a 10th root of 1, i.e., 2^10 ≡ 1 (mod 33).
More generally, 2 is an nth root of unity modulo 2^(n/2) + 1.
Point-wise additions and multiplications are not O(1):
we are adding up to n numbers (“digits”), so we need Ω(log n) bits.
So the total cost increases by at least a log n factor, i.e., O(n log^2 n).
[Schonhage-Strassen ’71] developed O(n log n log log n) algorithm:
recursively apply their technique for “inner” operations.
80 / 81
Warmup Sorting Selection Closest pair Multiplication FFT Fourier Transform DFT FFT Algorithm Fast multiplication
Integer Multiplication Summary
Algorithms implemented in libraries for arbitrary precision
arithmetic, with applications in public key cryptography, computer
algebra systems, etc.
GNU MP is a popular library, uses various algorithms based on
input size: naive, Karatsuba, Toom-3, Toom-4, or
Schonhage-Strassen (at about 50K digits).
Karatsuba is Toom-2. Toom-N is based on:
evaluating a polynomial at 2N − 1 points,
performing point-wise multiplication, and
interpolating to get back the polynomial, while
minimizing the operations needed for interpolation.
81 / 81