
Matrix Algorithms
Lecture Notes - Student Version*

Kyle Burke

January 7, 2018

Contents

0 Introduction
  0.1 Plan
  0.2 Acknowledgments
  0.3 Under Construction

1 Time Complexity Basics
  1.1 Asymptotic Analysis
  1.2 Running Time of Basic Vector/Matrix Operations

2 Row Reduction
  2.1 RREF
  2.2 Running Time
  2.3 RREF to Solve Linear Systems

3 Matrix Algebra
  3.1 Matrix-Vector Products
  3.2 Matrix Inversion
  3.3 Matrix-Matrix Products
  3.4 More-Than-Two Products
  3.5 Powers of A: A^t

4 Linear Combinations and Redundant Vectors
  4.1 Linear Combinations
  4.2 Redundant Vectors

5 Kernels & Complete Solutions
  5.1 Systems with Many Solutions
  5.2 Finding Kernels
  5.3 Finding Complete Solutions

6 EigenStuff
  6.1 Eigenvectors, Eigenvalues, Eigenspaces, and Eigenbases
  6.2 Dynamical Systems
  6.3 Finding Eigenspaces
  6.4 Finding Eigenvalues
  6.5 Diagonalization

Appendix A R Tutorial
  A.1 R Basics
  A.2 Matrices and Vectors in R

Appendix B Strassen's Algorithm and Faster Matrix Multiplication
  B.1 Basic Multiplication Bounds
  B.2 Multiplying Block Matrices
  B.3 Strassen's Improvement
  B.4 Fast Multiplication Timeline

Appendix C Faster Single-Solution Linear System Solving
  C.1 Basic Solutions
  C.2 LU-decomposition
  C.3 LU-decomposition time

Appendix D Orthogonality
  D.1 Orthogonal Vectors
  D.2 Projections
  D.3 Gram-Schmidt Process
  D.4 Least-Squares

Appendix E Singular-Value Decomposition

Appendix F Answers to Exercises

*Created with lectureNotes.sty, which is available at: http://turing.plymouth.edu/~kgb1013/lectureNotesLatexStyle.php (or, GitHub: https://github.com/paithan/LaTeX-LectureNotes). Many or most of the answers to questions are hidden so that some of the class will still be a challenge for students.


0 Introduction

Welcome to Matrix Algorithms!

0.1 Plan

0.1.1 Pre/Co-Requirements

The plan for this course is to review the operations covered in your linear algebra class in order to discuss their running times.

This class assumes:

• You have taken or are currently taking a college-level Linear Algebra course.

• You have seen big-O or big-Θ notation before.

0.1.2 Sample Schedule

When I teach this course, we meet for 50 minutes each week and I assign one homework problem due at the next class period. Here's an example starting from a basic linear algebra schedule. (The linear algebra topic should be mostly covered first. Shift the topics if your class meetings are early in the week.)

I usually ask students to look over Section 1 before the first class meeting and ask whether they have any questions about the exercises in Section 1.1.4.

Week  Linear Algebra Topic               Sections Here      Homework
1     Linear Equations                   Section 2          Exercise 3 from 2.2
2     Linear Transformations             Sections 3.1, 3.2  Exercise 1 from 3.2
3     Matrix Products                    Sections 3.3, 3.4  Exercise 2 from 3.4
4     Lin. Independence & Combinations   Section 4          Exercise 1 from 4.2
5     Kernels and Images                 Section 5          Exercise 1 from 5.3
6     Subspaces                          Section A †        Exercise 1 from A.2
7     Linear Spaces                      Section B †        Exercise 4 from B.3
8     Orthogonality                      Sections D.1, D.2  Exercise 1 in D.2.3
9     Gram-Schmidt                       Sections D.3, D.4  Exercise 1 in D.4
10    Determinants                       Section C †        Exercise 1 in C.3
11    Eigenvalues                        Sections 6.1, 6.2  Exercise 2 in 6.2.3
12    Finding Eigenspaces                Sections 6.3, 6.4  Exercise 1 in 6.3
13    Diagonalization                    Incomplete         None

†These topics are unrelated to the linear algebra topic and can be used for alternate topics or to catch up on missed material.

It doesn't seem to be a problem if you fall even a week behind; the reinforcement of older material is still very helpful.


0.2 Acknowledgments

There are lots of people I owe thanks to for helping me learn this material and helping me get it ready for these notes:

• Special thanks to Plymouth State student Courtney Barker for working with me during the spring 2015 semester to revise and improve the student version of these notes. She alerted me to confusing sections and non-obvious questions. This helped me to break things down into finer questions that could be answered in part.

• Many thanks to the Wittenberg students of Comp 380, Matrix Algorithms, in the fall semesters of 2011 and 2012. These students survived the first two years of me teaching this material as a supplemental 1-credit course. Their suggestions shaped the first version of these notes.

• This would not have been possible without Shang-Hua Teng, my Ph.D. advisor. In the spring of 2007 at Boston University, I was a TA for his Geometric Algorithms (CS 232) course, where I helped students apply their CS skills to linear algebra. The following two years, Shang-Hua let me teach the course on my own. These notes began with the goal of covering the aspects of that course missed by a math-side linear algebra class. (Many extra things have since been added.) I continue to be grateful for Shang-Hua's unwavering confidence in me; his Geometric Algorithms course is the inspiration for these notes.

• Extreme thanks to Otto Bretscher, who helped me fall in love twice: once with linear algebra and once with teaching. In the spring of my first year at Colby College, I took Otto's linear algebra course and immediately enjoyed finding kernels and linear combinations. I spent five semesters grading and running review sessions for the same course, where everything really clicked. Otto kept showing off cool ways linear transformations and vector spaces applied to the new courses I took. More importantly, he encouraged my teaching and later even asked for advice. Before long I was on my way to grad school so I could teach college myself someday. I am mentioned in his linear algebra text[1], an honor I treasure.

0.3 Under Construction

There are still a bunch of things I'm working on. On top of all the various "ToDos" throughout these notes, there are a bunch of more general things I'm still working on:

• I still need to format exercises with the new tools I created.

• I'd like to add lower bounds everywhere. I'm not really doing anything special with them; I'm just using the input and output sizes. Still, I'd like to bring it up everywhere. This topic is a good example of the differences between O, Θ, and Ω.

• I still need to finish sections about Gram-Schmidt, SVD, and Diagonalization. It's tough, just because I don't really get to talk about these things in class.


• More consistent "Pre-reqs" for each section. What do students need to know, both from these notes and from a cooperating linear algebra class, before each topic?

There are probably also other things I’m not thinking of right now.

1 Time Complexity Basics

We are interested in the running time of algorithms in terms of the number of instructions (steps) that are executed.

1.1 Asymptotic Analysis

Perhaps we have an algorithm, H, which solves a problem (parameterized by n) in T(n) = 2n² + 89194n + 8 steps. We want to express this running time using asymptotic notation.

1.1.1 Different Forms: O, Ω, Θ, and More

Different notation options:

• O(f(n)): the set of all functions g(n) where g ≤ f. Finding an algorithm shows the problem is in big-O of the algorithm's running time.

• Ω(f(n)): the set of all g(n) where g ≥ f. Lower-bound for problem solution speed.

• Θ(f(n)): the set of all g(n) where g = f. An algorithm and lower bound agree!

• o(f(n)): the set of all g(n) where g < f. Won't come up in this class.

An example of the differences between O(n²), Ω(n²), Θ(n²), and o(n²) is shown in Figure 1.

Thus:

• n ∈ O(n), O(n log(n)), O(n²), . . .

• n ∉ O(log(n)), O(1)

• n² ∈ Ω(n²), Ω(n log(n)), Ω(n), Ω(√n), Ω(1)

• n² ∉ Ω(n² log(n)), Ω(n³), . . .

• n³ ∈ Θ(n³)

• n³ ∉ Θ(n²), Θ(n⁴)


[Figure 1 appears here in the original: a nested-regions diagram sorting example functions into asymptotic classes. Recoverable placements include 48n³ + 6n² + 300n + 4 and .02n³ in Θ(n³); 16n², .04n² + 37 log(n), 4n² + 4n + 1, and 7n² + 12n + 3√n + 201 in Θ(n²); 3n² log(n) + 12n² and 7n⁵ in Ω(n²) but outside O(n²); 3n + 4 log(n) + 72 and 17n + 9656 in Θ(n); and 4.5n^1.9 in o(n²).]

Figure 1: The relationship between some different asymptotic notations. Notice that Θ(n²) = O(n²) ∩ Ω(n²).

1.1.2 Simplification

Q: We obviously need to do something if we have multiple terms or non-1 coefficients. Let's compare two functions; which is "bigger": 2n² or 89194n? (Which one "dominates" the other?)

A:

We will discuss "bigger" (>) and "smaller" (<) in this way, usually by comparing the exponents of the terms.


Asymptotic notation (extreme simplification):

• Write out time complexity as sum of terms.

• Drop term coefficients.

• Drop dominated terms.

Our algorithm, H, "takes quadratic time", because the biggest term is n².

1.1.3 Examples

Let's do some quick examples with multiple variables... yikes! What is the simplest version of T(n,m) = 18n²m + 12n²m² + 5nm² + 100?

• Remove coefficients. Left with: n²m + n²m² + nm² + 1

• Drop smaller terms. Left with: n²m²

T(n,m) ∈ Θ(n²m²). What if we change the exponents in just one term? Consider T(n,m) = 18n²m + 12nm + 5nm² + 100.

• Remove coefficients. Left with: n²m + nm + nm² + 1

• Drop smaller terms. Left with: n²m + nm²

T(n,m) ∈ Θ(n²m + nm²). Neither term dominates the other, so both remain in the formula.

1.1.4 Reason for non-Θ

Sometimes we want to talk about the time complexity for a problem instead of an algorithm. For a problem, the time complexity is the time needed by the best (optimal) algorithm that solves the problem. Unfortunately, we don't always know the best algorithm. We can use O and Ω to describe the problem's time complexity.

Q: Let R be a problem and A an algorithm that solves the problem. If I know that A runs in 3n³ + 40n² steps, how can I describe the fastest algorithm that solves R?

A:


Q: So, does finding an algorithm give us a lower bound or an upper bound on the time complexity?

A:

Q: What if it's been proven that no algorithm can take less than n² steps? What else can we say about R?

A:

Q: What do we call this?

A:

Q: So what do we know about R total?

A:

Q: When can we use Θ to describe the problem’s complexity?

A:

That's because for any function f, Θ(f) = O(f) ∩ Ω(f).

Exercises for 1.1

Exercise 1

For each formula, determine three things:


• What is the simplified big-Θ notation?

• Is it inside O(n²)?

• Is it inside Ω(n²)?

1. 25n + 30n + 17

2. 31n + .2n³ + 10000

3. 54n²

(Answer 1.1.1 in Appendix)

Exercise 2

For each set, S, in this list, which other sets is it equivalent to?

1. Θ(n² + nm + n + m)

2. Θ(n + m)

3. Θ(max(n,m))

4. Θ(n² + 30n + 5)

5. Θ(n²)

6. Θ(n(n + m))

Exercise 3

For each set, S, in this list, which other sets in the list is S a subset of?

1. Θ(n)

2. O(n)

3. Ω(n)

4. Θ(n²)

5. O(n²)

6. Ω(n²)


1.2 Running Time of Basic Vector/Matrix Operations

Note: In previous years, I've covered this after we've talked about RREFing... in the future, I think it's probably better to do this first.

An n-vector is a list of n numeric elements. In these notes, vectors are denoted with little hats, e.g. ~v. If ~v is an n-vector with Real entries, then ~v ∈ R^n. We usually write vectors vertically, so

[  1 ]
[  4 ] ∈ R³.
[ 10 ]

Q: What do you think

[  1 ]   [ −1 ]
[  4 ] + [ 12 ]
[ 10 ]   [ −4 ]

is? Hint: think easy.

A:

Q: In general, how long does it take to add two n-vectors?

A:

Q: Is there any way to do it in less time? Or is linear a lower bound?

A:

Q: So is this straightforward addition algorithmically tight? (Meaning, there's no much better way to do it?)

A:


Q: So what is the overall time complexity of the vector addition problem?

A:

Multiplication of vectors is not as straightforward. The basic across-multiplying is not very useful, as it turns out. We can multiply a vector by a "scalar" (single value). (Scalar multiplication.)

Q: What do you think

  [  1 ]
4 [  4 ]
  [ 10 ]

is?

A:

Q: How long does it take to find c · ~v, where ~v ∈ R^n? Can you describe this with Θ-notation?

A:

One notion of "multiplication" is called the dot product. Example:

[  1 ]   [ −1 ]
[  4 ] · [ 12 ] = (1 · −1) + (4 · 12) + (10 · −4) = −1 + 48 − 40 = 7
[ 10 ]   [ −4 ]
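As a quick check, this computation is one line in R (see Appendix A for an R tutorial; the variable names here are just for illustration):

v <- c(1, 4, 10)
w <- c(-1, 12, -4)
sum(v * w)   # elementwise products, then add them up: -1 + 48 - 40 = 7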

Q: In general, how long does it take to find the dot product of 2 n-vectors?

A:


Q: Does anyone know a little trick to tell whether two vectors are perpendicular?

A:

Q: How long does it take to determine whether two n-vectors are perpendicular?

A:

A matrix is like a grid or table of values. Example: a 2 × 3 matrix A:

[ −1 3 10 ]
[ 12 2  4 ]

Q: How do you think you add two matrices?

A:

Q: How long does it take to add two n× n matrices?

A:

Q: What about two n×m matrices?

A:

Exercises for 1.2

Exercise 1


If ~x, ~y, and ~z are all n-vectors, how long does it take to find ~x + ~y + ~z? As with all questions of this sort, find both a running time and a lower bound, if possible.

(Answer 1.2.1 in Appendix)

Exercise 2

Given ~x1, ~x2, · · · , ~xk ∈ Rn, how long does it take to find ~x1 + ~x2 + · · ·+ ~xk?

Exercise 3

A diagonal matrix is a square (n × n) matrix where only entries along the diagonal¹ are non-zero. For example,

[ 1 0 0 ]
[ 0 3 0 ]
[ 0 0 8 ]

is a diagonal matrix. For any two n × n diagonal matrices, D1 and D2, how long does it take to determine the diagonal entries of D1 + D2?

Exercise 4

If you've solved the previous problem, how long does it take to write out the matrix sum D1 + D2?

Exercise 5

An upper triangular matrix is a matrix with all zeroes below the diagonal. For two n × n upper-triangular matrices U1 and U2, how long does it take to calculate U1 + U2?

Exercise 6

How long does it take to find the sum of k n × n upper-triangular matrices: U1 + U2 + · · · + Uk?

2 Row Reduction

2.1 RREF

⟨ Recall rref (Reduced Row Echelon Form; Gaussian Elimination) steps. (If another version of row reduction is being used in class, cover rref.) ⟩

Q: What is a leading one?

A:

¹The diagonal extends from the upper-left corner to the lower-right corner.


Q: When is an n×m matrix in RREF?

A:

Q: What is the time complexity needed to determine whether an n × m matrix, A, is in RREF?

A:


Q: During row reduction, what are the steps to "fixing" the first column?

A:

2.2 Running Time

Q: How long does each of those steps take on an n × m matrix A?

A:


Q: Total number of steps there?

A:

Q: How much does that process change for the other columns?

A:

Q: How does that change the running time in the average case?

A:

Q: So overall running time?

A:


Q: Oh really? What is

rref [ 1 2  3  4  5  6  7 ]
     [ 8 9 10 11 12 13 14 ]?

A:

Q: How many columns did we have to "fix" there?

A:

Q: What if n < m? Does it still take Θ(nm²)?

A:

Q: Oh really? Consider:

[  1  2  3  4  5 ]
[  0  0  0  0  0 ]
[  6  7  8  9 10 ]
[ 11 12 13 14 15 ].

What's min(n,m)?

A:

Q: How many columns do we fix when rrefing this matrix?

A:


Q: Will there always be a leading one in each column/row?

A:

Q: What is the number of leading ones in a matrix?

A:

Q: Can we use the rank of A in our time complexity?

A:

Q: Why is the 1 there?

A:

Q: What is our lower bound to find the rref of a matrix?

A:

Q: So what do we know about the actual best running time?

A:

Summary: We can find the rref of A ∈ R^{n×m} in O(nm(rank(A) + 1)) time.
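To make those step counts concrete, here is a minimal R sketch of the "fix one column" step discussed above. The function name and bookkeeping are ours, and this is deliberately not a full rref (writing that is Exercise 1 of Section A.2):

fixColumn <- function(A, r, j) {
  p <- which(A[r:nrow(A), j] != 0)[1] + r - 1  # find a usable pivot row: O(n)
  if (is.na(p)) return(A)                      # no non-zero entry: skip the column
  A[c(r, p), ] <- A[c(p, r), ]                 # swap the pivot row up: Θ(m)
  A[r, ] <- A[r, ] / A[r, j]                   # scale to make the leading 1: Θ(m)
  for (i in seq_len(nrow(A))[-r]) {            # clear column j everywhere else:
    A[i, ] <- A[i, ] - A[i, j] * A[r, ]        #   n - 1 row operations, Θ(m) each
  }
  A
}

Each fixed column costs Θ(nm), which is where the nm · (rank(A) + 1) total above comes from.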

Exercises for 2.2

Exercise 1

In the worst case, how long does it take for our algorithm to calculate rref(A), where A ∈ R^{1×m}?

Exercise 2

What is the time complexity to find rref(A) where A ∈ Rn×1?

Exercise 3

Consider the time complexity of calculating rref(A), where A is an n × n diagonal matrix. (A diagonal matrix is a square matrix where only the entries along the diagonal can be non-zero.)

1. What’s the worst-case running time of the regular rref algorithm?

2. How would you change the algorithm if you knew all input matrices werediagonal?

3. What’s the running time of this new specific algorithm?

Exercise 4

What is the time complexity to calculate rref(A), where A is an upper-triangular n× n matrix? (This one is pretty hard.)


Exercise 5

Let's consider A, a kq × kp matrix (k, p, q > 0) where p < kq. Assume B = rref(A) has leading ones exactly in columns 1, 1 + k, 1 + 2k, . . . and all non-leading-one entries "above and to the left" of any leading one must be zero.

For example, if

A = [  3 0 −12  −2 ],   then   rref(A) = B = [ 1 0 0 1 ],
    [ 13 0  −8 −13 ]                         [ 0 0 1 2 ]

and this fits the pattern for k = 2, p = 2, q = 1. Other examples:

• For k = 2, q = 4, p = 3:

B = [ 1 0 0 0 0 3 ]
    [ 0 0 1 0 0 4 ]
    [ 0 0 0 0 1 5 ]
    [ 0 0 0 0 0 0 ]
    [ 0 0 0 0 0 0 ]
    [ 0 0 0 0 0 0 ]
    [ 0 0 0 0 0 0 ]
    [ 0 0 0 0 0 0 ]

• For k = 3, q = 3, p = 2:

B = [ 1 0 0 0  1  2 ]
    [ 0 0 0 1 −3 −4 ]
    [ 0 0 0 0  0  0 ]
    [ 0 0 0 0  0  0 ]
    [ 0 0 0 0  0  0 ]
    [ 0 0 0 0  0  0 ]

1. What is rank(A) in terms of p, q, and k?

1. What is rank(A) in terms of p, q, and k?

2. How long does our rref algorithm take to find B, given A?

3. What if we create a new algorithm, specialRref(A, k, p, q), to handle these cases where we know A has this form ahead of time? Which columns can we ignore when doing the arithmetic in row operations?

4. Using specialRref, how long does it take to perform the longest row operation?

5. In specialRref, how many rows need to have row operations every time a leading one is set?

6. In specialRref, how many columns need to be "fixed" to create the leading one?


7. What is the running time of specialRref?

8. Does specialRref improve the running time? If so, how much?

Exercise 6

Warning: I don't know the answer to this one! Consider A ∈ R^{n×m} where rref(A) = B and all leading 1 entries in B are also 1's in A.

1. Does the running time of our rref algorithm improve in this case?

2. Considering the same case as the last question, can we write a better algorithm to solve matrices in this case faster?

2.3 RREF to Solve Linear Systems

Q: Given A ∈ R^{n×m} and ~b ∈ R^n, how do you find ~x ∈ R^m so that A~x = ~b? Hint: use something your linear algebra text might call an "augmented matrix": a matrix with a column of dots separating the actual columns.

A:

Q: Do we continue to fix the right-most column if there are not n leading ones found in the first m columns?

A:


Q: How long does this take?

A:

Q: What’s a lower bound for the time it takes to solve for ~x?

A:

Q: What if we need to solve the k equations: A~x1 = ~b1, A~x2 = ~b2, . . . , A~xk = ~bk? What's the way we know to do this?

A:

Q: Can we do better?

A:

Consider solving A~x = ~bi where

A = [ 1 2 ],   ~b1 = [ 5 ],   ~b2 = [ 1 ].
    [ 3 4 ]          [ 6 ]          [ 2 ]

Let's first solve for ~x1:

rref [ 1 2 | 5 ]
     [ 3 4 | 6 ]

⟨ Do out the steps to rref this. Should get

[ 1 0 | −4  ]
[ 0 1 | 4.5 ] ⟩

Now let's solve for ~x2.

⟨ Start out the steps to rref it. Students may stop you part way. If you do finish it, you'll get to:

[ 1 0 | 0  ]
[ 0 1 | .5 ] ⟩

Q: What’s the same about these two?

A:

Q: So how can we speed things up?

A:

Q: What if we just do this:

rref [ 1 2 | 5 1 ]
     [ 3 4 | 6 2 ]?

What happens here?

A:

Q: This is slightly better, since we don't have to memorize all the rref steps. In general, how do we set up the matrix to find all of ~x1, . . . , ~xk?

A:
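For what it's worth, base R's solve already accepts a whole matrix of right-hand sides, one per column (internally R uses an LU-style factorization rather than rref, but the batching idea is the same). Checking the example above:

A <- matrix(c(1, 3, 2, 4), nrow = 2)  # R fills by column: rows are (1 2) and (3 4)
B <- cbind(c(5, 6), c(1, 2))          # right-hand sides b1 and b2 as columns
solve(A, B)                           # columns of the answer: x1 = (-4, 4.5), x2 = (0, 0.5)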


Q: What are the dimensions of this super-augmented matrix?

A:

Q: How long does this rref take?

A:

Q: What’s a lower-bound for finding these?

A:

Q: Which is faster? This or doing the original single-augmented-column method k times?

A:

There’s lots more we can do with solving linear systems. For example...

• If there's a single solution to the system, what can we do to improve our running time? It turns out that there's a neat way to drop the coefficient of the n³ term! These notes go into details in Section C.

• If there are many solutions to the system, how quickly can we find an expression for all of them? This is covered in more detail in Section 5.

• If the system is inconsistent, we can use an orthogonal projection to find the closest approximation. This is covered in Section D.4.


Exercises for 2.3

Exercise 1

For A ∈ R^{n×m} and ~b ∈ R^n, consider solving for ~xi in the k systems: A~xi = ci · ~b where ci ≠ 0. (Assume the system A~x = ~b has at least one solution.)

1. How long does this take if we use the extra-augmented matrix?

2. Instead, solve for just ~x1. How can we find ~x2 in Θ(n) time? (Hint: how can we write ~x2 in terms of just c and ~x1?)

3. Using this method, how long does it take to solve all k systems?

3 Matrix Algebra

3.1 Matrix-Vector Products

Q: How long does it take to multiply an n × n matrix, A, by an n-vector, ~v? (Using the algorithm you learned in linear algebra.)

A:

Q: Could we do better, or is solving this problem in Θ(n²)?

A:


Q: Why not?

A:

Q: What if A is an n × m matrix and ~v is an m-vector? What's an upper bound for this problem?

A:

Q: Is this tight?

A:

Exercises for 3.1

Exercise 1

For A ∈ R^{n×m} and ~v1, ~v2, . . . , ~vk all in R^m, consider finding the sum of products ~x = A~v1 + · · · + A~vk.

1. How long does this take to calculate ~x as laid out here?

2. How can we calculate ~x by multiplying only one matrix by one vector?

3. How long does it take to calculate ~x this new way?

4. Which way is faster? By how much?


3.2 Matrix Inversion

Q: How long does it take to invert an invertible n × n matrix A?

A:

Q: How long does it take to determine whether an n × n matrix is invertible?

A:

Q: Lower bound?

A:

Exercises for 3.2

Exercise 1

For an invertible n × n matrix, A, and n-vector, ~b, in which of the following situations is it faster to find ~x?

• A~b = ~x

• A~x = ~b


3.3 Matrix-Matrix Products

Q: How long does it take to multiply two n× n matrices?

A:

Q: Why is this not terrible?

A:

Q: What about multiplying an m × n matrix with an n × m matrix?

A:

Q: What about multiplying a p × n matrix with an n × m matrix?

A:

For more on faster algorithms for Matrix Multiplication, check out Section B on Strassen's Algorithm!
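A quick experiment in R (sizes are arbitrary): with a cubic-time product, doubling n should multiply the time by roughly 8, though R's underlying BLAS library may do somewhat better in practice:

n <- 400
A <- matrix(rnorm(n^2), n, n); B <- matrix(rnorm(n^2), n, n)
system.time(A %*% B)      # time two n x n matrices

n <- 800
A <- matrix(rnorm(n^2), n, n); B <- matrix(rnorm(n^2), n, n)
system.time(A %*% B)      # about 8x the time above, if the product is cubic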

Exercises for 3.3

Exercise 1

How long does it take (using the algorithm from this section) to find AB where A is an n × n matrix and B is n × n²?


Exercise 2

How long does it take to find AB where A is an n² × n matrix and B is n × n²?

Exercise 3

How long does it take to find AB where A is an n × n² matrix and B is n² × n?

Exercise 4

Using two multiplications, how long does it take to find ABC where A is an n × n² matrix, B is n² × n², and C is n² × n?

3.4 More-Than-Two Products

Q: What about finding the result of AB~v where A is p × n and B is n × m?

A:


Q: What if there are k m-vectors ~v1, · · · , ~vk?

A:

Q: What about multiplying three n× n matrices?

A:

Q: What about multiplying three matrices: q × p, p × n, and n × m?

A:

Q: Can we do better?

A:


Q: Compute the following product:

[ 1 ]                [ 1  2 ]
[ 3 ]                [ 3  4 ]
[ 0 ] [ 2 4 8 0 6 ]  [ 5  6 ]
[ 1 ]                [ 7  8 ]
                     [ 9 10 ]

A:

Q: Would it be better if I did this (multiplying the right-hand pair first)?

[ 1 ]   ⎛               [ 1  2 ] ⎞
[ 3 ]   ⎜               [ 3  4 ] ⎟
[ 0 ]   ⎜ [ 2 4 8 0 6 ] [ 5  6 ] ⎟
[ 1 ]   ⎜               [ 7  8 ] ⎟
        ⎝               [ 9 10 ] ⎠

A:

Q: How long does this take?

A:


Q: So what is the overall running time?

A:

Q: How do you choose which to do first?

A:

Notice, this last one is the same as AB [~v1, · · · , ~vk].
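A quick R experiment (sizes are arbitrary) shows how much the grouping matters:

n <- 500
A <- matrix(rnorm(n^2), n, n)
B <- matrix(rnorm(n^2), n, n)
v <- rnorm(n)
system.time((A %*% B) %*% v)  # pays for an n x n matrix product first: Θ(n³)
system.time(A %*% (B %*% v))  # just two matrix-vector products: Θ(n²)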

Exercises for 3.4

Exercise 1

How long does it take to multiply AB~x when A ∈ R^{m×n}, B ∈ R^{n×m}, and ~x ∈ R^m?

Exercise 2

Consider finding the product ABCDE with the following matrix dimensions:

• A is k × j

• B is j × 1

• C is 1× i

• D is i× 1

• E is 1× h

1. What is the order you should multiply the matrices in? (Hint: it depends on two of the variables.)

2. What is the running time to perform the multiplication?


3.5 Powers of A: A^t

Q: How long does it take to take a single number, say c, and raise it to the power t? (Meaning, how long does it take to find c^t?) Hint: there's a way to do it faster than in t − 1 multiplications.

A:

Q: How do we do it? Hint: 5^43 = 5^32 × 5^8 × 5^2 × 5^1

A:
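Here is a minimal R sketch of the idea in the hint; the function name is ours. For a matrix A, the same loop works with %*% in place of * and diag(n) as the starting value:

fastPower <- function(c, t) {
  result <- 1
  sq <- c                        # c^(2^0), then c^(2^1), c^(2^2), ...
  while (t > 0) {
    if (t %% 2 == 1) {
      result <- result * sq      # include this power iff this bit of t is set
    }
    sq <- sq * sq                # one squaring per bit: about log2(t) multiplications
    t <- t %/% 2
  }
  result
}
fastPower(2, 10)                 # 1024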


Q: What if we have a square A ∈ R^{n×n} and want to calculate A^t~v? There are two ways we can do this:

• Precalculate A^t (similar to c^t above) then multiply that matrix by ~v.

• Calculate the vector A~v, then multiply A by that, then multiply A by that, . . ., until you've multiplied by A t times. Notice that in this second method, each multiplication is easier: Θ(n²) instead of Θ(n³).

How long does the first of the two take?

A:

Q: How long to do it the second way?

A:

Q: Which is faster?

A:


Q: Now consider the problem where we need to find A^t~v1, A^t~v2, . . . , A^t~vk. How long does the precalculate method take to find all k vectors?

A:

Q: How long does it take to find all k vectors using the brute-force method?

A:

Q: Which is faster?

A:

4 Linear Combinations and Redundant Vectors

4.1 Linear Combinations

⟨ Be certain everyone is comfortable with these terms:

• Linear Combinations

• Span

• Linear Independence ⟩


Q: How long does it take to determine whether ~b ∈ R^n is a linear combination of ~v1, ~v2, . . . , ~vk?

A:

Q: How can we determine whether a list of vectors is linearly independent?

A:

Q: If I have k vectors in Rn, how long does that take?

A:

Thinking about lists of vectors, there's another way to characterize independence: redundant vectors.

4.2 Redundant Vectors

Def: redundant For a list ~v1, . . . , ~vk ∈ R^n, ~vi is redundant if ∃ a linear combination of the previous vectors, ~v1, . . . , ~vi−1, equal to ~vi.

Note: the zero vector, ~0, is always redundant.


Q: What are the redundant vectors in the following list?

[ 0 ]  [ 1 ]  [ −1 ]  [ 3 ]  [ 1 ]
[ 0 ], [ 0 ], [  1 ], [ 2 ], [ 0 ]
[ 0 ]  [ 1 ]  [  0 ]  [ 5 ]  [ 0 ]

A:

Q: Why?

A:

Q: How do we find the redundant ones in general?

A:

Q: What about this list? Which are redundant and what are the linear combinations for each?

[ 0 ]  [ 1 ]  [ −1 ]  [ 2 ]  [ 2 ]  [ 0 ]  [ 1 ]
[ 0 ], [ 0 ], [  1 ], [ 2 ], [ 0 ], [ 2 ], [ 1 ]
[ 0 ]  [ 1 ]  [  0 ]  [ 4 ]  [ 2 ]  [ 1 ]  [ 1 ]

A:


Q: How long (in asymptotic notation) does it take to find the redundant vectors in a list ~v1, . . . , ~vk, each in R^n?

A:

Exercises for 4.2

Exercise 1

If you have a list of vectors ~w1, . . . , ~wp in R^q, and only one of them is non-zero (not equal to ~0):

1. How long does it take our algorithm to determine which are redundant?

2. If you know this property ahead of time and design an algorithm specifically for it, how long does it take to find the non-redundant vector?

5 Kernels & Complete Solutions

5.1 Systems with Many Solutions

Q: What is the set of solutions for:

[ 1 2 3  ]        [ 1 ]
[ 4 5 6  ] ~x  =  [ 4 ] ?
[ 7 8 10 ]        [ 7 ]

A:

Q: What about

[ 1 2 3 ]        [ 1 ]
[ 4 5 6 ] ~x  =  [ 4 ] ?
[ 7 8 9 ]        [ 7 ]

A:

Q: How can I do that so quickly?

A:

Def: kernel The kernel of matrix A, ker(A), is the set of all vectors ~v such that A~v = ~0.

Q: Why is this helpful?

A:

Q: How does that work?

A:


Exercises for 5.1

Exercise 1

Let A ∈ R^{n×m} and ~b ∈ R^n. Then, if ~x1, ~x2 ∈ R^m where ~x1 ≠ ~x2 and A~x1 = ~b = A~x2, there's another way to find more vectors that solve A~x = ~b.

1. Find the difference between the two solution vectors: ~d = ~x1 − ~x2. What is true of ~d? (Hint: what is A~d?)

2. What is A(~x1 + c · ~d)?

3. Given just A, ~x1, and ~x2, how long does it take to output k other vectors, ~x3, · · · , ~xk+2, each of which is a solution to A~x = ~b?

5.2 Finding Kernels

Q: What is the relationship between ker (A) and ker (rref(A))?

A:

Reminder: the same is not true of the image of a matrix!

Q: How can we use this? Hint: use the redundant columns.

A:

This is not immediately obvious! Let's look at an example:

[ 1 2 3 ]  rref  [ 1 0 −1 ]
[ 4 5 6 ]  −−→   [ 0 1  2 ].
[ 7 8 9 ]        [ 0 0  0 ]


Q: Which column is redundant?

A:

Q: Show the coefficients that show the third is a linear combination of the first two.

A:

Now move everything to the left in that equation, so we have: 1~v1 − 2~v2 + ~v3 = ~0, or

[ 1 0 −1 ] [  1 ]   [ 0 ]
[ 0 1  2 ] [ −2 ] = [ 0 ].
[ 0 0  0 ] [  1 ]   [ 0 ]

The same is true of the non-rref'ed matrix, so:

[ 1 2 3 ] [  1 ]
[ 4 5 6 ] [ −2 ] = ~0.
[ 7 8 9 ] [  1 ]

Thus, the kernel of

[ 1 2 3 ]              [  1 ]
[ 4 5 6 ]  is  span (  [ −2 ]  ).
[ 7 8 9 ]              [  1 ]

Q: How do we describe all the solutions to

[ 1 2 3 ]        [ 1 ]
[ 4 5 6 ] ~x  =  [ 4 ] ?
[ 7 8 9 ]        [ 7 ]

A:

Def: Complete Solution The complete solution to a linear equation A~x = ~b is the set of all vectors ~x that satisfy the equation. It is often described in


one of the following ways:

• ~x0 + span (~v1, ~v2, . . . , ~vk)

• ~x0 + c1~v1 + c2~v2 + · · ·+ ck ~vk

where A~x0 = ~b and ~v1, ~v2, . . . , ~vk is a basis of ker(A). Let's pay attention to the "finding the kernel" step.

Q: What is the kernel of

[ 1 0 0 c1 0 0 d1 ]
[ 0 1 0 c2 0 0 d2 ]
[ 0 0 1 c3 0 0 d3 ] ?
[ 0 0 0 0  1 0 d4 ]
[ 0 0 0 0  0 1 d5 ]

A:


Q: What about

ker [ 1 4 0 0 2 3 0 1 ]
    [ 0 0 1 0 0 2 0 2 ]
    [ 0 0 0 1 1 5 0 3 ] ?
    [ 0 0 0 0 0 0 1 4 ]

A:

Q: How do we quickly create a list of the indices of the redundant columns?

A:


Q: What’s special about the 1?

A:

Q: Why will you always encounter a 0 or a 1?

A:

Q: How long does that take?

A:

The nullity of a matrix A, nullity(A), is the dimension of ker(A). The Fundamental Theorem of Linear Algebra² says: m = rank(A) + nullity(A).

Q: How many redundant columns are in rref(A)?

A:

²Sometimes referred to as the Rank-Nullity Theorem.


Q: How long does it take to fill in a single (basis) vector for A's kernel?

A:

Q: How do we know which entry goes where?

A: We can build a Pivots (Leading Ones) Table. Each row has the column that contains the i-th leading one.

Q: What does this mean? What's the Pivots Table for the matrix from before,

ker [ 1 4 0 0 2 3 0 1 ]
    [ 0 0 1 0 0 2 0 2 ]
    [ 0 0 0 1 1 5 0 3 ] ?
    [ 0 0 0 0 0 0 1 4 ]

A:

pivot   column
1       1
2       3
3       4
4       7


Q: How do we create a kernel vector using this? E.g., how do we find the kernel vector for the 6th column,

[ 3 ]
[ 2 ]
[ 5 ] ?
[ 0 ]

A:

Q: How long does it take to build a single kernel vector using the lookup table and a column of RREF'ed A?

A: Θ(m)

Q: Why?

A:

Q: But what if n is significantly bigger than m?

A:


Q: What steps does our algorithm take to find a basis for the kernel of an n × m matrix A?

A:

Q: How long does that take?

A:


Q: Can we use the FTLA (Fundamental Theorem) to simplify our running time a bit?

A: nullity(A) = m − rank(A), so we can then rewrite our running time as: Θ(m × (n · rank(A) + n + m − rank(A)))

Q: Is there a lower bound for this problem?

A:

Q: Is our algorithm tight?

A: Let's consider the two boundary cases: the rank is small and the rank is large.

Q: What happens when the rank is small (say a constant)?

A:


Q: What about when the nullity is small?

A:

Exercises for 5.2

Exercise 1

If we have an n × m matrix A that has only one non-zero entry, we might be able to speed up our algorithm to find the kernel of A:

1. What is the rank of A?

2. How long does it take to run the algorithm described in the section?

3. Create your own algorithm specifically for matrices that have only onenon-zero entry.

4. How long does your algorithm take to run?

5. Is it possible for the time complexity to be any better? Hint: how long does it take to find that non-zero entry?


5.3 Finding Complete Solutions

Q: What are the complete solutions to

[ 0 0 0  0 ]        [  0 ]
[ 0 0 1 −7 ] ~x  =  [ 10 ] ?
[ 0 1 0  2 ]        [ 10 ]

A:

Q: How long does our algorithm take to find the complete solutions to A~x = ~b where A is n × m?

A:

Q: To do this, we're rref-ing A twice. Is that necessary? Does not doing that improve our time complexity?

A:

Exercises for 5.3

Exercise 1


Suppose you have an n × m matrix A and k vectors ~b1, . . . , ~bk. We want to find the complete solutions to A~x1 = ~b1, A~x2 = ~b2, . . . , A~xk = ~bk.

1. How long does it take to apply the algorithm we already have k times?

2. Design a different algorithm that solves this in Θ(n(m + k)(rank(A) + 1) + m(nullity(A) + k)).

6 EigenStuff

Let’s use Eigen “stuff” to make things faster!

6.1 Eigenvectors, Eigenvalues, Eigenspaces, and Eigenbases

⟨ Remind about Eigenvectors and Eigenvalues. ⟩

Def: Eigenspace Given a matrix and an eigenvalue, λ, of that matrix, the eigenspace associated with λ, Eλ, is the set of all eigenvectors with eigenvalue λ (together with ~0).

Q: What is the intersection of two different eigenspaces of the same matrix?

A:

Q: What are the eigenspaces of

A = [ 1 2 3 ]
    [ 0 4 5 ] ?
    [ 0 0 6 ]

A:


Q: What are the eigenspaces of

A = [ 1 1 1 ]
    [ 0 0 1 ] ?
    [ 0 0 1 ]

A:

Def: Eigenbasis An eigenbasis of a matrix A ∈ R^{n×n} is a list of n linearly independent eigenvectors of A.

Q: Does

[ 1 2 3 ]
[ 0 4 5 ]
[ 0 0 6 ]

have an eigenbasis?

A:

Q: Does

[ 1 1 1 ]
[ 0 0 1 ]
[ 0 0 1 ]

have an eigenbasis?

A:

Q: When does a matrix A have an eigenbasis?

A:


6.2 Dynamical Systems

In this section, we consider the problem of computing A^t~x, given A ∈ R^{n×n} and ~x ∈ R^n.

Q: What’s our lower bound for solving this problem?

A:

6.2.1 Non-Eigen-stuff Solutions

Q: How long does it take to brute-force calculate this? I.e., A(A(· · · (A~x))), applying A a total of t times.

A:


Q: There's another way to do this by pre-computing A^t. How quickly can we calculate A^t? Hint: Section 3.5.

A:

Q: After we calculate A^t, how long does it take to calculate A^t~x?

A:

Q: So how long in total for the precalculating method?

A:

Q: Is either the brute-force or the precalcing method strictly better? (Meaning independent of the values of n and t?)

A:


Q: How do we determine which algorithm to use?

A:

Q: Okay, so what’s our total running time so far?

A:

So far, we have an algorithm to find A^t~x in O(min(n²t, n^ω log(t))) time.

6.2.2 Using Eigen-stuff

Q: Can we do this faster if ~x is an eigenvector of A?

A:

Q: Why? What is A^t~x equal to?

A:

Q: How long does that take to find?

A:


Q: What if instead ~x = c1~g1 + c2~g2 and ~g1 and ~g2 are eigenvectors with associated eigenvalues λ1, λ2, respectively? How can I rewrite A^t~x?

A:

Q: How long does it take to calculate the vector?

A:

Q: Consider the case where ~x is not a linear combination of just two eigenvectors. Assume that A has an eigenbasis. Why might we want to know that eigenbasis?

A:


Q: How can I do this differently if I know that ~x is a linear combination of eigenvectors: ~x = c1~g1 + c2~g2 + · · · + cn~gn?

A:

Q: How long does it take to calculate a single term ci λi^t ~gi?

A:

Q: So how long to find them all and sum it all up?

A:


Q: Is this better than what we had before?

A:

Q: What’s the downside?

A:

Q: When do we often do repeated matrix multiplication like this?

A:

Could be population models over time, etc. Usually: ~x(t) is some state of a system at step t. If A models the effect of time by incrementing the step, then A~x(t) = ~x(t + 1) and ~x(t) = A^t~x(0). (~x(0) is often the "initial situation".) Thus, we can use eigenstuff to speed up the simulations!
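As a sketch in R (the transition matrix and starting state below are made up for illustration), the whole eigen-route fits in a few lines; eigen is a real base-R function:

A <- matrix(c(0.9, 0.1, 0.2, 0.8), nrow = 2)  # hypothetical one-step transition matrix
x0 <- c(100, 50)                              # hypothetical initial state x(0)
t <- 50
e <- eigen(A)                                 # eigenvalues e$values, eigenvectors e$vectors
coeffs <- solve(e$vectors, x0)                # the c_i: x0 as a combination of eigenvectors
e$vectors %*% (e$values^t * coeffs)           # x(t) = c_1*λ_1^t*g_1 + ... + c_n*λ_n^t*g_n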

Q: What if we want to see the result at step t from k different initial states ~x1(0), . . . , ~xk(0)?

A:


Q: What if we know the eigenbasis, ~g1, . . . , ~gn, but not the coefficients c1, . . . , cn to form the linear combination equal to ~x? What do we have to do to find the coefficients?

A:

Q: How long to solve that?

A:

Q: Now, assuming we use the eigenvector method, how long to find A^t~x?

A:

Q: Is this better than our previous algorithms?

A:


Q: What’s our overall running time now?

A:

Q: Why is it reasonable to consider such high values of t?

A:

Perhaps the eigenvector approach helps speed things up if we want to find A^t~x for multiple vectors ~x...

6.2.3 Multiple Initial Vectors

Let's consider finding A^t~xi for k vectors ~x1, . . . , ~xk.

Q: How long does it take to brute-force calculate all k vectors A^t~xi?

A:


Q: Consider the pre-calculating A^t method. How long does it take extrapolating this to k vectors?

A:

Q: What’s our overall running time now?

A:

Q: Under which circumstances will we use the brute force algorithm?

A:

Q: Can we use fast matrix multiplication to do this even faster?

A:


Q: A^t and X are only the same size if k = n. What do we have to do?

A:

Q: How long does it take to multiply (A^t)′X′?

A:

Q: So what’s the running time of this new algorithm?

A:

Q: Is this always faster than the other precalc algorithm?

A:


Q: What’s our overall running time now?

A:

Q: What about applying the eigenvector method k times?

A:

Q: Do we have to do all the steps k times? Which parts can we calculate just once? Hint: the resulting algorithm should now take O(kn³ + n log(t))

A:

Q: The actual algorithm we use will choose the shortest of the three we've found so far. What's the running time of this algorithm?

A:


Q: Let's take another look at the eigenvector approach. Break this down. Where's the bottleneck?

A:

Q: Is there any way we can speed up finding these coefficients?

A:

Let G = [~g1 ~g2 · · · ~gn]. Then rref([G | ~x1 ~x2 · · · ~xk]) → [I_n | ~c1 ~c2 · · · ~ck], where G~ci = ~xi, i.e., ~xi = c_{i,1}~g1 + c_{i,2}~g2 + · · · + c_{i,n}~gn.

Q: How long does it take to rref this matrix?

A:


Q: How long does this version of an eigenvector method take?

A:

Q: How does this compare to the other eigenvector method?

A:

Q: What’s our overall running time now?

A:


Q: Is there a way we could use matrix multiplication to calculate the final solution vectors? Hint: let ~hi = λi^t ~gi. TODO: break this up into multiple questions.

A:

Q: How long does this take? Does this speed up our time at all?

A:


Summary: if we are given an eigenbasis ~g1, . . . , ~gn and respective eigenvalues λ1, . . . , λn of A, then we can find A^t~x1, A^t~x2, . . . , A^t~xk in

O(min(kn²t, n^ω log(t) + kn², n^ω log(t) + k^ω, n³ + kn² + n log(t)))

Exercises for 6.2

Exercise 1

Given A ∈ R^{n×n}, ~x1, . . . , ~xn ∈ R^n, how long does it take to calculate all n of the vectors A^n~xi? Which of the subalgorithms will be chosen?

Exercise 2

Given A ∈ R^{n×n}, ~x1, . . . , ~xn ∈ R^n, how long does it take to calculate all n of the vectors A^{2n²}~xi? Which of the subalgorithms will be chosen?

6.3 Finding Eigenspaces

Q: What if we don't know what the eigenbases are, just the eigenvalues: λ1, . . . , λd?

A:

Q: How do we find one of the eigenspaces?

A:


Q: How long does this take? (Just finding the one eigenspace.)

A:

Q: How long to do this for all of the eigenvalues: λ1, . . . , λd?

A:

Def: Geometric Multiplicity The dimension of Eλi is the geometric multiplicity of λi, abbreviated GM(λi).

Q: How does the rank(Bi) relate to GM(λi)?

A:


Q: What do we know about the sum of all geometric multiplicities?

A:

Q: Can we use this to simplify our sum?

A:

Q: How?

A:

TODO: a bunch more comes here. Still needs to be added!


Exercises for 6.3

Exercise 1

Given a 2n × 2n matrix A with eigenvalue λ, how long does it take to find a basis of Eλ if dim(Eλ) = n?

6.4 Finding Eigenvalues

Q: What if we just have A, but don't know any of the eigenstuff of A? What do we need to find first?

A:

Q: How do we do that in a linear algebra class?

A:

Q: Why is this not very helpful? Hint: quintic functions³

A:

Instead we can use iterative eigenvalue solvers that approximate the eigenvalues. Similar to Newton's Method to calculate the square root of a number⁴, each time an iterative algorithm is run, it produces a better approximation. Different eigenvector methods⁵ usually require between Θ(n²) and Θ(n³) steps for each round. Some algorithms produce all the eigenvalues, while others produce only the biggest.

⁴https://en.wikipedia.org/wiki/Newton%27s_method#Square_root_of_a_number
⁵https://en.wikipedia.org/wiki/Eigenvalue_algorithm#Iterative_algorithms

Different properties to consider when choosing an Eigenvalue Algorithm (a sketch of one such method appears after this list):

• Does it find just eigenvalues, or eigenpairs: (λ, ~v) (pairs of the eigenvalues and associated eigenvectors)?

• How many eigenvalues/pairs does it find?

• How long does one iteration take?

• How much does each iteration improve? (How fast does the processconverge?)
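Here is a minimal sketch of one such iterative method, power iteration, which approximates only the largest-magnitude eigenvalue (this method is our illustrative pick, and the function name is ours):

powerIteration <- function(A, iters = 100) {
  v <- rnorm(nrow(A))                # random starting vector
  for (i in 1:iters) {
    v <- A %*% v                     # one Θ(n²) matrix-vector product per round
    v <- v / sqrt(sum(v^2))          # renormalize so the entries don't blow up
  }
  lambda <- sum(v * (A %*% v))       # Rayleigh quotient estimate of the eigenvalue
  list(value = lambda, vector = v)
}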

6.5 Diagonalization

TODO: add this section

Appendices

A R Tutorial

A.1 R Basics

⟨ Log on to the cluster. ⟩

$ module load R

$ R

We can declare variables:

> x <- 3;

> x

[1] 3

We can print things out.

> print("x is ");
[1] "x is "
> print(x);
[1] 3


Alternatively:

> print(paste("x is", x));
[1] "x is 3"

For loops:

> for (i in 1:5) {
+   print(i);
+ }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Functions!

> foo <- function(a, b) {
+   c <- a + b;
+   d <- 2*a - c;
+   mouse <- c + d;
+   mouse;
+ }
> angelMouse <- foo(3, 5);
> angelMouse;
[1] 6

A.2 Matrices and Vectors in R

Vectors (arrays)

> v <- c(1, 2, 3, 4, 5);

> v[3] <- 8;

> print(v);

[1] 1 2 8 4 5

Note: indexes start with 1 (not zero)!

Matrices!


> A <- c(1, 2, 3, 4);
> A;
[1] 1 2 3 4
> dim(A) <- c(2, 2);
> A;
     [,1] [,2]
[1,]    1    3
[2,]    2    4

Q: How can you create this?

[ 1 2 3 ]
[ 4 5 6 ]
[ 7 8 9 ]

A:

To test your code:

> source("380burke.R")
> X <- c(1, 0, 1, 0, 1, 0, 1, 0, 1);
> dim(X) <- c(3,3);
> X
     [,1] [,2] [,3]
[1,]    1    0    1
[2,]    0    1    0
[3,]    1    0    1
> myrref(X)
     [,1] [,2] [,3]
[1,]    1    0    1
[2,]    0    1    0
[3,]    0    0    0

Exercises for A.2


Exercise 1

Write an R function, myRref, which takes a matrix and returns the RREF version of that matrix. Don't use the built-in rref function or any other row-reduction functions.

B Strassen's Algorithm and Faster Matrix Multiplication

Requirements: This section is accessible after covering Section 3.3 on matrix products.

Let’s learn some tricks to speed up matrix multiplication!

B.1 Basic Multiplication Bounds

For multiplying two n × n matrices:

• we have an algorithm that performs O(n³) multiplications.

• to write the answer requires Ω(n²) time.

Big news: if it takes Θ(n^ω) time, it is still unknown what ω is! However, we do know that ω < 3!

B.2 Multiplying Block Matrices

Consider the case n = 2^k. (The case where it is not a power of 2 is saved for an exercise.)

Consider multiplying matrices in blocks:

AB = [ A1,1 A1,2 ] [ B1,1 B1,2 ] = [ C1,1 C1,2 ] = C.
     [ A2,1 A2,2 ] [ B2,1 B2,2 ]   [ C2,1 C2,2 ]

Q: What is C1,1 equal to?

A:

© 2018 Kyle Burke

Page 75: Matrix Algorithms

B.2 Multiplying Block MatricesB STRASSEN’S ALGORITHM AND FASTER MATRIX MULTIPLICATION

Q: What about the others?

A:

This describes a recursive algorithm! Let's count the number of multiplications needed!

Let M(2^k) be the number of number multiplications needed to multiply two matrices this way.

Q: What is M(1)?

A:

Q: What is M(2^k) when k > 0?

A:

We can write out a recurrence relation:

M(2^k) = 1               if k = 0
M(2^k) = 8 · M(2^{k−1})  if k > 0

To solve this, we assume k is "large enough" and look for a pattern.


M(2^k) = 8M(2^{k−1})
       = 8[8M(2^{k−2})] = 64M(2^{k−2})
       = 512M(2^{k−3})

I see a pattern!

       = 8^i M(2^{k−i})

Get i to the base case (i = k):

       = 8^k M(2^{k−k})
       = 8^k M(1)
       = 8^k

Q: What is this analysis method called?

A:

Q: So what is M(n)?

A:

Just like the normal version; this is no faster!


B.3 Strassen’s Improvement

Q: What if we had a recurrence relation, S(n), with only 7 recursive cases instead of 8?

A:

S(2^k) = 7S(2^{k−1})
       = ...
       = 7^i S(2^{k−i})

Get i to the base case:

       = 7^k S(2^{k−k})
       = 7^k S(1)
       = 7^k
       = (2^{log₂(7)})^k
       = (2^k)^{log₂(7)}
       = n^{log₂(7)}
       ≈ n^{2.807}

Wow! Um... how do we do this? Time to get crazy; first define seven new matrices:

Z1 = (A1,1 + A2,2)(B1,1 + B2,2)
Z2 = (A2,1 + A2,2)B1,1
Z3 = A1,1(B1,2 − B2,2)
Z4 = A2,2(B2,1 − B1,1)
Z5 = (A1,1 + A1,2)B2,2
Z6 = (A2,1 − A1,1)(B1,1 + B1,2)
Z7 = (A1,2 − A2,2)(B2,1 + B2,2)


Now, let’s use these to define the quadrants of C:

C1,1 = Z1 + Z4 − Z5 + Z7

C1,2 = Z3 + Z5

C2,1 = Z2 + Z4

C2,2 = Z1 − Z2 + Z3 + Z6

Q: How many recursive matrix multiplications does this use?

A:

Note: this algorithm has some issues: it has bigger rounding-based errors than standard matrix multiplication.
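Putting the Z's together, here is a hedged R sketch of the recursion for n = 2^k (the function name is ours; a practical version would switch to ordinary %*% below some small cutoff size, since the recursion's overhead dominates on tiny blocks):

strassen <- function(A, B) {
  n <- nrow(A)
  if (n == 1) return(A * B)                # base case: 1x1 blocks
  h <- n / 2
  i1 <- 1:h; i2 <- (h + 1):n
  A11 <- A[i1, i1, drop = FALSE]; A12 <- A[i1, i2, drop = FALSE]
  A21 <- A[i2, i1, drop = FALSE]; A22 <- A[i2, i2, drop = FALSE]
  B11 <- B[i1, i1, drop = FALSE]; B12 <- B[i1, i2, drop = FALSE]
  B21 <- B[i2, i1, drop = FALSE]; B22 <- B[i2, i2, drop = FALSE]
  Z1 <- strassen(A11 + A22, B11 + B22)     # the seven recursive products
  Z2 <- strassen(A21 + A22, B11)
  Z3 <- strassen(A11, B12 - B22)
  Z4 <- strassen(A22, B21 - B11)
  Z5 <- strassen(A11 + A12, B22)
  Z6 <- strassen(A21 - A11, B11 + B12)
  Z7 <- strassen(A12 - A22, B21 + B22)
  rbind(cbind(Z1 + Z4 - Z5 + Z7, Z3 + Z5),       # reassemble C from its quadrants
        cbind(Z2 + Z4,           Z1 - Z2 + Z3 + Z6))
}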

Exercises for B.3

Exercise 1

Prove that C1,2 = Z3 + Z5.

(Answer B.3.1 in Appendix)

Exercise 2

Prove that C2,1 = Z2 + Z4.

Exercise 3

Prove that C1,1 = Z1 + Z4 − Z5 + Z7.

Exercise 4

Prove that C2,2 = Z1 − Z2 + Z3 + Z6.

Exercise 5

We only talked about matrices with sizes of powers of 2. It turns out that we can do something in the case where n = 2^k + 1 and still use Strassen's Algorithm! How does that work? How long does it take to multiply A and B now? (In big-O notation.)


B.4 Fast Multiplication Timeline

Strassen's algorithm doesn't tell us that Matrix Multiplication (MM) is in Θ(n^{log₂(7)}), just O(n^{log₂(7)}).

Q: Why can't we multiply matrices in o(n²) (less than Θ(n²)) time?

A:

Thus, MM ∈ Ω(n²) ∩ O(n^{log₂(7)}). The exponent of Matrix Multiplication is often referred to as ω, so that MM ∈ Θ(n^ω) and 2 ≤ ω ≤ log₂(7).

Since Strassen’s big result, here are other advances in pinning down ω:

• 1969: Strassen’s first breakthrough: ω < 2.8074

• 1978: Pan: ω < 2.796

• 1979: Bini, Capovani, Romani and Lotti: ω < 2.78

• 1981: Schonhage: ω < 2.548

• 1981: Schonhage: ω < 2.522 (Same paper as above!)

• 1981: Coppersmith and Winograd: ω < 2.496

• 1986: Strassen: ω < 2.479

• 1989: Coppersmith and Winograd: ω < 2.376

• 2010: Stothers: ω < 2.374

• 2011: Virginia Vassilevska Williams: ω < 2.3729

• 2014: Francois Le Gall:⁶ ω < 2.3728639

For more information about some of this history, I recommend Richard J. Lipton's 2012 blog post, "A Brief History of Matrix Product" at https://rjlipton.wordpress.com/2012/02/01/a-brief-history-of-matrix-product/.

Many people think that ω = 2, but this has not been proven.

⁶Thanks to Matt Ferland for the update!


The Wikipedia page on matrix multiplication has a good diagram of the progress made on finding ω: https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm#Sub-cubic_algorithms

C Faster Single-Solution Linear System Solving

Note: this section is accessible once the following topics have been covered:

• Row Reduction (RREF), from section 2.1.

• Matrix Multiplication, from section 3.3.

Here, we’ll show how to speed up the solution of linear systems.

C.1 Basic Solutions

It is very common to be asked to solve a linear system: given matrix A and vector ~b, find ~x so that A~x = ~b. If A is an invertible n × n matrix, we currently have two algorithms to do this.

Q: Which algorithm requires rref-ing an n × 2n matrix? Hint: in this section, we'll call this the "Inverse Method".

A:

Q: How long does that take?

A:


Q: What is the alternative? Hint: we'll call this one the "Extra Column Method".

A:

Q: How long does this take?

A:

Q: These have the same asymptotic complexity, but one takes roughly twice as long as the other. Is the Inverse Method the longer or shorter one?

A:

Q: Why?

A:


Q: We can write the running time of the Inverse Method as αn³ + O(n²), indicating that the coefficient of the biggest term is α. What, then, is a rough approximation of the coefficient for the Extra Column Method?

A:

Although these notes deal mostly with asymptotics only, at the scientific computing level it is very important to consider this biggest coefficient. An algorithm that runs in (α/2)n³ + O(n²) time will be chosen over an αn³ + O(n²) algorithm in more sophisticated software.

Q: Can we improve further on that coefficient?

A:

C.2 LU-decomposition

Consider the following row operation on a 3×3 matrix (subtracting 4 times row 1 from row 2):

[ 1 2 3  ]      [ 1  2  3 ]
[ 4 5 6  ]  →   [ 0 −3 −6 ].
[ 7 8 10 ]      [ 7  8 10 ]

Q: What is the result of the following multiplication?

[  1 0 0 ] [ 1 2 3  ]
[ −4 1 0 ] [ 4 5 6  ]
[  0 0 1 ] [ 7 8 10 ]

A:


Q: What is the inverse of

    [  1  0  0 ]
    [ −4  1  0 ]
    [  0  0  1 ]  ?

A:

Q: How can I rewrite

    [ 1  2  3 ]
    [ 4  5  6 ]
    [ 7  8 10 ]

as the product of two matrices?

A:

Q: Do the next step! Can we write

    [ 1  2  3 ]
    [ 4  5  6 ]
    [ 7  8 10 ]

as the product of three matrices? Hint: do the next rref step!

A:

Q: What do we notice about these “row reduction” matrices?

A:


Q: What is U in the following equation?

    [ 1  2   3 ]   [ 1  0  0 ]
    [ 0 −3  −6 ] = [ 0  1  0 ] U
    [ 0 −6 −11 ]   [ 0  2  1 ]

A:

Q: Now, can we write

    [ 1  2  3 ]
    [ 4  5  6 ]
    [ 7  8 10 ]

as the product of four matrices?

A:

Q: Notice anything about that matrix on the right?

A:


Q: What is the product of the left-hand three matrices?

A:

Thus, A = LU and A⁻¹ = U⁻¹L⁻¹. Recall that A~x = ~b, so LU~x = ~b. We can then solve in two steps:

1. Solve for ~y : L~y = ~b.

2. Solve U~x = ~y.
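Here is a minimal sketch of this idea in Python (my own function names, under the assumption, true for the example above, that no row swaps are needed):

    import numpy as np

    def lu_no_pivot(A):
        # Eliminate below the diagonal, recording each multiplier in L.
        # Produces A = L U with L unit lower triangular, U upper triangular.
        n = len(A)
        L, U = np.eye(n), A.astype(float).copy()
        for j in range(n):
            for i in range(j + 1, n):
                L[i, j] = U[i, j] / U[j, j]
                U[i, :] -= L[i, j] * U[j, :]
        return L, U

    def solve_via_lu(L, U, b):
        n = len(b)
        y = np.zeros(n)
        for i in range(n):                  # step 1: forward-substitute L y = b
            y[i] = b[i] - L[i, :i] @ y[:i]
        x = np.zeros(n)
        for i in reversed(range(n)):        # step 2: back-substitute U x = y
            x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
        return x

    A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 10.]])
    L, U = lu_no_pivot(A)
    assert np.allclose(L @ U, A)
    assert np.allclose(solve_via_lu(L, U, np.array([6., 15., 25.])), np.ones(3))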

C.3 LU-decomposition time

Q: How long does it take to solve L~y = ~b?

A:

Q: How long does it take to solve U~x = ~y?

A:

Q: Does this give us an algorithm?

A:


Q: How long does this algorithm take in general?

A:

Q: Does it improve further on the coefficient of the biggest term?

A:

• Inverse Method: about αn³ + O(n²) steps.

• Extra Column Method: about (α/2)n³ + O(n²) steps.

• LU-decomposition: about (α/4)n³ + O(n²) steps.

Q: Let’s compare the running times of the different methods when we’re asked to solve the system for k vectors ~b1, . . . , ~bk! How long does it take if we naively run the Inverse Method k times? Continue with the format we’ve been using, where you include the coefficient of the biggest term.

A:


Q: What about with LU-decomposition? (I’ll skip the Extra Column Method and leave it as an exercise.)

A:

Q: Now let’s solve the system for k vectors ~b1, . . . , ~bk. Which algorithm is faster?

A:

Q: So naive! Can we do better for the Inverse Method?

A:

Q: How?

A:

Q: How long does that take?

A:


Q: How do we speed up the LU-decomposition version?

A:

Q: How long does that take now?

A:
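As a sketch of the payoff (an illustration assuming scipy is available; lu_factor and lu_solve are scipy.linalg’s built-in LU routines): factor once, then reuse the factorization for every right-hand side.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 10.]])
    factorization = lu_factor(A)   # the ~n^3 work happens only once

    # Each additional right-hand side costs only two triangular solves (~n^2).
    bs = [np.array([6., 15., 25.]), np.array([1., 0., 0.])]
    xs = [lu_solve(factorization, b) for b in bs]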

Exercises for C.3

Exercise 1

What is the time complexity for the case where U turns out to be a matrix with non-zero entries only along the diagonal? Two cases:

• You run the LU-decomposition algorithm without knowing U will be a diagonal matrix.

• You know ahead of time and can modify the algorithm to speed it up.

Warning: I haven’t solved this yet.

D Orthogonality

D.1 Orthogonal Vectors

Q: Orthogonal means perpendicular. Given two n-vectors, how can we determine whether they are orthogonal?

A:


Q: How long does this take?

A:

Q: What if we have a list of k n-vectors? How long does it take to determine whether they are all orthogonal to each other?

A:

Q: Any vector with (Euclidean) length 1 is a unit vector. How can we determine whether a vector is a unit vector?

A:

Q: How long does that take?

A:


Q: What if we have a list of k n-vectors? How long does it take to determine whether that is an orthonormal list (or basis)?

A:

Q: Do we have to determine whether all the vectors are linearly independent?

A:
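Here is a small sketch of the pairwise-dot-product check (my own illustration; the tolerance parameter is an artifact of floating-point arithmetic, not of the math):

    import numpy as np

    def is_orthonormal(vectors, tol=1e-9):
        # Check every pair: dot product 1 on the diagonal (unit length),
        # 0 off the diagonal (orthogonality). About k^2/2 dot products of
        # n-vectors, so Theta(k^2 n) time.
        k = len(vectors)
        for i in range(k):
            for j in range(i, k):
                target = 1.0 if i == j else 0.0
                if abs(np.dot(vectors[i], vectors[j]) - target) > tol:
                    return False
        return True

    assert is_orthonormal([np.array([1., 0.]), np.array([0., 1.])])
    assert not is_orthonormal([np.array([1., 1.]), np.array([0., 1.])])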

D.2 Projections

D.2.1 Projection onto a Line

TODO: add a figure here?

Q: In R^n, how do we project a vector ~x onto L, a line spanned by a vector ~w?

A:

Q: What is projspan(~w)(~x) when ~x = [2, 0]^T and ~w = [3, 4]^T?

A:


Q: How long does it take to perform that projection?

A:

Q: How long does it take to project k vectors ~x1, . . . , ~xk onto L?

A:

Alternatively, we can find the projection matrix, P. In R², where ~w = [w1, w2]^T,

    P = 1/(w1² + w2²) · [ w1²    w1w2 ]
                        [ w1w2   w2²  ]

Q: What is projection matrix P when ~w = [3, 4]^T? What then is P~x when ~x = [2, 0]^T?

A:

Q: What is P if we have ~u = [u1, u2]^T where ~u is parallel to ~w?

A:


Now, P~x = projL(~x).

Q: How do we find ~u from ~w?

A:

Q: How long does this new algorithm take to find the projection of ~x1, . . . , ~xk (note: n is 2)? First calculate ~u, then P, then multiply by each vector in the list?

A:

Q: How can we find P for any line L in Rn?

A:

Q: What happens if I multiply the following matrices?

    [ u1 ] [ u1  u2 ]
    [ u2 ]

A:

This extends to any dimension, n!
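Here is a minimal sketch of both routes in numpy (my own function names): the Θ(n) dot-product formula, and the outer-product projection matrix built from the normalized vector ~u:

    import numpy as np

    def proj_line(x, w):
        # proj_span(w)(x) = ((x . w) / (w . w)) w  -- Theta(n) time.
        return (np.dot(x, w) / np.dot(w, w)) * w

    def proj_matrix(w):
        # Normalize w to the unit vector u, then P = u u^T (an outer product).
        u = w / np.linalg.norm(w)
        return np.outer(u, u)

    x, w = np.array([2., 0.]), np.array([3., 4.])
    # Both routes give the same projection, [18/25, 24/25]:
    assert np.allclose(proj_line(x, w), proj_matrix(w) @ x)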


Q: What is P_L where L = span([1/2, 1/2, −1/2, 1/2]^T)?

A:

Q: Okay, so now how long does it take to build this matrix for any unit vector ~u ∈ R^n?

A:

Q: What about to build the matrix, then use it to find projspan(~u)(~x)?

A:


Q: Let’s put it all together: Given

• ~w ∈ R^n and

• ~x1, . . . , ~xk (∈ R^n),

how long does it take to find projspan(~w)(~x1), . . . , projspan(~w)(~xk)?

A:

D.2.2 Projecting onto Subspaces

Q: Let’s do more than just a line! Let’s project ~x onto a subspace S = span(~w1, . . . , ~wl) ⊂ R^n. What is the relationship between n and l?

A:

Q: Let’s simplify: what if we have an orthonormal basis of S: ~u1, . . . , ~ul? How do we find projS(~x)?

A:


Q: How long does it take to calculate projS(~x)?

A:

Q: How long to project k vectors: ~x1, . . . , ~xk onto S?

A:

Q: Can we get closer to the lower bound by using projection matrices?

A:

Q: If Pi is the projection matrix onto span(~ui), what is the time complexity to find Pi?

A:

Q: How can we rewrite our expression for projecting (projS(~x) = projspan(~u1)(~x) + projspan(~u2)(~x) + · · · + projspan(~ul)(~x)) using the matrices P1, . . . , Pl?

A:


Q: Harness the beauty of linear algebra! How can we rewrite this even further?

A:

Q: What’s the running time of this algorithm to find projS(~x)?

A:

Still no improvement! However, what if we want to project multiple vectors?

Q: How can we project k vectors ~x1, . . . , ~xk onto S using projection matrices?

A:

Q: How long does that take?

A:
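As a sketch of the projection-matrix route (my own illustration, assuming an orthonormal basis: stacking the ~ui as the columns of an n × l matrix Q gives the projection matrix as Q Q^T, and stacking the ~xi as the columns of an n × k matrix X projects them all in one product):

    import numpy as np

    def project_all(Q, X):
        # Q: n x l with orthonormal columns; X: n x k, vectors as columns.
        P = Q @ Q.T        # n x n projection matrix onto the column space of Q
        return P @ X       # projects every column of X at once

    Q = np.array([[1., 0.], [0., 1.], [0., 0.]])  # orthonormal basis of the
                                                  # xy-plane inside R^3
    X = np.array([[1., 2.], [3., 4.], [5., 6.]])  # two vectors, as columns
    print(project_all(Q, X))                      # third coordinates zeroed out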


Q: Is this better than O(nlk)? (Not using a projection matrix.)

A:

Q: Can we possibly do this faster?

A:

Q: How?

A:

Q: How, more specifically?

A:


Q: Although P is n × n, X is n × k. What is the cost of matrix multiplication if k ≠ n?

A:

Q: Put it together. What’s the overall cost?

A:


Q: Can we do even better?

A:

Q: How?

A:

Q: Do we know that l ≤ n? TODO: didn’t we already cover this?

A:

Q: How long does it now take to calculate P using matrix multiplication?

A:

Q: Now we have two ways to calculate P. If we choose the faster of the two, how long does this part take overall?

A:


Q: Put it together. What’s the overall cost?

A:

Q: Recall that we can do this without a projection matrix in O(nlk) time. Can this new algorithm be faster?

A:

Q: What about the case where l = 7, ω = 2.5, and k = 5n²?

A:

Q: How can we express the overall running time of our algorithm?

A:


D.2.3 Non-Orthonormal Basis

There is still a problem! So far we’ve shown how to do this when either:

• S = span(~w) for any ~w ∈ R^n, or

• S = span(~u1, . . . , ~ul) for an orthonormal list.

Q: What if instead of an orthonormal list, we have any vectors ~wi so that S = span(~w1, . . . , ~wl)? How does this complicate things? Does it slow down the algorithm?

A:

Q: How do we calculate projS(~x) then? Simplify: what if S = span(~w1, ~w2)?

A:


Q: What if S = span(~w1, ~w2, ~w3)?

A:

Q: In general, what if S = span(~w1, ~w2, . . . , ~wl)?

A:

Q: What is projS(~x)?

A:

Q: How long does it take to calculate ~bi if I already have ~ai−1?

A:


Q: What about to calculate ~ai given ~ai−1 and ~bi?

A:

Q: So to calculate ~al?

A:

Exercises for D.2

Exercise 1

How long does it take to project ~x1, . . . , ~xk (∈ R^n) onto

    span( [1, 0, 0, 0, . . . , 0]^T, [0, 1, 0, 0, . . . , 0]^T, [0, 0, 1, 0, . . . , 0]^T )?

Assume that ω = 2.5. (You don’t need to assume anything about k.) Consider two cases:

1. Using the algorithm described here.

2. Instead using a specialized algorithm for this projection space.

Hint: You should get the same answer for both! Don’t use this hint to justify your answers!


D.3 Gram-Schmidt Process

Q: How does this compare to having an orthonormal list?

A:

Q: Can we use the other method?

A:

There is a process for converting a non-orthonormal list into an equivalent orthonormal list! This is known as the Gram-Schmidt process.

TODO: Spell out steps for Gram-Schmidt. How long does that whole process take?
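In the meantime, here is a minimal sketch of the classical process (my own illustration, assuming the input list is linearly independent): repeatedly subtract off the projections onto the vectors already produced, then normalize.

    import numpy as np

    def gram_schmidt(ws):
        # Classical Gram-Schmidt: l vectors in R^n cost about Theta(l^2 n).
        us = []
        for w in ws:
            v = w - sum(np.dot(w, u) * u for u in us)  # remove components
            us.append(v / np.linalg.norm(v))           # along earlier u's
        return us

    us = gram_schmidt([np.array([3., 4.]), np.array([1., 0.])])
    # us is now an orthonormal list spanning the same subspace:
    # [0.6, 0.8] and [0.8, -0.6].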

D.4 Least-Squares

We know how to solve A~x = ~b.

Q: What if the system is inconsistent? What might we still want to know?

A:

In other words, find the vector ~x∗ that minimizes ||~b − A~x∗||.

Def: Least Squares. The vector ~x∗ that fits the requirement above is the least squares solution to A~x = ~b.

TODO: add a picture here


Q: What is the formula for A~x∗ in terms of projections?

A:

Q: What do we know about ~b− A~x∗?

A:

Thus, ~b − A~x∗ ∈ img(A)⊥. There is a theorem in linear algebra: for all matrices M, img(M)⊥ = ker(M^T).

Q: Then what is A^T(~b − A~x∗)?

A:

Q: What is A^T~b equal to? Hint: distribute!

A:

A^T A~x∗ = A^T~b is known as the “Normal Equation” of A~x = ~b.

Q: Can we now find ~x∗?

A:
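Here is a minimal sketch of setting up and solving the Normal Equation in numpy (my own illustration; it assumes A^T A is invertible, i.e. rank(A) = m):

    import numpy as np

    def least_squares(A, b):
        # Form A^T A (m x m) and A^T b (an m-vector), then solve the
        # Normal Equation (A^T A) x* = A^T b.
        return np.linalg.solve(A.T @ A, A.T @ b)

    A = np.array([[1., 0.], [0., 1.], [1., 1.]])
    b = np.array([1., 1., 1.])
    x_star = least_squares(A, b)
    # Agrees with numpy's built-in least-squares solver:
    assert np.allclose(x_star, np.linalg.lstsq(A, b, rcond=None)[0])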


Q: What if this is inconsistent?

A:

Q: What if A ∈ R^(n×m) and ~b ∈ R^n? How long does it take to find the least squares solution?

A:

Q: There are two parts to the normal equation. How long does it take to find the vector A^T~b?

A:

Q: How long does it take to find the matrix A^T A?

A:

Q: In total then, to set up the normal equation?

A:

Linear Algebra gives us another useful theorem: rank(A^T A) = rank(A).


Q: Once we have the matrix and vector, how long does it take to solve the normal equation (A^T A)~x∗ = A^T~b?

A:

Q: Okay, so in total?

A:

Q: What should we do if we have k vectors to find least-squares solutions for, ~b1, . . . , ~bk?

A:

Q: Does this change the time for finding A^T A?

A:

Q: What about A^T~bi?

A:


Q: How long does that take?

A:

Q: Can we do it faster?

A:

Q: How long does that take?

A:

Q: So what is the overall time for finding the vectors A^T~bi?

A:

Q: Now how can we solve the normal equations for k vectors ~bi?

A:


Q: How long does that take?

A:

Q: Can we do this solving part faster?

A:

Q: How long does that take then?

A:

Q: Now put them together. How long to set up the equations and solve them?

A:

Exercises for D.4

Exercise 1

What if A =

    [ 1  0  ···  0 ]
    [ 0  0  ···  0 ]
    [ ⋮  ⋮   ⋱   ⋮ ]
    [ 0  0  ···  0 ]

(still n × m)? How long does it take to find the least-squares solution to A~x = ~b? (Just one vector ~b, not k.) As usual, there are two possibilities:

1. Using the algorithm described in class, and

2. Using an algorithm specially designed for this case.

E Singular-Value Decomposition

TODO: add this section sometime.

F Answers to Exercises

Answer of exercise 1.1.1

Question: For each formula, determine three things:

• What is the simplified big-Θ notation?

• Is it inside O(n²)?

• Is it inside Ω(n²)?

1. 25n + 30n + 17

2. 31n + .2n³ + 10000

3. 54n²

Answer:

1. • 25n + 30n + 17 ∈ Θ(n)

   • 25n + 30n + 17 ∈ O(n²) (i.e. Θ(n) ⊂ O(n²))

   • 25n + 30n + 17 ∉ Ω(n²) (i.e. Θ(n) ⊄ Ω(n²))

2. • 31n + .2n³ + 10000 ∈ Θ(n³)

   • 31n + .2n³ + 10000 ∉ O(n²) (i.e. Θ(n³) ⊄ O(n²))

   • 31n + .2n³ + 10000 ∈ Ω(n²) (i.e. Θ(n³) ⊂ Ω(n²))

3. • 54n² ∈ Θ(n²)

   • 54n² ∈ O(n²) (i.e. Θ(n²) ⊂ O(n²))


   • 54n² ∈ Ω(n²) (i.e. Θ(n²) ⊂ Ω(n²))

Answer of exercise 1.1.2

Answer Not Provided

Answer of exercise 1.1.3

Answer Not Provided

Answer of exercise 1.2.1

Question: If ~x, ~y, and ~z are all n-vectors, how long does it take to find ~x + ~y + ~z? As with all questions of this sort, find both a running time and a lower bound, if possible.

Answer: Θ(n).

• We have to read in all 3n entries, so it’s in Ω(n).

• We can do the computation by performing n sums, each with 3 numbers. This takes O(n) time.

Since our lower bound equals the algorithm time, we have a big-Θ answer!

Answer of exercise 1.2.2

Answer Not Provided

Answer of exercise 1.2.3

Answer Not Provided

Answer of exercise 1.2.4

Answer Not Provided

Answer of exercise 1.2.5

Answer Not Provided

Answer of exercise 1.2.6

Answer Not Provided

Answer of exercise 2.2.1

Answer Not Provided

Answer of exercise 2.2.2

Answer Not Provided


Answer of exercise 2.2.3

Answer Not Provided

Answer of exercise 2.2.4

Answer Not Provided

Answer of exercise 2.2.5

Answer Not Provided

Answer of exercise 2.2.6

Answer Not Provided

Answer of exercise 2.3.1

Answer Not Provided

Answer of exercise 3.1.1

Answer Not Provided

Answer of exercise 3.2.1

Answer Not Provided

Answer of exercise 3.3.1

Answer Not Provided

Answer of exercise 3.3.2

Answer Not Provided

Answer of exercise 3.3.3

Answer Not Provided

Answer of exercise 3.3.4

Answer Not Provided

Answer of exercise 3.4.1

Answer Not Provided

Answer of exercise 3.4.2

Answer Not Provided


Answer of exercise 4.2.1

Answer Not Provided

Answer of exercise 5.1.1

Answer Not Provided

Answer of exercise 5.2.1

Answer Not Provided

Answer of exercise 5.3.1

Answer Not Provided

Answer of exercise 6.2.1

Answer Not Provided

Answer of exercise 6.2.2

Answer Not Provided

Answer of exercise 6.3.1

Answer Not Provided

Answer of exercise A.2.1

Answer Not Provided

Answer of exercise B.3.1

Question: Prove that C1,2 = Z3 + Z5.

Answer:

    Z3 = A1,1(B1,2 − B2,2)
       = A1,1B1,2 − A1,1B2,2

    Z5 = (A1,1 + A1,2)B2,2
       = A1,1B2,2 + A1,2B2,2

    Z3 + Z5 = A1,1B1,2 − A1,1B2,2 + A1,1B2,2 + A1,2B2,2
            = A1,1B1,2 + A1,2B2,2
            = C1,2

Answer of exercise B.3.2

Answer Not Provided

Answer of exercise B.3.3

Answer Not Provided

Answer of exercise B.3.4

Answer Not Provided

Answer of exercise B.3.5

Answer Not Provided

Answer of exercise C.3.1

Answer Not Provided

Answer of exercise D.2.1

Answer Not Provided

Answer of exercise D.4.1

Answer Not Provided
